Evan
when I try to generate_mapping(create_tables=True) I get the following - is it possible to create the schema through ponyorm or should I use a different tool like pgadmin?
InvalidSchemaName: schema "schema1" does not exist
Alexander
You need to create the schema using pgadmin first
Evan
had another question - I'm trying to copy from one database to another, and there are 6500 rows in the source database (with some relationship hops etc.)
this seems to take a huge amount of time (4-5 hours) to complete - any idea why, or what I can do to make it faster? Is this an I/O issue (i.e. I'm loading things from a db locally and then saving them back, which means there are 2 round trips)?
thank you in advance
Alexander
Probably another process also works with the same DB, and some blocking issues arise.
6500 rows is a very small amount and should be copied instantly
Maybe you create a separate db_session for each copied row; that can slow down the process. You can try wrapping the writing of all these rows to the second DB in a single db_session
Evan
could this message be related?
"connection to server at "XXXXXX" port XXX failed: FATAL: remaining connection slots are reserved for non-replication superuser connections
does pony somehow leave connections outstanding?
Alexander
How is your script that copies data organized? Does it use multiple threads/processes or async code?
Evan
nope, this is what it looks like. The total rows across all tables is ~250k, which still seems small; the 6.5k rows are for the equivalent of Table_1 below
with orm.db_session:
    n = old_db.Table_1.select().count()
    for t in tqdm.notebook.tqdm(old_db.Table_1.select(), total=n):
        t_ = new_db.Table_1(
            attr_1=t.attr_1,
            attr_2=t.attr_2,
            ...
        )
        for s1 in t.set_1:
            new_db.Table_2(
                attr_1=s1.attr_1,
                attr_2=s1.attr_2,
                ...
                ref_table_1=t_,
            )
        orm.commit()
Evan
(or at least I think I don't have any async code)
Evan
lmk if the above is not the right approach or if you want some additional information
old_db.Table_1 and new_db.Table_1 reside in different databases but on the same server
Radim
maybe committing after each row is also slowing this down?
Alexander
definitely
Evan
the reason I am doing that is that in the past it would go through everything and then just get stuck on the commit (I think) as it was exiting the db_session. I had no way of knowing how long that would take, and abandoning at that point meant losing everything done so far, whereas now everything committed so far is not "lost"
sounds like that is the wrong approach?
Evan
but it sounds like the code above that I've written is the right pattern? my biggest worry was that it effectively loads data from a remote location to my local machine and then pushes that data back out, and this I/O is the cause of the delay - but it seems like you guys think it's unlikely to be that
Alexander
You insert some rows to Table_1, and for each row, you insert some rows to Table_2. How many rows in Table_2, on average you have for a single row in Table_1?
Each orm.commit() call finishes a transaction and starts a new one. For a single transaction, I think inserting around 10000 rows (in total for Table_1 and Table_2) is good. Each commit takes time, so doing one commit per 1 row may slow down the entire process significantly. Conversely, if you insert 100K rows per commit, it can hang for some time.
Alexander
But it is possible that your problem is somehow caused by too big a number of open connections to the DB
Alexander
Pony caches connections per db instance per thread and reuses them for subsequent db_sessions. I hope you have a single db instance for the receiving DB and you reuse it for all inserts.
Alexander
Also, if I understand correctly, the number 6500 rows you mentioned previously is for Table_1, and you have some unspecified (possibly huge) number of rows in Table_2 for each row in Table_1. Maybe the total number of rows is not that small
Evan
the total is as follows: there are 7 tables, and Table 1 has a Set(XXX) for each of the other tables, basically. So the total number of rows across everything is something like 250k rows
so I'd say each Table 1 row has roughly 245k / 6.5k ≈ 40 related rows on average (very rough math)
'Table 1': 6.5k,
'Table 2': 6.5k,
'Table 3': 24k,
'Table 4': 6.5k,
'Table 5': 53k,
'Table 6': 1,
'Table 7': 153k
Evan
ok, so maybe committing all the time is overkill, but I should do it every 100th row of Table 1
Alexander
you can try
Evan
also not sure what you mean by the caching etc. my code will literally look like the one above that I showed you.
I might have another jupyter notebook open where i'm running stuff talking to the database (eg to answer some of the questions you guys posed here).
i assume once I kill the jupyter notebook the connection dies with it right?
Alexander
I think yes
Alexander
Evan
also if I do:
with orm.db_session:
stuff()
with orm.db_session:
more_stuff()
have I created 2 connections by mistake, or is that fine in terms of what you said about having "a single db instance for receiving DB" and reusing it for all inserts?
Evan
how many connections is "a lot"? currently I think the db is set up to allow 22 open connections at a time
Alexander
No, the second db_session should reuse the connection created by the first db_session
Alexander
If the copying script does not contain anything else than what you provided, then Pony should create just a single connection to the receiving DB
Evan
is there a way to check that? like can I print that out or examine some internal orm.XXXX method / attribute?
Alexander
Before copying the data, you can add the following
@old_db.on_connect
def new_connection(db, connection):
    print(f"New connection to {db}")
Evan
with orm.db_session:
    # Here?
    @old_db.on_connect
    def new_connection(db, connection):
        print(f"New connection to {db}")

    n = old_db.Table_1.select().count()
    for t in tqdm.notebook.tqdm(old_db.Table_1.select(), total=n):
        t_ = new_db.Table_1(
            attr_1=t.attr_1,
            attr_2=t.attr_2,
            ...
        )
        for s1 in t.set_1:
            new_db.Table_2(
                attr_1=s1.attr_1,
                attr_2=s1.attr_2,
                ...
                ref_table_1=t_,
            )
        orm.commit()
Evan
btw, I made it commit only every 100 rows and it went through each row 2x as fast, but it's stuck now on the commit
Alexander
> Here?
No, before with db_session, probably right after old_db definition
Evan
sorry, also just to clarify: old_db is an orm.Database() instance, right? so it should look something like the below?

from old_lib import old_db
# old_db.database is an orm.Database() instance
old_db.database.bind(**kwargs)
old_db.database.generate_mapping(create_tables)

with orm.db_session:
    @old_db.database.on_connect
    def new_connection(db, connection):
        print(f"New connection to {db}")
Alexander
No, put

@old_db.database.on_connect
def new_connection(db, connection):
    print(f"New connection to {db}")

right after the import (before bind), and add the same for new_db as well
Evan
ok cool thanks
Evan
well at least we know it's not that it's generating a bunch of connections
Evan
i'm pretty sure it's an i/o issue - meaning i have a local machine that i'm running all this stuff on, connecting to a remote server
Evan
dunno if i could somehow run all this server side using pony to test this
Evan
(only have had 1 `new connection print`)
B
Hi
How can I get a json field from a table without deserialization?
Alexander
You can use raw_sql: https://pastebin.com/ah10FN9L
B
_word = select(w for w in Words if w.num == num)
_data = select((w.word, w.num, w.data_for_words.data_main) for w in _word)
_data is of the Query type, and _data.get() returns a tuple.
Can I get a dict out of it, or should I rely on the order of the tuple and build my own dict?
Amo
Hi, what is the best way to compare the database before and after an operation in a unit test? I want to check that all but the entity I want to change stays the same.
Amo
Alexander
Yes, to_dict should be suitable for this
B
There is no Entity in the "select multiple values" statement. I found out that if I compose the final DB query using only "select" statements (even if the "select" statements belong to different variables), Pony combines such statements into a single resulting SELECT request. If I start using a "get" statement in the query chain, many SELECT requests are generated. I tried to reduce the number of requests to the DB, although I don't know exactly how this affects the delays of obtaining data from the DB within one db_session block, which should have a cache.
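The to_dict approach Alexander suggests for the unit-test question can be sketched like this (the Item entity and the snapshot helper are made-up names, not an established pattern from this chat):

```python
from pony import orm

db = orm.Database()

class Item(db.Entity):  # placeholder entity
    name = orm.Required(str)
    qty = orm.Required(int)

db.bind(provider="sqlite", filename=":memory:")
db.generate_mapping(create_tables=True)

with orm.db_session:
    Item(name="a", qty=1)
    Item(name="b", qty=2)

def snapshot():
    # capture the whole table as plain dicts, keyed by primary key
    with orm.db_session:
        return {i.id: i.to_dict() for i in Item.select()}

before = snapshot()
with orm.db_session:
    Item.get(name="a").qty = 5  # the operation under test

after = snapshot()
# every row except the one we changed should compare equal
changed = [k for k in before if before[k] != after[k]]
print(changed)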
Yuri
Hi there, is there any way to define own foreign key for relation?
Or how can I use a signed bigint as a primary key? Pony ORM prevents it for me and says
ValueError: Value -1001323148007 of attr Model.id is less than the minimum allowed value -2147483648
while the pk is defined like id = PrimaryKey(int, unsigned=False, sql_type='BIGINT')
Alexander
Yuri
now another
DataError: 1366 Incorrect string value: '\xF0\x9D\x90\x85\xF0\x9D...'
while
table_options = 'ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'
Alexander
Do you use Python 2 or Python 3?
Yuri
Alexander
I think this error is not related to Pony; the string contains symbols that cannot be stored with this charset
Yuri
Is there any way to create a morph relation? So like, a user has media, media is video or music, and both have one user
Yuri
also interesting: how do I create-or-update at once, so I don't need User.get, and if there's no user, User(...)
Alexander
user = User.get(...) or User(...)
Alexander
That's the shortest way
Yuri
how do you live without create_or_update and similar, and also without morphs
Amo
Amo
So user has the 'reverse' relation to media
Amo
Something like

class User(db.Entity):
    media = Optional('Media')

class Media(db.Entity):
    user = Optional(User)

class Video(Media):
    ...

class Music(Media):
    ...
Yuri
it's true, but as a result, when I get user.media it will be of type Media, not Video or Audio
Amo
Have a look at User.media.classtype
Amo
And cast it
Yuri
will pony take all data from the table, not only the relation id? I mean, when I cast it, will all props be filled?
Yuri
it's actually a case which also can't be implemented in laravel normally
Amo
I don't know about the internals, but what do you want to do?
In my case I used this to create different data classes in my domain model
Amo
To show to users etc. Not to pass them directly back into the database
Yuri
if I do it like I want, and then select user.media, the media will be a collection of two different model types
Yuri
that's not ok either
Amo
Sorry, I don't understand your issue.
Amo
You can enable SQL log and do a media.get(id=2) on a music entity and check if this helps you
꧁Rhͪaͣpsoͦdͩoͦs93꧂
any news about python3.11 support?
Alexander
Not yet, sorry...
Vitaliy
Hello Alexander! @metaprogrammer
I found a bug related to Json in MySQL.
Let's assume we have simplified model and here are examples of insert/fetch queries:
class User(db.Entity):
    emails = Required(Json, default=[])

User(emails=['test@te.st', 'other@ema.il'])
select(u for u in User if 'test@te.st' in u.emails)
This query works fine, generating the following WHERE condition:
json_contains(`u`.`contact`, '["test@te.st"]', '$') or json_contains_path(`u`.`contact`, 'one', '$."test@te.st"')
As we see you designed it to work for both lists and dicts.
But if we use a variable in the query:
value = 'test@te.st'
select(u for u in User if value in u.emails)
it fails with an error:
pony.orm.dbapiprovider.OperationalError: (3143, 'Invalid JSON path expression. The error is around character position 1.')
generating such a condition:
json_contains(`u`.`contact`, '["test@te.st"]', '$') or json_contains_path(`u`.`contact`, 'one', '["test@te.st"]')
After debugging for a couple of hours, I still don't understand how to fix it quickly.
The only workaround is to run the query as a string, like select('u for u in User if %r in u.emails' % value).
Can you please take a look into this issue?