Evan
when I try to generate_mapping(create_tables=True) I get the following - is it possible to create the schema through ponyorm or should I use a different tool like pgadmin?
InvalidSchemaName: schema "schema1" does not exist
Alexander
You need to create the schema using pgadmin first
Evan
had another question - I'm trying to copy from one database to another, and there are 6500 rows in the source database (with some relationship hops etc.)
this seems to take a huge amount of time (4-5 hours) to complete - any idea why, or what I can do to make it faster? Is this an I/O issue (i.e. I'm loading things from a db locally and then saving them back, which means there are 2 round trips)?
thank you in advance
Alexander
Probably another process also works with the same DB, and some blocking issues arise.
6500 rows is a very small amount and should be copied instantly
Maybe you create a separate db_session for each copied row; that can slow down the process. You can try wrapping the writing of all these rows to the second DB in a single db_session
Evan
could this message be related?
"connection to server at "XXXXXX" port XXX failed: FATAL: remaining connection slots are reserved for non-replication superuser connections
does pony somehow leave connections outstanding?
Alexander
How is your script that copies data organized? Does it use multiple threads/processes or async code?
Evan
nope, this is what it looks like. The total rows across all tables is ~250k, which still seems small; the 6.5k rows are for the equivalent of Table_1 below
with orm.db_session:
    n = old_db.Table_1.select().count()
    for t in tqdm.notebook.tqdm(old_db.Table_1.select(), total=n):
        t_ = new_db.Table_1(
            attr_1=t.attr_1,
            attr_2=t.attr_2,
            ...
        )
        for s1 in t.set_1:
            new_db.Table_2(
                attr_1=s1.attr_1,
                attr_2=s1.attr_2,
                ...
                ref_table_1=t_,
            )
        orm.commit()
Evan
(or at least I think I don't have any async code)
Evan
lmk if the above is not the right approach or if you want some additional information
old_db.Table_1 and new_db.Table_1 reside in different databases but on the same server
Radim
maybe committing after each row is also slowing this down?
Alexander
definitely
Evan
the reason I am doing that is that in the past it would go through everything and then just get stuck on the commit (I think) as it was exiting the db_session. I had no way of knowing how long that would take, and abandoning at that point meant losing everything done so far, whereas now everything committed so far is not "lost"
sounds like that is the wrong approach?
Evan
but it sounds like the code above that I've written is the right pattern? my biggest worry was that it effectively loads data from a remote location to my local machine and then pushes that data back out, and this I/O is the cause of the delay - but it seems like you guys think it's unlikely to be that
Alexander
You insert some rows to Table_1, and for each row, you insert some rows to Table_2. How many rows in Table_2, on average you have for a single row in Table_1?
Each orm.commit() call finishes a transaction and starts a new one. For a single transaction, I think inserting around 10000 rows (in total for Table_1 and Table_2) is good. Each commit takes time, so doing one commit per 1 row may slow down the entire process significantly. Conversely, if you insert 100K rows per commit, it can hang for some time.
Alexander
But it is possible that your problem is somehow caused by too big a number of open connections to the DB
Alexander
Pony caches connections per db instance per thread and reuses them for subsequent db_sessions. I hope you have a single db instance for the receiving DB and you reuse it for all inserts.
Alexander
Also, if I understand correctly, the number 6500 rows you mentioned previously is for Table_1, and you have some unspecified (possibly huge) number of rows in Table_2 for each row in Table_1. Maybe the total number of rows is not that small
Evan
the total is as follows: there are 7 tables, and Table 1 has a Set(XXX) for each of the other tables, basically. So the total number of rows across everything is something like 250k rows
so I'd say each Table 1 row has roughly 245k / 6.5k ≈ 40 related rows on average (very rough math)
'Table 1': 6.5k,
'Table 2': 6.5k,
'Table 3': 24k,
'Table 4': 6.5k,
'Table 5': 53k,
'Table 6': 1,
'Table 7': 153k
Evan
ok, so maybe committing all the time is overkill, but I should do it every 100th row of Table 1
Alexander
you can try
Evan
also not sure what you mean by the caching etc. my code will literally look like the one above that I showed you.
I might have another jupyter notebook open where i'm running stuff talking to the database (eg to answer some of the questions you guys posed here).
i assume once I kill the jupyter notebook the connection dies with it right?
Alexander
I think yes
Alexander
Evan
also if I do:
with orm.db_session:
stuff()
with orm.db_session:
more_stuff()
have I created 2 connections by mistake, or is that fine in terms of what you said about having "a single db instance for receiving DB" and reusing it for all inserts?
Evan
how many connections is "a lot"? currently I think the db is set up to allow 22 open connections at a time
Alexander
No, the second db_session should reuse the connection created by the first db_session
Alexander
If the copying script does not contain anything else than what you provided, then Pony should create just a single connection to the receiving DB
Evan
is there a way to check that? like can I print that out or examine some internal orm.XXXX method / attribute?
Alexander
Before copying the data, you can add the following
@old_db.on_connect
def new_connection(db, connection):
    print(f"New connection to {db}")
Evan
with orm.db_session:
    # Here?
    @old_db.on_connect
    def new_connection(db, connection):
        print(f"New connection to {db}")

    n = old_db.Table_1.select().count()
    for t in tqdm.notebook.tqdm(old_db.Table_1.select(), total=n):
        t_ = new_db.Table_1(
            attr_1=t.attr_1,
            attr_2=t.attr_2,
            ...
        )
        for s1 in t.set_1:
            new_db.Table_2(
                attr_1=s1.attr_1,
                attr_2=s1.attr_2,
                ...
                ref_table_1=t_,
            )
        orm.commit()
Evan
btw, I made it commit only every 100 rows and it went through each row 2x as fast, but it's stuck now on the commit
Alexander
> Here?
No, before with db_session, probably right after old_db definition
Evan
sorry, also just to clarify: old_db is an orm.Database() instance, right? so it should look something like the below?

from old_lib import old_db
# old_db.database is an orm.Database() instance
old_db.database.bind(**kwargs)
old_db.database.generate_mapping(create_tables)

with orm.db_session:
    @old_db.database.on_connect
    def new_connection(db, connection):
        print(f"New connection to {db}")
Alexander
No, put

@old_db.database.on_connect
def new_connection(db, connection):
    print(f"New connection to {db}")

right after the import (before bind), and add the same for new_db as well
Evan
ok cool thanks
Evan
well at least we know it's not that it's generating a bunch of connections
Evan
i'm pretty sure it's an i/o issue - meaning i have a local machine that i'm running all this stuff on, connecting to a remote server
Evan
dunno if i could somehow run all this server side using pony to test this
Evan
(only have had 1 `new connection print`)
B
Hi
How can I get a json field from a table without deserialization?
Alexander
You can use raw_sql: https://pastebin.com/ah10FN9L
B
_word = select(w for w in Words if w.num == num)
_data = select((w.word, w.num, w.data_for_words.data_main) for w in _word)
_data is of the Query type, and _data.get() returns a tuple.
Can I get a dict out of it, or should I rely on the order of the tuple and build my own dict?
Amo
Hi, what is the best way to compare the database before and after an operation in a unit test? I want to check that all but the entity I want to change stays the same.
Amo
Alexander
Yes, to_dict should be suitable for this
B
There is no Entity in the "select multiple values" statement. I found out that if I compose the final DB query using only "select" statements (even if the "select" statements belong to different variables), Pony combines such statements into a single resulting SELECT request. If I start using a "get" statement in the query chain, many SELECT requests are generated. I tried to reduce the number of requests to the DB, although I don't know exactly how this affects the delays of obtaining data from the DB within one db_session block, which should have a cache.
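The to_dict approach Alexander suggests for the unit-test question can be sketched like this (the Item entity and the snapshot helper are made-up names, not an established pattern from this chat):

```python
from pony import orm

db = orm.Database()

class Item(db.Entity):  # placeholder entity
    name = orm.Required(str)
    qty = orm.Required(int)

db.bind(provider="sqlite", filename=":memory:")
db.generate_mapping(create_tables=True)

with orm.db_session:
    Item(name="a", qty=1)
    Item(name="b", qty=2)

def snapshot():
    # capture the whole table as plain dicts, keyed by primary key
    with orm.db_session:
        return {i.id: i.to_dict() for i in Item.select()}

before = snapshot()
with orm.db_session:
    Item.get(name="a").qty = 5  # the operation under test

after = snapshot()
# every row except the one we changed should compare equal
changed = [k for k in before if before[k] != after[k]]
print(changed)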
Yuri
Hi there, is there any way to define own foreign key for relation?
Or how can I use a signed bigint as a primary key? Pony ORM prevents it for me and says
ValueError: Value -1001323148007 of attr Model.id is less than the minimum allowed value -2147483648
while the pk is defined like id = PrimaryKey(int, unsigned=False, sql_type='BIGINT')
Alexander
Yuri
now another
DataError: 1366 Incorrect string value: '\xF0\x9D\x90\x85\xF0\x9D...'
while
table_options = 'ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'
Alexander
Do you use Python 2 or Python 3?
Yuri
Alexander
I think this error is not related to Pony; the string contains symbols that cannot be stored with this charset
Yuri
Is there any way to create a morph relation? So like, a user has media, media is video or music, and both have one user
Yuri
also interesting: how do I create-or-update at once, so I don't need User.get, and if there's no user, User(...)
Alexander
user = User.get(...) or User(...)
Alexander
That's the shortest way
Yuri
how do you live without create_or_update and similar, and also without morphs
Amo
Amo
So user has the 'reverse' relation to media
Amo
Something like

class User(db.Entity):
    media = Optional('Media')

class Media(db.Entity):
    user = Optional(User)

class Video(Media):
    ...

class Music(Media):
    ...
Yuri
it's true, but as a result, when I get user.media it will be of type Media, not Video or Audio
Amo
Have a look at User.media.classtype
Amo
And cast it
Yuri
will pony take all data from the table, not only the relation id? I mean, when I cast it, will all props be filled?
Yuri
it's actually a case which also can't be implemented in laravel normally
Amo
I don't know about the internals, but what do you want to do?
In my case I used this to create different data classes in my domain model
Amo
To show to users etc. Not to pass them directly back into the database
Yuri
if I do it like I want, and then select user.media, the media will be a collection of two different model types
Yuri
that's not ok either
Amo
Sorry, I don't understand your issue.
Amo
You can enable SQL log and do a media.get(id=2) on a music entity and check if this helps you
꧁Rhͪaͣpsoͦdͩoͦs93꧂
any news about python3.11 support?
Alexander
Not yet, sorry...
Vitaliy
Hello Alexander! @metaprogrammer
I found a bug related to Json in MySQL.
Let's assume we have simplified model and here are examples of insert/fetch queries:
class User(db.Entity):
    emails = Required(Json, default=[])

User(emails=['test@te.st', 'other@ema.il'])
select(u for u in User if 'test@te.st' in u.emails)
This query works fine, generating the following WHERE condition:
json_contains(`u`.`contact`, '["test@te.st"]', '$') or json_contains_path(`u`.`contact`, 'one', '$."test@te.st"')
As we see you designed it to work for both lists and dicts.
But if we use a variable in the query:
value = 'test@te.st'
select(u for u in User if value in u.emails)
it fails with an error:
pony.orm.dbapiprovider.OperationalError: (3143, 'Invalid JSON path expression. The error is around character position 1.')
generating such a condition:
json_contains(`u`.`contact`, '["test@te.st"]', '$') or json_contains_path(`u`.`contact`, 'one', '["test@te.st"]')
After debugging for a couple of hours, I still don't understand how to fix it quickly.
The only workaround is to run the query as a string, like select('u for u in User if %r in u.emails' % value).
Can you please take a look into this issue?