Lucky
Like as long as they have the same superclass?
Alexander
No. If you have Student inheriting from Person, and two instances Person(id=1) and Student(id=2), then Person.get(id=2) will return Student[2], but Student.get(id=1) will return None, because Person[1] is not a Student
Lucky
Can I just do Post.get(…) and later check if isinstance(db_obj, MediaPost)?
Alexander
yes
Lucky
So using the baseclass is fine. Great.
Roman
I have those entities:
class Language(db.Entity):
    id = orm.PrimaryKey(int, auto=True)
    name = orm.Required(langcodes.Language)
    labels = orm.Set('Label', lazy=True)

class Label(db.Entity):
    id = orm.PrimaryKey(int, auto=True)
    text = orm.Required(str, index=True)
    language = orm.Required(Language)
The problem is that when I do:
russian = Language.get(name='ru')
for l in russian.labels:
    ...
It loads all Label instances for Russian (more than 600k) into memory.
Why is that?
Alexander
It should load all labels belonging to Russian language. Do you mean it loads other labels too?
Alexander
Are you iterating over the labels of a single language only, or do you have a loop over all languages?
Roman
@metaprogrammer Sorry, I fixed the message.
What if I don't want to load all the Russian Label instances into memory, but rather want to iterate over them?
Alexander
Initially Pony was developed for use in web applications, where you load a small number of objects to render a web page as fast as possible and then close the db_session. It was not intended for loading millions of objects in a single batch for analytic data processing or similar tasks. Because of this, Pony caches all objects loaded in a single db_session to avoid redundant queries and generate the web page as quickly as possible.
So, at the moment Pony does not "unload" objects from the db_session cache until the db_session ends. Unloading an object from the cache before the end of a db_session is a non-trivial task: it is not easy to determine which objects should be unloaded and which should stay in the cache, because they may be used later.
Iterating over 600k objects is a slow process even if the objects are somehow unloaded later, and should be avoided if possible. You can consider the following options:
1) Don't load the labels at all; if they are needed only for another query, use them directly from the database inside that query.
2) Don't iterate over all objects; specify a condition to retrieve only the small subset of objects that are really necessary:
for l in russian.labels.select(lambda label: <some condition>):
    ...
3) Split the operation into several db_sessions, each of which loads only a small subset of labels.
4) If it is really necessary to read all 600k label texts from the database at once, you can load just the strings and not entire objects:
texts = select(label.text for label in russian.labels).fetch()
for text in texts:
    ...
This way only strings are loaded, and no entity objects are created in memory. Pony still caches the result of the query holding these 600k strings, but you can clear it using an internal API after you complete the iteration:
db._get_cache().query_results.clear()
Roman
Thank you for such a detailed answer!
М
Hello, I have a question. Can I somehow unbind an object from the session? I want to retrieve it from the db, close the session, and then work with this object without any connection to the db.
Jim
Use to_dict() ?
Lucky
Hey, I’m running a Celery worker and its memory usage is quickly rising to 10+ GB.
Apparently I have those objects lying around somewhere.
Is there a way to trace such issues down efficiently,
other than “staring at the code”? (I did that for the last few days.)
Alexander
Did you try using db_session with the strict=True option?
Lucky
Will this also prevent opening another db session?
Alexander
No. Maybe I don't fully understand the last question
Alexander
Do you mean nested db_sessions or what?
Matthew
I have a weird error that I haven’t seen before:
Matthew
21:07:13 UnexpectedError: Object MyModel[new:1] cannot be stored in the database. DataError: integer out of range
Matthew
The two integer fields in that model have values of approx 34,000
Matthew
which seems normal
Matthew
is it the primary key being 32 bit?
Alexander
it should be
Matthew
ah yes it is 🙂
Matthew
As soon as I type out my issue, I come up with the solution 🙂
Alexander
what was the problem?
Matthew
background tasks were failing en masse because a record object couldn’t be saved to the database
Lucky
what was the solution?
Matthew
The solution was to convert the primary key to being 64 bit both in the model and in the database
Anatoliy
Hello
Anatoliy
When I try to serialize my object I get "TypeError: keys must be str, int, float, bool or None, not UUID"
Anatoliy
I tried switching the field to a string but it did not help
Anatoliy
class PaymentSource(db.Entity):
    _table_ = ("bill", "paymentsource")
    uid = PrimaryKey(uuid.UUID)
    uid_account = Optional(uuid.UUID)
    name = Required(str)
Anatoliy
PrimaryKey is UUID
Anatoliy
from pony.orm.serialization import to_dict

ps = select(p for p in PaymentSource)[:]
result = to_dict(ps)
return json.dumps(result)
Anatoliy
It breaks on to_dict
Anatoliy
PaymentSource[UUID('f60656a6-0747-11e8-ad2e-875b46de56b8')]
PaymentSource[UUID('f60656a7-0747-11e8-ad2e-87e0eda982ea')]
Anatoliy
it breaks on these
Matthew
Does it work if you call to_dict on each p rather than ps?
Matthew
result = [p.to_dict() for p in ps]
Anatoliy
no, it does not work
Anatoliy
It breaks on PaymentSource[UUID('f60656a6-0747-11e8-ad2e-875b46de56b8')]
Anatoliy
I tried using to_dict on a single object and got the same result
М
Just to clarify: you did try to use to_dict as a method of your object, not as an external function, right?
Anatoliy
with db_session:
    paymentSources = select(p.to_dict for p in PaymentSource)
Anatoliy
translator.expr_columns = monad.getsql()
AttributeError: 'HybridMethodMonad' object has no attribute 'getsql'
Anatoliy
if I add ()
Anatoliy
throw(NotImplementedError('Unsupported operation: %s' % opname))
  File "/usr/local/lib/python3.7/site-packages/pony/utils/utils.py", line 106, in throw
    raise exc
NotImplementedError: Unsupported operation: JUMP_ABSOLUTE
Alexander
Hi, what version of Pony do you use?
Anatoliy
pip3 list | grep pony
pony 0.7.10
М
You get ps as you do now, and then do what Matthew has written
М
result = [p.to_dict() for p in ps]
Anatoliy
with db_session:
    ps = select(p for p in PaymentSource)[:]
    result = [p.to_dict() for p in ps]
Anatoliy
right?
М
Yea
Anatoliy
{'uid': UUID('f60656a6-0747-11e8-ad2e-875b46de56b8'), 'uid_account': None}
Anatoliy
You think this is valid JSON? :)
М
It's a valid dict)
Anatoliy
I checked simplejson, rapidjson, and metamagic.json
Anatoliy
and got the same error :)
Alexander
This is dict object, not json
Alexander
to_dict just returns a dict with Python key/value pairs; you can then serialize it to JSON using json.dumps(d, default=some_custom_function), where some_custom_function knows how to serialize a UUID. UUID is not a standard JSON type
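A minimal sketch of the default= approach (the function name is arbitrary):

```python
import json
import uuid

def encode_extra(obj):
    """Called by json.dumps for values it cannot serialize natively."""
    if isinstance(obj, uuid.UUID):
        return str(obj)
    raise TypeError('Not JSON serializable: %r' % obj)

d = {'uid': uuid.UUID('f60656a6-0747-11e8-ad2e-875b46de56b8'),
     'uid_account': None}
payload = json.dumps(d, default=encode_extra)
```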
Anatoliy
Yep, but I need JSON
Alexander
UUID is not JSON serializable
Alexander
The problem is not with Pony but with UUID itself
М
To produce JSON you basically need to convert the UUID to a string, and then do json.dumps and so on
Alexander
https://stackoverflow.com/a/48159596/4377521
This might help
Anatoliy
The problem with json.dumps(d, default=some_custom_function) is that
Anatoliy
it does not work on dict keys
Alexander
but if I understand correctly, the UUID object is not a dict key, it is a dict value
Anatoliy
it is a key if you get the result from a select:
Anatoliy
[PaymentSource[UUID('f60656a6-0747-11e8-ad2e-875b46de56b8')], PaymentSource[UUID('f60656a7-0747-11e8-ad2e-87e0eda982ea')], PaymentSource[UUID('f60656a8-0747-11e8-ad2e-733dbd6dbaa0')], PaymentSource[UUID('2a11d482-cfb2-11e9-94e8-eb9bc875b46e')]]
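For reference, a sketch of the dict-key problem discussed above: json.dumps calls default= only for *values* it cannot serialize, never for keys, so UUID keys still raise TypeError and have to be converted to strings beforehand. The nested shape below only imitates what the external to_dict produces; it is not its actual output.

```python
import json
import uuid

uid = uuid.UUID('f60656a6-0747-11e8-ad2e-875b46de56b8')
data = {'PaymentSource': {uid: {'uid': uid, 'name': 'card'}}}  # UUID used as a key

def stringify_keys(obj):
    """Recursively convert all dict keys to str so json.dumps accepts them."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    return obj

# default=str still handles UUID *values*; stringify_keys handles the keys.
payload = json.dumps(stringify_keys(data), default=str)
```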