Lucky
Like as long as they have the same superclass?
Alexander
No. If you have Student inheriting from Person, and two instances Person(id=1) and Student(id=2), then Person.get(id=2) will return Student[2], but Student.get(id=1) will return None, because Person[1] is not a Student
Lucky
Can I just do Post.get(…) and later check if isinstance(db_obj, MediaPost)?
Alexander
yes
Lucky
So using the baseclass is fine. Great.
Roman
I have those entities:
class Language(db.Entity):
    id = orm.PrimaryKey(int, auto=True)
    name = orm.Required(langcodes.Language)
    labels = orm.Set('Label', lazy=True)

class Label(db.Entity):
    id = orm.PrimaryKey(int, auto=True)
    text = orm.Required(str, index=True)
    language = orm.Required(Language)
The problem is that when I do:
russian = Language.get(name='ru')
for l in russian.labels:
    ...
It loads all Label instances for Russian (more than 600k) into memory.
Why is that?
Alexander
It should load all labels belonging to Russian language. Do you mean it loads other labels too?
Alexander
Are you iterating over the labels of a single language only, or do you have a loop over all languages?
Roman
@metaprogrammer Sorry, I fixed the message.
What if I don't want to load all the Russian Label instances into memory, but rather want to iterate over them?
Alexander
Initially Pony was developed for use in web applications, where you load a small number of objects to render a web page as fast as possible and then close the db_session. It was not intended for loading millions of objects in a single batch for analytic data processing or similar tasks. Because of this, Pony caches all objects loaded in a single db_session to avoid redundant queries and generate the web page as quickly as possible.
So, at the moment Pony does not "unload" objects from the db_session cache until the db_session ends. Unloading an object from the cache before the end of a db_session is a non-trivial task: it is not easy to determine which objects should be unloaded and which should stay in the cache, because they may be used later.
Iterating over 600k objects is a slow process even if the objects are somehow unloaded later, and should be avoided if possible. You can consider the following options:
1) Don't load the labels at all; if they are needed only for another query, use them directly from the database inside that query.
2) Don't iterate over all objects; specify a condition to retrieve only the small subset of objects that are really necessary:
for l in russian.labels.select(lambda label: <some condition>):
    ...
3) Split the operation into several db_sessions, each of which loads only a small subset of labels.
4) If it is really necessary to read all 600k label texts from the database at once, you can load just the strings and not entire objects:
texts = select(label.text for label in russian.labels).fetch()
for text in texts:
    ...
This way only strings are loaded, and no entity objects are created in memory. Pony still caches the result of the query holding these 600k strings, but you can clear it using an internal API after you complete the iteration:
db._get_cache().query_results.clear()
Roman
Thank you for such a detailed answer!
М
Hello, I have a question. Can I somehow unbind an object from the session? I want to retrieve it from the db, close the session, and then work with this object without any connection to the db.
Jim
Use to_dict() ?
Lucky
Hey, I’m running a Celery worker and its memory usage is quickly rising to 10+ GB.
Apparently I have those objects lying around somewhere.
Is there a way to trace such issues down efficiently,
other than “staring at the code”? (I did that for the last few days.)
Alexander
Did you try using db_session with the strict=True option?
Lucky
Will this also prevent opening another db session?
Alexander
No. Maybe I don't fully understand the last question
Alexander
Do you mean nested db_sessions or what?
Matthew
I have a weird error that I haven’t seen before:
Matthew
21:07:13 UnexpectedError: Object MyModel[new:1] cannot be stored in the database. DataError: integer out of range
Matthew
The two integer fields in that model have values of approx 34,000
Matthew
which seems normal
Matthew
is it the primary key being 32 bit?
Alexander
it should be
Matthew
ah yes it is 🙂
Matthew
As soon as I type out my issue, I come up with the solution 🙂
Alexander
what was the problem?
Matthew
background tasks were failing en masse because a record object couldn’t be saved to the database
Lucky
what was the solution?
Matthew
The solution was to convert the primary key to being 64 bit both in the model and in the database
Anatoliy
Hello
Anatoliy
When I try to serialize my object I get "TypeError: keys must be str, int, float, bool or None, not UUID"
Anatoliy
I tried switching the field to a string but it did not help
Anatoliy
class PaymentSource(db.Entity):
    _table_ = ("bill", "paymentsource")
    uid = PrimaryKey(uuid.UUID)
    uid_account = Optional(uuid.UUID)
    name = Required(str)
Anatoliy
PrimaryKey is UUID
Anatoliy
from pony.orm.serialization import to_dict

ps = select(p for p in PaymentSource)[:]
result = to_dict(ps)
return json.dumps(result)
Anatoliy
It breaks on to_dict
Anatoliy
PaymentSource[UUID('f60656a6-0747-11e8-ad2e-875b46de56b8')]
PaymentSource[UUID('f60656a7-0747-11e8-ad2e-87e0eda982ea')]
Anatoliy
it breaks on these
Matthew
Does it work if you call to_dict on each p rather than ps?
Matthew
result = [p.to_dict() for p in ps]
Anatoliy
no, it does not work
Anatoliy
It breaks on PaymentSource[UUID('f60656a6-0747-11e8-ad2e-875b46de56b8')]
Anatoliy
I tried using to_dict on a single object and got the same result
М
Just to clarify: you did try to use to_dict as a method of your object, not as an external function, right?
Anatoliy
with db_session:
    paymentSources = select(p.to_dict for p in PaymentSource)
Anatoliy
translator.expr_columns = monad.getsql()
AttributeError: 'HybridMethodMonad' object has no attribute 'getsql'
Anatoliy
if I add ()
Anatoliy
throw(NotImplementedError('Unsupported operation: %s' % opname))
  File "/usr/local/lib/python3.7/site-packages/pony/utils/utils.py", line 106, in throw
    raise exc
NotImplementedError: Unsupported operation: JUMP_ABSOLUTE
Alexander
Hi, what version of Pony do you use?
Anatoliy
pip3 list | grep pony
pony 0.7.10
М
You get ps as you do now, and then do what Matthew has written
М
result = [p.to_dict() for p in ps]
Anatoliy
with db_session:
    ps = select(p for p in PaymentSource)[:]
    result = [p.to_dict() for p in ps]
Anatoliy
right?
М
Yea
Anatoliy
{'uid': UUID('f60656a6-0747-11e8-ad2e-875b46de56b8'), 'uid_account': None}
Anatoliy
You think this is valid JSON? :)
М
It's a valid dict)
Anatoliy
I checked simplejson, rapidjson, and metamagic.json
Anatoliy
and got the same error :)
Alexander
This is dict object, not json
Alexander
to_dict just returns a dict with Python key/value pairs; you can then serialize it to JSON using json.dumps(d, default=some_custom_function), where some_custom_function knows how to serialize a UUID. UUID is not a standard JSON type
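A minimal sketch of the default= approach (the function name is arbitrary):

```python
import json
import uuid

def encode_extra(obj):
    """Called by json.dumps for values it cannot serialize natively."""
    if isinstance(obj, uuid.UUID):
        return str(obj)
    raise TypeError('Not JSON serializable: %r' % obj)

d = {'uid': uuid.UUID('f60656a6-0747-11e8-ad2e-875b46de56b8'),
     'uid_account': None}
payload = json.dumps(d, default=encode_extra)
```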
Anatoliy
Yep, but I need JSON
Alexander
UUID is not JSON serializable
Alexander
The problem is not with Pony but with UUID itself
М
To produce JSON you basically need to convert the UUID to a string, and then do json.dumps and so on
Alexander
https://stackoverflow.com/a/48159596/4377521
This might help
Anatoliy
The problem with json.dumps(d, default=some_custom_function) is that
Anatoliy
it does not work on dict keys
Alexander
but if I understand correctly, the UUID object is not a dict key, it is a dict value
Anatoliy
it is a key if you get the result from a select:
Anatoliy
[PaymentSource[UUID('f60656a6-0747-11e8-ad2e-875b46de56b8')], PaymentSource[UUID('f60656a7-0747-11e8-ad2e-87e0eda982ea')], PaymentSource[UUID('f60656a8-0747-11e8-ad2e-733dbd6dbaa0')], PaymentSource[UUID('2a11d482-cfb2-11e9-94e8-eb9bc875b46e')]]
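For reference, a sketch of the dict-key problem discussed above: json.dumps calls default= only for *values* it cannot serialize, never for keys, so UUID keys still raise TypeError and have to be converted to strings beforehand. The nested shape below only imitates what the external to_dict produces; it is not its actual output.

```python
import json
import uuid

uid = uuid.UUID('f60656a6-0747-11e8-ad2e-875b46de56b8')
data = {'PaymentSource': {uid: {'uid': uid, 'name': 'card'}}}  # UUID used as a key

def stringify_keys(obj):
    """Recursively convert all dict keys to str so json.dumps accepts them."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    return obj

# default=str still handles UUID *values*; stringify_keys handles the keys.
payload = json.dumps(stringify_keys(data), default=str)
```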