Anonymous
Hi! I have a model with some Optional fields, but I noticed something, and I want to be sure it's Pony related and not the result of something I did. Suppose I have a model with some required fields and also with X = Optional(date) and Y = Optional(str). Then I create an instance of the model, without providing values for X and Y I noticed that in that case X is assigned a None value, but Y is stored as an empty string. Is that how Pony works? Thanks!
Alexander
Yes, this is how Pony and most other ORM works: https://docs.ponyorm.com/api_reference.html#optional-string-attributes It is possible to use two different values to represent missing strings - empty string and null. It may be confusing if both empty string and null are used in the same column, because some queries may return confusing results. So it is better to use just a single way to represent missing string values, and disallow another way for consistency. It is easier to disallow null than to disallow empty string values. Also, empty strings can be indexed, whereas null values are not indexed.
Anonymous
Ahhh ok
Anonymous
It makes sense
Anonymous
Thanks for the clarification!
Matthew
Is JOIN() no longer in the documentation? I just spent 10 minutes looking on the site and googling but I can't find it?
Matthew
I'm sure I'm just overlooking something
Matthew
I'm trying to speed up a query like this: select(r for r in R if r.p in u.ps) u.ps is approx 3000 items, so there is a big IN part to the query, JOIN() doesn't seem to help, what can I do?
Matthew
Table is approx 2 million rows, with 3000 ps, it takes 25 - 30 seconds
Matthew
ah with inversing the IN and JOIN it gets a lot faster
Alexey
try this way select(r for r in R if exists(u for u in U if r.p == u.ps))
Alexey
ah, wait, u.ps is a set
Matthew
Yep :)
Matthew
I think it's a database issue fundamentally
Matthew
50% of execution time is being taken by filtering a double precision column for some reason
Matthew
-> Seq Scan on review r (cost=0.00..148774.57 rows=1782971 width=474) Filter: ((review_rating >= 1::double precision) AND (review_rating <= 5::double precision))
Matthew
which means i probably need to adjust indexes
Alexey
ok, I see
Henri
Is it possible to make non persistent changes to a table? E.g. creating an entity instance for the need of the current request without writing it to the database? Something like fortunes = Fortune.select()[:] fortunes.append(Fortune(message='Additional fortune added at request time.')) When the model looks like class Fortune(db.Entity): message = Optional(str)
Alexander
If you want to pass the list to some frontend layer with some additional element, maybe you can convert objects to tuples or dictionaries. Something like that: fortunes = Fortune.select() fortune_list = [f.to_dict(["id", "message"]) for f in fortunes] fortune_list.append({id: additional_id, message: additional_message})
Alexander
I'm trying to speed up a query like this: select(r for r in R if r.p in u.ps) u.ps is approx 3000 items, so there is a big IN part to the query, JOIN() doesn't seem to help, what can I do?
Matthew, you can rewrite that query to convert big IN list to subquery. Something like that: select(r for r in R if exists(p for p in P if p.some_attr == u)) It is hard to write exact query without seeing your models, but something like that should work
Henri
Thanks! That works for me.😊
Andrey
Hi! Maybe I have a strange question, but is it possible for an entity with a composite primary key to have a self-reference where one part is a common table? Like for the following situation: CREATE TABLE regions ( file_id INTEGER NOT NULL REFERENCES files (id), region_id INTEGER NOT NULL, PRIMARY KEY (file_id, region_id), outer_region_id INTEGER, FOREIGN KEY (file_id, outer_region_id) REFERENCES regions (file_id, region_id), total_lines INTEGER );(outer_region must be in the same file)
Andrey
I've tried the following code, but received pony.orm.core.DBSchemaError: Column 'file_id' already exists in table 'regions': class Region(db.Entity): _table_ = "regions" file = Required(File, column="file_id") region_id = Required(int, column="region_id") PrimaryKey(file, region_id) outer_region = Optional("Region", reverse="inner_regions", columns=["file_id", "outer_region_id"]) inner_regions = Set("Region", reverse="outer_region") total_lines = Optional(int)
Matthew
2017-06-24 15:21:52 UTC LOG: duration: 3846.543 ms statement: UPDATE "reviewproduct" SET "first_review_scrape" = false WHERE "id" = 2301 AND "asin" = 'XXX' AND "country_code" = 'US' AND "first_review_scrape" = false How come pony is adding those WHERE clauses, instead of just specifying the ID? It's the only cause I can think of for a simple UPDATE taking nearly 4 seconds?
Matthew
this happens with: product.first_review_scrape = False as the only relevant code
Matthew
if I do: if product.first_review_scrape: product.first_review_scrape = False Then that slow update isn't issued, is it by design for pony to issue an update even when the value stays the same (False in this case)?
Alexander
Hi Matthew! > How come pony is adding those WHERE clauses, instead of just specifying the ID? Pony doing this in order to do optimistic updates. Concurrent transactions should be isolated from each other. There are two ways to achieve such isolation. The first one is to use locks. It may be slow, because if one transaction locked a row in a table, other transactions need to wait until that lock will be released in order to access that row. The second way is to use optimistic updates, this is what Pony tries to do. When current transaction read some attributes of an object, it remember the values of these attributes. If it is necessary to update some attributes of this object, Pony issues an UPDATE query with additional checks that the attributes which were read by application code still contain the same values. If another concurrent transaction already updated some of this attributes, the UPDATE statement will not work and Pony will know that two different transactions are trying to change the same object using a potentially inconsistent way to do it. In other words: if you application code read x.a and x.b and then decided to change x.c, but another concurrent transaction already changed x.a or x.b, then new value of x.c`may be wrong because it does not take into account updated value of `x.a or x.b > It's the only cause I can think of for a simple UPDATE taking nearly 4 seconds? I don't think that an additional check can slow down an UPDATE query. Typically, additional constraints can only speed up the query, because they give to the database an additional ways to optimise that query. Speaking about your query, I'm pretty sure that the database is smart enough to understand that id column is primary key and found row using access by id, and check another column when row is already found without any significant slowdown. The only reason I can think why your UPDATE query may be slow is that some concurrent transaction have locked that row and then wait 4 seconds before commit. Typically if some transaction issue UPDATE query it should commit as fast as possible in order to release locks and make that row available to another transactions. > is it by design for pony to issue an update even when the value stays the same Yes, it is by design. Pony sees that the application code want to have that flag to be set to True after the transaction is completed. Even if that flag was set to True initially, it is possible that some concurrent optimistic transaction already set it to False. So if Pony does not issue UPDATE query, assuming that the flag value is already True, the end result may be wrong.
Matthew
ok, I had guessed it was by design :)
Matthew
the if statement fixes it, so it isn't a serious problem
Alexander
It is still interesting why UPDATE query takes so long time (if it is really the UPDATE query which adds slowdown). Maybe concurrent transactions do something suboptimal
Alexander
Do you use db_session with some additional flags?
Matthew
It should be the only transaction using that row, as it is a background task set by a scheduler, so no double accesses
Matthew
no, just plain db_session
Matthew
I am spending the last evening of my vacation "fire fighting" postgres performance, it is a love / hate relationship :)
Matthew
i am hoping the other problem is just a case of adding a better index
Alexander
I wish you success with that
Matthew
select((x.date, x.a) for x in X) Is there a cheap way to get max one result per unique date, with the lowest "a" value? I'm currently doing a blind select, with no limit, but it is too slow
Alexander
select((x.date, min(x.a)) for x in X)
Matthew
Thank you! I was overcomplicating it
Matthew
the problem was not solved by a new index or clever code, but by a manual VACUUM
Alexander
You mean slow UPDATE query?
Matthew
no, my other problem, that was related to min() query
Matthew
There had been no autovacuum for 12 days, the table has a lot of daily data churn
Alexander
I think it should be possible to set up autovacuum
Matthew
It is, and I have tweaked it before, but it didn't work, so a cronjob makes sure that it is happening regularly
Anonymous
Hi
Anonymous
I'm reading through the docs
Anonymous
Quick question
Anonymous
Is it possible to generate/define entities automatically?]
Matthew
from what source of information?
Anonymous
from the database itself
Anonymous
based on its schema
Alexander
Right now no, but we are working on it. Early prototype should be available before the end of this week.
Anonymous
Thanks!!
Anonymous
Amazing work :)
Juan
I think that this project is amazing! Congratulations for everyone that help and comment errors. Thank you for the project
Alexey
Arturo @juan_castano and everyone here - Thank you for your support!
Lucky
Wooh!
Alexander
Guys, I need to reset my telegram account (I forgot my Telegram cloud password, and it cannot be recovered). Hope my previous messages will not be deleted 😅
Henri
I have a model like class World(db.Entity): randomnumber = Optional(int) When doing id_ = randint(1, 10000) return {'id': id_, 'randomNumber': self[id_].randomNumber} I get the error TypeError: 'World' object does not support indexing Any idea what's going on?
Henri
BTW self == World
Matthew
is it an instance of World, or the class World?
Alexander
Replace self[id_] to self.__class__[id_]
Matthew
and maybe World.select().random(1) would be better?
Alexander
Probably so
Matthew
Have you already created the 10,000 rows?
Henri
Have you already created the 10,000 rows?
Good question! The table seems to be empty. So this will be the issue.
Matthew
What general problem are you trying to solve?
Henri
Add Morepath framework with PonyORM to http://frameworkbenchmarks.readthedocs.io
Matthew
So you need a reliable record of 10,000 ID -> random ints ?
Henri
Yeah there is a SQL file I need to import I think.
Matthew
import random class World(db.Entity): random_number = Optional(int) @classmethod def get_random_number(cls): return cls.select().random()[0] @classmethod def create_random_numbers(cls): for _ in range(10000): cls(random_number=random.randrange(0, 100)) commit()
Matthew
untested
Matthew
assumes you want random numbers to be 0 - 99 in range
Henri
and maybe World.select().random(1) would be better?
You mean random_world = World.select().random(1) return { 'id': random_world.id, 'randomNumber': random_world.randomNumber }
Henri
BTW after importing the tables I get now a really strange error when trying the above (Partly translated from German): pony.orm.core.TransactionIntegrityError: Object World[new:1] cannot be stored in the database. IntegrityError: ERROR: NULL-value in row »id« hurts Not-Null-Constraint DETAIL: failed line contains (null, 0). No idea where this comes from as I'm not inserting anything.
Henri
Replace self[id_] to self.__class__[id_]
This seems to work. Thanks. (Still getting the above error. But also get it when using 'World' directly.
Henri
OK this is fixed now! 😃