Alexander
We developed PonyJS during our work on fineartbiblio.com. It was successful for that project, but we discovered many corner cases where the code didn't work properly. We were able to use it anyway, because we knew the internal details of the PonyJS implementation and could write workarounds. But it would be too complex and time-consuming to explain all these corner cases and workarounds to other developers. Because of that, we decided not to release a public version of PonyJS. We got useful experience from our internal usage of PonyJS, and I think we now understand how to write "PonyJS 2.0" properly, so it would be usable for all developers. But right now we just don't have the time and resources to develop PonyORM and PonyJS in parallel. After we complete the most important tasks related to PonyORM, we can start a new implementation of PonyJS based on our previous experience.
Yurii
Well then I'll wait. Hope you got it all. :)
Ehsan
Hi guys
Ehsan
Do you know how I can delete a class in pony?
Alexander
Hi! What do you mean by "delete a class"?
Ehsan
Like: class Person(db.Entity): ...
Alexander
Do you already have a database filled with useful data that you don't want to lose, or can you just re-create a fresh database?
Ehsan
I want to create a fresh database
Ehsan
*recreate
Alexander
You can clear all tables from the database. To do this, call db.drop_all_tables(with_all_data=True), or you can manually delete all tables using some DB admin interface. If you use SQLite, just delete the database file instead. Then you can modify your entities, and when you start your program again, the correct tables will be created automatically
Alexander
The tables will be created if the db.generate_mapping call has the option create_tables=True
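Put together, a minimal sketch of that reset flow (assuming SQLite; the Person entity here is just a placeholder):

    from pony.orm import Database, Required

    db = Database()

    class Person(db.Entity):  # hypothetical entity
        name = Required(str)

    db.bind(provider='sqlite', filename='example.sqlite', create_db=True)
    db.generate_mapping(create_tables=True)

    # wipe every table Pony knows about; after modifying the entities,
    # restart the program so generate_mapping(create_tables=True)
    # recreates the tables from the new definitions
    db.drop_all_tables(with_all_data=True)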
Ehsan
I think I have no mapping cause I get this error:
Ehsan
ERDiagramError: No mapping was generated for the database
Alexander
After you define all models, your main program needs to have the following line: db.generate_mapping(create_tables=True)
Alexander
This is the moment when Pony determines which tables are necessary to represent all specified entities, and checks that all those tables are present in the database.
Ehsan
What did I miss? MappingError: Database object is not bound with a provider yet
Alexander
Before the generate_mapping call, you need to specify the database connection parameters, such as host, login, password, etc. You can do it inside the Database(...) constructor or inside the db.bind(...) method. The parameters are specific to the DBMS you use
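For example, hedged sketches of the two most common bind calls (the connection parameters are placeholders):

    db.bind(provider='sqlite', filename='database.sqlite', create_db=True)
    # or, for PostgreSQL:
    # db.bind(provider='postgres', user='user', password='password',
    #         host='localhost', database='mydb')
    db.generate_mapping(create_tables=True)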
Alexander
You can find the pony/orm/examples/university1.py example and look at the db.bind(...) and db.generate_mapping(...) calls
Ehsan
I just created one and ran db.generate_mapping(create_tables=True)
Ehsan
I got this error: ERDiagramError: Entity definition Usage was not found
Alexander
You have an attribute like something = Required("Usage") in one of your models, but no Usage class definition
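For illustration, a minimal sketch of the failure (the entity names are hypothetical):

    from pony.orm import Database, Required, Set

    db = Database()

    class Device(db.Entity):       # hypothetical entity
        usage = Required("Usage")  # forward reference by name...

    # ...generate_mapping raises "ERDiagramError: Entity definition
    # Usage was not found" unless the referenced class is also defined:
    class Usage(db.Entity):
        devices = Set(Device)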
Ehsan
Right!
Ehsan
Hi, I am trying again to delete a database and keep getting this error:
Ehsan
MappingError: Mapping was already generated
Ehsan
I already executed these commands:
Ehsan
db.generate_mapping(create_tables=True)
db.drop_all_tables(with_all_data=True)
Ehsan
Found the solution, thanks!
Ehsan
Hi Ngalim! db.bind(...) is used to specify the database type and the parameters of the database connection. db.generate_mapping(...) is used to notify Pony that all necessary entities are defined. At this moment Pony understands which tables should be used to represent the specified entities. It is possible to specify the create_tables=True option to generate tables, but even if the tables are already created, calling generate_mapping is necessary, because this is the moment when Pony creates the internal linking between entities and tables. When the application is imported, the sequence of operations should be the following:

    from pony.orm import *

    db = Database()

    class X(db.Entity):
        ...

    class Y(db.Entity):
        ...

    db.bind(...)
    db.generate_mapping(...)

    with db_session:
        result = select(...)

That is, generate_mapping should be called after all entities are defined, but before the first select is issued. If you want to put entities in a separate module, I suggest defining db right in that module, but performing bind and generate_mapping outside of it. This way you separate entity definitions from database configuration options. It may look something like this:

models.py:

    db = Database()

    class Person(db.Entity):
        name = Required(str)

controllers.py:

    from flask import Blueprint
    from models import db, Person

    employee_bp = Blueprint(...)

main.py:

    from flask import Flask
    from controllers import employee_bp
    from models import db

    app = Flask(__name__)
    ...
    app.register_blueprint(employee_bp)

    if __name__ == '__main__':
        db.bind(...)
        db.generate_mapping()
        app.run(...)
Matthew
Is there a way to update one attribute on an entity without Pony validating that all other attributes are unchanged? I think I asked this before, but I'm afraid I can't remember the answer
Matthew
I have the primary key of the entity
Matthew
I don't need to load the full data of the entity, just one field, then update that one field
Matthew
x = X[id]
Matthew
x.y = x.y * 2
Matthew
is the equivalent of what I'm doing
Alexander
Matthew, if you are sure that the object with that id really exists, you can do the following:

    x = X._get_by_raw_pkval_((id,))
    x.load(X.y)  # or x.load("y")
    x.y = x.y * 2

But I doubt loading one attribute will be much faster than loading all the non-lazy attributes
Matthew
The problem is not the loading of attributes, the update query is extremely slow. This is Postgres being stupid with indexes but it would be nice to avoid the complex update query
Alexander
Can you explain in more detail what in the update query text causes the slowdown in your case? Is it the optimistic checks or something else?
Alexander
If you update just one column, then the update query should write this specific column only
Matthew
It is making sure that all other columns still have the same values. I think it is a Pony sanity check?
Matthew
Like: update x set y=5 where id=1 and z='existing value'
Alexander
It is to be sure that the value of the z column was not modified by a concurrent transaction. If your transaction does the following:
a) retrieve the x object
b) read the x.z value
c) set the x.y value to 5
then it is reasonable to assume that the value of the x.z attribute was somehow important for calculating the new value of the x.y attribute. If another concurrent transaction modified x.z before your current transaction saved the new value of x.y, then Pony assumes that you probably need to recalculate x.y
Matthew
Ok, is there a way to bypass this without writing raw sql?
Matthew
If I don't read the attributes, will this protection not happen?
Alexander
Yes, if it is possible not to read irrelevant attributes in the application code, then those attributes will not be added to the optimistic checks.
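Roughly, the difference looks like this (a sketch using the X/y/z names from the discussion; the exact SQL Pony generates may differ):

    from pony.orm import db_session

    with db_session:
        x = X[1]
        z = x.z          # reading z adds it to the optimistic check
        x.y = x.y * 2    # commit emits roughly:
                         # UPDATE ... SET y=%s WHERE id=1 AND y=%s AND z=%s

    with db_session:
        x = X[1]
        x.y = x.y * 2    # z is never read, so the check shrinks to roughly:
                         # UPDATE ... SET y=%s WHERE id=1 AND y=%s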
Matthew
Aha cool
Matthew
I can cache those in redis easily
Matthew
I am comparing one object to another from the recent past
Alexander
Also it is possible to turn off optimistic checks by specifying the optimistic flag in db_session: db_session(optimistic=False). But I consider it a last-resort option, because turning optimistic checks off may lead to lost updates from concurrent transactions
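Usage looks like this (a sketch reusing the X entity from above):

    from pony.orm import db_session

    with db_session(optimistic=False):
        x = X[1]
        x.y = x.y * 2  # saved without optimistic checks on commit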
Alexander
> I am comparing one object to another from the recent past
It seems you are not the only user who is doing something like that. Maybe we need to support some specific workflow for this use case, I mean, comparing a bag of attribute values for an object with their past values
Alexander
Also, we could add another type of optimistic check. Instead of comparing all columns with their previous values, we could add support for a version column whose value is incremented on each update. Then we would need to compare the version column only. But the current approach has the benefit that concurrent transactions can update different columns x.a and x.b in parallel without any errors, whereas using a version column could produce false conflicts for such updates
Matthew
For me I am comparing the values of an object currently in the database with the values of an object that I am considering adding to the database, basically duplicate detection based on custom logic
Matthew
if the new data is considered a duplicate, the values for the new and existing objects are averaged for one column
Matthew
x.y = (new_x.y + x.y) / 2
Alexander
What are you considering duplicates? Objects with all attributes equal?
Matthew
most attributes equal
Alexander
So, in that case, the id column is not really an id
Matthew
the ID of the existing object stays static
Matthew
ID is one of the attributes that I filter out yeah
Alexander
Ok, and what are the types of the other attributes of these objects? int, str, what else?
Alexander
So, all columns except x.y form a natural key, and x.y is the counter?
Matthew
date, string, foreign key, int, float
Matthew
I am fine with just caching the existing object data in redis for my use case
Alexander
When you have a new object, you need to be sure that this object is not in the database, and then insert it. So, you have a query like x = X.get(a=a, b=b, c=c, ...) for all attributes except id and y?
Alexander
Then you need to have a composite key for all these attributes
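In Pony that is declared with composite_key; a sketch with hypothetical attribute names and types:

    from pony.orm import Database, Required, composite_key

    db = Database()

    class X(db.Entity):
        a = Required(str)
        b = Required(int)
        y = Required(float)  # the counter-like column, excluded from the key
        composite_key(a, b)  # creates a unique composite index on (a, b)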
Matthew
I am worried that would lead to a lot of resource usage; this table has 120 million rows
Matthew
so that would be a massive index for the composite key?
Matthew
Your suggestion is working; when I don't access the attributes of the existing object, it does a much simpler update query
Alexander
> so that would be a massive index for the composite key?
Yes, but it should make select queries much faster and more straightforward, and the queries probably will not lock other indexes, so the performance should be much better at the expense of some disk space.
Initially I thought that you could add another unique column, hash, holding the md5 hash of the serialized values of the combined list of columns used for search and comparison. Then the index should be relatively small. But that only works if the total list of columns used for search is fixed and will not be extended later; otherwise you will need to re-calculate the md5 hash when adding a new attribute
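A sketch of that hash idea (the helper and column names are hypothetical, and the column list must stay fixed):

    import hashlib

    def content_hash(*values):
        # serialize the fixed list of comparison columns in a stable
        # order and hash the result
        payload = '|'.join(repr(v) for v in values)
        return hashlib.md5(payload.encode('utf-8')).hexdigest()

    # stored in a unique, indexed column, the duplicate lookup becomes e.g.:
    # duplicate = X.get(hash=content_hash(a, b, c))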
Matthew
I never query on most of these columns, apart from duplicate detection
Alexander
But the question is, how often do you need to perform duplicate detection
Matthew
probably about 10-15 times per day per parent object, of which I have 6 figures
Matthew
so a few million times per day
Matthew
I calculated that 60% of the rows in my big table could be considered duplicate, so my incentive is to shrink that table for performance and resource usage
Alexander
I think having a single index to find a duplicate object should be a great boost for performance; I think it is worth a try. At first you can include the parent attribute and all other attributes used for duplicate search (I hope it is always the same list of attributes for each query), and as a second option, add an indexed hash column with the md5 hash of the other columns and use it for the duplicate search
Matthew
Thank you, I will think about it.
Matthew
I have it with existing_object.to_dict() cached in redis now, which seems to work, so I will monitor it
Ehsan
Hi, what is the best approach for appending data to an entity in a for loop? Is it like incrementing the id of an entity and then using Person[id].age = ...?
Matthew
Create a new entity, Person(age=1) etc
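A sketch of that, assuming a Person entity with an age attribute:

    from pony.orm import db_session

    with db_session:
        for age in (25, 30, 41):
            Person(age=age)  # the primary key id is generated automatically
    # the new rows are flushed and committed when db_session exits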
Matthew
@akozlovsky I had an index on "date" when I also had an index on (product, date), which seemed to be confusing Postgres. That seemed to be causing a lot of the issues. Maybe I don't need to worry about accessing previous object attributes now
Alexander
I think the first index is unnecessary