Alexander
We developed PonyJS during our work on fineartbiblio.com. It was successful for that project, but we discovered many corner cases where the code didn't work properly. We were able to use it anyway, because we knew the internal details of the PonyJS implementation and could write workarounds. But it would be too complex and time-consuming to explain all these corner cases and workarounds to other developers. Because of that, we decided not to release a public version of PonyJS. We got useful experience from our internal usage of PonyJS, and I think we now understand how to write "PonyJS 2.0" properly, so it would be usable for all developers. But right now we just don't have the time and resources to develop PonyORM and PonyJS in parallel. After we complete the most important tasks related to PonyORM, we can start a new implementation of PonyJS based on our previous experience.
Yurii
Well then I'll wait. Hope you got it all. :)
Ehsan
Hi guys
Ehsan
Do you know how I can delete a class in pony?
Alexander
Hi! What do you mean by "delete a class"?
Ehsan
Like: class Person(db.Entity): ...
Alexander
Do you already have a database filled with useful data that you don't want to lose, or can you just re-create a fresh database?
Ehsan
I want to create a fresh database
Ehsan
*recreate
Alexander
You can clear all tables from the database. To do this, call db.drop_all_tables(with_all_data=True), or you can manually delete all tables using some DB admin interface. If you use SQLite, just delete the database file instead. Then you can modify your entities, and when you start your program again, the correct tables will be created automatically
Alexander
The tables will be created if the db.generate_mapping call has the option create_tables=True
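Put together, a minimal sketch of that reset flow (assuming SQLite; the Person entity here is just a placeholder):

    from pony.orm import Database, Required

    db = Database()

    class Person(db.Entity):  # hypothetical entity
        name = Required(str)

    db.bind(provider='sqlite', filename='example.sqlite', create_db=True)
    db.generate_mapping(create_tables=True)

    # wipe every table Pony knows about; after modifying the entities,
    # restart the program so generate_mapping(create_tables=True)
    # recreates the tables from the new definitions
    db.drop_all_tables(with_all_data=True)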
Ehsan
I think I have no mapping cause I get this error:
Ehsan
ERDiagramError: No mapping was generated for the database
Alexander
After you define all models, your main program needs to have the following line: db.generate_mapping(create_tables=True)
Alexander
This is the moment when Pony determines which tables are necessary to represent all specified entities, and checks that all those tables are present in the database.
Ehsan
What did I miss? MappingError: Database object is not bound with a provider yet
Alexander
Before the generate_mapping call, you need to specify the database connection parameters, such as host, login, password, etc. You can do it inside the Database(...) constructor or inside the db.bind(...) method. The parameters are specific to the DBMS you use
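For example, hedged sketches of the two most common bind calls (the connection parameters are placeholders):

    db.bind(provider='sqlite', filename='database.sqlite', create_db=True)
    # or, for PostgreSQL:
    # db.bind(provider='postgres', user='user', password='password',
    #         host='localhost', database='mydb')
    db.generate_mapping(create_tables=True)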
Alexander
You can find the pony/orm/examples/university1.py example and look at the db.bind(...) and db.generate_mapping(...) calls
Ehsan
I just created one and ran db.generate_mapping(create_tables=True)
Ehsan
I got this error: ERDiagramError: Entity definition Usage was not found
Alexander
You have an attribute like something = Required("Usage") in one of your models, but no Usage class definition
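For illustration, a minimal sketch of the failure (the entity names are hypothetical):

    from pony.orm import Database, Required, Set

    db = Database()

    class Device(db.Entity):       # hypothetical entity
        usage = Required("Usage")  # forward reference by name...

    # ...generate_mapping raises "ERDiagramError: Entity definition
    # Usage was not found" unless the referenced class is also defined:
    class Usage(db.Entity):
        devices = Set(Device)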
Ehsan
Right!
Ehsan
Hi, I am trying again to delete a database and keep getting this error:
Ehsan
MappingError: Mapping was already generated
Ehsan
I already executed these commands:
Ehsan
db.generate_mapping(create_tables=True)
db.drop_all_tables(with_all_data=True)
Ehsan
Found the solution, thanks!
Ehsan
Hi Ngalim! db.bind(...) is used to specify the database type and the parameters of the database connection. db.generate_mapping(...) is used to notify Pony that all necessary entities are defined. At this moment Pony understands which tables should be used to represent the specified entities. It is possible to specify the create_tables=True option to generate tables, but even if the tables are already created, calling generate_mapping is necessary, because this is the moment when Pony creates the internal linking between entities and tables. When the application is imported, the sequence of operations should be the following:

    from pony.orm import *

    db = Database()

    class X(db.Entity):
        ...

    class Y(db.Entity):
        ...

    db.bind(...)
    db.generate_mapping(...)

    with db_session:
        result = select(...)

That is, generate_mapping should be called after all entities are defined, but before the first select is issued. If you want to put entities in a separate module, I suggest defining db right in that module, but performing bind and generate_mapping outside of it. This way you separate entity definitions from database configuration options. It may look something like this:

models.py:

    db = Database()

    class Person(db.Entity):
        name = Required(str)

controllers.py:

    from flask import Blueprint
    from models import db, Person

    employee_bp = Blueprint(...)

main.py:

    from flask import Flask
    from controllers import employee_bp
    from models import db

    app = Flask(__name__)
    ...
    app.register_blueprint(employee_bp)

    if __name__ == '__main__':
        db.bind(...)
        db.generate_mapping()
        app.run(...)
Matthew
Is there a way to update one attribute on an entity without Pony validating that all other attributes are unchanged? I think I asked this before, but I'm afraid I can't remember the answer
Matthew
I have the primary key of the entity
Matthew
I don't need to load the full data of the entity, just one field, then update that one field
Matthew
x = X[id]
Matthew
x.y = x.y * 2
Matthew
is the equivalent of what I'm doing
Alexander
Matthew, if you are sure that the object with that id really exists, you can do the following:

    x = X._get_by_raw_pkval_((id,))
    x.load(X.y)  # or x.load("y")
    x.y = x.y * 2

But I doubt loading one attribute will be much faster than loading all the non-lazy attributes
Matthew
The problem is not the loading of attributes, the update query is extremely slow. This is Postgres being stupid with indexes but it would be nice to avoid the complex update query
Alexander
Can you explain in more detail what in the update query text causes the slowdown in your case? Is it the optimistic checks or something else?
Alexander
If you update just one column, then the update query should write this specific column only
Matthew
It is making sure that all other columns still have the same values. I think it is a Pony sanity check?
Matthew
Like: update x set y=5 where id=1 and z='existing value'
Alexander
It is to be sure that the value of the z column was not modified by a concurrent transaction. If your transaction does the following:
a) retrieve the x object
b) read the x.z value
c) set the x.y value to 5
then it is reasonable to assume that the value of the x.z attribute was somehow important for calculating the new value of the x.y attribute. If another concurrent transaction modified x.z before your current transaction saved the new value of x.y, then Pony assumes that you probably need to recalculate x.y
Matthew
Ok, is there a way to bypass this without writing raw sql?
Matthew
If I don't read the attributes, will this protection not happen?
Alexander
Yes, if it is possible not to read irrelevant attributes in the application code, then those attributes will not be added to the optimistic checks.
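Roughly, the difference looks like this (a sketch using the X/y/z names from the discussion; the exact SQL Pony generates may differ):

    from pony.orm import db_session

    with db_session:
        x = X[1]
        z = x.z          # reading z adds it to the optimistic check
        x.y = x.y * 2    # commit emits roughly:
                         # UPDATE ... SET y=%s WHERE id=1 AND y=%s AND z=%s

    with db_session:
        x = X[1]
        x.y = x.y * 2    # z is never read, so the check shrinks to roughly:
                         # UPDATE ... SET y=%s WHERE id=1 AND y=%s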
Matthew
Aha cool
Matthew
I can cache those in redis easily
Matthew
I am comparing one object to another from the recent past
Alexander
Also it is possible to turn off optimistic checks by specifying the optimistic flag in db_session: db_session(optimistic=False). But I consider it a last-resort option, because turning optimistic checks off may lead to lost updates from concurrent transactions
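Usage looks like this (a sketch reusing the X entity from above):

    from pony.orm import db_session

    with db_session(optimistic=False):
        x = X[1]
        x.y = x.y * 2  # saved without optimistic checks on commit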
Alexander
> I am comparing one object to another from the recent past
It seems you are not the only user who is doing something like that. Maybe we need to support some specific workflow for this use case, I mean, comparing a bag of attribute values for an object with their past values
Alexander
Also, we could add another type of optimistic check. Instead of comparing all columns with their previous values, we could add support for a version column whose value is incremented on each update. Then we would need to compare the version column only. But the current approach has the benefit that concurrent transactions can update different columns x.a and x.b in parallel without any errors, whereas using a version column could produce false conflicts for such updates
Matthew
For me I am comparing the values of an object currently in the database with the values of an object that I am considering adding to the database, basically duplicate detection based on custom logic
Matthew
if the new data is considered a duplicate, the values for the new and existing objects are averaged for one column
Matthew
x.y = (new_x.y + x.y) / 2
Alexander
What are you considering duplicates? Objects with all attributes equal?
Matthew
most attributes equal
Alexander
So, in that case, the id column is not really an id
Matthew
the ID of the existing object stays static
Matthew
ID is one of the attributes that I filter out yeah
Alexander
Ok, and what are the types of the other attributes of these objects? int, str, what else?
Alexander
So, all columns except x.y form a natural key, and x.y is the counter?
Matthew
date, string, foreign key, int, float
Matthew
I am fine with just caching the existing object data in redis for my use case
Alexander
When you have a new object, you need to be sure that this object is not in the database, and then insert it. So, you have a query like x = X.get(a=a, b=b, c=c, ...) for all attributes except id and y?
Alexander
Then you need to have a composite key for all these attributes
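In Pony that is declared with composite_key; a sketch with hypothetical attribute names and types:

    from pony.orm import Database, Required, composite_key

    db = Database()

    class X(db.Entity):
        a = Required(str)
        b = Required(int)
        y = Required(float)  # the counter-like column, excluded from the key
        composite_key(a, b)  # creates a unique composite index on (a, b)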
Matthew
I am worried that would lead to a lot of resource usage; this table has 120 million rows
Matthew
so that would be a massive index for the composite key?
Matthew
Your suggestion is working; when I don't access the attributes of the existing object, it does a much simpler update query
Alexander
> so that would be a massive index for the composite key?
Yes, but it should make select queries much faster and more straightforward, and the queries probably will not lock other indexes, so the performance should be much better at the expense of some disk space.
Initially I thought that you could add another unique column, hash, holding the md5 hash of the serialized values of the combined list of columns used for search and comparison. Then the index should be relatively small. But that only works if the total list of columns used for search is fixed and will not be extended later; otherwise you will need to re-calculate the md5 hash when adding a new attribute
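A sketch of that hash idea (the helper and column names are hypothetical, and the column list must stay fixed):

    import hashlib

    def content_hash(*values):
        # serialize the fixed list of comparison columns in a stable
        # order and hash the result
        payload = '|'.join(repr(v) for v in values)
        return hashlib.md5(payload.encode('utf-8')).hexdigest()

    # stored in a unique, indexed column, the duplicate lookup becomes e.g.:
    # duplicate = X.get(hash=content_hash(a, b, c))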
Matthew
I never query on most of these columns, apart from duplicate detection
Alexander
But the question is, how often do you need to perform duplicate detection
Matthew
probably about 10-15 times per day per parent object, of which I have 6 figures
Matthew
so a few million times per day
Matthew
I calculated that 60% of the rows in my big table could be considered duplicate, so my incentive is to shrink that table for performance and resource usage
Alexander
I think having a single index to find a duplicate object should be a great boost for performance; I think it is worth a try. At first you can include the parent attribute and all other attributes used for duplicate search (I hope it is always the same list of attributes for each query), and as a second option, add an indexed hash column with the md5 hash of the other columns and use it for the duplicate search
Matthew
Thank you, I will think about it.
Matthew
I have it with existing_object.to_dict() cached in redis now, which seems to work, so I will monitor it
Ehsan
Hi, what is the best approach for appending data to an entity in a for loop? Is it like incrementing the id of an entity and then using Person[id].age = ...?
Matthew
Create a new entity, Person(age=1) etc
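A sketch of that, assuming a Person entity with an age attribute:

    from pony.orm import db_session

    with db_session:
        for age in (25, 30, 41):
            Person(age=age)  # the primary key id is generated automatically
    # the new rows are flushed and committed when db_session exits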
Matthew
@akozlovsky I had an index on "date" when I also had an index on (product, date), which seemed to be confusing Postgres. That seemed to be causing a lot of the issues. Maybe I don't need to worry about accessing previous object attributes now
Alexander
I think the first index is unnecessary