Anonymous
This is a part of the example from the documentation, here: https://docs.ponyorm.com/entities.html#hybrid-methods-and-properties

    def cars_by_color(self, color):
        return select(car for car in self.cars if car.color == color)

Here it takes a color as an argument, so it obviously can't have the @property decorator.
Anonymous
But let's try the following scheme:

    class Person(db.Entity):
        cars = Set('Car')
        @property
        def first_car(self):
            return select(car for car in self.cars).order_by(lambda c: c.purchase_year).first()

    class Car(db.Entity):
        make = Required(str)
        owner = Required(Person)
        purchase_year = Required(int)

And let's add some objects:

    john = Person()
    ford = Car(make='Ford', owner=john, purchase_year=2014)
    toyota = Car(make='Toyota', owner=john, purchase_year=2016)
    commit()

Then the following doesn't work:

    print(john.first_car)

It raises TypeError: Query can only iterate over entity or another query (not a list of objects).
Anonymous
We could change our property a bit, replacing self.cars with self.cars.select(), like this:

    class Person(db.Entity):
        cars = Set('Car')
        @property
        def first_car(self):
            return select(car for car in self.cars.select()).order_by(lambda c: c.purchase_year).first()

Then it works as expected, but it is inconsistent with the docs.
Alexander
At the moment, if you have a specific object retrieved from the database:

    p1 = Person[1]

you cannot start a query by iterating over its collection:

    select(c for c in p1.cars if something)

You need to iterate over the entity instead:

    select(c for c in Car if c.owner == p1 and something)

This is because in Python the source of a generator is evaluated before the generator itself. When a query is written as select(c for c in p1.cars if something), Python first evaluates iter(p1.cars) and then passes the result as an argument to the generator object. So iter(p1.cars) is not lazy; it retrieves the list of cars before the generator can be analyzed. On the other hand, for the second and subsequent for-loops inside a generator, Python does not evaluate the iterator beforehand, so Pony can inspect the source of the for-loop and translate it correctly. So the following is possible:

    select(p for p in Person for c in p.cars if something)

Here p is not an already retrieved Person instance, but just an abstract Person object inside a generator, so p.cars is just an abstract collection and not some specific list of objects. So, if you want to apply the first_car property to an object already retrieved from the database, you need to rewrite it a bit:

    @property
    def first_car(self):
        return select(car for car in Car if car.owner == self).order_by(lambda c: c.purchase_year).first()

It is possible that in the future we will make iter(john.cars) a lazy object, and then your current form will also work both inside and outside of a query. Sorry, I need to go offline until Monday. Stsouko, thanks for the suggestion, I'll respond to you later
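Alexander's point about the eager evaluation of a generator's first for-source can be demonstrated in plain Python, with no Pony involved; the source() helper below is purely illustrative:

```python
log = []

def source(name):
    # record when each iterable source is actually requested
    log.append(name)
    return [1, 2]

# Python evaluates the first (outermost) for-source eagerly,
# before the generator object is ever iterated:
gen = (x for x in source('outer') for y in source('inner'))
first_eval = list(log)   # only 'outer' has been evaluated so far
list(gen)                # iterating now triggers the inner sources
```

This is why Pony can translate the second for-loop of a query lazily, but receives the first one as an already-evaluated iterator.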
Anonymous
Alright, I get it, it's just the documentation makes the same mistake I did – it would be nice to have this fixed so it doesn't confuse anyone else. Also, if I'm correct, having just self.cars.select() works the same and it doesn't break anything. Just consider it when you'll have a moment. And have some rest on the weekend! :-)
Alexander
Ah, ok, I understand now what you mean. I agree, we need to fix it in the documentation #todo
Alexander
Yes, self.cars.select() should work too
Alexander
Thanks, have a good weekend!
Alexander
I would like to propose a new feature: raw data types, whose operators can be chosen from an available list
In my experience, adding a new data type is a non-trivial task. It requires implementing converters between Python and DBAPI values, and also writing complex code for the query translator. For mutable types (like arrays) the correct code is even more complex, because an array value should be represented as a proxy object that tracks changes in situations like Person[1].phone_array.append(new_phone) and notifies the parent object that it needs to be saved. I doubt we can provide universal "raw data type" support. But implementing PostgreSQL array support should be similar to JSON, so maybe we can just add PostgreSQL arrays. Usually we add types which can be implemented in all databases, but here we can make an exception. We even already have a corresponding issue: https://github.com/ponyorm/pony/issues/243
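The change-tracking proxy Alexander describes can be sketched in plain Python; TrackedList and Owner here are hypothetical names for illustration, not Pony's actual implementation:

```python
class TrackedList(list):
    """Toy change-tracking list proxy: mutations notify a parent
    object so it knows it needs to be saved."""
    def __init__(self, parent, items=()):
        super().__init__(items)
        self._parent = parent

    def _mark_dirty(self):
        self._parent.dirty = True

    def append(self, item):
        super().append(item)
        self._mark_dirty()

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._mark_dirty()

class Owner:
    """Stand-in for an entity holding a mutable array attribute."""
    def __init__(self, phones):
        self.dirty = False
        self.phone_array = TrackedList(self, phones)
```

A real implementation would also have to wrap every other mutating method (extend, pop, sort, slice assignment, ...), which is part of why mutable types are hard to support.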
stsouko
I would be glad if you implemented array support in Postgres. Perhaps Oracle also provides such a data type
stsouko
Arrays are often used in search tasks, for example in image databases
Alexander
Alright. Let's consider the following scheme:

    class A(db.Entity):
        b = Required('B')
        c = Required('C')
        xs = Set('X')
        PrimaryKey(b, c)

    class B(db.Entity):
        id = PrimaryKey(int, auto=True)
        asses = Set(A)

    class C(db.Entity):
        id = PrimaryKey(int, auto=True)
        asses = Set(A)

    class X(db.Entity):
        a = PrimaryKey(A)
        something = Optional(str)
> It's not correct. Since the primary key of object X is an A object, the primary key of x1 should be a1, or rather the primary keys of a1. So the expected value would be ((1, 1),)

Krzysztof, the problem is that the meaning of "primary key" may be ambiguous. Consider the following models:

    db = Database('sqlite', ':memory:')

    class A(db.Entity):
        b = Required("B")
        c = Required("C")
        PrimaryKey(b, c)

    class B(db.Entity):
        id = PrimaryKey(int)
        a_set = Set(A)

    class C(db.Entity):
        x = Required("X")
        y = Required("Y")
        a_set = Set(A)
        PrimaryKey(x, y)

    class X(db.Entity):
        id = PrimaryKey(int)
        c_set = Set(C)

    class Y(db.Entity):
        id = PrimaryKey(int)
        c_set = Set(C)

    db.generate_mapping(create_tables=True)

    with db_session():
        x1 = X(id=456)
        y1 = Y(id=789)
        b1 = B(id=123)
        c1 = C(x=x1, y=y1)
        a1 = A(b=b1, c=c1)
        flush()
        print(a1)  # A[B[123], C[X[456], Y[789]]]

A primary key of an A instance could mean any of the following:
1) A tuple consisting of other entities (B[123], C[X[456], Y[789]]) - this is a "logical" (high-level) primary key, which PonyORM uses internally in the db session cache
2) A flat tuple consisting of raw column values (123, 456, 789) - this is a "physical" (low-level) primary key which corresponds to the values of the foreign key columns. PonyORM uses it when constructing SQL queries
3) A potentially nested tuple of raw column values (123, (456, 789)) - this is what you want

You can get (1) using a1._pkval_ and (2) using a1.get_pk(). In order to get (3) you can take a1._pkval_ and process it recursively. Pony does not provide a built-in way to obtain (3) because there was no popular use case where (3) is necessary
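The relationship between forms (3) and (2) is a simple recursive flattening; flatten_pk below is a hypothetical helper for illustration, not part of Pony:

```python
def flatten_pk(pk):
    """Turn a (possibly nested) primary-key tuple such as
    (123, (456, 789)) into its flat form (123, 456, 789)."""
    flat = []
    for item in pk:
        if isinstance(item, tuple):
            flat.extend(flatten_pk(item))  # recurse into nested pk parts
        else:
            flat.append(item)
    return tuple(flat)
```

Going the other way (flat to nested) is harder, because it needs the entity metadata to know where each nesting boundary lies, which is part of the ambiguity discussed below.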
Anonymous
I know, but it just feels wrong when in some cases Entity[e.get_pk()] works, and sometimes it fails. If Pony uses (2) when returning primary key, it should accept it when someone tries to access an entity using that key.
Anonymous
In the case I described, Pony accepted tuple ((1, 1),) and returned a correct object, even though it itself returned another primary key. So it accepts (3), but returns (2), which just *feels* incorrect.
Alexander
> If Pony uses (2) when returning primary key, it should accept it when someone tries to access an entity using that key

Sounds logical. I'll think about how we can implement it...
Anonymous
If I may, (3) would actually be better, because it would prevent ambiguities. I feel that they may arise, but I would need some time to find an example.
Anonymous
And Pony accepts (3) already, so making it return (3) as well would be more logical, I think. What do you think?
Alexander
(2) is useful because it can be used directly in SQL. A composite foreign key in SQL is a plain sequence of columns without any nesting.

> If I may, the (3) actually would be better, because it would prevent ambiguities. I feel that they may arise, but I would need some time to find an example.

Actually, I don't think there are any ambiguities in (2). In the end, an SQL foreign key is flat for any primary key.

> And Pony accepts (3) already, so making it return (3) as well would be more logical, I think. What do you think?

(3) looks less useful compared to (2), as it requires some additional processing when working with SQL. Maybe I can (a) add a nested=True keyword argument to .get_pk() to force it to return (3), and (b) improve Entity.__getitem__ to support (2), but I don't yet understand in detail how to do it
Matthew
Hi @jake23247 !
Anonymous
@metaprogrammer, I'm sorry to keep bothering you with this, but having had some new thoughts, I feel I should respond to you anyway.
Anonymous
I still believe the nested key would be the better option. I know that the flat ones can't be ambiguous, so it's no longer about that. But I feel that some *cognitive* ambiguities may still arise. If we have (123, 456, 789) in the code (let's forget for a minute that having it hard-coded would be a bad idea), one may begin to wonder: does that object really have 3 primary key parts? Maybe only two, and one of them is an object with two integers as a primary key? Or maybe there is only one object in the primary key, but that deeper object has three primary key parts? And so on. I know that a developer should know what it's all about, but sometimes it may be confusing. "In the face of ambiguity, refuse the temptation to guess" – but the temptation could be strong here.
Anonymous
About the temptation to guess: the second reason is that having __getitem__ accept flat keys would make Pony "guess" what the developer had in mind. Let's suppose the developer passed (123, 456, 789) as an argument. That could mean, for example, ((123, 456), 789) (a) or (123, (456, 789)) (b) in terms of nested keys. What if (a) were the correct structure, but the developer had (b) in mind (because he was mistaken about which element has nested keys, the first or the second)? That will silently pass and return some item if it exists. But this item can't be what the developer (erroneously) asked for. In this case, one advantage of nested keys would be that Pony could complain about receiving two elements of a primary key where only one can exist. That could be spotted in tests, but in the case of flat keys, it would most probably only surface at runtime.
Anonymous
The third reason would be a really nice correlation with the representation of an object. In a repr result, we can clearly see that the keys are nested. Having the same from get_pk would always be obvious. Also, "explicit is better than implicit" – here I see explicit as explicitly showing that the keys are nested, without having to pass an additional parameter to a function.
Anonymous
Fourth reason: unclear rules of flat representation. If we have 123, an integer, what would the primary key be? The obvious solution would be the same integer, 123. But let's go further with this. Since the nested key (123, (456,)) would be represented as (123, 456) in flat form, the (123,) tuple (a primary key that is an object which has a single integer as its key) could also be flattened to the single integer, so just 123. But then we lose the possibility of comparing the type of a returned key, which can sometimes be convenient. And what if the (123,) tuple isn't flattened at all? Well, then we would have an exception. I don't feel this would be good.
Anonymous
Your point about flattened keys being more practical with SQL doesn't convince me at all. Pony is already a pretty big layer of abstraction, why should we make exceptions from this in small things? I agree, it may be more convenient to get flat key when further passing it to raw SQL. But Pony already lacks many features that would make converting to SQL easier, and I don't think it's a bad thing.
Anonymous
I think that's it. But one more thing, probably the most important of this all.
Anonymous
I don't write all this because I disagree that strongly with you. I won't stop using Pony if you decide to use flat keys in the end, and I will still reach out to you in case of any future doubts (provided it won't be too much for you).
Anonymous
The reason why I decided to elaborate so much on this, and why I try to notify you whenever I spot something odd, is that Pony really puts a smile on my face, and that is not an exaggeration. Working with Pony is a true joy that makes me want to experiment with different things. In SQL, I experimented because I had to, because it was always a struggle for me. So Pony is a really great thing for me, and I feel that one way I can give back is to provide you with feedback whenever I feel something is wrong, so you could make Pony an even greater tool.
Anonymous
So, whatever you decide will be fine, and Pony will continue to be great no matter what.
Anonymous
I may be a bit wrong in some of the above, or completely wrong – I would appreciate if you would take some time to consider what I wrote and give me some feedback. But you don't have to rush it.
Alexander
We have already made a fix for that. So you can use:

    a = A.select().first()
    A[a.get_pk()]
Alexander
Alexander, we didn't publish the fix yet
Alexander
Yes, but he should know that we have already solved the problem)
Alexander
Hi Krzysztof! Thank you for the full and detailed answer! I will answer a little later, maybe tomorrow
Alexander
And he will be able to use it soon
Alexander
Actually, our fix is a bit different
Jake A
Hi @matthewrobertbell
Jorge
Hi! Is there a fast way to copy or clone a row/object, changing only the id? Just a simple (and wrong) graphical explanation of what I want:

    customer = Customer[1]
    customer.id = 2
    Customer(customer)

That's because I have to save new entities to the db, but I'm too busy to write all the required attributes for the different tables; I only need different entities, the values don't matter. I have this idea just out of curiosity – in the time I've spent writing and searching I could have written all the code I needed xD
Alexander
There is no ready-made clone method for entities. If you have a dict with values, you can use it several times:

    obj1 = MyEntity(**kwargs)
    obj2 = MyEntity(**kwargs)
Jorge
xDD
Alexander
Don't forget to pop the primary key
Jim
and this:

    obj2 = MyEntity(**obj1.to_dict(exclude=['id']))
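The suggestions above can be combined into a small helper. This is a hedged sketch: Record is a stand-in class with a Pony-like to_dict(), since a real entity needs a live database, and the clone() helper and its 'id' default are assumptions for the example:

```python
class Record:
    """Stand-in for an entity, with a to_dict() like Pony provides."""
    def __init__(self, id=None, make=None, owner=None):
        self.id, self.make, self.owner = id, make, owner

    def to_dict(self, exclude=()):
        d = {'id': self.id, 'make': self.make, 'owner': self.owner}
        return {k: v for k, v in d.items() if k not in exclude}

def clone(obj, exclude=('id',), **overrides):
    """Build kwargs from the object's attributes, drop the pk,
    apply overrides, and construct a fresh instance."""
    kwargs = obj.to_dict(exclude=list(exclude))
    kwargs.update(overrides)
    return type(obj)(**kwargs)
```

With a real Pony entity the same pattern would be MyEntity(**obj.to_dict(exclude=['id']), **overrides), executed inside a db_session.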
Grigory
@metaprogrammer , could you please look into https://github.com/ponyorm/pony/issues/390 ? It is really important for our project.
Alexander
Hi Grigori! I'll look into it
stsouko
Hello! Is it possible to iteratively select a large number of rows from the db?
stsouko
pagination for some queries is very slow
Matthew
Pagination should be fast, are you doing a query that uses an index properly?
Alexander
Matthew, pagination may be slow with very large offset, as the database needs to find and skip all previous rows
Alexander
stsouko, what ORDER BY do you use with paginated query? Is it order by id, or by some complex set of columns?
stsouko
Yes, but I select arrays similar to the query array, or supersets of the query array
stsouko
ordering by id pk
Matthew
Ah good to know :)
stsouko
for similarity search I use smlar extension.
stsouko
which has a GIN or GiST index
stsouko
superset is builtin
stsouko
superset also has GIN index
Alexander
> ordering by id pk

Instead of using offset, you can add a condition id > last_id, where last_id is the last id from the previous result page
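Alexander's suggestion is commonly known as keyset (or seek) pagination. A minimal sketch, using a plain list of dicts as a stand-in for a table already ordered by id:

```python
def keyset_page(rows, page_size, last_id=None):
    """Emulates SELECT ... WHERE id > last_id ORDER BY id LIMIT page_size.
    Unlike OFFSET, the database never has to scan and skip earlier rows,
    so page cost stays constant as you go deeper."""
    if last_id is not None:
        rows = [r for r in rows if r['id'] > last_id]
    return rows[:page_size]
```

In a Pony query this would correspond to something like select(x for x in Entity if x.id > last_id).order_by(Entity.id).limit(page_size), carrying the last id of each page into the next request.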
stsouko
In tests I found that searches on arrays are not cached; every query takes a long time.
Alexander
Grigori, I think I fixed #390
Grigory
@metaprogrammer , thanks! Are you going to do a bugfix release anytime soon?
Alexander
I'll try to do a release tomorrow
Grigory
wow! That's a tight schedule!
Grigory
@metaprogrammer , I wonder if it is possible to use the SQLite "views" feature to speed up some queries? Is it possible to work with views in Pony?
Grigory
And BTW, what about indexes?
Alexander
You can define a view inside the database and then specify the view name as the table name of the entity:

    class MyView(db.Entity):
        _table_ = 'view1'
        foo = Required(int)
        bar = Optional(str)

If a view uses aggregation or joins, it will be read-only, so you should not attempt to modify attributes of the entity
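The SQL side of this setup can be illustrated with the sqlite3 module alone; the person table, adult view, and their columns are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person (name, age) VALUES (?, ?)",
                 [('Ann', 25), ('Bob', 15)])

# The view is created outside the ORM; an entity with _table_ = 'adult'
# would then read from it like a regular table
conn.execute("CREATE VIEW adult AS SELECT id, name FROM person WHERE age >= 18")
adults = conn.execute("SELECT name FROM adult ORDER BY name").fetchall()
```

Since SQLite views built this way are read-only, the corresponding entity should only ever be queried, never modified, as noted above.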
Alexander
Can you describe your question about indexes in more detail?
Grigory
We have to add a lot of data to our database in big chunks. The database can already contain, say, 1M entries. Then we want to add 1M more, in batches of 1-100K. The problem is, each entry has a unique signature, and we have to check whether an entry with the same signature is already in the database. My initial idea was to use the unique modifier for the signature field. I presume it automatically creates an index for the respective field. However, I'm not sure how Pony checks the existence of entries by field if these entries were not committed to the database. Does it use a linear scan? Or a temporary hashtable? What is the recommended way to solve this problem (adding a lot of entries with a check for uniqueness)?
Alexander
You can specify a unique index for a column in one of the following ways:

    class MyEntity(db.Entity):
        foo = Required(str, unique=True)
        bar = Optional(str, unique='index_name')

For the former attribute, the index name is generated automatically (for SQLite it will be unq_MyEntity_foo). If an index with this name does not exist in the database, it will be added automatically when you call generate_mapping with the option create_tables=True. For the latter attribute, the index name is given explicitly. This is useful if you already have an index with that name in the database. If you just write index=True, Pony will create a new index with a "canonical" name, and having two duplicate indexes will slow down inserts/updates.

Inside a db_session, Pony keeps an in-memory index for each unique database index known to Pony. So, if you add several objects inside the same db_session with the same value of a unique attribute, Pony will produce an error immediately, at the creation of the second object in memory, even before any INSERTs are performed.

If you already have a row inside the database with a specific value of a unique column, and then create a new object with the same value of this column, then, if the previous object was not loaded in the current db_session, Pony will not immediately search for a previous object with this unique value, because that would lead to unnecessary SELECT queries. Instead, Pony will try to do the INSERT in the usual way on flush/commit, and then the database will throw an exception on the unique constraint violation. If you want to know beforehand whether such a unique value exists, you can check it in the following way:

    if not MyEntity.exists(unique_attr=value):
        obj2 = MyEntity(unique_attr=value, other_attr=value2)
    else:
        print(value, 'already exists')

But this will load the previous object into memory, so it may be more efficient to just create a new object and catch the error later on commit.

When you do MyEntity.exists(foo=bar), Pony will execute the query SELECT * FROM MyEntity WHERE foo = ?
If the foo column has a unique index, this query will be very fast. The resulting object will be cached in the db_session and added to the in-memory index, so a second call of MyEntity.exists will find the object in the db_session cache and will not execute the same query again
Alexander
In short, if you define an attribute as unique, everything should be good
Alexander
The in-memory db_session cache for each unique index is a dict with the attribute value as key and the object as value. For composite indexes the dict key is a tuple of attribute values
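That dict-based cache can be modeled in a few lines; UniqueIndex is a toy illustration of the idea, not Pony's actual class:

```python
class UniqueIndex:
    """Toy model of a per-db_session unique index: a dict mapping an
    attribute value (or tuple of values, for composite indexes) to the
    object holding it, raising immediately on a duplicate."""
    def __init__(self):
        self._by_value = {}

    def add(self, value, obj):
        existing = self._by_value.get(value)
        if existing is not None and existing is not obj:
            # mirrors Pony raising before any INSERT is performed
            raise ValueError('value %r already used by another object' % (value,))
        self._by_value[value] = obj

    def get(self, value):
        # O(1) lookup, which is why the duplicate check is cheap
        return self._by_value.get(value)
```

The lookup is a plain dict access, which is why the in-session duplicate check costs essentially nothing compared to a SELECT.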
Grigory
Wow! Thanks for the detailed answer!
Grigory
I'll cite this text on a relevant GitHub issue in our repository, if you don't mind.
Alexander
Of course
Grigory
But how exactly do db_session wrappers work when invoked in nested functions? For example:

    @db_session
    def f1():
        return Entry()

    @db_session
    def f2():
        lst = []
        for _ in xrange(0, 10):
            lst.append(f1())

How many commits and flushes will there be? One, or 10? Or 11? Does the db_session wrapper know that it is called from another db_session and trust the upper-layer session to do the actual commits?
Alexander
Inner db_sessions are ignored, except for the sql_debug option
Grigory
hmmm... In one of my tests, removing db_session wrappers from some of the lower-level functions increased performance about 10 times!