Lucky

Hey @metaprogrammer, did you have the time to look into registering an Entity to a Database at a later time? I just got another use case, that I have a Users table I want to use in different projects. from somewhere import User class Post(later.Entity): poster = Required(User) text = Required(str) # end class db = Database('postgres://...') User.register_db(db) db.register(Post) In that way I could init just that table if I don't already have it, - without creating all the unneeded tables from the other project - without having to duplicate the code and keeping it up to date through several (7+) projects

Lucky

I think it would require to move the registering logic from MetaEntity to a central Database.register() function, and calling that from MetaEntity.

Lucky

So basically like flask's init_app() and Blueprints

Alexander

Hi Luckydonald, I didn't have time to look at this yet, maybe on Wednesday we will have time to think about it. There are several features that are interrelated and can be used in similar situations: - Abstract entities to define common attributed like modification date, etc. - Relationships which are defined using one side only, so you can link a new entity to previously defined User entity without adding something to it - Extending of previously defined database schema with multiple entities These things are interrelated and probably should be designed together I think we can concentrate on it after the next release, otherwise it will be delayed

Lucky

Hi Luckydonald, I didn't have time to look at this yet, maybe on Wednesday we will have time to think about it. There are several features that are interrelated and can be used in similar situations: - Abstract entities to define common attributed like modification date, etc. - Relationships which are defined using one side only, so you can link a new entity to previously defined User entity without adding something to it - Extending of previously defined database schema with multiple entities These things are interrelated and probably should be designed together I think we can concentrate on it after the next release, otherwise it will be delayed

Sounds good!

Alexander

Guys, I have a question for all of you. Suppose the following query select((p.id, p.name) for p in Person) returns [(1, 'John'), (2, 'Mike'), (3, 'John'), (4, 'Alex')] What do you expect to be the result of select(p.name for p in Person) ?

Matthew

a non-ordered list of unique names

Matthew

Trick question? :)

Vitaliy

non-ordered list of unique elements is a set. list can contain multiple elements and it is ordered

Alexander

In such queries Pony returns unique names, like: ['Alex', 'John', 'Mike'] But some users don't like this behavior, and want ['John', 'Mike', 'John', 'Alex'] instead as a default result. So, I want to think about it a bit more, should we change the current behavior or not. The argument that users says for returning all elements without deduplication is that with current behavior aggregate functions give incorrect result, like, if you do scholarship_values = select(s.scholarship for s in Student) total = sum(scholarship_values) The total result will be incorrect, because of removed duplicates. But in my opinion, aggregates should be calculated inside the query, where they can be calculated efficiently, and not after retrieving potentially big number of elements to Python. I like the current behavior, because if we start returning elements with duplicates, the number of duplicates may be surprisingly big in non-trivial queries. For example, if we execute another query, like select(p.name for p in Person for car in p.cars if car.price > 5000) there will be additional duplicates if a single person owns several cars with a high price. And when query becomes even more complex it becomes very hard to predict how much duplicate values it contains, so it will be dangerous to calculate aggregate functions in Python over the result of a such query anyway. So I tend to keep the current behavior. But maybe I'm miss something, and want to hear valid objections if we should keep all duplicates by default

Matthew

But in my opinion, aggregates should be calculated inside the query, - I agree

Matthew

if people want duplicates, they can use without_distinct()

Alexander

Ok, so you think the current default behavior is good

Matthew

Yep :)

Matthew

and I think it mirrors SQL?

Alexander

Not really, if you execute SELECT p.name FROM person It returns duplicate names, so Pony adds DISTINCT for such type of queries: SELECT DISTINCT p.name FROM person and this is the cause of controversy, some users want that Pony does not add DISTINCT automatically to any query

Vitaliy

Why not to add global variable like pony.MODE which can control this behavior?

Alexander

It is possible, but complicates design a bit. Should it be global or limited to db_session, can different components written under assumption of specific value of this flag be used together, etc

Alexander

At this moment I think the reason for adding such a flag is not strong enough

Matthew

if people want duplicates, they can use without_distinct()

i think this is enough control for people

Alexander

Maybe we can add context manager with auto_distinct_disabled: names = select(p.name for p in Person) # do some other queries It should be simple enough. But I think use-cases where users want it contains some problems, like calculating aggregates in Python, etc.

Matthew

Please make a good pony style rust orm one day :)

Alexander

Ok, maybe I'll try some day :)

Matthew

http://diesel.rs/

Matthew

This is the current best one, pony is much nicer

stsouko

auto distinct is much better.

Lucky

Guys, I have a question for all of you. Suppose the following query select((p.id, p.name) for p in Person) returns [(1, 'John'), (2, 'Mike'), (3, 'John'), (4, 'Alex')] What do you expect to be the result of select(p.name for p in Person) ?

I would expect to simply get the colums, including duplicates

Lucky

That’s even pythonic

Lucky

Like for example >>> test = {"a": "foo", "b": "foo"} >>> list(test.items()) [('a', 'foo'), ('b', 'foo')] >>> list(test.values()) ['foo', 'foo']

Lucky

And on sql side you have to explicitly add a distinct, too.

Lucky

It feels being against logic to just throw away results

Lucky

... and might explain some weird bugs I was seeing recently...

Lucky

Yeah, that's really error prone

Lucky

Was it always like this?

Lucky

In such queries Pony returns unique names, like: ['Alex', 'John', 'Mike'] But some users don't like this behavior, and want ['John', 'Mike', 'John', 'Alex'] instead as a default result. So, I want to think about it a bit more, should we change the current behavior or not. The argument that users says for returning all elements without deduplication is that with current behavior aggregate functions give incorrect result, like, if you do scholarship_values = select(s.scholarship for s in Student) total = sum(scholarship_values) The total result will be incorrect, because of removed duplicates. But in my opinion, aggregates should be calculated inside the query, where they can be calculated efficiently, and not after retrieving potentially big number of elements to Python. I like the current behavior, because if we start returning elements with duplicates, the number of duplicates may be surprisingly big in non-trivial queries. For example, if we execute another query, like select(p.name for p in Person for car in p.cars if car.price > 5000) there will be additional duplicates if a single person owns several cars with a high price. And when query becomes even more complex it becomes very hard to predict how much duplicate values it contains, so it will be dangerous to calculate aggregate functions in Python over the result of a such query anyway. So I tend to keep the current behavior. But maybe I'm miss something, and want to hear valid objections if we should keep all duplicates by default

But wouldn't you do Select( p.name for p in Person if exists( car for car in p.cars if car.price > 5000 ) ) anyway?

Lucky

I always though I would get distinct values by using .distinct(<query>)

Vitaliy

I recently switched to Postgres (after reading a conversation in this chat about DBMSs) and also encountered a problem with sorting. In MySQL I used some hack in configuration files, but Postrges does not allow this. I mean that it is not possible to order results via an attributes which is not included to SELECT statements when DISTINCT is used. I use in several entities a metadata JSON-field where defined something like order key with int values. So in many places in my app I needed to either change query logic or to add without_distinct. Many queries works regardless DISTINCT, and in a few ones auto-distinct is really helpful. I agree with @luckydonald that pony should be more pythonic in this case.

Matthew

Can you share the query that you have trouble sorting?

Vitaliy

I have refactored a code, but it should be as simple as this: class Person(db.Entity): name = Required(str) order = Required(str) select(p.name for p in Person).order_by('p.order') And this should raise error like SELECT DISTINCT ON expressions must match initial ORDER BY expressions

Александр

Hi. Tell me can I: - Open session - To get the employee from the database - Closed session - Make changes in the employee - Save employee Here's the employee model. https://pastebin.com/jawqLFnH

Александр

When I try to do that, I get an error. "the database session is over"

Anonymous

Why are you doing the changes after closing session? How are you doing it in the first place?

Александр

- I get employee. - Showing it to the user - User makes changes - Saves employee

Anonymous

What I useually do is once I get the pony object, Copy it to a plain python dict or object, Show that to user, User edit that, inside db_session, get employee again, apply changes by assigning fields from python dict/object to pony object then it is saved

Александр

Yes, I'm doing it now. But it is not convenient. Many unnecessary actions.

Anonymous

That is the only way I know unfortunately. and it checks out with all other ORMs I've used in many other languagues. Maybe there is a better way for pony. Let's see if someone else know a better way. 🤔

Jim

In the second db_session,, I get the employee and update it via the set : employee.set(**dict_with_new_values)

Anonymous

Jim's solution is clearly more convenient. Nice. 😊

Anonymous

I want to ignore code coverage inside generator expressions of Pony. Is there a better way than this https://stackoverflow.com/questions/53280985/how-to-ignore-ponyorm-generator-expressions-in-nose-tests-coverage/53281148#53281148

Jim

https://github.com/nedbat/coveragepy/issues/685

Anonymous

Ned replied: https://stackoverflow.com/a/53541785/1355145

Matthew

I have a pony query which takes 38 seconds for pony to put into a list. the raw query which pony generates takes approx 3 seconds. therer are 650k results. Is this just python being slow, or is it likely I'm doing something wrong?

Matthew

It's a normal select(x for x in X)...

Matthew

no preloading etc

Matthew

Is there a built in way to see what pony is doing with that CPU time?

Matthew

the python process goes to 100% after a few seconds of postgres activity

Matthew

Nevermind, with a little refactoring I made it quite a bit faster

Matthew

I have a query like select((x, x.y) for x in X) , where y is a Required(Y)

Matthew

It's not loading the full attributes of x or x.y, is this a change in behaviour? I seem to remember this working in the past?

Matthew

When I access the results in a for loop, pony is doinig queries to get the full objects, but this seems very slow

Alexander

When the query looks like a select(x for x in X) Pony loads all columns from X When query returns a tuple select((x, something_other) for x in X) Pony loads only primary key of X The reason is to avoid loading duplicates. For example, if query looks like select((x, y) for x in X for y in Y if ...) or select((x, y) for x in X for y in x.y) then the column values of the same X object will repeat in miltiple rows of query result, and it lead to excess parsing of data. To avoid this, Pony selects only primary key value of each object when result is a tuple, and then issues addtional queries select <columns> from X where id in (:p1, :p2, :p3, ...) In this queries each object loads only once, so theoretically it should be more efficient > It's not loading the full attributes of x or x.y, is this a change in behaviour? I seem to remember this working in the past? No, in Pony it always worked this way. Recently there were some optimizations of prefetching logic, maybe they broke something > When I access the results in a for loop, pony is doinig queries to get the full objects, but this seems very slow It should not be slow, although I heard PostgreSQL have some performance problem with queries of type select <columns> from table where id in <long list of ids> Maybe this is the reason

Ø

hi guys, how the experience using Pony when doing CRUD

Micaiah

Hey all, long time no see. Working on a project and I'm wondering if this seems like a good application for Pony. Basically, I want to store schemas and instances of those schemas, and hopefully allow for fields in the schema to be other schemas.

Micaiah

I'm not sure how reasonable it will be trying to do meta-database stuff with Pony and if marshalling/validation will be a nightmare

Carel

@micaiahparker wait what are you doing ? I ask since it may have applicability to a personal project that’s been on my back burner for a while. Like are you pulling the schema down, building and then populating the DB ? What source are you using (xml+wsdl) ? There is the pony schema site, err... it’s navigable from the project site, is that kinda what you are doing ?

Carel

@micaiahparker Is it ok if I pm you ?

Micaiah

I want users to be able to register schemas via a REST api and then be able to submit objects into the db for that schema, as well as preform actions on those objects

Micaiah

Carel you can PM me

Александр

Hey. Is it possible to remove the warning in PyCharm? Without turning off the warning? http://joxi.ru/BA0vQjWuMBEnbm.jpg

Anonymous

Don't think so. I find it annoying as well. But Python is a dynamic language and that generator expression isn't really executed in the first place.

Anonymous

because the generator expression is decompiled and the SQL is constructed.

Anonymous

Or am I wrong? 😮

Alexander

You are right

Alexander

In Pony each Entity class is an instance of EntityMeta metaclass. This metaclass has __iter__ method, but PyCharm doesn't understand it. Maybe it is because Pony specifies metaclass to Entity class in a special way, to make the code valid Python2 and Python3 syntax at the same time

Anonymous

aha !

Alexander

When generator is passed to a function, Python actually calls iter() on Entity, and it returns a special iterator which will throw an exception if somebody actually try to iterate over it. But Pony does not iterate this EntityIter object, and just analyze it instead

Anonymous

😮

Alexander

So, technically Entity subclasses are valid iterables

Jim

I never learned SQL, I have only used ORM (django, pony) and dealing with an existing database with pony is really really easy and fun. thanks !

Alexander

You are welcome