Lucky
Hey @metaprogrammer, did you have the time to look into registering an Entity to a Database at a later time?
I just got another use case, that I have a Users table I want to use in different projects.
from somewhere import User
class Post(later.Entity):
poster = Required(User)
text = Required(str)
# end class
db = Database('postgres://...')
User.register_db(db)
db.register(Post)
In that way I could init just that table if I don't already have it,
- without creating all the unneeded tables from the other project
- without having to duplicate the code and keeping it up to date through several (7+) projects
Lucky
I think it would require to move the registering logic from MetaEntity to a central Database.register() function, and calling that from MetaEntity.
Lucky
So basically like flask's init_app() and Blueprints
Alexander
Hi Luckydonald, I didn't have time to look at this yet, maybe on Wednesday we will have time to think about it.
There are several features that are interrelated and can be used in similar situations:
- Abstract entities to define common attributed like modification date, etc.
- Relationships which are defined using one side only, so you can link a new entity to previously defined User entity without adding something to it
- Extending of previously defined database schema with multiple entities
These things are interrelated and probably should be designed together
I think we can concentrate on it after the next release, otherwise it will be delayed
Lucky
Alexander
Guys, I have a question for all of you. Suppose the following query
select((p.id, p.name) for p in Person)
returns
[(1, 'John'), (2, 'Mike'), (3, 'John'), (4, 'Alex')]
What do you expect to be the result of
select(p.name for p in Person)
?
Matthew
a non-ordered list of unique names
Matthew
Trick question? :)
Vitaliy
non-ordered list of unique elements is a set. list can contain multiple elements and it is ordered
Alexander
In such queries Pony returns unique names, like:
['Alex', 'John', 'Mike']
But some users don't like this behavior, and want
['John', 'Mike', 'John', 'Alex']
instead as a default result. So, I want to think about it a bit more, should we change the current behavior or not.
The argument that users says for returning all elements without deduplication is that with current behavior aggregate functions give incorrect result, like, if you do
scholarship_values = select(s.scholarship for s in Student)
total = sum(scholarship_values)
The total result will be incorrect, because of removed duplicates. But in my opinion, aggregates should be calculated inside the query, where they can be calculated efficiently, and not after retrieving potentially big number of elements to Python.
I like the current behavior, because if we start returning elements with duplicates, the number of duplicates may be surprisingly big in non-trivial queries. For example, if we execute another query, like
select(p.name for p in Person for car in p.cars if car.price > 5000)
there will be additional duplicates if a single person owns several cars with a high price. And when query becomes even more complex it becomes very hard to predict how much duplicate values it contains, so it will be dangerous to calculate aggregate functions in Python over the result of a such query anyway.
So I tend to keep the current behavior. But maybe I'm miss something, and want to hear valid objections if we should keep all duplicates by default
Matthew
But in my opinion, aggregates should be calculated inside the query, - I agree
Matthew
if people want duplicates, they can use without_distinct()
Alexander
Ok, so you think the current default behavior is good
Matthew
Yep :)
Matthew
and I think it mirrors SQL?
Alexander
Not really, if you execute
SELECT p.name FROM person
It returns duplicate names, so Pony adds DISTINCT for such type of queries:
SELECT DISTINCT p.name FROM person
and this is the cause of controversy, some users want that Pony does not add DISTINCT automatically to any query
Vitaliy
Why not to add global variable like pony.MODE which can control this behavior?
Alexander
It is possible, but complicates design a bit. Should it be global or limited to db_session, can different components written under assumption of specific value of this flag be used together, etc
Alexander
At this moment I think the reason for adding such a flag is not strong enough
Matthew
Alexander
Maybe we can add context manager
with auto_distinct_disabled:
names = select(p.name for p in Person)
# do some other queries
It should be simple enough.
But I think use-cases where users want it contains some problems, like calculating aggregates in Python, etc.
Matthew
Please make a good pony style rust orm one day :)
Alexander
Ok, maybe I'll try some day :)
Matthew
http://diesel.rs/
Matthew
This is the current best one, pony is much nicer
stsouko
auto distinct is much better.
Lucky
Guys, I have a question for all of you. Suppose the following query
select((p.id, p.name) for p in Person)
returns
[(1, 'John'), (2, 'Mike'), (3, 'John'), (4, 'Alex')]
What do you expect to be the result of
select(p.name for p in Person)
?
I would expect to simply get the colums, including duplicates
Lucky
That’s even pythonic
Lucky
Like for example
>>> test = {"a": "foo", "b": "foo"}
>>> list(test.items())
[('a', 'foo'), ('b', 'foo')]
>>> list(test.values())
['foo', 'foo']
Lucky
And on sql side you have to explicitly add a distinct, too.
Lucky
It feels being against logic to just throw away results
Lucky
... and might explain some weird bugs I was seeing recently...
Lucky
Yeah, that's really error prone
Lucky
Was it always like this?
Lucky
In such queries Pony returns unique names, like:
['Alex', 'John', 'Mike']
But some users don't like this behavior, and want
['John', 'Mike', 'John', 'Alex']
instead as a default result. So, I want to think about it a bit more, should we change the current behavior or not.
The argument that users says for returning all elements without deduplication is that with current behavior aggregate functions give incorrect result, like, if you do
scholarship_values = select(s.scholarship for s in Student)
total = sum(scholarship_values)
The total result will be incorrect, because of removed duplicates. But in my opinion, aggregates should be calculated inside the query, where they can be calculated efficiently, and not after retrieving potentially big number of elements to Python.
I like the current behavior, because if we start returning elements with duplicates, the number of duplicates may be surprisingly big in non-trivial queries. For example, if we execute another query, like
select(p.name for p in Person for car in p.cars if car.price > 5000)
there will be additional duplicates if a single person owns several cars with a high price. And when query becomes even more complex it becomes very hard to predict how much duplicate values it contains, so it will be dangerous to calculate aggregate functions in Python over the result of a such query anyway.
So I tend to keep the current behavior. But maybe I'm miss something, and want to hear valid objections if we should keep all duplicates by default
But wouldn't you do
Select(
p.name for p in Person
if exists(
car for car in p.cars
if car.price > 5000
)
)
anyway?
Lucky
I always though I would get distinct values by using .distinct(<query>)
Vitaliy
I recently switched to Postgres (after reading a conversation in this chat about DBMSs) and also encountered a problem with sorting. In MySQL I used some hack in configuration files, but Postrges does not allow this.
I mean that it is not possible to order results via an attributes which is not included to SELECT statements when DISTINCT is used. I use in several entities a metadata JSON-field where defined something like order key with int values.
So in many places in my app I needed to either change query logic or to add without_distinct. Many queries works regardless DISTINCT, and in a few ones auto-distinct is really helpful.
I agree with @luckydonald that pony should be more pythonic in this case.
Matthew
Can you share the query that you have trouble sorting?
Vitaliy
I have refactored a code, but it should be as simple as this:
class Person(db.Entity):
name = Required(str)
order = Required(str)
select(p.name for p in Person).order_by('p.order')
And this should raise error like SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Александр
Hi.
Tell me can I:
- Open session
- To get the employee from the database
- Closed session
- Make changes in the employee
- Save employee
Here's the employee model.
https://pastebin.com/jawqLFnH
Александр
When I try to do that, I get an error.
"the database session is over"
Anonymous
Why are you doing the changes after closing session? How are you doing it in the first place?
Александр
- I get employee.
- Showing it to the user
- User makes changes
- Saves employee
Anonymous
What I useually do is once I get the pony object,
Copy it to a plain python dict or object,
Show that to user,
User edit that,
inside db_session, get employee again, apply changes by assigning fields from python dict/object to pony object
then it is saved
Александр
Yes, I'm doing it now. But it is not convenient. Many unnecessary actions.
Anonymous
That is the only way I know unfortunately. and it checks out with all other ORMs I've used in many other languagues.
Maybe there is a better way for pony. Let's see if someone else know a better way. 🤔
Jim
In the second db_session,, I get the employee and update it via the set : employee.set(**dict_with_new_values)
Anonymous
Jim's solution is clearly more convenient. Nice. 😊
Anonymous
I want to ignore code coverage inside generator expressions of Pony. Is there a better way than this https://stackoverflow.com/questions/53280985/how-to-ignore-ponyorm-generator-expressions-in-nose-tests-coverage/53281148#53281148
Jim
https://github.com/nedbat/coveragepy/issues/685
Anonymous
Ned replied: https://stackoverflow.com/a/53541785/1355145
Matthew
I have a pony query which takes 38 seconds for pony to put into a list. the raw query which pony generates takes approx 3 seconds. therer are 650k results. Is this just python being slow, or is it likely I'm doing something wrong?
Matthew
It's a normal select(x for x in X)...
Matthew
no preloading etc
Matthew
Is there a built in way to see what pony is doing with that CPU time?
Matthew
the python process goes to 100% after a few seconds of postgres activity
Matthew
Nevermind, with a little refactoring I made it quite a bit faster
Matthew
I have a query like select((x, x.y) for x in X) , where y is a Required(Y)
Matthew
It's not loading the full attributes of x or x.y, is this a change in behaviour? I seem to remember this working in the past?
Matthew
When I access the results in a for loop, pony is doinig queries to get the full objects, but this seems very slow
Alexander
When the query looks like a select(x for x in X) Pony loads all columns from X
When query returns a tuple select((x, something_other) for x in X) Pony loads only primary key of X
The reason is to avoid loading duplicates. For example, if query looks like
select((x, y) for x in X for y in Y if ...)
or
select((x, y) for x in X for y in x.y)
then the column values of the same X object will repeat in miltiple rows of query result, and it lead to excess parsing of data. To avoid this, Pony selects only primary key value of each object when result is a tuple, and then issues addtional queries
select <columns> from X where id in (:p1, :p2, :p3, ...)
In this queries each object loads only once, so theoretically it should be more efficient
> It's not loading the full attributes of x or x.y, is this a change in behaviour? I seem to remember this working in the past?
No, in Pony it always worked this way. Recently there were some optimizations of prefetching logic, maybe they broke something
> When I access the results in a for loop, pony is doinig queries to get the full objects, but this seems very slow
It should not be slow, although I heard PostgreSQL have some performance problem with queries of type
select <columns> from table where id in <long list of ids>
Maybe this is the reason
Ø
hi guys, how the experience using Pony when doing CRUD
Micaiah
Hey all, long time no see. Working on a project and I'm wondering if this seems like a good application for Pony. Basically, I want to store schemas and instances of those schemas, and hopefully allow for fields in the schema to be other schemas.
Micaiah
I'm not sure how reasonable it will be trying to do meta-database stuff with Pony and if marshalling/validation will be a nightmare
Carel
@micaiahparker wait what are you doing ? I ask since it may have applicability to a personal project that’s been on my back burner for a while. Like are you pulling the schema down, building and then populating the DB ? What source are you using (xml+wsdl) ? There is the pony schema site, err... it’s navigable from the project site, is that kinda what you are doing ?
Carel
@micaiahparker Is it ok if I pm you ?
Micaiah
I want users to be able to register schemas via a REST api and then be able to submit objects into the db for that schema, as well as preform actions on those objects
Micaiah
Carel you can PM me
Александр
Hey.
Is it possible to remove the warning in PyCharm?
Without turning off the warning?
http://joxi.ru/BA0vQjWuMBEnbm.jpg
Anonymous
Don't think so. I find it annoying as well. But Python is a dynamic language and that generator expression isn't really executed in the first place.
Anonymous
because the generator expression is decompiled and the SQL is constructed.
Anonymous
Or am I wrong? 😮
Alexander
You are right
Alexander
In Pony each Entity class is an instance of EntityMeta metaclass. This metaclass has __iter__ method, but PyCharm doesn't understand it. Maybe it is because Pony specifies metaclass to Entity class in a special way, to make the code valid Python2 and Python3 syntax at the same time
Anonymous
Alexander
When generator is passed to a function, Python actually calls iter() on Entity, and it returns a special iterator which will throw an exception if somebody actually try to iterate over it. But Pony does not iterate this EntityIter object, and just analyze it instead
Anonymous
😮
Alexander
So, technically Entity subclasses are valid iterables
Jim
I never learned SQL, I have only used ORM (django, pony) and dealing with an existing database with pony is really really easy and fun. thanks !
Alexander
You are welcome