Anonymous
This may sound very clueless, but is there a way to check, when the db is loaded, whether the schema has changed? I'm still working out how this will all look, and the model is changing. Do I need to assume that the db should be recreated on every run?
Alexander
db.generate_mapping() has a check_tables=True option enabled by default. It issues a simple probe query like select column1, column2, ... from table1 where 0=1 to make sure the table has all necessary columns. It does not check column types, though.
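The idea behind that probe can be sketched with plain sqlite3 (the table and column names here are made up; Pony generates the real query from your entity definitions):

```python
import sqlite3

# Minimal sketch of the probe query that a check_tables-style option
# issues: select the expected columns with a false WHERE clause, so no
# rows come back, but a missing column raises an error immediately.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT, column2 INTEGER)")

def has_columns(conn, table, columns):
    try:
        conn.execute(f"SELECT {', '.join(columns)} FROM {table} WHERE 0=1")
        return True
    except sqlite3.OperationalError:  # e.g. "no such column"
        return False

print(has_columns(conn, "table1", ["column1", "column2"]))  # True
print(has_columns(conn, "table1", ["column1", "column3"]))  # False
```

Note that, as said above, this only detects missing columns, not type changes, so a schema migration is still needed when column types change.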
Anonymous
Okay. I appreciate the help.
Evgeniy
Hello! I would like to create a wrapper object for the Pony query set functionality that behaves indistinguishably from the original in all cases except those that I redefine. In __init__ it must accept the original query object, and I planned to intercept access to individual attributes via __getattribute__ and __setattr__. But I ran into the problem that, for example, select() detects my fake object because it checks types. How do I bypass the select() type check?
Alexander
I'm not sure I understand what you want to achieve
Evgeniy
The code: https://pastebin.com/RjST0gPV
Evgeniy
I expect this to work as a query object, only len() will work via count()
Evgeniy
Maybe I can solve the problem by inheriting from one of the internal Pony types?
Evgeniy
When I try to use an object of this class in the select() function, I get the error "TypeError: 'MultipleLog' object is not iterable". As I understand it, the reason is the internal Pony type checking plus the stack trace trimming that the Pony engine performs.
Evgeniy
Am I wrong?
Alexander
> When I try to use an object of this class in the select() function
Can you show an example?
Evgeniy
Sure
Evgeniy
One minute
Evgeniy
I'm using a wrapper around select() from Pony that looks like this: https://pastebin.com/RjY5TcGv.
Evgeniy
Here is the code where it is used:
Evgeniy
with db_session:
    all = select_items(x for x in logs)
    auto_logs = select_items(x for x in all if x.auto == True)
Evgeniy
The first query runs without errors; the second (already going through my wrapper) fails with an error.
Evgeniy
logs in the example is a model class.
Evgeniy
This worked when I returned from select_items() not a wrapper, but the original set of objects obtained from Pony ORM select().
Evgeniy
Problem solved! I added this method:
Evgeniy
def __iter__(self):
    result = self._logs.__iter__()
    return result
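The reason the explicit __iter__ is needed (and why __getattribute__ alone was not enough) is that Python looks up special methods on the type and bypasses instance attribute hooks. A minimal sketch of such a delegating wrapper, using a plain list instead of a Pony query object:

```python
class QueryWrapper:
    """Hypothetical delegating wrapper around a query-like object.
    Implicit special-method lookup (len(), iter(), ...) happens on the
    type and bypasses __getattribute__/__getattr__, so dunder methods
    such as __iter__ must be defined on the class itself."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __len__(self):
        # the place to redefine behaviour, e.g. delegate to count()
        return sum(1 for _ in self._wrapped)

    def __iter__(self):
        # without this, `for x in wrapper` raises
        # "TypeError: ... object is not iterable"
        return iter(self._wrapped)

    def __getattr__(self, name):
        # forward every other attribute to the wrapped object
        return getattr(self._wrapped, name)

w = QueryWrapper([1, 2, 3])
print(len(w), list(w))  # 3 [1, 2, 3]
```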
Alexander
Honestly, I don't think it is a good idea to use classes like this. You are basically trying to monkey-patch Pony internals without a full understanding of how query translation works. This is a guaranteed way to get problems. When Pony translates expressions to SQL, it doesn't actually call the methods and functions inside the generator expression; it translates them to SQL. To do this it uses monads, which understand how to translate each individual expression. Pony knows about the types which can be translated to SQL and has corresponding monads for those types. If you make your own class like MultipleLog and redefine some methods inside it (like __len__), there will be no corresponding monad which can handle the translation. For usual classes which cannot be translated, Pony throws an appropriate exception which explains which part of the query cannot be translated. But when you do monkey-patching, wrapping Pony's own classes like Query with other classes, the result is unpredictable.
Alexander
I think you should have a clear separation between the outer logger code and the internal Pony code, so that for each object it is clear whether it belongs to Pony or to your logging library
Alexander
Also, as a minor point, I see that you wrap methods like __len__ with db_session:

@db_session
def __len__(self):
    result = count(self._logs)
    return result

In my opinion db_session should never be applied to individual methods; it should wrap a whole logical transaction, which may include many method calls
Evgeniy
I don't like wrapping Pony functions myself, but I don't know how else to achieve the goal: encapsulate the access interface as much as possible. The library user should not have to think about db_session or anything else.
Alexander
From reading your API documentation I have the impression that the library user never interacts with query results directly. Any interaction of the logging library with Pony is encapsulated in a separate thread, so the user indeed should not be worried about the logger library internals. It seems to me that wrappers like MultipleLog are not necessary to achieve full encapsulation of the logger's storage layer: https://github.com/pomponchik/polog
Evgeniy
This is a new feature that I was going to add. The user must be able to interact with logs inside the program.
Evgeniy
The idea is that it should look simple to the user: import the wrapper for select() plus the model class, and that's it, you can make queries.
Evgeniy
Simple.
Evgeniy
At the same time, I plan to continue writing to the database without any modifications or wrappers around Pony.
Evgeniy
Wrappers are read-only for the result from the database, for user convenience.
Alexander
I think you need to design some API for that, using one of two options (or both, for different use-cases):

1) An API for log viewing that is totally distinct from Pony. It may return the log result as JSON. The benefit is simplicity and total separation; the drawback is the lack of advanced filtering options. The user can do:

logs_json = logger.get_log(start_time=t1, end_time=t2, exception_type=ZeroDivisionError)
for row in logs_json:
    print(row["function"])

2) An API that exposes Pony queries directly, without trying to "improve" them by redefining methods like Query.__len__. This way the user can write any custom query:

pony_log_result = logger.LoggerEntity.select(
    lambda record: record.time >= t1
        and record.time <= t2
        and record.exception == "ZeroDivisionError"
).order_by(logger.LoggerEntity.id)
for row in pony_log_result:
    print(row.function)
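Option 1 can be sketched without any ORM at all. Here the in-memory store, field names and the get_log signature are all hypothetical, chosen only to illustrate the shape of an ORM-free JSON API:

```python
import json

# Hypothetical in-memory log store standing in for the real database.
LOG_ROWS = [
    {"function": "parse", "time": "2021-01-01T10:00:00", "exception": None},
    {"function": "divide", "time": "2021-01-01T11:00:00", "exception": "ZeroDivisionError"},
]

def get_log(start_time=None, end_time=None, exception_type=None):
    """Basic filtering only; returns plain JSON, no ORM objects leak out."""
    rows = LOG_ROWS
    if start_time is not None:
        rows = [r for r in rows if r["time"] >= start_time]
    if end_time is not None:
        rows = [r for r in rows if r["time"] <= end_time]
    if exception_type is not None:
        rows = [r for r in rows if r["exception"] == exception_type.__name__]
    return json.dumps(rows)

for row in json.loads(get_log(exception_type=ZeroDivisionError)):
    print(row["function"])  # divide
```

The tradeoff is exactly the one described above: the user gets total separation from the storage layer, but only the filters the API author anticipated.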
Alexander
If you wrap the Query object with your own classes which are logically unrelated to Pony, you are basically creating a new query language, and it may be quite a complex task to implement and document correctly
Evgeniy
1. Yes, that's what I was going to do. 2. Giving direct access to the Pony interface is contrary to my original idea of minimalism. I even removed the brackets from the decorators when they are not needed :) As little code as possible. I don't need a new query language, so I take exactly the same generator as in the original and pass it to the pseudo-select() function. Just like in the original. The difference is that the user does not have to know Pony thoroughly in order to start using it (for example, it was a surprise to me at the very beginning that len() makes a complete selection from the database). A person should like this approach, and then they can get acquainted with the more powerful full-fledged Pony engine.
Evgeniy
I really like the idea of queries through generators, it's very beautiful. I don't want to go from this to the lambdas.
Evgeniy
It was because of this beauty that I chose Pony, that was the whole plan.
Alexander
There is a tradeoff between minimalism and expressiveness. I'm not sure you can provide both at the same time. If you provide the user with some pseudo-select function, you need to describe how it works, what you can specify inside it, etc. Writing this as part of the documentation for a logging library may be not exactly minimal. Instead, I'd suggest providing the user with a way to perform basic queries, as well as a way to go "under the hood" and perform any queries the underlying platform is capable of. So the logging library description will look like this: "If you want to perform basic queries for log results, you can use this API. If you need to write queries of arbitrary complexity, use this way to access the underlying query library." This underlying query API may be Pony, or a direct database connection for writing arbitrary SQL queries.

Regarding the len functionality and why it does not perform select count(*): there are two use-cases when you want to know the number of items. You may be interested in this number only, or you may want to iterate over the result and process each individual object. When someone calls __len__ on an object, it often happens in a situation where they want to iterate over the individual items right after that. In that case, performing select count(*) would be non-optimal for two reasons:

1) It would be one more query to the database, as after executing select count(*) from t1 where ... we would issue select * from t1 where ... right after that anyway.

2) The actual number of rows returned by the second query may differ from the number returned by select count(*), as most databases work at the READ COMMITTED isolation level by default, and some new rows may be added to the database by another transaction between these two queries.

Because of this, Pony provides two different functions: len and count. count will execute select count(*), while len, when applied to a top-level query, will actually load all rows, to avoid a second query for iteration over the query result.
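The caching behaviour described above can be modelled with a toy class (this is not Pony's actual implementation, just a sketch of the len-vs-count tradeoff):

```python
class FakeQuery:
    """Toy model of a query object: len() materializes and caches the
    rows so the following iteration needs no second query, while
    count() issues a separate SELECT COUNT(*)-style query."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows   # simulates a DB round-trip
        self._cache = None
        self.queries_issued = 0

    def _materialize(self):
        if self._cache is None:
            self.queries_issued += 1
            self._cache = self._fetch_rows()

    def __len__(self):
        self._materialize()             # fetch once, cache rows
        return len(self._cache)

    def __iter__(self):
        self._materialize()             # reuses the cache after len()
        return iter(self._cache)

    def count(self):
        self.queries_issued += 1        # always a fresh COUNT query
        return len(self._fetch_rows())

q = FakeQuery(lambda: ["row1", "row2", "row3"])
n = len(q)          # one query, rows cached
rows = list(q)      # no second query, same snapshot as len saw
print(n, rows, q.queries_issued)  # 3 ['row1', 'row2', 'row3'] 1
```

With the count() route, another transaction could change the table between the COUNT query and the subsequent row fetch, which is exactly the consistency problem len() avoids by loading everything once.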
Alexander
If len is the only thing you want to change, you can write an API like:

LoggerAPI.select(...)
LoggerAPI.count(...)
Evgeniy
> There is a tradeoff between minimalism and expressiveness. [...] len, when applied to top-level query, will actually load all rows to avoid second query for iteration over query result
Yes, I understand the reason why len() works this way in Pony, but I don't assume that anyone in their right mind would ever select ALL entries from the log :) That would be extremely strange behavior for the user, but using len() by mistake is quite likely, since that is the user's natural behavior towards any iterable object. About flexibility: yes, the logging library that I used earlier (django-db-logger) had it. I found it ugly and inconvenient. It turns out that in order to fully use that library, I must study another one.
Evgeniy
(I knew how to use django, it's ugly in principle)
Evgeniy
> If len is only thing you want to change, you can write API like LoggerAPI.select(...) LoggerAPI.count(...)
I'm not sure what the innovation is. Is the method called on a class?
Evgeniy
Viewing the number of logs in a filtered category is the most frequent action the user will perform, I think. It should be as native as possible.
Evgeniy
Native == len()
Evgeniy
My approach would be wrong for ORM in general, but it is more expected when using a logger.
Alexander
Maybe something like:

class MyLogger(object):
    def __init__(self, log_entity):
        self.log_entity = log_entity

    def select(self, gen):
        pony_result = self.log_entity.select(gen)
        return LogResults(self.log_entity, pony_result)

    def count(self, gen):
        return self.log_entity.select(gen).count()

class LogResults(object):
    def __init__(self, log_entity, query_result):
        self.log_entity = log_entity
        self.query_result = query_result

    def __len__(self):
        return self.query_result.count()

    def __iter__(self):
        return iter(self.query_result)
Evgeniy
Yes, a post-processing class for getting additional data, which is about what I planned to do. In addition to length, there will be methods for calculating the success rate for functions and much more. But all this still rests on the set of data from the query, which must be obtained in a convenient way.
Anonymous
I had a performance problem using Pony ORM backed by Postgres with JSONB objects. The json objects are a little bit complex, and the standard json library does not work as efficiently as simplejson or ujson... is it possible to use a different json library for dump/load?
Alexander
Yes. Hmm, I don't see where it is described in the docs...
Lucky
Well, that's counter intuitive
Lucky
I mean len(Entity) would make sense, but len(Entity.select(foo=bar)) should return a count of that select part...
Alexander
Ok, maybe we need to reconsider this... The reason was to guarantee that len(query) returns the actual number of items you can receive from this query
Lucky
So len(Entity) == len(Entity.select(lambda: False)) ?
Alexander
Don't quite understand
Alexander
len(Entity.select(lambda: False)) is 0
Lucky
> len(Entity.select(lambda: False)) is 0
Oh. In that case it is correct.
Lucky
Huh.
Lucky
now I am confused
Alexander
Currently len(query) fetches the whole query result into memory and returns the number of rows. A subsequent for item in query iteration does not issue the same query again; it iterates over the already fetched data. It was possible to implement len as select count(*) from ..., but in that case the following for item in query iteration would need to perform a new query select * from ..., and the actual number of rows might differ from what was returned by len, as another transaction could insert/delete some rows between these two queries
Lucky
Ooh.
Lucky
That makes sense actually
Lucky
It can, however, quickly eat away your memory if used improperly
Alexander
Maybe we need to change that, to prevent slow queries and memory overhead if used improperly. If someone needs a guarantee that the len result is actually the same as the number of items, they can fetch all data manually and calculate len in Python
Lucky
I think for most cases it makes sense how it is.
Lucky
Like the problem is happening only with a certain amount of data, say 1000+ rows maybe
Lucky
Otherwise if you have 5 of them, it's still perfectly reasonable
Muhammed
I am using a subdomain with Flask's blueprint structure. I also initialized the Flask app as Pony(app). But I get the db_session required error in subdomain view functions. Does the use of Pony(app) change when using a subdomain?
Muhammed
In the url_value_preprocessor validator
Muhammed
Hello! My code:

print(g.store, current_user.authorized_stores()[:])
print(g.store not in current_user.authorized_stores())
print(g.store not in current_user.authorized_stores()[:])

and the output:

Store[2] [Store[2]]
True
True

Also, my authorized_stores is:

def authorized_stores(self):
    return select(sa.store_ref for sa in self.store_authorizations_set)
Muhammed
Is it a bug, or am I making a mistake?
Muhammed
I mean, Store[2] is in the query result, so why does the "not in" comparison return True?
Alexander
Hi! What is current_user.authorized_stores()?
Alexander
How it calculates the list?
Muhammed
Oh, sorry. I'm doing a Flask application.

class WebUser(db.Entity, UserMixin):
    id = PrimaryKey(int, auto=True)
    store_authorizations_set = Set('StoreAuthorization')

    def authorized_stores(self):
        return select(sa.store_ref for sa in self.store_authorizations_set)

class StoreAuthorization(db.Entity):
    id = PrimaryKey(int, auto=True)
    is_admin = Required(bool, default=False)
    store_ref = Required(Store)
    web_users_set = Set(WebUser)

class Store(db.Entity):
    id = PrimaryKey(int, auto=True)
    store_authorizations_set = Set('StoreAuthorization')
Alexander
Are g.store and current_user from the same db_session? I suspect not
Muhammed
I initialize with Pony(app), and g.store is created in a url_value_preprocessor. The comparison is in a decorator, same as login_required
Alexander
I haven't used the url_value_preprocessor function myself. Does it work inside the db_session created by Pony(app), or does it have a separate db_session?
Alexander
Probably url_value_preprocessor runs before Pony(app) has a chance to create db_session
Alexander
In that case, instead of saving the store inside g, you can save store.id and then compare it with the ids of the authorized stores:

g.store_id in [store.id for store in current_user.authorized_stores()]
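The underlying effect can be sketched with plain classes (not Pony entities): two objects representing the same database row but loaded in different sessions are distinct objects, so membership tests by object fail, while comparing primary key ids still works.

```python
class Store:
    """Toy stand-in for an entity. Default equality is identity,
    which is what you get for entity instances loaded in two
    different db_sessions."""
    def __init__(self, id):
        self.id = id

store_from_request = Store(2)   # e.g. resolved in url_value_preprocessor
authorized = [Store(2)]         # e.g. loaded inside the view's db_session

print(store_from_request in authorized)                     # False
print(store_from_request.id in [s.id for s in authorized])  # True
```

Within a single db_session Pony's identity map returns the same instance for the same row, which is why keeping both lookups in one session (or comparing by id, as above) fixes the comparison.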