@ponyorm

Page 74 of 75
stsouko
16.10.2018
08:31:19
Hello! Is it possible to iteratively select a large number of rows from the db?

Pagination for some queries is very slow

Matthew
16.10.2018
08:32:27
Pagination should be fast, are you doing a query that uses an index properly?

Alexander
16.10.2018
08:35:42
Matthew, pagination may be slow with a very large offset, as the database needs to find and skip all the previous rows

Alexander
16.10.2018
08:36:21
stsouko, what ORDER BY do you use with paginated query? Is it order by id, or by some complex set of columns?

stsouko
16.10.2018
08:36:40
Yes. But I select arrays that are similar to the query array, or supersets of the query array

Ordering by id (the pk)

Matthew
16.10.2018
08:37:52
Ah good to know :)

stsouko
16.10.2018
08:37:58
For similarity search I use the smlar extension,

which has a GIN or GiST index

Superset search is built in

Superset also has a GIN index

Alexander
16.10.2018
08:39:35
> ordering by id pk

Instead of using OFFSET, you can add the condition id > last_id, where last_id is the last id from the previous result page
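
A minimal sketch of this keyset-pagination idea in plain sqlite3 (the table, column names, and page size are illustrative, not from the chat; Pony would generate similar SQL from a `.filter(lambda x: x.id > last_id)` query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (id, name) VALUES (?, ?)",
                 [(i, "name%d" % i) for i in range(1, 101)])

def iter_pages(conn, page_size=10):
    """Yield pages using `id > last_id` instead of OFFSET.

    Each page starts after the last id seen, so the database can jump
    straight to the right place via the primary-key index, regardless
    of how deep into the result set we are.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM item WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size)).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # remember the last id of this page

pages = list(iter_pages(conn))
```

Unlike OFFSET-based pagination, the cost of fetching page N here does not grow with N.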

stsouko
16.10.2018
08:42:48
In tests I found that searches on arrays are not cached. Every query takes a lot of time.

Alexander
16.10.2018
10:48:27
Grigori, I think I fixed #390

Grigori
16.10.2018
10:50:18
@metaprogrammer , thanks! Are you going to do a bugfix release anytime soon?

Alexander
16.10.2018
10:50:57
I'll try to do a release tomorrow

Grigori
16.10.2018
10:51:10
wow! That's a tight schedule!

@metaprogrammer , I wonder if it is possible to use the SQLite "Views" feature to speed up some queries? Is it possible to work with views in Pony?

And BTW, what about indexes?

Alexander
16.10.2018
12:48:15
You can define a view inside the database and then specify the view name as the table name in an entity:

```python
class MyView(db.Entity):
    _table_ = 'view1'
    foo = Required(int)
    bar = Optional(str)
```

If a view uses aggregation or joins it will be read-only, so you should not attempt to modify the entity's attributes
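
The database-side mechanism can be tried in plain sqlite3; this is an illustrative sketch (made-up table and view), not Pony code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders (amount, customer) VALUES (?, ?)",
                 [(10, 'a'), (20, 'a'), (5, 'b')])

# A view aggregating per customer; like the Pony example above,
# an aggregating view is read-only
conn.execute("""CREATE VIEW view1 AS
                SELECT customer, SUM(amount) AS total
                FROM orders GROUP BY customer""")

# The view can be queried exactly like a table, which is why an
# entity can point its table name at it
rows = dict(conn.execute("SELECT customer, total FROM view1"))
```

An entity mapped onto such a view would read these aggregated rows transparently.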

Can you describe your question about indexes in more detail?

Grigori
16.10.2018
12:59:26
We have to add a lot of data to our database in big chunks. The database can already contain, say, 1M entries. Then we want to add 1M more, in batches of 1-100K. The problem is that each entry has a unique signature, and we have to check whether an entry with the same signature is already in the database. My initial idea was to use the unique modifier for the signature field. I presume it automatically creates an index for the respective field. However, I'm not sure how Pony checks the existence of entries by field if these entries were not committed to the database. Does it use a linear scan? Or a temporary hashtable? What is the recommended way to solve this problem (adding a lot of entries with a check for uniqueness)?

Alexander
16.10.2018
13:20:53
You can specify a unique index on a column in one of the following ways:

```python
class MyEntity(db.Entity):
    foo = Required(str, unique=True)
    bar = Optional(str, unique='index_name')
```

For the former attribute, the index name will be chosen automatically (for SQLite it will be unq_MyEntity_foo). If an index with this name does not exist in the database, it will be added automatically when you call generate_mapping with the option create_tables=True. For the latter attribute, the index name is given explicitly. This is useful if you already have an index with that name in the database. If you just write index=True, then Pony will create a new index with a "canonical" name, and having two duplicate indexes will slow down inserts/updates.

Inside a db_session, Pony keeps an in-memory index for each unique database index known to Pony. So, if you add several objects with the same value of a unique attribute inside the same db_session, Pony will raise an error immediately upon the creation of the second object in memory, even before any INSERTs are performed.

If a row with a specific value of a unique column already exists in the database, and you then create a new object with the same value of this column, then, if the previous object was not loaded in the current db_session, Pony will not immediately search for a previous object with this unique value, because that would lead to unnecessary SELECT queries. Instead, Pony will do the INSERT in the usual way on flush/commit, and the database will throw an exception on the unique constraint violation. If you want to know beforehand that such a unique value exists, you can check it in the following way:

```python
if not MyEntity.exists(unique_attr=value):
    obj2 = MyEntity(unique_attr=value, other_attr=value2)
else:
    print(value, 'already exists')
```

But this will load the previous object into memory, so it may be more efficient to just create the new object and catch the error later on commit.

When you do MyEntity.exists(foo=bar), Pony will execute the query SELECT * FROM MyEntity WHERE foo = ?. If the foo column has a unique index, this query will be very fast. The resulting object will be cached in the db_session and added to the in-memory index, so a second call of MyEntity.exists will find the object in the db_session cache and will not execute the same query again

In short, if you define an attribute as unique, everything should be good

The in-memory db_session cache for each unique index is a dict with the attribute value as the key and the object as the value. For composite indexes the dict key is a tuple of attribute values
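
The database-side half of this behavior can be sketched without Pony, using a plain sqlite3 UNIQUE column (the table and signature values here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, signature TEXT UNIQUE)")
conn.execute("INSERT INTO entry (signature) VALUES ('sig-1')")

# Option 1: check beforehand (costs an extra SELECT, but the unique
# index makes the lookup fast)
def exists(conn, sig):
    return conn.execute(
        "SELECT 1 FROM entry WHERE signature = ?", (sig,)).fetchone() is not None

# Option 2: just insert and catch the unique-constraint violation,
# analogous to letting Pony's flush/commit raise
def insert_or_skip(conn, sig):
    try:
        conn.execute("INSERT INTO entry (signature) VALUES (?)", (sig,))
        return True
    except sqlite3.IntegrityError:
        return False

checked = exists(conn, 'sig-1')
inserted = insert_or_skip(conn, 'sig-1')   # duplicate signature
inserted2 = insert_or_skip(conn, 'sig-2')  # new signature
```

For bulk loading, option 2 avoids one SELECT per row, at the price of handling the exception.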

Grigori
16.10.2018
13:35:01
Wow! Thanks for the detailed answer!

I'll cite this text on a relevant GitHub issue in our repository, if you don't mind.

Alexander
16.10.2018
13:40:01
Of course

Grigori
16.10.2018
14:05:24
But how exactly do db_session wrappers work when invoked in nested functions? For example:

```python
@db_session
def f1():
    return Entry()

@db_session
def f2():
    lst = []
    for _ in xrange(0, 10):
        lst.append(f1())
```

How many commits and flushes will there be? One, or 10? Or 11? Does the db_session wrapper know that it is called from another db_session and trust the upper-layer sessions to do the actual commits?

Alexander
16.10.2018
14:07:22
Inner db_sessions are ignored, except for the sql_debug option
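
One way to picture "inner db_sessions are ignored" is a reentrant decorator that only commits at the outermost level. This is a toy model of the semantics, not Pony's actual implementation:

```python
import functools

_depth = 0
commits = []  # records which function triggered a "commit"

def toy_session(func):
    """Only the outermost decorated call 'commits'; nested calls are no-ops."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        global _depth
        _depth += 1
        try:
            return func(*args, **kwargs)
        finally:
            _depth -= 1
            if _depth == 0:
                commits.append(func.__name__)  # commit only at depth 0
    return wrapper

@toy_session
def f1():
    return "entry"

@toy_session
def f2():
    # the ten nested f1 calls run inside f2's session and do not commit
    return [f1() for _ in range(10)]

result = f2()
```

Under these semantics, the example from the question would produce one commit (for f2), not 10 or 11.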

Grigori
16.10.2018
14:13:53
hmmm... In one of my tests, removing db_session wrappers from some of the lower-level functions increased performance by about 10 times!

Probably it's a bug

Alexander
16.10.2018
14:14:17
That's strange...

Grigori
16.10.2018
14:14:53
It is fully reproducible, so I will try to make a minimum example

Alexander
16.10.2018
14:15:18
That's great

Grigori
16.10.2018
14:18:40
Another funny detail is that if I use a with db_session statement inside that function, the performance is normal

Grigori
16.10.2018
14:19:00
so it is definitely something about the wrapper

Carel
16.10.2018
16:35:43
I was reading the docs and saw mention of getting into an interactive session. It seems one can do so when invoking a module.

```python
import code
db.bind(...)
with db_session:
    code.interact(locals=locals())
```

This will get me half way, but when I query an entity it c

Alexander
16.10.2018
16:42:16
Pony has a special mode which allows you to not wrap a block of code with db_session. It may be useful when working in a console, because it is not possible to wrap multiple console commands in a single context manager, and typically a user wants a single db_session when working in a console. So, when Pony is imported, it tries to detect whether it is currently running from a console. If yes, Pony sets the pony.MODE flag to "INTERACTIVE". After that, each operation which requires a db_session starts one implicitly if a db_session wasn't started yet. This is what is called "interactive mode" in the Pony docs

Carel
16.10.2018
17:08:57
Sorry, I typed that like a dog having a scratch (I hit send instead of delete). I meant to say/ask whether there is anything one should do to set up a script to drop into an interactive Pony session. So if I want to interrogate the db while running a script as "python -m MODULE", can I use the code shown above as is, or, given what you've said, should I do this:

```python
import code
from pony.orm import *
...
db.bind(...)
...
pony.MODE = "INTERACTIVE"
code.interact(locals=locals())
```

Calling pdb.set_trace in a db_session seems to work alright for this.

Thanks for the epic answer though.

Alexander
16.10.2018
17:12:30
I think the last code should work better, but after exiting from interactive mode you should restore the previous value of pony.MODE

The former code should work too, but it will ignore db_session around functions that the user calls inside the interactive session, and that may be surprising behavior

Pony interactive mode starts db_session implicitly, but it will not ignore explicitly started db_session

Carel
16.10.2018
17:16:35
Ok, cool thanks ;)

It works as you say, which is kinda cool, now I can drop into an interactive shell in a Tornado app like one might in a Django one :)

Alexander
16.10.2018
17:30:36
I think it will not work in Tornado this way

Carel
16.10.2018
17:32:35
Oh, I'm doing this outside of the Tornado event loop. So querying the db by itself. I think if one does this within the event loop it'd be quite a different setup.

Alexander
16.10.2018
17:39:16
By default Pony ties a db_session to the current thread. That works well in single- or multi-threaded applications, where each thread represents an independent flow of work, but in an async application a single thread continually switches between different contexts. So, in an async application, Pony needs to switch the current db_session accordingly. To do that, Pony understands how to wrap db_session around a generator function. When the generator function yields its next result and suspends, Pony removes its db_session from the current thread, and reactivates it on entering the generator again. But it should be the top-level db_session that wraps the generator function
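
A toy model of that suspend/resume bookkeeping (illustrative only; Pony's real db_session does far more than toggling a thread-local slot):

```python
import threading

_current = threading.local()  # stands in for the per-thread session slot
log = []

def session_generator(gen_func):
    """Wrap a generator so a 'session' is active only while its body runs."""
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        session = object()
        while True:
            _current.session = session      # reactivate on (re)entering
            log.append("activate")
            try:
                value = next(gen)
            except StopIteration:
                _current.session = None     # final teardown
                log.append("deactivate")
                return
            _current.session = None         # detach while suspended
            log.append("deactivate")
            yield value
    return wrapper

@session_generator
def numbers():
    yield 1
    yield 2

result = list(numbers())
```

Each yield detaches the session from the thread; each resume reattaches it, so other coroutines on the same thread never see a foreign session.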

Carel
16.10.2018
17:48:20
Wow, you really have all the bases covered. I’ll try an async function in a bit, just to try it out.

Alexander
16.10.2018
17:52:19
Python's new async functions are a bit different from usual generators. We added support for them recently; it is on GitHub, but not in an officially released version yet. https://github.com/ponyorm/pony/issues/380 There was a comment that the support of async functions is not complete, but we have not been able to reproduce the problem yet.

Luckydonald
18.10.2018
06:04:02
Hey, I'm trying to create Entities without having to have a Database() already set up. So basically I use a different class (later = Later(), which just stores them in a list when using later.Entity), and at a later time try to get it into the database object: later.attach(db). I keep getting lost in those metaclass inheritance problems. Any idea how we could make this work? I'm thinking about something like Flask's blueprints here. Here is my code so far:

```python
# -*- coding: utf-8 -*-
from pony import orm
from pony.orm import PrimaryKey, Required, Optional, Set
from pony.orm.core import EntityMeta, Entity
from luckydonaldUtils.logger import logging
# from ...secrets import POSTGRES_HOST, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB

__author__ = 'luckydonald'
# __all__ = []
logger = logging.getLogger(__name__)


class LaterMeta(EntityMeta):
    def __init__(self, name, bases, cls_dict):
        # self = later.Entity
        super(EntityMeta, self).__init__(name, bases, cls_dict)
        self._database_ = None
        # When created, check the LaterMeta class for the database
        for base_class in bases:
            if isinstance(base_class, LaterMeta):
                base_class._later.entities.append((self, name, bases, cls_dict))
            # end if
        # end for
    # end def
# end class


class Later(object):
    """
    :var Entity: LaterMeta
    """
    def __init__(self):
        self._database_ = orm.Database()
        self.Entity = type.__new__(LaterMeta, 'Later', (Test,), {})  # type: LaterMeta
        self.Entity._later = self
        self.entities = list()

    def attach(self, database):
        class EntityMetaDatabased(EntityMeta):
            # def __new__(meta, name, bases, cls_dict):
            #     return type.__new__(meta, name, bases, cls_dict)
            __new__ = type.__new__
            __init__ = object.__init__
            # def __init__(*args):
            #     type.__init__(name, bases, cls_dict)

        e = EntityMetaDatabased()
        # class EntittyLater(database.Entity):
        #     pass
        for table_args in self.entities:
            # entity = EntityMetaDatabased.__init__(*table_args)
            # lel = type.__new__(EntityMeta, 'Entity', (table_args[0].__class__,), {})
            this = table_args[0]
            EntityMeta.__init__(this, table_args[0].__name__, (self,), {})  # database.Entity()
            # entity = type.__new__(EntittyLater, 'Entity', (), {})  # type: Entity


later = Later()


class Project(later.Entity):
    id = PrimaryKey(int, auto=True)
    proj = Required(str)  # project identifier
    user = Required('User')
    default_lang = Optional(str)
    translations = Set('Translation')
    access = Set('Access')


class User(later.Entity):
    id = PrimaryKey(int, auto=True)
    projects = Set(Project)
    translations = Set('Translation')
    access = Set('Access')
    states = Set('State')


class Translation(later.Entity):
    id = PrimaryKey(int, auto=True)
    lang = Optional(str)
    project = Required(Project)
    key = Optional(str)    # The programmatic key for the translation
    value = Optional(str)  # The text
    user = Required(User)
    approved = Optional(bool)


class Access(later.Entity):
    id = PrimaryKey(int, auto=True)
    project = Required(Project)
    user = Optional(User)  # Either token or user
    token = Optional(str)  # Either token or user


class State(later.Entity):
    user = Required(User)
    chat_id = PrimaryKey(int)
    value = Optional(int)


db = orm.Database()
later.attach(db)
db.bind(provider='sqlite', filename=":memory:", create_db=True)
db.generate_mapping()
```

So I want to get rid of

```python
def foo(db):
    class Bar(db.Entity):
        pass

db = Database()
foo(db)
```

and replace it with something an IDE can inspect.

Alexander
18.10.2018
10:46:57
Can you remind me what purpose you use this pattern for?

```python
def foo(db):
    class Bar(db.Entity):
        pass

db = Database()
foo(db)
```

Is it to split entities into several modules?

Luckydonald
18.10.2018
19:08:10
To be able to have them in another file, or, in the case of migrations (pony_up), to load and unload them after migrating



stsouko
19.10.2018
06:40:36
Try to def `__dir__`

Luckydonald
19.10.2018
06:46:32
> Can you remind me what purpose you use this pattern for? Is it to split entities into several modules?

Digging through it, the problem so far is that there is no easy-to-use database.register(Entity); all the checking and code is in the EntityMeta, which makes it hard to extend and use, and hard to postpone until we have the database ready

Basically I was hoping for this. Normal process: db.Entity calls self._database_.register_entity(self). Blueprint-ish process: some sort of LaterEntity could just do self.register_queue.append(self), and have a function:

```python
def register_db(db):
    for e in self.register_queue:
        db.register_entity(e)
```

So the goal would be something like:

```python
from .models.something import Car

db = Database(postgres="...")
db.register_entity(Car)
```

@metaprogrammer I had a thought about it. I think I can't get it to work without changes in Pony. Basically the problem is that the metaclass already jumps into action, and several parts require strict subclass/isinstance checks which I can't really monkey-patch

Could maybe one of you guys look at it? That part of the code isn't very well documented

Alexander
20.10.2018
15:03:56
Hi @luckydonald, I'll think about it; maybe we can change the process of entity class creation to make it more flexible. I cannot promise anything specific at this point, as it may be a complex task

> So the goal would be something like: from .models.something import Car; db = Database(postgres="..."); db.register_entity(Car)

I still don't understand in full detail what the use case is, and why early binding of an entity to a database is not convenient in your case

Luckydonald
21.10.2018
08:40:39
Yes. Basically it allows you to easily switch databases on the fly, use the same models on different databases simultaneously, and also to split the models into several files without circular imports (from model.foo import Bar, which itself needs to import the other file: from .. import db)

I have u = User.get(id=user_id) and p = Project.get(user=u, proj=project_key), and now I need to know whether they are connected by a UserAccess entity

Alexander
26.10.2018
17:15:35
Hi @luckydonald! First, it is not necessary to have a UserAccess entity in this diagram, because it is equivalent to a many-to-many relationship. If you replace your current code with:

```python
class User(db.Entity):
    ...
    access_projects = Set("Project", reverse="access_users", table="useraccess", column="project")

class Project(db.Entity):
    ...
    access_users = Set("User", reverse="access_projects", column="user")
```

you don't even need a migration to work with the previous tables. The real need for a separate UserAccess entity arises only when it has some additional attributes, like access_level or something like that.

Anyway, if you really need to have a separate UserAccess entity, you can use an exists type of query:

```python
if not (p.access_all or UserAccess.exists(user=u, project=p)):
    return 403, {...}
```

Also, you can get all accessible projects for a user using attribute lifting: user.access_projects.project will return a collection of all projects accessible to a specific user. Similarly, project.access_users.user returns a collection of all users which can access a specific project. You can use attribute lifting in queries:

```python
select(p for p in Project
       if (p.access_all or u in p.access_users.user) and <some other conditions>)
```

Alexander
26.10.2018
17:21:34
I don't understand what you mean by column="project". If you are speaking about UserAccess.exists(user=u, project=p), then yes, it will work even if the Project primary key is composite

Luckydonald
26.10.2018
17:22:35
No, I meant your proposal for class User(db.Entity):. Also, you are right, I don't need a separate table when using two Sets.

Alexander
26.10.2018
17:25:47
I think yes, you can define an additional many-to-many relationship between User and Project, even if the Project primary key includes a reference to User. But I recommend using a simple id primary key for Project, and defining a secondary unique composite key on Project.name, Project.author

At this moment the diagram editor does not allow defining secondary composite keys, but you can add one manually after you take the model definition code from the site

Luckydonald
26.10.2018
17:29:37
So, basically I'd add composite_key(name, author) where I would normally have PrimaryKey(name, author)?

Is there a reason why one is CamelCase and one is snake_case?

Alexander
26.10.2018
17:34:29
This is because PrimaryKey is an attribute class (like Required, Optional, etc.) which additionally can be used to define composite primary keys, while composite_index is just a function that registers additional indexes inside an entity. Maybe it would be clearer to have a separate composite_pk function instead of (ab)using the PrimaryKey attribute for composite primary keys

Luckydonald
26.10.2018
17:37:07
I would name it primary_key(...), as there already is composite_key(...) but in general I agree.

Alexander
26.10.2018
17:38:03
It was initially designed to be similar to SQL, which has

```sql
create table t1 (
    foo int primary key,
    bar int
)
```

as well as

```sql
create table t2 (
    foo int,
    bar int,
    primary key (foo, bar)
)
```

so the same keyword name is used for single and composite primary keys
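
Both SQL forms can be checked directly in sqlite3 (t1/t2 as in the message above; the inserted rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-column primary key: the keyword sits on the column
conn.execute("create table t1 (foo int primary key, bar int)")

# Composite primary key: the same keyword, but as a table constraint
conn.execute("create table t2 (foo int, bar int, primary key (foo, bar))")

conn.execute("insert into t2 values (1, 1)")
conn.execute("insert into t2 values (1, 2)")  # same foo, different bar: allowed
try:
    conn.execute("insert into t2 values (1, 1)")  # duplicate (foo, bar) pair
    duplicate_ok = True
except sqlite3.IntegrityError:
    duplicate_ok = False

row_count = conn.execute("select count(*) from t2").fetchone()[0]
```

Only the full (foo, bar) pair is constrained on t2, mirroring how a composite PrimaryKey(name, author) behaves in Pony.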

Luckydonald
26.10.2018
17:41:09
Yeah, I like that approach. I was just briefly confused that they differ in capitalisation.

Alexander
26.10.2018
17:42:18
Oops, thanks for reporting

Luckydonald
26.10.2018
17:51:29
Is there some kind of enum type which would automatically fill the database? I'm thinking of modelling the languages as a table, but having to insert those by hand wouldn't be fun...

Alexander
26.10.2018
17:59:56
At this moment Pony does not support enums. You can define the following class:

```python
class LANGUAGES:
    EN = 'EN'
    RU = 'RU'

class MyEntity(db.Entity):
    lang = Required(str)

...
MyEntity(lang=LANGUAGES.EN)
```

Luckydonald
26.10.2018
18:01:03
Thanks

Alexander
26.10.2018
18:08:13
Sure

Alexander
26.10.2018
18:16:04
Maybe. I need to review the code first. Often pull requests don't implement all the necessary functionality (support for all databases, correct translation of queries to SQL, etc.). Hope I can look into it soon

Luckydonald
26.10.2018
18:28:08
Coming? https://github.com/ponyorm/pony/pull/392
That one just translates enums to a TEXT field. I was thinking of something like

```python
class Language(Enum):
    EN = 0
    DE = 1
```

becoming something like:

```sql
DROP TABLE IF EXISTS Language;
CREATE TABLE Language (
    id INTEGER PRIMARY KEY,
    value TEXT NOT NULL
);
INSERT INTO Language (id, value) VALUES (0, 'EN'), (1, 'DE');
```

Alexander
26.10.2018
18:35:36
Some databases have a native concept of enums. In my understanding, this is what most users want when they talk about enum support in Pony. These are the enums in PostgreSQL: https://www.postgresql.org/docs/11/static/datatype-enum.html And these are MySQL's: https://dev.mysql.com/doc/refman/8.0/en/enum.html But these enums are not totally equivalent to Python enums. There are many different ways to map Python enums to database enums, and it is hard to choose a canonical mapping of Python enums to SQL enums. Some time ago I planned to add a special Enum type to Pony which is as close to the database enum types as possible, but now that Python has a standard Enum class, having a separate Pony Enum class looks a bit strange
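
Pending native support, one straightforward bridge is to store the Python Enum's value in a plain text column and convert at the boundary; an illustrative sketch with sqlite3 (the table and enum are made up):

```python
import sqlite3
from enum import Enum

class Language(Enum):
    EN = 'EN'
    DE = 'DE'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, lang TEXT NOT NULL)")

# store the enum's .value...
conn.execute("INSERT INTO doc (lang) VALUES (?)", (Language.DE.value,))

# ...and rebuild the Enum member when reading back
stored = conn.execute("SELECT lang FROM doc").fetchone()[0]
lang = Language(stored)
```

This is the "TEXT field" mapping the pull request above takes; a native database enum type would additionally constrain the column to the allowed values.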

Luckydonald
26.10.2018
18:41:01
Hmm, Python enums don't have to start with 0; they could just define FOOBAR = 4458
