@ponyorm

Alexander
08.10.2017
20:04:55
Probably so

Matthew
08.10.2017
20:08:15
For now I plan on converting all Entity.get(id=id) calls to Entity[id] in background tasks, so that if the object is not in the database yet, it will automatically retry
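
For context, a minimal sketch of the difference between the two lookup styles (the Invoice entity and the task bodies are illustrative, not from the chat):

from pony.orm import Database, Required, db_session, ObjectNotFound

db = Database('sqlite', ':memory:')

class Invoice(db.Entity):
    amount = Required(int)

db.generate_mapping(create_tables=True)

@db_session
def process_with_get(invoice_id):
    invoice = Invoice.get(id=invoice_id)  # returns None if the row is missing
    if invoice is None:
        return  # the task silently does nothing

@db_session
def process_with_getitem(invoice_id):
    # Invoice[pk] raises ObjectNotFound when the row is missing; a task
    # queue can treat that as a failure and retry the job a bit later,
    # by which time the creating transaction has usually committed.
    invoice = Invoice[invoice_id]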

Alexander
08.10.2017
20:36:18
Maybe it is easier to have a separate table ScheduledObject which is used for communication with background jobs, in the following way:

class ScheduledObject(db.Entity):
    obj_class_name = Required(str)
    obj_id = Required(int)
    obj_status = Required(str)
    in_process = Required(bool, default=False)
    processed = Required(bool, default=False)

class MyObject(db.Entity):
    ...
    def after_insert(self):
        ScheduledObject(
            obj_class_name=self.__class__.__name__,
            obj_id=self.id,
            obj_status='after_insert'
        )

Then, the background task may do something like that:

while True:
    sleep(1)
    with db_session:
        # select objects to process this cycle
        objects_to_process = select(
            x for x in ScheduledObject
            if not x.in_process and not x.processed
        ).order_by(ScheduledObject.id).for_update().page(1, pagesize=10)
        # mark them as 'in process' and commit the transaction
        for x in objects_to_process:
            x.in_process = True
        commit()
        # now other background jobs will skip these objects

        # Transaction 2: start processing of the selected objects
        for x in objects_to_process:
            obj_class = db.entities[x.obj_class_name]
            obj = obj_class[x.obj_id]
            process_object(obj)
            x.in_process = False
            x.processed = True
        commit()

Matthew
08.10.2017
20:37:47
It seems a lot more complicated to be honest

Matthew
08.10.2017
20:37:51
for my use case

Alexander
08.10.2017
20:39:39
The benefit of using the table is that it will be part of the same transaction. If a background job can see the task in the ScheduledObject table, then the transaction in which the object was created is already completed

Using retry seems a bit brittle to me: there is no guarantee that the initial transaction will be completed to the moment of retry

Matthew
08.10.2017
20:41:56
Sure, but for my use case it's always a web form being posted that creates these objects, so as long as the entire system isn't screwed, it's a short transaction

99.9% of the time, this already worked, because all background workers were busy, so by the time the job was processed, the transaction had already completed

so now, in the 0.1% of cases, it retries a few seconds later

Alexander
08.10.2017
20:43:42
Ok, in this case it may work

Matthew
08.10.2017
20:44:10
This explains so much weirdness for the past few years in the system :D

What happens if this is the code:

def after_insert(self):
    flush()
    # create background jobs

would it then have the data in the database?

Alexander
08.10.2017
20:48:20
By the moment after_insert is called, flush has already happened (that is, the INSERT command has already been executed). So, the data IS in the database, but not visible to other transactions yet, because the transaction is not finished yet

After the row is inserted, it is still not visible to other processes, because you can do a rollback instead of a commit
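
A small sketch of that timeline (the Order entity is illustrative, not from the chat):

from pony.orm import Database, Required, db_session, flush

db = Database('sqlite', ':memory:')

class Order(db.Entity):
    total = Required(int)

db.generate_mapping(create_tables=True)

with db_session:
    order = Order(total=100)
    flush()          # the INSERT is executed and order.id is assigned...
    print(order.id)  # ...so the row exists inside this transaction,
    # but a concurrent transaction in another process will not see it
    # until this block exits and the commit happens (or never, on rollback).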

Matthew
08.10.2017
20:50:22
ah ok

Matthew
08.10.2017
21:08:44
yep

Luckydonald
08.10.2017
21:11:26
Eh, similar to:

foo = Foo.get(...)
if not foo:
    raise Exception(...)

There the big plus is that you can change your primary key structure in the database but your code will still continue to work.

Matthew
08.10.2017
21:12:42
I'm extremely unlikely to change the primary key structure

Luckydonald
08.10.2017
21:13:04
I was too — but then I sadly needed it xS

Matthew
08.10.2017
21:13:35
do you mean something like a composite primary key?

Luckydonald
08.10.2017
21:14:27
I learned the hard way that Telegram's file_id can change suddenly.

That was the primary key. But they changed the format of it, so my database was creating duplicates

Matthew
08.10.2017
21:16:34
It's usually a bad idea to use foreign data as a primary key. I just use a normal autoincrement integer as the primary key, then have the foreign data as another field (which can be unique if needed)

Luckydonald
08.10.2017
21:16:55
That's how I have it now too

Just saying: using get(), I was still able to keep things functioning until I had replaced all of the code

Alexander
08.10.2017
21:19:05
You mean, you changed the field from a primary key to a secondary key, and were still able to use it for search?

Henri
08.10.2017
21:42:09
Yes, I think it is possible. Maybe we need to change the current hooks API to something more general.
It would also make sense to have session hooks like before_commit or after_commit which do not depend on an entity but on the db_session. That would allow them to be triggered from the view, not the model.

Alexander
08.10.2017
21:44:27
Yes, I agree. I'll think about it

Matthew
08.10.2017
21:45:23
do you mean passing functions like db_session(before_common=function) ?

Henri
08.10.2017
21:46:38
Rather trigger a hook inside the db_session.

Matthew
08.10.2017
21:46:38
commit, not common

Alexander
08.10.2017
21:49:49
Maybe something like:

with db_session:
    ...
    set_after_commit_hook(my_function)

But some details may be non-trivial. For example, if the same function was added several times during the same db_session, should it be called several times after the commit has happened?
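
Pony had no such API at the time; a user-level version of the idea might look roughly like this (all names here are hypothetical):

from pony.orm import db_session

_after_commit_hooks = []

def set_after_commit_hook(func):
    # Hypothetical helper: remember a callable to run after a successful
    # commit. Deduplicating here is one possible answer to the question
    # above about the same function being added several times.
    if func not in _after_commit_hooks:
        _after_commit_hooks.append(func)

def run_in_session(work):
    del _after_commit_hooks[:]
    with db_session:
        work()
    # Reaching this point means db_session committed without raising,
    # so it is safe to fire the side effects now.
    for hook in _after_commit_hooks:
        hook()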

Matthew
08.10.2017
21:50:39
I think it does make sense to have it on entities

Matthew
08.10.2017
21:51:07
I have instances that are created in multiple places in my app, and I always want an after_commit hook to fire

Henri
08.10.2017
21:52:58
I want to make sure that the side effects are triggered after the database session has been committed.

Matthew
08.10.2017
21:53:58
Yes, with a hypothetical Entity.after_commit, that would happen

Henri
08.10.2017
21:55:23
But I need to set the hook at runtime, in the view, not in the model.

Alexander
08.10.2017
21:56:21
It seems different users have different use cases and expectations for hooks, so a common API which looks uncomplicated, satisfies all use cases, and is intuitively obvious is non-trivial to develop

Matthew
08.10.2017
21:56:48
Henri, by view do you mean a view in a webapp?

Henri
08.10.2017
22:01:10
Maybe it would be even better to have the hook on the entity instance; this would also allow setting it during the db_session and not when defining the entity. Like Entity.set_after_commit_hook(my_function).

Also Entity.set_after_insert_hook(my_function).

Matthew
08.10.2017
22:02:59
why would you not know in advance what code you would want to run?

Henri
08.10.2017
22:04:33
For example, when adding a new user I want to send a verification email. But only when I'm sure the user was really added to the database.

Matthew
08.10.2017
22:05:18
I mean why does that need to be a function that you dynamically pass? you could do this when defining the entity

Henri
08.10.2017
22:07:21
The idea is to have the possibility to add side effects to the transaction, so you can use the Pony transaction manager also for sending emails or writing data to disk.

The side effects must only be triggered when the transaction has finished successfully.

Matthew
08.10.2017
22:09:13
Yes, that could happen in an Entity.after_commit method

Henri
08.10.2017
22:13:16
Yep, if I could set it at runtime from the view. It's quite difficult for me if I have no access to the request object, which is the case in the model when defining the entity.

The same for the settings, which are much easier to access from the view than from the model. I'm talking here about Morepath integration.

Matthew
08.10.2017
22:20:34
This seems like an unusual use case to be honest. You could hack around it by doing:

def after_commit(self):
    for f in self.after_commit_functions:
        f(self)

and then in your view, Entity.after_commit_functions = [some functions]

or maybe just:

with db_session:
    # do stuff with the database
# now do post-transaction stuff

if it is view-specific code
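
In a view that controls its own session, that pattern might look like the following (a sketch; User and send_verification_email are illustrative names, not from the chat):

from pony.orm import db_session

def create_user_view(request):
    with db_session:
        user = User(email=request.form['email'])
        email = user.email  # copy plain data out before the session ends
    # We only reach this line if the transaction committed, so the side
    # effect can never fire for a rolled-back insert.
    send_verification_email(email)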

Henri
08.10.2017
22:28:04
Here is how http://transaction.readthedocs.io uses commit hooks:

Before-commit hook
Sometimes, applications want to execute some code when a transaction is committed. For example, one might want to delay object indexing until a transaction commits, rather than indexing every time an object is changed. Or someone might want to check invariants only after a set of operations. A pre-commit hook is available for such use cases: use addBeforeCommitHook(), passing it a callable and arguments. The callable will be called with its arguments at the start of the commit (but not for subtransaction commits).

After-commit hook
Sometimes, applications want to execute code after a transaction commit attempt succeeds or aborts. For example, one might want to launch non-transactional code after a successful commit. Or someone might want to launch asynchronous code after it. A post-commit hook is available for such use cases: use addAfterCommitHook(), passing it a callable and arguments. The callable will be called with a Boolean value representing the status of the commit operation as first argument (true if successful, false if aborted), preceding its arguments (but not for subtransaction commits). Commit hooks are not called for transaction.abort().
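
For reference, with the transaction package itself the quoted hook is used roughly like this (a minimal sketch; the notify function and the user id are illustrative):

import transaction

def notify(status, user_id):
    # The after-commit hook receives the commit status as its first
    # argument: True if the commit succeeded, False if it was aborted.
    if status:
        print('committed; sending email to user', user_id)

t = transaction.get()
t.addAfterCommitHook(notify, args=(42,))
t.commit()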

Matthew
08.10.2017
22:31:07
This seems like overkill when you could just call functions before / after a with db_session: block

Henri
08.10.2017
22:33:47
or maybe just:

with db_session:
    # do stuff with the database
# now do post-transaction stuff

Yeah, maybe I must create a hook in Morepath. I'm using https://github.com/morepath/more.pony which wraps the db_session inside the request session, so I cannot get outside of the db_session from the view.

I just found something similar in SQLAlchemy: the "Transaction Events" (http://docs.sqlalchemy.org/en/latest/orm/session_events.html#transaction-events). Especially SessionEvents.before_commit(), SessionEvents.after_commit() and SessionEvents.after_rollback().

Barry
10.10.2017
21:38:39
I'm inserting time series data into a database that I am pulling from multiple CSV files (where a CSV file is some range of time series data for a specific point of data, and a point of data could be the measured temperature of a room, the measured humidity of a room, the position of a valve, etc.). Everything is working well enough, but I'm trying to determine the best way to ensure that I do not insert duplicate entries. (continued)

The primary key of my time series data table is a composite of the Point Id (a foreign key to my points table) and the time/date stamp. My code iterates through the data from the CSV file, creating one record at a time for each measurement of data. This all happens for each point of data in each device that is being monitored. I am currently only committing to the database once records have been created for all entries loaded from a point's CSV file.

Matthew
10.10.2017
21:40:29
Hi!

Barry
10.10.2017
21:41:00
This results in my commit causing an exception in the event of duplicate entries and the entire series fails. I'm not wanting to commit after each entry because of the speed hit.

Hi. I have more to describe, but is this the proper place to bring this up?

Matthew
10.10.2017
21:41:43
Yes it's fine to say this stuff here

Barry
10.10.2017
21:44:03
Okay. My initial thought was to create a queue of point files that fail and then process through those, committing after each entry. That way I was only slowing the process down when duplicates were involved. Is there a better or preferred way to prevent duplicate entries?

I can post my Entity definitions and the function that is doing the work if that is useful information.

Matthew
10.10.2017
21:45:01
How do you define a duplicate entry? If it matches the primary key?

Barry
10.10.2017
21:45:21
class Point(db.Entity):
    id = PrimaryKey(int, auto=True)
    file_name = Required(str, unique=True)
    device_name = Required(str)
    point_name = Required(str)
    data = Set('PointData')
    type = Optional(PointType)

class PointData(db.Entity):
    time_data = Required(datetime, 6)
    value = Required(str)
    point = Required(Point)
    PrimaryKey(point, time_data)

yes

the composite primary key of PointData

Matthew
10.10.2017
21:45:51
Are you inserting from a single process / thread or multiple?

Barry
10.10.2017
21:46:08
single

I have about 5,000 CSV files for a given period of time that I process through.

Matthew
10.10.2017
21:46:52
Have you tried doing an exists() query for each PointData insert?

Barry
10.10.2017
21:46:54
I've been handling it in memory for the past few months, but I'm looking to move beyond that.

Matthew
10.10.2017
21:48:22
Could PointDatas for the same Point be spread across multiple CSV files?

Barry
10.10.2017
21:48:41
I thought about that, but I wasn't sure about the performance hit. Duplicates are unlikely unless someone tries to load the same file a second time. I thought that I'd be better off just blasting them in and then going back through the failures. If the hit isn't bad from exists(), though, I can do that.

Yes, sort of.

I run a report each month that gives me the CSV files for the previous month.

There should only be one CSV file per point during a given month though and that is all that I am processing at a time.

Matthew
10.10.2017
21:50:34
I think it's likely that exists() would be cheap. If not, you can dedupe in Python (maybe a dict with the primary key combo as a key) on a per-file basis.

the exists() should be quick because it is doing an index lookup
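
The in-Python dedupe could be as simple as a set keyed on the composite primary key, combined with the exists() check (a sketch; load_point_file and the row format are illustrative, not from the chat):

from pony.orm import db_session

@db_session
def load_point_file(point, rows):
    seen = set()  # (point id, timestamp) pairs already handled in this file
    for time_data, value in rows:
        key = (point.id, time_data)
        if key in seen:
            continue
        seen.add(key)
        if not PointData.exists(point=point, time_data=time_data):
            PointData(point=point, time_data=time_data, value=value)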

Barry
10.10.2017
21:51:08
Okay. I will give that a shot. Thanks

Matthew
10.10.2017
21:51:14
No problem

if not PointData.exists(lambda pd: pd.point == my_point and pd.time_data == my_time_data):
    # create the PointData

that should work

Barry
10.10.2017
21:52:55
Thanks again. I was just reasoning my way through that. I appreciate the help

Matthew
10.10.2017
21:53:44
No problem, I've done similar projects before and I don't remember performance ever being an issue.

Bjorn
12.10.2017
10:09:36
I'm currently testing pony 0.7.2 for usage at Dematic. One test seeks to assert that postgresql and sqlite behave identically for a number of orm.db_session calls. This works perfectly in isolation. However, when I try to switch from one provider to another as part of a loop, pony bites back with the message "Database object was already bound to %s provider" (line 547 in core). I then looked at the code and found that db.disconnect does not set database.provider = None (def disconnect(...): ..., line 599). Is it me doing something wrong?

db.disconnect should set database.provider to None, shouldn't it?
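
If rebinding a single Database object is the blocker, one common workaround is to build a fresh Database per provider (a sketch; define_entities is a hypothetical function that declares the same entities on whatever db it is given):

from pony.orm import Database

def make_db(provider, *args, **kwargs):
    # A Database object can be bound only once, so instead of trying to
    # unbind via db.disconnect(), create a fresh Database per provider.
    db = Database()
    define_entities(db)   # hypothetical: declare the entities on this db
    db.bind(provider, *args, **kwargs)
    db.generate_mapping(create_tables=True)
    return db

sqlite_db = make_db('sqlite', ':memory:')
postgres_db = make_db('postgres', user='pony', password='pony', database='test')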
