Joery

This is not possible with the free plan, right?

Alexander

Hi joery! Actually, we changed free plan to allow copying of other users' public diagrams, but the code contains a bug which disallows it right now. We will fix it on Monday, and after that the user with free plan will be able to copy diagrams of other users

Alexander

Hi Matthew! The after_insert hook works before commit. At this moment, we already know IDs of newly created objects, but other transactions still canot see these objects. If you want to pass IDs to another process, you probably need to accumulate them in some local list, and send that list to another process after the commit is finished.

Matthew

ah suddenly it all makes sense :)

Alexander

@luckydonald I have a plan to add mass update operaton with the followng syntax in the future: update(obj.set(foo=x, bar=y) for obj in MyObject if obj.baz == z) But the problem with mass update is that it happens inside the database, and pony does not know which objects were actually changed. Because of this, pony cannot update cached in-memory information about objects, and also cannot invoke after_update hooks. This complicates mass update implementation

Matthew

would it be possible in theory to have an after_commit hook?

Alexander

Yes, I think it is possible. Maybe we need to change the current hooks API to something more general

Matthew

why is after_insert setup to run before the commit?

Alexander

Because typical use-case is to update some other objects during the same atomic transaction, like: class Student(db.Entity): name = Required(str) group = Optional(Group) def after_insert(self): if self.group is not None: self.group.count_of_students += 1

Matthew

ah ok

Alexander

I think it is a non-trivial question when is the best moment to invoke a hook: right after SQL command is executed, after all previously modified objects are flushed to the database, or after the transaction is finished

Alexander

Currently it is the second: after flush() is completed

Matthew

I think they all have their place, so ideally all would be available as options

Alexander

Probably so

Matthew

for now i plan on converting all Entity.get(id=id) calls to Entity[id] in background tasks, so that if the object is not in the database yet, it will automatically retry

Alexander

Maybe it is easier to have separate table ScheduledObject which used to communication with background jobs, in the following way: class ScheduledObject(db.Entity): obj_class_name = Required(str) obj_id = Required(int) obj_status = Required(str) in_process = Required(bool, default=False) processed = Required(bool, default=False) class MyObject(db.Entity): ... def after_insert(self): ScheduledObject( obj_class_name=self.__class__.__name__, obj_id=self.id, obj_status='after_insert' ) Then, backgrond task may do something like that: while True: sleep(1) with db_session: # select objects to process this cycle objects_to_process = select( x for x in scheduled_object if not x.in_process and not x.processed ).order_by(ScheduledObject.id ).for_update().page(1, pagesize=10) # mark them as 'in process' # and commit transaction for x in objects_to_process: x.in_process = True commit() # now another background jobs # will skip these objects # Transaction 2: start processing of # selected objects for x in objects_to_process: obj_class = db.entities[x.obj_class_name] obj = obj_class[x.obj_id] process_object(obj) x.in_process = False x.processed = True commit()

Matthew

It seems a lot more complicated to be honest

Matthew

for my use case

Alexander

The beneft of using the table is, it will be part of the same transaction. If background job can see the task in ScheduledObject table, than the transaction in which the object was created is already completed

Alexander

Using retry seems a bit brittle to me: there is no guarantee that the initial transaction will be completed to the moment of retry

Matthew

Sure, but for my usecase it's always a web form being posted that creates these objects, so as long as the entire system isn't screwed it's a short transaction

Matthew

99.9% of the time, this already worked, because all background workers were busy, so by the time the job was processed, the transaction had already completed

Matthew

so now in the 0.1% of the time, it retries a few seconds later

Alexander

Ok, in this case it may work

Matthew

This explains so much weirdness for the past few years in the system :D

Matthew

What happens if this is the code: def after_insert(self): flush() # create background jobs

Matthew

would it then have the data in the database?

Alexander

To the moment after_insert is called, flush is already happened (that is INSERT command is already executed). So, the data IS in the database, but not visible to other transactions yet, because the transaction is not finished yet

Alexander

After the row is inserted, it is still not visible to another processes, because you can do rollback instead of commit

Matthew

ah ok

Lucky

for now i plan on converting all Entity.get(id=id) calls to Entity[id] in background tasks, so that if the object is not in the database yet, it will automatically retry

Huh? Because it rises a key error or something similar and you will retry

Matthew

yep

Lucky

Eh, similar to foo = Foo.get(...) If not foo: Raise exception Where is the big plus that you can change your primary key structure in the database but still your code will continue to work.

Matthew

I'm extremely unlikely to change the primary key structure

Lucky

I was too — but then I sadly needed it xS

Matthew

do you mean something like a composite primary key?

Lucky

As I learned the hard way that Telegram's file_id can change suddenly.

Lucky

That was the primary key. But they change the format of it, so my database was creating duplicates

Matthew

It's usually a bad idea to use foreign data as a primary key, I just use a normal autoincrement integer as a primary key then have the foreign data as another field (that can be unique if needed)

Lucky

That's how I have it now to

Lucky

Just saying, using the get(), I was still able to continue functioning until I had all of the code replaced

Alexander

You mean, you change the field from primary key to secondary key, and still was able to use it for search

Henri

Yes, I think it is possible. Maybe we need to change the current hooks API to something more general

It would make sense to have also session hooks like before_commit or after_commit which not depend on an entity but on the db_session. That allows to be triggered from the view not the model.

Alexander

Yes, I agree. I'll think about it

Matthew

do you mean passing functions like db_session(before_common=function) ?

Henri

Rather trigger a hook inside the db_session.

Matthew

commit, not common

Alexander

Maybe something like: with db_session: ... set_after_commit_hook(my_function) But some details may be untrivial. For example, if the same function was added several times during the same db_session, should it be called several times after commit is happened

Lucky

You mean, you change the field from primary key to secondary key, and still was able to use it for search

yes

Matthew

I think it does make sense to have on entities

Matthew

I have instances that are created in multiple places in my app, and i always want an after_commit hook to fire

Henri

I want to make sure that the side effects are triggered after the database session has been committed.

Matthew

Yes, with a hypothetical Entity.after_commit, that would happen

Henri

But I need to set the hook on runtime in the view not in the model.

Alexander

It seems, different users have different use-cases and expectations for hooks, so common API which looks uncomplicated, satisfies all use-cases and intuitively obvious is untrivial to develop

Matthew

Henri, by view do you mean a view in a webapp?

Henri

Maybe it would be even better to have the hook on the entity instance, this would also allow to set it during the db_session and not when defining the entity. Like Entity.set_after_commit_hook(my_function)

Henri

Also Entity.set_after_insert_hook(my_function).

Matthew

why would you not know in advance what code you would want to run?

Henri

For example when adding a new user I want to send a verification email. But only when I'm sure the user was really aded to the database.

Matthew

I mean why does that need to be a function that you dynamically pass? you could do this when defining the entity

Henri

The idea is to have the possibility to add side effects to the transaction. So you can use the pony transaction manager also for sending emails or writing data to disk.

Henri

The side effects must only be triggered when the transaction finished successfully.

Matthew

Yes, that could happen in an Entity.after_commit method

Henri

Yep if could set it on runtime from the view. As it's quite difficult for me if I have no access to the request object which is the case in the model when defining the entity.

Henri

The same for the settings which are much easier to access from the view then from the model. Talking here about Morepath integration.

Matthew

This seems like an unusual usecase to be honest. You could hack around it by doing: def after_commit(self): for f in self.after_commit_functions: f(self)

Matthew

and then in your view, Entity.after_commit_functions = [some functions]

Matthew

or maybe just: with db_session: # do stuff with the database # now do post-transaction stuff

Matthew

if it is view-specific code

Henri

Here how http://transaction.readthedocs.io uses commit hooks: Before-commit hook Sometimes, applications want to execute some code when a transaction is committed. For example, one might want to delay object indexing until a transaction commits, rather than indexing every time an object is changed. Or someone might want to check invariants only after a set of operations. A pre-commit hook is available for such use cases: use addBeforeCommitHook(), passing it a callable and arguments. The callable will be called with its arguments at the start of the commit (but not for substransaction commits). After-commit hook Sometimes, applications want to execute code after a transaction commit attempt succeeds or aborts. For example, one might want to launch non transactional code after a successful commit. Or still someone might want to launch asynchronous code after. A post-commit hook is available for such use cases: use addAfterCommitHook(), passing it a callable and arguments. The callable will be called with a Boolean value representing the status of the commit operation as first argument (true if successfull or false iff aborted) preceding its arguments at the start of the commit (but not for substransaction commits). Commit hooks are not called for transaction.abort().

Matthew

This seems like overkill when you could just call functions before / after a with db_session: block

Henri

or maybe just: with db_session: # do stuff with the database # now do post-transaction stuff

Yeah maybe I must create a hook in Morepath. I'm using https://github.com/morepath/more.pony which is wrapping the db_session inside the request session so I cannot get outside of the db_session from the view.

Henri

I just found something similar in SQLAlchemy: the "Transaction Events" (http://docs.sqlalchemy.org/en/latest/orm/session_events.html#transaction-events). Especially SessionEvents.before_commit(), SessionEvents.after_commit() and SessionEvents.after_rollback().

Anonymous

I'm inserting time series data into a database that I am pulling from multiple CSV files (where a CSV file is some range of time series data for a specific point of data and a point of data could be the measured temperature of a room, measured humidity of a room, position of a valve, etc.) Everything is working well enough, but I'm trying to determine the best way to ensure that I do not insert duplicate entries. (continued)

Anonymous

The primary key of my time series data table is a composite of the Point Id ( a foreign key to my points table) and the time/date stamp. My code iterates through the data from the CSV file creating 1 record at a time for each measurement of data. This all happens for each point of data in each device that is being monitored. I am currently only committing to the database when the records are created for all entries from a point have been loaded from the CSV file.

Matthew

Hi!

Anonymous

This results in my commit causing an exception in the event of duplicate entries and the entire series fails. I'm not wanting to commit after each entry because of the speed hit.

Anonymous

Hi. I have more to describe, but is this the proper place to bring this up?

Matthew

Yes it's fine to say this stuff here