Vitaliy
Hello @metaprogrammer! Is it possible to join a table to itself using sqlite? It was possible for MySQL andf PostgeSQL, but I have switched some entities to separate DB (sqlite) and have encountered a problem
Alexander
Hi Vitaliy! It should be no difference here between SQLite and other databases, self-join in Pony should look the same. Something like:
select(
(p1, p2) for p1 in Person for p2 in Person
if p1.name == p2.name and p1.id < p2.id
)
or
select(
child for parent in Person for child in parent.children
if parent.country == 'USA'
)
Or maybe I don't fully understand your question
Vitaliy
Hmm… It seems the problem is not with joining, but with datetime field. I try to explain:
Vitaliy
Here is the Entity that stores traffic counters for network switches. And Interval entity (see comments).
Every row contains amount of data transmitted and received through specific port of switch.
So to calculate throughput I need from the current row to substract the previous one.
For Hour interval I need to get rows which stored every minute (see query example bellow).
This query using mysql and pgsql works OK, but using sqlite it returns empty set.a
class Traffic(sqlite.Entity):
id = PrimaryKey(int, auto=True)
switch = Required(str)
port = Required(int, size=16, unsigned=True, min=1)
timestamp = Required(datetime, default=datetime.utcnow, precision=0)
rx = Required(int, size=64)
tx = Required(int, size=64)
intervals = Set(lambda: Interval)
class Interval(db.Entity):
id = PrimaryKey(str, 16, py_check=lambda v: v in INTERVAL_VALUES) # id can be like hour, day, week, month
level = Required(int, size=8, unsigned=True) # we need it to correct cleanup of old data
duration = Required(timedelta, unique=True) # Examples: day=1d00h:00m, hour=0d:01h:00m, week=7d:00h:00m etc.
sample = Required(timedelta) # how often to slice rx/tx metrics (i.e. day - every 15m, week - every hour)
switch_traffic = Set(lambda: Traffic)
# ...
# intervals for traffic statistics
# hour = 60 ticks
Interval(id='Hour', level=1, duration=timedelta(hours=1), sample=timedelta(minutes=1))
# day = 96 ticks
Interval(id='Day', level=2, duration=timedelta(hours=24), sample=timedelta(minutes=15))
# ...
# i is Interval['Hour']
select(
(
t2.timestamp,
(8 * max(0, int(t2.rx) - int(t1.rx)) / i.sample.total_seconds()), # received bps
(8 * max(0, int(t2.tx) - int(t1.tx)) / i.sample.total_seconds()), # transmitted bps
)
for t2 in Traffic
for t1 in Traffic
if t1.timestamp == t2.timestamp - i.sample and i in t1.intervals and t1.switch == 'sw1' and t1.port == 1
)
Jim
A question about multiprocessing. I l want to run a poolprocessexecutor. should I open a db_session for earch task ? or globaly in the function running the executor ?
Alexander
in SQLite there is no native support of timedeltas, so Pony stores timedeltas as a float values. Because of that, the following checks is incorrect:
if t1.timestamp == t2.timestamp - i.sample
because it is always error to compare floating point values with ==.
I think you can use something like
if abs(t1.timestamp - t2.timestamp) < (i.sampe + some_small_value)
Overall, the tables with many-to-many relationships between them looks a bit overcomplicated to me, probably the same data can be stored and queried in a more efficient way. But I need to go offline in ten minutes, so don't have enough time to suggest another way
Vitaliy
I guess t1.timestamp - t2.timestamp will fail because minus operator between two datetimes is not supported in pony, only with datetime and timedelta
Alexander
Alexander
Oh, t1.timestamp is datetime, not timedelta, I've misread it
Alexander
Anyway, I still suspect that the problem is somehow caused by comparison of floating point values
Alexander
Can you pastebin a link to the resulted SQLite query? I need to go now, can think about it a few hours later
Alexander
Or maybe not all data were copied to SQLite. Besides Traffic and Interval tables, there is also intermediate many-to-many table between them
Vitaliy
Solved via this: if t1.timestamp < t2.timestamp and t1.timestamp >= t2.timestamp - i.sample
Buy the way M2M in this case is fully justified 🙂
Alexander
I'm not sure this expression is always correct...
Vitaliy
In my small amount of data it is corrent, I need more time to ensure it
Alexander
in this schema same-level intervals can overlap, so this expression can find more than one row
Vitaliy
Traffic collector runs every minute, there is no overlapping
Alexander
I mean, day intervals start every 15 minutes, etc
Vitaliy
Yes, you are right, did not figure out what you mean 🙂
Vitaliy
Just checked. No, it works perfect, because any row with timestamp which is not multiple of 15min is not tagged with 'Day' and within i.sample interval there will be only two rows
Alexander
Ok, good. I'd probably write between(t2.timestamp, t1.timestamp - i.sample - epsilon, t1.timestamp - i.sample + epsilon)
Vitaliy
I tried, but between seems act like a <= x <= b, but I need a <= x < b
Vitaliy
But it probably doesn’t matter is your case
Alexander
for small epsilon (like, microsecond) it should be irrelevant
Matthew
https://rustpython.github.io/demo/ maybe this could lead to pony with SQLite in the browser :)
stsouko
without GIL?
Matthew
I don't know, but I don't think web assembley can use multiple cores currently anyways
Anzor
Hi all! I have used Pony years ago and that was really nice and simple but still powerful for a small internal project. Now I think about trying it in a middle-sized e-commerce project, but not sure it is production-ready because I have never used ORM in production at all. Can you guys calm me down and say "just do it"?
J.
Relax and "just do it" 👍
Anzor
gracias )
Carel
Congrats on landing the job !
Александр
Hello.
Tell me how to implement such a request in a pony?
select ((w.id, w.first_name, w.last_name) for w in Staff_worker_info if 10 <= w.id <= 30)
Александр
I get an error
NotImplementedError: Expression is too complex to decompile, try to pass query as string, e.g. select("x for x in Something")
Alexander
if between(w.id, 10, 30)
or
if w.id >= 10 and w.id <= 30
Александр
EXACTLY. between !!!
Александр
Thank. With your ORM completely forgot)
Alexander
Sure. I hope we eventually implement direct support of a <= b <= c too
Stefan
Hi Pony team! I am pretty amazed by how nice, pythonic my queries look with it!
Alexander
Hi Stefan! You are welcome
stsouko
Hi!
this is my recipe for entity definition without db instance.
https://pastebin.com/PHDZ0Gdm
stsouko
I will be glad if it will be useful to someone
stsouko
https://stackoverflow.com/a/53776166/7418584
Stefan
👍 thanks
Alexander
Hi guys. API question.
We made new function that can be used for cases when you are trying to use object from another db_session.
with db_session:
s = Student.select().first()
...
with db_session:
s = s.foobar()
So, you may use s object now in the new db_session.
How this function works:
if current db session is not session where this object was loaded -> return new instance of this object that belongs to current db_session
otherwise just return this object back
How would you call such function?
Alexander
It may be useful for GUI applications where you attach an object to some GUI control, and the object can exists for a long time. So, each time you start a new db_session you can do
obj = ui_control.pony_object
obj = obj.xyz()
ui_control.pony_object = obj
and use it without 'The database session is over' exception
Alexander
But what is a good name for that function?
Alexey
To me this way would be more clear
with db_session:
obj = pony.orm.refresh_obj(obj)
Vitaliy
I implemented such function in my project to work with pony objects across threads and named it refresh 🙂
Alexander
refresh_obj(obj) sounds good.
Originally we called it obj.reload()
But I want to point out that when object is already from the current db_session than that function call does not execute SELECT and just return object as is, so you can call the function as often as you want without a performance hit.
I fear that "refresh" or "reload" can mislead some users so they will believe the object attributes are always re-loaded from the database on this call. They are reloaded indeed, but only when previous obj was from old db_session.
Other names like rebind are possible, but may sound too obscure
Alexander
I think at this moment I like obj = obj.refresh() or obj = refresh_object(obj) the best
Lucky
Lucky
Btw, a real database .reload() would be very helpful too
Jim
reget() recall() ?
Alexander
You are right.
How about
entity.refresh_session()
That sounds even better. But I need to poiny out, it is necessary to use the return value, because previous object is not changed:
ui_control.obj = ui_control.obj.refresh_session()
If someone writes just obj.refresh_session() it will not have any effect
Lucky
Alexander
Because the following situation is possible:
Alexander
with db_session:
s1 = Student[1]
with db_session:
s2 = Student[1]
assert s1 is not s2
s1.refresh()
In this case we already have Student[1] object s2 inside the second db_session, and in Python we cannot combine two independently allocated objects into one
Alexander
Instead, the following is true:
with db_session:
s1 = Student[1]
with db_session:
s2 = Student[1]
assert s1 is not s2
x = s1.refresh()
assert x is s2
assert x is not s1
Lucky
But would the following be true?
With dbsession:
S1 = Student[1]
s2 = Student [1]
S3 = Student.get(id=1)
assert s1 == s2 == s3
Lucky
Pardon, poor phone pseudo python programming.
Jim
Clone() clone_from_session () ?
Lucky
Alexander
In this example, s1 and s3 is the same object, Student.get just returns previously created indtance. But if we create two object with the same id in the different db_sessions
with db_session:
s1 = Student[1]
with db_session:
s2 = Student[1]
They need to be different. s1 == s2 still be True, but s1 is s2 will return False
Lucky
Alexander
In multi-threaded applcation there may be active db_session in each thread. Also, if you wrap db_session around generator function, db_session will suspend on yield. It will not be active until you resume generator again, but it will not be finished yet
Lucky
Lucky
Which technically also means no as it does switching contexts in always the same python process...?
Lucky
So, how about a global cache list and instead setting
Entity._session = ...
to track if it was loaded in this session?
Lucky
If entity._session != this.session:
Raise sessionException
Lucky
That way you always have the same object, and you can easily refresh it internal states
Alexander
If entity._session != this.session:
Raise sessionException
This is how it works now
But some people don't like that Pony throws DatabaseSessionIOver on attempt to work with previously created object.
In web application I think it is better to have independent db_session for each request. But in GUI application it may be convenient to attach long-lived object to some UI control and use it accross multiple db_sessions
Alexander
Lucky
Lucky
I can't think of any reason why you would want to keep an unusable old reference around?
Alexander
Each session has internal cache (a.k.a IdentityMap) of all loaded objects, and each object is loaded exactly once.
New session has new cache, independent from the previous one, because the cache of a previous session is already obsolete
Lucky
That said, I can't think of any reason for db_session in the first place