Alexander
The attributes get column names inside the generate_mapping call
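For example (a minimal sketch with a hypothetical Person entity; before generate_mapping the column name is not assigned yet):

from pony.orm import *

db = Database('sqlite', ':memory:')

class Person(db.Entity):
    name = Required(str)

print(Person.name.column)    # None - the mapping has not been generated yet

db.generate_mapping(create_tables=True)

print(Person._table_)        # table name assigned during mapping, e.g. 'Person'
print(Person.name.column)    # column name assigned during mapping, e.g. 'name'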
Anonymous
This prompts two use-cases: 1) Where I'm merely interrogating the model (before mapping)
Anonymous
2) Where I'm interrogating the mapped model.
Anonymous
This must be common when attempting to migrate databases?
Alexander
I think most often a developer expects that the migration tool can make changes without manually interrogating the attribute->column mapping. But maybe sometimes it is useful
Anonymous
😊 I have a web app where the user needs to upload data into a database for subsequent analytics. Before uploading we need to map their data (often large csv and txt files) into the database schema. We find the headers from their flat files and list them, then list all of our tables and headers, and ask the user to draw lines from their upload to our schema. Web widget: https://gojs.net/latest/samples/treeMapper.html However, as our schema evolves it is nice to extend the database by programmatically reading the orm.Database model.
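Something along these lines is what I mean (a rough sketch; it assumes generate_mapping has already been called and leans on introspection attributes such as db.entities, entity._table_ and attr.column):

# Build the "our schema" side of the mapping UI from the Pony model.
schema = {}
for entity in db.entities.values():
    columns = []
    for attr in entity._attrs_:
        if attr.is_collection:            # Set attributes have no column of their own
            continue
        columns.append(attr.column or attr.name)
    schema[entity._table_] = columns

for table, columns in schema.items():
    print(table, columns)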
Alexander
It is an interesting case. Maybe it is easier to load the array of values from the CSV columns into a single JSON column? Then it would not be necessary to modify the schema for each specific task
Anonymous
We already do that during staging of the data using pickle/buffer. However, as we need to run property-specific analytics across the data, the schema evolves about every 2.5 months. The life cycle is approximately:
V.0.0.1: tables x columns x rows x fields.
V.0.1.0: some columns have field xyz in common, so refactor to V.0.0.1 + table xyz.
V.0.1.1: some fields have something in common, so refactor to V.0.1.0 + some field identifier.
... and so it evolves...
Fun fact: Scientists != Engineers.
Anonymous
So I force the dear "Data Scientists" to map their chaotic data to a model, and leave a Research Software Engineer to look for ways to normalise their chaotic schemas. It's a bit like the game "whack-a-mole". 😂
Alexander
What database do you use?
Anonymous
postgres.
Anonymous
450 GB RAM, 32 TB disk
Anonymous
48 cores
Alexander
And what tool do you use for analytics? Do you use Pony for that?
Anonymous
Typical dataset: 7-10 zip folders with 40-90 GB of raw text.
Anonymous
Yes.
Anonymous
Nobody accesses any of our servers.
Anonymous
They commit patches to a version control system and dataworkers (bots) then start processing the data.
Anonymous
The only way their code is released into production is if it passes a code review and the test dataset.
Anonymous
Any SQL gets thrown out by the code review.
Anonymous
Only Pony models are allowed through.
Anonymous
If the test suite runs too slow with their data (5% sample), they normally start optimizing prior to releasing it to the workers.
Alexander
If you use a JSON column, you can access individual properties to do analytic queries, I think. The example below is for SQLite, but it should work with PostgreSQL as well:

from pony.orm import *

db = Database('sqlite', ':memory:')

class X(db.Entity):
    id = PrimaryKey(int, auto=True)
    data = Required(Json)

set_sql_debug(True)
db.generate_mapping(create_tables=True)

with db_session:
    x1 = X(id=1, data={'foo': 10, 'bar': 20})
    x2 = X(id=2, data={'foo': 30, 'bar': 40})

with db_session:
    query = select(max(int(x.data['foo'])) for x in X if x.data['bar'] > 30)
    print(query.get())

Maybe we need to add explicit casting to some specific types like float to make it fully universal
Matthew
I seem to be constructing a query incorrectly, what am I missing? it's the select / count subquery that breaks it: https://gist.github.com/anonymous/e5f394457e3cb3898d08b3d40feae239
Alexander
What error do you get?
Matthew
AttributeError: 'QuerySetMonad' object has no attribute 'getsql'
Alexander
What do you expect to get as a query result?
Matthew
For the subquery: something like (1, 20), (3, 13), etc.
Matthew
so review rating and then how many reviews there are with that rating
Alexander
And what should be the result of a main query?
Matthew
product, total number of reviews for that product, subquery results
Matthew
the query works fine without the subquery
Alexander
So it should be like nested tables?
Matthew
It doesn't need to be nested
Matthew
do you think trying to be nested is breaking it?
Matthew
I know the possible review scores in advance, so maybe that can be used in the query?
Matthew
potential scores are [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
Matthew
I have managed to write it with a query for getting scores for each product
Alexander
At this moment Pony does not support nested tables as a query result. Not all databases allow them. I think it is not necessary to calculate all this information inside a single query; it can be split into two different queries with roughly the same total performance. So, instead of:

products = select(
    (p, count(p.reviews), select(
        (r.review_rating, models.count())
        for r in models.Review if r.product == p))
    for p in ReviewProduct if JOIN(User[1] in p.users))

you can write two queries, convert them to dicts in Python and then combine these dicts if you wish:

query1 = select(
    (p, count(p.reviews))
    for p in ReviewProduct if JOIN(User[1] in p.users))
review_counts = {product: review_count for product, review_count in query1}

query2 = select(
    (r.product, r.review_rating, models.count())
    for r in Review if JOIN(User[1] in r.product.users))
rating_counts = defaultdict(dict)
for product, review_rating, review_count in query2:
    rating_counts[product][review_rating] = review_count
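If you then want the same shape as your original query, the two dicts can be combined in plain Python, roughly:

result = [
    (product, review_counts.get(product, 0), rating_counts.get(product, {}))
    for product in review_counts
]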
Matthew
That's a good way of doing it
Matthew
My query-per-product method takes 9 seconds, so I will optimise with your code :)
Anonymous
Has anyone seen this error before:
'''
pony.orm.dbapiprovider.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
'''
Anonymous
I'm seeing it in a production build, and I'm at a loss as to what the deal is.
Anonymous
Not sure if it's a Pony issue or a Postgres issue.
Anonymous
I would assume the latter
Alexander
It will be easier to give an answer if you post the full traceback. I need to go offline for an hour, and then I can tell you what the reason for the error is.
Alexander
You can message the traceback directly to me, if you want
Matthew
it's fast now
Matthew
product_score_counts = collections.defaultdict(dict)
query = models.select(
    (r.product, r.review_rating, models.count())
    for r in models.Review
    if models.JOIN(current_user_object() in r.product.users))
for product, rating, num_reviews in query:
    product_score_counts[product][rating] = num_reviews

products = sorted(
    [(product, sum(counts.values())) for product, counts in product_score_counts.items()],
    key=lambda item: item[1], reverse=True)
Adam
Hello, I'm having problems with SQLite: TransactionIntegrityError: Object Autodelete[new:92] cannot be stored in the database. IntegrityError: FOREIGN KEY constraint failed
Adam
sorry kinda new to this .. so not sure what you mean
stsouko
Are you using multiprocessing?
Adam
asyncio
stsouko
To avoid these errors, I do all storing to the DB in a single thread
Adam
Maybe not the cleanest, but I figured out a way by just doing a forced commit.
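Roughly like this: calling commit() in the middle of the db_session, so the rows that new objects point to are already committed before anything else runs (a self-contained sketch; my real entities are different):

from pony.orm import *

db = Database('sqlite', ':memory:')

class Parent(db.Entity):
    items = Set('Autodelete')

class Autodelete(db.Entity):
    parent = Required(Parent)

db.generate_mapping(create_tables=True)

with db_session:
    parent = Parent()
    commit()                    # forced commit: the parent row is really in the DB now
    Autodelete(parent=parent)   # so the FOREIGN KEY target already exists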
Alexander
Hello im having problems with sqlite: TransactionIntegrityError: Object Autodelete[new:92] cannot be stored in the database. IntegrityError: FOREIGN KEY constraint failed
It seems that your Autodelete object refers to some other object which was already deleted by some concurrent transaction
Farid
does ponyorm have support for mptt algorithm?
Matthew
start a group for Russian and kick the bots
Farid
how do you manage trees on ponyorm?
Alexander
Which trees?
Farid
graphs
Farid
for example in django orm they have django-mptt for maintaining trees. https://github.com/django-mptt/django-mptt
Alexander
As far as I know, Pony doesn't have anything implemented for storing trees. I'm not sure, but you can probably store a tree as JSON.
Alexander
You can wait for the Pony author, who is offline right now; he might give you some efficient suggestions.
Alexander
I think bots are a bit noisy :)
Alexander
Hi Farid, at this moment PonyORM does not have special support for maintaining trees, you need to do it manually. In the future I want to add support for hierarchical structures, but probably not too soon
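For example, a plain adjacency list with a self-referencing entity is often enough (a minimal sketch; the Category entity is just for illustration):

from pony.orm import *

db = Database('sqlite', ':memory:')

class Category(db.Entity):
    name = Required(str)
    parent = Optional('Category', reverse='children')
    children = Set('Category', reverse='parent')

db.generate_mapping(create_tables=True)

with db_session:
    root = Category(name='root')
    a = Category(name='a', parent=root)
    Category(name='a1', parent=a)
    Category(name='b', parent=root)

    def walk(node, depth=0):
        # traversal is done in Python, one query per node's children
        print('    ' * depth + node.name)
        for child in node.children:
            walk(child, depth + 1)

    walk(root)

Unlike MPTT, this needs repeated queries (or a hand-written recursive CTE) to fetch a whole subtree, so it suits small trees best.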
Lucky
Why that?
It was mainly in here for the permalink feature. If the rest of the bot was too noisy, how about @PermalinkBot? It serves just the GitHub permalink feature.
Alexander
I got the impression that the permalink feature is not in much demand by users at the moment
Alexander
I still don't see the purpose of the bot
Alexander
+1
Alexander
Also, when someone posts a link to a GitHub repository, it is typically not to a specific commit, so when a bot posts a message after each link it looks a bit like spam
Alexander
Perhaps we can add it again in the future, if users ask about it