Alexander
The attributes get column names inside the generate_mapping call
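For example (a minimal sketch with a hypothetical Person entity; before generate_mapping the column name is not assigned yet):

from pony.orm import *

db = Database('sqlite', ':memory:')

class Person(db.Entity):
    name = Required(str)

print(Person.name.column)    # None - the mapping has not been generated yet

db.generate_mapping(create_tables=True)

print(Person._table_)        # table name assigned during mapping, e.g. 'Person'
print(Person.name.column)    # column name assigned during mapping, e.g. 'name'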
Anonymous
This prompts two use-cases: 1) Where I'm merely interrogating the model (before mapping)
Anonymous
2) Where I'm interrogating the mapped model.
Anonymous
This must be common when attempting to migrate databases?
Alexander
I think most often a developer expects that the migration tool can make changes without manually interrogating the attribute->column mapping. But maybe sometimes it is useful
Anonymous
😊 I have a web app where the user needs to upload data into a database for subsequent analytics. Before uploading we need to map their data (often large csv and txt files) into the database schema. We find the headers from their flat files and list them, then list all of our tables and headers, and ask the user to draw lines from their upload to our schema. Web widget: https://gojs.net/latest/samples/treeMapper.html However, as our schema evolves it is nice to extend the database by programmatically reading the orm.Database model.
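Something along these lines is what I mean (a rough sketch; it assumes generate_mapping has already been called and leans on introspection attributes such as db.entities, entity._table_ and attr.column):

# Build the "our schema" side of the mapping UI from the Pony model.
schema = {}
for entity in db.entities.values():
    columns = []
    for attr in entity._attrs_:
        if attr.is_collection:            # Set attributes have no column of their own
            continue
        columns.append(attr.column or attr.name)
    schema[entity._table_] = columns

for table, columns in schema.items():
    print(table, columns)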
Alexander
It is an interesting case. Maybe it is easier to load the array of values from the CSV columns into a single JSON column? Then it would not be necessary to modify the schema for each specific task
Anonymous
We already do that during staging of the data using pickle/buffer. However, as we need to run property-specific analytics across the data, the schema evolves about every 2.5 months. The life cycle is approximately:
V.0.0.1: tables x columns x rows x fields.
V.0.1.0: some columns have field xyz in common, so refactor to V.0.0.1 + table xyz.
V.0.1.1: some fields have something in common, so refactor to V.0.1.0 + some field identifier.
... and so it evolves...
Fun fact: Scientists != Engineers.
Anonymous
So I force the dear "Data Scientists" to map their chaotic data to a model, and leave a Research Software Engineer to look for ways to normalise their chaotic schemas. It's a bit like the game "whack-a-mole". 😂
Alexander
What database do you use?
Anonymous
postgres.
Anonymous
450 GB RAM, 32 TB disk
Anonymous
48 cores
Alexander
And what tool do you use for analytics? Do you use Pony for that?
Anonymous
Typical dataset: 7-10 zip folders with 40-90 GB of raw text.
Anonymous
Yes.
Anonymous
Nobody accesses any of our servers.
Anonymous
They commit patches to a version control system and dataworkers (bots) then start processing the data.
Anonymous
The only way their code is released into production is if it passes a code review and the test dataset.
Anonymous
Any SQL gets thrown out by the code review.
Anonymous
Only Pony models are allowed through.
Anonymous
If the test suite runs too slow with their data (5% sample), they normally start optimizing prior to releasing it to the workers.
Alexander
If you use a JSON column, you can access individual properties to do analytic queries, I think. The example below is for SQLite, but it should work with PostgreSQL as well:

from pony.orm import *

db = Database('sqlite', ':memory:')

class X(db.Entity):
    id = PrimaryKey(int, auto=True)
    data = Required(Json)

set_sql_debug(True)
db.generate_mapping(create_tables=True)

with db_session:
    x1 = X(id=1, data={'foo': 10, 'bar': 20})
    x2 = X(id=2, data={'foo': 30, 'bar': 40})

with db_session:
    query = select(max(int(x.data['foo'])) for x in X if x.data['bar'] > 30)
    print(query.get())

Maybe we need to add explicit casting to some specific types like float to make it fully universal
Matthew
I seem to be constructing a query incorrectly, what am I missing? it's the select / count subquery that breaks it: https://gist.github.com/anonymous/e5f394457e3cb3898d08b3d40feae239
Alexander
What error do you get?
Matthew
AttributeError: 'QuerySetMonad' object has no attribute 'getsql'
Alexander
What do you expect to get as a query result?
Matthew
For the subquery: something like (1, 20), (3, 13), etc.
Matthew
so review rating and then how many reviews there are with that rating
Alexander
And what should be the result of a main query?
Matthew
product, total number of reviews for that product, subquery results
Matthew
the query works fine without the subquery
Alexander
So it should be like nested tables?
Matthew
It doesn't need to be nested
Matthew
do you think trying to be nested is breaking it?
Matthew
I know the possible review scores in advance, so maybe that can be used in the query?
Matthew
potential scores are [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
Matthew
I have managed to write it with a query for getting scores for each product
Alexander
At this moment Pony does not support nested tables as a query result. Not all databases allow them. I think it is not necessary to calculate all this information inside a single query; it can be split into two different queries with roughly the same total performance. So, instead of:

products = select(
    (p, count(p.reviews), select(
        (r.review_rating, models.count())
        for r in models.Review if r.product == p))
    for p in ReviewProduct if JOIN(User[1] in p.users))

you can write two queries, convert them to dicts in Python and then combine these dicts if you wish:

query1 = select(
    (p, count(p.reviews))
    for p in ReviewProduct if JOIN(User[1] in p.users))
review_counts = {product: review_count for product, review_count in query1}

query2 = select(
    (r.product, r.review_rating, models.count())
    for r in Review if JOIN(User[1] in r.product.users))
rating_counts = defaultdict(dict)
for product, review_rating, review_count in query2:
    rating_counts[product][review_rating] = review_count
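If you then want the same shape as your original query, the two dicts can be combined in plain Python, roughly:

result = [
    (product, review_counts.get(product, 0), rating_counts.get(product, {}))
    for product in review_counts
]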
Matthew
That's a good way of doing it
Matthew
My query-per-product method takes 9 seconds, so I will optimise with your code :)
Anonymous
Has anyone seen this error before:
'''
pony.orm.dbapiprovider.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
'''
Anonymous
I'm seeing it in a production build, and I'm at a loss as to what the deal is.
Anonymous
Not sure if it's a Pony issue or a Postgres issue.
Anonymous
I would assume the latter
Alexander
It will be easier to give an answer if you post the full traceback. I need to go offline for an hour, and then I can tell you what the reason for the error is.
Alexander
You can message the traceback directly to me, if you want
Matthew
it's fast now
Matthew
product_score_counts = collections.defaultdict(dict)
query = models.select(
    (r.product, r.review_rating, models.count())
    for r in models.Review
    if models.JOIN(current_user_object() in r.product.users))
for product, rating, num_reviews in query:
    product_score_counts[product][rating] = num_reviews

products = sorted(
    [(product, sum(counts.values())) for product, counts in product_score_counts.items()],
    key=lambda item: item[1], reverse=True)
Adam
Hello, I'm having problems with SQLite: TransactionIntegrityError: Object Autodelete[new:92] cannot be stored in the database. IntegrityError: FOREIGN KEY constraint failed
Adam
sorry kinda new to this .. so not sure what you mean
stsouko
Are you using multiprocessing?
Adam
asyncio
stsouko
To avoid these errors, I do all storing to the DB in a single thread
Adam
Maybe not the cleanest, but I figured out a way by just doing a forced commit.
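Roughly like this: calling commit() in the middle of the db_session, so the rows that new objects point to are already committed before anything else runs (a self-contained sketch; my real entities are different):

from pony.orm import *

db = Database('sqlite', ':memory:')

class Parent(db.Entity):
    items = Set('Autodelete')

class Autodelete(db.Entity):
    parent = Required(Parent)

db.generate_mapping(create_tables=True)

with db_session:
    parent = Parent()
    commit()                    # forced commit: the parent row is really in the DB now
    Autodelete(parent=parent)   # so the FOREIGN KEY target already exists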
Alexander
Hello im having problems with sqlite: TransactionIntegrityError: Object Autodelete[new:92] cannot be stored in the database. IntegrityError: FOREIGN KEY constraint failed
It seems that your Autodelete object refers to some other object which was already deleted by some concurrent transaction
Farid
does ponyorm have support for mptt algorithm?
Matthew
start a group for Russian and kick the bots
Farid
how do you manage trees on ponyorm?
Alexander
Which trees?
Farid
graphs
Farid
for example in django orm they have django-mptt for maintaining trees. https://github.com/django-mptt/django-mptt
Alexander
As far as I know, Pony doesn't have anything implemented for storing trees. I'm not sure, but you can probably store a tree as JSON.
Alexander
You can wait for the Pony author, who is offline right now; he might give you some efficient suggestions.
Alexander
I think bots are a bit noisy :)
Alexander
Hi Farid, at this moment PonyORM does not have special support for maintaining trees, you need to do it manually. In the future I want to add support for hierarchical structures, but probably not too soon
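For example, a plain adjacency list with a self-referencing entity is often enough (a minimal sketch; the Category entity is just for illustration):

from pony.orm import *

db = Database('sqlite', ':memory:')

class Category(db.Entity):
    name = Required(str)
    parent = Optional('Category', reverse='children')
    children = Set('Category', reverse='parent')

db.generate_mapping(create_tables=True)

with db_session:
    root = Category(name='root')
    a = Category(name='a', parent=root)
    Category(name='a1', parent=a)
    Category(name='b', parent=root)

    def walk(node, depth=0):
        # traversal is done in Python, one query per node's children
        print('    ' * depth + node.name)
        for child in node.children:
            walk(child, depth + 1)

    walk(root)

Unlike MPTT, this needs repeated queries (or a hand-written recursive CTE) to fetch a whole subtree, so it suits small trees best.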
Lucky
Why that?
It was mainly in here for the permalink feature. If the rest of the bot was too noisy, how about @PermalinkBot? It serves just the GitHub permalink feature.
Alexander
I got the impression that the permalink feature is not in much demand by users at the moment
Alexander
I still don't see the purpose of the bot
Alexander
+1
Alexander
Also, when someone posts a link to a GitHub repository, it is typically not to a specific commit, so when a bot posts a message after each link it looks a bit like spam
Alexander
Perhaps we can add it again in the future, if users ask about it