@ponyorm

Alexander
07.12.2017
12:24:07
It works in Python, not in SQL. So all rows from g1 are loaded into memory

Bjorn
07.12.2017
12:25:41
Yes - I take the hit in memory, but the syntax is easier to read. I'll look at the ponyorm source code and report back..

Matthew
07.12.2017
12:29:07
The performance would probably be pathological with a lot of query results

filter is fine imo

Chukwudi
08.12.2017
21:58:22
@fabirovsky Good day. I noticed that you viewed my LinkedIn profile.

Yuri
08.12.2017
21:59:05
Yes, I did.

Chukwudi
08.12.2017
21:59:30
Yuri
08.12.2017
23:09:58
You sent a link and I decided to open it. Nothing more.

Chukwudi
09.12.2017
03:07:19
Ahmad Ghulam
10.12.2017
05:44:38
Hi guys, I encountered an exception when I tried to use the .to_json method, saying "user nobody has no permission to view" ..etc. I searched on Google and found that ponyjs slipped into the ponyorm reference doc, but there are no actual links to it. I could read some questions on StackExchange answered by @akozlovsky (I recall) saying that Pony has some user management hidden in it, but I couldn't find complete documentation about it. I'm currently doing some API development and am still looking for user management/policy for it; if Pony has built-in support for that, it would be great. Any thoughts?

Ahmad Ghulam
10.12.2017
09:33:12
I'm using Falcon as the API engine; would it all fit together? I haven't taken a look at Morepath before..

Henri
10.12.2017
09:36:38
No experience with Falcon. I'm actually helping with the development of Morepath and I also wrote the ponyorm integration. It fits perfectly.

Ahmad Ghulam
10.12.2017
09:37:26
Ok, I'll take a look then, thanks..

Henri
10.12.2017
09:41:33
Here is an example (still beta) : https://github.com/yacoma/morepath-realworld-example-app

Bonbot
10.12.2017
09:41:35
Here is an example (still beta) : https://github.com/yacoma/morepath-realworld-example-app
Permanent link to the yacoma/morepath-realworld-example-app project you mentioned.

Pablo
10.12.2017
15:07:37
/whoami@bonbot

Bonbot
10.12.2017
15:07:38
/whoami@bonbot
You are Pablo Maldonado @pmaldonado (user#236659297)

Alexander
10.12.2017
15:09:18
I still don't get the point of the bot

Juan
10.12.2017
15:14:21
+1

Matthew
13.12.2017
11:47:54
https://gist.github.com/anonymous/a073810f82f4c5d36ff74e955977f209

The error is:

pony.orm.core.ERDiagramError: Reverse attribute for ListingAlertsListing.alerts not found

Do I need two Alert Sets?

Ideally I'd only have one Alerts set

Alexander
13.12.2017
12:30:23
Your reverse attributes are not correct, btw.

It cannot be done the way you want. Alexander (not me) will explain why soon.

Alexander
13.12.2017
12:32:04
In Pony, each relationship is defined by two attributes. So you need to add a previous_alerts Set attribute to the ListingAlertsListing class. Also, if I understand the meaning of the schema correctly, the reverse option values for the current_listing and previous_listing attributes of the Alert class should be alerts and previous_alerts.

Maybe I don't fully understand what you want to achieve with that schema.

In most cases you can just ignore the previous_alerts attribute, but sometimes it may be useful too.
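A minimal sketch of what such a declaration could look like (attribute names follow the discussion above, but whether each side is Required or Optional is a guess, since the original gist is not shown here):

from pony.orm import *

db = Database()

class ListingAlertsListing(db.Entity):
    # every relationship needs an attribute on both sides
    alerts = Set('Alert', reverse='current_listing')
    previous_alerts = Set('Alert', reverse='previous_listing')

class Alert(db.Entity):
    current_listing = Required(ListingAlertsListing, reverse='alerts')
    previous_listing = Optional(ListingAlertsListing, reverse='previous_alerts')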

Matthew
13.12.2017
13:54:51
Thank you, I used two sets and it worked, having one set didn't matter in the end

Bjorn
15.12.2017
16:22:25
Hi Guys, I'm selecting a slice with orm.select(o for o in db.Order)[start:end]. I can do [:] with the end part being empty, and [10:20] also works, but [10:] fails. Is this a bug or a feature?

[10:-1] also seems non-compliant:

with orm.db_session:
    g = orm.select(o for o in db.Order)[10:-1]
    print([i.id for i in g])   # prints []

Alexander
15.12.2017
16:34:44
Slice is translated to the LIMIT section of the SQL query. The purpose of that section is to limit the number of rows returned from the database rather than loading millions of rows into memory, so such a query can be executed much faster. On the other hand, the slice [10:] means "throw out just the first 10 rows and then load all millions of remaining rows into memory"; that is not supported in SQL and cannot improve query performance. If you really want to load all rows and then throw out the first 10, you can do it in memory after the query is executed: orm.select(o for o in db.Order)[:][10:]
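A rough illustration of the difference (db.Order is the entity from the question; the exact SQL text depends on the provider):

q = orm.select(o for o in db.Order)
rows = q[10:20]     # translated into SQL with LIMIT/OFFSET, only 10 rows are fetched
rows = q[:][10:]    # fetches everything, then slices the resulting list in Python memory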

Bjorn
15.12.2017
16:37:33
I need to chain a stack of generators together to retrieve N rows.

However, I don't know how many rows I need to pull from the data source to populate the page that the user is looking at. All I know is that the user has clicked onto page N and that each page contains, say, 50 rows.

With a chain of generators, the operation is similar to:

pipeline = f(e(d(c(b(a(some_table))))))
records = []
while len(records) < 50:
    records.append(next(pipeline))

Bjorn
15.12.2017
16:41:22
Note that the chain of generators (f,e,d,c,b,a) is created dynamically.

It's almost as if I'm trying to use the query as a generator:

q = orm.select(o for o in db.Order)  # query object
next_row = next(q)

However, if I load all rows and discard them, I lose the predictability of the memory footprint that the generator would give me.
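A rough sketch of that dynamically built chain (the names a..f and some_table come from the messages above and are purely illustrative; itertools.islice is one way to cap the result at 50 rows):

from itertools import islice

pipeline = some_table
for step in (a, b, c, d, e, f):   # chain assembled dynamically
    pipeline = step(pipeline)
records = list(islice(pipeline, 50))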

Alexander
15.12.2017
16:46:12
Try the page method of the query. It is equivalent to a slice, but its API looks nicer:

q = orm.select(o for o in Order)
objects = q.page(10, pagesize=50)

I doubt you can make efficient code with a chain of generators in this case
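A minimal sketch of serving "page N of 50 rows" this way (Order, the id ordering and page_number are illustrative; page numbers start at 1):

with db_session:
    q = select(o for o in Order).order_by(Order.id)
    # page_number comes from the UI; same rows as q[(page_number - 1) * 50 : page_number * 50]
    rows = q.page(page_number, pagesize=50)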

Bjorn
15.12.2017
17:11:32
I don't think it would be that horrible: sqlite already uses fetchmany (https://docs.python.org/3/library/sqlite3.html#sqlite3.Cursor.fetchmany) and psycopg uses itersize (http://initd.org/psycopg/docs/cursor.html#cursor.itersize), so the performance hit has already been taken. The missing element is that the ORM doesn't implement the iterator method __next__ on the query object, which would exploit these features of the db providers.

For now I will implement a manual slice manager that repeatedly executes orm.select( .....)[offset: offset+pagesize] and increments the offset.
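A sketch of what that manual slice manager could look like (entity and ordering are illustrative; the caller is expected to wrap calls in a db_session):

def fetch_page(page_number, pagesize=50):
    # translates to a single LIMIT/OFFSET query, so only one page is loaded per call
    offset = page_number * pagesize
    return orm.select(o for o in db.Order).order_by(db.Order.id)[offset:offset + pagesize]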

Alexander
15.12.2017
18:07:37
There are several things here:

1) It is significantly more efficient to skip the initial rows directly in the database and not in application memory, because in that case the database can retrieve just the primary keys of the initial rows and not allocate all row content in memory. When someone iterates over a query result just to throw the initial rows out, all layers of the ORM, the database connection driver and the database engine have to perform the unnecessary work of allocating row data in various buffers. So a good paginated query contains a LIMIT ... OFFSET ... clause. It is even better to have a condition AND id > max_prev_id in the WHERE section of the query instead of OFFSET, but that is not always possible.

2) Iterating on a cursor is a bit slower than fetching all rows in a single action, but it may be better if we iterate over millions of rows and previous rows can be garbage collected before we retrieve all rows from the query. If an ORM uses the identity map pattern, the previous rows cannot be garbage collected, because they all remain cached inside the identity map. So an ORM with an identity map does not benefit from using the iteration API of the database cursor object.
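A hedged sketch of the "id > max_prev_id" approach described in point 1 (entity name and batch size are illustrative; process stands in for whatever the application does with a batch):

last_id = 0
while True:
    with db_session:
        batch = select(o for o in db.Order if o.id > last_id).order_by(db.Order.id)[:50]
        if not batch:
            break
        process(batch)            # placeholder for application logic
        last_id = batch[-1].id    # remember the last seen primary key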

Bjorn
19.12.2017
13:10:41
Hep - back again! Could we perhaps add the following to the documentation under mapping customization (https://docs.ponyorm.com/entities.html#mapping-customization), please:

"To list the declared entities in a database model, db.entities lists all tables and their EntityMeta objects as a dictionary. To view the individual tables' column names programmatically, use table._adict_. Example:

>>> for table, cols in m.entities.items():
...     print(table, cols)
...     for colname, table_col_name in cols._adict_.items():
...         print(colname, table_col_name)
...
PackSize <class 'dce_dbmodels.model.get_new_model.<locals>.PackSize'>   # key: table, value: columns
id PackSize.id                                                          # key: column name, value: table.column_name
products PackSize.products
size PackSize.size
pack_type PackSize.pack_type
UoM_length PackSize.UoM_length
length PackSize.length
width PackSize.width
height PackSize.height
UoM_volume PackSize.UoM_volume
volume PackSize.volume
UoM_weight PackSize.UoM_weight
weight PackSize.weight
LuM <class 'dce_dbmodels.model.get_new_model.<locals>.LuM'>
name LuM.name
pack_sizes LuM.pack_sizes
..."

(unless of course I'm abusing the API?)

Alexander
19.12.2017
13:29:37
I think we can add db.entities and entity._adict_ to the documentation, but the suggested text needs to be rephrased. db.entities is a dict and not a list: it maps entity name to entity class. entity._adict_ maps attribute name to attribute descriptor object; it has nothing to do with table or column names. It is often more convenient to use the entity._attrs_ list instead of the entity._adict_ mapping: entity._attrs_ is a list of all attribute descriptors in declaration order. In order to get the table name, one can access the entity._table_ property; its value can be a string, or a tuple if the table name includes a schema name as well. The column name can be retrieved from an attribute descriptor using the attr.column property. Some attributes map to multiple columns, so it is better to use the attr.columns property, which returns a list of all columns for the given attribute. With these changes, the example code will look like:

for entity_name, entity_class in db.entities.items():
    print(entity_name, entity_class._table_)
    for attr in entity_class._attrs_:
        print(attr.name, attr.columns)

Bjorn
19.12.2017
13:33:54
The revised example code gives me:

>>> for entity_name, entity_class in m.entities.items():
...     print(entity_name, entity_class._table_)
...     for attr in entity_class._attrs_:
...         print(attr.name, attr.columns)
...
PackSize None
id []
products []
size []
pack_type []
UoM_length []
length []
width []
height []
UoM_volume []
volume []
UoM_weight []
weight []
LuM None
name []
pack_sizes []
VuM None

Where I think the dot notation is actually helpful, as it tells us which table an attribute belongs to.

Alexander
19.12.2017
13:34:38
I think it is because you do it before calling generate_mapping

Bjorn
19.12.2017
13:34:50
Quote: [flat is better than nested]

(Y)

Yes

Alexander
19.12.2017
13:35:13
The attributes get their column names inside the generate_mapping call
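A tiny illustration of that point (the entity name is illustrative; the empty output matches what the snippet above printed before mapping):

from pony.orm import *

db = Database()

class Thing(db.Entity):
    name = Required(str)

print(Thing._table_, Thing.name.columns)   # None [] -- not mapped yet
db.bind('sqlite', ':memory:')
db.generate_mapping(create_tables=True)
print(Thing._table_, Thing.name.columns)   # Thing ['name'] -- filled in by generate_mapping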

Bjorn
19.12.2017
13:37:40
This prompts two use-cases: 1) Where I'm merely interrogating the model (before mapping)

2) Where I'm interrogating the mapped model.

Bjorn
19.12.2017
13:38:50
This must be common when attempting to migrate databases?

Alexander
19.12.2017
13:44:40
I think most often a developer expects that the migration tool can make the changes without manually interrogating the attribute->column mapping. But maybe sometimes it is useful

Bjorn
19.12.2017
13:50:04
I have a web app where the user needs to upload data into a database for subsequent analytics. Before uploading we need to map their data (often large csv and txt files) into the database schema. We find the headers in their flat files and list them, then list all of our tables and headers. Then we ask the user to draw lines from their upload to our schema. Web widget: https://gojs.net/latest/samples/treeMapper.html However, as our schema evolves it is nice to be able to expand the database by programmatically reading the orm.Database model.

Alexander
19.12.2017
13:55:38
It is an interesting case. Maybe it is easier to load the array of values from the csv columns into a single JSON column? Then it would not be necessary to modify the schema for each specific task

Bjorn
19.12.2017
14:00:05
We already do that during staging of the data using pickle/buffer. However, as we need to run property-specific analytics across the data, the schema evolves about every 2.5 months. The life cycle is approximately:
V.0.0.1: tables x columns x rows x fields.
V.0.1.0: some columns have field xyz in common: let's refactor to V.0.0.1 + table xyz.
V.0.1.1: some fields have something in common: let's refactor to V.0.1.0 + some field identifier
... and so it evolves... Fun fact: Scientists != Engineers.

Bjorn
19.12.2017
14:01:56
So I force the dear "Data Scientists" to map their chaotic data to a model, and leave a Research Software Engineer to look for ways to normalise their chaotic schemas. It's a bit like the game "whack-a-mole".

Alexander
19.12.2017
14:02:45
What database do you use?

Bjorn
19.12.2017
14:02:58
postgres.

450 Gb ram. 32 Tb disk

48 cores

Alexander
19.12.2017
14:03:53
And what tool you use to do analytics? Do you use Pony for that?

Bjorn
19.12.2017
14:03:57
Typical dataset: 7-10 zip folders with 40-90 Gb of raw text.

Yes.

Nobody accesses any of our servers.

They commit patches to a version control system and dataworkers (bots) then start processing the data.

The only way their code is released into production is if it passes a code review and a test dataset.

Any SQL gets thrown out by the code review.

only pony models are allowed through.

If the test suite runs too slow with their data (5% sample), they normally start optimizing prior to releasing it to the workers.

Alexander
19.12.2017
14:11:42
If you use a JSON column, you can access individual properties to do analytic queries, I think. The example below is for SQLite, but it should work with PostgreSQL as well:

from pony.orm import *

db = Database('sqlite', ':memory:')

class X(db.Entity):
    id = PrimaryKey(int, auto=True)
    data = Required(Json)

set_sql_debug(True)
db.generate_mapping(create_tables=True)

with db_session:
    x1 = X(id=1, data={'foo': 10, 'bar': 20})
    x2 = X(id=2, data={'foo': 30, 'bar': 40})

with db_session:
    query = select(max(int(x.data['foo'])) for x in X if x.data['bar'] > 30)
    print(query.get())

Maybe we need to add explicit casting to some specific types like float to make it fully universal

Matthew
19.12.2017
16:07:46
I seem to be constructing a query incorrectly, what am I missing? It's the select / count subquery that breaks it: https://gist.github.com/anonymous/e5f394457e3cb3898d08b3d40feae239

Alexander
19.12.2017
16:09:11
What error do you get?

Matthew
19.12.2017
16:09:29
AttributeError: 'QuerySetMonad' object has no attribute 'getsql'

Alexander
19.12.2017
16:11:50
What do you expect to get as a query result?

Matthew
19.12.2017
16:12:16
for the sub query: something like (1, 20), (3, 13) etc

so review rating and then how many reviews there are with that rating

Alexander
19.12.2017
16:12:44
And what should be the result of a main query?

Matthew
19.12.2017
16:13:33
product, total number of reviews for that product, subquery results

the query works fine without the subquery

Alexander
19.12.2017
16:13:55
So it should be like nested tables?

Matthew
19.12.2017
16:14:34
It doesn't need to be nested

do you think trying to be nested is breaking it?

I know the possible review scores in advance, so maybe that can be used in the query?

potential scores are [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

I have managed to write it with a query for getting scores for each product
