Ayrat
so what does that contradict?
Ayrat
how do you get data out of an object in OOP?
Bonart
No, no, no, David Blaine :) A class is the ability to combine non-empty data with non-trivial behavior, not an obligation to
Ayrat
Java sticks to the theory and doesn't allow auto-properties, forcing programmers to do TRUE OOP
Bonart
Yeah - trivial behavior
Ayrat
Nevertheless, it's behavior. There's an object and there's its behavior (trivial, but behavior all the same).
You talk to the object through getData and putData instead of dragging structures back and forth directly.
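A minimal F# sketch of that point (the names are hypothetical): even a trivial get/put pair is behavior standing between the caller and the data.

    // The field is private; callers only ever see behavior, never the raw data.
    type Account(initial: decimal) =
        let mutable balance = initial          // the data, unreachable directly

        member _.GetBalance() = balance        // trivial behavior: hand the data out
        member _.PutBalance(value: decimal) =  // trivial behavior: take the data in
            if value < 0m then invalidArg "value" "balance cannot be negative"
            balance <- value

    let acc = Account 100m
    acc.PutBalance(acc.GetBalance() + 50m)     // talking to the object, not to a struct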
システム
Hi guys,
Sorry for the long post and some necroposting, but unfortunately I only read the chat from time to time.
@Dolfik :
Firstly, it looks like you should stay with your RDBMS.
But I would recommend hiring a data platform developer, because you are describing horrible things.
Secondly, most of the problems you have are due to the non-optimal design of the data storage and data access layers.
69 joins can be okay in stored procedures that prepare data for reports.
For instance, a recent example - a 200k LOC SP in a SAS P&L report.
And >100 fields can be okay in systems like MS Dynamics AX or MS Dynamics NAV, but not in your case.
In most systems the optimal organization would be the following:
a) use only stored procedures, or their analog in your RDBMS, for fetching data in the appropriate format;
b) the physical storage of data will differ from the returned data presentation;
c) do appropriate normalization/denormalization and physical-layer optimization according to performance traces;
d) these SPs will be used by your app's DAL, with data prepared for usage (see the sketch right after this list);
e) modify data only through the same kind of stored procedures; even maintenance operations should be done via service SPs;
f) all data modifications should write detailed logs - who, what, and when (use date and time with offset information, and also remember about collations);
g) quite often a full snapshot of the changed record can be used for logging purposes (see the second sketch below);
h) in a highly loaded system requiring low latency in the dozens of milliseconds, use async writes;
i) consider transactional writes of logs (e.g., in MS SQL Server it could be Service Broker);
j) for searching through changes and data you can use something like ELK, or embedded full-text search (if you have MS SQL Server);
you can write the data into the full-text search engine's indexes from the logs queue (if you have snapshots there);
if anything changes, then in addition to changing the search query you can either change only newly added data or alter old data as well;
search queries can be stored as part of the visualization engine (e.g., Kibana) or within the RDBMS itself;
k) one more time - performance traces are mandatory;
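A minimal sketch of point (d), assuming MS SQL Server and classic ADO.NET (System.Data.SqlClient); the procedure and column layout are hypothetical:

    open System.Data
    open System.Data.SqlClient

    // The DAL only calls stored procedures; it never runs ad-hoc SQL against tables.
    let getUserBasket (connStr: string) (userId: int) =
        use conn = new SqlConnection(connStr)
        use cmd = new SqlCommand("dbo.GetUserBasket", conn, CommandType = CommandType.StoredProcedure)
        cmd.Parameters.AddWithValue("@UserId", userId) |> ignore
        conn.Open()
        use reader = cmd.ExecuteReader()
        [ while reader.Read() do
            yield reader.GetInt32 0, reader.GetString 1 ]  // e.g. (ProductId, ProductName)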
Example:
an online retail system
>50k/sec changes to users' baskets
>200k/sec read requests for users' baskets
these operations are logical ones and change a lot of things within the system;
real-time calculation of product stocks based on the changes;
the volume of these specific change logs - around 1TB per month
plus many other requests to this specific server (the system has many more components);
MS SQL Server in a VM, eight dedicated vCPU cores and 32GB RAM, average CPU usage 15%-20%
if you want to discuss further questions - do not hesitate to write me, I'll try to respond ASAP;
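And a rough sketch of points (f) and (g) - one log entry per modification, with a full snapshot of the changed record; all names here are illustrative:

    open System

    type ChangeLogEntry =
        { User: string                 // who
          Operation: string            // what: INSERT / UPDATE / DELETE
          OccurredAt: DateTimeOffset   // when, with offset information preserved
          TableName: string
          Snapshot: string }           // full snapshot of the record, e.g. serialized JSON

    let logEntry user operation table snapshot =
        { User = user
          Operation = operation
          OccurredAt = DateTimeOffset.UtcNow
          TableName = table
          Snapshot = snapshot }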
Ayrat :
About scale-up - there are even machines with >2TB RAM and dozens of vCPUs.
E.g., PDW, Oracle Exadata, IBM Netezza, or IBM Z series mainframes.
But it makes sense only for MPP systems with enormous data warehouses and a significant number of reports.
And hundreds or thousands of shards are also okay, but only for particular cases.
Speaking about Kafka - in most cases it scales almost linearly.
But as always, you should consider how you would add and remove Kafka nodes dynamically and how you would spread them across your cluster of VMs.
Typical proper Kafka usage - some data loss is acceptable, and you need to process >1Gbps of data.
Also pay attention: if you have huge event messages, dozens of MBs, there can be problems.
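For reference, a minimal producer sketch assuming the Confluent.Kafka .NET client; the broker address, topic, and payload are hypothetical:

    open System
    open Confluent.Kafka

    let produceBasketEvent () =
        let config = ProducerConfig(BootstrapServers = "broker1:9092")
        // Fire-and-forget produce fits the "some data loss is acceptable" profile;
        // for stronger guarantees use ProduceAsync and Acks.All instead.
        use producer = ProducerBuilder<Null, string>(config).Build()
        producer.Produce("basket-events", Message<Null, string>(Value = "user=42;op=add;sku=123"))
        producer.Flush(TimeSpan.FromSeconds 10.) |> ignore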
システム
@gsomix and others who took part in the discussion
I'm sorry about how you feel within the industry.
But, unfortunately, as always there are a few points, and I would like to share my experience (>10 years in the industry, >18 years in development) and thoughts:
1) code does not matter; the result does;
2) business pays only if you solve business problems;
3) if you want to develop a community - there should be a differentiator, some unique idea that will help unite people;
a better Java or a better C# is not an idea to build a community around (though it could be good for selling to business, see below);
4) enterprise can also be exciting, but "all companies are equal in their failures and unique in their successes" (a quotation from some management book);
5) sharing code back to OSS - there is a lot of legal stuff; e.g., in the consulting business the code is the customer's property, so you cannot write a library on work time and post it to GitHub;
also, in the USA there is another common thing - patents and licenses on the ideas and methodologies used in the code, and you need to take them into account;
and one last thing here - most OSS code is paid work, e.g., Linux kernel development;
6) in most cases big businesses publish only a small part of their work, only the things that are not "know-how";
but even big business sometimes cannot make a language widely used; e.g., D was developed by Alexandrescu as an R&D project, and its current state is, I would say, questionable;
7) selling the language - there are two parts here.
First - selling to developers: you need some cool features (when my developers saw how the WSDL type provider helped me on one integration/migration project, they just fell in love with it) that increase personal performance (there can be different metrics here).
And you should keep in mind that only a small number of people equate this performance with LOC count.
Example: to my mind, the posts here in the chat about doing UI in code are terrible, because such a solution would terribly decrease developers' performance on a project with millions of LOC of WPF XAML (just imagine how much that would be in code), and I totally agree with Vasiliy (sorry, I don't remember the exact nickname).
Another example: I am not sure you know Perl, but it became widespread due to incredibly fast result delivery.
Second - selling to business: you should show how your proposed solution helps business solve business problems.
For example: increase RoR, revenue, and profit; decrease risks, development time, and maintenance costs.
And here "a bit better C#" is excellent, because it means being part of the .NET ecosystem and a clear understanding within the legal department of whom to file claims against.
One legal counsel told me that if we bought a ready-made product, he would understand who is responsible; but if we took OSS and sold it to customers - who is accountable for it, and how could we provide guarantees?
8) my personal experience:
a) I almost always use F# for really quick prototyping and scripting - viva la type providers with SRTPs (a small SRTP sketch follows this list) - and I am one of those who miss type classes, HKTs, macro support, normal metaprogramming, and other fun stuff that cuts engineering time and code;
b) I use it within a lot of data processing flows; it is easy to maintain, and for a regular data platform engineer it is just one more language, more comfortable than Scala. E.g., in one project there are SQL CLR and SSIS assemblies written in F#;
c) different kinds of data science - here F# can only be used by engineers, not data scientists, because for now it requires development skills and library support (IronPython through dynamic vs. a Python type provider);
d) but I will not make F# mandatory for my engineers, though I could, because there are not enough killer features for developers and business yet (but I recommend it to all senior-level engineers);
e) a programming language is just an instrument; you need to know a lot of them and use the appropriate one for each task;
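Since SRTPs came up in point (a), a tiny illustration of what they buy you - one inline function that works over any type with a suitable static member:

    // Statically resolved type parameter: accepts any ^T that exposes
    // static member Parse : string -> ^T (int, Guid, DateTime, ...).
    let inline parseAny< ^T when ^T : (static member Parse: string -> ^T)> (s: string) : ^T =
        (^T : (static member Parse: string -> ^T) s)

    let n: int = parseAny "42"
    let g: System.Guid = parseAny "c9a646d3-9c61-4cb7-bfcd-ee2522c8f633"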
システム
Long story short - there is:
1) a gap between selling to F# consumers (developers) and selling to the consumers of F# consumers' results (business);
2) audience mistargeting;
3) a misunderstanding of the business models that use OSS;
4) the previous right moment has already been missed, so let's create the next one :)
In case of any questions - feel free to ask me. I am always open to discussion but, unfortunately, do not always have enough time.
Ayrat
wtf!
システム
Sorry :) these are just a few thoughts on the last few topics in the chat, and I don't know of a way to do a cut in Telegram
システム
I hope it will be useful to the guys mentioned
システム
Konnichiwa, Alexey; unfortunately I don't have a Japanese keyboard right now, and, as I think you've already noticed, I'm not a nihonjin :)
Ayrat
yeah, about scaling. There actually are powerful server machines out there. But vertical scaling will never be linear in terms of cost per performance gained. So it's up to each particular business to decide where to stop scaling vertically.
That's why I prefer linearly scaling solutions, and Kafka powered by an event-driven architecture is ideal for this task.
Ayrat
It's not ideal for particular businesses :)
システム
Ayrat, you can write in Russian - I understand it; I just cannot write it on an English keyboard
Ayrat
Oh, damn you
Ayrat
:D
x
😂
Vasily
A SPY!
x
and here we all are like...
システム
😂
Vasily
CAN'T EVEN LEARN THE RUSSIAN LAYOUT
x
just type in translit then
Hog
CAN'T BE BOTHERED!
Ayrat
In short, I worked with a 2 TB in-memory DB, 128 cores, plus an identical replica for reads/failover.
And in short, it cost insane money. INSANE, CARL
Ayrat
Not to mention that it was an untouchable monolith made of shit and sticks - minus the sticks
Nikolay
Wow, thanks
Ayrat
the documentation is only for users (i.e., those who supply the events); those who consume them get jack shit
Nikolay
Here's how we ended up this way:
The database design was started by someone who wasn't very familiar with it. We designed the database first in MS SQL Server, then converted it to Oracle. At first everything was very simple, 5-10 joins per query; then the requirements started changing and the database began to bloat. Initially there were 100 tables, now there are 780, and all of this had to be done without breaking the existing logic.
Ayrat
and that's where the long pain begins
Vasily
Cool story
Vasily
I've seen a movie that started the same way (c)
Nikolay
And the result is this Frankenstein
Vasily
I think the biggest fail happened when the table-per-class structure was chosen
Vasily
Although you really do have a document and a pile of properties
Vasily
EF has nothing to do with it
Nikolay
Well, EF generated it, and table per class is what came out
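A rough illustration of the table-per-class shape being criticized (EF calls it table per type); the types and fields are hypothetical, not Nikolay's actual schema:

    // Each class in the hierarchy gets its own table, joined back to the base by Id,
    // so materializing one concrete document costs one join per inheritance level.
    type DocumentBase  = { Id: int; Title: string }             // table: Documents
    type Drawing       = { Base: DocumentBase; Scale: string }  // table: Drawings       (Id -> Documents.Id)
    type Specification = { Base: DocumentBase; Revision: int }  // table: Specifications (Id -> Documents.Id)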
Ayrat
Cool story
There was also a constants class with a couple of tens of thousands of entries.
The only way to work with that queue was to bitwise-OR some random constants together (as flags) and feed the result to a single method from the lib
Vasily
When working with a DB, the rule is simple
Nikolay
Actually yes, we do have a document and a pile of properties
Vasily
What changes rarely - that's tables and columns
Анна
What a multilingual chat we have now
x
😂😂👌
Vasily
What changes often - that's rows in tables
Vasily
With that kind of composition you don't need 4 million joins
Vasily
Since the document is assembled from its properties
Vasily
Rather than being stored as one monolithic lump of shit
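A hedged sketch of that rule - stable structure as columns, volatile data as rows, the document assembled from its property rows; all names are illustrative:

    // One narrow table of (DocumentId, Name, Value) rows instead of hundreds of columns.
    type PropertyRow = { DocumentId: int; Name: string; Value: string }

    let assembleDocument (docId: int) (rows: PropertyRow list) =
        rows
        |> List.filter (fun r -> r.DocumentId = docId)
        |> List.map (fun r -> r.Name, r.Value)
        |> dict   // the "document": a property bag built from rows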
Nikolay
Project
Object
Title
Documentation type
Part
Change version
Document 1
Document 2 (contains data from document 1)
Document 3 (contains data from documents 1 and 2)
Nikolay
That's the structure
Nikolay
And that's just the surface
Nikolay
There's also a pile of relationships inside the documents
Vasily
Well, it's kind of logical to store the flies separately from the cutlets
x
just dump the JSON into a blob
x
and let it sit there
Vasily
Well, I wouldn't
Vasily
In this case
Nikolay
The relationships are really important here
x
okay fine, XML then
Vasily
You can store the relationship configuration in JSON
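A small sketch of that idea, assuming Newtonsoft.Json; the link shape is hypothetical:

    open Newtonsoft.Json

    // Relationship configuration kept as a JSON document alongside the data.
    type Link = { FromDoc: int; ToDoc: int; Kind: string }

    let parseLinks (json: string) : Link[] =
        JsonConvert.DeserializeObject<Link[]>(json)

    let links = parseLinks """[ { "FromDoc": 1, "ToDoc": 2, "Kind": "includes" } ]"""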
x
SQL Server can do XML
Vasily
How's that, excuse me?
Vasily
From the domain point of view it already looks crooked