Commit 4cf4a1d

authored
Editors pass #2 (#279)
* Editors pass #2
* typo
1 parent 75c1819 commit 4cf4a1d

1 file changed: +10 -10 lines changed

pgml-docs/docs/blog/data-is-living-and-relational.md

Lines changed: 10 additions & 10 deletions
@@ -25,7 +25,7 @@ Data is Living and Relational
 </div>


-A common problem with data science and machine learning tutorials is the published and studied data sets are often nothing like what you’ll find in industry.
+A common problem with data science and machine learning tutorials is the published and studied datasets are often nothing like what you’ll find in industry.

 <center markdown>

@@ -42,11 +42,11 @@ They are:
 - usually denormalized into a single tabular form, e.g. a CSV file
 - often relatively tiny to medium amounts of data, not big data
 - always static, with new rows never added
-- sometimes pre-treated to clean or simplify the data
+- sometimes pretreated to clean or simplify the data

-As Data Science transitions from academia into industry, these norms influence organizations and applications. Professional Data Scientists need teams of Data Engineers to move data from production databases into data warehouses and denormalized schemas which are more familiar, and ideally easier to work with. Large offline batch jobs are a typical integration point between Data Scientists and their Engineering counterparts, who primarily deal with online systems. As the systems grow more complex, additional specialized Machine Learning Engineers are required to optimize performance and scalability bottlenecks between databases, warehouses, models and applications.
+As Data Science transitions from academia into industry, these norms influence organizations and applications. Professional Data Scientists need teams of Data Engineers to move data from production databases into data warehouses and denormalized schemas, which are more familiar and ideally easier to work with. Large offline batch jobs are a typical integration point between Data Scientists and their Engineering counterparts, who primarily deal with online systems. As the systems grow more complex, additional specialized Machine Learning Engineers are required to optimize performance and scalability bottlenecks between databases, warehouses, models and applications.

-This eventually leads to expensive maintenance and to terminal complexity: new improvements to the system become exponentially more difficult. Ultimately, previously working models start getting replaced by simpler solutions, so the business can continue to iterate. This is not a new phenomenon, see the fate of the Netflix Prize.
+This eventually leads to expensive maintenance and terminal complexity: new improvements to the system become exponentially more difficult. Ultimately, previously working models start getting replaced by simpler solutions, so the business can continue to iterate. This is not a new phenomenon, see the fate of the Netflix Prize.

 Announcing the PostgresML Gym 🎉
 -------------------------------
@@ -55,17 +55,17 @@ Instead of starting from the academic perspective that data is dead, PostgresML

 ![relational data](/images/illustrations/uml.png)

-Relationa data:
+Relational data:

 - is normalized for real time performance and correctness considerations
-- has new rows added and updated constantly, which form the incomplete features for a prediction
+- has new rows added and updated constantly, which form incomplete features for a prediction

-Meanwhile, denormalized data sets:
+Meanwhile, denormalized datasets:

-- may grow to billions of rows, where single updates multiple into mass rewrites
-- often span multiple iterations of the schema, where software bugs leave behind outliers
+- may grow to billions of rows, where single updates multiply into mass rewrites
+- often span multiple iterations of the schema, with software bugs leaving behind outliers

-We think it’s worth attempting to move the machine learning process and modern data architectures beyond the status quo. To that end, we’re building the PostgresML Gym, a free offering, to provide a test bed for real world ML experimentation in a Postgres database. Your personal Gym will include the PostgresML dashboard, several tutorial notebooks to get you started, and access to your own personal PostgreSQL database, supercharged with our machine learning extension.
+We think it’s worth attempting to move the machine learning process and modern data architectures beyond the status quo. To that end, we’re building the PostgresML Gym, a free offering, to provide a test bed for real world ML experimentation, in a Postgres database. Your personal Gym will include the PostgresML dashboard, several tutorial notebooks to get you started, and access to your own personal PostgreSQL database, supercharged with our machine learning extension.

 <center>
 <video autoplay loop muted width="90%" style="box-shadow: 0 0 8px #000;">