Commit bce0109

Author: Montana Low
Commit message: syntax
1 parent 54201df commit bce0109

File tree

1 file changed: +2, -2 lines changed


pgml-docs/docs/blog/postgres-full-text-search-is-awesome.md

Lines changed: 2 additions & 2 deletions
@@ -32,11 +32,11 @@ If you actually want to meaningfully improve search results, you generally need

> _If you want to improve your search results, don't rely on expensive O(n*m) word frequency statistics. Get new sources of data instead. It's the relational nature of relevance that underpins why a relational database forms the ideal search engine._

-Postgres made the right call to avoid the costs required to compute Inverse Document Frequency in their search indexing, given its meager benefit. Instead, it offers the most feature-complete relational data platform. [Elasticsearch will tell you](https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html) you can't join data in a **_naively_** distributed system at read time, because it is prohibitively expensive. Instead you'll have to join the data eagerly at indexing time, which is even more prohibitively expensive. That's good for their business since you're the one paying for it, and it will scale until you're bankrupt.
+Postgres made the right call to avoid the costs required to compute Inverse Document Frequency in their search indexing, given its meager benefit. Instead, it offers the most feature-complete relational data platform. [Elasticsearch will tell you](https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html) you can't join data in a **_naively distributed system_** at read time, because it is prohibitively expensive. Instead you'll have to join the data eagerly at indexing time, which is even more prohibitively expensive. That's good for their business since you're the one paying for it, and it will scale until you're bankrupt.

What you really should do, is leave the data normalized inside Postgres, which will allow you to join additional, related data at query time. It will take multiple orders of magnitude less compute to index and search a normalized corpus, meaning you'll have a lot longer (potentially forever) before you need to distribute your workload, and then maybe you can do that intelligently instead of naively. Instead of spending your time building and maintaining pipelines to shuffle updates between systems, you can work on new sources of data to really improve relevance.

-With PostgresML, you can now skip straight to full on machine learning when you have the related data. You can load your feature store into the same database as your search corpus. Each feature can live in its own independent table, with its own update cadence, rather than having to reindex and denormalize entire documents back to ElasticSearch, or worse, large portions of the entire corpus, when a single thing changes.
+With PostgresML, you can now skip straight to full on machine learning when you have the related data. You can load your feature store into the same database as your search corpus. Each data source can live in its own independent table, with its own update cadence, rather than having to reindex and denormalize entire documents back to ElasticSearch, or worse, large portions of the entire corpus, when a single thing changes.

With a single SQL query, you can do multiple passes of re-ranking, pruning and personalization to refine a search relevance score.

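To make the pattern in those last paragraphs concrete, here is a minimal sketch of the kind of query the post describes: a cheap full text pass over a normalized corpus, followed by joins against independently updated feature tables and a re-ranking pass, all in one SQL statement. It is not from the post or the commit; the schema and weights (`documents`, `document_stats`, `user_preferences`, `search_vector`, the 0.1/0.2 coefficients) are hypothetical, chosen only for illustration.

```sql
-- First pass: cheap full text match against the normalized corpus.
WITH first_pass AS (
    SELECT d.id,
           d.title,
           ts_rank(d.search_vector, query) AS text_score
    FROM documents d,
         plainto_tsquery('english', 'postgres full text search') AS query
    WHERE d.search_vector @@ query
    ORDER BY text_score DESC
    LIMIT 1000  -- prune early so later passes stay cheap
),
-- Second pass: join independently updated feature tables at query time
-- instead of denormalizing them into a search index.
reranked AS (
    SELECT f.id,
           f.title,
           f.text_score
             + 0.1 * coalesce(s.click_through_rate, 0)  -- popularity feature
             + 0.2 * coalesce(p.topic_affinity, 0)      -- personalization feature
             AS relevance
    FROM first_pass f
    LEFT JOIN document_stats s ON s.document_id = f.id
    LEFT JOIN user_preferences p ON p.topic = s.topic
                                AND p.user_id = 42      -- the searching user
)
SELECT id, title, relevance
FROM reranked
ORDER BY relevance DESC
LIMIT 10;
```

Because the feature tables stay normalized, updating a click-through rate touches one row in `document_stats` rather than forcing a reindex of denormalized documents in a separate search system.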
