Commit 54201df

touch up

Author: Montana Low
1 parent 11040f0

2 files changed, 7 additions and 3 deletions
pgml-docs/docs/blog/postgres-full-text-search-is-awesome.md

Lines changed: 6 additions & 3 deletions
@@ -26,7 +26,7 @@ This is good enough for most of the use cases out there, without introducing any
 <figcaption>What we were promised</figcaption>
 </figure>
 
-Academics have spent decades inventing many algorithms that use orders of magnitude more compute eking out marginally better results that often aren't worth it in practice. Not to generally disparage academia, their work has consistently improved our world, but we need to pay attention to tradeoffs.
+Academics have spent decades inventing many algorithms that use orders of magnitude more compute eking out marginally better results that often aren't worth it in practice. Not to generally disparage academia, their work has consistently improved our world, but we need to pay attention to tradeoffs. SQL is another acronym similiarly pioneered in the 1970's. One difference between SQL and BM25 is that everyone has heard of the former before reading this blog post, for good reason.
 
 If you actually want to meaningfully improve search results, you generally need to add new data sources. Relevance is much more often revealed by the way other things **_relate_** to the document, rather than the content of the document itself. Google proved the point 23 years ago. Pagerank doesn't rely on the page content itself as much as it uses metadata from _links to the pages_. We live in a connected world and it's the interplay among things that reveal their relevance, whether that is links for websites, sales for products, shares for social posts... It's the greater context around the document that matters.
 
@@ -46,18 +46,20 @@ With a single SQL query, you can do multiple passes of re-ranking, pruning and p
 
 These queries can execute in milliseconds on large production-sized corpora with Postgres's multiple indexing strategies. You can do all of this without adding any new infrastructure to your stack.
 
-The following full blown example is for demonstration purposes only. You may want to try the PostgresML Gym to work up to the full understanding.
+The following full blown example is for demonstration purposes only of a 3rd generation search engine. You can test it for real in the PostgresML Gym to build up a complete understanding.
 
 <center markdown>
 [Try the PostgresML Gym](https://gym.postgresml.org/){ .md-button .md-button--primary }
 </center>
 
 ```sql title="search.sql" linenums="1"
 WITH query AS (
-  -- construct a query context with data that would typically be
+  -- construct a query context with arguments that would typically be
   -- passed in from the application layer
   SELECT
+    -- a keyword query for "my" OR "search" OR "terms"
     tsquery('my | search | terms') AS keywords,
+    -- a user_id for personalization later on
     123456 AS user_id
 ),
 first_pass AS (
@@ -81,6 +83,7 @@ second_pass AS (
   -- grab more data from outside the documents
   JOIN document_embeddings ON document_embeddings.document_id = documents.id
   JOIN user_embeddings ON user_embeddings.user_id = query.user_id
+  -- of course we be re-ranking
   ORDER BY similarity_score DESC
   -- further prune results to top performers for more expensive ranking
   LIMIT 1000
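
The two passes that the `search.sql` hunks comment on (an OR keyword filter, then re-ranking by a similarity score and pruning with a limit) can be sketched outside Postgres. This is only an illustration under stated assumptions: the diff elides the body of `first_pass` and the definition of `similarity_score`, so the plain word matching and the cosine similarity used here are stand-ins, and every function and field name (`keyword_match`, `cosine_similarity`, `search`, `body`, `id`) is hypothetical rather than taken from the commit.

```python
import math


def keyword_match(text, keywords):
    """Hypothetical stand-in for a tsquery OR match: true if any keyword is a word in text."""
    words = set(text.lower().split())
    return any(keyword in words for keyword in keywords)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(documents, doc_embeddings, user_embedding, keywords, limit=1000):
    # First pass: keep only documents that match at least one keyword.
    first_pass = [doc for doc in documents if keyword_match(doc["body"], keywords)]

    # Second pass: pull in data from outside the documents (the embeddings),
    # score each survivor against the user, and re-rank from best to worst.
    scored = [
        (cosine_similarity(doc_embeddings[doc["id"]], user_embedding), doc["id"])
        for doc in first_pass
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Prune to the top performers before any more expensive ranking step.
    return [doc_id for _, doc_id in scored[:limit]]
```

A document that never matches a keyword is dropped in the first pass even if its embedding is the closest to the user's, which is why the cheap filter runs before the more expensive scoring.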

pgml-docs/docs/stylesheets/extra.css

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@
 
 p.author {
   font-size: 0.7rem;
+  margin-bottom: 2em;
 }
 p.author img {
   border-radius: 50%;
