Request fragments #5596

Draft: @dureuill wants to merge 26 commits into main from request-fragments

@dureuill (Contributor) commented May 27, 2025

Usage page: https://www.notion.so/meilisearch/Search-in-images-usage-1c14b06b651f80c1bf9effe56dbeef54

API side

  • Add searchRequestFragments embedder setting
  • Add indexingRequestFragments embedder setting
  • Add multimodal experimental feature
  • Add media search parameter

Implementation side

  • Rename ValueTemplate to InjectableTemplate
  • Add JsonTemplate: a new type that renders every string of a JSON template as a Liquid template, using a document or a search query as context
  • Support searchRequestFragments and indexingRequestFragments in the REST embedder; when present, they change how documents are embedded
  • Add the concepts of input (something that can be embedded) and extractor (a way to turn a document into an input)
  • Modify the API of Embedder:
    • embed_search accepts q and media
    • embed_index accepts a list of inputs and, instead of returning embeddings directly, returns an object that encapsulates the in-flight requests
  • Modify DB format to cater for extractors
    • extractor_id -> (embedder_id, label, kind) where kind = "userProvided" | "fragment" | "documentTemplate"
    • embedder_id -> extractor_ids
    • (extractor_id, vector_id) -> document_ids where vector_id is the index in the arroy writers
  • Upgrade support
    • Dumpless upgrade
    • Dump
    • meilitool + export
  • Stats support
  • New indexer support
    • DocumentChangeExtractorDiff to find out which vectors to add, remove or regenerate depending on a DocumentChange
  • Old settings indexer support (+dump import support)
  • New settings indexer support
    • ExtractorChangeDocumentDiff to find out which vectors to add, remove or regenerate for a document depending on a SettingsChange
  • Support media in search
  • Support media in multisearch
  • Analytics
  • Tests
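
To make the three mappings from the "Modify DB format" item easier to picture, here is a minimal in-memory sketch. It is not the actual on-disk layout: the type names (ExtractorIndex, ExtractorInfo), the ID widths, and the helper methods are all hypothetical, and plain BTreeMaps stand in for the real databases.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The three extractor kinds named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExtractorKind {
    UserProvided,
    Fragment,
    DocumentTemplate,
}

// Hypothetical ID types: the PR description does not state their widths.
pub type ExtractorId = u16;
pub type EmbedderId = u8;
pub type VectorId = u32;
pub type DocumentId = u32;

/// Value side of `extractor_id -> (embedder_id, label, kind)`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExtractorInfo {
    pub embedder_id: EmbedderId,
    pub label: String,
    pub kind: ExtractorKind,
}

/// In-memory stand-in (an assumption, not the real storage) for the three mappings.
#[derive(Debug, Default)]
pub struct ExtractorIndex {
    /// extractor_id -> (embedder_id, label, kind)
    extractors: BTreeMap<ExtractorId, ExtractorInfo>,
    /// embedder_id -> extractor_ids
    by_embedder: BTreeMap<EmbedderId, BTreeSet<ExtractorId>>,
    /// (extractor_id, vector_id) -> document_ids
    documents: BTreeMap<(ExtractorId, VectorId), BTreeSet<DocumentId>>,
}

impl ExtractorIndex {
    /// Registers an extractor and keeps the reverse mapping in sync.
    pub fn register(&mut self, id: ExtractorId, info: ExtractorInfo) {
        let embedder_id = info.embedder_id;
        if let Some(old) = self.extractors.insert(id, info) {
            // The extractor may have moved to another embedder: drop the stale link.
            if let Some(ids) = self.by_embedder.get_mut(&old.embedder_id) {
                ids.remove(&id);
            }
        }
        self.by_embedder.entry(embedder_id).or_default().insert(id);
    }

    /// Records that `doc` owns the vector stored at `vector` for `extractor`.
    pub fn link(&mut self, extractor: ExtractorId, vector: VectorId, doc: DocumentId) {
        self.documents.entry((extractor, vector)).or_default().insert(doc);
    }

    /// All documents that own a vector produced by any extractor of `embedder`.
    pub fn documents_for_embedder(&self, embedder: EmbedderId) -> BTreeSet<DocumentId> {
        let mut out = BTreeSet::new();
        let Some(extractors) = self.by_embedder.get(&embedder) else {
            return out;
        };
        for &extractor in extractors {
            // Keys are ordered by extractor first, so one range scan covers its vectors.
            let range = (extractor, VectorId::MIN)..=(extractor, VectorId::MAX);
            for (_, docs) in self.documents.range(range) {
                out.extend(docs.iter().copied());
            }
        }
        out
    }
}
```

In this sketch, going from an embedder to its documents takes two hops (embedder_id -> extractor_ids, then a range scan over (extractor_id, vector_id) keys), which is one plausible reason for keeping the reverse mapping alongside the extractor table.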

@Kerollmops marked this pull request as draft on May 28, 2025 07:34
@dureuill force-pushed the request-fragments branch from 9dfe57c to 1ed4d9d on June 17, 2025 09:08
@dureuill requested review from Kerollmops and irevoire on June 23, 2025 15:24
@dureuill force-pushed the request-fragments branch 3 times, most recently from 759bac4 to 15a7679, on June 25, 2025 09:07
@dureuill added this to the v1.16.0 milestone on Jun 25, 2025
@dureuill added the db change label (A database was modified) on Jun 25, 2025
Hello, I'm a bot 🤖

You are receiving this message because you declared that this PR makes changes to the Meilisearch database.
Depending on the nature of the change, additional actions might be required on your part. The following sections detail these actions for each kind of change. Please copy the relevant section into the description of your PR, and make sure to perform the required actions.

Thank you for contributing to Meilisearch ❤️

This PR makes forward-compatible changes

Forward-compatible changes are changes to the database such that databases created in an older version of Meilisearch are still valid in the new version of Meilisearch. They usually represent additive changes, like adding a new optional attribute or setting.

  • Detail the changes to the DB format and why they are forward-compatible
  • Forward-compatibility: A database created before this PR and using the features touched by this PR can be opened by a Meilisearch built from the code of this PR.

This PR makes breaking changes

Breaking changes are changes to the database such that databases created in an older version of Meilisearch need changes to remain valid in the new version of Meilisearch. This typically happens when the way the data is stored changes (change of database, new required key, etc.). This can also happen due to breaking changes in the API of an experimental feature. ⚠️ This kind of change is more difficult to achieve safely, so proceed with caution and test the dumpless upgrade right before merging the PR.

  • Detail the changes to the DB format:
    • which are compatible, and why
    • which are not compatible, why, and how they will be fixed up in the upgrade
  • /!\ Ensure all the read operations still work!
    • If the change happened in milli, you may need to check the version of the database before doing any read operation
    • If the change happened in the index-scheduler, make sure the new code can immediately read the old database
    • If the change happened in the meilisearch-auth database, reach out to the team; we don't know yet how to handle these changes
  • Write the code to go from the old database to the new one
    • If the change happened in milli, the upgrade function should be written and called here
    • If the change happened in the index-scheduler, we've never done it yet, but the right place to do it should be here
  • Write an integration test here ensuring you can read the old database, upgrade to the new database, and read the new database as expected

@dureuill force-pushed the request-fragments branch from d034946 to 1fc6890 on June 29, 2025 22:15
@dureuill force-pushed the request-fragments branch from 1fc6890 to f5c23f8 on June 30, 2025 07:54
doc_alloc: &'doc Bump,
) -> Result<Value, Error> {
let document = ParseableDocument::new(document, doc_alloc);
let v: Vec<u32> = vec![];

This will create a fields field, but it will be empty!

  • consider removing the fields field or populating it


impl<'doc> Extractor<'doc> for RequestFragmentExtractor<'doc> {
type DocumentMetadata = ();
type Input = Value;

consider renaming Input?

self.embed_chunks(unused_vectors_distribution)
}

pub fn drain(mut self, unused_vectors_distribution: &C::ErrorMetadata) -> Result<C> {

consider panicking here, although that might be a trade-off we don't want to make 🤔

error,
self.embedder_name,
unused_vectors_distribution,
&self.metadata,

consider passing metadata as owned here and clearing inputs in this case

@@ -389,6 +506,9 @@ impl ArroyWrapper {
let reader = reader?;
let mut searcher = reader.nns(limit);
if let Some(filter) = filter {
if reader.item_ids().is_disjoint(filter) {

@Kerollmops consider doing this at the arroy level
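
As an illustration of what "doing this at the arroy level" could mean, the sketch below moves the disjointness check inside the search method of a toy reader, so callers no longer need to guard the call themselves. ToyReader, its brute-force nns, and the use of BTreeSet in place of a bitmap are all assumptions made for the example; none of this is arroy's actual API.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a vector store reader; this is not arroy's API.
pub struct ToyReader {
    /// (item id, vector) pairs stored in this reader.
    items: Vec<(u32, Vec<f32>)>,
}

impl ToyReader {
    pub fn new(items: Vec<(u32, Vec<f32>)>) -> Self {
        Self { items }
    }

    /// IDs of all the items stored in this reader.
    pub fn item_ids(&self) -> BTreeSet<u32> {
        self.items.iter().map(|(id, _)| *id).collect()
    }

    /// Brute-force nearest neighbours by squared Euclidean distance,
    /// restricted to `filter` when one is given.
    pub fn nns(
        &self,
        query: &[f32],
        limit: usize,
        filter: Option<&BTreeSet<u32>>,
    ) -> Vec<(u32, f32)> {
        if let Some(filter) = filter {
            // Early exit inside the reader: no stored item can match the filter,
            // so the search is skipped without the caller having to check.
            if self.item_ids().is_disjoint(filter) {
                return Vec::new();
            }
        }
        let mut hits: Vec<(u32, f32)> = self
            .items
            .iter()
            .filter(|(id, _)| filter.map_or(true, |f| f.contains(id)))
            .map(|(id, vector)| (*id, squared_distance(query, vector)))
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(limit);
        hits
    }
}

/// Squared Euclidean distance between two vectors of the same length.
fn squared_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

In this brute-force toy the early exit only saves a scan; in a real index it would avoid setting up a search that cannot return anything.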

@dureuill left a comment

Catch and return an error when searchFragments is empty
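
One way such a check could look is sketched below. The comment does not say when the error should be raised, so the condition used here (indexing fragments are defined but no search fragment is) is an assumption, as are the RestEmbedderSettings and SettingsError types and the validate function.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical, trimmed-down view of a REST embedder's settings.
/// Fragments are modelled as name -> JSON template text; the real types differ.
#[derive(Debug, Default)]
pub struct RestEmbedderSettings {
    pub indexing_fragments: BTreeMap<String, String>,
    pub search_fragments: BTreeMap<String, String>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SettingsError {
    /// Indexing fragments exist, but no fragment can embed a search query.
    EmptySearchFragments { embedder_name: String },
}

impl fmt::Display for SettingsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SettingsError::EmptySearchFragments { embedder_name } => write!(
                f,
                "embedder `{embedder_name}` defines indexing fragments but no search fragments"
            ),
        }
    }
}

impl std::error::Error for SettingsError {}

/// Assumed rule: once indexing fragments are used, at least one search
/// fragment must exist, otherwise queries could not be embedded.
pub fn validate(
    embedder_name: &str,
    settings: &RestEmbedderSettings,
) -> Result<(), SettingsError> {
    if !settings.indexing_fragments.is_empty() && settings.search_fragments.is_empty() {
        return Err(SettingsError::EmptySearchFragments {
            embedder_name: embedder_name.to_string(),
        });
    }
    Ok(())
}
```

Returning a dedicated error at settings time, rather than failing later at search time, keeps the failure close to the configuration mistake; whether the PR takes that route is not stated in the comment.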

@dureuill force-pushed the request-fragments branch from bd0c135 to 1c07b17 on July 1, 2025 08:09
@dureuill force-pushed the request-fragments branch from 1c07b17 to 7db5495 on July 1, 2025 08:20