Introduce a new route to export indexes #5670

Open · 28 commits to merge into base: main

Member

@Kerollmops Kerollmops commented Jun 12, 2025

This PR introduces a new route to export/transfer indexes and documents to another Meilisearch instance. It is linked to the Mini-Dashboard PR meilisearch/mini-dashboard#622.

Fixes #5713

To Do

  • Implement the core uploading
  • Add a parameter to define the payload size
  • When indexes is not set, default to "*"
  • Support settings to export patterns
    • Remove the skip embeddings parameter
    • Add a filter setting
    • Support JSON filters (Value not only a string)
  • Create a PRD and usage page
  • Disable it via an experimental feature
  • Add the overrideSettings parameter
  • Count exported documents by index name, not pattern
  • Read and address the DB change advice
  • Talk to the team about using it for the replication (to retrieve late documents). Answer: nope.
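
The pattern-related items above (defaulting to "*" and counting by index name rather than by pattern) can be sketched as follows. This is a minimal illustration, not the code of this PR: `ExportSettings`, `resolve_patterns`, `pattern_matches`, and `count_by_index` are hypothetical names, and the matching rule (a trailing `*` acts as a prefix wildcard) is a simplifying assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the per-pattern export settings.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ExportSettings {
    pub filter: Option<String>,
    pub override_settings: bool,
}

/// When no `indexes` map is supplied, fall back to a single "*" pattern.
pub fn resolve_patterns(
    indexes: Option<BTreeMap<String, ExportSettings>>,
) -> BTreeMap<String, ExportSettings> {
    indexes.unwrap_or_else(|| BTreeMap::from([("*".to_string(), ExportSettings::default())]))
}

/// Assumed matching rule: a trailing `*` is a prefix wildcard,
/// any other pattern must equal the index uid exactly.
pub fn pattern_matches(pattern: &str, uid: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => uid.starts_with(prefix),
        None => pattern == uid,
    }
}

/// Tallies exported documents under the concrete index uid,
/// not under the pattern that happened to match it.
pub fn count_by_index(
    patterns: &BTreeMap<String, ExportSettings>,
    indexes: &[(&str, u64)],
) -> BTreeMap<String, u64> {
    let mut counts = BTreeMap::new();
    for (uid, documents) in indexes {
        if patterns.keys().any(|p| pattern_matches(p, uid)) {
            *counts.entry(uid.to_string()).or_insert(0) += *documents;
        }
    }
    counts
}
```

Keying the result by index uid means one pattern such as "movie*" can report separate counts for each index it matched.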

This PR makes forward-compatible changes

Forward-compatible changes are changes to the database such that databases created in an older version of Meilisearch are still valid in the new version of Meilisearch. They usually represent additive changes, like adding a new optional attribute or setting.

  • Detail the changes to the DB format and why they are forward compatible.
    We introduce a new Export task in the task queue. There are no settings, experimental flags, or anything but a new task type.
  • Forward compatibility: A Meilisearch produced by this PR's code could open a database created before this PR and using the features touched by this PR.
    I generated a database with Meilisearch built from the main branch and opened it with Meilisearch built from this branch. Once opened, I called the export route to export documents to another Meilisearch instance.
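
As a rough illustration of why adding a task type is forward compatible, here is a minimal sketch. The enum, its variants, and the stored names are hypothetical and much simpler than the real task queue; the point is only that a database written before the new variant existed never contains it, so the new code can still decode every stored entry.

```rust
/// Hypothetical, simplified task kinds; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    DocumentAdditionOrUpdate,
    IndexCreation,
    /// The newly added variant; older databases never contain it.
    Export,
}

impl Kind {
    /// Decodes a stored kind name (names are assumed for illustration).
    /// Unknown names are reported as an error.
    pub fn from_stored(name: &str) -> Result<Kind, String> {
        match name {
            "documentAdditionOrUpdate" => Ok(Kind::DocumentAdditionOrUpdate),
            "indexCreation" => Ok(Kind::IndexCreation),
            "export" => Ok(Kind::Export),
            other => Err(format!("unknown task kind: {other}")),
        }
    }
}

/// Decodes every stored task kind, failing on the first unknown one.
pub fn open_task_queue(stored: &[&str]) -> Result<Vec<Kind>, String> {
    stored.iter().map(|name| Kind::from_stored(name)).collect()
}
```

The reverse direction does not hold in this sketch: a decoder without the `Export` arm would reject a queue that contains an export task.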

@Kerollmops Kerollmops added this to the v1.16.0 milestone Jun 12, 2025
@Kerollmops Kerollmops added the db change A database was modified label Jun 12, 2025

Hello, I'm a bot 🤖

You are receiving this message because you declared that this PR makes changes to the Meilisearch database.
Depending on the nature of the change, additional actions might be required on your part. The following sections detail those actions. Please copy the relevant section into the description of your PR, and make sure to perform the required actions.

Thank you for contributing to Meilisearch ❤️

This PR makes forward-compatible changes

Forward-compatible changes are changes to the database such that databases created in an older version of Meilisearch are still valid in the new version of Meilisearch. They usually represent additive changes, like adding a new optional attribute or setting.

  • Detail the changes to the DB format and why they are forward compatible
  • Forward-compatibility: A database created before this PR and using the features touched by this PR was able to be opened by a Meilisearch produced by the code of this PR.

This PR makes breaking changes

Breaking changes are changes to the database such that databases created in an older version of Meilisearch need changes to remain valid in the new version of Meilisearch. This typically happens when the way the data is stored changes (change of database, new required key, etc.). This can also happen due to breaking changes in the API of an experimental feature. ⚠️ This kind of change is more difficult to achieve safely, so proceed with caution and test the dumpless upgrade right before merging the PR.

  • Detail the changes to the DB format,
    • which are compatible, and why
    • which are not compatible, why, and how they will be fixed up in the upgrade
  • /!\ Ensure all the read operations still work!
    • If the change happened in milli, you may need to check the version of the database before doing any read operation
    • If the change happened in the index-scheduler, make sure the new code can immediately read the old database
    • If the change happened in the meilisearch-auth database, reach out to the team; we don't know yet how to handle these changes
  • Write the code to go from the old database to the new one
    • If the change happened in milli, the upgrade function should be written and called here
    • If the change happened in the index-scheduler, we have never done it so far, but the right place to do it should be here
  • Write an integration test here ensuring you can read the old database, upgrade to the new database, and read the new database as expected

@Kerollmops Kerollmops changed the title from "Introduce a new route to export documents" to "Introduce a new route to export indexes" Jun 12, 2025
@Kerollmops Kerollmops force-pushed the export-and-transfer-route branch 2 times, most recently from a7685fe to 4ce1621 Compare June 16, 2025 12:46
@Kerollmops Kerollmops force-pushed the export-and-transfer-route branch from 3335525 to a743da3 Compare June 25, 2025 13:27
@Kerollmops Kerollmops force-pushed the export-and-transfer-route branch from d21c1fc to 6303121 Compare June 26, 2025 11:57
@Kerollmops Kerollmops force-pushed the export-and-transfer-route branch from 1bb7abb to 0f1dd36 Compare June 26, 2025 16:11
@Kerollmops Kerollmops force-pushed the export-and-transfer-route branch from 445b8e1 to 7219299 Compare June 27, 2025 10:33
@Kerollmops Kerollmops marked this pull request as ready for review June 27, 2025 13:08
@Kerollmops Kerollmops marked this pull request as draft June 27, 2025 14:36
@Kerollmops Kerollmops requested a review from Mubelotix June 30, 2025 10:11
@Kerollmops Kerollmops marked this pull request as ready for review June 30, 2025 11:35
@Kerollmops Kerollmops requested a review from Mubelotix June 30, 2025 16:59
Comment on lines +181 to +182
ExporingTheSettings,
ExporingTheDocuments,
Contributor

Typo

Suggested change
- ExporingTheSettings,
- ExporingTheDocuments,
+ ExportingTheSettings,
+ ExportingTheDocuments,

// 3. we batch the export.
let to_export = self.queue.tasks.get_kind(rtxn, Kind::Export)? & enqueued;
if !to_export.is_empty() {
let task_id = to_export.iter().next().expect("There must be only one export task");
Contributor

Suggested change
- let task_id = to_export.iter().next().expect("There must be only one export task");
+ let task_id = to_export.iter().next().expect("There must be at least one export task");
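
The reworded message matches what the surrounding code actually guarantees: the set is non-empty, not that it holds a single task. A minimal sketch of the same selection follows, using `BTreeSet<u32>` as a stand-in for the scheduler's task-id bitmap; the function name `next_export_task` is hypothetical, and returning an `Option` instead of calling `expect` is an alternative shown here for illustration, not what the PR does.

```rust
use std::collections::BTreeSet;

/// Returns the lowest task id that is both an export task and enqueued,
/// or `None` when no such task exists.
pub fn next_export_task(export_tasks: &BTreeSet<u32>, enqueued: &BTreeSet<u32>) -> Option<u32> {
    // `intersection` yields the common ids in ascending order,
    // so the first item is the oldest enqueued export task.
    export_tasks.intersection(enqueued).next().copied()
}
```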

tag = "Export",
security(("Bearer" = ["export", "*"])),
responses(
(status = OK, description = "Known nodes are returned", body = Export, content_type = "application/json", example = json!(
Contributor

Suggested change
- (status = OK, description = "Known nodes are returned", body = Export, content_type = "application/json", example = json!(
+ (status = 202, description = "Export successfully enqueued", body = SummarizedTaskView, content_type = "application/json", example = json!(

Comment on lines +74 to +76
// TODO make it experimental?
// index_scheduler.features().check_network("Using the /network route")?;

Contributor

Suggested change
- // TODO make it experimental?
- // index_scheduler.features().check_network("Using the /network route")?;

Comment on lines 74 to 76
request.send_string("").map_err(into_backoff_error)
})?;
let already_existed = response.status() == 200;
Member Author

Suggested change
- request.send_string("").map_err(into_backoff_error)
- })?;
- let already_existed = response.status() == 200;
+ request.send_bytes(Default::default()).map_err(into_backoff_error)
+ })?;
+ let index_exists = response.status() == 200;

Comment on lines 698 to 699
- indexes: BTreeMap<IndexUidPattern, DetailsExportIndexSettings>,
+ indexes: BTreeMap<String, DetailsExportIndexSettings>,
Member Author

Nope, can we keep the index UID pattern, here, please?

Comment on lines 426 to 427
- indexes: indexes.iter().map(|(p, s)| (p.clone(), s.clone().into())).collect(),
+ indexes: BTreeMap::new(),
Member Author

Keep this.

Comment on lines 370 to 371
- indexes: indexes.iter().map(|(p, s)| (p.clone(), s.clone().into())).collect(),
+ indexes: BTreeMap::new(),
Member Author

Keep this.

Comment on lines 296 to 297
- indexes: indexes.iter().map(|(p, s)| (p.clone(), s.clone().into())).collect(),
+ indexes: BTreeMap::new(),
Member Author

And keep this.

@@ -127,7 +157,7 @@ impl IndexScheduler {
progress.update_progress(progress_step);

output.insert(
- (*pattern).clone(),
+ uid.clone(),
Member Author

@Kerollmops Kerollmops Jul 1, 2025

We just need to convert the uid into an IndexUidPattern, here.
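
One way to picture this conversion is the sketch below. The `IndexUidPattern` here is a local stand-in newtype written for this example, not the real Meilisearch type, whose constructors may differ; `record_exported` is likewise a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Local stand-in for the real pattern type, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct IndexUidPattern(String);

impl From<&str> for IndexUidPattern {
    /// A concrete index uid is a pattern that matches only itself.
    fn from(uid: &str) -> Self {
        IndexUidPattern(uid.to_string())
    }
}

/// Records the exported document count under the concrete index uid,
/// while keeping `IndexUidPattern` as the key type of the output map.
pub fn record_exported(output: &mut BTreeMap<IndexUidPattern, u64>, uid: &str, exported: u64) {
    output.insert(IndexUidPattern::from(uid), exported);
}
```

Keeping the pattern type as the key leaves the map's signature unchanged while each entry still names a single index.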

@@ -30,7 +30,7 @@ impl IndexScheduler {
payload_size: Option<&Byte>,
indexes: &BTreeMap<IndexUidPattern, ExportIndexSettings>,
progress: Progress,
- ) -> Result<BTreeMap<IndexUidPattern, DetailsExportIndexSettings>> {
+ ) -> Result<BTreeMap<String, DetailsExportIndexSettings>> {
Member Author

Keep this IndexUidPattern, please.

Labels
db change A database was modified
Development

Successfully merging this pull request may close these issues.

Export/Transfert
2 participants