fixed misleading error message for max_fields_limit_exceeded #5522
Conversation
Hello @CodeMan62,

@ManyTheFish Done

Hello @CodeMan62, could you please build a message like this? Don't hesitate to ask if you're missing any information!

Hey @ManyTheFish
crates/milli/src/error.rs
Outdated
#[error("A single index cannot have more than 65,535 unique fields across all documents.\n\
note: prior to this batch of tasks, the index already contained 65,535 unique fields.\n\
note: other documents from the same batch might have successfully added new unique fields before this one")]
This error message deserves to be less hardcoded; for instance, the 65,535 is not a completely hardcoded value in Meilisearch and corresponds to FieldId::MAX as u32 + 1.

The good way, in my sense, would be to define a MAX_ATTRIBUTES_PER_INDEX in meilisearch/crates/milli/src/lib.rs, next to this existing constant (Line 129 in 3f683c4):

pub const MAX_POSITION_PER_ATTRIBUTE: u32 = u16::MAX as u32 + 1;
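A minimal sketch of the suggested constant, mirroring MAX_POSITION_PER_ATTRIBUTE (the name MAX_ATTRIBUTES_PER_INDEX comes from the review above; the local FieldId alias is a stand-in for milli's actual type):

```rust
// Stand-in for milli's FieldId alias; in milli, field ids are u16.
type FieldId = u16;

// Proposed constant: the value the error message should derive from,
// instead of hardcoding "65,535" in the format string.
pub const MAX_ATTRIBUTES_PER_INDEX: u32 = FieldId::MAX as u32 + 1;

fn main() {
    // u16::MAX is 65_535, so the limit on distinct field ids is 65_536.
    assert_eq!(MAX_ATTRIBUTES_PER_INDEX, 65_536);
}
```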
"... the index already contained 65,535 unique fields ...": here the value should be computed dynamically. There is a scenario where there are, say, 65,500 fields in the index and a document brings 40 new fields; the perfect message would be "... Adding the document with id {document_id} would add {new_field_count} new unique fields to the index ... the index already contained {number_of_field_before_extracting_document} unique fields ..."
-> "... Adding the document with id 1 would add 40 new unique fields to the index ... the index already contained 65,500 unique fields ..."
We are a bit picky, but the goal here is to help the users as much as possible to understand why their specific usage triggers an error. So returning a personalized message is really helpful 😄
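A sketch of building the personalized message described above; the helper function and its parameter names are illustrative, not milli's actual API:

```rust
// Illustrative helper producing the dynamic message the reviewer asked for.
// All names here (attribute_limit_message, its parameters) are assumptions.
fn attribute_limit_message(document_id: &str, new_field_count: usize, existing: usize) -> String {
    format!(
        "Adding the document with id {document_id} would add {new_field_count} \
         new unique fields to the index. The index already contained {existing} unique fields."
    )
}

fn main() {
    let msg = attribute_limit_message("1", 40, 65_500);
    assert!(msg.contains("would add 40 new unique fields"));
    assert!(msg.contains("already contained 65500 unique fields"));
    println!("{msg}");
}
```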
Hey @ManyTheFish, I am here with a silly question. I have this variant:

AttributeLimitReached {
    document_id: String,
    new_field_count: usize,
    no_of_existing_fields: usize,
    max: u32,
},

and this is the function it should be used in:

fn create_fields_mapping(
    index_field_map: &mut FieldIdMapWithMetadata,
    batch_field_map: &DocumentsBatchIndex,
) -> Result<HashMap<FieldId, FieldId>> {
    batch_field_map
        .iter()
        // we sort by id here to ensure a deterministic mapping of the fields, that preserves
        // the original ordering.
        .sorted_by_key(|(&id, _)| id)
        .map(|(field, name)| match index_field_map.id(name) {
            Some(id) => Ok((*field, id)),
            None => index_field_map
                .insert(name)
                .ok_or(Error::UserError(UserError::AttributeLimitReached))
                .map(|id| (*field, id)),
        })
        .collect()
}

This will make this PR complete faster.
Hey @CodeMan62,
Most of the time, you can access the document ID, but in the example you provided, you can't because it's a complete batch that we are processing. I want you to know that this code will be removed in the future because we are refactoring this part so that you could have an
To ease the implementation and see how much information we are missing, I suggest putting everything optional in your structure and filling it out as much as possible. We will remove the useless Then, for the
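Following the "put everything optional" suggestion above, a sketch of what the variant could look like (the field names match the earlier draft; their Option wrapping is the point of the suggestion):

```rust
// Sketch of the suggested variant with every piece of context optional,
// so it can be filled in only where the information is actually available.
#[derive(Debug)]
enum UserError {
    AttributeLimitReached {
        document_id: Option<String>,          // not always available mid-batch
        new_field_count: Option<usize>,       // fields this document would add
        no_of_existing_fields: Option<usize>, // fields before extracting the document
        max: u32,                             // the hard limit itself
    },
}

fn main() {
    let err = UserError::AttributeLimitReached {
        document_id: None, // unknown in the batch-processing path described above
        new_field_count: Some(40),
        no_of_existing_fields: Some(65_500),
        max: u16::MAX as u32 + 1,
    };
    // The Debug output carries whatever information was collected.
    assert!(format!("{err:?}").contains("65500"));
}
```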
Thanks for the clarification, will do it ASAP.

@ManyTheFish one thing I want to point out is that I will not put everything optional; I will try to debug as much as I can and try to refactor the code.

Hey @ManyTheFish, can you please check my latest commit and tell me if the modification I made in the primary_key.rs file is good to go? If it is, just tell me so I can finish the remaining work.
@CodeMan62 I made a review, we are close to the end I feel
crates/milli/src/fields_ids_map.rs
Outdated

@@ -107,12 +107,16 @@ impl crate::documents::FieldIdMapper for FieldsIdsMap {

pub trait MutFieldIdMapper {
    fn insert(&mut self, name: &str) -> Option<FieldId>;
    fn len(&mut self) -> i32;
i32? Why not a usize or a u32?
Ok, I will do usize.
@@ -118,9 +118,12 @@ impl<'indexing> GlobalFieldsIdsMap<'indexing> {
        self.local.metadata(id)
    }
}

#[warn(unconditional_recursion)]
Nope, I suggest finding a workaround to that:

#[warn(unconditional_recursion)]

This warning comes from the fact that you are calling GlobalFieldsIdsMap::len() in GlobalFieldsIdsMap::len(). You may want to call len on the self.local instead.
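A sketch of the delegation fix suggested here, with simplified stand-ins for milli's actual structs (only the local/global split and the len forwarding are the point):

```rust
// Simplified stand-in for milli's local fields-ids map.
struct LocalFieldsIdsMap {
    names: Vec<String>,
}

impl LocalFieldsIdsMap {
    fn len(&self) -> usize {
        self.names.len()
    }
}

// Simplified stand-in for GlobalFieldsIdsMap: it wraps a local map.
struct GlobalFieldsIdsMap {
    local: LocalFieldsIdsMap,
}

impl GlobalFieldsIdsMap {
    fn len(&self) -> usize {
        // Delegating to the inner map avoids the unconditional recursion
        // that calling GlobalFieldsIdsMap::len() on itself would cause.
        self.local.len()
    }
}

fn main() {
    let map = GlobalFieldsIdsMap {
        local: LocalFieldsIdsMap { names: vec!["title".into(), "body".into()] },
    };
    assert_eq!(map.len(), 2);
}
```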
Ok.
EDIT: Thanks for the review, I will fix this and also update all the places where AttributeLimitReached is used. Any remaining review?
No, if the tests are green we can merge :)
How? I mean, if we have a warning, how will the CI go further?
How? I mean, if we have a warning, how will the CI go further?
Ah sorry, the warning must be fixed and not skipped 😬
Solved it.

@ManyTheFish I request you to stop the CI until the PR gets ready; also, see my latest commit, I found a solution for the warning we were getting.

@ManyTheFish all the tests are passing and everything is done, let me know if anything is left?

Hey @CodeMan62, you must still fix clippy 😬

Hey @irevoire, clippy is fixed this time

@ManyTheFish review; if it looks fine then let me just do
Hey @CodeMan62, it seems two failing tests remain.

Rustfmt is not happy: Diff in /home/runner/work/meilisearch/meilisearch/crates/meilisearch-types/src/error.rs:413:

     UserError::InvalidStoreFile => Code::InvalidStoreFile,
     UserError::NoSpaceLeftOnDevice => Code::NoSpaceLeftOnDevice,
     UserError::MaxDatabaseSizeReached => Code::DatabaseSizeLimitReached,
-    UserError::AttributeLimitReached{ .. } => Code::MaxFieldsLimitExceeded,
+    UserError::AttributeLimitReached { .. } => Code::MaxFieldsLimitExceeded,
     UserError::InvalidFilter(_) => Code::InvalidSearchFilter,
     UserError::InvalidFilterExpression(..) => Code::InvalidSearchFilter,
     UserError::FilterOperatorNotAllowed { .. } => Code::InvalidSearchFilter,
Yes, that is what I was asking: if the implementation is fine, then I just fix the test and fmt. Let me do both now.

@ManyTheFish everything is passing, I think it is good to go now.
@@ -1619,7 +1619,7 @@ async fn error_document_field_limit_reached_over_multiple_documents() {
        "indexedDocuments": 0
    },
    "error": {
-       "message": "A document cannot contain more than 65,535 fields.",
+       "message": "Adding the document with id None would add 1 new unique field to the index. The index already contained 65535 unique fields",
I am not sure I understand why the document id is not found 🤔
Moreover, this document is adding more than 1 field 🤔

-    "message": "Adding the document with id None would add 1 new unique field to the index. The index already contained 65535 unique fields",
+    "message": "Adding the document with id "wow" would add 1 new unique field to the index. The index already contained 65535 unique fields",
let me refactor again and see what I have missed
So what I did is I made every document_id None, and that's what I shouldn't do. I am refactoring it.

Not making it a draft; this work will be complete by Monday.
@ManyTheFish can you help me here and address what is wrong?
    let field_id =
        fields_ids_map.id_or_insert(field_name).ok_or(UserError::AttributeLimitReached {
-           document_id: Some(doc_id),
+           document_id: field_name.to_string(),
Document_id

I'd say it's the most important part: field_name.to_string() is the name of the field you are currently reading, not the value nor the document id. If you want to have the document id, you have to pass it to write_to_obkv and use it every time you want to return UserError::AttributeLimitReached in this function.

The caller of the function is here:
meilisearch/crates/milli/src/update/new/extract/documents.rs, Lines 121 to 126 in 83cd28b:

let content = write_to_obkv(
    &content,
    vector_content.as_ref(),
    &mut new_fields_ids_map,
    &mut document_buffer,
)?;
And you can retrieve the external document id using the document change described here:
meilisearch/crates/milli/src/update/new/document_change.rs, Lines 16 to 38 in 83cd28b:

pub enum DocumentChange<'doc> {
    Deletion(Deletion<'doc>),
    Update(Update<'doc>),
    Insertion(Insertion<'doc>),
}

pub struct Deletion<'doc> {
    docid: DocumentId,
    external_document_id: &'doc str,
}

pub struct Update<'doc> {
    docid: DocumentId,
    external_document_id: &'doc str,
    new: Versions<'doc>,
    from_scratch: bool,
}

pub struct Insertion<'doc> {
    docid: DocumentId,
    external_document_id: &'doc str,
    new: Versions<'doc>,
}
new_field_count

To compute the new_field_count, I suggest not returning the error directly but incrementing a counter instead that will be used to produce the error:

let Some(field_id) = fields_ids_map.id_or_insert(field_name) else {
    refused_fields.insert(field_name); // HashSet
    continue;
};

// then at the end of the function
if refused_fields.is_empty() {
    writer.finish().unwrap();
    Ok(KvReaderFieldId::from_slice(document_buffer))
} else {
    Err(UserError::AttributeLimitReached {
        document_id: external_document_id,
        new_field_count: refused_fields.len(),
        number_of_existing_field: no_of_existing_fields,
    })
}
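A self-contained sketch of this counting pattern, with simplified stand-ins for milli's types (the real code works on obkv buffers and milli's fields-ids map; the tiny MAX_FIELDS limit is only for illustration):

```rust
use std::collections::{HashMap, HashSet};

// Tiny limit for illustration; milli's actual limit is u16::MAX as u32 + 1.
const MAX_FIELDS: usize = 4;

#[derive(Debug, PartialEq)]
struct AttributeLimitReached {
    document_id: String,
    new_field_count: usize,
    number_of_existing_fields: usize,
}

// Instead of failing on the first refused field, collect them and report
// the full count plus the field count before this document, as suggested.
fn insert_fields(
    fields_ids_map: &mut HashMap<String, usize>,
    document_id: &str,
    field_names: &[&str],
) -> Result<(), AttributeLimitReached> {
    let existing = fields_ids_map.len();
    let mut refused_fields = HashSet::new();
    for &name in field_names {
        if fields_ids_map.contains_key(name) {
            continue; // already known, costs nothing
        }
        if fields_ids_map.len() < MAX_FIELDS {
            let id = fields_ids_map.len();
            fields_ids_map.insert(name.to_string(), id);
        } else {
            refused_fields.insert(name); // count it, keep going
        }
    }
    if refused_fields.is_empty() {
        Ok(())
    } else {
        Err(AttributeLimitReached {
            document_id: document_id.to_string(),
            new_field_count: refused_fields.len(),
            number_of_existing_fields: existing,
        })
    }
}

fn main() {
    let mut map = HashMap::new();
    insert_fields(&mut map, "doc-0", &["a", "b", "c"]).unwrap();
    let err = insert_fields(&mut map, "doc-1", &["d", "e", "f"]).unwrap_err();
    // "d" still fits (4th field); "e" and "f" are refused.
    assert_eq!(err.new_field_count, 2);
    assert_eq!(err.number_of_existing_fields, 3);
    assert_eq!(err.document_id, "doc-1");
}
```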
Thanks for this, I will do it ASAP.
Pull Request
Related issue
Fixes #5508
What does this PR do?
Fixes the misleading error message for max_fields_limit_exceeded.
PR checklist
Please check if your PR fulfills the following requirements:
Thank you so much for contributing to Meilisearch!