
fixed misleading error message for max_fields_limit_exceeded #5522


Open

CodeMan62 wants to merge 19 commits into main

Conversation

@CodeMan62 (Contributor)

Pull Request

Related issue

Fixes #5508

What does this PR do?

Fixes misleading error message for max_fields_limit_exceeded

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

@ManyTheFish (Member)

Hello @CodeMan62,
Thank you for this PR.
Can you fix the failing test? In this case, it is quite simple:

  1. run cargo insta test -- --test-threads=4 to run the tests
  2. run cargo insta review to review all the modified snapshots

@CodeMan62 (Contributor, Author)

@ManyTheFish Done

@ManyTheFish (Member)

Hello @CodeMan62,
The new error message is better than the previous one. 👍
However, the "Suggestion for the new error message" section of the related issue suggests a more explicit message with the previous count of unique fields, the number of additional fields, and some notes.

Could you please build a message like this? Don't hesitate to ask if you're missing any information!

@CodeMan62 (Contributor, Author)

Hey @ManyTheFish,
let me see what I can do here.

Comment on lines 111 to 112
#[error("A single index cannot have more than 65,535 unique fields across all documents. \n\ note: prior to this batch of tasks, the index already contained 65,535 unique fields.\n\
note: other documents from the same batch might have successfully added new unique fields before this one")]
@ManyTheFish (Member), Apr 29, 2025

This error message deserves to be less hardcoded; for instance, 65,535 is not an arbitrary value in Meilisearch, it corresponds to FieldId::MAX as u32 + 1.

The right way, in my opinion, would be to define a MAX_ATTRIBUTES_PER_INDEX next to

pub const MAX_POSITION_PER_ATTRIBUTE: u32 = u16::MAX as u32 + 1;

and use this constant to generate the error.
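
For illustration, a minimal sketch of what that could look like (the constant name and the thiserror wiring are suggestions, not the final code):

// Hypothetical companion constant, mirroring the FieldId::MAX as u32 + 1 noted above.
pub const MAX_ATTRIBUTES_PER_INDEX: u32 = u16::MAX as u32 + 1;

#[error("A single index cannot have more than {} unique fields across all documents.", MAX_ATTRIBUTES_PER_INDEX)]
AttributeLimitReached,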

"... the index already contained 65,535 unique fields ...", here the value should be computed dynamically, there is a scenario where there are like 65500 fields in the index, and a document brings 40 new fields, the perfect message would be "... Adding the document with id {document_id} would add {new_field_count} new unique field to the index ... the index already contained {number_of_field_before_extracting_document} unique fields ..." -> "... Adding the document with id 1 would add 40 new unique field to the index ... the index already contained 65500 unique fields ..."

We are a bit picky, but the goal here is to help the users as much as possible to understand why their specific usage triggers an error. So returning a personalized message is really helpful 😄
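
As a hedged sketch, such a parameterized variant could look like this with thiserror (the field names are illustrative, not the final API):

#[error("Adding the document with id {document_id} would add {new_field_count} new unique fields to the index. The index already contained {existing_field_count} unique fields.")]
AttributeLimitReached {
    document_id: String,
    new_field_count: usize,
    existing_field_count: usize,
},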

@CodeMan62 (Contributor, Author)

Hey @ManyTheFish, I have a silly question: UserError::AttributeLimitReached is used in so many functions that I am completely confused. Can you tell me how I would use it in the function below, given my updated AttributeLimitReached?

Updated AttributeLimitReached:

AttributeLimitReached {
    document_id: String,
    new_field_count: usize,
    no_of_existing_fields: usize,
    max: u32,
},

The function it is used in:

fn create_fields_mapping(
    index_field_map: &mut FieldIdMapWithMetadata,
    batch_field_map: &DocumentsBatchIndex,
) -> Result<HashMap<FieldId, FieldId>> {
    batch_field_map
        .iter()
        // we sort by id here to ensure a deterministic mapping of the fields, that preserves
        // the original ordering.
        .sorted_by_key(|(&id, _)| id)
        .map(|(field, name)| match index_field_map.id(name) {
            Some(id) => Ok((*field, id)),
            None => index_field_map
                .insert(name)
                .ok_or(Error::UserError(UserError::AttributeLimitReached))
                .map(|id| (*field, id)),
        })
        .collect()
}

This would help me complete the PR faster.
Thanks 🙏

@ManyTheFish (Member)

Hey @CodeMan62,
Indeed, the issue is trickier than it seems. You have several different cases where AttributeLimitReached is returned:

  1. When extracting document fields
  2. When extracting searchable attributes
  3. When extracting faceted attributes
  4. When extracting geo fields
  5. When extracting vector fields

Most of the time, you can access the document ID, but in the example you provided, you can't, because it's a complete batch that we are processing. Note that this code will be removed in the future because we are refactoring this part, so you could use an Option<String> for the document_id and fill it in when you can.
For the counters, you could refactor a bit of the code to:

  1. Count the initial fields.
  2. Collect all the refused fields in a vector before returning the error.

To ease the implementation and see how much information we are missing, I suggest making everything optional in your structure and filling it in as much as possible. We will remove the useless Options afterward.

Then, for the max value, don't store it; it's a constant value for now, so I suggest doing what I said before:
#5522 (comment)
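
For instance, a minimal sketch of the "everything optional" shape suggested above (the field names are illustrative):

AttributeLimitReached {
    document_id: Option<String>,
    new_field_count: Option<usize>,
    no_of_existing_fields: Option<usize>,
},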

@CodeMan62 (Contributor, Author)

Thanks for the clarification, I will do it ASAP.

@CodeMan62 (Contributor, Author)

@ManyTheFish, one thing I want to point out: I will not make everything optional. I will try to debug as much as I can and refactor the code.

@CodeMan62 (Contributor, Author)

Hey @ManyTheFish, can you please check my latest commit and tell me whether the modification I made in the primary_key.rs file is good to go? If it is, just let me know so I can finish the remaining work.

@ManyTheFish (Member) left a comment

@CodeMan62 I made a review; I feel we are close to the end.

@@ -107,12 +107,16 @@ impl crate::documents::FieldIdMapper for FieldsIdsMap {

pub trait MutFieldIdMapper {
    fn insert(&mut self, name: &str) -> Option<FieldId>;
    fn len(&mut self) -> i32;
Member:

i32? why not a usize or a u32?

@CodeMan62 (Contributor, Author), May 14, 2025

OK, I will use usize.

@@ -118,9 +118,12 @@ impl<'indexing> GlobalFieldsIdsMap<'indexing> {
        self.local.metadata(id)
    }
}

#[warn(unconditional_recursion)]
Member:

Nope, I suggest finding a workaround to that:

Suggested change
#[warn(unconditional_recursion)]

This warning comes from the fact that you are calling GlobalFieldsIdsMap::len() in GlobalFieldsIdsMap::len()

You may want to call len on self.local.
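
A minimal sketch of that workaround, assuming self.local exposes a len of its own (the surrounding impl is abridged):

impl MutFieldIdMapper for GlobalFieldsIdsMap<'_> {
    // ...other trait methods elided...

    fn len(&mut self) -> usize {
        // Delegate to the local map; GlobalFieldsIdsMap::len() calling
        // itself is what triggered the unconditional_recursion warning.
        self.local.len()
    }
}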

@CodeMan62 (Contributor, Author), May 14, 2025

OK.

EDIT: Thanks for the review. I will fix this and also update all the places where AttributeLimitReached is used. Any remaining review comments?

Member:

No, if the tests are green we can merge :)

Contributor Author:

What I mean is: if we have a warning, how will the CI go any further?

Member:

What I mean is: if we have a warning, how will the CI go any further?

Ah sorry, the warning must be fixed and not skipped 😬

Contributor Author:

Solved it.

CodeMan62 marked this pull request as draft on May 15, 2025 at 09:09
@CodeMan62 (Contributor, Author)

@ManyTheFish, I request that you stop the CI until the PR is ready. Also, see my latest commit: I found a solution for the warning we were getting.

@CodeMan62 (Contributor, Author)

@ManyTheFish, all the tests are passing and everything is done. Let me know if anything is left.

@irevoire (Member)

Hey @CodeMan62 you must still fix clippy 😬
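
(For reference, a typical local check is cargo clippy --all-targets -- -D warnings, though the exact flags the CI uses may differ.)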

@CodeMan62 (Contributor, Author)

Hey @CodeMan62 you must still fix clippy 😬

Hey @irevoire, clippy is fixed this time.

CodeMan62 marked this pull request as ready for review on May 20, 2025 at 09:36
@CodeMan62 (Contributor, Author)

@ManyTheFish, please review. If it looks fine, I'll just run cargo insta review, and then this PR should be good to go.

@ManyTheFish (Member), May 20, 2025

Hey @CodeMan62,

it seems two failing tests remain:

  • error_document_field_limit_reached_in_one_document (crates/meilisearch/tests/documents/add_documents.rs:1525)
  • error_document_field_limit_reached_over_multiple_documents (crates/meilisearch/tests/documents/add_documents.rs:1608)

Rustfmt is not happy:

Diff in /home/runner/work/meilisearch/meilisearch/crates/meilisearch-types/src/error.rs:413:
                     UserError::InvalidStoreFile => Code::InvalidStoreFile,
                     UserError::NoSpaceLeftOnDevice => Code::NoSpaceLeftOnDevice,
                     UserError::MaxDatabaseSizeReached => Code::DatabaseSizeLimitReached,
-                    UserError::AttributeLimitReached{ .. } => Code::MaxFieldsLimitExceeded,
+                    UserError::AttributeLimitReached { .. } => Code::MaxFieldsLimitExceeded,
                     UserError::InvalidFilter(_) => Code::InvalidSearchFilter,
                     UserError::InvalidFilterExpression(..) => Code::InvalidSearchFilter,
                     UserError::FilterOperatorNotAllowed { .. } => Code::InvalidSearchFilter,

@CodeMan62 (Contributor, Author)

Yes, that is what I was asking: if the implementation is fine, then I'll just fix the tests and fmt. Let me do both now.

@CodeMan62 (Contributor, Author)

@ManyTheFish, everything is passing; I think it is good to go now.

@@ -1619,7 +1619,7 @@ async fn error_document_field_limit_reached_over_multiple_documents() {
         "indexedDocuments": 0
     },
     "error": {
-        "message": "A document cannot contain more than 65,535 fields.",
+        "message": "Adding the document with id None would add 1 new unique field to the index. The index already contained 65535 unique fields",
Member:

I am not sure I understand why the document id is not found 🤔
Moreover, this document is adding more than 1 field 🤔

Suggested change
- "message": "Adding the document with id None would add 1 new unique field to the index. The index already contained 65535 unique fields",
+ "message": "Adding the document with id \"wow\" would add 1 new unique field to the index. The index already contained 65535 unique fields",

Contributor Author:

Let me refactor again and see what I have missed.

Contributor Author:

So what I did is make every document_id None, which is exactly what I shouldn't have done. I am refactoring it.

@CodeMan62 (Contributor, Author)

Not marking it as a draft; this work will be complete by Monday.

@CodeMan62 (Contributor, Author)

@ManyTheFish, can you help me here and point out what is wrong?

let field_id =
    fields_ids_map.id_or_insert(field_name).ok_or(UserError::AttributeLimitReached {
        document_id: Some(doc_id),
        document_id: field_name.to_string(),
Member:

document_id

I'd say this is the most important part: field_name.to_string() is the name of the field you are currently reading, neither the value nor the document id.

If you want the document id, you have to pass it to write_to_obkv and use it every time you want to return UserError::AttributeLimitReached in this function.

The caller of the function is here:

let content = write_to_obkv(
    &content,
    vector_content.as_ref(),
    &mut new_fields_ids_map,
    &mut document_buffer,
)?;

And you can retrieve the external document id using the document change described here:

pub enum DocumentChange<'doc> {
    Deletion(Deletion<'doc>),
    Update(Update<'doc>),
    Insertion(Insertion<'doc>),
}

pub struct Deletion<'doc> {
    docid: DocumentId,
    external_document_id: &'doc str,
}

pub struct Update<'doc> {
    docid: DocumentId,
    external_document_id: &'doc str,
    new: Versions<'doc>,
    from_scratch: bool,
}

pub struct Insertion<'doc> {
    docid: DocumentId,
    external_document_id: &'doc str,
    new: Versions<'doc>,
}
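
A hedged sketch of retrieving it, assuming this runs somewhere the DocumentChange fields are visible (otherwise accessor methods would be needed):

// Pull the external id out of whichever variant we are processing.
let external_document_id = match &change {
    DocumentChange::Deletion(deletion) => deletion.external_document_id,
    DocumentChange::Update(update) => update.external_document_id,
    DocumentChange::Insertion(insertion) => insertion.external_document_id,
};
// Then pass external_document_id down to write_to_obkv (e.g., as an extra
// parameter) so the error can be built with it.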

new_field_count

To compute the new_field_count, I suggest not returning the error directly but incrementing a counter instead, which will then be used to produce the error:

let Some(field_id) = fields_ids_map.id_or_insert(field_name) else {
    refused_fields.insert(field_name); // HashSet
    continue;
};

// then at the end of the function

if refused_fields.is_empty() {
    writer.finish().unwrap();
    Ok(KvReaderFieldId::from_slice(document_buffer))
} else {
    Err(UserError::AttributeLimitReached {
        document_id: external_document_id,
        new_field_count: refused_fields.len(),
        no_of_existing_fields,
    }
    .into())
}

Contributor Author:

Thanks for this, I will do it ASAP.

Successfully merging this pull request may close these issues:

Error message for max_fields_limit_exceeded is misleading