fixed misleading error message for max_fields_limit_exceeded #5522
Conversation
Hello @CodeMan62,

@ManyTheFish Done

Hello @CodeMan62, could you please build a message like this? Don't hesitate to ask if you're missing any information!

Hey @ManyTheFish
crates/milli/src/error.rs
Outdated
#[error("A single index cannot have more than 65,535 unique fields across all documents.\n\
note: prior to this batch of tasks, the index already contained 65,535 unique fields.\n\
note: other documents from the same batch might have successfully added new unique fields before this one")]
This error message deserves to be less hardcoded; for instance, the 65,535 is not a completely hardcoded value in Meilisearch and corresponds to FieldId::MAX as u32 + 1.

The good way, in my sense, would be to define a MAX_ATTRIBUTES_PER_INDEX in meilisearch/crates/milli/src/lib.rs, next to this existing constant (Line 129 in 3f683c4):

pub const MAX_POSITION_PER_ATTRIBUTE: u32 = u16::MAX as u32 + 1;
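A minimal sketch of the suggested constant, mirroring MAX_POSITION_PER_ATTRIBUTE (the name MAX_ATTRIBUTES_PER_INDEX comes from the review above; the local FieldId alias is a stand-in for milli's actual type):

```rust
// Stand-in for milli's FieldId alias; in milli, field ids are u16.
type FieldId = u16;

// Proposed constant: the value the error message should derive from,
// instead of hardcoding "65,535" in the format string.
pub const MAX_ATTRIBUTES_PER_INDEX: u32 = FieldId::MAX as u32 + 1;

fn main() {
    // u16::MAX is 65_535, so the limit on distinct field ids is 65_536.
    assert_eq!(MAX_ATTRIBUTES_PER_INDEX, 65_536);
}
```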
"... the index already contained 65,535 unique fields ...": here the value should be computed dynamically. There is a scenario where there are, say, 65,500 fields in the index and a document brings 40 new fields; the perfect message would be "... Adding the document with id {document_id} would add {new_field_count} new unique fields to the index ... the index already contained {number_of_field_before_extracting_document} unique fields ..."
-> "... Adding the document with id 1 would add 40 new unique fields to the index ... the index already contained 65,500 unique fields ..."
We are a bit picky, but the goal here is to help the users as much as possible to understand why their specific usage triggers an error. So returning a personalized message is really helpful 😄
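A sketch of building the personalized message described above; the helper function and its parameter names are illustrative, not milli's actual API:

```rust
// Illustrative helper producing the dynamic message the reviewer asked for.
// All names here (attribute_limit_message, its parameters) are assumptions.
fn attribute_limit_message(document_id: &str, new_field_count: usize, existing: usize) -> String {
    format!(
        "Adding the document with id {document_id} would add {new_field_count} \
         new unique fields to the index. The index already contained {existing} unique fields."
    )
}

fn main() {
    let msg = attribute_limit_message("1", 40, 65_500);
    assert!(msg.contains("would add 40 new unique fields"));
    assert!(msg.contains("already contained 65500 unique fields"));
    println!("{msg}");
}
```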
Hey @ManyTheFish, I am here with a silly question. I have this variant:

AttributeLimitReached {
    document_id: String,
    new_field_count: usize,
    no_of_existing_fields: usize,
    max: u32,
},

and this is the function it should be used in:

fn create_fields_mapping(
    index_field_map: &mut FieldIdMapWithMetadata,
    batch_field_map: &DocumentsBatchIndex,
) -> Result<HashMap<FieldId, FieldId>> {
    batch_field_map
        .iter()
        // we sort by id here to ensure a deterministic mapping of the fields, that preserves
        // the original ordering.
        .sorted_by_key(|(&id, _)| id)
        .map(|(field, name)| match index_field_map.id(name) {
            Some(id) => Ok((*field, id)),
            None => index_field_map
                .insert(name)
                .ok_or(Error::UserError(UserError::AttributeLimitReached))
                .map(|id| (*field, id)),
        })
        .collect()
}

This will make this PR complete faster.
Hey @CodeMan62,
Most of the time, you can access the document ID, but in the example you provided, you can't because it's a complete batch that we are processing. I want you to know that this code will be removed in the future because we are refactoring this part so that you could have an
To ease the implementation and see how much information we are missing, I suggest putting everything optional in your structure and filling it out as much as possible. We will remove the useless Then, for the
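Following the "put everything optional" suggestion above, a sketch of what the variant could look like (the field names match the earlier draft; their Option wrapping is the point of the suggestion):

```rust
// Sketch of the suggested variant with every piece of context optional,
// so it can be filled in only where the information is actually available.
#[derive(Debug)]
enum UserError {
    AttributeLimitReached {
        document_id: Option<String>,          // not always available mid-batch
        new_field_count: Option<usize>,       // fields this document would add
        no_of_existing_fields: Option<usize>, // fields before extracting the document
        max: u32,                             // the hard limit itself
    },
}

fn main() {
    let err = UserError::AttributeLimitReached {
        document_id: None, // unknown in the batch-processing path described above
        new_field_count: Some(40),
        no_of_existing_fields: Some(65_500),
        max: u16::MAX as u32 + 1,
    };
    // The Debug output carries whatever information was collected.
    assert!(format!("{err:?}").contains("65500"));
}
```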
Thanks for the clarification, will do it ASAP.

@ManyTheFish one thing I want to point out is that I will not put everything optional; I will try to debug as much as I can and try to refactor the code.

Hey @ManyTheFish, can you please check my latest commit and tell me if the modification I made in the primary_key.rs file is good to go? If it is, just tell me so I can finish the remaining work.
@CodeMan62 I made a review, we are close to the end I feel
crates/milli/src/fields_ids_map.rs
Outdated

@@ -107,12 +107,16 @@ impl crate::documents::FieldIdMapper for FieldsIdsMap {

pub trait MutFieldIdMapper {
    fn insert(&mut self, name: &str) -> Option<FieldId>;
    fn len(&mut self) -> i32;
i32? Why not a usize or a u32?
Ok, I will do usize.
@@ -118,9 +118,12 @@ impl<'indexing> GlobalFieldsIdsMap<'indexing> {
        self.local.metadata(id)
    }
}

#[warn(unconditional_recursion)]
Nope, I suggest finding a workaround to that:

#[warn(unconditional_recursion)]

This warning comes from the fact that you are calling GlobalFieldsIdsMap::len() in GlobalFieldsIdsMap::len(). You may want to call len on the self.local instead.
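A sketch of the delegation fix suggested here, with simplified stand-ins for milli's actual structs (only the local/global split and the len forwarding are the point):

```rust
// Simplified stand-in for milli's local fields-ids map.
struct LocalFieldsIdsMap {
    names: Vec<String>,
}

impl LocalFieldsIdsMap {
    fn len(&self) -> usize {
        self.names.len()
    }
}

// Simplified stand-in for GlobalFieldsIdsMap: it wraps a local map.
struct GlobalFieldsIdsMap {
    local: LocalFieldsIdsMap,
}

impl GlobalFieldsIdsMap {
    fn len(&self) -> usize {
        // Delegating to the inner map avoids the unconditional recursion
        // that calling GlobalFieldsIdsMap::len() on itself would cause.
        self.local.len()
    }
}

fn main() {
    let map = GlobalFieldsIdsMap {
        local: LocalFieldsIdsMap { names: vec!["title".into(), "body".into()] },
    };
    assert_eq!(map.len(), 2);
}
```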
Ok.
EDIT: Thanks for the review, I will fix this and also update all the places where AttributeLimitReached is used. Any remaining review?
No, if the tests are green we can merge :)
How? I mean, if we have a warning, how will the CI go further?
How? I mean, if we have a warning, how will the CI go further?
Ah sorry, the warning must be fixed and not skipped 😬
Solved it.

@ManyTheFish I request you to stop the CI until the PR gets ready; also, see my latest commit, I found a solution for the warning we were getting.

@ManyTheFish all the tests are passing and everything is done, let me know if anything is left?

Hey @CodeMan62, you must still fix clippy 😬

Hey @irevoire, clippy is fixed this time

@ManyTheFish review; if it looks fine then let me just do
Hey @CodeMan62, it seems two failing tests remain.

Rustfmt is not happy: Diff in /home/runner/work/meilisearch/meilisearch/crates/meilisearch-types/src/error.rs:413:

     UserError::InvalidStoreFile => Code::InvalidStoreFile,
     UserError::NoSpaceLeftOnDevice => Code::NoSpaceLeftOnDevice,
     UserError::MaxDatabaseSizeReached => Code::DatabaseSizeLimitReached,
-    UserError::AttributeLimitReached{ .. } => Code::MaxFieldsLimitExceeded,
+    UserError::AttributeLimitReached { .. } => Code::MaxFieldsLimitExceeded,
     UserError::InvalidFilter(_) => Code::InvalidSearchFilter,
     UserError::InvalidFilterExpression(..) => Code::InvalidSearchFilter,
     UserError::FilterOperatorNotAllowed { .. } => Code::InvalidSearchFilter,
Yes, that is what I was asking: if the implementation is fine, then I just fix the test and fmt. Let me do both now.

@ManyTheFish everything is passing, I think it is good to go now.
@@ -1619,7 +1619,7 @@ async fn error_document_field_limit_reached_over_multiple_documents() {
        "indexedDocuments": 0
    },
    "error": {
-       "message": "A document cannot contain more than 65,535 fields.",
+       "message": "Adding the document with id None would add 1 new unique field to the index. The index already contained 65535 unique fields",
I am not sure I understand why the document id is not found 🤔
Moreover, this document is adding more than 1 field 🤔

-    "message": "Adding the document with id None would add 1 new unique field to the index. The index already contained 65535 unique fields",
+    "message": "Adding the document with id "wow" would add 1 new unique field to the index. The index already contained 65535 unique fields",
let me refactor again and see what I have missed
So what I did is I made every document_id None, and that's what I shouldn't do. I am refactoring it.

Not making it a draft; this work will be complete by Monday.
@ManyTheFish can you help me here and address what is wrong?
    let field_id =
        fields_ids_map.id_or_insert(field_name).ok_or(UserError::AttributeLimitReached {
-           document_id: Some(doc_id),
+           document_id: field_name.to_string(),
Document_id

I'd say it's the most important part: field_name.to_string() is the name of the field you are currently reading, not the value nor the document id. If you want to have the document id, you have to pass it to write_to_obkv and use it every time you want to return UserError::AttributeLimitReached in this function.

The caller of the function is here:
meilisearch/crates/milli/src/update/new/extract/documents.rs, Lines 121 to 126 in 83cd28b:

let content = write_to_obkv(
    &content,
    vector_content.as_ref(),
    &mut new_fields_ids_map,
    &mut document_buffer,
)?;
And you can retrieve the external document id using the document change described here:
meilisearch/crates/milli/src/update/new/document_change.rs, Lines 16 to 38 in 83cd28b:

pub enum DocumentChange<'doc> {
    Deletion(Deletion<'doc>),
    Update(Update<'doc>),
    Insertion(Insertion<'doc>),
}

pub struct Deletion<'doc> {
    docid: DocumentId,
    external_document_id: &'doc str,
}

pub struct Update<'doc> {
    docid: DocumentId,
    external_document_id: &'doc str,
    new: Versions<'doc>,
    from_scratch: bool,
}

pub struct Insertion<'doc> {
    docid: DocumentId,
    external_document_id: &'doc str,
    new: Versions<'doc>,
}
new_field_count

To compute the new_field_count, I suggest not returning the error directly but incrementing a counter instead that will be used to produce the error:

let Some(field_id) = fields_ids_map.id_or_insert(field_name) else {
    refused_fields.insert(field_name); // HashSet
    continue;
};

// then at the end of the function
if refused_fields.is_empty() {
    writer.finish().unwrap();
    Ok(KvReaderFieldId::from_slice(document_buffer))
} else {
    Err(UserError::AttributeLimitReached {
        document_id: external_document_id,
        new_field_count: refused_fields.len(),
        number_of_existing_field: no_of_existing_fields,
    })
}
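A self-contained sketch of this counting pattern, with simplified stand-ins for milli's types (the real code works on obkv buffers and milli's fields-ids map; the tiny MAX_FIELDS limit is only for illustration):

```rust
use std::collections::{HashMap, HashSet};

// Tiny limit for illustration; milli's actual limit is u16::MAX as u32 + 1.
const MAX_FIELDS: usize = 4;

#[derive(Debug, PartialEq)]
struct AttributeLimitReached {
    document_id: String,
    new_field_count: usize,
    number_of_existing_fields: usize,
}

// Instead of failing on the first refused field, collect them and report
// the full count plus the field count before this document, as suggested.
fn insert_fields(
    fields_ids_map: &mut HashMap<String, usize>,
    document_id: &str,
    field_names: &[&str],
) -> Result<(), AttributeLimitReached> {
    let existing = fields_ids_map.len();
    let mut refused_fields = HashSet::new();
    for &name in field_names {
        if fields_ids_map.contains_key(name) {
            continue; // already known, costs nothing
        }
        if fields_ids_map.len() < MAX_FIELDS {
            let id = fields_ids_map.len();
            fields_ids_map.insert(name.to_string(), id);
        } else {
            refused_fields.insert(name); // count it, keep going
        }
    }
    if refused_fields.is_empty() {
        Ok(())
    } else {
        Err(AttributeLimitReached {
            document_id: document_id.to_string(),
            new_field_count: refused_fields.len(),
            number_of_existing_fields: existing,
        })
    }
}

fn main() {
    let mut map = HashMap::new();
    insert_fields(&mut map, "doc-0", &["a", "b", "c"]).unwrap();
    let err = insert_fields(&mut map, "doc-1", &["d", "e", "f"]).unwrap_err();
    // "d" still fits (4th field); "e" and "f" are refused.
    assert_eq!(err.new_field_count, 2);
    assert_eq!(err.number_of_existing_fields, 3);
    assert_eq!(err.document_id, "doc-1");
}
```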
Thanks for this, I will do it ASAP.
Pull Request
Related issue
Fixes #5508
What does this PR do?
Fixes the misleading error message for max_fields_limit_exceeded.
PR checklist
Please check if your PR fulfills the following requirements:
Thank you so much for contributing to Meilisearch!