-
-
Notifications
You must be signed in to change notification settings - Fork 32.7k
Fixed #35846 -- Improved reproducibility on staticfiles manifests. #19586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fixed #35846 -- Improved reproducibility on staticfiles manifests. #19586
Conversation
Sorts the keys in the dictionary when dumping to JSON to ensure that the order the filesystem returns the files in does not cause the manifest file to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello! Thank you for your contribution 💪
As it's your first contribution be sure to check out the patch review checklist.
If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket!
If you have any design or process questions then you can ask in the Django forum.
Welcome aboard ⛵️!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @matthews-noriker for starting this work! 🌟
Could you please elaborate on why you chose to sort the entire payload and not just self.hashed_files
?
# Change the order of the hashed files | ||
storage.staticfiles_storage.hashed_files = dict(reversed(hashed_files.items())) | ||
# The manifest file content should not change. | ||
storage.staticfiles_storage.save_manifest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please use mock.patch
here to ensure proper cleanup? (pseudo code below, untested)
# Change the order of the hashed files | |
storage.staticfiles_storage.hashed_files = dict(reversed(hashed_files.items())) | |
# The manifest file content should not change. | |
storage.staticfiles_storage.save_manifest() | |
# Change the order of the hashed files | |
with mock.patch.object(storage.staticfiles_storage, "hashed_files", dict(reversed(hashed_files.items())) | |
storage.staticfiles_storage.save_manifest() | |
# The manifest file content should not change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, originally I implemented it as you suggest as it solves this specific issue with minimum other side effects.
However I figured that this way means that if there are any other future changes to the manifest file format, this approach will automatically catch those without a future developer needing to specifically sort their "new" field for example.
Happy to just sort this one key if that is preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure of what would be preferable in this case. Your solution is clean and clear, and I quite like it. But it's a little bit more broad than the ticket scope, so part of my role is ensuring that if we go "bigger", we have a good reason.
If we keep the current solution, we should ideally have a test that cover the whole extent of the change. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your thoughts. I like the current solution because it is really simple, the alternative is this
payload = {
"paths": dict(sorted(self.hashed_files.items())), # <-- this line change
"version": self.manifest_version,
"hash": self.manifest_hash,
}
I'm less of a fan of this for two reasons. One, its more verbose. Two, what this issue is actually about is the serialisation of the dictionary to a JSON file. We don't really care what order the keys are in the python objects but we do specifically care that the order in the output file is always the same. Therefore I think it makes more sense to do the sorting at the serialisation step aka the json.dumps
.
The dictionary itself is constructed directly within the save_manifest()
function and both version
and hash
are strings. Therefore the only part which can change is the order of the keys in paths
, so there isn't any other useful tests that can be written to verify the whole extent as nothing else is able to change.
If someone edits the manifest format and introduces a new field then I would expect them to consider whether it is necessary for them to test that this new field is deterministic.
I can add a comment to the save_manifest()
function stating that the output file must be deterministic as a reminder to future developers perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @matthews-noriker for sharing your reasoning. I've been thinking on what you said and it makes sense, but after a closer inspection, I noticed that the code is already sorting the hashed_files
(a few lines above), so I think the best course of action is to:
- store the sorted list of hashed files in a variable
- use it in the two places
- do not use
sort_keys
in the json dumps call
This fix is a bit more "neutral" while still addressing the reported issue. Not to mention that the double sorting is avoided!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nessita I have made the changes you have suggested 👍
8307a9c
to
71539d0
Compare
Use a mock to avoid affecting other tests when reordering the hashed files.
71539d0
to
8967982
Compare
Changed method of sorting based on PR feedback.
Trac ticket number
ticket-35846
Branch description
The ordering of the paths in staticfiles.json for the ManifestStaticFilesStorage backend is non-deterministic as it is determined by the order the files are returned by os.scandir, which is non-deterministic.
This causes problems for reproducible builds as the file content will not always be the same for the same set of static files.
This PR sorts the keys in the resulting JSON file to make the outputted file deterministic. I do not believe this change will have any backwards incompatible consequences.
The test written forces the order of the files in the list to be re-ordered and then assets that the newly generated JSON file matches the original.
Checklist
main
branch.