Skip to content

Fixed #35846 -- Improved reproducibility on staticfiles manifests. #19586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

matthews-noriker
Copy link

@matthews-noriker matthews-noriker commented Jun 24, 2025

Trac ticket number

ticket-35846

Branch description

The ordering of the paths in staticfiles.json for the ManifestStaticFilesStorage backend is non-deterministic as it is determined by the order the files are returned by os.scandir, which is non-deterministic.

This causes problems for reproducible builds as the file content will not always be the same for the same set of static files.

This PR sorts the keys in the resulting JSON file to make the outputted file deterministic. I do not believe this change will have any backwards incompatible consequences.

The test written forces the order of the files in the list to be re-ordered and then assets that the newly generated JSON file matches the original.

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

Sorts the keys in the dictionary when dumping to JSON to ensure
that the order the filesystem returns the files in does not cause
the manifest file to change.
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Thank you for your contribution 💪

As it's your first contribution be sure to check out the patch review checklist.

If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket!

If you have any design or process questions then you can ask in the Django forum.

Welcome aboard ⛵️!

Copy link
Contributor

@nessita nessita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @matthews-noriker for starting this work! 🌟

Could you please elaborate on why you chose to sort the entire payload and not just self.hashed_files?

Comment on lines 574 to 577
# Change the order of the hashed files
storage.staticfiles_storage.hashed_files = dict(reversed(hashed_files.items()))
# The manifest file content should not change.
storage.staticfiles_storage.save_manifest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use mock.patch here to ensure proper cleanup? (pseudo code below, untested)

Suggested change
# Change the order of the hashed files
storage.staticfiles_storage.hashed_files = dict(reversed(hashed_files.items()))
# The manifest file content should not change.
storage.staticfiles_storage.save_manifest()
# Change the order of the hashed files
with mock.patch.object(storage.staticfiles_storage, "hashed_files", dict(reversed(hashed_files.items()))
storage.staticfiles_storage.save_manifest()
# The manifest file content should not change.

Copy link
Author

@matthews-noriker matthews-noriker Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, originally I implemented it as you suggest as it solves this specific issue with minimum other side effects.

However I figured that this way means that if there are any other future changes to the manifest file format, this approach will automatically catch those without a future developer needing to specifically sort their "new" field for example.

Happy to just sort this one key if that is preferable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure of what would be preferable in this case. Your solution is clean and clear, and I quite like it. But it's a little bit more broad than the ticket scope, so part of my role is ensuring that if we go "bigger", we have a good reason.

If we keep the current solution, we should ideally have a test that cover the whole extent of the change. What do you think?

Copy link
Author

@matthews-noriker matthews-noriker Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your thoughts. I like the current solution because it is really simple, the alternative is this

payload = {
    "paths": dict(sorted(self.hashed_files.items())),  # <-- this line change
    "version": self.manifest_version,
    "hash": self.manifest_hash,
}

I'm less of a fan of this for two reasons. One, its more verbose. Two, what this issue is actually about is the serialisation of the dictionary to a JSON file. We don't really care what order the keys are in the python objects but we do specifically care that the order in the output file is always the same. Therefore I think it makes more sense to do the sorting at the serialisation step aka the json.dumps.

The dictionary itself is constructed directly within the save_manifest() function and both version and hash are strings. Therefore the only part which can change is the order of the keys in paths, so there isn't any other useful tests that can be written to verify the whole extent as nothing else is able to change.

If someone edits the manifest format and introduces a new field then I would expect them to consider whether it is necessary for them to test that this new field is deterministic.

I can add a comment to the save_manifest() function stating that the output file must be deterministic as a reminder to future developers perhaps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @matthews-noriker for sharing your reasoning. I've been thinking on what you said and it makes sense, but after a closer inspection, I noticed that the code is already sorting the hashed_files (a few lines above), so I think the best course of action is to:

  • store the sorted list of hashed files in a variable
  • use it in the two places
  • do not use sort_keys in the json dumps call

This fix is a bit more "neutral" while still addressing the reported issue. Not to mention that the double sorting is avoided!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nessita I have made the changes you have suggested 👍

@matthews-noriker matthews-noriker force-pushed the 35846-Reproducibility-of-staticfiles-manifests branch from 8307a9c to 71539d0 Compare June 25, 2025 09:40
Use a mock to avoid affecting other tests when reordering the hashed files.
@matthews-noriker matthews-noriker force-pushed the 35846-Reproducibility-of-staticfiles-manifests branch from 71539d0 to 8967982 Compare June 25, 2025 09:46
Changed method of sorting based on PR feedback.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants