Skip to content

[v6] plumbing: format/packfile, Optimise packfile delta processing #1523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

pjbgf
Copy link
Member

@pjbgf pjbgf commented Apr 17, 2025

The memory churn for processing delta objects was quite high, due to the performance optimisation which kept in memory all the inflated contents of each object.

The changes introduced allows for supporting storage implementations (i.e. storage/filesystem) to only keep in memory the contents of the parents needed to resolve the contents of the delta being processed. The buffers used for all the contents are reused to avoid unnecessary churn.

Users can opt-out by using WithHighMemoryUsage() when creating the Parser, or HighMemoryMode in the storage option.

A new ioutil.Copy was introduced to simplify the process of reusing buffers during Copy operations.

⚠️ Additionally, the storage started using KeepDescriptors by default, which enables the same PackFile being re-parsed several times during a given git operation. This change is still being tested for its impact.

Fixes #1188.


Benchmarks comparing v6-transport with this PR while cloning kubernetes/kubernetes:

Filesystem storage (2.3GB less memory churn)

image
image

Memory storage (1GB less memory churn)

image
image

@pjbgf pjbgf added this to the v6.0.0 milestone Apr 17, 2025
@runxiyu
Copy link

runxiyu commented Apr 17, 2025

This may also fix #1451. I will test it against my workloads when I have time.

Edit: Nevermind, it doesn't.

@@ -0,0 +1,17 @@
package ioutil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very confusingly named, since it's also the name of the deprecated io/ioutil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@runxiyu this is an existing package, renaming it is outside of the scope of this PR.

@@ -135,7 +135,7 @@ func TestResolveExternalRefsInThinPack(t *testing.T) {

checksum, err := parser.Parse()
assert.NoError(t, err)
assert.NotEqual(t, plumbing.ZeroHash, checksum)
assert.NotEqual(t, checksum, plumbing.ZeroHash)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this make a difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is largely to align with the assertion construct, whereby the second argument is the expected whereas the third is the actual.

pjbgf added 14 commits May 24, 2025 16:23
The new func abstracts away the use of sync while managing the buffers
used for io.CopyBuffer calls.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Ensures that:
- Slices are at least of the initial length.
- No data is kept between Put and Get operations.
- The slice size is increased to 32kb.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Avoid on-demand allocation of buffers to hold parent content by
reusing buffers from sync.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The memory churn for processing delta objects is quite high, due to
the performance optimisation which keeps in memory all the inflated
contents of each object.

The new default only keeps in memory the contents of the parents needed
to resolve the contents of the delta being processed. The buffers used
for all the contents are reused to avoid unnecessary churn.

Users can opt-out by using WithHighMemoryUsage() when calling the Parser.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
pjbgf added 5 commits May 28, 2025 10:58
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The initial low-memory mode did not work well on specific scenarios
which was leading to some tests to fail. The changes resolve that and
rename the HighMemoryUsage option to HighMemoryMode instead.

It introduces the LowMemoryCapable interface, which enables storage
implementations to opt-in/out of this mode.
When a storage does not implement that interface, high-memory mode
would be the default.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
In filesystem storage, the packfile caching is only used if KeepDescriptors
is enabled. This should make overall operations using filesystem storage more
efficient.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants