[v6] plumbing: format/packfile, Optimise packfile delta processing #1523

Draft

pjbgf wants to merge 19 commits into go-git:main from pjbgf:perf-improv

Member

pjbgf commented Apr 17, 2025 •

The memory churn for processing delta objects was quite high, due to the performance optimisation which kept in memory all the inflated contents of each object.

The changes allow supporting storage implementations (i.e. storage/filesystem) to keep in memory only the contents of the parents needed to resolve the delta being processed. The buffers used for all the contents are reused to avoid unnecessary churn.

Users can opt out by using WithHighMemoryUsage() when creating the Parser, or HighMemoryMode in the storage option.
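
As a rough illustration of the opt-out described above, here is a minimal functional-options sketch. The lowercase names (`parser`, `parserOption`, `withHighMemoryUsage`, `newParser`) are local stand-ins that only mirror the names in this PR; they are assumptions and not go-git's actual types or signatures.

```go
package main

import "fmt"

// parser is a hypothetical stand-in for the packfile parser.
type parser struct {
	lowMemory bool
}

// parserOption mutates a parser while it is being constructed.
type parserOption func(*parser)

// withHighMemoryUsage opts out of the low-memory default (assumed behaviour).
func withHighMemoryUsage() parserOption {
	return func(p *parser) { p.lowMemory = false }
}

// newParser defaults to low-memory mode and then applies any options.
func newParser(opts ...parserOption) *parser {
	p := &parser{lowMemory: true}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	fmt.Println("default low-memory:", newParser().lowMemory)
	fmt.Println("opted out low-memory:", newParser(withHighMemoryUsage()).lowMemory)
}
```

Making low memory the default and exposing the opt-out as an option keeps existing call sites unchanged while still letting callers trade memory for speed.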

A new ioutil.Copy was introduced to simplify the process of reusing buffers during Copy operations.

⚠️ Additionally, the storage now uses KeepDescriptors by default, which enables the same PackFile to be re-parsed several times during a given git operation. The impact of this change is still being tested.

Fixes #1188.

Benchmarks comparing v6-transport with this PR while cloning kubernetes/kubernetes:

Filesystem storage (2.3GB less memory churn)

Memory storage (1GB less memory churn)

pjbgf added the performance label

pjbgf added this to the v6.0.0 milestone

runxiyu commented Apr 17, 2025 •

This may also fix #1451. I will test it against my workloads when I have time.

Edit: Never mind, it doesn't.

This was referenced Apr 18, 2025

Blame is very slow #14

Open

Proposal: Gitea git cat-file Subprocess Management Optimization go-gitea/gitea#33952

Open

runxiyu reviewed

utils/ioutil/sync.go

		@@ -0,0 +1,17 @@
		package ioutil

runxiyu Apr 20, 2025

I think this is very confusingly named, since it's also the name of the deprecated io/ioutil

Member Author

pjbgf May 24, 2025

@runxiyu this is an existing package, renaming it is outside of the scope of this PR.

plumbing/format/packfile/parser_test.go

@@ -135,7 +135,7 @@ func TestResolveExternalRefsInThinPack(t *testing.T) {
               	checksum, err := parser.Parse()
               	assert.NoError(t, err)
-              	assert.NotEqual(t, plumbing.ZeroHash, checksum)
+              	assert.NotEqual(t, checksum, plumbing.ZeroHash)

runxiyu Apr 20, 2025

How would this make a difference?

Member Author

pjbgf May 24, 2025

This is largely to align with the assertion construct, whereby the second argument is the expected whereas the third is the actual.

pjbgf force-pushed the perf-improv branch from c2b87a8 to 8b5d7fc

April 24, 2025 07:03

pjbgf mentioned this pull request

memory leak with clone #315

Open

pjbgf added 14 commits

May 24, 2025 16:23


          utils: ioutil, Add Copy to simplify buffer reuse

492eaff

The new func abstracts away the use of sync while managing the buffers
used for io.CopyBuffer calls.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: format/packfile, Use require instead of assert

043aaff

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: format/packfile, Replace CopyBuffer with new ioutil.Copy

0a23626

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          utils: Refactor ByteSlice Pool

c4dbeb4

Ensures that:
- Slices are at least of the initial length.
- No data is kept between Put and Get operations.
- The slice size is increased to 32kb.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
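
The three guarantees listed in this commit message can be sketched roughly as follows. The helper names (`getSlice`, `putSlice`) and the choice to zero on Put are assumptions for illustration, not the actual pool code in `utils`.

```go
package main

import (
	"fmt"
	"sync"
)

// initialSize is the 32 KiB minimum mentioned in the commit message.
const initialSize = 32 * 1024

var slicePool = sync.Pool{
	New: func() any {
		b := make([]byte, initialSize)
		return &b
	},
}

// getSlice returns a zeroed slice whose length is at least initialSize.
func getSlice() *[]byte {
	return slicePool.Get().(*[]byte)
}

// putSlice wipes the slice and restores its full length before pooling it,
// so the next getSlice never sees stale data or a shortened slice.
func putSlice(bp *[]byte) {
	if bp == nil || cap(*bp) < initialSize {
		return // undersized slices are dropped rather than pooled
	}
	b := (*bp)[:cap(*bp)]
	for i := range b {
		b[i] = 0
	}
	*bp = b
	slicePool.Put(bp)
}

func main() {
	bp := getSlice()
	copy(*bp, "object contents")
	*bp = (*bp)[:4] // callers may reslice while they hold the buffer
	putSlice(bp)

	again := getSlice()
	fmt.Println(len(*again) >= initialSize, (*again)[0] == 0)
}
```

Zeroing on Put costs a memory clear per cycle; an alternative under the same guarantees would be to truncate the length and require callers to overwrite before reading.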


          plumbing: format, Avoid multiple Put into sync.Pool

b714d51

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: http, Replace Copy with new ioutil.Copy

4f3bf80

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: transport, Replace Copy with new ioutil.Copy

f34e3a8

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          git: Replace Copy with new ioutil.Copy

82d995e

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          storage: filesystem, Replace Copy with new ioutil.Copy

3bb1c09

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: object, Replace Copy with new ioutil.Copy

77f1bff

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: format/packfile, Optimise parent content memory

Avoid on-demand allocation of buffers to hold parent content by
reusing buffers from sync.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: format/packfile, Remove unused isInvalid func

46b2ea4

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: format/packfile, Add new opt-out low memory mode

97ab6df

The memory churn for processing delta objects is quite high, due to
the performance optimisation which keeps in memory all the inflated
contents of each object.

The new default only keeps in memory the contents of the parents needed
to resolve the contents of the delta being processed. The buffers used
for all the contents are reused to avoid unnecessary churn.

Users can opt out by using WithHighMemoryUsage() when creating the Parser.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
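
The idea of holding only the parents a delta needs, in a reused buffer, can be sketched with a toy model. This is not Git's delta format: here a "delta" simply appends bytes to its parent, and the `entry` type and `resolve` function are hypothetical names introduced for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// entry is a toy packfile object: parent is the index of its base object,
// or -1 for a full object. For deltas, data is appended to the parent.
type entry struct {
	parent int
	data   []byte
}

// scratch holds reusable buffers for rebuilding object contents.
var scratch = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// resolve rebuilds entries[i] by touching only the entries on its own chain.
func resolve(entries []entry, i int) []byte {
	// Walk from the requested entry up to its base object.
	var chain []int
	for j := i; j != -1; j = entries[j].parent {
		chain = append(chain, j)
	}

	buf := scratch.Get().(*bytes.Buffer)
	defer scratch.Put(buf)
	buf.Reset()

	// Apply the base first, then each toy delta down to the requested entry.
	for k := len(chain) - 1; k >= 0; k-- {
		buf.Write(entries[chain[k]].data)
	}

	// Copy out, because buf goes back to the pool and must not be aliased.
	return append([]byte(nil), buf.Bytes()...)
}

func main() {
	entries := []entry{
		{parent: -1, data: []byte("base")},
		{parent: 0, data: []byte("+d1")},
		{parent: 1, data: []byte("+d2")},
	}
	fmt.Printf("%s\n", resolve(entries, 2))
}
```

Unrelated objects are never inflated, and the scratch buffer is recycled between calls instead of being allocated per object.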


          plumbing: format/packfile, Improve tests

56275ed

Signed-off-by: Paulo Gomes <pjbgf@linux.com>

pjbgf force-pushed the perf-improv branch from 8b5d7fc to 56275ed

May 24, 2025 17:56

pjbgf added 5 commits

May 28, 2025 10:58


          *: Add Require() to error checks

e81cd76

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: format/packfile, Fix issues with new Low Memory Mode

f457677

The initial low-memory mode did not work well in specific scenarios,
which caused some tests to fail. These changes resolve that and
rename the HighMemoryUsage option to HighMemoryMode.

They also introduce the LowMemoryCapable interface, which enables storage
implementations to opt in to or out of this mode.
When a storage does not implement that interface, high-memory mode
is the default.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
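
The opt-in mechanism described in this commit message resembles Go's optional-interface pattern, sketched below. Only the interface name comes from the commit message; the method `LowMemoryMode() bool`, the two storage types, and `useLowMemory` are assumptions made for this example.

```go
package main

import "fmt"

// lowMemoryCapable mirrors the interface name in the commit message.
// Its method set here is an assumption.
type lowMemoryCapable interface {
	LowMemoryMode() bool
}

// fsStorage is a hypothetical storage that supports low-memory mode.
type fsStorage struct {
	highMemoryMode bool
}

func (s fsStorage) LowMemoryMode() bool { return !s.highMemoryMode }

// memStorage is a hypothetical storage that does not implement the interface.
type memStorage struct{}

// useLowMemory reports whether the parser should run in low-memory mode.
// Storages that do not implement the interface fall back to high-memory mode.
func useLowMemory(storage any) bool {
	lm, ok := storage.(lowMemoryCapable)
	return ok && lm.LowMemoryMode()
}

func main() {
	fmt.Println(useLowMemory(fsStorage{}))                     // implements and opts in
	fmt.Println(useLowMemory(fsStorage{highMemoryMode: true})) // implements but opts out
	fmt.Println(useLowMemory(memStorage{}))                    // does not implement
}
```

A type assertion keeps the base storage interface unchanged, so third-party storages continue to compile and simply stay in high-memory mode.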


          storage: filesystem, Add support for low-memory mode

aa8dc7a

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          *: tests, Close file and add Require()

0b2ab6d

Signed-off-by: Paulo Gomes <pjbgf@linux.com>


          plumbing: transport, Use KeepDescriptors by default

f9f2ae4

In filesystem storage, the packfile caching is only used if KeepDescriptors
is enabled. This should make overall operations using filesystem storage more
efficient.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>

pjbgf force-pushed the perf-improv branch from 7afc0de to f9f2ae4

May 29, 2025 08:24

pjbgf mentioned this pull request

Bisected regression runs genOffsetHash leading to wasteful loop when processing ofs-delta #1451

Open
