
ENH avoid np.square(X) in enet_coordinate_descent to save memory #31665


Merged
merged 4 commits into main from cd_fast_avoid_square_X on Jun 27, 2025

Conversation

lorentzenchr (Member) commented Jun 26, 2025

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

This PR replaces np.square(X).sum(axis=0) with np.einsum("ij,ij->j", X, X) to avoid a memory allocation of the size of X (usually the largest object).
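
As a minimal sketch of the equivalence (the array and variable names here are only illustrative, not from the PR):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 1_000))

    # Both expressions compute the per-column sum of squares of X, but
    # np.square(X) materializes a temporary the size of X, while einsum
    # accumulates the products directly into a length-n_features result.
    norms_square = np.square(X).sum(axis=0)
    norms_einsum = np.einsum("ij,ij->j", X, X)
    assert np.allclose(norms_square, norms_einsum)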

Any other comments?

This also improves timing a bit.

We might even consider writing the loop explicitly, like in _sqeuclidean_row_norms64_dense:

cdef float64_t[::1] _sqeuclidean_row_norms64_dense(
    const float64_t[:, ::1] X,
    intp_t num_threads,
):
    """Compute the squared euclidean norm of the rows of X in parallel.

    This is faster than using np.einsum("ij, ij->i") even when using a single thread.
    """
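
As a rough illustration only (pure Python, not the Cython helper quoted above, and the function name is hypothetical), an explicit loop for the column-wise norms used in this PR could look like:

    import numpy as np

    def sqeuclidean_col_norms_loop(X):
        # Accumulate the squared euclidean norm of each column by looping over
        # rows; only row-sized temporaries are created, never one the size of X.
        out = np.zeros(X.shape[1], dtype=X.dtype)
        for i in range(X.shape[0]):
            out += X[i] * X[i]
        return out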

github-actions bot commented Jun 26, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 47d0663. Link to the linter CI: here

thomasjpfan (Member) left a comment

Do you have a quick memory benchmark of einsum vs np.square(X).sum(axis=0)?

lorentzenchr (Member, Author) commented Jun 27, 2025

import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 10_000))
print(f"X allocates {X.nbytes * 1e-6} MB of memory")
# X allocates 8.0 MB of memory

%timeit np.square(X).sum(axis=0)
# 1.09 ms ± 53.2 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit np.einsum("ij,ij->j", X, X)
# 631 μs ± 29.3 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%load_ext memory_profiler

%memit np.square(X).sum(axis=0)
# peak memory: 87.97 MiB, increment: 7.98 MiB

%memit np.einsum("ij,ij->j", X, X)
# peak memory: 88.34 MiB, increment: 0.03 MiB

Results are similar with X transposed.
This shows that np.square indeed allocates an extra array of the size of X (+8 MB), while einsum does not.
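
For reference, a sketch of how the same memory comparison could be reproduced outside IPython, assuming memory_profiler is installed and using its memory_usage helper:

    import numpy as np
    from memory_profiler import memory_usage

    rng = np.random.default_rng(42)
    X = rng.standard_normal((100, 10_000))

    for label, func in [
        ("np.square(X).sum(axis=0)", lambda: np.square(X).sum(axis=0)),
        ('np.einsum("ij,ij->j", X, X)', lambda: np.einsum("ij,ij->j", X, X)),
    ]:
        # memory_usage samples the process memory (in MiB) while func runs.
        samples = memory_usage((func, (), {}), interval=0.01)
        print(f"{label}: peak {max(samples):.2f} MiB")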

OmarManzoor (Contributor) left a comment

LGTM. Thanks @lorentzenchr

@OmarManzoor OmarManzoor merged commit 687e84a into scikit-learn:main Jun 27, 2025
36 checks passed
@lorentzenchr lorentzenchr deleted the cd_fast_avoid_square_X branch June 27, 2025 20:08
lorentzenchr added a commit to lorentzenchr/scikit-learn that referenced this pull request Jun 27, 2025
jeremiedbb added a commit that referenced this pull request Jun 29, 2025
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Co-authored-by: Virgil Chan <virchan.math@gmail.com>
3 participants