ENH Make GaussianProcessRegressor.predict faster when return_std and return_cov are false #31431
…into gpr_optimisation
Thanks for the PR! The fix looks fine. Could you do a quick benchmark with some toy data to show that this PR actually fixes the performance issue reported in #31374?
Sure! Thanks.
I experimented with the diabetes dataset on an x86_64 system with an Intel(R) Xeon(R) E5-2620 v3 @ 2.40GHz CPU. Code:
import numpy as np
import time
from sklearn.datasets import load_diabetes
from sklearn.gaussian_process import GaussianProcessRegressor

diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

seeds = 30
times = []
for seed in range(seeds):
    gpr = GaussianProcessRegressor(
        random_state=seed
    )
    gpr.fit(X, y)
    # Time only the predict call, not the fit.
    start_time = time.time()
    predictions = gpr.predict(X, return_std=False, return_cov=False)
    times.append(time.time() - start_time)

# Swap the two print lines depending on which version is being measured.
# print(f"Mean time to predict along {seeds} runs in new version: {np.mean(times)} seconds")
print(f"Mean time to predict along {seeds} runs in old version: {np.mean(times)} seconds")
Output is:
In conclusion, avoiding the
Thanks for the quick benchmark. I put together a slightly different one just to double-check and it seems like indeed
import numpy as np
import time
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor

X, y = make_regression(n_samples=5_000)
gpr = GaussianProcessRegressor(
    random_state=0
)
gpr.fit(X, y)
# IPython magics: run these in an IPython or Jupyter session.
%timeit gpr.predict(X, return_std=False, return_cov=False)
%prun -s cumulative -l 10 gpr.predict(X, return_std=False, return_cov=False)
Output on
Output in this PR (
LGTM. Thanks @RafaAyGar
Reference Issues/PRs
Fixes #31374
What does this implement/fix? Explain your changes.
This PR avoids the unnecessary execution of the solve_triangular() function inside the GaussianProcessRegressor() predict() function when the arguments return_std and return_cov are set to False.
A non-regression test is also implemented to check that y_mean is returned alone (not a tuple) when return_std=False and return_cov=False. This behavior also existed previously and was not covered by the tests.
Any other comments?
N/A.
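For readers who want to see the idea in isolation, here is a minimal sketch of the early return described above. It is not the scikit-learn implementation: ToyGPR, rbf_kernel, and the fixed RBF kernel with unit length scale are hypothetical names and simplifications chosen for illustration. Only the SciPy calls (cholesky, cho_solve, solve_triangular) are real library functions. The point is that the predictive mean needs only a matrix-vector product, so the triangular solve can be skipped when neither the standard deviation nor the covariance is requested.

```python
import numpy as np
from scipy.linalg import cho_solve, cholesky, solve_triangular


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and B (illustrative)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


class ToyGPR:
    """Hypothetical minimal GP regressor; NOT scikit-learn's class."""

    def __init__(self, alpha=1e-6):
        self.alpha = alpha  # jitter added to the kernel diagonal

    def fit(self, X, y):
        self.X_train_ = X
        K = rbf_kernel(X, X)
        K[np.diag_indices_from(K)] += self.alpha
        self.L_ = cholesky(K, lower=True)  # K = L @ L.T
        self.alpha_ = cho_solve((self.L_, True), y)  # K^{-1} y
        return self

    def predict(self, X, return_std=False, return_cov=False):
        if return_std and return_cov:
            raise ValueError("Request at most one of return_std and return_cov.")
        K_trans = rbf_kernel(X, self.X_train_)
        y_mean = K_trans @ self.alpha_

        # Early return: the mean does not depend on V, so the
        # triangular solve below is skipped entirely in this case.
        if not return_std and not return_cov:
            return y_mean

        # Only needed for the predictive variance or covariance.
        V = solve_triangular(self.L_, K_trans.T, lower=True)
        if return_cov:
            return y_mean, rbf_kernel(X, X) - V.T @ V
        # The RBF kernel has k(x, x) = 1, so the prior variance is 1.
        y_var = 1.0 - np.einsum("ij,ij->j", V, V)
        return y_mean, np.sqrt(np.clip(y_var, 0.0, None))


# Usage on synthetic data (assumed, not from the PR).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = np.sin(X_demo[:, 0])
toy = ToyGPR().fit(X_demo, y_demo)
mean_only = toy.predict(X_demo)  # a bare array, not a tuple
mean_again, std = toy.predict(X_demo, return_std=True)
```

The same check the non-regression test describes can be applied to this sketch: with both flags False, the result should be a bare array rather than a tuple, and it should match the mean returned alongside the standard deviation.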