CI Avoid Windows timeout by switching to OpenBLAS #31641
OK, so I could spot two architectures of CPU:
and
The failure therefore seems related to the combination of MKL and the AMD CPU. I'll switch to OpenBLAS to see whether the tests pass, and we might want to try reporting the issue.
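To check which CPU and which BLAS a given runner actually uses, something like the following could be run at the start of a CI job. This is only a sketch: the function name `describe_runner` is hypothetical, and the BLAS part assumes that NumPy and `threadpoolctl` are installed (it is skipped otherwise).

```python
import os
import platform


def describe_runner():
    """Collect basic CPU details and, when possible, the loaded BLAS libraries."""
    info = {
        "machine": platform.machine(),
        "processor": platform.processor(),
        "logical_cpus": os.cpu_count(),
        "blas": [],
    }
    try:
        # Importing NumPy loads its BLAS library so threadpoolctl can see it.
        import numpy  # noqa: F401
        from threadpoolctl import threadpool_info
    except ImportError:
        # Optional dependencies are missing: report CPU details only.
        return info

    for lib in threadpool_info():
        if lib.get("user_api") == "blas":
            info["blas"].append(
                {
                    "internal_api": lib.get("internal_api"),  # e.g. "mkl" or "openblas"
                    "version": lib.get("version"),
                    "num_threads": lib.get("num_threads"),
                }
            )
    return info


if __name__ == "__main__":
    for key, value in describe_runner().items():
        print(f"{key}: {value}")
```

Printing this in the job log makes it easier to correlate a failure with a specific CPU and BLAS combination.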
@@ -297,7 +297,7 @@ def remove_from(alist, to_remove):
         ],
         "package_constraints": {
             "python": "3.10",
-            "blas": "[build=mkl]",
+            "blas": "[build=openblas]",
We should rename mkl to openblas, but I wanted to test first without having to change anything else.
Could you please update the lock file only for the Windows build instead? There is a command-line option in the script to do so. Use
I pushed a commit changing only the Windows lock-file and renaming the lock-file to have openblas rather than mkl. Let's see what the CI says 🤞.
Windows CI passed and it ran on the problematic
This seems a good enough work-around to get back to a somewhat normal CI.
Merging this to be able to make other PRs go forward.
One possible improvement for a later PR would be to put the CPU info on Windows in the same place as we do
In an ideal world, we would also further understand the issue but this seems time-consuming (would first need a reproducer with scipy/numpy only, and maybe eventually a reproducer in
Note for completeness: I was able to reproduce by using action-tmate to have an interactive SSH session in a GHA runner (same AMD CPU as in Azure), see https://github.com/mxschmitt/action-tmate?rgh-link-date=2025-06-24T14%3A01%3A52Z.
Thanks @lesteve for finishing this one.
After some time of silence, the jury of the maintainability award judges this PR worthy of the 5th trophy 🏆 (after #25069 (comment)). As the Nobel committee does, the prize is split between @glemaitre and @lesteve. Race: The jury kindly asks for help with future suggestions of nominees.
Not sure about a maintainability award for this one, maybe more of a firefighting mode award 😅. This may blow up in our face one day but oh well, for now things are back to normal, let's hope for some time 🤞.
PR to debug the issue where the Windows CI fails when a node with a single physical core is detected.