TSAN failures seen running PyO3 tests with the free-threaded build #130421

Open
@ngoldbaum

EDIT June 13: see #130421 (comment) for current status

I'm seeing TSAN warnings when running the PyO3 tests against CPython commit 38642bf.

I've run the tests on an M3 MacBook Pro running macOS Sequoia and in @nascheme's cpython_sanity docker image, which has LLVM 20 installed (it also ships Python 3.13 with TSAN and some packages, but I didn't use those). I think running TSAN on both the Rust code and CPython is only possible with LLVM 20, and I can't easily install that on my Mac right now since it's not yet packaged on homebrew.

See this comment in the PyO3 repo if you want to use the docker image; there are some small tweaks you need to make before it will work correctly.

On an ARM Mac, I installed llvm from homebrew and then ran CONFIGURE_OPTS="--with-thread-sanitizer" pyenv install 3.14t-dev to get a TSAN CPython build. You'll also need to install a Rust toolchain.

You'll also need a copy of PyO3 checked out to this branch.

Because homebrew doesn't have LLVM 20, I had to resort to running the cargo tests as normal against a CPython built with TSAN. I think this should still detect races happening inside CPython.

pyenv global 3.14t-dev
pip install nox
cd pyo3
nox -s test
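Before running nox, it can help to confirm that the active interpreter really is the free-threaded TSAN build. The following is a minimal sketch, not part of the original report: the helper name describe_build is hypothetical, and it assumes the configure flags are recorded in sysconfig's CONFIG_ARGS variable, which is the case for typical Unix builds.

```python
import sys
import sysconfig


def describe_build():
    """Summarize whether this interpreter looks like a free-threaded TSAN build."""
    # CONFIG_ARGS holds the ./configure arguments on Unix builds; it may be
    # missing on other platforms, so fall back to an empty string.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always
    # run with the GIL.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    return {
        "version": sys.version.split()[0],
        "free_threaded": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),
        "gil_enabled_now": bool(gil_check()) if gil_check else True,
        "tsan_configured": "--with-thread-sanitizer" in config_args,
    }


print(describe_build())
```

For the setup described above, free_threaded and tsan_configured should both be True and gil_enabled_now should be False.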

Here is the full output from one invocation on my Mac: https://gist.github.com/ngoldbaum/e198d87149617ecdaf881f29a03b8126

Here is a sampling of the warning summaries:

data race pytime.c:1163 in py_get_monotonic_clock
data race typeobject.c:2235 in _PyType_AllocNoTrack
data race weakrefobject.c:413 in get_or_create_weakref
data race pytime.c:1180 in py_get_monotonic_clock
data race pytime.c:1162 in py_get_monotonic_clock
data race object.c:343 in _Py_IncRef
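To see which locations dominate in a long log, the summaries can be tallied with a short script. This is a rough sketch that assumes each report ends with a line of the form "SUMMARY: ThreadSanitizer: data race FILE:LINE in FUNCTION"; the function name tally_data_races is made up for illustration, and the sample text simply reuses summaries listed above.

```python
import re
from collections import Counter

# Assumed summary format: "SUMMARY: ThreadSanitizer: data race FILE:LINE in FUNC",
# optionally with a ":COLUMN" suffix after the line number.
SUMMARY_RE = re.compile(
    r"SUMMARY: ThreadSanitizer: data race (\S+?):(\d+)(?::\d+)? in (\S+)"
)


def tally_data_races(log_text):
    """Count data race summaries per (file name, line, function)."""
    counts = Counter()
    for match in SUMMARY_RE.finditer(log_text):
        path, line, func = match.groups()
        # Keep only the file name so absolute and relative paths group together.
        filename = path.rsplit("/", 1)[-1]
        counts[(filename, int(line), func)] += 1
    return counts


sample_log = """\
SUMMARY: ThreadSanitizer: data race pytime.c:1163 in py_get_monotonic_clock
SUMMARY: ThreadSanitizer: data race object.c:343 in _Py_IncRef
SUMMARY: ThreadSanitizer: data race pytime.c:1163 in py_get_monotonic_clock
"""

for (filename, line, func), count in tally_data_races(sample_log).most_common():
    print(f"{count:5d}  {filename}:{line} in {func}")
```

Saving the nox output to a file and passing its contents to tally_data_races gives a quick ranking of the racy locations.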

You can ignore all the test_compiler_error messages; nightly Rust always has compiler error message failures.

Another way to trigger these failures is with cargo stress, which runs the tests in a loop to try to trigger safety issues like this. I have a hacked-together version of cargo stress on this branch that, instead of crashing when a thread writes to stderr, prints the stderr to the terminal and continues. If you run TSAN with TSAN_OPTIONS=exitcode=0, my version of cargo stress will happily continue running after the first TSAN warning. This is a good way to generate lots of warnings quickly without manually rerunning the full test suite.
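The keep-going behavior can be approximated without cargo stress. The sketch below is not the modified cargo stress from the branch, only an illustration of the same idea: rerun a command, keep whatever it writes to stderr, and never stop on a report. The function name run_repeatedly is hypothetical, and the default option string assumes the sanitizer runtime's `exitcode` flag, which overrides the exit status used when something is reported.

```python
import os
import subprocess
import sys


def run_repeatedly(cmd, iterations, tsan_options="exitcode=0"):
    """Run cmd several times and collect the non-empty stderr of each run."""
    # Pass TSAN_OPTIONS to the child so a report does not turn into a
    # failing exit status.
    env = dict(os.environ, TSAN_OPTIONS=tsan_options)
    reports = []
    for _ in range(iterations):
        result = subprocess.run(cmd, env=env, capture_output=True, text=True)
        if result.stderr:
            # Keep the output and carry on instead of aborting the loop.
            reports.append(result.stderr)
    return reports


# Stand-in child process that writes to stderr; in practice cmd would be
# the test invocation, for example ["cargo", "test"].
demo_cmd = [sys.executable, "-c", "import sys; sys.stderr.write('fake report')"]
print(len(run_repeatedly(demo_cmd, iterations=3)), "runs wrote to stderr")
```

Unlike cargo stress, this runs the iterations one after another, so it will surface races more slowly.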

Here are some additional summaries that I see in a cargo stress run:

data race tupleobject.c:173 in PyTuple_Pack
data race typeobject.c:3368 in best_base

And here is the full terminal output (this ran for about 10 seconds before I killed it with ctrl-c): https://gist.github.com/ngoldbaum/1d1e29c8e10f0ac979ef27a95c73d39f

When I run the same tests in the docker container, using a 3.14t-dev build I made inside the container, I don't see any of the TSAN reports above. Maybe they don't happen on x86_64?

Also note that there is a race inside PyO3 triggered by the PyO3 test test_thread_safety_2; you may see it if you are running the tests inside the docker container. There are also two test failures due to unexpected panics that I only see under TSAN in the docker container. I'm not sure what's causing those failures yet.
