Corrupt .pyc files stay on disk after failed writes #126606

Closed
@xavierholt

Bug report

Bug description:

If writing a .pyc file fails (in my case due to a file size limit imposed by ulimit), it can leave corrupt data sitting on disk. This causes a crash the second time you run the program, when the interpreter tries to load the corrupt .pyc file instead of the original .py file:

root@e7138ea2e2b5:/mnt# python3 crashme.py --limit 1024 --import
Setting ulimit to 1024...
Importing a "big" library...
root@e7138ea2e2b5:/mnt# python3 crashme.py --limit 1024 --import
Setting ulimit to 1024...
Importing a "big" library...
Traceback (most recent call last):
  File "/mnt/crashme.py", line 17, in <module>
    from fakelib import bigfile
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 991, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1124, in get_code
  File "<frozen importlib._bootstrap_external>", line 753, in _compile_bytecode
EOFError: marshal data too short
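
To recover from this state by hand, one option is to check whether the cached file still unmarshals and delete it if it does not. The following is only a sketch: `remove_if_corrupt` is a hypothetical helper (not part of importlib), and it assumes the 16-byte .pyc header layout used since Python 3.7.

```python
import importlib.util
import marshal
import os

# Assumption: Python 3.7+ header (magic, flags, then mtime/size or hash).
PYC_HEADER_SIZE = 16


def remove_if_corrupt(source_path):
    """Delete the cached .pyc for source_path if its body cannot be unmarshalled.

    Returns True if a corrupt file was removed, False otherwise.
    """
    pyc_path = importlib.util.cache_from_source(source_path)
    try:
        with open(pyc_path, 'rb') as f:
            data = f.read()
    except FileNotFoundError:
        return False  # nothing cached, nothing to clean up

    try:
        # A truncated file fails here, e.g. "EOFError: marshal data too short".
        marshal.loads(data[PYC_HEADER_SIZE:])
    except (EOFError, ValueError, TypeError):
        os.unlink(pyc_path)
        return True
    return False
```

Calling `remove_if_corrupt('fakelib/bigfile.py')` before the import would let the interpreter fall back to the .py source on the next run.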

This was a very hard bug to track down, and it seems like there's a chance to catch this and handle it gracefully. If I do a too-large write myself, I get an OSError: [Errno 27] File too large, and I'd imagine the interpreter can see something similar. Ideally (in my opinion - feedback appreciated!), the interpreter would notice this and then:

  • Log a warning.
  • Delete the corrupt file.
  • Set sys.dont_write_bytecode = True (this is the workaround I've been using).
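
The three steps above could look roughly like the sketch below. It is an illustration of the requested behaviour, not how importlib writes bytecode: `write_bytecode_safely` is a hypothetical name, and treating a short write as a failure is an assumption about how the size limit shows up.

```python
import os
import sys
import warnings


def write_bytecode_safely(path, data):
    """Write data to path; on failure warn, remove the partial file,
    and disable further bytecode writes. Returns True on success."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            written = os.write(fd, data)
        finally:
            os.close(fd)
        # Assumption: a size limit may surface as a short write, not an error.
        if written != len(data):
            raise OSError('short write: %d of %d bytes' % (written, len(data)))
    except OSError as exc:
        warnings.warn('could not write %s: %s' % (path, exc), RuntimeWarning)
        try:
            os.unlink(path)  # do not leave a truncated file behind
        except OSError:
            pass
        sys.dont_write_bytecode = True
        return False
    return True
```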

Here's the code I've been using to test this (bigfile can be pretty much anything, but it does seem to have to be part of a module before the interpreter will write a .pyc file for it).

import argparse
import resource
import tempfile

parser = argparse.ArgumentParser()
parser.add_argument('-l', '--limit',  type=int)
parser.add_argument('-i', '--import', action='store_true', dest='do_import')  # "import" is a keyword
parser.add_argument('-w', '--write',  type=int)
args = parser.parse_args()

if args.limit is not None:
    print('Setting ulimit to %d...' % args.limit)
    resource.setrlimit(resource.RLIMIT_FSIZE, (args.limit, args.limit))  # limit is in bytes

if args.do_import:
    print('Importing a "big" library...')
    from fakelib import bigfile

if args.write is not None:
    print('Writing a %d byte file...' % args.write)
    # For comparison: a direct write past the limit fails with OSError.
    with tempfile.TemporaryFile() as file:
        file.write(b'a' * args.write)
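
The script expects a `fakelib` package next to it. One way to generate a stand-in is sketched below; the file contents, the `make_fakelib` name, and the function count are all assumptions, chosen only so that the compiled `bigfile` is comfortably larger than the 1024-byte limit used above.

```python
import os


def make_fakelib(root, n_functions=200):
    """Create root/fakelib/ with an empty __init__.py and a large bigfile.py."""
    pkg = os.path.join(root, 'fakelib')
    os.makedirs(pkg, exist_ok=True)
    # An empty __init__.py makes fakelib a regular package.
    with open(os.path.join(pkg, '__init__.py'), 'w'):
        pass
    # Many small functions keep the source trivial but the bytecode large.
    with open(os.path.join(pkg, 'bigfile.py'), 'w') as f:
        for i in range(n_functions):
            f.write('def func_%d():\n    return %d\n\n' % (i, i))
    return pkg


if __name__ == '__main__':
    make_fakelib('.')
```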

Tested with Python 3.10 on macOS and Python 3.12 in an Ubuntu 24 container.

CPython versions tested on:

3.10, 3.12

Operating systems tested on:

Linux, macOS

Labels: 3.12 (only security fixes), 3.13 (bugs and security fixes), 3.14 (bugs and security fixes), stdlib (Python modules in the Lib dir), topic-importlib, type-bug (An unexpected behavior, bug, or error)