tools/mpy-tool.py: Allow dumping MPY segments into their own files. #17306
+60
−0
Summary
This PR lets `tools/mpy-tool.py` extract MPY segments into their own files, one file per segment.

This is something I wrote some time ago but I guess it cannot hurt to be upstreamed. When debugging issues related to compiled code generated by `@micropython.viper` or `@micropython.native`, it is of great help to be able to get hold of generated code segments to pass to objdump or ghidra/idapro/cutter/etc., without having to dump memory from gdb or write custom file/hex dumpers.

A pair of new command line arguments were added, namely "-e"/"--extract", which takes a filename prefix to use as a base for the generated files' names, and "--extract-only", which - combined with "--extract" - allows selecting which kinds of segments should be dumped to the filesystem.
So, for example, assuming there's a file called "module.mpy", running "./mpy-tool.py --extract segments module.mpy" would yield a series of files with names like "segments_0_module.py_QSTR_module.py.bin", "segments_1_module.py_META__module_.bin", "segments_2_module.py_QSTR_function.bin", etc. In short, the file name format is `<base>_<count>_<sourcefile>_<segmentkind>_<segmentname>.bin`, with `<segmentkind>` being META, QSTR, OBJ, or CODE. Source file names and segment names will only contain characters in the range "a-zA-Z0-9_-." to avoid having output file names with unexpected characters.

The "--extract-only" option can accept one or more kinds, separated by commas and treated as case-insensitive strings. The supported kinds match what is currently handled by the "MPYSegment" class in "tools/mpy-tool.py": "META", "QSTR", "OBJ", and "CODE". The absence of this command line option implies dumping every segment found.
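As an illustration of the file name format described above, here is a minimal sketch of how such names could be built. The function names `sanitise` and `segment_filename` are hypothetical and are not taken from the PR; replacing disallowed characters with an underscore is an assumption, although it is consistent with "<module>" showing up as "_module_" in the example names.

```python
import re

# Assumption: every character outside a-zA-Z0-9_-. is replaced by "_".
# This matches "<module>" appearing as "_module_" in the example file names,
# but the PR's actual replacement rule may differ.
_UNSAFE = re.compile(r"[^a-zA-Z0-9_.-]")


def sanitise(name):
    """Replace characters that are not in the allowed set with an underscore."""
    return _UNSAFE.sub("_", name)


def segment_filename(base, count, source_file, kind, name):
    """Build <base>_<count>_<sourcefile>_<segmentkind>_<segmentname>.bin.

    Only the source file name and the segment name are sanitised; the base
    prefix is used as given, as it comes straight from the command line.
    """
    return "{}_{}_{}_{}_{}.bin".format(
        base, count, sanitise(source_file), kind, sanitise(name)
    )


print(segment_filename("segments", 1, "module.py", "META", "<module>"))
# prints: segments_1_module.py_META__module_.bin
```

Leaving the base prefix untouched in this sketch mirrors the fact that it is a user-supplied path prefix.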
If "--extract" is passed along with "--merge", dumping is performed after the merge process takes place, in order to dump all possible segments that match the requested segment kinds.
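The "--extract-only" handling described above (comma-separated, case-insensitive kinds, with every kind selected when the option is absent) could be sketched roughly as follows. `parse_extract_only` and `SUPPORTED_KINDS` are hypothetical names, and rejecting unknown kinds with a `ValueError` is an assumption rather than the PR's documented behaviour.

```python
# The four kinds named in the PR description.
SUPPORTED_KINDS = ("META", "QSTR", "OBJ", "CODE")


def parse_extract_only(value):
    """Turn a comma-separated, case-insensitive kind list into a set.

    Passing None (the option was not given) selects every supported kind.
    """
    if value is None:
        return set(SUPPORTED_KINDS)
    # Normalise each entry: strip whitespace, upper-case, drop empty items.
    kinds = {item.strip().upper() for item in value.split(",") if item.strip()}
    # Assumption: unknown kinds are treated as an error.
    unknown = kinds - set(SUPPORTED_KINDS)
    if unknown:
        raise ValueError(
            "unsupported segment kind(s): " + ", ".join(sorted(unknown))
        )
    return kinds


print(sorted(parse_extract_only("code,Qstr")))
# prints: ['CODE', 'QSTR']
```

A segment would then be written out only if its kind is in the returned set.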
Testing
Besides my own usage, I've attached a zipfile containing the compiled version of `tests/micropython/native_try_deep.py` for x64 and its dumped output. To reproduce those files the commands to run are:

To check that the `CODE` segments actually contain executable code, running `objdump -b binary -M x86-64 -m i386:x86-64 --adjust-vma=0x1000 -z --start-address=0x1008 -D native_try_deep_7_native_try_deep.py_CODE_f.bin` should dump valid x64 code to STDOUT, as generated by `mpy-cross` (it skips the first two header words).

native_try_deep.zip
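The start address in the objdump command above is just the adjusted VMA plus the skipped header. A small sketch of that arithmetic, building the same argument list, is shown here. `objdump_args` is a hypothetical helper, and the 4-byte word size is an assumption inferred from the 8-byte gap between 0x1000 and 0x1008 covering two header words.

```python
def objdump_args(path, vma=0x1000, header_words=2, word_size=4):
    """Build the objdump invocation for a dumped x64 CODE segment.

    Assumption: each header word is 4 bytes, so skipping two words means
    starting the disassembly 8 bytes past the adjusted VMA.
    """
    start = vma + header_words * word_size
    return [
        "objdump",
        "-b", "binary",
        "-M", "x86-64",
        "-m", "i386:x86-64",
        "--adjust-vma={:#x}".format(vma),
        "-z",
        "--start-address={:#x}".format(start),
        "-D", path,
    ]


args = objdump_args("native_try_deep_7_native_try_deep.py_CODE_f.bin")
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` on a machine where objdump is installed.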
Trade-offs and Alternatives
Given that this bit of code isn't executed unless explicitly requested, and only for a niche scenario, the only downsides are a tiny increase in overall code complexity and potential security issues if the output file prefix is used in a malicious way.
As far as alternatives go, I used to run `mpy-tool.py -x -d <mpyfile>` to figure out the binary code start offset by looking at the hex pairs on screen (and good luck if somebody remapped their terminal colour scheme :) no idea if the output is colourblind safe though). After a while I wrote my own cut-down `mpy-tool.py` equivalent to run as a ghidra plugin, but then it would require keeping up with MPY format changes and whatnot, and I wasn't sure it would work in all possible cases.

Having `mpy-tool.py` dump the segments itself is probably the best compromise for the time being: it is tool-agnostic and doesn't require anything special to get it working.