py/emitnative: Some more load/store improvements. #17477

agatti · 2025-06-11T09:13:03Z

Summary

Here's the last - I hope - PR with Viper load/store improvements (well, for native too since they share those emitter primitives).

Some ports should, in theory, actually shrink down in size:

Xtensa has one single function per operation, each covering all operand sizes by flipping bits in a single instruction template (like the Thumb emitter in the previous PR in the series), still using narrow opcodes whenever possible
RV32 also has one single function per operation with a single function template each, with even more duplicated code removed, now emitting compressed opcodes whenever possible (even for store operations)
x86 already had the necessary opcode emitters to cover all load/store cases but they weren't exposed to Viper, so all operations should be one opcode each and - in theory - emit faster code
x64, like x86, also had all the necessary opcode emitters already there, with the same performance benefits

However, the Arm emitter is probably slightly bigger since there's one function per operation/size combination. I've tried to offset this by merging some single-opcode emitter functions into entities that generate the shortest sequence for all offsets. Halfword opcodes have a different format so I'd still have four functions rather than six - not really worth the effort for such a platform that, as I understand it, isn't as memory constrained as other ones supported by MicroPython.

Thumb only got redundant single-opcode emitters removed, so it's equivalent as far as emitted code goes.

I haven't replaced all the bit-width fit checks in the various emitters with the new macros, in order to keep this patch series more readable. For the time being I'll replace those as I encounter them in code that gets changed.

Boundary tests were cleaned up and improved as well - I didn't go as far as using unittest's sub-tests (one per bit width tested), but if that's desired I can migrate those tests to that framework.

Who would have thought that getting and setting simple variables would have triggered all this work? :)

Testing

tests/mcropython/viper_ptr* were executed successfully on the following ports:

QEMU (VIRT_RV32, SABRELITE, and MPS_AN385) both directly and with --via-mpy --emit native
ESP8266 (NodeMCU v3) both directly and with --via-mpy --emit native using the same prelude injection method introduced for natmod tests (-b was already taken in tests/run-tests.py, so only --begin is available)
ESP32 (generic D0WD board), both directly and with --via-mpy --emit native
x64 Unix using the STANDARD variant directly
x86 Unix using the COVERAGE variant directly

Trade-offs and Alternatives

As mentioned, the Arm emitter is slightly larger and it can be made a bit smaller if it's required - adding some complexity to that emitter backend.

codecov · 2025-06-11T09:20:22Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.56%. Comparing base (1eb27e1) to head (1b0cdc0).
Report is 10 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #17477      +/-   ##
==========================================
- Coverage   98.57%   98.56%   -0.01%     
==========================================
  Files         169      169              
  Lines       21968    21946      -22     
==========================================
- Hits        21654    21632      -22     
  Misses        314      314

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions · 2025-06-11T09:24:40Z

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  -336 -0.039% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +8 +0.002% TEENSY40
        rp2:   +16 +0.002% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:   -55 -0.012% VIRT_RV32

dpgeorge

Thanks for this, it looks really good! A nice clean up and enhancement.

py/asmrv32.c

tests/run-tests.py

dpgeorge · 2025-06-25T06:31:43Z

py/asmrv32.h

 #define ASM_SUB_REG_REG(state, rd, rs) asm_rv32_opcode_sub(state, rd, rd, rs)
 #define ASM_XOR_REG_REG(state, rd, rs) asm_rv32_emit_optimised_xor(state, rd, rs)
 #define ASM_CLR_REG(state, rd)
+#define ASM_LOAD8_REG_REG_REG(state, rd, rs1, rs2) \


It looks like this particular macro isn't needed? If it's omitted then I think emitnative.c will generate the same code using its generic fall back code.

(Same comment for ASM_STORE8_REG_REG_REG.)

Right now they act the same as the fallback indeed, the reason for their existence is that it was some preparatory work for further changes in the RV32 emitter I planned to introduce (optional support for more RV32 C sub-extensions that introduce more compressed opcodes). This technically applies to ASM_LOAD16_REG_REG_REG and ASM_STORE16_REG_REG_REG too, as whilst the emitted code sequence is different the number of opcodes and the final sequence size is the same.

I don't mind taking them out now to potentially re-introduce them later (even if under a different form).

OK, it' fine to keep them.

This commit adds two macros that lets check whether a given value can fit an arbitrary number of bits, either as a signed or as an unsigned number. The native emitter code backends perform a lot of bit size checks to see if a particular code sequence can be emitted instead of a generic one, and each platform backend has its own ad-hoc macros (usually one per bit count and signedness). With these macros there's a single way to perform those checks, plus there's no more chance for off-by-one mask length errors when dealing with signed numbers. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

This commit expands the implementation of Viper load/store operations that are optimised for the RV32 platform. Given the way opcodes are encoded, all value sizes are implemented with only two functions - one for loads and one for stores. This should reduce duplication with existing operations and should, in theory, save space as some code is removed. Both load and store emitters will generate the shortest possible sequence (as long as the stack pointer is not involved), using compressed opcodes when possible. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

This commit expands the implementation of Viper load/store operations that are optimised for the Arm platform. Now both load and store emitters should generate the shortest possible sequence in all cases. Redundant specialised operation emitters have been folded into the general case implementation - this was the case of integer-indexed load/store operations with a fixed offset of zero. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

This commit expands the implementation of Viper load/store operations that are optimised for the Xtensa platform. Now both load and store emitters should generate the shortest possible sequence in all cases. Redundant specialised operation emitters have been aliased to the general case implementation - this was the case of integer-indexed load/store operations with a fixed offset of zero. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

This commit removes load/store opcode implementations that have been made redundant in 1f5ba69. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

This commit expands the implementation of Viper load/store operations that are optimised for the x86 platform. Unlike other platforms, x86 already implemented all necessary functions and all it took to expose these to Viper after the infrastructure refactoring was to add a few defines. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

This commit expands the implementation of Viper load/store operations that are optimised for the x86 platform. Like x86, x64 already implemented all necessary functions and all it took to expose these to Viper after the infrastructure refactoring was to add a few defines. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

This commit reworks the Viper pointer boundary tests in order to make them more accurate and easier to extend. The tests are now easier to reason about in their output, using easier to read values, and bit thresholds are now more configurable. If a new conditional code sequence is introduced, adding a new bit threshold is just a matter of adding a value into a tuple at the beginning of the relevant test file. Load tests have also been made more accurate, with better function templates to test register-indexed operations. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

This commit introduces a mechanism to customise the code that is injected to the board when performing a test file upload and execution. A new argument, --begin", is added so regular Python code can be inserted in the injected fragment between the module file creation and the effective file import. This is needed for running larger tests (usually ones that have been pre-compiled with "--via-mpy --emit native") on ESP8266, as that board does not have enough memory to fit certain blocks of code unless additional configuration is performed. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

This commit cleans up the single entry point integer-indexed load/store emitters that have been built by merging the single operand type load/store functions in 1f5ba69. To follow the same convention found in RV32 and Xtensa emitters, the function operand size is not named after the left shift amount to apply to the initial offset to get its true byte offset, but by a generic "operand size". Also, those functions were updated to use the new MP_FIT_UNSIGNED macros to perform bit width checks when figuring out which opcode encoding is the best one to use. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

dpgeorge added the py-core Relates to py/ directory in source label Jun 16, 2025

dpgeorge reviewed Jun 25, 2025

View reviewed changes

py/asmrv32.c Outdated Show resolved Hide resolved

tests/run-tests.py Outdated Show resolved Hide resolved

dpgeorge reviewed Jun 25, 2025

View reviewed changes

agatti force-pushed the optimise-all-viper-emitters branch from 1371a54 to 2293a54 Compare June 26, 2025 18:56

agatti added 10 commits July 1, 2025 15:34

py/asmthumb: Remove redundant load/store opcode implementations.

7bb5f2d

This commit removes load/store opcode implementations that have been made redundant in 1f5ba69. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>

dpgeorge force-pushed the optimise-all-viper-emitters branch from 2293a54 to 1b0cdc0 Compare July 1, 2025 05:43

dpgeorge approved these changes Jul 1, 2025

View reviewed changes

dpgeorge merged commit 1b0cdc0 into micropython:master Jul 1, 2025
67 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

py/emitnative: Some more load/store improvements. #17477

py/emitnative: Some more load/store improvements. #17477

Uh oh!

agatti commented Jun 11, 2025 •

edited

Loading

Uh oh!

codecov bot commented Jun 11, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Jun 11, 2025 •

edited

Loading

Uh oh!

dpgeorge left a comment

Uh oh!

Uh oh!

Uh oh!

dpgeorge Jun 25, 2025

Uh oh!

agatti Jun 26, 2025

Uh oh!

dpgeorge Jul 1, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

py/emitnative: Some more load/store improvements. #17477

py/emitnative: Some more load/store improvements. #17477

Uh oh!

Conversation

agatti commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Trade-offs and Alternatives

Uh oh!

codecov bot commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions bot commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dpgeorge left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

dpgeorge Jun 25, 2025

Choose a reason for hiding this comment

Uh oh!

agatti Jun 26, 2025

Choose a reason for hiding this comment

Uh oh!

dpgeorge Jul 1, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

agatti commented Jun 11, 2025 •

edited

Loading

codecov bot commented Jun 11, 2025 •

edited

Loading

github-actions bot commented Jun 11, 2025 •

edited

Loading