-
-
Notifications
You must be signed in to change notification settings - Fork 8.3k
py/emitnative: Some more load/store improvements. #17477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
py/emitnative: Some more load/store improvements. #17477
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #17477 +/- ##
==========================================
- Coverage 98.57% 98.56% -0.01%
==========================================
Files 169 169
Lines 21968 21946 -22
==========================================
- Hits 21654 21632 -22
Misses 314 314 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code size report:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, it looks really good! A nice clean up and enhancement.
#define ASM_SUB_REG_REG(state, rd, rs) asm_rv32_opcode_sub(state, rd, rd, rs) | ||
#define ASM_XOR_REG_REG(state, rd, rs) asm_rv32_emit_optimised_xor(state, rd, rs) | ||
#define ASM_CLR_REG(state, rd) | ||
#define ASM_LOAD8_REG_REG_REG(state, rd, rs1, rs2) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this particular macro isn't needed? If it's omitted then I think emitnative.c
will generate the same code using its generic fall back code.
(Same comment for ASM_STORE8_REG_REG_REG
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now they act the same as the fallback indeed, the reason for their existence is that it was some preparatory work for further changes in the RV32 emitter I planned to introduce (optional support for more RV32 C sub-extensions that introduce more compressed opcodes). This technically applies to ASM_LOAD16_REG_REG_REG
and ASM_STORE16_REG_REG_REG
too, as whilst the emitted code sequence is different the number of opcodes and the final sequence size is the same.
I don't mind taking them out now to potentially re-introduce them later (even if under a different form).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it' fine to keep them.
1371a54
to
2293a54
Compare
This commit adds two macros that lets check whether a given value can fit an arbitrary number of bits, either as a signed or as an unsigned number. The native emitter code backends perform a lot of bit size checks to see if a particular code sequence can be emitted instead of a generic one, and each platform backend has its own ad-hoc macros (usually one per bit count and signedness). With these macros there's a single way to perform those checks, plus there's no more chance for off-by-one mask length errors when dealing with signed numbers. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit expands the implementation of Viper load/store operations that are optimised for the RV32 platform. Given the way opcodes are encoded, all value sizes are implemented with only two functions - one for loads and one for stores. This should reduce duplication with existing operations and should, in theory, save space as some code is removed. Both load and store emitters will generate the shortest possible sequence (as long as the stack pointer is not involved), using compressed opcodes when possible. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit expands the implementation of Viper load/store operations that are optimised for the Arm platform. Now both load and store emitters should generate the shortest possible sequence in all cases. Redundant specialised operation emitters have been folded into the general case implementation - this was the case of integer-indexed load/store operations with a fixed offset of zero. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit expands the implementation of Viper load/store operations that are optimised for the Xtensa platform. Now both load and store emitters should generate the shortest possible sequence in all cases. Redundant specialised operation emitters have been aliased to the general case implementation - this was the case of integer-indexed load/store operations with a fixed offset of zero. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit removes load/store opcode implementations that have been made redundant in 1f5ba69. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit expands the implementation of Viper load/store operations that are optimised for the x86 platform. Unlike other platforms, x86 already implemented all necessary functions and all it took to expose these to Viper after the infrastructure refactoring was to add a few defines. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit expands the implementation of Viper load/store operations that are optimised for the x86 platform. Like x86, x64 already implemented all necessary functions and all it took to expose these to Viper after the infrastructure refactoring was to add a few defines. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit reworks the Viper pointer boundary tests in order to make them more accurate and easier to extend. The tests are now easier to reason about in their output, using easier to read values, and bit thresholds are now more configurable. If a new conditional code sequence is introduced, adding a new bit threshold is just a matter of adding a value into a tuple at the beginning of the relevant test file. Load tests have also been made more accurate, with better function templates to test register-indexed operations. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit introduces a mechanism to customise the code that is injected to the board when performing a test file upload and execution. A new argument, --begin", is added so regular Python code can be inserted in the injected fragment between the module file creation and the effective file import. This is needed for running larger tests (usually ones that have been pre-compiled with "--via-mpy --emit native") on ESP8266, as that board does not have enough memory to fit certain blocks of code unless additional configuration is performed. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit cleans up the single entry point integer-indexed load/store emitters that have been built by merging the single operand type load/store functions in 1f5ba69. To follow the same convention found in RV32 and Xtensa emitters, the function operand size is not named after the left shift amount to apply to the initial offset to get its true byte offset, but by a generic "operand size". Also, those functions were updated to use the new MP_FIT_UNSIGNED macros to perform bit width checks when figuring out which opcode encoding is the best one to use. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2293a54
to
1b0cdc0
Compare
Summary
Here's the last - I hope - PR with Viper load/store improvements (well, for native too since they share those emitter primitives).
Some ports should, in theory, actually shrink down in size:
However, the Arm emitter is probably slightly bigger since there's one function per operation/size combination. I've tried to offset this by merging some single-opcode emitter functions into entities that generate the shortest sequence for all offsets. Halfword opcodes have a different format so I'd still have four functions rather than six - not really worth the effort for such a platform that, as I understand it, isn't as memory constrained as other ones supported by MicroPython.
Thumb only got redundant single-opcode emitters removed, so it's equivalent as far as emitted code goes.
I haven't replaced all the bit-width fit checks in the various emitters with the new macros, in order to keep this patch series more readable. For the time being I'll replace those as I encounter them in code that gets changed.
Boundary tests were cleaned up and improved as well - I didn't go as far as using unittest's sub-tests (one per bit width tested), but if that's desired I can migrate those tests to that framework.
Who would have thought that getting and setting simple variables would have triggered all this work? :)
Testing
tests/mcropython/viper_ptr*
were executed successfully on the following ports:VIRT_RV32
,SABRELITE
, andMPS_AN385
) both directly and with--via-mpy --emit native
--via-mpy --emit native
using the same prelude injection method introduced for natmod tests (-b
was already taken intests/run-tests.py
, so only--begin
is available)--via-mpy --emit native
Trade-offs and Alternatives
As mentioned, the Arm emitter is slightly larger and it can be made a bit smaller if it's required - adding some complexity to that emitter backend.