Skip to content

py/obstr: Support tuples as pre/suffix in startswith() and endswith(). #16812

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 2, 2025

Conversation

glenn20
Copy link
Contributor

@glenn20 glenn20 commented Feb 25, 2025

This PR adds support for passing tuples as the prefix/suffix argument to the str.startswith() and str.endswith() methods. This improves compatibility with CPython.

The methods will return True if the string starts/ends with any of the prefixes/suffixes in the tuple.

The PR also adds uniform support for the optional start and end arguments of str.startswith() and str.endswith().

Taken together, these provide full compatibility with CPython str.startswith() and str.endswith() methods (see python docs).

The new signatures of the methods are:

bool str.startswith(prefix: str | tuple[str] [, start: int | None [, end: int | None]])
bool str.endswith(suffix: str | tuple[str] [, start: int | None [, end: int | None]])

Testing

The tests in tests/basics/string_{starts,ends}with.py have been extended to test the new features. These tests are passing on the unix port.

Tradeoffs

This PR should be fully backward compatible with existing python code running on micropython.

While improving compliance wih cpython, this PR will provide a small increase in code size. This may be mitigated by reusing the new objstr.c:get_substring_data() function to reduce code size elsewhere (eg. objstr.c:str_count() and objstr.c:str_finder()).

Copy link

github-actions bot commented Feb 25, 2025

Code size report:

   bare-arm:   +24 +0.042% 
minimal x86:   -47 -0.025% 
   unix x64:    +0 +0.000% standard
      stm32:   +16 +0.004% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:   +16 +0.002% RPI_PICO_W
       samd:   +12 +0.004% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:   +12 +0.003% VIRT_RV32

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Feb 25, 2025
@dpgeorge
Copy link
Member

Thanks, this is something I've wanted for a while but was afraid it would be a big feature. It's quite impressive how small the code size change is here!

You will need to get the CI passing, I think just remove the endswith test from misc/non_compliant.py.

@glenn20
Copy link
Contributor Author

glenn20 commented Feb 25, 2025

Thanks, this is something I've wanted for a while but was afraid it would be a big feature. It's quite impressive how small the code size change is here!

Ta. It got smaller once I realised startswith() and endswith() can share the same implementation. May even reduce the code size if we reuse get_substring_data() in str_count() and str_finder().

You will need to get the CI passing, I think just remove the endswith test from misc/non_compliant.py.

Done. I'm still getting failures on extmod/select_poll_fd, but that is also true for master.

Copy link

codecov bot commented Feb 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base (69ffd2a) to head (eb45d97).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #16812   +/-   ##
=======================================
  Coverage   98.54%   98.54%           
=======================================
  Files         169      169           
  Lines       21855    21864    +9     
=======================================
+ Hits        21536    21545    +9     
  Misses        319      319           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dpgeorge
Copy link
Member

CI now passing.

@dpgeorge dpgeorge added this to the release-1.25.0 milestone Feb 25, 2025
@glenn20
Copy link
Contributor Author

glenn20 commented Feb 25, 2025

May even reduce the code size if we reuse get_substring_data() in str_count() and str_finder().

I just pushed a commit to do this and there is a tiny overall code size reduction (compared to main):

   bare-arm:    -8 -0.014% 
minimal x86:    -3 -0.002% 
   unix x64:   -72 -0.008% standard
      stm32:   -96 -0.024% PYBV10
     mimxrt:  -104 -0.028% TEENSY40
        rp2:   -72 -0.008% RPI_PICO_W
       samd:  -100 -0.037% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:   -78 -0.017% VIRT_RV32

Let me know if you prefer to back this out. If you'd like to keep it, I can squash the separate commits.

@glenn20 glenn20 force-pushed the feature-str-startswith-tuple branch from db2eaf4 to d81c594 Compare February 25, 2025 07:31
@cclauss
Copy link
Contributor

cclauss commented Feb 25, 2025

This is great to see!! This should enable MicroPython code to pass ruff rule PIE810.

Once this has been merged, you might also consider adding str.removeprefix() or str.removesuffix() which is suggested by ruff rule FURB188 .

@glenn20
Copy link
Contributor Author

glenn20 commented Feb 26, 2025

Once this has been merged, you might also consider adding str.removeprefix() or str.removesuffix() which is suggested by ruff rule FURB188 .

Done ;). See #16821. Thanks for the suggestion @cclauss.

@dpgeorge
Copy link
Member

@glenn20 I like the additional commit which refactors code using get_substring_data(). I think that should stay as a separate commit, but to make that commit a bit smaller, I suggest placing the new get_substring_data() function near the top of the file in the first commit that adds it (so it doesn't need to be moved in the third commit). Eg put it just after check_is_str_or_bytes() and before the ********* comment line.

@glenn20 glenn20 force-pushed the feature-str-startswith-tuple branch 2 times, most recently from 667d942 to 8ed4384 Compare February 28, 2025 11:54
@glenn20
Copy link
Contributor Author

glenn20 commented Feb 28, 2025

Commits split, reordered, squashed and rebased.

py/objstr.c Outdated
@@ -748,23 +768,11 @@ static mp_obj_t str_finder(size_t n_args, const mp_obj_t *args, int direction, b

GET_STR_DATA_LEN(args[0], haystack, haystack_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haystack_len is no longer used, so this could be maybe replaced with something slightly smaller/efficient?

Also, is haystack really needed now? I guess it is to compute the correct offset for return value.

@glenn20 glenn20 force-pushed the feature-str-startswith-tuple branch from 8ed4384 to a815856 Compare March 1, 2025 22:50
This change allows tuples to be passed as the prefix/suffix argument to the
`str.startswith()` and `str.endswith()` methods.  The methods will return
`True` if the string starts/ends with any of the prefixes/suffixes in the
tuple.

Also adds full support for the `start` and `end` arguments to both methods
for compatibility with CPython.

Tests have been updated for the new behaviour.

Signed-off-by: Glenn Moloney <glenn.moloney@gmail.com>
@dpgeorge dpgeorge force-pushed the feature-str-startswith-tuple branch from a815856 to eb45d97 Compare March 2, 2025 11:17
@dpgeorge dpgeorge merged commit eb45d97 into micropython:master Mar 2, 2025
64 checks passed
@dpgeorge
Copy link
Member

dpgeorge commented Mar 2, 2025

Very good, nice enhancement, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants