Add more tests for Unicode case-insensitivity in regexes. #5198
assertMatches(ranges, "\u0180") // neither mapped to nor from, but in the specified range
// 0540-0550
assertMatches(ranges, "\u0547") // in range
assertMatches(ranges, "\u0577") // mapped from 0577
Wrong code point in comment?
Also: When you write "mapped from X" do you mean the code point in the test is folded to from that value?
Oops, yes, the comment should say "0547".
> Also: When you write "mapped from X" do you mean the code point in the test is folded to from that value?
Yes, that's right. It might be clearer to write "0547 folds to 0577" in the comment?
Yes, I think that would be much clearer.
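For readers following the thread, a minimal sketch of what "0547 folds to 0577" means for a case-insensitive range. It calls `java.util.regex.Pattern` directly rather than the PR's `assertMatches` helper, and it assumes the tests use the `CASE_INSENSITIVE | UNICODE_CASE` flags; the object name `FoldingSketch` and its members are hypothetical.

```scala
import java.util.regex.Pattern

object FoldingSketch {
  // Assumption: Unicode-aware case-insensitive matching, as the PR title suggests.
  private val flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE

  // Range U+0540..U+0550, mirroring the "0540-0550" case in the diff.
  val range: Pattern = Pattern.compile("[\u0540-\u0550]", flags)

  // U+0547 lies inside the range itself.
  val inRange: Boolean = range.matcher("\u0547").matches()

  // U+0577 lies outside the range, but U+0547 folds to it.
  val viaFolding: Boolean = range.matcher("\u0577").matches()

  def main(args: Array[String]): Unit = {
    println(s"U+0547 matches: $inRange")
    println(s"U+0577 matches: $viaFolding")
  }
}
```

The second check is the one the comment under discussion describes: the input is not in the range as written, and it matches only because a code point in the range folds to it.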
// FB00
assertMatches(ranges, "\uFB00") // ff LATIN SMALL LIGATURE FF
// 0175-0182 (contains 017F which folds to 's')
if (!executingInJVM) { // looks like a JVM bug
Were you able to find a reference for this? (or is this why I'm not requested as a reviewer on this PR yet and I'm prematurely reviewing? :P).
No, I haven't been able to find a reference. The other way around does match (using `r-t` in the pattern, and matching `"\u017F"`), though, so it is most likely a bug. I should add that reverse thing as a test case to demonstrate that, actually.
I should probably file a bug with OpenJDK as well.
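A sketch of the asymmetry described above. The guarded assertion itself is not shown in this excerpt, so the forward check assumes it matches "s" against the 0175-0182 range. That result is printed rather than asserted, since it depends on the JVM in use. The object name `LongSSketch` is hypothetical.

```scala
import java.util.regex.Pattern

object LongSSketch {
  // Assumption: the same Unicode-aware case-insensitive flags as the tests.
  private val flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE

  // Forward direction: range containing U+017F (LATIN SMALL LETTER LONG S), input "s".
  val forward: Boolean =
    Pattern.compile("[\u0175-\u0182]", flags).matcher("s").matches()

  // Reverse direction: range r-t in the pattern, input U+017F.
  val reverse: Boolean =
    Pattern.compile("[r-t]", flags).matcher("\u017F").matches()

  def main(args: Array[String]): Unit = {
    println(s"range 0175-0182 against 's': $forward")
    println(s"range r-t against U+017F: $reverse")
  }
}
```

If the two results differ, the folding is being applied in one direction only, which is the inconsistency reported in the comment.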
Bug filed; pending review.
Also FYI this is in the context of scala-wasm#83.
eae9211 to 1cf0160
Can be reviewed again. I suggest we wait for the bug to be reviewed and added to the bug database before merging, so that we can directly include a link.
`Formatter` is an entire beast of its own. It is a poor fit for the debugging output of another area of the test suite.
1cf0160 to 5696a1a
Bug report accepted upstream. I inserted the link.
This was woefully under-tested.