[Yaml] Remove escaping regex #19782
f470831 to e1ea62b
@@ -48,7 +52,7 @@ class Escaper
  */
 public static function requiresDoubleQuoting($value)
 {
-    return preg_match('/'.self::REGEX_CHARACTER_TO_ESCAPE.'/u', $value);
+    return strlen($value) !== strcspn($value, implode('', self::$escapees));
This would force escaping values containing `"`, `/` and `\` that weren't escaped before. Does it matter?
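To make the concern concrete, here is a minimal sketch of how the `strcspn()` check behaves once the escapees are imploded into a single character mask. The helper name `containsAnyByte` is hypothetical, and the list is only the subset of `$escapees` visible in this diff, not the full array from `Escaper`.

```php
<?php
// Hypothetical helper mirroring the strcspn() check from the diff.
function containsAnyByte(string $value, string $mask): bool
{
    // strcspn() returns the length of the leading segment that contains
    // no byte from $mask; if that is the whole string, nothing matched.
    return strlen($value) !== strcspn($value, $mask);
}

// Subset of escapees taken from the diff (assumption: the real list is longer).
// implode() flattens multi-character entries, so the mask is just the set
// of bytes: backslash, double quote and slash.
$mask = implode('', array('\\', '\\\\', '\\"', '"', '/'));

var_dump(containsAnyByte('plain text', $mask)); // bool(false)
var_dump(containsAnyByte('a/b', $mask));        // bool(true)
var_dump(containsAnyByte('say "hi"', $mask));   // bool(true)
```

Because the mask is a set of single bytes, any value containing one of them is reported, which is why values with `"`, `/` or `\` would now be double-quoted.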
b534d61 to 87bc631
@@ -0,0 +1,15 @@
<?php
Not sure about this file :)
Indeed, I should stop using `-A` ^^
'\\x18', '\\x19', '\\x1a', '\\e', '\\x1c', '\\x1d', '\\x1e', '\\x1f',
'\\N', '\\_', '\\L', '\\P');
private static $escapees = array(
    '\\', '\\\\', '\\"', '"', '/',
BTW this fixes a "bug": `/` was missing.
Shouldn't this be done in 2.7 then?
It works in 2.7, it should be escaped but that's not mandatory.
So why do it then?
Actually it's not clear what we should do here. Non-printable characters must be escaped, but `/` is a printable character (the spec says `\/` is available for JSON compatibility), so I guess it's up to us to decide what to do.
For readability, `/` would indeed be better, while `\/` is more compliant with JSON. If you prefer `/`, I'll revert this change.
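For reference, the JSON side of this trade-off can be seen with PHP's own `json_encode()`, which escapes `/` by default and leaves it alone with `JSON_UNESCAPED_SLASHES`. This is only an illustration of the JSON convention being discussed; it says nothing about what the YAML dumper does or should do.

```php
<?php
// json_encode() escapes "/" as "\/" unless told otherwise.
$escaped = json_encode('a/b');
$unescaped = json_encode('a/b', JSON_UNESCAPED_SLASHES);

var_dump($escaped);   // string(6) ""a\/b""
var_dump($unescaped); // string(5) ""a/b""

// Both forms decode to the same value, so the escape is optional.
var_dump(json_decode($escaped) === json_decode($unescaped)); // bool(true)
```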
As it's not a bug fix, this should be done in master.
d317288 to b377df8
OK, retargeted and rebased.
That's the main idea: refactorings do not fix anything, so they should be done in master (the only exception is when it gives a huge performance improvement and the changes are trivial).
5be4264 to 820dc13
@@ -684,7 +684,7 @@ public static function evaluateBinaryScalar($scalar)

 private static function isBinaryString($value)
 {
-    return !preg_match('//u', $value) || preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
+    return !preg_match('//u', $value) || preg_match('/[^\x00-\x1f\x20-\xff]/', $value);
What's the reason for this change?
Because these characters are escapable in double-quoted strings.
Ah okay, maybe we should add a comment with a reference to an external source or something like that, to help understand the code.
In fact I'm not even sure we should use `!!binary` when dumping; all characters are supported in double-quoted scalars anyway.
I'll try to find what the spec says about that.
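A small sketch may help show what the range change does in practice. The two functions below copy the old and new expressions from the diff; the names `isBinaryOld` and `isBinaryNew` are hypothetical and exist only for this comparison.

```php
<?php
// Old check: invalid UTF-8, or any byte outside \x09-\x0d and \x20-\xff.
function isBinaryOld(string $value): bool
{
    return !preg_match('//u', $value) || preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
}

// New check: the character class now covers every byte (\x00-\xff), so the
// second test can never match and only invalid UTF-8 counts as binary.
function isBinaryNew(string $value): bool
{
    return !preg_match('//u', $value) || preg_match('/[^\x00-\x1f\x20-\xff]/', $value);
}

var_dump(isBinaryOld("a\x00b"), isBinaryNew("a\x00b"));     // bool(true) bool(false)
var_dump(isBinaryOld("tab\there"), isBinaryNew("tab\there")); // bool(false) bool(false)
var_dump(isBinaryOld("\xff"), isBinaryNew("\xff"));         // bool(true) bool(true)
```

With the new ranges, control characters such as NUL are no longer treated as binary, which matches the argument that they can be escaped inside double-quoted scalars.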
Well, since you changed some char ranges, this looks like a bug fix to me. About the const, removing it is a BC break.
@nicolas-grekas the whole class is marked as internal, so it is not covered by the BC promise.
-return preg_match('/[ \s \' " \: \{ \} \[ \] , & \* \# \?] | \A[ \- ? | < > = ! % @ ` ]/x', $value);
+// First character is an indicator
+// @see http://yaml.org/spec/1.2/spec.html#c-indicator
+if ($value && 1 === strspn($value[0], '-?&*!|<>\'"=%@`')) {
This is broken for the empty string (BTW, the simpler fix for that is probably to add the empty string to the array of values above).
Indeed, good catch!
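The review does not spell out the exact failure, so the following is only a sketch of how the first-character check behaves around empty and falsy strings. The helper `startsWithIndicator` is hypothetical; the indicator list is copied from the diff. Whether the empty string should itself be reported as needing quotes is left to the real implementation (the suggestion above is to add it to the list of special values).

```php
<?php
// Hypothetical helper: does the value start with a YAML indicator character?
function startsWithIndicator(string $value): bool
{
    // Explicit guard: '' has no first character. A plain truthiness test
    // ($value && ...) would also skip the string "0", which is falsy in PHP.
    if ('' === $value) {
        return false;
    }

    // strspn() on the single first character returns 1 when it is in the mask.
    return 1 === strspn($value[0], '-?&*!|<>\'"=%@`');
}

var_dump(startsWithIndicator('-item')); // bool(true)
var_dump(startsWithIndicator('plain')); // bool(false)
var_dump(startsWithIndicator(''));      // bool(false)
var_dump((bool) '', (bool) '0');        // bool(false) bool(false)
```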
'slash' => array('/', '/'),
'backslash' => array('\\', '\\'),
'next-line' => array("\xC2\x85", '"\\N"'),
'non-breaking-space' => array('�', '�'),
Shouldn't this be `"\_"` in YAML?
Sure, the test is wrong.
So, to help move this one forward, I'm personally 👎 on this:
This is subjective, with no clear technical benefit, and will create merge conflicts that are thus not worth it.
@nicolas-grekas this PR mostly intends to improve readability and maintainability (do you think it's obvious what […] It also slightly improves scalar dumping when quotes are in the middle of the scalar (e.g. […] Anyway, we should at least update the tests, which are currently useless.
Closing as I agree with @nicolas-grekas here. @GuilhemN Can you submit a PR for the tests though?
This PR was merged into the 2.7 branch.

Discussion
----------

[Yaml] Fix the tests

| Q             | A
| ------------- | ---
| Branch?       | 2.7
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets |
| License       | MIT
| Doc PR        |

Fix tests that are currently useless (previously part of #19782).

Commits
-------

5107b94 [Yaml] Fix the tests
This PR replaces the regexes in `Escaper` with calls to `strcspn`. It is much easier to read IMO, and performance isn't degraded.
I also updated the data provider of a test because in practice it was testing nothing (it checks whether values are quoted as expected, but since they always began with `\t` they were always quoted anyway).
Edit: quotes are moved to indicators as they are supported inside plain scalars, as said by @stefk: