Skip to content

[Finder] Optimize the hot-path #15826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 18, 2015
Merged

Conversation

nicolas-grekas
Copy link
Member

Q A
Bug fix? yes
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #15824
License MIT
Doc PR -

A significant part of the perf gain in #15802 was related to filters not being applied recursively...
#15824 fixing this, performance dropped again.

This PR optimizes the hot path by replacing a regexp test by a simple isset when possible.
Blackfire diff after #15824 is the following:
https://blackfire.io/profiles/compare/9e489018-998d-4acb-92a0-46011828e83b/graph

preg_match is not called anymore, and Symfony\Component\Finder\Iterator\RecursiveDirectoryIterator::current() is also cut by two.

When this isset optimization is disabled and replaced by a concatenation of all the regexps patterns in a single bigger one, the gain is still significant but lower:
https://blackfire.io/profiles/compare/db86b80e-b63e-4fc9-9ff3-9ed32baeb948/graph

This makes me think that an other and last round of optimization is possible by merging all regexps in one in MultiplePcreFilterIterator filters. If someone wants to work on this, please do it :)

@stof
Copy link
Member

stof commented Sep 17, 2015

This makes me think that an other and last round of optimization is possible by merging all regexps in one in MultiplePcreFilterIterator filters. If someone wants to work on this, please do it :)

this is quite hard to do though, as it is not just a matter of concatenating regexes (combining regexes is very hard to do, and almost impossible in case complex regexes are involved)

@@ -73,6 +87,9 @@ public function getChildren()
if ($children instanceof self) {
// parent method will call the constructor with default arguments, so unreadable dirs won't be ignored anymore
$children->ignoreUnreadableDirs = $this->ignoreUnreadableDirs;
$children->subPath = $children->getSubPath();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a command before this to explain it is a performance optimization to avoid redoing the same work in all childrens

@stof
Copy link
Member

stof commented Sep 17, 2015

@nicolas-grekas this seems to make PHP segfault

@stof
Copy link
Member

stof commented Sep 17, 2015

@nicolas-grekas can you post the profiling and benchmarking results of the ExcludeDirectoryFilter optimization you posted, as this is a totally different optimization than the initial one ?

@nicolas-grekas nicolas-grekas force-pushed the finder-perf branch 12 times, most recently from cd91ade to 8611f74 Compare September 17, 2015 14:12
@@ -31,6 +31,10 @@ class RecursiveDirectoryIterator extends \RecursiveDirectoryIterator
*/
private $rewindable;

private $rootPath;
private $subPath;
private $directorySeparator = '/';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some documentation on them to make the code understandable by someone reading the code later

@nicolas-grekas
Copy link
Member Author

@stof the profiling of the ExcludeDirectoryFilter is embedded in the above profiles: you need to click on the "current" node (bottom right) then on the magnifying glass on the left box to draw the graph around this node.

@@ -31,6 +31,11 @@ class RecursiveDirectoryIterator extends \RecursiveDirectoryIterator
*/
private $rewindable;

// these 3 properties take part of the performance optimization to avoid redoing the same work in all iterations
private $rootPath;
private $subPath = false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null is the return value of $this->getSubPath() on hhvm, thus requiring false here as "not set" placeholder

@nicolas-grekas
Copy link
Member Author

   1010 files
    110 directories
     10 iterations
      6 cases
      2 adapters

            case             php        gnu_find
               0          92,396          33,565
               1          99,673          32,776
               2         103,553          54,970
               3         102,736          61,805
               4         105,615          19,072
               5         107,359          35,432

      0 Find files by name containing a*
      1 Find files by name containing a* not containing *a
      2 Find files by name containing ~^a.*~ not containing ~.*a$~
      3 Find files by path containing a*
      4 Find files by path containing a* not containing *a
      5 Find files by path containing ~^a.*~ not containing ~.*a$~

See #15802 (comment) for comparison.

@@ -61,7 +61,7 @@ class Regex implements ValueInterface
*/
public static function create($expr)
{
if (preg_match('/^(.{3,}?)([imsxuADU]*)$/', $expr, $m)) {
if (preg_match('/^(.{2,}?)([imsxuADU]*)$/', $expr, $m)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reason for this change ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the regex class rejects empty expressions //, but it should not, they are perfectly valid

}
}
if ($patterns) {
$this->excludedPattern = '#(^|/)('.implode('|', $patterns).')(/|$)#';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using non-capturing groups here. It will make the regex matching more efficient (the regex engine will not need to capture the groups)

@stof
Copy link
Member

stof commented Sep 18, 2015

👍 on this one

@nicolas-grekas
Copy link
Member Author

hhvm fixed, good to merge
Status: reviewed

@fabpot
Copy link
Member

fabpot commented Sep 18, 2015

Thank you @nicolas-grekas.

@fabpot fabpot merged commit f156de6 into symfony:2.3 Sep 18, 2015
fabpot added a commit that referenced this pull request Sep 18, 2015
This PR was merged into the 2.3 branch.

Discussion
----------

[Finder] Optimize the hot-path

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #15824
| License       | MIT
| Doc PR        | -

A significant part of the perf gain in #15802 was related to filters not being applied recursively...
#15824 fixing this, performance dropped again.

This PR optimizes the hot path by replacing a regexp test by a simple `isset` when possible.
Blackfire diff after #15824 is the following:
https://blackfire.io/profiles/compare/9e489018-998d-4acb-92a0-46011828e83b/graph

`preg_match` is not called anymore, and  `Symfony\Component\Finder\Iterator\RecursiveDirectoryIterator::current()` is also cut by two.

When this `isset` optimization is disabled and replaced by a concatenation of all the regexps patterns in a single bigger one, the gain is still significant but lower:
https://blackfire.io/profiles/compare/db86b80e-b63e-4fc9-9ff3-9ed32baeb948/graph

This makes me think that an other and last round of optimization is possible by merging all regexps in one in MultiplePcreFilterIterator filters. If someone wants to work on this, please do it :)

Commits
-------

f156de6 [Finder] Optimize the hot-path
@nicolas-grekas nicolas-grekas deleted the finder-perf branch September 18, 2015 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants