-
Notifications
You must be signed in to change notification settings - Fork 1.7k
JS: Disable type extraction #19640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
JS: Disable type extraction #19640
Conversation
7ba856a
to
2cd92ef
Compare
/** | ||
* Interface for accessing name-resolution info about type names. | ||
*/ |
Check warning
Code scanning / CodeQL
Class QLDoc style. Warning
/** | ||
* Interface for accessing name-resolution info about expressions. | ||
*/ |
Check warning
Code scanning / CodeQL
Class QLDoc style. Warning
9521fbb
to
2035925
Compare
This query depended on the cons-hashing performed by type extraction to determine if two types are the same. This is not trivial to restore, but not important enough to reimplement right now, so for now just simplifying the query's ability to recognise that two types are the same.
This was used in the very old dist-compare tool, but has no use anymore
4ba014d
to
a039c1b
Compare
3a1fae6
to
1729f7a
Compare
This check existed on the code path for full type extraction, but not for plain single-file extraction.
The 'extractTypeScriptFiles' override did not incorporate the file type and one of our unit tests was expecting this. The test was previously passing for the wrong reasons.
1729f7a
to
5289e4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing performance boost 👍
Kudos on detailed and super easy to understand doc on public API javascript/ql/lib/semmle/javascript/internal/BindingInfo.qll
😍
@@ -239,7 +239,7 @@ public AutoBuild() { | |||
this.outputConfig = new ExtractorOutputConfig(LegacyLanguage.JAVASCRIPT); | |||
this.trapCache = ITrapCache.fromExtractorOptions(); | |||
this.typeScriptMode = | |||
getEnumFromEnvVar("LGTM_INDEX_TYPESCRIPT", TypeScriptMode.class, TypeScriptMode.FULL); | |||
getEnumFromEnvVar("LGTM_INDEX_TYPESCRIPT", TypeScriptMode.class, TypeScriptMode.BASIC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth removing the flag in the future?
Disabling type extraction
We used to rely on the TypeScript compiler's type checker for getting type information during extraction. As of this PR, we only use the TypeScript compiler for parsing, and instead use CodeQL logic to obtain the needed type information ourselves.
Why?
Faster extraction: We're seeing 20-80% faster extraction times for TypeScript code bases. The wins here tend to outweigh the added cost on the QL side.
Incremental extraction: Type extraction was incompatible with "overlay mode". With this change, TypeScript files can be extracted in isolation from each other, meaning we only have to parse the files that have changed when building an overlay database.
Robustness: TypeScript extraction rarely fails, but when it does, it is usually because of type extraction. At least three public issues (1, 2, 3) and several internal ones are ultimately about type extraction problems. Several issues with type extraction have surfaced and been fixed over the years, but there's been a tendency for the "fixes" to be workarounds for problems that we don't have enough control over from the extractor.
Deprecations and breaking changes
This PR deprecates
Type
and other APIs relating to extracted types, and introduces a simple interface for interacting with the new QL-based name/type resolution.Queries relying on
Type
and friends should still compile (with deprecation warnings) but won't have the same results sinceType
is now empty. The change notes therefore lists this as a breaking change. We try to avoid breaking changes when at all possible, but there is no viable mechanism for avoiding it in this case.Call graph changes
When I made the original call graph estimates in the previous PR, I had unfortunately not disabled the influence of type extraction completely, meaning we lose more call edges than originally estimated.
This PR adds a few more fixes to recover some of them, but as seen in the evaluation we still lose 25k call edges. However, this only results in 1 lost alert, which was a FP. I have some follow-up work with generics that recovers about a third of the call edges. Despite the lost call edges, I'd like to move ahead with this PR as the pros clearly outweigh the cons at this point, and it means we can unblock other works that depend on it.