
Basic support for overlay PR analysis #2945


Open · wants to merge 14 commits into base: main


@cklin (Contributor) commented Jun 23, 2025

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.
  • Confirm the changelog has been updated if necessary.

@cklin cklin marked this pull request as ready for review June 23, 2025 15:51
@Copilot Copilot AI review requested due to automatic review settings June 23, 2025 15:51
@cklin cklin requested a review from a team as a code owner June 23, 2025 15:51
@cklin cklin requested a review from nickrolfe June 23, 2025 15:51
@Copilot Copilot AI left a comment

Pull Request Overview

Adds foundational support for overlay database analysis, including caching of the overlay-base database and integration into init/analyze workflows.

  • Introduces functions to upload/download overlay-base DB to Actions cache.
  • Updates configuration logic to determine overlay modes and caching behavior.
  • Integrates overlay mode handling into init and analyze actions and feature flags.

Reviewed Changes

Copilot reviewed 28 out of 42 changed files in this pull request and generated 1 comment.

Summary per file:
  • src/overlay-database-utils.ts: Implements check, upload, and download for the overlay-base DB cache
  • src/config-utils.ts: Adds getOverlayDatabaseMode and new augmentation properties
  • src/init-action.ts: Passes sourceRoot to init and applies cache download logic
  • src/analyze-action.ts: Overrides cleanup level and uploads overlay-base DB to cache
  • src/feature-flags.ts: Introduces OverlayAnalysis feature and min CLI version check
Comments suppressed due to low confidence (2)

src/overlay-database-utils.ts:170

  • The new overlay database caching functions (uploadOverlayBaseDatabaseToCache and downloadOverlayBaseDatabaseFromCache) are not covered by existing tests. Consider adding unit tests to validate their guard conditions and successful cache interactions.
export async function uploadOverlayBaseDatabaseToCache(

src/config-utils.ts:791

  • [nitpick] getOverlayDatabaseMode contains complex branching logic. Consider adding or expanding its JSDoc to outline the decision flow and environment variable precedence for better readability and future reference.
async function getOverlayDatabaseMode(

logger: Logger,
): Promise<boolean> {
const overlayDatabaseMode = config.augmentationProperties.overlayDatabaseMode;
if (overlayDatabaseMode !== OverlayDatabaseMode.OverlayBase) {
Copilot AI Jun 23, 2025

[nitpick] The guard logic for overlay database caching (checking mode, caching flag, and test mode) is duplicated in both upload and download functions. Consider extracting this into a shared helper to reduce duplication and simplify future maintenance.
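A minimal sketch of the shared helper this comment suggests, assuming the three checks are exactly the ones it lists (database mode, the caching flag, and test mode). Upload would pass OverlayBase as the expected mode (as in the excerpt above) and download would pass Overlay, since the cached overlay-base database is downloaded when creating an overlay database. The names overlayCacheSkipReason and CacheGuardInput, and the enum's string values, are hypothetical and not the action's actual API.

```typescript
// Hypothetical mirror of the PR's OverlayDatabaseMode enum; the string
// values are assumptions for illustration.
enum OverlayDatabaseMode {
  Overlay = "overlay",
  OverlayBase = "overlay-base",
  None = "none",
}

// Hypothetical guard inputs; in the PR these would come from the config's
// augmentation properties and the action's test-mode check.
interface CacheGuardInput {
  overlayDatabaseMode: OverlayDatabaseMode;
  useOverlayDatabaseCaching: boolean;
  isInTestMode: boolean;
}

/**
 * Shared guard for cache upload and download. Returns a human-readable
 * reason to skip caching, or undefined if the caller may proceed.
 */
function overlayCacheSkipReason(
  input: CacheGuardInput,
  expectedMode: OverlayDatabaseMode,
): string | undefined {
  if (input.overlayDatabaseMode !== expectedMode) {
    return `overlay database mode is ${input.overlayDatabaseMode}, not ${expectedMode}`;
  }
  if (!input.useOverlayDatabaseCaching) {
    return "overlay database caching is disabled";
  }
  if (input.isInTestMode) {
    return "running in test mode";
  }
  return undefined;
}
```

Returning a reason instead of a bare boolean lets both callers log why caching was skipped while keeping the checks in one place.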

@nickrolfe nickrolfe left a comment

I have a few questions/comments, but overall it looks very sensible.

Comment on lines 848 to 856
if (buildMode !== BuildMode.None) {
  logger.warning(
    `Cannot build an ${overlayDatabaseMode} database because ` +
      `build-mode is set to "${buildMode}" instead of "none". ` +
      "Falling back to creating a normal full database instead.",
  );
  return nonOverlayAnalysis;
}
nickrolfe (Contributor):

I think we need to check whether PR analyses for dynamic languages like Ruby actually have the build mode defined as None, or if it's left undefined.

cklin (Contributor Author):

I updated the code so that it checks for build mode only for traced languages.
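The revised check might look roughly like the following sketch: build mode only matters when a traced language is being analyzed, so a Ruby-only analysis with an undefined build mode is still eligible for overlay analysis. The function name buildModeAllowsOverlay is hypothetical, and the set of traced languages is passed in as a predicate because the exact list is not shown in this PR.

```typescript
// CodeQL build-mode values; undefined models a workflow that leaves the
// build mode unset (common for interpreted languages such as Ruby).
type BuildMode = "none" | "autobuild" | "manual";

/**
 * Hypothetical helper: returns true if an overlay or overlay-base database
 * can be built. Only traced languages must use build-mode "none"; for other
 * languages the build mode is ignored.
 */
function buildModeAllowsOverlay(
  languages: string[],
  buildMode: BuildMode | undefined,
  isTracedLanguage: (language: string) => boolean,
): boolean {
  if (!languages.some(isTracedLanguage)) {
    // No traced language involved, so the build mode is irrelevant.
    return true;
  }
  return buildMode === "none";
}
```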

"from the CODEQL_OVERLAY_DATABASE_MODE environment variable.",
);
} else if (
["github", "dsp-testing"].includes(repository.owner) &&
nickrolfe (Contributor):

Can you explain why we can't just use the feature flag to restrict overlay analysis to these orgs?

cklin (Contributor Author):

This PR does not implement all the functionality we planned, so it is an early prototype suitable only for internal testing. We want to be sure that this implementation never attempts overlay analysis on user repositories, even if a user pins their workflow to this PR after it is merged and we later enable all the feature flags.

That is why I am using an explicit allowlist here instead of relying on feature flags.
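A sketch of the gating described here, assuming the allowlist is combined with the feature flag (as the trailing `&&` in the diff excerpt suggests): enabling the flag for everyone still cannot turn on overlay analysis outside the listed owners. The function name overlayAnalysisPermitted is hypothetical; the two owners are taken from the diff.

```typescript
// Owners permitted to run the prototype, copied from the diff's allowlist.
const OVERLAY_PROTOTYPE_OWNERS: readonly string[] = ["github", "dsp-testing"];

/**
 * Hypothetical gate: overlay analysis requires BOTH the feature flag and an
 * allowlisted repository owner.
 */
function overlayAnalysisPermitted(
  repositoryOwner: string,
  featureFlagEnabled: boolean,
): boolean {
  return featureFlagEnabled && OVERLAY_PROTOTYPE_OWNERS.includes(repositoryOwner);
}
```

Because the allowlist is compiled into this version of the action, a workflow pinned to it stays restricted regardless of later feature-flag changes.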

}

function generateCacheKey(config: Config, codeQlVersion: string): string {
const sha = process.env.GITHUB_SHA || "unknown";
nickrolfe (Contributor):

If the GITHUB_SHA env var is somehow not set, should we instead skip uploading to the actions cache?

cklin (Contributor Author):

Good point. There is also another reason not to rely on GITHUB_SHA: even when the environment variable is set, it may not correspond to the commit being analyzed. It is better to call getCommitOid() instead.

cklin (Contributor Author):

I updated the code to use getCommitOid().
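A minimal sketch of the save-key construction this thread converges on: the key is the restore-key prefix plus the analyzed commit's OID, and when no OID is available the caller skips the upload rather than saving under a placeholder such as "unknown". The OID would come from getCommitOid(); here it is passed in as a parameter, and the function name buildCacheSaveKey and the prefix format in the usage note are hypothetical.

```typescript
/**
 * Hypothetical helper: appends the analyzed commit's OID to the restore-key
 * prefix. Returns undefined when the OID is missing or empty, signalling
 * that the upload should be skipped.
 */
function buildCacheSaveKey(
  restoreKeyPrefix: string,
  commitOid: string | undefined,
): string | undefined {
  if (!commitOid) {
    return undefined;
  }
  return `${restoreKeyPrefix}${commitOid}`;
}
```

For example, with a hypothetical prefix "overlay-base-db-2.22.0-", an analysis of commit abc123 would be saved under "overlay-base-db-2.22.0-abc123".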


const dbLocation = config.dbLocation;
const codeQlVersion = (await codeql.getVersion()).version;
const restoreKey = getCacheRestoreKey(config, codeQlVersion);
nickrolfe (Contributor):

I don't follow how this can work: the saveKey computation appends either a SHA or "unknown" to the result of getCacheRestoreKey(), but we're not appending that suffix here.

cklin (Contributor Author):

This code relies on the Actions cache performing restores by prefix match on the cache key:
https://docs.github.com/en/actions/how-tos/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#matching-a-cache-key
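To make the prefix-match behavior concrete, here is a simplified in-memory model of the lookup described in the linked documentation: an exact key match wins; otherwise the most recently created entry whose key starts with the requested key is restored. Branch scoping and cache versions are deliberately ignored, and CacheEntry and resolveCacheKey are hypothetical names, not part of the Actions cache API.

```typescript
// Simplified model of a stored cache entry, for illustration only.
interface CacheEntry {
  key: string;
  createdAt: number; // larger value = more recently created
}

/**
 * Returns the key of the entry that would be restored for requestedKey:
 * an exact match if one exists, else the newest prefix match, else undefined.
 */
function resolveCacheKey(
  entries: CacheEntry[],
  requestedKey: string,
): string | undefined {
  const exact = entries.find((e) => e.key === requestedKey);
  if (exact) {
    return exact.key;
  }
  let newest: CacheEntry | undefined;
  for (const entry of entries) {
    if (
      entry.key.startsWith(requestedKey) &&
      (newest === undefined || entry.createdAt > newest.createdAt)
    ) {
      newest = entry;
    }
  }
  return newest?.key;
}
```

Under this model, saving under prefix + commit OID and restoring with only the prefix picks up the latest overlay-base database stored for that prefix.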

@cklin (Contributor Author) commented Jun 30, 2025

Thank you for your feedback @nickrolfe!

I will move the PR back to draft to address the comments and resolve the merge conflicts. Once that is done, I will mark it as ready for review again.

@cklin cklin marked this pull request as draft June 30, 2025 21:05
cklin added 14 commits June 30, 2025 14:46
This commit adds overlayDatabaseMode to AugmentationProperties and
creates a placeholder getOverlayDatabaseMode() function, with the
necessary inputs, to populate it.
This commit populates getOverlayDatabaseMode() in config-utils with the
same code from getOverlayDatabaseMode() in init.
This commit changes databaseInitCluster() to use overlayDatabaseMode
from AugmentationProperties instead of the overlayDatabaseMode
parameter. There is no behavior change because both overlayDatabaseMode
values are computed the same way.

The commit then cleans up the overlayDatabaseMode parameter and the code
paths that feed into it.
This commit changes getOverlayDatabaseMode so that, when
Feature.OverlayAnalysis is enabled, it calculates the overlay database
mode automatically based on analysis metadata. If we are analyzing the
default branch, use OverlayBase, and if we are analyzing a PR, use
Overlay.

If CODEQL_OVERLAY_DATABASE_MODE is set to a valid overlay database mode,
that environment variable still takes precedence.
This commit adds useOverlayDatabaseCaching to AugmentationProperties to
indicate whether the action should upload overlay-base databases to the
actions cache and to download a cached overlay-base database when
creating an overlay database.
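The decision flow described in these commit messages can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: AnalysisContext and chooseOverlayDatabaseMode are hypothetical names, the enum's string values are assumed, overlayFeatureEnabled stands for Feature.OverlayAnalysis combined with any other gating, and all three enum values (including none) are treated as valid overrides. The build-mode fallback discussed in the review is not modeled.

```typescript
// Hypothetical mirror of the PR's OverlayDatabaseMode enum.
enum OverlayDatabaseMode {
  Overlay = "overlay",
  OverlayBase = "overlay-base",
  None = "none",
}

// Hypothetical summary of the inputs the mode depends on.
interface AnalysisContext {
  envOverride: string | undefined; // value of CODEQL_OVERLAY_DATABASE_MODE
  overlayFeatureEnabled: boolean; // Feature.OverlayAnalysis (plus gating)
  isDefaultBranch: boolean;
  isPullRequest: boolean;
}

function isOverlayDatabaseMode(value: string): value is OverlayDatabaseMode {
  return (Object.values(OverlayDatabaseMode) as string[]).includes(value);
}

/**
 * 1. A valid CODEQL_OVERLAY_DATABASE_MODE value takes precedence.
 * 2. Otherwise, if overlay analysis is enabled, use OverlayBase for the
 *    default branch and Overlay for pull requests.
 * 3. Otherwise, build a normal full database.
 */
function chooseOverlayDatabaseMode(ctx: AnalysisContext): OverlayDatabaseMode {
  const override = ctx.envOverride;
  if (override !== undefined && isOverlayDatabaseMode(override)) {
    return override;
  }
  if (ctx.overlayFeatureEnabled) {
    if (ctx.isDefaultBranch) return OverlayDatabaseMode.OverlayBase;
    if (ctx.isPullRequest) return OverlayDatabaseMode.Overlay;
  }
  return OverlayDatabaseMode.None;
}
```

An invalid environment value is ignored rather than treated as an error, so the automatic selection still applies.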
@cklin cklin force-pushed the cklin/overlay-analysis branch from fd2660c to bc3a47b July 1, 2025 17:50
@cklin cklin marked this pull request as ready for review July 1, 2025 18:19
@cklin (Contributor Author) commented Jul 1, 2025

@nickrolfe The PR is ready for another look.
