feat: create deploy_remote_function and deploy_udf functions to immediately deploy functions to BigQuery #1832

Merged: 6 commits merged into `main` from `refactor-deploy-immediately` on Jun 24, 2025

Conversation

tswast
Collaborator

@tswast tswast commented Jun 17, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

This commit refactors the implementation of immediate deployment
for remote functions and UDFs to eliminate code duplication introduced
in a previous commit.

Changes:
- The `remote_function` and `udf` methods in
  `bigframes.functions._function_session.FunctionSession` now accept
  an optional `deploy_immediately: bool` parameter (defaulting to `False`).
  The previous `deploy_remote_function` and `deploy_udf` methods in
  `FunctionSession` have been removed, and their logic is now
  incorporated into the unified methods.
- The public API functions `bigframes.pandas.deploy_remote_function`
  and `bigframes.pandas.deploy_udf` now call the corresponding
  `FunctionSession` methods with `deploy_immediately=True`.
- The public API functions `bigframes.pandas.remote_function` and
  `bigframes.pandas.udf` call the `FunctionSession` methods
  with `deploy_immediately=False` (relying on the default).
- Unit tests in `tests/unit/functions/test_remote_function.py` have
  been updated to patch the unified `FunctionSession` methods and
  verify the correct `deploy_immediately` boolean is passed based on
  which public API function is called.

Note: The underlying provisioning logic in `FunctionSession` currently
deploys functions immediately regardless of the `deploy_immediately`
flag. This flag serves as an indicator of intent and allows for
future enhancements to support true lazy deployment if desired, without
further API changes.

This commit corrects a previous refactoring attempt to eliminate
code duplication and properly separates immediate-deployment functions
from standard (potentially lazy) functions.

Changes:
- `bigframes.functions._function_session.FunctionSession` now has
  distinct methods: `remote_function`, `udf`,
  `deploy_remote_function`, and `deploy_udf`. The
  `deploy_immediately` flag has been removed from this class.
- `deploy_remote_function` and `deploy_udf` methods in
  `FunctionSession` are responsible for ensuring immediate
  deployment by calling the underlying provisioning logic directly.
  The standard `remote_function` and `udf` methods in
  `FunctionSession` also currently call this provisioning logic,
  meaning all functions are deployed immediately as of now, but the
  structure allows for future lazy evaluation for standard functions
  without changing the deploy variants' contract.
- Public API functions in `bigframes.pandas` (`remote_function`,
  `udf`, `deploy_remote_function`, `deploy_udf`) now correctly
  delegate to their corresponding distinct methods in `FunctionSession`
  (via the `Session` object).
- Unit tests in `tests/unit/functions/test_remote_function.py` have
  been updated to mock and verify calls to the correct distinct
  methods on `bigframes.session.Session`.

This resolves the issue of using a boolean flag to control
deployment type and instead relies on calling specific, dedicated
methods for immediate deployment, as requested.
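
As a rough sketch of the dedicated-method structure described above, assuming a hypothetical `_provision` helper in place of the real provisioning logic (the class name and the `provisioned` list are also illustrative only):

```python
from typing import Callable, List


class DistinctFunctionSession:
    """Hypothetical stand-in; not the real FunctionSession."""

    def __init__(self) -> None:
        self.provisioned: List[str] = []

    def _provision(self, func: Callable, via: str) -> Callable:
        # Placeholder for the shared provisioning step, which the
        # description says currently runs immediately on every path.
        self.provisioned.append(f"{via}:{func.__name__}")
        return func

    def udf(self) -> Callable[[Callable], Callable]:
        # Standard variant: could become lazy later without an API change.
        return lambda func: self._provision(func, "udf")

    def deploy_udf(self) -> Callable[[Callable], Callable]:
        # Deploy variant: its contract is immediate deployment.
        return lambda func: self._provision(func, "deploy_udf")


distinct_session = DistinctFunctionSession()


@distinct_session.udf()
def double(x: int) -> int:
    return 2 * x


@distinct_session.deploy_udf()
def triple(x: int) -> int:
    return 3 * x
```

The caller's intent is now carried by which method is called rather than by a boolean argument.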
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jun 17, 2025

This commit simplifies the implementation of
`deploy_remote_function` and `deploy_udf` within
`bigframes.functions._function_session.FunctionSession`.

Given that the standard `remote_function` and `udf` methods in
`FunctionSession` already perform immediate deployment of resources
(as the underlying provisioning logic they call is immediate),
the `deploy_remote_function` and `deploy_udf` methods in the
same class are simplified to directly call `self.remote_function(...)`
and `self.udf(...)` respectively.

This change makes the distinction between the `deploy_` variants and
the standard variants in `FunctionSession` primarily a matter of
semantic clarity and intent at that level; both paths currently
result in immediate deployment. The public API in `bigframes.pandas`
continues to offer distinct `deploy_` functions that call these
`FunctionSession.deploy_` methods, preserving the user-facing API
and its documented behavior of immediate deployment.
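
The simplification can be pictured as the `deploy_` method forwarding its arguments unchanged to the standard method. The sketch below assumes a hypothetical class and an illustrative `name` keyword; it does not reproduce the actual signatures.

```python
from typing import Any, Callable, Dict, List


class SimplifiedFunctionSession:
    """Hypothetical stand-in showing deploy_udf delegating to udf."""

    def __init__(self) -> None:
        self.udf_calls: List[Dict[str, Any]] = []

    def udf(self, **kwargs: Any) -> Callable[[Callable], Callable]:
        # Assumption: this is where immediate provisioning would happen.
        self.udf_calls.append(kwargs)

        def decorator(func: Callable) -> Callable:
            return func

        return decorator

    def deploy_udf(self, **kwargs: Any) -> Callable[[Callable], Callable]:
        # Same behavior as udf today; the separate name documents intent.
        return self.udf(**kwargs)


simplified_session = SimplifiedFunctionSession()
square = simplified_session.deploy_udf(name="square_udf")(lambda x: x * x)
```

If the standard method were made lazy later, only `deploy_udf` would need its own provisioning call to keep its immediate-deployment contract.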

No changes were needed for the public API in `bigframes.pandas` or
the unit tests, as they were already aligned with calling distinct
methods on the `Session` object, which in turn calls the
now-simplified `FunctionSession` methods.
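
A test in the spirit described here might patch the session methods and check which one a public function reaches. The following is only a sketch using `unittest.mock` with hypothetical stand-ins (`SessionSketch`, `_global_session`, and a `dataset` keyword chosen for illustration); it is not copied from `tests/unit/functions/test_remote_function.py`.

```python
from typing import Any
from unittest import mock


class SessionSketch:
    """Hypothetical stand-in for bigframes.session.Session."""

    def udf(self, **kwargs: Any) -> None:
        return None

    def deploy_udf(self, **kwargs: Any) -> None:
        return None


_global_session = SessionSketch()


def udf(**kwargs: Any) -> Any:
    # Public wrapper for the standard variant.
    return _global_session.udf(**kwargs)


def deploy_udf(**kwargs: Any) -> Any:
    # Public wrapper for the immediate-deployment variant.
    return _global_session.deploy_udf(**kwargs)


def check_deploy_udf_delegates() -> bool:
    # Patch both methods so the check fails if the wrong one is called.
    with mock.patch.object(_global_session, "deploy_udf") as mock_deploy, \
            mock.patch.object(_global_session, "udf") as mock_udf:
        deploy_udf(dataset="my_dataset")
        mock_deploy.assert_called_once_with(dataset="my_dataset")
        mock_udf.assert_not_called()
    return True
```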
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Jun 23, 2025
@tswast tswast marked this pull request as ready for review June 23, 2025 22:29
@tswast tswast requested review from a team as code owners June 23, 2025 22:29
@tswast tswast requested a review from shobsi June 23, 2025 22:29
@tswast
Collaborator Author

tswast commented Jun 23, 2025

CC @ivansmf -- this PR should add those aliases for your sample.

@shobsi
Contributor

shobsi commented Jun 23, 2025

We already have two ways for each: the decorator `@bpd.udf` and the function call `bpd.udf(...)(func)`. Why add a third way, an extra API with very little added value?

@tswast
Collaborator Author

tswast commented Jun 24, 2025

> We already have two ways for each: the decorator `@bpd.udf` and the function call `bpd.udf(...)(func)`. Why add a third way, an extra API with very little added value?

  1. To be explicit that we're deploying immediately. With `udf` and `remote_function` we may choose to defer deployment in the future.
  2. To follow the general convention of verb_plus_description for function names. (https://thepythoncodingbook.com/2023/01/18/best-practices-in-python-functions/#:~:text=A%20function%20performs%20an%20action,describe%20what%20the%20function%20does.)
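
For context, the decorator form and the direct-call form mentioned in the quoted question are the same Python mechanism, and a verb-named entry point can sit alongside them. A minimal plain-Python sketch, where the toy `udf` factory and its `label` argument are hypothetical and unrelated to the real `bpd.udf` signature:

```python
from typing import Callable


def udf(label: str) -> Callable[[Callable], Callable]:
    """Toy decorator factory; tags the function instead of deploying it."""

    def decorator(func: Callable) -> Callable:
        func.label = label
        return func

    return decorator


# Verb-named entry point; in this toy it is simply an alias.
deploy_udf = udf


@udf("decorated")
def plus_one_decorated(x: int) -> int:
    return x + 1


def plus_one_called(x: int) -> int:
    return x + 1


# Direct-call form: equivalent to applying the decorator above.
plus_one_called = udf("called")(plus_one_called)

plus_one_aliased = deploy_udf("aliased")(lambda x: x + 1)
```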

@ivansmf for additional thoughts.

@ivansmf

ivansmf commented Jun 24, 2025

According to Jing Jing, our API names have to be verbs. That is the entire reason for the request.

@tswast tswast merged commit c706759 into main Jun 24, 2025
25 checks passed
@tswast tswast deleted the refactor-deploy-immediately branch June 24, 2025 22:01
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
3 participants