feat: add bpd.read_arrow to convert an Arrow object into a bigframes DataFrame #1855

Merged
tswast merged 5 commits into main from feat-read-arrow on Jun 30, 2025

@tswast (Collaborator) commented Jun 25, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #735 🦕

@product-auto-label product-auto-label bot added the size: xl Pull request size is extra large. label Jun 25, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. label Jun 25, 2025
Adds `read_arrow` to `bigframes.session.Session` and to the
`bigframes.pandas` module (`bigframes.pandas.read_arrow`) for creating
BigQuery DataFrames `DataFrame` objects from PyArrow Tables.

The implementation refactors existing logic from
`bigframes.session._io.bigquery.read_gbq_query` for converting
Arrow data into BigFrames DataFrames.

Includes:
- New file `bigframes/session/_io/arrow.py` with the core conversion logic.
- `read_arrow(pa.Table) -> bpd.DataFrame` in `Session` class.
- `read_arrow(pa.Table) -> bpd.DataFrame` in `pandas` module.
- Unit and system tests for the new functionality.
- Docstrings for new methods/functions.

Note: Unit tests for direct DataFrame operations (shape, to_pandas) on
the result of read_arrow are currently failing due to the complexity of
mocking the session and executor for LocalDataNode interactions.
System tests are recommended for full end-to-end validation.
@tswast tswast force-pushed the feat-read-arrow branch from 3b2587c to 07baa76 Compare June 25, 2025 20:46
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Jun 25, 2025
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Jun 26, 2025
@tswast tswast marked this pull request as ready for review June 26, 2025 15:35
@tswast tswast requested review from a team as code owners June 26, 2025 15:35
@tswast tswast requested a review from Genesis929 June 26, 2025 15:35
@tswast tswast mentioned this pull request Jun 26, 2025
@tswast tswast merged commit 633bf98 into main Jun 30, 2025
25 checks passed
@tswast tswast deleted the feat-read-arrow branch June 30, 2025 13:34
Labels: api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.), size: m (Pull request size is medium.)

Successfully merging this pull request may close these issues: Polars Support

2 participants