Skip to content

Allow querying the bigframes dataframe with custom BigQuery SQL like pandasql #647

Open
@NiloFreitas

Description

@NiloFreitas

Feature Request

Supposed I used bigframes and processed my data like this:
res_df = df.assign(pred=input_remote_function.apply(generate_text))

And got this output dataframe res_df:

uri pred
gs://dataproc-metastore-public-binaries/ads_ba... {"interpretation": " The primary message of th...
gs://dataproc-metastore-public-binaries/ads_ba... {"interpretation": " The primary message of th...
gs://dataproc-metastore-public-binaries/ads_ba... {"interpretation": " The primary message of th...

As you can see, the pred column is a JSON string.

Describe the solution you'd like

I would like to use BigQuery JSON Functions to explode this column into a column for each JSON attribute.

For example, I could write a custom BigQuery SQL code to run for this dataframe, like this:
res_df = df.sqldf("SELECT *, JSON_EXTRACT(pred, '$.interpretation') AS interpretation FROM df")

Getting as a result the res_df:

uri pred interpretation
gs://datap... {"interpretation": " The primary message of th... The primary messag...
gs://datap... {"interpretation": " The primary message of th... The primary messag...
gs://datap... {"interpretation": " The primary message of th... The primary messag...

Describe alternatives you've considered

As an alternative I need to use remote functions, like this one, to explode the column:

@bpd.remote_function(
    [str],
    str,
    bigquery_connection=CONNECTION_ID
)
def extract_json_attribute(input_content: str) -> str:
    
    pred = input_content.split("|||")[0]
    json_attribute = input_content.split("|||")[1]
    
    import json
    return json.loads(pred)[json_attribute]
input_remote_function_int = res_df['pred'] + '|||interpretation'
df = res_df.assign(interpretation=input_remote_function_int.apply(extract_json_attribute))

Which I cannot pass multiple parameter because of #646

Additional context

The pandasql allow us to query pandas DataFrames using SQL syntax.
Bigframes supporting, pushing down the query to BQ would be helpful.

https://pypi.org/project/pandasql/

Metadata

Metadata

Assignees

Labels

api: bigqueryIssues related to the googleapis/python-bigquery-dataframes API.type: feature request‘Nice-to-have’ improvement, new feature or different behavior or design.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions