Description
Feature Request
Supposed I used bigframes and processed my data like this:
res_df = df.assign(pred=input_remote_function.apply(generate_text))
And got this output dataframe res_df:
uri | pred |
---|---|
gs://dataproc-metastore-public-binaries/ads_ba... | {"interpretation": " The primary message of th... |
gs://dataproc-metastore-public-binaries/ads_ba... | {"interpretation": " The primary message of th... |
gs://dataproc-metastore-public-binaries/ads_ba... | {"interpretation": " The primary message of th... |
As you can see, the pred column is a JSON string.
Describe the solution you'd like
I would like to use BigQuery JSON Functions to explode this column into a column for each JSON attribute.
For example, I could write a custom BigQuery SQL code to run for this dataframe, like this:
res_df = df.sqldf("SELECT *, JSON_EXTRACT(pred, '$.interpretation') AS interpretation FROM df")
Getting as a result the res_df:
uri | pred | interpretation |
---|---|---|
gs://datap... | {"interpretation": " The primary message of th... | The primary messag... |
gs://datap... | {"interpretation": " The primary message of th... | The primary messag... |
gs://datap... | {"interpretation": " The primary message of th... | The primary messag... |
Describe alternatives you've considered
As an alternative I need to use remote functions, like this one, to explode the column:
@bpd.remote_function(
[str],
str,
bigquery_connection=CONNECTION_ID
)
def extract_json_attribute(input_content: str) -> str:
pred = input_content.split("|||")[0]
json_attribute = input_content.split("|||")[1]
import json
return json.loads(pred)[json_attribute]
input_remote_function_int = res_df['pred'] + '|||interpretation'
df = res_df.assign(interpretation=input_remote_function_int.apply(extract_json_attribute))
Which I cannot pass multiple parameter because of #646
Additional context
The pandasql allow us to query pandas DataFrames using SQL syntax.
Bigframes supporting, pushing down the query to BQ would be helpful.