-
Notifications
You must be signed in to change notification settings - Fork 50
chore: add array operators to SQLGlot compiler #1852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
@UNARY_OP_REGISTRATION.register(ops.ArrayIndexOp) | ||
def _(op: ops.ArrayIndexOp, expr: TypedExpr) -> sge.Expression: | ||
offset = sge.Anonymous( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compile_explode
also need SAFE_OFFSET
but present as different way. Maybe we should define them in the same way:
python-bigquery-dataframes/bigframes/core/compile/sqlglot/sqlglot_ir.py
Lines 367 to 371 in bc885bd
sge.Bracket( | |
this=column, | |
expressions=[unnested_offset_alias], | |
safe=True, | |
offset=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use sge.func
to call a specific function instead of a generic one currently referred to as sge.Anonymous
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
safe_offset
prefer upper case for any SQL preserved keywords.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Done!
|
||
@UNARY_OP_REGISTRATION.register(ops.ArraySliceOp) | ||
def _(op: ops.ArraySliceOp, expr: TypedExpr) -> sge.Expression: | ||
slice_idx = sqlglot.to_identifier("slice_idx") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the sql ops use next(self.uid_gen.get_uid_stream("bfcol_"))
to generate any new columns but this uid_gen
instance is not passed to the scalar op compiler yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That feels like an overkill. I don't think we need to worry about name collisions in this subquery of unnesting an array with offsets , because none of these fields inherently has a name. Plus, the "slice_idx" and "el" are more meaningful names.
As for injecting uid_gen
to the scalar compiler in the general sense, let's only do that when it's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right. Both slice_idx
and el
are both local naming and won't be joined with other column names yet, according to the generated SQL below. Yes, I am okay to keep as current so far.
ARRAY(
SELECT
el
FROM UNNEST(`bfcol_1`) AS el WITH OFFSET AS slice_idx
WHERE
slice_idx >= 1 AND slice_idx < 5
) AS `bfcol_4`
In addition, I polished the OpRegistration class such that full type hints are supported for each compilation function. This is achieved by relaxing the constraints of function signatures.