Skip to content

Implement getspecial in ZJIT #13642

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

eileencodes
Copy link
Member

Adds support for the getspecial instruction in zjit.

We split getspecial into two instructions, one for special symbols ($&, $', etc) and one for special backrefs ($1, $2`, etc).

@matzbot matzbot requested a review from a team June 17, 2025 17:59
@eileencodes eileencodes force-pushed the zjit-implement-getspecial branch from a608c69 to 06d112a Compare June 17, 2025 18:15
Copy link

launchable-app bot commented Jun 17, 2025

Tests Failed

✖️no tests failed ✔️62025 tests passed(1 flake)

asm_comment!(asm, "rb_reg_match_last");
asm.ccall(rb_reg_match_last as *const u8, vec![backref])
}
_ => panic!("invalid back-ref"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ => panic!("invalid back-ref"),
c => panic!("invalid back-ref {c}"),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you end up making an enum, it would be great to shift this into the parser into HIR so that we have more up-front guarantees of program legality

zjit/src/hir.rs Outdated
Comment on lines 449 to 450
GetSpecialSymbol { key: u64, svar: u64, state: InsnId },
GetSpecialNumber { key: u64, svar: u64, state: InsnId },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's store the svar as a new enum -- it seems to not make full use of the u64 and we can have nicer looking code this way

zjit/src/hir.rs Outdated
@@ -446,6 +446,8 @@ pub enum Insn {
/// Check whether an instance variable exists on `self_val`
DefinedIvar { self_val: InsnId, id: ID, pushval: VALUE, state: InsnId },

GetSpecialSymbol { key: u64, svar: u64, state: InsnId },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also make key an enum? LastLine | BackRef | Offset(n)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless this makes code generation notably uglier

@@ -379,6 +381,53 @@ fn gen_putspecialobject(asm: &mut Assembler, value_type: SpecialObjectType) -> O
)
}

fn gen_getspecial_symbol(asm: &mut Assembler, _key: u64, svar: u64) -> Opnd {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we use key?

zjit/src/hir.rs Outdated
Comment on lines 2630 to 2933
if svar == 0 {
let exit_id = fun.push_insn(block, Insn::Snapshot { state: exit_state });
fun.push_insn(block, Insn::SideExit { state: exit_id });
} else if svar & 0x01 != 0 {
// Handle symbol backrefs like $&, $`, $\, $+
let exit_id = fun.push_insn(block, Insn::Snapshot { state: exit_state });
let result = fun.push_insn(block, Insn::GetSpecialSymbol { key, svar, state: exit_id });
state.stack_push(result);
} else {
// Handle number backrefs like $1, $2, $3
let exit_id = fun.push_insn(block, Insn::Snapshot { state: exit_state });
let result = fun.push_insn(block, Insn::GetSpecialNumber { key, svar, state: exit_id });
state.stack_push(result);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if svar == 0 {
let exit_id = fun.push_insn(block, Insn::Snapshot { state: exit_state });
fun.push_insn(block, Insn::SideExit { state: exit_id });
} else if svar & 0x01 != 0 {
// Handle symbol backrefs like $&, $`, $\, $+
let exit_id = fun.push_insn(block, Insn::Snapshot { state: exit_state });
let result = fun.push_insn(block, Insn::GetSpecialSymbol { key, svar, state: exit_id });
state.stack_push(result);
} else {
// Handle number backrefs like $1, $2, $3
let exit_id = fun.push_insn(block, Insn::Snapshot { state: exit_state });
let result = fun.push_insn(block, Insn::GetSpecialNumber { key, svar, state: exit_id });
state.stack_push(result);
}
let exit_id = fun.push_insn(block, Insn::Snapshot { state: exit_state });
if svar == 0 {
// TODO: Handle whatever this is
fun.push_insn(block, Insn::SideExit { state: exit_id });
} else if svar & 0x01 != 0 {
// Handle symbol backrefs like $&, $`, $\, $+
let result = fun.push_insn(block, Insn::GetSpecialSymbol { key, svar, state: exit_id });
state.stack_push(result);
} else {
// Handle number backrefs like $1, $2, $3
let result = fun.push_insn(block, Insn::GetSpecialNumber { key, svar, state: exit_id });
state.stack_push(result);
}

Comment on lines 424 to 524
vec![
Opnd::Imm((svar >> 1).try_into().unwrap()),
backref,
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
vec![
Opnd::Imm((svar >> 1).try_into().unwrap()),
backref,
]
vec![
Opnd::Imm((svar >> 1).try_into().unwrap()),
backref,
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 2 spaces and not 4? Can we use rustfmt? Or vim indentation rules?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant use any indentation at all. For better or worse, we don't (currently) use a formatter

@@ -379,6 +381,53 @@ fn gen_putspecialobject(asm: &mut Assembler, value_type: SpecialObjectType) -> O
)
}

fn gen_getspecial_symbol(asm: &mut Assembler, _key: u64, svar: u64) -> Opnd {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to save the PC for some of these because they can allocate or raise

Adds support for the getspecial instruction in zjit.

We split getspecial into two instructions, one for special symbols
(`$&`, $'`, etc) and one for special backrefs (`$1`, `$2`, etc).

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
@eileencodes eileencodes force-pushed the zjit-implement-getspecial branch from 06d112a to 040e847 Compare June 27, 2025 17:41
@eileencodes
Copy link
Member Author

Ok I think I fixed the things you asked for, happy to make more changes though if I missed anything.

@eileencodes
Copy link
Member Author

Nope I need to fix the failures 🤦🏼‍♀️

Copy link
Contributor

@tekknolagi tekknolagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great -- all I really have is small comments, which if you are short on time I can do in a follow-up

Insn::SideExit { state } => return gen_side_exit(jit, asm, &function.frame_state(*state)),
Insn::PutSpecialObject { value_type } => gen_putspecialobject(asm, *value_type),
Insn::AnyToString { val, str, state } => gen_anytostring(asm, opnd!(val), opnd!(str), &function.frame_state(*state))?,
Insn::Defined { op_type, obj, pushval, v } => gen_defined(jit, asm, *op_type, *obj, *pushval, opnd!(v))?,
Insn::GetSpecialSymbol { symbol_type, state: _ } => { gen_getspecial_symbol(asm, 0, *symbol_type) },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Insn::GetSpecialSymbol { symbol_type, state: _ } => { gen_getspecial_symbol(asm, 0, *symbol_type) },
Insn::GetSpecialSymbol { symbol_type, state: _ } => gen_getspecial_symbol(asm, 0, *symbol_type),

@@ -278,16 +278,18 @@ fn gen_insn(cb: &mut CodeBlock, jit: &mut JITState, asm: &mut Assembler, functio
Insn::PatchPoint(_) => return Some(()), // For now, rb_zjit_bop_redefined() panics. TODO: leave a patch point and fix rb_zjit_bop_redefined()
Insn::CCall { cfun, args, name: _, return_type: _, elidable: _ } => gen_ccall(jit, asm, *cfun, args)?,
Insn::GetIvar { self_val, id, state: _ } => gen_getivar(asm, opnd!(self_val), *id),
Insn::SetIvar { self_val, id, val, state: _ } => return gen_setivar(asm, opnd!(self_val), *id, opnd!(val)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please split this out into a separate commit (in an ideal world we minimize changes per commit and keep them grouped)

Comment on lines +486 to +487
// call rb_backref_get()
asm_comment!(asm, "rb_backref_get");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// call rb_backref_get()
asm_comment!(asm, "rb_backref_get");
asm_comment!(asm, "call rb_backref_get");

(here and for all ccalls)

Comment on lines +2921 to +2922
if svar == 0 {
fun.push_insn(block, Insn::SideExit { state: exit_id });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if svar == 0 {
fun.push_insn(block, Insn::SideExit { state: exit_id });
if svar == 0 {
// TODO: Handle non-backref
fun.push_insn(block, Insn::SideExit { state: exit_id });

(or whatever this thing is called)

@@ -452,6 +475,8 @@ pub enum Insn {
GetLocal { level: NonZeroU32, ep_offset: u32 },
/// Set a local variable in a higher scope
SetLocal { level: NonZeroU32, ep_offset: u32, val: InsnId },
GetSpecialSymbol { symbol_type: SpecialBackrefSymbol, state: InsnId },
GetSpecialNumber { svar: u64, state: InsnId },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GetSpecialNumber { svar: u64, state: InsnId },
GetSpecialNumber { nth: u64, state: InsnId },

This is called nth internally that seems like it communicates more intent

state.stack_push(result);
} else {
// Handle number backrefs like $1, $2, $3
let result = fun.push_insn(block, Insn::GetSpecialNumber { svar, state: exit_id });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let result = fun.push_insn(block, Insn::GetSpecialNumber { svar, state: exit_id });
let result = fun.push_insn(block, Insn::GetSpecialNumber { nth: (svar>>1), state: exit_id });

Comment on lines +515 to +519
let backref = asm.ccall(rb_backref_get as *const u8, vec![]);

// rb_reg_nth_match((int)(type >> 1), backref);
asm_comment!(asm, "rb_reg_nth_match");
asm.ccall(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let backref = asm.ccall(rb_backref_get as *const u8, vec![]);
// rb_reg_nth_match((int)(type >> 1), backref);
asm_comment!(asm, "rb_reg_nth_match");
asm.ccall(
let backref = asm.ccall(rb_backref_get as *const u8, vec![]);
// Save PC for GC
gen_save_pc(asm, state);
// rb_reg_nth_match((int)(type >> 1), backref);
asm_comment!(asm, "rb_reg_nth_match");
asm.ccall(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about indentation, but since this can allocate (str substr), we should save the PC for more accurate allocation tracing. Here and above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants