Age | Commit message (Collapse) | Author |
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13596
|
|
Instead it's now a `shape_id` flag.
This allows to check if an object is complex without having
to chase the `rb_shape_t` pointer.
Notes:
Merged: https://github.com/ruby/ruby/pull/13511
|
|
As well as `RB_OBJ_SHAPE_ID` -> `rb_obj_shape_id`
and `RSHAPE` is now a simple alias for `rb_shape_lookup`.
I tried to turn all these into `static inline` but I'm having
trouble with `RUBY_EXTERN rb_shape_tree_t *rb_shape_tree_ptr;`
not being exposed as I'd expect.
Notes:
Merged: https://github.com/ruby/ruby/pull/13283
|
|
And `rb_shape_get_shape` -> `RB_OBJ_SHAPE`.
Notes:
Merged: https://github.com/ruby/ruby/pull/13283
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13283
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/13283
|
|
* YJIT: Generate specialized code for Symbol for objtostring
Co-authored-by: John Hawthorn <[email protected]>
* Update yjit/src/codegen.rs
---------
Co-authored-by: John Hawthorn <[email protected]>
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
Notes:
Merged-By: maximecb <[email protected]>
|
|
* Add opt_duparray_send insn to skip the allocation on `#include?`
If the method isn't going to modify the array we don't need to copy it.
This avoids the allocation / array copy for things like `[:a, :b].include?(x)`.
This adds a BOP for include? and tracks redefinition for it on Array.
Co-authored-by: Andrew Novoselac <[email protected]>
* YJIT: Implement opt_duparray_send include_p
Co-authored-by: Andrew Novoselac <[email protected]>
* Update opt_newarray_send to support simple forms of include?(arg)
Similar to opt_duparray_send but for non-static arrays.
* YJIT: Implement opt_newarray_send include_p
---------
Co-authored-by: Andrew Novoselac <[email protected]>
Notes:
Merged-By: maximecb <[email protected]>
|
|
* YJIT: Add `--yjit-compilation-log` flag to print out the compilation log at exit.
* YJIT: Add an option to enable the compilation log at runtime.
* YJIT: Fix a typo in the `IseqPayload` docs.
* YJIT: Add stubs for getting the YJIT compilation log in memory.
* YJIT: Add a compilation log based on a circular buffer to cap the log size.
* YJIT: Allow specifying either a file or directory name for the YJIT compilation log.
The compilation log will be populated as compilation events occur. If a directory is supplied, then a filename based on the PID will be used as the write target. If a file name is supplied instead, the log will be written to that file.
* YJIT: Add JIT compilation of C function substitutions to the compilation log.
* YJIT: Add compilation events to the circular buffer even if output is sent to a file.
Previously, the two modes were treated as being exclusive of one another. However, it could be beneficial to log all events to a file while also allowing for direct access of the last N events via `RubyVM::YJIT.compilation_log`.
* YJIT: Make timestamps the first element in the YJIT compilation log tuple.
* YJIT: Stream log to stderr if `--yjit-compilation-log` is supplied without an argument.
* YJIT: Eagerly compute compilation log messages to avoid hanging on to references that may GC.
* YJIT: Log all compiled blocks, not just the method entry points.
* YJIT: Remove all compilation events other than block compilation to slim down the log.
* YJIT: Replace circular buffer iterator with a consuming loop.
* YJIT: Support `--yjit-compilation-log=quiet` as a way to activate the in-memory log without printing it.
Co-authored-by: Randy Stauner <[email protected]>
* YJIT: Promote the compilation log to being the one YJIT log.
Co-authored-by: Randy Stauner <[email protected]>
* Update doc/yjit/yjit.md
* Update doc/yjit/yjit.md
---------
Co-authored-by: Randy Stauner <[email protected]>
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
Notes:
Merged-By: maximecb <[email protected]>
|
|
Reported by Kevin Menard.
|
|
* YJIT: Encode doubles to VALUE objects and move stat generation to rust
Stats that can now be generated from rust have been moved there.
* Move object_shape_count call for runtime_stats to rust
This reduces the ruby method to a single primitive.
* Change hash_aset_usize from macro to function
Notes:
Merged-By: maximecb <[email protected]>
|
|
codepoint values. (#11032)
* Document why we need to explicitly spill registers.
* Simplify passing a byte value to `str_buf_cat`.
* YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values.
* YJIT: Move runtime type check into YJIT.
Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum.
Notes:
Merged-By: maximecb <[email protected]>
|
|
Use an enum for the method arg instead of needing to add an id
that doesn't map to an actual method name.
$ ruby --dump=insns -e 'b = "x"; [v].pack("E*", buffer: b)'
before:
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring "x" ( 1)[Li]
0002 setlocal_WC_0 b@0
0004 putself
0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 newarray 1
0009 putchilledstring "E*"
0011 getlocal_WC_0 b@0
0013 opt_send_without_block <calldata!mid:pack, argc:2, kw:[#<Symbol:0x000000000023110c>], KWARG>
0015 leave
```
after:
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring "x" ( 1)[Li]
0002 setlocal_WC_0 b@0
0004 putself
0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 putchilledstring "E*"
0009 getlocal b@0, 0
0012 opt_newarray_send 3, 5
0015 leave
```
Notes:
Merged: https://github.com/ruby/ruby/pull/11249
|
|
Mostly putting angle brackets around links to follow markdown syntax.
|
|
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.
Calls it optimizes look like this:
```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```
```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```
```ruby
def bar(*a) = a
def foo(...)
list = [1, 2]
bar(*list, ...) # optimized
end
foo(123)
```
All variants of the above but using `super` are also optimized, including a bare super like this:
```ruby
def foo(...)
super
end
```
This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:
```ruby
def m
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def bar(a) = a
def foo(...) = bar(...)
def test
m { foo(123) }
end
test
p test # allocates 1 object on master, but 0 objects with this patch
```
```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)
def test
m { foo(1, b: 2) }
end
test
p test # allocates 2 objects on master, but 0 objects with this patch
```
How does it work?
-----------------
This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.
I think this description is kind of confusing, so let's walk through an example with code.
```ruby
def delegatee(a, b) = a + b
def delegator(...)
delegatee(...) # CI2 (FORWARDING)
end
def caller
delegator(1, 2) # CI1 (argc: 2)
end
```
Before we call the delegator method, the stack looks like this:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # |
5| delegatee(...) # CI2 (FORWARDING) |
6| end |
7| |
8| def caller |
-> 9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method. The `delegator` method has a special local called `...`
that holds the caller's CI object.
Here is the ISeq disasm fo `delegator`:
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
The local called `...` will contain the caller's CI: CI1.
Here is the stack when we enter `delegator`:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
-> 4| # | CI1 (argc: 2)
5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller |
9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`. In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).
Before executing the `send` instruction, we push `...` on the stack. The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
Instruction 001 puts the caller's CI on the stack. `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # | CI1 (argc: 2)
-> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller | self
9| delegator(1, 2) # CI1 (argc: 2) | 1
10| end | 2
```
The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.
Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.
I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.
I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.
For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:
```ruby
SomeObject.new(foo: 1)
```
If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.
Co-Authored-By: John Hawthorn <[email protected]>
Co-Authored-By: Alan Wu <[email protected]>
|
|
Instructions for this code:
```ruby
# frozen_string_literal: true
[a].pack("C")
```
Before this commit:
```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself ( 3)[Li]
0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 newarray 1
0005 putobject "C"
0007 opt_send_without_block <calldata!mid:pack, argc:1, ARGS_SIMPLE>
0009 leave
```
After this commit:
```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself ( 3)[Li]
0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 putobject "C"
0005 opt_newarray_send 2, :pack
0008 leave
```
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
Co-authored-by: Aaron Patterson <[email protected]>
|
|
Since we often take the VM lock as the first thing we do when entering
YJIT, and that needs a `src_loc!()`, this removes a allocation from
that. The main trick here is `concat!(file!(), '\0')` to get a C string
statically baked into the binary.
|
|
* YJIT: Add specialized codegen function for `TrueClass#===`
TrueClass#=== is currently number 10 in the most frequent C calls list of the lobsters benchmark.
```
require "benchmark/ips"
def wrap
true === true
true === false
true === :x
end
Benchmark.ips do |x|
x.report(:wrap) do
wrap
end
end
```
```
before
Warming up --------------------------------------
wrap 1.791M i/100ms
Calculating -------------------------------------
wrap 17.806M (± 1.0%) i/s - 89.544M in 5.029363s
after
Warming up --------------------------------------
wrap 4.024M i/100ms
Calculating -------------------------------------
wrap 40.149M (± 1.1%) i/s - 201.223M in 5.012527s
```
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
Co-authored-by: Takashi Kokubun (k0kubun) <[email protected]>
Co-authored-by: Kevin Menard <[email protected]>
Co-authored-by: Alan Wu <[email protected]>
* Fix the new test for RJIT
---------
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
Co-authored-by: Takashi Kokubun (k0kubun) <[email protected]>
Co-authored-by: Kevin Menard <[email protected]>
Co-authored-by: Alan Wu <[email protected]>
|
|
* Revert "Revert "YJIT: Optimize local variables when EP == BP" (#10584)"
This reverts commit c8783441952217c18e523749c821f82cd7e5d222.
* YJIT: Take care of GC references in ISEQ invariants
Co-authored-by: Alan Wu <[email protected]>
---------
Co-authored-by: Alan Wu <[email protected]>
|
|
This reverts commit 4cc58ea0b865f2fd20f1e881ddbd4c4fab0b072c.
Since the change landed call-threshold=1 CI runs have been timing out.
There has also been `verify-ctx` violations. Revert for now while we debug.
|
|
|
|
This mainly targets things like `T.unsafe()` from Sorbet, which is just an
identity function at runtime and only a hint for the static checker.
Only deal with simple caller and callees (no keywords and splat etc.).
Co-authored-by: Takashi Kokubun (k0kubun) <[email protected]>
|
|
* WIP getbyte implementation
* WIP String#getbyte implementation
* Fix whitespace in stats.rs
* fix?
* Fix whitespace, add comment
---------
Co-authored-by: Aaron Patterson <[email protected]>
|
|
* YJIT: Lazily push a frame for specialized C funcs
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
* Fix a comment on pc_to_cfunc
* Rename rb_yjit_check_pc to rb_yjit_lazy_push_frame
* Rename it to jit_prepare_lazy_frame_call
* Fix a typo
* Optimize String#getbyte as well
* Optimize String#byteslice as well
---------
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
|
|
|
|
Now that `...` uses `**kwrest` instead of regular splat and
ruby2keywords, we need to support these type of methods to
support `...` well.
|
|
YJIT: Prefer an overloaded cme if applicable
|
|
|
|
* YJIT: Specialize splatkw on T_HASH
* Fix a typo
Co-authored-by: Alan Wu <[email protected]>
* Fix a few more comments
---------
Co-authored-by: Alan Wu <[email protected]>
|
|
|
|
YJIT didn't guard for ruby2_keywords hash in case of splat calls that
land in methods with a rest parameter, creating incorrect results.
The compile-time checks didn't correspond to any actual effects of
ruby2_keywords, so it was masking this bug and YJIT was needlessly
refusing to compile some code. About 16% of fallback reasons in
`lobsters` was due to the ISeq check.
We already handle the tagging part with
exit_if_supplying_kw_and_has_no_kw() and should now have a dynamic guard
for all splat cases.
Note for backporting: You also need 7f51959ff1.
[Bug #20195]
|
|
* YJIT: expandarray for non-arrays
Co-authored-by: John Hawthorn <[email protected]>
* Skip the new test on RJIT
* Increment counter for to_ary exit
---------
Co-authored-by: John Hawthorn <[email protected]>
Co-authored-by: Takashi Kokubun <[email protected]>
|
|
```
warning: unused import: `condition::Condition`
--> src/asm/arm64/arg/mod.rs:13:9
|
13 | pub use condition::Condition;
| ^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `rb_yjit_fix_mul_fix as rb_fix_mul_fix`
--> src/cruby.rs:188:9
|
188 | pub use rb_yjit_fix_mul_fix as rb_fix_mul_fix;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: unused import: `rb_insn_len as raw_insn_len`
--> src/cruby.rs:142:9
|
142 | pub use rb_insn_len as raw_insn_len;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
```
Make asm public so it stops warning about unused public stuff in there.
|
|
We've seen quite a few compaction bugs lately, and these assertions
should give clearer symptoms. We only call class_of() on
objects that the Ruby code can see.
|
|
* YJIT: Inline basic Ruby methods
* YJIT: Fix "InsnOut operand made it past register allocation"
checktype should not generate a useless instruction.
|
|
Co-authored-by: Alan Wu <[email protected]>
|
|
Previously, the version-controlled `cruby_bindings.inc.rs` file
contained the build-time artifact `id.h`, which nobu mentioned hinders
the goal of having fewer magic numbers in the repository.
Lookup the IDs YJIT needs on boot. It costs cycles, but it's fine since
YJIT only uses a handful of IDs at the moment. No perceptible
degradation to boot time found in my testing.
|
|
* Remove function call for String#bytesize
String size is stored in a consistent location, so we can eliminate the
function call.
* Update yjit/src/codegen.rs
Co-authored-by: Takashi Kokubun <[email protected]>
---------
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
Co-authored-by: Takashi Kokubun <[email protected]>
Notes:
Merged-By: maximecb <[email protected]>
|
|
New Clippy lint in 1.72.0 is breaking our build as GitHub has updated
their image. No point hearing about lints from generated code we don't
manually write.
|
|
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
Notes:
Merged-By: k0kubun <[email protected]>
|
|
Remove rb_control_frame_t::__bp__ and optimize bmethod calls
This commit removes the __bp__ field from rb_control_frame_t. It was
introduced to help MJIT, but since MJIT was replaced by RJIT, we can use
vm_base_ptr() to compute it from the SP of the previous control frame
instead. Removing the field avoids needing to set it up when pushing new
frames.
Simply removing __bp__ would cause crashes since RJIT and YJIT used a
slightly different stack layout for bmethod calls than the interpreter.
At the moment of the call, the two layouts looked as follows:
┌────────────┐ ┌────────────┐
│ frame_base │ │ frame_base │
├────────────┤ ├────────────┤
│ ... │ │ ... │
├────────────┤ ├────────────┤
│ args │ │ args │
├────────────┤ └────────────┘<─prev_frame_sp
│ receiver │
prev_frame_sp─>└────────────┘
RJIT & YJIT interpreter
Essentially, vm_base_ptr() needs to compute the address to frame_base
given prev_frame_sp in the diagrams. The presence of the receiver
created an off-by-one situation.
Make the interpreter use the layout the JITs use for iseq-to-iseq
bmethod calls. Doing so removes unnecessary argument shifting and
vm_exec_core() re-entry from the interpreter, yielding a speed
improvement visible through `benchmark/vm_defined_method.yml`:
patched: 7578743.1 i/s
master: 4796596.3 i/s - 1.58x slower
C-to-iseq bmethod calls now store one more VALUE than before, but that
should have negligible impact on overall performance.
Note that re-entering vm_exec_core() used to be necessary for firing
TracePoint events, but that's no longer the case since
9121e57a5f50bc91bae48b3b91edb283bf96cb6b.
Closes ruby/ruby#6428
|
|
The benchmark results show that this feature has either a positive or
no impact on performance. The memory usage is also mostly unchanged,
except in hexapdf, where there is a decrease in RSS.
-------------- ----------- ---------- --------- ----------- ---------- --------- -------------- -------------
bench master (ms) stddev (%) RSS (MiB) branch (ms) stddev (%) RSS (MiB) branch 1st itr master/branch
activerecord 70.8 2.2 56.0 71.7 2.2 56.0 0.99 0.99
erubi_rails 20.5 13.6 94.7 20.5 14.3 94.2 0.93 1.00
hexapdf 2541.0 0.7 212.8 2544.4 0.7 203.4 1.00 1.00
liquid-c 65.6 0.3 38.9 65.3 0.3 38.9 1.01 1.01
liquid-compile 63.7 0.3 34.6 61.1 0.2 34.6 1.04 1.04
liquid-render 163.1 0.1 37.1 163.3 0.1 37.1 1.00 1.00
mail 139.3 0.1 50.5 137.0 0.1 50.1 0.99 1.02
psych-load 2065.7 0.1 36.9 2068.2 0.1 37.3 1.00 1.00
railsbench 2034.6 0.5 103.9 2031.9 0.5 103.8 1.02 1.00
ruby-lsp 65.3 3.1 89.8 66.2 3.0 89.7 1.01 0.99
sequel 73.2 1.0 40.3 73.4 1.0 40.3 1.00 1.00
-------------- ----------- ---------- --------- ----------- ---------- --------- -------------- -------------
Notes:
Merged: https://github.com/ruby/ruby/pull/7871
|
|
* YJIT: Add codegen for Integer methods
* YJIT: Update dependencies
* YJIT: Fix Integer#[] for argc=2
Notes:
Merged-By: k0kubun <[email protected]>
|
|
This reverts commit 9e678cdbd054f78576a8f21b3f97cccc395ade22.
Without the `unsafe` annotations, the SAFETY comments make less sense.
I want to keep the SAFETY comments.
Notes:
Merged: https://github.com/ruby/ruby/pull/7657
|
|
Notes:
Merged-By: maximecb <[email protected]>
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7535
Merged-By: k0kubun <[email protected]>
|
|
* YJIT: log the names of methods we call to in disasm
* Assert that pointer is not null
* Handle case where UTF8 conversion not possible
Notes:
Merged-By: maximecb <[email protected]>
|
|
|
|
Using a constant shows intention better and is less noisy. It always
took me a second to parse the long expression.
Notes:
Merged: https://github.com/ruby/ruby/pull/7121
|
|
* Add job to check clippy lints in CI
* Address all remaining clippy lints
* Check lints on arm64 as well
* Apply latest clippy lints
* Do not exit 0 on clippy warnings
Notes:
Merged-By: maximecb <[email protected]>
|