Age | Commit message (Collapse) | Author |
|
This commit inlines instructions for Class#new. To make this work, we
added a new YARV instructions, `opt_new`. `opt_new` checks whether or
not the `new` method is the default allocator method. If it is, it
allocates the object, and pushes the instance on the stack. If not, the
instruction jumps to the "slow path" method call instructions.
Old instructions:
```
> ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
New instructions:
```
> ./miniruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
This commit speeds up basic object allocation (`Foo.new`) by 60%, but
classes that take keyword parameters see an even bigger benefit because
no hash is allocated when instantiating the object (3x to 6x faster).
Here is an example that uses `Hash.new(capacity: 0)`:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.004 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.076 s … 1.088 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.9 ms ± 3.5 ms [User: 622.7 ms, System: 4.8 ms]
Range (min … max): 622.7 ms … 633.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.01 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
This commit changes the backtrace for `initialize`:
```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize
puts caller
end
end
def hello
Foo.new
end
hello
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:8:in 'Class#new'
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
aaron@tc ~/g/ruby (inline-new)> ./miniruby -v test.rb
ruby 3.5.0dev (2025-03-28T23:59:40Z inline-new c4157884e4) +PRISM [arm64-darwin24]
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
```
It also increases memory usage for calls to `new` by 122 bytes:
```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def hello
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:hello)))
aaron@tc ~/g/ruby (inline-new)> make runruby
RUBY_ON_BUG='gdb -x ./.gdbinit -p' ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems ./test.rb
656
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
```
Thanks to @ko1 for coming up with this idea!
Co-Authored-By: John Hawthorn <[email protected]>
|
|
|
|
Add missing counters to rb_debug_counter_type enum
On master we have calls to the RB_DEBUG_COUNTER_INC macro
for counters that are not getting defined in the
rb_debug_counter_type enum.
This commit adds those that are missing in order for
compilation to pass with -DUSE_RUBY_DEBUG_LOG.
Notes:
Merged-By: k0kubun <[email protected]>
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/8071
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7942
|
|
This commit separates the marking and sweeping phases so that marking
functions do not directly call sweeping functions.
Notes:
Merged: https://github.com/ruby/ruby/pull/7304
|
|
This commit is refactoring and documenting the debug counters related to
instance variables.
Notes:
Merged: https://github.com/ruby/ruby/pull/7305
|
|
When an object becomes "too complex" (in other words it has too many
variations in the shape tree), we transition it to use a "too complex"
shape and use a hash for storing instance variables.
Without this patch, there were rare cases where shape tree growth could
"explode" and cause performance degradation on what would otherwise have
been cached fast paths.
This patch puts a limit on shape tree growth, and gracefully degrades in
the rare case where there could be a factorial growth in the shape tree.
For example:
```ruby
class NG; end
HUGE_NUMBER.times do
NG.new.instance_variable_set(:"@unique_ivar_#{_1}", 1)
end
```
We consider objects to be "too complex" when the object's class has more
than SHAPE_MAX_VARIATIONS (currently 8) leaf nodes in the shape tree and
the object introduces a new variation (a new leaf node) associated with
that class.
For example, new variations on instances of the following class would be
considered "too complex" because those instances create more than 8
leaves in the shape tree:
```ruby
class Foo; end
9.times { Foo.new.instance_variable_set(":@uniq_#{_1}", 1) }
```
However, the following class is *not* too complex because it only has
one leaf in the shape tree:
```ruby
class Foo
def initialize
@a = @b = @c = @d = @e = @f = @g = @h = @i = nil
end
end
9.times { Foo.new }
``
This case is rare, so we don't expect this change to impact performance
of most applications, but it needs to be handled.
Co-Authored-By: Aaron Patterson <[email protected]>
Notes:
Merged: https://github.com/ruby/ruby/pull/6931
|
|
|
|
I plan to rebuild MJIT metrics later, not using debug counters.
|
|
The structure and readability of jit_exec is messed up right now. I'd
like to help the current situation by this for now. I'll resurrect
them when I need it again.
|
|
This reverts commit 9a6803c90b817f70389cae10d60b50ad752da48f.
|
|
This reverts commit 68bc9e2e97d12f80df0d113e284864e225f771c2.
|
|
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <[email protected]>
Co-Authored-By: Eileen M. Uchitelle <[email protected]>
Co-Authored-By: John Hawthorn <[email protected]>
|
|
Revert "* expand tabs. [ci skip]"
This reverts commit 830b5b5c351c5c6efa5ad461ae4ec5085e5f0275.
Revert "This commit implements the Object Shapes technique in CRuby."
This reverts commit 9ddfd2ca004d1952be79cf1b84c52c79a55978f4.
|
|
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <[email protected]>
Co-Authored-By: Eileen M. Uchitelle <[email protected]>
Co-Authored-By: John Hawthorn <[email protected]>
Notes:
Merged: https://github.com/ruby/ruby/pull/6386
|
|
* Rename mjit_exec to jit_exec
* Rename mjit_exec_slowpath to mjit_check_iseq
* Remove mjit_exec references from comments
Notes:
Merged-By: k0kubun <[email protected]>
|
|
Introduce new optimized method type
`OPTIMIZED_METHOD_TYPE_STRUCT_AREF/ASET` with index information.
Notes:
Merged: https://github.com/ruby/ruby/pull/5131
|
|
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <[email protected]>
Notes:
Merged: https://github.com/ruby/ruby/pull/4544
|
|
This reverts commit 08de37f9fa3469365e6b5c964689ae2bae0eb9f3.
This reverts commit e8ae922b62adb00a80d3d4c49f7d7b0e6026eaba.
|
|
Instead of on read. Once it's in the inline cache we never have to make
one again. We want to eventually put the value into the cache, and the
best opportunity to do that is when you write the value.
Notes:
Merged: https://github.com/ruby/ruby/pull/4340
|
|
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <[email protected]>
Notes:
Merged: https://github.com/ruby/ruby/pull/4340
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4414
|
|
rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invokes
Ruby's method with given receiver. Ruby 2.7 introduced inline method
cache with static memory area. However, Ruby 3.0 reimplemented the
method cache data structures and the inline cache was removed.
Without inline cache, rb_funcall* searched methods everytime.
Most of cases per-Class Method Cache (pCMC) will be helped but
pCMC requires VM-wide locking and it hurts performance on
multi-Ractor execution, especially all Ractors calls methods
with rb_funcall*.
This patch introduced Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-Cache was introduced from Ruby 3.0 to manage
method cache entry atomically and gccct enables method-caching
without VM-wide locking. This table solves the performance issue
on multi-ractor execution.
[Bug #17497]
Ruby-level method invocation does not use gccct because it has
inline-method-cache and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries can be enough.
We will revisit the table size if it is not enough.
Notes:
Merged: https://github.com/ruby/ruby/pull/4129
|
|
constant cache `IC` is accessed by non-atomic manner and there are
thread-safety issues, so Ruby 3.0 disables to use const cache on
non-main ractors.
This patch enables it by introducing `imemo_constcache` and allocates
it by every re-fill of const cache like `imemo_callcache`.
[Bug #17510]
Now `IC` only has one entry `IC::entry` and it points to
`iseq_inline_constant_cache_entry`, managed by T_IMEMO object.
`IC` is atomic data structure so `rb_mjit_before_vm_ic_update()` and
`rb_mjit_after_vm_ic_update()` is not needed.
Notes:
Merged: https://github.com/ruby/ruby/pull/4022
|
|
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/3915
|
|
* vm_sync_lock
* vm_sync_lock_enter
* vm_sync_lock_enter_nb
* vm_sync_lock_enter_cr
* vm_sync_barrier
Notes:
Merged: https://github.com/ruby/ruby/pull/3910
|
|
add cc_found_in_ccs (renamed from cc_found_ccs), cc_not_found_in_ccs,
call0_public, call0_other debug counters to measure more details.
also it contains several modification.
Notes:
Merged: https://github.com/ruby/ruby/pull/3903
|
|
mc_inline_miss_same_def is added to check same method or not.
Also the mc_inline_miss_same_cc calculation was fixed.
|
|
ccs_not_found to count not found in ccs table.
|
|
|
|
counts for invalidating negative cache.
Notes:
Merged: https://github.com/ruby/ruby/pull/3892
|
|
This commit adds a debug counter for the case where the inline cache
*missed* but the ivar index table has an entry for that ivar. This is a
case where a polymorphic cache could help
Notes:
Merged: https://github.com/ruby/ruby/pull/3750
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/3740
|
|
|
|
|
|
To fix build failures.
Notes:
Merged: https://github.com/ruby/ruby/pull/3079
|
|
This shall fix compile errors.
Notes:
Merged: https://github.com/ruby/ruby/pull/3079
|
|
when there's no need to call CALLER_SETUP_ARG and CALLER_REMOVE_EMPTY_KW_SPLAT
(i.e. !rb_splat_or_kwargs_p(ci) && !calling->kw_splat).
Micro benchmark:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark/vm_send_cfunc.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
before after
vm_send_cfunc 69.585M 88.724M i/s - 100.000M times in 1.437097s 1.127096s
Comparison:
vm_send_cfunc
after: 88723605.2 i/s
before: 69584737.1 i/s - 1.28x slower
```
Optcarrot:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark.yml --repeat-count=12 --output=all
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
before after
Optcarrot Lan_Master.nes 50.76119601545175 42.73858236484051 fps
50.76388649761503 51.04211379912850
50.80930672252514 51.39455790755538
50.90236000778749 51.75656936556145
51.01744746340430 51.86875277356489
51.06495279015112 51.88692482485558
51.07785337168974 51.93429603190578
51.20163525187862 51.95768145071314
51.34671771913112 52.45577266040274
51.35918340835583 52.53163888762858
51.46641337418146 52.62172484121034
51.50835463462257 52.85064021113239
```
Notes:
Merged-By: k0kubun <[email protected]>
|
|
Split ruby.h
Notes:
Merged-By: shyouhei < |