Age | Commit message (Collapse) | Author |
|
Ractor's free iterates through its TLS keys so we need to keep this
memory available until after Ractors are freed.
Minimal reproduction:
    RUBY_FREE_AT_EXIT=1 ./miniruby -e rand
|
|
|
|
|
|
Introduce runtime flag for specifying the parser,
```
ruby --parser=prism
```
also update the description:
```
$ ruby --parser=prism --version
ruby 3.3.0dev (2023-12-08T04:47:14Z add-parser-runtime.. 0616384c9f) +PRISM [x86_64-darwin23]
```
[Bug #20044]
|
|
Previously with RUBY_FREE_ON_EXIT, ractors were being xfree-ed, which is incorrect since they are not xmalloc-ed.
Instead we can free ractors with ractor free during shutdown. This change only affects main ractor freeing when RUBY_FREE_ON_EXIT is set.
Co-authored-by: John Hawthorn <[email protected]>
|
|
Our current implementation of rb_postponed_job_register suffers from
some safety issues that can lead to interpreter crashes (see bug #19991).
Essentially, the issue is that jobs can be called with the wrong
arguments.
We made two attempts to fix this whilst keeping the promised semantics,
but:
* The first one involved masking/unmasking when flushing jobs, which
was believed to be too expensive
* The second one involved a lock-free, multi-producer, single-consumer
ringbuffer, which was too complex
The critical insight behind this third solution is that essentially the
only users of these APIs are a) internal code, or b) profiling gems.
For a), none of the usages actually require variable data; they will
work just fine with the preregistration interface.
For b), generally profiling gems only call a single callback with a
single piece of data (which is actually usually just zero) for the life
of the program. The ringbuffer is complex because it needs to support
multi-word inserts of job & data (which can't be atomic); but nobody
actually even needs that functionality, really.
So, this commit:
* Introduces a pre-registration API for jobs, with a GVL-requiring
rb_postponed_job_preregister, which returns a handle which can be
used with an async-signal-safe rb_postponed_job_trigger.
* Deprecates rb_postponed_job_register (and re-implements it on top of
the preregister function for compatibility)
* Moves all the internal usages of postponed job register to
pre-registration
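
The pre-registration idea can be sketched as a toy model in Ruby. This is not the C API: `PostponedJobTable`, its methods, and the 32-slot limit are hypothetical names and values chosen for illustration. The point it tries to show is that once each job and its data are registered ahead of time, triggering a job only has to flip a single bit, so no multi-word insert is needed at trigger time.

```ruby
# Toy model of a preregistered postponed-job table (hypothetical, not the C API).
class PostponedJobTable
  MAX_JOBS = 32 # assumed table size for this sketch

  def initialize
    @jobs = []      # handle (slot index) -> [callable, data]
    @triggered = 0  # bitmask of handles waiting to run
  end

  # Register a job once, ahead of time, and return its handle.
  # Registering the same callable again returns the same handle.
  def preregister(func, data = nil)
    handle = @jobs.index { |f, _| f.equal?(func) }
    return handle if handle
    raise "job table full" if @jobs.size >= MAX_JOBS
    @jobs << [func, data]
    @jobs.size - 1
  end

  # Triggering only sets one bit; it never stores a func/data pair.
  def trigger(handle)
    @triggered |= (1 << handle)
  end

  # Run every triggered job once and clear the pending mask.
  def flush
    pending, @triggered = @triggered, 0
    @jobs.each_with_index do |(func, data), i|
      func.call(data) if pending[i] == 1
    end
  end
end

calls = []
table = PostponedJobTable.new
sampler = ->(data) { calls << data }
handle = table.preregister(sampler, :profile)
table.trigger(handle)
table.trigger(handle) # a second trigger before the flush coalesces
table.flush
table.flush           # nothing pending, so nothing runs
```

In this model a job triggered several times before a flush runs once, which is the trade-off a single pending bit per handle implies.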
|
|
This patch introduces thread specific storage APIs
for tools which use `rb_internal_thread_event_hook` APIs.
* `rb_internal_thread_specific_key_create()` to create a tool specific
thread local storage key and allocate the storage if not available.
* `rb_internal_thread_specific_set()` stores data in thread- and tool-
specific storage.
* `rb_internal_thread_specific_get()` reads data from thread- and tool-
specific storage.
Note that `rb_internal_thread_specific_get|set(thread_val, key)`
can be called without the GVL and is both async-signal-safe and safe for
multi-threading (native threads). So you can call it in any internal
thread event hook. Furthermore, you can call it from other native
threads. Of course, `thread_val` must be alive while the data is
accessed through these functions.
Note that you are responsible for cleaning up the data you set.
|
|
When the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.
Co-authored-by: Nobuyoshi Nakada <[email protected]>
Co-authored-by: Peter Zhu <[email protected]>
|
|
`src_ep[VM_ENV_DATA_INDEX_ME_CREF]` was read out and held without
marking across the allocation in vm_env_new(). In case vm_env_new() ran
compaction, an invalid reference could have been written into
`copied_env`.
It might've been hard to actually produce a crash with this issue due to
the pinning marking of the field in rb_execution_context_mark().
|
|
Previously, the following crashed with
`vm_assert_env:imemo_type_p(obj, imemo_env)` due to a missing
WB:
    o = Object.new
    def o.foo(n)
      freeze
      GC.stress = 1
      # inflate block nesting to get an imemo_env for each level
      n.tap do |i|
        i.tap do |local|
          return Ractor.make_shareable(-> do
            local + i + n
          end)
        end
      end
    ensure
      GC.stress = false
      GC.verify_internal_consistency
    end
    p o.foo(1)[]
By the time the recursive env_copy() call returns, `copied_env` could
have aged or turned grey, so we need a WB for the
`ep[VM_ENV_DATA_INDEX_SPECVAL]` assignment which adds an edge.
Fix: 674eb7df7f409099f33da77293d9658e09b470d6
|
|
This reverts commit 9b76c7fc89460ed8e9be40e4037c1d68395c0f6d.
|
|
Enable Prism using either --prism
    ruby --prism test.rb
or via env var
    RUBY_PRISM=1 ruby test.rb
|
|
The original order of events is:
1. Allocate env_body.
2. Fill env_body using elements in src_env, and it performs operations
that can trigger a GC.
3. Create the copied_env using vm_env_new.
However, if GC compaction runs during step 2, then copied_env would not
have been created yet, so objects referenced from env_body could move
without their references being updated.
This commit changes the order to be (1), (3), (2).
|
|
Previously, vm_make_env_each() did:
1. ALLOC env_body
2. Copy locals into env_body
3. Allocate imemo_env
4. Set up imemo_env with env_body
If compaction runs during (3), locals copied to env_body could be
moved and the imemo_env could end up with invalid references.
Move (2) down so it reads references after potential movement.
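
The ordering problem can be illustrated with a small Ruby model. Everything here is hypothetical: `ToyHeap`, `copy_env_old`, and `copy_env_new` are stand-ins invented for this sketch, and "compaction" simply moves every object to a fresh address and fixes up only the containers the heap already knows about. It is meant to show why copying references before the allocation that may compact leaves stale addresses behind.

```ruby
# Toy moving heap (hypothetical model, not the VM's data structures).
class ToyHeap
  def initialize
    @slots = {}       # address -> value
    @containers = []  # address arrays the "GC" knows how to update
    @next_addr = 0
  end

  def alloc(value)
    addr = @next_addr
    @next_addr += 1
    @slots[addr] = value
    addr
  end

  # Make a container visible to compaction so its references get updated.
  def register(container)
    @containers << container
    container
  end

  def read(addr)
    @slots.fetch(addr) { :invalid_reference }
  end

  # Move every object and update references in registered containers only.
  def compact!
    moved = {}
    @slots = @slots.to_h do |addr, value|
      new_addr = @next_addr
      @next_addr += 1
      moved[addr] = new_addr
      [new_addr, value]
    end
    @containers.each { |c| c.map! { |addr| moved.fetch(addr, addr) } }
  end
end

# Old order: copy locals, then allocate the wrapper (which may compact).
def copy_env_old(heap, locals)
  body = locals.dup    # references read too early
  heap.compact!        # stands in for compaction during allocation
  heap.register(body)  # registered only after its references went stale
end

# New order: allocate first, then read references after potential movement.
def copy_env_new(heap, locals)
  heap.compact!
  heap.register(locals.dup)
end

heap = ToyHeap.new
locals = heap.register([heap.alloc("a"), heap.alloc("b")])
stale_values = copy_env_old(heap, locals).map { |addr| heap.read(addr) }
fresh_values = copy_env_new(heap, locals).map { |addr| heap.read(addr) }
```

In this model the old order yields invalid references while the new order still reads the original values.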
|
|
|
|
The `while` loop condition dereferences `cfp` and the loop has no `break`,
so `cfp` cannot be NULL just after the loop.
|
|
The `rb_jmpbuf_t` type is considerably large due to the inline-allocated
Asyncify buffer, and it leads to stack overflow even with a small number
of C-method call frames. This commit allocates the Asyncify buffer used
by `rb_wasm_setjmp` on the heap to mitigate the issue.
This patch introduces a new type `rb_vm_tag_jmpbuf_t` to abstract the
representation of a jump buffer, and init/deinit hook points to manage
lifetime of the buffer. These changes are effectively NFC for non-wasm
platforms.
|
|
This is an experimental commit that uses a functional red-black tree to
create an index of the ancestor shapes. It uses an Okasaki style
functional red black tree:
https://www.cs.tufts.edu/comp/150FP/archive/chris-okasaki/redblack99.pdf
This tree is advantageous because:
* It offers O(log n) insertions and O(log n) lookups.
* It shares memory with previous "versions" of the tree
When we insert a node in the tree, only the parts of the tree that need
to be rebalanced are newly allocated. Parts of the tree that don't need
to be rebalanced are not reallocated, so "new trees" are able to share
memory with old trees. This is in contrast to a sorted set where we
would have to duplicate the set, and also re-sort the set on each
insertion.
I've added a new stat to RubyVM.stat so we can understand how the red
black tree increases.
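
The memory sharing described above can be sketched in Ruby. This is an illustrative Okasaki-style persistent red-black tree, not the C implementation from the commit; `PersistentRBTree` and the `"shape…"` values are names made up for the example. Insertion copies only the nodes along the search path, so an older version of the tree stays valid and shares the untouched subtrees with the newer one.

```ruby
# Okasaki-style persistent red-black tree (illustrative sketch only).
module PersistentRBTree
  Node = Struct.new(:color, :left, :key, :value, :right)

  module_function

  # Returns a new tree; the tree passed in is never modified.
  def insert(tree, key, value)
    root = ins(tree, key, value)
    Node.new(:black, root.left, root.key, root.value, root.right)
  end

  def ins(tree, key, value)
    return Node.new(:red, nil, key, value, nil) if tree.nil?
    if key < tree.key
      balance(tree.color, ins(tree.left, key, value), tree.key, tree.value, tree.right)
    elsif key > tree.key
      balance(tree.color, tree.left, tree.key, tree.value, ins(tree.right, key, value))
    else
      Node.new(tree.color, tree.left, key, value, tree.right)
    end
  end

  # Rewrites the four "black node with a red child and red grandchild" cases.
  def balance(color, left, key, value, right)
    if color == :black
      if left&.color == :red && left.left&.color == :red
        l, ll = left, left.left
        return Node.new(:red, Node.new(:black, ll.left, ll.key, ll.value, ll.right),
                        l.key, l.value, Node.new(:black, l.right, key, value, right))
      elsif left&.color == :red && left.right&.color == :red
        l, lr = left, left.right
        return Node.new(:red, Node.new(:black, l.left, l.key, l.value, lr.left),
                        lr.key, lr.value, Node.new(:black, lr.right, key, value, right))
      elsif right&.color == :red && right.left&.color == :red
        r, rl = right, right.left
        return Node.new(:red, Node.new(:black, left, key, value, rl.left),
                        rl.key, rl.value, Node.new(:black, rl.right, r.key, r.value, r.right))
      elsif right&.color == :red && right.right&.color == :red
        r, rr = right, right.right
        return Node.new(:red, Node.new(:black, left, key, value, r.left),
                        r.key, r.value, Node.new(:black, rr.left, rr.key, rr.value, rr.right))
      end
    end
    Node.new(color, left, key, value, right)
  end

  def lookup(tree, key)
    while tree
      return tree.value if key == tree.key
      tree = key < tree.key ? tree.left : tree.right
    end
    nil
  end

  # In-order list of nodes, used here only to inspect sharing.
  def nodes(tree)
    return [] if tree.nil?
    nodes(tree.left) + [tree] + nodes(tree.right)
  end
end

v1 = (1..10).reduce(nil) { |tree, k| PersistentRBTree.insert(tree, k, "shape#{k}") }
v2 = PersistentRBTree.insert(v1, 11, "shape11")
```

After the last insertion, `v1` still answers lookups as before, and some node objects are reachable from both `v1` and `v2`.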
|
|
Having this variable actually helps the performance of non-JITed calls.
-----  -----------  ----------  ----------  ----------  -------------  ------------
bench  before (ms)  stddev (%)  after (ms)  stddev (%)  after 1st itr  before/after
fib          241.9         0.5       225.4         1.0           1.06          1.07
-----  -----------  ----------  ----------  ----------  -------------  ------------
(benchmarked with --yjit-cold-threshold=0)
|
|
To fix https://github.com/ruby/ruby/actions/runs/6581593578/job/17881779994
|
|
|
|
* Port call threshold logic from Rust to C for performance
* Prefix global/field names with yjit_
* Fix linker error
* Fix preprocessor condition for rb_yjit_threshold_hit
* Fix third linker issue
* Exclude yjit_calls_at_interv from RJIT bindgen
---------
Co-authored-by: Takashi Kokubun <[email protected]>
|
|
This patch introduces an M:N thread scheduler for the Ractor system.
In general, an M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
In the Ruby interpreter, 1 native thread is provided for 1 Ractor
and all Ruby threads are managed by that native thread.
Since Ruby 1.9, the interpreter has used a 1:1 thread scheduler, which means
1 Ruby thread has 1 native thread. The M:N scheduler changes this strategy.
Because of compatibility issues (and stability issues in the implementation),
the main Ractor doesn't use the M:N scheduler by default. In other words,
threads on the main Ractor will be managed with the 1:1 thread scheduler.
There are additional settings by environment variables:
`RUBY_MN_THREADS=1` enables the M:N thread scheduler on the main ractor.
Note that non-main ractors use the M:N scheduler without this
configuration. With this configuration, single-ractor applications
run threads on an M:1 thread scheduler (green threads, user-level threads).
`RUBY_MAX_CPU=n` specifies the maximum number of native threads for
the M:N scheduler (default: 8).
This patch will be reverted soon if hard-to-fix issues are found.
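
As a rough illustration of the M:N idea only, the following Ruby sketch multiplexes M small tasks over N worker threads with a shared queue. It does not use or model the VM's scheduler: `run_m_on_n` is a hypothetical helper, the tasks stand in for user-level threads, and the workers stand in for native threads.

```ruby
# Run m tasks on n workers and report which worker ran each task.
def run_m_on_n(m, n)
  queue = Queue.new
  m.times { |task| queue << task }
  queue.close # once drained, pop returns nil and the workers stop

  ran_on = Array.new(m)
  workers = Array.new(n) do |worker_id|
    Thread.new do
      while (task = queue.pop)
        ran_on[task] = worker_id
      end
    end
  end
  workers.each(&:join)
  ran_on
end

assignment = run_m_on_n(20, 4)
puts "tasks: #{assignment.size}, workers used: #{assignment.uniq.size}"
```

However the tasks end up distributed, at most N workers are ever involved, which is the property the N in M:N refers to.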
[Bug #19842]
|
|
* YJIT: Add counter to measure how often we compile "cold" ISEQs (#535)
Fix counter name in DEFAULT_COUNTERS
YJIT: add --yjit-cold-threshold, don't compile cold ISEQs
YJIT: increase default cold threshold to 200_000
Remove rb_yjit_call_threshold()
Remove conflict markers
Fix compilation errors
Threshold 1 should compile immediately
Debug deadlock issue with test_ractor
Fix call threshold issue with tests
* Revert exception threshold logic. Document option in yjit.md
* (void) for 0 parameter functions in C99
* Rename iseq_entry_cold => cold_iseq_entry
* Document --yjit-cold-threshold in ruby.c
* Update doc/yjit/yjit.md
Co-authored-by: Jean byroot Boussier <[email protected]>
* Shorten help string to appease test
* Address bug found by Kokubun. Reorder logic.
---------
Co-authored-by: Alan Wu <[email protected]>
Co-authored-by: Jean byroot Boussier <[email protected]>
|
|
This commit moves IO#readline to Ruby. In order to call C functions,
keyword arguments must be converted to hashes. Prior to this commit,
code like `io.readline(chomp: true)` would allocate a hash. This
commits moves the keyword "denaturing" to Ruby, allowing us to send
positional arguments to the C API and avoiding the hash allocation.
Here is an allocation benchmark for the method:
```
x = GC.stat(:total_allocated_objects)
File.open("/usr/share/dict/words") do |f|
f.readline(chomp: true) until f.eof?
end
p ALLOCATIONS: GC.stat(:total_allocated_objects) - x
```
Before this commit, the output was this:
```
$ make run
./miniruby -I./lib -I. -I.ext/common -r./arm64-darwin22-fake ./test.rb
{:ALLOCATIONS=>707939}
```
Now it is this:
```
$ make run
./miniruby -I./lib -I. -I.ext/common -r./arm64-darwin22-fake ./test.rb
{:ALLOCATIONS=>471962}
```
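
The shape of the change can be sketched as follows. This is a hedged illustration, not the actual implementation: `primitive_readline` is a hypothetical stand-in for the C function, `readline_sketch` is a made-up name, and the `limit` argument is left out for brevity. The Ruby-level method accepts the `chomp:` keyword and forwards it as a plain positional value, so the callee never needs a keyword hash.

```ruby
require "stringio"

# Stand-in for the C-level function: it takes positional arguments only.
def primitive_readline(io, sep, chomp)
  line = io.gets(sep) or raise EOFError, "end of file reached"
  chomp ? line.chomp(sep) : line
end

# Ruby-level wrapper: the keyword is unpacked here and passed positionally.
def readline_sketch(io, sep = "\n", chomp: false)
  primitive_readline(io, sep, chomp)
end

io = StringIO.new("first\nsecond\n")
first = readline_sketch(io, chomp: true)
second = readline_sketch(io)
```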
[Bug #19890] [ruby-core:114803]
|
|
All kinds of AST nodes use the same struct RNode, which has u1, u2, u3 union members
for holding different kinds of data.
This has two problems.
1. Low flexibility of data structure
Some nodes, for example NODE_TRUE, don’t use u1, u2, u3. On the other hand,
NODE_OP_ASGN2 needs more than three union members. However, because they share the same
structure definition, NODE_TRUE has to allocate three unused union members and
NODE_OP_ASGN2 has to be split into another node.
This change removes the restriction, making it possible to
change the data structure for each node type.
2. No compile time check for union member access
With a union, it’s the developer’s responsibility to use the correct member for each node type.
This change clarifies which node has which type of fields and enables compile time checks.
This commit also changes node_buffer_elem_struct buf management to handle
data of different sizes with alignment.
|
|
|
|
ICLASS does not have the path usually, so it needs to be registered
separately.
|
|
Revert commit "Directly allocate FrozenCore as an ICLASS",
813a5f4fc46a24ca1695d23c159250b9e1080ac7.
|
|
This is an internal only function not exposed to the C extension API.
Its only use so far is from rb_vm_mark, where it's used to mark the
values in the vm->trap_list.cmd array.
There shouldn't be any reason why these cannot move.
This commit allows them to move by updating their references during the
reference updating step of compaction.
To do this we've introduced another internal function
rb_gc_update_values as a partner to rb_gc_mark_values.
This allows us to refactor rb_gc_mark_values to not pin.
Notes:
Merged: https://github.com/ruby/ruby/pull/8341
|
|
It's only used once, and it has to equal `ec->cfp`, so just use that.
|
|
For writing THROW_DATA_VAL, being able to see that it's writing to the
same frame after modifying PC and SP is nice.
|
|
Otherwise the ISeq page will constantly be written
into preventing it from being shared.
Notes:
Merged: https://github.com/ruby/ruby/pull/8259
|
|
Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
Notes:
Merged-By: k0kubun <[email protected]>
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/8182
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/8181
|
|
Revert "Revert "Skip calling jit_exec on Wasm""
This reverts commit 2e94610f70baca4af004202f288a6b5dd10889ca.
It's not about whether it's optimized away or not. I just don't want to
leave and maintain the callsite (e.g. signature) in the path where
YJIT is never built.
|
|
This reverts commit e80752f9bbc5228dba3066cd95a81e2e496bd9d7.
RJIT and YJIT are never enabled on Wasm. When both are disabled,
`jit_exec` is defined to return `Qundef` constantly, and is optimized
away.
Notes:
Merged: https://github.com/ruby/ruby/pull/8176
|
|
We often break Wasm build when we modify how jit_exec works. I'm
planning to modify it again soon.
We actually don't support running Ruby JIT on Wasm, so it doesn't seem
worth the maintenance effort.
|
|
|
|
|
|
Notes:
Merged-By: k0kubun <[email protected]>
|
|
|
|
Remove rb_control_frame_t::__bp__ and optimize bmethod calls
This commit removes the __bp__ field from rb_control_frame_t. It was
introduced to help MJIT, but since MJIT was replaced by RJIT, we can use
vm_base_ptr() to compute it from the SP of the previous control frame
instead. Removing the field avoids needing to set it up when pushing new
frames.
Simply removing __bp__ would cause crashes since RJIT and YJIT used a
slightly different stack layout for bmethod calls than the interpreter.
At the moment of the call, the two layouts looked as follows:
               ┌────────────┐    ┌────────────┐
               │ frame_base │    │ frame_base │
               ├────────────┤    ├────────────┤
               │    ...     │    │    ...     │
               ├────────────┤    ├────────────┤
               │    args    │    │    args    │
               ├────────────┤    └────────────┘<─prev_frame_sp
               │  receiver  │
prev_frame_sp─>└────────────┘
                 RJIT & YJIT       interpreter
Essentially, vm_base_ptr() needs to compute the address to frame_base
given prev_frame_sp in the diagrams. The presence of the receiver
created an off-by-one situation.
Make the interpreter use the layout the JITs use for iseq-to-iseq
bmethod calls. Doing so removes unnecessary argument shifting and
vm_exec_core() re-entry from the interpreter, yielding a speed
improvement visible through `benchmark/vm_defined_method.yml`:
patched: 7578743.1 i/s
master: 4796596.3 i/s - 1.58x slower
C-to-iseq bmethod calls now store one more VALUE than before, but that
should have negligible impact on overall performance.
Note that re-entering vm_exec_core() used to be necessary for firing
TracePoint events, but that's no longer the case since
9121e57a5f50bc91bae48b3b91edb283bf96cb6b.
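
For readers unfamiliar with the term, a bmethod is a method whose body is a block, as created by `define_method`. The snippet below is a rough sketch for comparing such a call with an ordinary `def` method; it is not the contents of `benchmark/vm_defined_method.yml`, and `Receiver` and `iterations_per_second` are hypothetical names. The printed numbers depend entirely on the machine and Ruby build.

```ruby
class Receiver
  # A bmethod: the method body is a block.
  define_method(:via_block) { |x| x + 1 }

  # The same body as an ordinary method, for comparison.
  def via_def(x)
    x + 1
  end
end

# Crude timing helper based on the monotonic clock.
def iterations_per_second(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  iterations / elapsed
end

receiver = Receiver.new
bmethod_ips = iterations_per_second(200_000) { receiver.via_block(1) }
def_ips = iterations_per_second(200_000) { receiver.via_def(1) }
puts format("define_method: %.1f i/s", bmethod_ips)
puts format("def:           %.1f i/s", def_ips)
```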
Closes ruby/ruby#6428
|
|
* YJIT: refactoring to allow for fancier call threshold logic
* Avoid potentially compiling functions multiple times.
* Update vm.c
Co-authored-by: Alan Wu <[email protected]>
---------
Co-authored-by: Alan Wu <[email protected]>
Notes:
Merged-By: maximecb <[email protected]>
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/8040
|
|
This commit reduces the dependency on CRuby objects.
Notes:
Merged: https://github.com/ruby/ruby/pull/7950
|
|
It's a bad idea to overwrite the flags as the garbage collector may have
set other flags.
Notes:
Merged: https://github.com/ruby/ruby/pull/7940
|
|
Introduce Universal Parser mode for the parser.
This commit includes these changes:
* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
Notes:
Merged: https://github.com/ruby/ruby/pull/7927
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7709
|