path: root/string.c
Age Commit message Author
24 hours[Bug #20998] Check if the string is frozen in rb_str_locktmp() & rb_str_unlocktmp()Benoit Daloze
Notes: Merged: https://github.com/ruby/ruby/pull/13615
4 daysGet rid of FL_EXIVARJean Boussier
Now that the shape_id gives us all the same information, it's no longer needed. Notes: Merged: https://github.com/ruby/ruby/pull/13612
4 daysUse the `shape_id` rather than `FL_EXIVAR`Jean Boussier
We still keep setting `FL_EXIVAR` so that `rb_shape_verify_consistency` can detect discrepancies. Notes: Merged: https://github.com/ruby/ruby/pull/13612
4 daysAdd SHAPE_ID_HAS_IVAR_MASK for quick ivar checkJean Boussier
This allows checking whether an object has ivars with just a shape_id mask. Notes: Merged: https://github.com/ruby/ruby/pull/13606
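As a rough illustration of the idea, the sketch below packs a "has ivars" bit into a 32-bit shape id so that the check is a single mask test. The bit layout and every name in it (`demo_shape_id_t`, `DEMO_SHAPE_ID_HAS_IVAR_MASK`, `demo_shape_id_has_ivar`) are assumptions made for this example and are not taken from Ruby's actual shape implementation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_shape_id_t;

/* Hypothetical layout, for illustration only: the low 24 bits identify the
 * shape, and one higher bit records whether the object has any ivars. */
#define DEMO_SHAPE_ID_INDEX_MASK    ((demo_shape_id_t)0x00FFFFFFu)
#define DEMO_SHAPE_ID_HAS_IVAR_MASK ((demo_shape_id_t)0x01000000u)

/* Build a shape id from a shape index and a "has ivars" flag. */
static demo_shape_id_t demo_shape_id_make(uint32_t index, bool has_ivar)
{
    demo_shape_id_t id = index & DEMO_SHAPE_ID_INDEX_MASK;
    if (has_ivar) id |= DEMO_SHAPE_ID_HAS_IVAR_MASK;
    return id;
}

/* The check is a single AND: no shape lookup and no per-object flag. */
static bool demo_shape_id_has_ivar(demo_shape_id_t id)
{
    return (id & DEMO_SHAPE_ID_HAS_IVAR_MASK) != 0;
}
```

Because the answer is carried in the id itself, a separate per-object flag holding the same information becomes redundant, which is the direction the later commits in this log take with `FL_EXIVAR`.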
2025-05-29[Bug #21380] Prohibit modification in String#split blockNobuyoshi Nakada
Reported at https://hackerone.com/reports/3163876 Notes: Merged: https://github.com/ruby/ruby/pull/13462
2025-05-27Rename `rb_shape_set_shape_id` to `rb_obj_set_shape_id`Jean Boussier
Notes: Merged: https://github.com/ruby/ruby/pull/13450
2025-05-26[DOC] More tweaks for String#byteindexBurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13440
2025-05-26Add shape_id to RBasic under 32 bitJohn Hawthorn
This makes `RBobject` `4B` larger on 32 bit systems but simplifies the implementation a lot. [Feature #21353] Co-authored-by: Jean Boussier <[email protected]> Notes: Merged: https://github.com/ruby/ruby/pull/13341
2025-05-25Use RB_VM_LOCKINGNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/13439
2025-05-22[DOC] Tweaks for String#byteindexBurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13365
2025-05-16[DOC] Tweaks for String#append_as_bytesBurdette Lamar
Notes: Merged: https://github.com/ruby/ruby/pull/13352 Merged-By: peterzhu2118 <[email protected]>
2025-05-16[DOC] Tweaks for String#bBurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13354
2025-05-16[DOC] Tweaks for String#ascii_only?BurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13353
2025-05-15[DOC] Tweaks for String#=~ (#13325)Burdette Lamar
Notes: Merged-By: peterzhu2118 <[email protected]>
2025-05-14[DOC] Tweaks for String#<< (#13306)Burdette Lamar
Notes: Merged-By: peterzhu2118 <[email protected]>
2025-05-14[DOC] Tweaks for String#== (#13323)Burdette Lamar
Notes: Merged-By: peterzhu2118 <[email protected]>
2025-05-14[DOC] Tweaks for String#[] (#13335)Burdette Lamar
Notes: Merged-By: peterzhu2118 <[email protected]>
2025-05-14[DOC] Tweaks for String#[]=BurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13336
2025-05-13[DOC] Tweaks for String#<=>BurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13320
2025-05-13[DOC] Remove a garbageNobuyoshi Nakada
2025-05-12[DOC] Tweak for String#+@ (#13285)Burdette Lamar
Notes: Merged-By: peterzhu2118 <[email protected]>
2025-05-08[DOC] Tweaks for What's HereBurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13281
2025-05-08[DOC] Tweaks for String#-@Burdette Lamar
Notes: Merged: https://github.com/ruby/ruby/pull/13254 Merged-By: peterzhu2118 <[email protected]>
2025-05-08Move `object_id` in object fields.Jean Boussier
And get rid of the `obj_to_id_tbl`. It's no longer needed: the `object_id` is now stored inline in the object alongside instance variables. We still need the inverse table in case `_id2ref` is invoked, but we lazily build it by walking the heap if that happens. The `object_id` concern is also no longer a GC implementation concern, but a generic implementation. Co-Authored-By: Matt Valentine-House <[email protected]> Notes: Merged: https://github.com/ruby/ruby/pull/13159
2025-05-04[DOC] Tweaks for String#+BurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13252
2025-05-04[DOC] Tweaks for String#*BurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13251
2025-05-04[DOC] Tweaks for String#%BurdetteLamar
Notes: Merged: https://github.com/ruby/ruby/pull/13244
2025-05-01[DOC] Tweaks for String.newBurdette Lamar
Notes: Merged: https://github.com/ruby/ruby/pull/13027 Merged-By: peterzhu2118 <[email protected]>
2025-04-30Suppress gcc 15 unterminated-string-initialization warningsNobuyoshi Nakada
2025-04-23Fix comparison of signed and unsigned integersJean Boussier
```
../string.c:660:38: warning: comparison of integers of different signs: 'rb_atomic_t' (aka 'unsigned int') and 'int' [-Wsign-compare]
  660 | RUBY_ASSERT(table->count < table->capacity / 2);
```
Notes: Merged: https://github.com/ruby/ruby/pull/13160
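To show why this class of warning matters, here is a minimal sketch with a hypothetical `struct demo_table` that mirrors the types named in the warning (an unsigned count and a signed capacity). It is not the code from `string.c`, and the checked variant is only one possible way to make the comparison explicit, not necessarily how the commit resolves it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the table in the warning. */
struct demo_table {
    unsigned int count;
    int capacity;
};

/* What the mixed comparison does implicitly: the signed operand is
 * converted to unsigned, so a negative value becomes a huge one. */
static bool demo_below_half_mixed(const struct demo_table *t)
{
    return t->count < (unsigned int)(t->capacity / 2);
}

/* Explicit variant: reject a negative capacity first, then compare
 * two unsigned values. */
static bool demo_below_half_checked(const struct demo_table *t)
{
    if (t->capacity < 0) return false;
    return t->count < (unsigned int)t->capacity / 2;
}
```

For sane inputs both functions agree; they differ only when `capacity` is negative, which is exactly the case the compiler cannot rule out and therefore warns about.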
2025-04-19Fix style [ci skip]Nobuyoshi Nakada
2025-04-19Implement dsize function for `fstring_table_type`Jean Boussier
The fstring table size used to be reported as part of the VM size, but since it was refactored to be lock-less it is no longer reported. Since it's now wrapped by a `T_DATA`, we can implement its `dsize` function and get a valuable insight into the size of the table.
```
{"address":"0x100ebff18", "type":"DATA", "shape_id":0, "slot_size":80, "struct":"VM/fstring_table", "memsize":131176, ...
```
Notes: Merged: https://github.com/ruby/ruby/pull/13138
2025-04-19Fix style of recent fstring featureJean Boussier
Notes: Merged: https://github.com/ruby/ruby/pull/13137
2025-04-18Lock-free hash set for fstrings [Feature #21268]John Hawthorn
This implements a hash set which is wait-free for lookup and lock-free for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of fstrings (frozen interned strings) can significantly reduce the parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series of RWLocks (partitioning the hash N-ways to reduce lock contention), and putting a cache in front of it. All of these improved the situation, but were unsatisfying as all still required locks for writes (and granular locks are awkward, since we run the risk of needing to reach a vm barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock-free hash table for Java: https://www.youtube.com/watch?v=HJ-719EGIts.

It turns out this lock-free hash set is made easier to implement by a few properties:

* We only need a hash set rather than a hash table (we only need keys, not values), and so the full entry can be written as a single VALUE
* As a set we only need lookup/insert/delete, no update
* Delete is only run inside GC so does not need to be atomic (It could be made concurrent)
* I use rb_vm_barrier for the (rare) table rebuilds (It could be made concurrent). We VM lock (but don't require other threads to stop) for table rebuilds, as those are rare
* The conservative garbage collector makes deferred replication easy, using a T_DATA object

Another benefit of having a table specific to fstrings is that we compare by value on lookup/insert, but by identity on delete, as we only want to remove the exact string which is being freed. This is faster and provides a second way to avoid the race condition in https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic probing, similar to our existing st_table or id_table. Deletes (which happen on GC) replace existing keys with a tombstone, which is the only type of update which can occur. Tombstones are only cleared out on resize.

Unlike st_table, the VALUEs are stored in the hash table itself (st_table's bins) rather than as a compact index. This avoids an extra pointer dereference and is possible because we don't need to preserve insertion order.

The table targets a load factor of 1/2 (it is enlarged once it is half full).

Notes: Merged: https://github.com/ruby/ruby/pull/12921
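The design described in this commit message can be sketched in a few dozen lines of C11. This is a simplified illustration under stated assumptions, not the code in `string.c`: the capacity is fixed (no resize or rebuild), keys are plain C strings rather than Ruby VALUEs, the hash is FNV-1a, and all `demo_` names are hypothetical. It shows insertion through a compare-and-swap on an empty slot, a quadratic (triangular-number) probe sequence, lookup by value, and deletion by identity using a tombstone.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CAPACITY  16u            /* power of two; never resized here */
#define DEMO_EMPTY     ((uintptr_t)0)
#define DEMO_TOMBSTONE ((uintptr_t)1) /* no real string lives at address 1 */

struct demo_set {
    _Atomic uintptr_t slots[DEMO_CAPACITY]; /* each entry is a single word */
};

/* FNV-1a over the string bytes; any reasonable hash would do. */
static uint32_t demo_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

/* Returns the stored string equal to `str` by value, inserting `str` if
 * none exists. Returns NULL if no free slot is found. */
static const char *demo_intern(struct demo_set *set, const char *str)
{
    uint32_t idx = demo_hash(str) & (DEMO_CAPACITY - 1);
    for (uint32_t step = 1; step <= DEMO_CAPACITY; step++) {
        uintptr_t cur = atomic_load(&set->slots[idx]);
        if (cur == DEMO_EMPTY) {
            uintptr_t expected = DEMO_EMPTY;
            if (atomic_compare_exchange_strong(&set->slots[idx], &expected,
                                               (uintptr_t)str))
                return str;
            cur = expected; /* lost the race: inspect what the winner stored */
        }
        if (cur != DEMO_TOMBSTONE && strcmp((const char *)cur, str) == 0)
            return (const char *)cur;
        idx = (idx + step) & (DEMO_CAPACITY - 1); /* offsets 1, 3, 6, 10, ... */
    }
    return NULL;
}

/* Removes exactly the pointer `str` (identity, not value) by overwriting
 * its slot with a tombstone. Tombstones are never reused in this sketch. */
static bool demo_delete(struct demo_set *set, const char *str)
{
    uint32_t idx = demo_hash(str) & (DEMO_CAPACITY - 1);
    for (uint32_t step = 1; step <= DEMO_CAPACITY; step++) {
        uintptr_t cur = atomic_load(&set->slots[idx]);
        if (cur == DEMO_EMPTY) return false;
        if (cur == (uintptr_t)str) {
            atomic_store(&set->slots[idx], DEMO_TOMBSTONE);
            return true;
        }
        idx = (idx + step) & (DEMO_CAPACITY - 1);
    }
    return false;
}
```

Storing the key itself in the slot is what makes a single compare-and-swap sufficient for insertion, and skipping tombstones instead of reusing them keeps delete the only kind of in-place update, as the message describes.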
2025-04-18Extract rb_gc_free_fstring to string.cJohn Hawthorn
This allows more flexibility in how we deal with the fstring table Notes: Merged: https://github.com/ruby/ruby/pull/12921
2025-04-14Assert the GVL is held when performing various `rb_` functions.Samuel Williams
[Feature #20877] Notes: Merged: https://github.com/ruby/ruby/pull/11975
2025-04-02[DOC] Tweaks to String::try_convertBurdette Lamar
Notes: Merged: https://github.com/ruby/ruby/pull/13030 Merged-By: peterzhu2118 <[email protected]>
2025-03-27Freeze $/ and make it ractor safeÉtienne Barrié
[Feature #21109] By always freezing when setting the global rb_rs variable, we can ensure it is not modified and can be accessed from a ractor. We're also making sure it's an instance of String and does not have any instance variables. Of course, if $/ is changed at runtime, it may cause surprising behavior but doing so is deprecated already anyway. Co-authored-by: Jean Boussier <[email protected]> Notes: Merged: https://github.com/ruby/ruby/pull/12975
2025-03-08string.c: Improve `fstring_hash` to reduce collisionsJean Boussier
`rb_str_hash` doesn't include the encoding for ASCII only strings because ASCII only strings are equal regardless of their encoding. But in the case of the `fstring_table`, two identical ASCII strings with different encodings aren't equal. Given it's common to have both `:foo` (or `def foo`) and `"foo"` in the same source code, this causes a lot of collisions in the `fstring_table`. Notes: Merged: https://github.com/ruby/ruby/pull/12881
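The collision problem can be illustrated with a small sketch. It assumes an FNV-1a byte hash and a hypothetical integer encoding index; the function names are invented for the example and the way the encoding is mixed in is not taken from the actual `fstring_hash`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hash of the bytes only: identical contents always collide, whatever
 * the encoding. */
static uint32_t demo_bytes_hash(const char *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash for a table where encoding is part of equality: fold the
 * (hypothetical) encoding index into the byte hash with one more
 * FNV-1a round. */
static uint32_t demo_fstring_hash(const char *p, size_t len, int encoding_index)
{
    uint32_t h = demo_bytes_hash(p, len);
    h ^= (uint32_t)encoding_index;
    h *= 16777619u;
    return h;
}
```

With the byte-only hash, two `foo` strings in different encodings land on the same probe sequence even though the table treats them as different keys; mixing in the encoding lets them start from different slots.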
2025-03-05Fix a race condition with interned strings sweeping.Jean Boussier
[Bug #21172] This fixes a rare CI failure. The timeline of the race condition is:

- A `"foo" oid=1` string is interned.
- `"foo" oid=1` is no longer referenced and will be swept in the future.
- Another `"foo" oid=2` string is interned.
- `register_fstring` finds `"foo" oid=1`, but since it is about to be swept, removes it from `fstring_table` and inserts `"foo" oid=2` instead.
- `"foo" oid=1` is swept, since it has the `RSTRING_FSTR` flag, a `st_delete` is issued in `fstring_table` which removes `"foo" oid=2`.

I don't know how to reproduce this bug consistently in a single test case.

Notes: Merged: https://github.com/ruby/ruby/pull/12857
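The timeline can be replayed on a toy table to make the failure concrete. The sketch below is an assumption-laden model, not Ruby's `fstring_table`: it uses a tiny array of C strings, and all `demo_` names are hypothetical. It contrasts a delete that matches by value (the behaviour that removes the wrong entry) with a delete that matches by identity, which the later lock-free table commit in this log describes as a way to avoid the race.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SLOTS 4

/* Toy interning table: at most one entry per distinct string value. */
struct demo_table {
    const char *slots[DEMO_SLOTS];
};

/* Intern `str`, replacing an existing entry that is equal by value
 * (models the old entry being about to be swept). */
static void demo_intern_replacing(struct demo_table *t, const char *str)
{
    size_t free_slot = DEMO_SLOTS;
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (t->slots[i] == NULL) {
            if (free_slot == DEMO_SLOTS) free_slot = i;
        }
        else if (strcmp(t->slots[i], str) == 0) {
            t->slots[i] = str;
            return;
        }
    }
    if (free_slot < DEMO_SLOTS) t->slots[free_slot] = str;
}

/* Problematic sweep: removes whichever entry has the same contents. */
static bool demo_delete_by_value(struct demo_table *t, const char *str)
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (t->slots[i] != NULL && strcmp(t->slots[i], str) == 0) {
            t->slots[i] = NULL;
            return true;
        }
    }
    return false;
}

/* Safer sweep: removes the entry only if it is this exact object. */
static bool demo_delete_by_identity(struct demo_table *t, const char *str)
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (t->slots[i] == str) {
            t->slots[i] = NULL;
            return true;
        }
    }
    return false;
}

/* True if this exact object is currently in the table. */
static bool demo_contains(const struct demo_table *t, const char *str)
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (t->slots[i] == str) return true;
    }
    return false;
}
```

Interning two distinct buffers that both hold `foo` and then sweeping the first one reproduces the scenario: the identity-based delete leaves the second entry in place, while the value-based delete removes it.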
2025-02-24String#gsub! Elide MatchData allocation when we know it can't escapeJean Boussier
If gsub is used with a string replacement or a map that doesn't have a default proc, we know for sure no code can cause the MatchData to escape the `gsub` call. In such a case, we still have to allocate a new MatchData because we don't know what the lifetime of the backref is, but for any subsequent match we can re-use the MatchData we allocated ourselves, reducing allocations significantly. This partially fixes [Misc #20652], except when a block is used, and partially reduces the performance impact of abc0304cb28cb9dcc3476993bc487884c139fd11 / [Bug #17507]
```
compare-ruby: ruby 3.5.0dev (2025-02-24T09:44:57Z master 5cf146399f) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-02-24T10:58:27Z gsub-elude-match da966636e9) +PRISM [arm64-darwin24]
warming up....

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|escape           |      3.577k|    3.697k|
|                 |           -|     1.03x|
|escape_bin       |      5.869k|    6.743k|
|                 |           -|     1.15x|
|escape_utf8      |      3.448k|    3.738k|
|                 |           -|     1.08x|
|escape_utf8_bin  |      6.361k|    7.267k|
|                 |           -|     1.14x|
```
Co-Authored-By: Étienne Barrié <[email protected]>
2025-02-12Elide string allocation when using `String#gsub` in MAP modeJean Boussier
If the provided Hash doesn't have a default proc, we know for sure that we'll never call into user provided code, hence the string we allocate to access the Hash can't possibly escape. So we don't actually have to allocate it, we can use a fake_str, AKA a stack allocated string.
```
compare-ruby: ruby 3.5.0dev (2025-02-10T13:47:44Z master 3fb455adab) +PRISM [arm64-darwin23]
built-ruby: ruby 3.5.0dev (2025-02-10T17:09:52Z opt-gsub-alloc ea5c28958f) +PRISM [arm64-darwin23]
warming up....

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|escape           |      3.374k|    3.722k|
|                 |           -|     1.10x|
|escape_bin       |      5.469k|    6.587k|
|                 |           -|     1.20x|
|escape_utf8      |      3.465k|    3.734k|
|                 |           -|     1.08x|
|escape_utf8_bin  |      5.752k|    7.283k|
|                 |           -|     1.27x|
```
Notes: Merged: https://github.com/ruby/ruby/pull/12730
2025-01-22[DOC] Fix code markup in String#matchKouhei Yanagita
Notes: Merged: https://github.com/ruby/ruby/pull/12608
2025-01-12[Doc] Encourage use of encoding constantsJean Boussier
Lots of documentation examples still use encoding APIs with encoding names rather than encoding constants. I think it would be preferable to direct users toward constants as it can help with auto-completion, static analysis and such. Notes: Merged: https://github.com/ruby/ruby/pull/12552
2025-01-02[DOC] Exclude 'Class' and 'Module' from RDoc's autolinkingNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/12496
2024-12-13[DOC] [Feature #20205] Document the new power of String#+@Alan Wu
Notes: Merged: https://github.com/ruby/ruby/pull/12341
2024-11-27Optimize `rb_must_asciicompat`Jean Boussier
While profiling `strscan`, I noticed `rb_must_asciicompat` was quite slow, as more than 5% of the benchmark was spent in it: https://share.firefox.dev/49bOcTn By checking for the three common ASCII-compatible encoding indexes first, we can skip a lot of expensive operations in the happy path. Notes: Merged: https://github.com/ruby/ruby/pull/12180
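The fast-path idea can be sketched as follows. The enum values, the choice of which three encodings count as "common", and the slow-path function are all assumptions made for this example; the counter exists only to show that the slow path is skipped, and none of this is the actual `rb_must_asciicompat` code.

```c
#include <stdbool.h>

/* Hypothetical encoding indexes for the sketch. */
enum demo_encindex {
    DEMO_ENCINDEX_ASCII_8BIT = 0,
    DEMO_ENCINDEX_UTF_8      = 1,
    DEMO_ENCINDEX_US_ASCII   = 2,
    DEMO_ENCINDEX_UTF_16LE   = 3
};

/* Counts how often the expensive path runs. */
static int demo_slow_lookups;

/* Stand-in for the expensive general check (e.g. resolving the encoding
 * object and inspecting its properties). */
static bool demo_slow_is_asciicompat(int index)
{
    demo_slow_lookups++;
    return index != DEMO_ENCINDEX_UTF_16LE;
}

/* Check the common ASCII-compatible indexes first; only fall back to the
 * expensive lookup for everything else. */
static bool demo_is_asciicompat(int index)
{
    switch (index) {
      case DEMO_ENCINDEX_ASCII_8BIT:
      case DEMO_ENCINDEX_UTF_8:
      case DEMO_ENCINDEX_US_ASCII:
        return true;
      default:
        return demo_slow_is_asciicompat(index);
    }
}
```

The result is unchanged for every input; the gain comes only from answering the most frequent cases with an integer comparison.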
2024-11-26Many Oniguruma functions need valid encoding stringsNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/12169
2024-11-26Check negative integer underflowNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/12169
2024-11-25Place all non-default GC API behind USE_SHARED_GCMatt Valentine-House
So that it doesn't get included in the generated binaries for builds that don't support loading shared GC modules Co-Authored-By: Peter Zhu <[email protected]> Notes: Merged: https://github.com/ruby/ruby/pull/12149