Age | Commit message | Author |
|
[Bug #21380] Prohibit modification in String#split block
Reported at https://hackerone.com/reports/3163876
|
|
Fix a race condition with interned strings sweeping.
[Bug #21172]
This fixes a rare CI failure.
The timeline of the race condition is:
- A `"foo" oid=1` string is interned.
- `"foo" oid=1` is no longer referenced and will be swept in the future.
- Another `"foo" oid=2` string is interned.
- `register_fstring` finds `"foo" oid=1`, but since it is about to be swept,
removes it from `fstring_table` and inserts `"foo" oid=2` instead.
- `"foo" oid=1` is swept, since it has the `RSTRING_FSTR` flag,
a `st_delete` is issued in `fstring_table` which removes `"foo" oid=2`.
I don't know how to reproduce this bug consistently in a single test
case.
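The timeline above can be modelled with a toy table. This is only a sketch: a Ruby Hash stands in for `fstring_table`, and `FStr`, `register`, `sweep_by_key` and `sweep_by_identity` are hypothetical names. The identity check is one possible guard, shown for illustration, and is not necessarily how this commit fixes the race.

```ruby
# Toy model of the race. A Hash keyed by string content stands in for
# `fstring_table`; all names here are hypothetical.
FStr = Struct.new(:content, :oid)

def register(table, content, oid)
  # Models the replacement step: the new object takes over the slot
  # that the about-to-be-swept object used to own.
  table[content] = FStr.new(content, oid)
end

# Deletes by content only, like a key-based delete issued at sweep time.
def sweep_by_key(table, dead)
  table.delete(dead.content)
end

# Deletes only when the slot still holds the very object being swept.
def sweep_by_identity(table, dead)
  table.delete(dead.content) if table[dead.content].equal?(dead)
end

table = {}
old = register(table, "foo", 1)   # "foo" oid=1 is interned
register(table, "foo", 2)         # oid=2 replaces oid=1, which awaits sweeping

buggy = table.dup
sweep_by_key(buggy, old)          # sweeping oid=1 deletes by content
fixed = table.dup
sweep_by_identity(fixed, old)     # sweeping oid=1 checks identity first

puts buggy.key?("foo")  # false: oid=2 was removed by mistake
puts fixed.key?("foo")  # true: oid=2 survives
```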
|
|
6b4f8945d600168bf530d21395da8293fbd5e8ba: [Backport #20909]
Check negative integer underflow
Many Oniguruma functions need strings with a valid encoding
|
|
[Bug #20585]
This was changed in 36a06efdd9f0604093dccbaf96d4e2cb17874dc8 because
`String.new(capacity: 1024)` would end up allocating `1025` bytes, but the problem
with this change is that the caller may be trying to right-size a String.
So instead, we should just better document the behavior of `capacity:`.
Co-authored-by: Jean Boussier <[email protected]>
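A minimal sketch of what a caller that right-sizes a String might look like. It only shows the visible behaviour of `capacity:`; how many bytes are actually allocated is an implementation detail and is not observable here.

```ruby
# `capacity:` only pre-sizes the internal buffer; the string starts out empty.
buf = String.new(capacity: 1024)
puts buf.bytesize   # 0

# Fill the buffer up to the size that was requested.
buf << "x" * 1024
puts buf.bytesize   # 1024
```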
|
|
[Bug #20322] Fix rb_enc_interned_str_cstr null encoding
The documentation for `rb_enc_interned_str_cstr` notes that `enc` can be
a null pointer, but this currently causes a segmentation fault when
trying to autoload the encoding. This commit fixes the issue by checking
for NULL before calling `rb_enc_autoload`.
|
|
[Bug #20292] Truncate embedded string to new capacity
|
|
Make io_fwrite safe for compaction
[Bug #20169]
Embedded strings are not safe for system calls without the GVL because
compaction can cause pages to be locked, causing the operation to fail
with EFAULT. This commit changes io_fwrite to use rb_str_tmp_frozen_no_embed_acquire,
which guarantees that the returned string is not embedded.
|
|
#20190] (#10300)
Fix coderange of invalid_encoding_string.<<(ord)
Appending a validly encoded character can change the coderange from invalid to valid.
Example: "\x95".force_encoding('sjis')<<0x5C will be a valid string "\x{955C}"
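The example above can be checked with a short script. This is a sketch; the `fresh` copy is an assumption added for illustration, used because a retagged copy has no cached coderange and is therefore rescanned regardless of what `<<` recorded on `s`.

```ruby
# 0x95 alone is a dangling Shift_JIS lead byte, so the string starts out invalid.
s = "\x95".b.force_encoding(Encoding::Shift_JIS)
puts s.valid_encoding?        # false

s << 0x5C                     # appends the trail byte that completes the character
puts s.bytes.inspect          # [149, 92]

# A retagged copy carries no cached coderange, so validity is recomputed.
fresh = s.b.force_encoding(Encoding::Shift_JIS)
puts fresh.valid_encoding?    # true

# With this fix applied, the original string is expected to agree with `fresh`.
puts s.valid_encoding?
```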
|
|
#20150] (#10253)
Fix memory leak in grapheme clusters
[Bug #20150]
String#grapheme_clusters and String#each_grapheme_cluster leak memory
because if the string is not UTF-8, then the created regex will not
be freed.
For example:
    str = "hello world".encode(Encoding::UTF_32LE)
    10.times do
      1_000.times do
        str.grapheme_clusters
      end
      puts `ps -o rss= -p #{$$}`
    end
Before:
26000
42256
59008
75792
92528
109232
125936
142672
159392
176160
After:
9264
9504
9808
10000
10128
10224
10352
10544
10704
10896
---
string.c | 98 +++++++++++++++++++++++++++++++-----------------
test/ruby/test_string.rb | 11 ++++++
2 files changed, 75 insertions(+), 34 deletions(-)
|
|
The test fails when RGENGC_CHECK_MODE is turned on:
1) Failure:
TestSymbol#test_inspect_under_gc_compact_stress [test/ruby/test_symbol.rb:123]:
<":testing"> expected but was
<":\x00\x00\x00\x00\x00\x00\x00">.
|
|
The test fails when RGENGC_CHECK_MODE is turned on:
TestString#test_sub_gc_compact_stress = 9.42 s
1) Failure:
TestString#test_sub_gc_compact_stress [test/ruby/test_string.rb:2089]:
<"aaa [amp] yyy"> expected but was
<"aaa [] yyy">.
|
|
|
|
|
|
String#chomp! returned nil without checking the number of passed
arguments in this case.
|
|
|
|
Embedded shared strings cannot be moved because strings point into the
slot of the shared string. There may be code using the RSTRING_PTR on
the stack, which would pin the string but not pin the shared string,
causing it to move.
|
|
We need to guard match from GC because otherwise it could end up being
reclaimed or moved in compaction.
|
|
We need to guard match from GC because otherwise it could end up being
reclaimed or moved in compaction.
|
|
`String#+@` is 2-3 times faster than `String#dup` because it can
directly go through `rb_str_dup` instead of using the generic,
much slower `rb_obj_dup`.
This fact led to the existence of the ugly `Performance/UnfreezeString`
rubocop performance rule that encourages users to rewrite the much
more readable and convenient `"foo".dup` into the ugly `(+"foo")`.
Let's make that rubocop rule useless.
```
compare-ruby: ruby 3.3.0dev (2023-11-20T02:02:55Z master 701b0650de) [arm64-darwin22]
last_commit=[ruby/prism] feat: add encoding for IBM865 (https://github.com/ruby/prism/pull/1884)
built-ruby: ruby 3.3.0dev (2023-11-20T12:51:45Z faster-str-lit-dup 6b745bbc5d) [arm64-darwin22]
warming up..
| |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|uplus | 16.312M| 16.332M|
| | -| 1.00x|
|dup | 5.912M| 16.329M|
| | -| 2.76x|
```
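For reference, a small sketch of the two spellings being compared. For a frozen receiver both return an unfrozen copy; note that `String#+@` returns the receiver itself when it is not frozen, whereas `dup` always copies.

```ruby
frozen = "foo".freeze

a = +frozen      # String#+@: unfrozen copy of a frozen string
b = frozen.dup   # String#dup: unfrozen copy as well

puts a.frozen?   # false
puts b.frozen?   # false
puts a == b      # true
```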
|
|
Some code out there blindly calls `force_encoding` without checking
what the original encoding was, which clears the coderange uselessly.
If the String is big, it can be a rather costly mistake.
For instance the `rack-utf8_sanitizer` gem does this on request
bodies.
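On the caller side, the pattern described above can be avoided with a guard. This is a sketch only; `retag` is a hypothetical helper and is not part of this commit or of any gem mentioned here.

```ruby
# Hypothetical helper: retag a string only when the encoding actually changes,
# so an already-correct string is left untouched.
def retag(str, encoding)
  encoding = Encoding.find(encoding) unless encoding.is_a?(Encoding)
  str.force_encoding(encoding) unless str.encoding == encoding
  str
end

body = String.new("caf\u00E9", encoding: Encoding::UTF_8)

retag(body, "UTF-8")                       # same encoding: nothing to do
puts body.encoding == Encoding::UTF_8      # true

retag(body, Encoding::BINARY)              # different encoding: retagged
puts body.encoding == Encoding::BINARY     # true
```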
|
|
|
|
If the required capacity would fit in an embedded string,
returns one.
This can reduce malloc churn for code that uses string buffers.
|
|
|
|
|
|
Previously we used the next character following the found prefix to
determine if the match ended on a broken character.
This caused surprising behaviour when a valid character was followed
by a UTF-8 continuation byte.
This commit changes the behaviour to instead look for the end of the
last character in the prefix.
[Bug #19784]
Co-authored-by: ywenc <[email protected]>
Co-authored-by: Nobuyoshi Nakada <[email protected]>
Notes:
Merged: https://github.com/ruby/ruby/pull/8348
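A short sketch of the two situations involved. The strings are illustrative assumptions and are not taken from the bug report. The first case, a prefix that stops in the middle of a character, is rejected. The second is the shape this commit describes, a valid character followed by a stray continuation byte; its result depends on whether the running Ruby includes this change, so no output is claimed for it.

```ruby
# "\u3042" is three bytes in UTF-8: E3 81 82.
str  = "\u3042"
lead = "\xE3"     # only the first byte of that character

# The prefix ends in the middle of a character, so it is not a match.
puts str.start_with?(lead)            # false
puts str.delete_prefix(lead) == str   # true: nothing was removed

# A valid character ("a") followed by a lone UTF-8 continuation byte.
odd = "a\x81"
puts odd.start_with?("a")
```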
|
|
- String#start_with?
- String#delete_prefix
- String#delete_prefix!
Notes:
Merged: https://github.com/ruby/ruby/pull/8296
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/8296
|
|
Previously, the following crashed due to use-after-free
with AArch64 Alpine Linux 3.18.3 (aarch64-linux-musl):
```ruby
str = 'a' * (32*1024*1024)
p({z: str})
```
32 MiB is the default for `GC_MALLOC_LIMIT_MAX`, and the crash
could be dodged by setting `RUBY_GC_MALLOC_LIMIT_MAX` to large values.
Under a debugger, one can see the `str2` of rb_str_buf_append()
getting prematurely collected while str_buf_cat4() allocates capacity.
Add GC guards so the buffer of `str2` lives across the GC run
initiated in str_buf_cat4().
[Bug #19792]
|
|
|
|
We don't need to check for STR_NOEMBED because the check above for
STR_EMBED_P means that it can never be false.
Notes:
Merged: https://github.com/ruby/ruby/pull/8238
|
|
Notes:
Merged-By: peterzhu2118 <[email protected]>
|
|
Fix str_subseq so that it does not attempt to predict the size of the
object returned by str_alloc_heap.
Notes:
Merged: https://github.com/ruby/ruby/pull/8165
|
|
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/8080
Merged-By: nobu <[email protected]>
|
|
So that irregular parts may be more noticeable.
Notes:
Merged: https://github.com/ruby/ruby/pull/8047
|
|
Leave callers to convert byte index to char index, as well as
`rb_str_index`, so that `rb_str_rpartition` does not need to
re-convert char index to byte index.
Notes:
Merged: https://github.com/ruby/ruby/pull/8047
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/8045
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7991
|
|
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7988
|
|
Notes:
Merged-By: peterzhu2118 <[email protected]>
|
|
When String#split is used with an empty string as the field separator it
effectively splits the original string into chars, and there is a
pre-existing fast path for this using SPLIT_TYPE_CHARS.
However this path creates an empty array in the smallest size pool and
grows from there, despite already knowing the size of the desired array.
This commit pre-allocates the correct size array in this case in order
to allow the arrays to be embedded and avoid being allocated in the
transient heap.
Notes:
Merged: https://github.com/ruby/ruby/pull/7919
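A small sketch of the case this fast path covers. Splitting on an empty string yields one element per character, which is why the result size is known up front. The sample string is an arbitrary assumption.

```ruby
str = "h\u00E9llo"          # five characters, six bytes in UTF-8

chars = str.split("")       # empty separator: split into characters

puts chars.size             # 5
puts chars == str.chars     # true
puts chars.size == str.size # true: the array length equals the character count
```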
|
|
* Unify length field for embedded and heap strings
The length field is of the same type and position in RString for both
embedded and heap allocated strings, so we can unify it.
* Remove RSTRING_EMBED_LEN
Notes:
Merged-By: maximecb <[email protected]>
|
|
The length of an embedded string is no longer in the flags.
|
|
The capacity of the string can be calculated using the str_capacity
function.
Notes:
Merged: https://github.com/ruby/ruby/pull/7879
|
|
The call to RSTRING_GETMEM already fetched the pointer and length, so we
don't need to fetch it again.
Notes:
Merged: https://github.com/ruby/ruby/pull/7879
|
|
The STR_DEC_LEN macro is not used.
|
|
NEWOBJ_OF is now our canonical newobj macro. It takes an optional ec
Notes:
Merged: https://github.com/ruby/ruby/pull/7393
|
|
Remove !USE_RVARGC code
[Feature #19579]
The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.
Notes:
Merged-By: maximecb <[email protected]>
|
|
|