[#107765] [Ruby master Bug#18605] Fails to run on (newer) 32bit Windows with ucrt — "lazka (Christoph Reiter)" <noreply@...>

Issue #18605 has been reported by lazka (Christoph Reiter).

8 messages 2022/03/03

[#107769] [Ruby master Misc#18609] keyword decomposition in enumerable (question/guidance) — "Ethan (Ethan -)" <noreply@...>

Issue #18609 has been reported by Ethan (Ethan -).

10 messages 2022/03/04

[#107784] [Ruby master Feature#18611] Promote best practice for combining multiple values into a hash code — "chrisseaton (Chris Seaton)" <noreply@...>

Issue #18611 has been reported by chrisseaton (Chris Seaton).

12 messages 2022/03/07

[#107791] [Ruby master Bug#18614] Error (busy loop) in TestGemCommandsSetupCommand#test_destdir_flag_does_not_try_to_write_to_the_default_gem_home — duerst <noreply@...>

Issue #18614 has been reported by duerst (Martin Dürst).

7 messages 2022/03/08

[#107794] [Ruby master Feature#18615] Use -Werror=implicit-function-declaration by default for building C extensions — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18615 has been reported by Eregon (Benoit Daloze).

11 messages 2022/03/08

[#107832] [Ruby master Bug#18622] const_get still looks in Object, while lexical constant lookup no longer does — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18622 has been reported by Eregon (Benoit Daloze).

16 messages 2022/03/10

[#107847] [Ruby master Bug#18625] ruby2_keywords does not unmark the hash if the receiving method has a *rest parameter — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18625 has been reported by Eregon (Benoit Daloze).

13 messages 2022/03/11

[#107886] [Ruby master Feature#18630] Introduce general `IO#timeout` and `IO#timeout=` for all (non-)blocking operations. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18630 has been reported by ioquatix (Samuel Williams).

28 messages 2022/03/14

[#108026] [Ruby master Feature#18654] Enhancements to prettyprint — "kddeisz (Kevin Newton)" <noreply@...>

Issue #18654 has been reported by kddeisz (Kevin Newton).

9 messages 2022/03/22

[#108039] [Ruby master Feature#18655] Merge `IO#wait_readable` and `IO#wait_writable` into core — "byroot (Jean Boussier)" <noreply@...>

Issue #18655 has been reported by byroot (Jean Boussier).

10 messages 2022/03/23

[#108056] [Ruby master Bug#18658] Need openssl 3 support for Ubuntu 22.04 (Ruby 2.7.x and 3.0.x) — "schneems (Richard Schneeman)" <noreply@...>

Issue #18658 has been reported by schneems (Richard Schneeman).

19 messages 2022/03/24

[#108075] [Ruby master Bug#18663] Autoload doesn't work with fiber context switch. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18663 has been reported by ioquatix (Samuel Williams).

10 messages 2022/03/25

[#108117] [Ruby master Feature#18668] Merge `io-nonblock` gems into core — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18668 has been reported by Eregon (Benoit Daloze).

22 messages 2022/03/30

[ruby-core:107962] [Ruby master Bug#18641] UTF-16 surrogate pairs

From: duerst <noreply@...>
Date: 2022-03-18 00:24:22 UTC
List: ruby-core #107962
Issue #18641 has been updated by duerst (Martin Dürst).

Status changed from Open to Rejected

`"\uD83D\uDC69"` tries to create a UTF-8 string with surrogates. In UTF-8, surrogates are not allowed, and therefore you get an error. Adding `.force_encoding(Encoding::UTF_16)` does not change any of this, because the error has already happened by then. It would also be conceptually wrong, because it would label a sequence of UTF-8 bytes as UTF-16, which would give very strange results.

If you want the 'woman' emoji in UTF-16, then here are some choices:
```
"\u{1F469}".encode('UTF-16') # but this will prepend \uFEFF
"👩".encode('UTF-16') # but this will prepend \uFEFF
[0xD83D, 0xDC69].pack('S>*').force_encoding('UTF-16')
```

If it's something else that you want, please tell us what you want. Also, please note that the above worked on two of my systems, but may not work on your system, because it depends on the endianness of UTF-16 (whether it is actually UTF-16BE or UTF-16LE).
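
One way to sidestep both the BOM and the endianness question raised above is to name the byte order explicitly. The following is only a sketch under that assumption (the variable names are illustrative and not from the original report); it uses `Encoding::UTF_16BE` and `Encoding::UTF_16LE` instead of the generic `UTF-16` label:

```ruby
# Encode the 'woman' emoji (U+1F469) with an explicit byte order.
# Naming the byte order means no BOM is prepended.
woman = "\u{1F469}"

be = woman.encode(Encoding::UTF_16BE)
le = woman.encode(Encoding::UTF_16LE)

# Read the strings back as 16-bit code units.
be_units = be.unpack('n*') # 'n' = 16-bit unsigned, big-endian
le_units = le.unpack('v*') # 'v' = 16-bit unsigned, little-endian

# Build the same string directly from the surrogate code units.
from_units = [0xD83D, 0xDC69].pack('n*').force_encoding(Encoding::UTF_16BE)

puts be_units.map { |u| format('%04X', u) }.join(' ')
puts from_units.encode(Encoding::UTF_8) == woman
```

Both encodings yield the same two code units (the surrogate pair); only the byte order within each unit differs.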

----------------------------------------
Bug #18641: UTF-16 surrogate pairs
https://bugs.ruby-lang.org/issues/18641#change-96911

* Author: noraj (Alexandre ZANNI)
* Status: Rejected
* Priority: Normal
* ruby -v: ruby 3.1.1p18 (2022-02-18 revision 53f5fc4236) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
It is expected that Ruby raises an *invalid Unicode codepoint* error when surrogate pairs are used in a UTF-8 string; however, those codepoints should be valid in a UTF-16 string.
It is also expected that unpaired surrogates are invalid, whereas paired surrogates are valid; cf. https://unicode.org/faq/utf_bom.html#utf16-7.
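
The paired-versus-unpaired distinction can be checked with `String#valid_encoding?`. This is a minimal sketch, not part of the original report: the helper name `utf16be` is hypothetical, and the strings are built from raw code units rather than from `\u` escapes.

```ruby
# Hypothetical helper: wrap 16-bit code units as a UTF-16BE string.
def utf16be(*units)
  units.pack('n*').force_encoding(Encoding::UTF_16BE)
end

paired    = utf16be(0xD83D, 0xDC69) # high surrogate followed by low surrogate
lone_high = utf16be(0xD83D)         # high surrogate with nothing after it
lone_low  = utf16be(0xDC69)         # low surrogate with nothing before it

# The three bytes a UTF-8 encoder would produce for U+D83D if
# surrogates were allowed, labeled as UTF-8.
surrogate_in_utf8 = [0xED, 0xA0, 0xBD].pack('C*').force_encoding(Encoding::UTF_8)

puts paired.valid_encoding?
puts lone_high.valid_encoding?
puts lone_low.valid_encoding?
puts surrogate_in_utf8.valid_encoding?
```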

Version tested: 3.0.3p157, 3.1.0p0 and 3.1.1p18

``` ruby
➜ irb
irb(main):001:0> a = ''.force_encoding(Encoding::UTF_16)
=> ""
irb(main):002:0> a += "\uD83D\uDC69".force_encoding(Encoding::UTF_16)
/home/noraj/.asdf/installs/ruby/3.1.0/lib/ruby/3.1.0/irb/workspace.rb:119:in `eval': (irb):2: invalid Unicode codepoint (SyntaxError)
a += "\uD83D\uDC69".force_encoding(Encodi...
        ^~~~
(irb):2: invalid Unicode codepoint
a += "\uD83D\uDC69".force_encoding(Encoding::UT...
              ^~~~
        from /home/noraj/.asdf/installs/ruby/3.1.0/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
        from /home/noraj/.asdf/installs/ruby/3.1.0/bin/irb:25:in `load'
        from /home/noraj/.asdf/installs/ruby/3.1.0/bin/irb:25:in `<main>'
```

Also see [Unicode 14.0 Implementation Guidelines - 5.4 Handling Surrogate Pairs in UTF-16](https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf)



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:[email protected]?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
