summaryrefslogtreecommitdiff
path: root/lib/set.rb
AgeCommit message (Collapse)Author
2025-04-26Implement Set as a core classJeremy Evans
Set has been an autoloaded standard library since Ruby 3.2. The standard library Set is less efficient than it could be, as it uses Hash for storage, which stores unnecessary values for each key. Implementation details: * Core Set uses a modified version of `st_table`, named `set_table`. than `s/st_/set_/`, the main difference is that the stored records do not have values, making them 1/3 smaller. `st_table_entry` stores `hash`, `key`, and `record` (value), while `set_table_entry` only stores `hash` and `key`. This results in large sets using ~33% less memory compared to stdlib Set. For small sets, core Set uses 12% more memory (160 byte object slot and 64 malloc bytes, while stdlib set uses 40 for Set and 160 for Hash). More memory is used because the set_table is embedded and 72 bytes in the object slot are currently wasted. Hopefully we can make this more efficient and have it stored in an 80 byte object slot in the future. * All methods are implemented as cfuncs, except the pretty_print methods, which were moved to `lib/pp.rb` (which is where the pretty_print methods for other core classes are defined). As is typical for core classes, internal calls call C functions and not Ruby methods. For example, to check if something is a Set, `rb_obj_is_kind_of` is used, instead of calling `is_a?(Set)` on the related object. * Almost all methods use the same algorithm that the pure-Ruby implementation used. The exception is when calling `Set#divide` with a block with 2-arity. The pure-Ruby method used tsort to implement this. I developed an algorithm that only allocates a single intermediate hash and does not need tsort. * The `flatten_merge` protected method is no longer necessary, so it is not implemented (it could be). * Similar to Hash/Array, subclasses of Set are no longer reflected in `inspect` output. * RDoc from stdlib Set was moved to core Set, with minor updates. This includes a comprehensive benchmark suite for all public Set methods. As you would expect, the native version is faster in the vast majority of cases, and multiple times faster in many cases. There are a few cases where it is significantly slower: * Set.new with no arguments (~1.6x) * Set#compare_by_identity for small sets (~1.3x) * Set#clone for small sets (~1.5x) * Set#dup for small sets (~1.7x) These are slower as Set does not currently use the AR table optimization that Hash does, so a new set_table is initialized for each call. I'm not sure it's worth the complexity to have an AR table-like optimization for small sets (for hashes it makes sense, as small hashes are used everywhere in Ruby). The rbs and repl_type_completor bundled gems will need updates to support core Set. The pull request marks them as allowed failures. This passes all set tests with no changes. The following specs needed modification: * Modifying frozen set error message (changed for the better) * `Set#divide` when passed a 2-arity block no longer yields the same object as both the first and second argument (this seems like an issue with the previous implementation). * Set-like objects that override `is_a?` such that `is_a?(Set)` return `true` are no longer treated as Set instances. * `Set.allocate.hash` is no longer the same as `nil.hash` * `Set#join` no longer calls `Set#to_a` (it calls the underlying C function). * `Set#flatten_merge` protected method is not implemented. Previously, `set.rb` added a `SortedSet` autoload, which loads `set/sorted_set.rb`. This replaces the `Set` autoload in `prelude.rb` with a `SortedSet` autoload, but I recommend removing it and `set/sorted_set.rb`. This moves `test/set/test_set.rb` to `test/ruby/test_set.rb`, reflecting that switch to a core class. This does not move the spec files, as I'm not sure how they should be handled. Internally, this uses the st_* types and functions as much as possible, and only adds set_* types and functions as needed. The underlying set_table implementation is stored in st.c, but there is no public C-API for it, nor is there one planned, in order to keep the ability to change the internals going forward. For internal uses of st_table with Qtrue values, those can probably be replaced with set_table. To do that, include internal/set_table.h. To handle symbol visibility (rb_ prefix), internal/set_table.h uses the same macro approach that include/ruby/st.h uses. The Set class (rb_cSet) and all methods are defined in set.c. There isn't currently a C-API for the Set class, though C-API functions can be added as needed going forward. Implements [Feature #21216] Co-authored-by: Jean Boussier <[email protected]> Co-authored-by: Oliver Nutter <[email protected]>
2024-12-02[ruby/set] Fix ^ to respect subclassesKouhei Yanagita
https://github.com/ruby/set/commit/f88ecdef6b
2024-12-02[ruby/set] Speed up Set#flattenKouhei Yanagita
Improved performance by ensuring that identical `Set` objects are processed only once. https://github.com/ruby/set/commit/cadb686e93
2024-11-29[ruby/set] Bump VERSION to 1.1.1Akinori MUSHA
https://github.com/ruby/set/commit/1c3cded76a
2024-09-19Added missing block argHiroshi SHIBATA
2024-09-19[ruby/set] 2024Akinori MUSHA
https://github.com/ruby/set/commit/ea95c5a3d2
2024-09-19[ruby/set] Reword the document for to_a and clarify the implementation notesAkinori MUSHA
ref. https://github.com/ruby/ruby/pull/11453 https://github.com/ruby/set/commit/3cf6d11bd2
2023-12-23[ruby/set] Bump version to 1.1.0Akinori MUSHA
https://github.com/ruby/set/commit/d6cab5bcc8
2023-12-23[ruby/set] Use the pattern argument instead of a blockAkinori MUSHA
https://github.com/ruby/set/commit/c63047c2ce
2023-12-23[ruby/set] The arity of initialize_clone is -1 in Ruby >= 3Akinori MUSHA
https://github.com/ruby/set/commit/32a9689696
2023-12-23[ruby/set] Drop support for Ruby 2Akinori MUSHA
https://github.com/ruby/set/commit/64dad673d8
2023-12-08[ruby/set] Bump version to 1.0.4Akinori MUSHA
https://github.com/ruby/set/commit/efc8c8c9f5
2023-04-25[ruby/set] Update lib/set.rbAkinori MUSHA
https://github.com/ruby/set/commit/bc59f85f2f
2023-04-25[ruby/set] Expose Set::VERSIONHiroshi SHIBATA
https://github.com/ruby/set/commit/d39b33f463
2023-02-24[ruby/set] Set#merge does not take keyword arguments as a HashAkinori MUSHA
https://github.com/ruby/set/commit/ca1c9532a9
2023-02-24[ruby/set] Set#merge takes many enumerable objects like Hash#merge! doesAkinori MUSHA
https://github.com/ruby/set/commit/becaca994d
2023-02-15[DOC] remove redundant paragraph at set.rb (#6472)TJ
remove redundant paragraph at set.rb Notes: Merged-By: hsbt <[email protected]>
2023-01-11[ruby/set] Avoid the `block or return` pattern to save Proc allocationsJean Boussier
Using the block param in a boolean context like this cause it to be allocated. Using it with an `if` or `unless` was optimized in 3.2 (https://github.com/ruby/ruby/pull/6286) but using it with `or` or `and` wasn't. ```ruby def foo(&block) block or return 1 end puts RubyVM::InstructionSequence.of(method(:foo)).disasm == disasm: #<ISeq:foo@(irb):11 (11,0)-(13,3)> (catch: false) local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: 0, kw: -1@-1, kwrest: -1]) [ 1] block@0<Block> 0000 getblockparam block@0, 0 ( 12)[LiCa] 0003 dup 0004 branchif 10 0006 pop 0007 putobject_INT2FIX_1_ 0008 leave [Re] 0009 putnil 0010 leave ``` versus ``` def foo(&block) return 1 if block end puts RubyVM::InstructionSequence.of(method(:foo)).disasm == disasm: #<ISeq:foo@(irb):15 (15,0)-(17,3)> (catch: false) local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: 0, kw: -1@-1, kwrest: -1]) [ 1] block@0<Block> 0000 getblockparamproxy block@0, 0 ( 16)[LiCa] 0003 branchunless 7 0005 putobject_INT2FIX_1_ 0006 leave ( 17)[Re] 0007 putnil ( 16) 0008 leave ``` https://github.com/ruby/set/commit/e89da977d4
2022-10-12[DOC] Replace the external URIs to docs with rdoc-refNobuyoshi Nakada
2022-05-16[ruby/set] Fix a typoKazuhiro NISHIYAMA
https://github.com/ruby/set/commit/71a876ae81
2022-04-16[ruby/set] Repair format for What's HereBurdetteLamar
https://github.com/ruby/set/commit/292baacb60
2022-02-18Make Set a builtin feature [Feature #16989]Akinori MUSHA
Notes: Merged: https://github.com/ruby/ruby/pull/5563
2021-09-28[ruby/set] Make Set#pretty_print IRB::ColorPrinter friendlyKazuki Tsujimoto
https://github.com/ruby/set/commit/f467028cdb
2021-07-29[ruby/set] Improve What's Here linksBurdetteLamar
https://github.com/ruby/set/commit/76b056c3b9
2021-07-29[ruby/set] Improve What's Here linksBurdetteLamar
https://github.com/ruby/set/commit/dd787a3988
2021-07-29[ruby/set] Update documentation for intersect?/disjoint?Jeremy Evans
https://github.com/ruby/set/commit/35b69e9d69
2021-07-29[ruby/set] Allow the use of any enumerable in intersect?/disjoint?Jeremy Evans
https://github.com/ruby/set/commit/1a73ab9047
2021-07-29[ruby/set] Allow Set#intersect? and #disjoint? to accept array argumentJeremy Evans
Implements [Feature #17838] https://github.com/ruby/set/commit/d9b389bafa
2021-05-10[ruby/set] Adding section: What's HereBurdette Lamar
https://github.com/ruby/set/commit/257dc452a7
2021-05-10[ruby/set] Adding section: What's HereBurdette Lamar
https://github.com/ruby/set/commit/8f4c62768d
2021-05-10[ruby/set] Adding section: What's HereBurdette Lamar
https://github.com/ruby/set/commit/254d927c8c
2021-05-10[ruby/set] Adding section: What's HereBurdette Lamar
https://github.com/ruby/set/commit/ab81354de1
2020-12-22Import set 1.0.1Akinori MUSHA
- Eliminate warnings - Convert rdoc to markdown
2020-12-22Import set 1.0.0Akinori MUSHA
- SortedSet has been removed for dependency and performance reasons. - Set#join is added as a shorthand for `.to_a.join`. - Set#<=> is added. https://github.com/ruby/set/blob/v1.0.0/CHANGELOG.md
2020-12-04[ruby/set] Add `Set#<=>`Marc-Andre Lafortune
https://github.com/ruby/set/commit/447974a374
2020-12-04[ruby/set] Remove SortedSet implementationsAkinori MUSHA
It required RBTree to perform decently and the external dependency was not suitable for a standard library. The pure ruby fallback implementation was originally meant to be an example of how to write a subclass of Set, and its poor performance was not suitable for use in production. I decided it should be distributed as an external library instead of bundling it with Set. https://github.com/ruby/set/commit/dfcc8e568b
2020-12-04[ruby/set] Resurrect support for Ruby 2.xAkinori MUSHA
In Ruby 2.x, initialize_copy does not take a freeze option. https://github.com/ruby/set/commit/3da6c309df
2020-06-11Make mutating the result of SortedSet#to_a not affect the setJeremy Evans
Fixes [Bug #15834] Notes: Merged: https://github.com/ruby/ruby/pull/3215
2020-03-22Support obj.clone(freeze: true) for freezing cloneJeremy Evans
This freezes the clone even if the receiver is not frozen. It is only for consistency with freeze: false not freezing the clone even if the receiver is frozen. Because Object#clone is now partially implemented in Ruby and not fully implemented in C, freeze: nil must be supported to provide the default behavior of only freezing the clone if the receiver is frozen. This requires modifying delegate and set, to set freeze: nil instead of freeze: true as the keyword parameter for initialize_clone. Those are the two libraries in stdlib that override initialize_clone. Implements [Feature #16175] Notes: Merged: https://github.com/ruby/ruby/pull/2969
2020-01-03Call initialize_clone with freeze: false if clone called with freeze: falseJeremy Evans
This makes it possible to initialize_clone to correctly not freeze internal state if the freeze: false keyword is passed to clone. If clone is called with freeze: true or no keyword, do not pass a second argument to initialize_clone to keep backwards compatibility. This makes it so that external libraries that override initialize_clone but do not support the freeze keyword will fail with ArgumentError if passing freeze: false to clone. I think that is better than the current behavior, which succeeds but results in an unfrozen object with frozen internals. Fix related issues in set and delegate in stdlib. Fixes [Bug #14266] Notes: Merged: https://github.com/ruby/ruby/pull/2816
2019-12-31speed up set intersectOleg Zubchenko
Notes: Merged: https://github.com/ruby/ruby/pull/2003
2019-11-18Deprecate taint/trust and related methods, and make the methods no-opsJeremy Evans
This removes the related tests, and puts the related specs behind version guards. This affects all code in lib, including some libraries that may want to support older versions of Ruby. Notes: Merged: https://github.com/ruby/ruby/pull/2476
2019-09-06Fix SortedSet subclasses that override initializeJeremy Evans
The first time SortedSet#initialize is called, it overwrites itself, then recalls #initialize, which results in calling the subclass's initialize, not the current initialize. Just inline the default initialize behavior to avoid this issue. No test for this as it can only be triggered the very first time that SortedSet#initialize is called. Fixes [Bug #15830]
2019-01-20raise FrozenError instead of RuntimeErrorkazu
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66875 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-25Add a new #filter alias for #selecteregon
* In Enumerable, Enumerator::Lazy, Array, Hash and Set [Feature #13784] [ruby-core:82285] * Share specs for the various #select#select! methods and reuse them for #filter/#filter!. * Add corresponding filter tests for select tests. * Update NEWS. [Fix GH-1824] From: Alexander Patrick <[email protected]> git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-22lib/set.rb: [DOC] remove empty commentsstomar
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-22set.rb: improve docs for Setstomar
* lib/set.rb: [DOC] add examples for Set#replace, add examples for creating a set from a hash with duplicates, simplify and fix style of some other examples, fix typos. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60879 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-17Add examples to Set documentation [ci skip]knu
GitHub PR: https://github.com/ruby/ruby/pull/1752 [Fix GH-1752] Submitted by: @Ana06 <[email protected]> git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60824 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-29lib/set.rb: improve docs for Set#===stomar
* lib/set.rb: [DOC] improve description and examples for Set#===. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60573 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-22Add `Set#reset`knu
This method resets the internal state of a set after modification to existing elements, reindexing and deduplicating them. [Feature #6589] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60360 b2dd03c8-39d4-4d8f-98ff-823fe69b080e