ruby-core

Issue #16897 has been updated by jeremyevans0 (Jeremy Evans).


 As @Eregon mentioned, the `***a` approach is likely to be the same speed or slower than the `*args, **kw` approach on CRuby, as it has to allocate at least as many objects.  It could be theoretically possible to increase performance by reducing allocations in the following cases:
 
* 0-3 arguments with no keywords
* 0-1 arguments with 1 keyword

You could do this by storing the arguments inside the object, similar to how arrays and hashes are optimized internally.  Those cases would allow you to get by with a single object allocation instead of allocating two objects (array+hash), assuming that the caller side is not doing any object allocation.  All other cases would be as slow or slower.

This approach would only be faster if you never needed to access the arguments or keywords passed.  As soon as you need access to the arguments or keywords, it would likely be slower as it would have to allocate an array or hash for them.  This limits the usefulness of the approach to specific cases.
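
Allocation claims like these can be checked on a particular Ruby build by counting objects around a call.  The following is only a minimal sketch, assuming CRuby's `GC.stat(:total_allocated_objects)` counter; the `allocations` helper and the method names are made up for illustration, and the exact counts are expected to vary between Ruby versions:

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Illustrative callees: one takes only a splat, one takes a splat and keywords.
def splat_only(*args) end
def splat_and_kw(*args, **kw) end

CALLS = 10_000

splat_count = allocations { CALLS.times { splat_only(1, 2) } }
both_count  = allocations { CALLS.times { splat_and_kw(1, k: 1) } }

# Report the average per call; no particular numbers are assumed here.
puts "splat only:     #{splat_count.fdiv(CALLS).round(2)} objects/call"
puts "splat + kwargs: #{both_count.fdiv(CALLS).round(2)} objects/call"
```

Comparing the two averages gives a rough idea of how many extra objects the keyword splat costs on the Ruby being tested.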

When you compare `***a` to `ruby2_keywords`, which is currently the fastest approach, I believe the cases where it could theoretically be faster are limited to 0-1 arguments with 1 keyword.

This approach will increase complexity in an already complex system.  It would be a significant undertaking to implement, and it's not clear it would provide a net performance improvement.

It is true that supporting `ruby2_keywords` makes `*args` calls without keywords slower.  I think the maximum slowdown was around 10%, and that was when the callee did not accept a splat or keywords. When the callee accepted a splat or keywords, I think the slowdown was around 1%.  However, as `ruby2_keywords` greatly speeds up delegation (see below), `ruby2_keywords` results in a net increase in performance in the majority of cases.  As long as that remains true, I believe it should stay.
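
For readers less familiar with the mechanism, here is a small sketch of what `ruby2_keywords` does for a delegating method.  The method names `target` and `flagged` are hypothetical; `Hash.ruby2_keywords_hash?` is the core method that reports whether a hash carries the flag:

```ruby
# Illustrative callee that reports what it received.
def target(*args, **kw)
  [args, kw]
end

# With ruby2_keywords, keywords passed by the caller arrive as a flagged
# last Hash inside args, so no separate **kw parameter is needed.
ruby2_keywords def flagged(*args)
  last = args.last
  is_flagged = last.is_a?(Hash) && Hash.ruby2_keywords_hash?(last)
  # Splatting args turns the flagged hash back into keywords for target.
  [is_flagged, target(*args)]
end

is_flagged, received = flagged(1, k: 2)
p is_flagged
p received
```

The delegator only ever sees one array, which is why it can avoid the extra hash handling that `*args, **kw` requires.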

Here's a benchmark showing a 160% improvement in delegation performance in master by using `ruby2_keywords` instead of `*args, **kw`:

```ruby
def m1(arg) end
def m2(*args) end
def m3(arg, k: 1) end
def m4(*args, k: 1) end
def m5(arg, **kw) end
def m6(*args, **kw) end

ruby2_keywords def d1(*args)
  m2(*args);m2(*args);m2(*args);m2(*args);m2(*args);
  m3(*args);m3(*args);m3(*args);m3(*args);m3(*args);
  m4(*args);m4(*args);m4(*args);m4(*args);m4(*args);
  m5(*args);m5(*args);m5(*args);m5(*args);m5(*args);
  m6(*args);m6(*args);m6(*args);m6(*args);m6(*args);
end
ruby2_keywords def d1a(*args)
  m1(*args);m1(*args);m1(*args);m1(*args);m1(*args);
end

def d2(*args, **kw)
  m2(*args, **kw);m2(*args, **kw);m2(*args, **kw);m2(*args, **kw);m2(*args, **kw);
  m3(*args, **kw);m3(*args, **kw);m3(*args, **kw);m3(*args, **kw);m3(*args, **kw);
  m4(*args, **kw);m4(*args, **kw);m4(*args, **kw);m4(*args, **kw);m4(*args, **kw);
  m5(*args, **kw);m5(*args, **kw);m5(*args, **kw);m5(*args, **kw);m5(*args, **kw);
  m6(*args, **kw);m6(*args, **kw);m6(*args, **kw);m6(*args, **kw);m6(*args, **kw);
end
def d2a(*args, **kw)
  m1(*args, **kw);m1(*args, **kw);m1(*args, **kw);m1(*args, **kw);m1(*args, **kw);
end

require 'benchmark'

print "ruby2_keywords: "
puts(Benchmark.measure do
  100000.times do
    d1a(1)
    d1(1, k: 1)
  end
end)

print "   *args, **kw: "
puts(Benchmark.measure do
  100000.times do
    d2a(1)
    d2(1, k: 1)
  end
end)
```

Results:

```
ruby2_keywords:   1.350000   0.000000   1.350000 (  1.395517)
   *args, **kw:   3.630000   0.000000   3.630000 (  3.693702)
```
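
Applied to the memoizer from the issue description, the same technique might look roughly like the following.  This is an untested-for-speed sketch, not a benchmarked implementation: `memoize_r2k` and the `Example` class are names invented here, and the cache is shared per class just as in the original `memoize_26`:

```ruby
module Memoizer
  # Hypothetical variant of memoize_26 that flags the wrapper with
  # ruby2_keywords so that keywords survive the single *args splat.
  def memoize_r2k(method_name)
    cache = {}

    uncached = "#{method_name}_without_cache"
    alias_method uncached, method_name

    define_method(method_name) do |*args|
      # args is both the cache key and the delegated argument list.
      cache.fetch(args) { cache[args] = public_send(uncached, *args) }
    end
    ruby2_keywords(method_name)
  end
end

class Example
  extend Memoizer

  attr_reader :calls

  def initialize
    @calls = 0
  end

  def join(a, b: nil)
    @calls += 1
    "#{a} #{b}"
  end

  memoize_r2k :join
end

example = Example.new
first  = example.join(10, b: 10)
second = example.join(10, b: 10) # served from the cache
puts first, second, example.calls
```

The wrapper keeps the single-array shape of the Ruby 2.6 version, so it uses `args` directly as the cache key instead of building `[args, kwargs]`.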

----------------------------------------
Feature #16897: Can a Ruby 3.0 compatible general purpose memoizer be written in such a way that it matches Ruby 2 performance?
https://bugs.ruby-lang.org/issues/16897#change-86000

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
```ruby
require 'benchmark/ips'

module Memoizer
  def memoize_26(method_name)
    cache = {}

    uncached = "#{method_name}_without_cache"
    alias_method uncached, method_name

    define_method(method_name) do |*arguments|
      found = true
      data = cache.fetch(arguments) { found = false }
      unless found
        cache[arguments] = data = public_send(uncached, *arguments)
      end
      data
    end
  end

  def memoize_27(method_name)
    cache = {}

    uncached = "#{method_name}_without_cache"
    alias_method uncached, method_name

    define_method(method_name) do |*args, **kwargs|
      found = true
      all_args = [args, kwargs]
      data = cache.fetch(all_args) { found = false }
      unless found
        cache[all_args] = data = public_send(uncached, *args, **kwargs)
      end
      data
    end
  end

  def memoize_27_v2(method_name)
    uncached = "#{method_name}_without_cache"
    alias_method uncached, method_name

    cache = "MEMOIZE_#{method_name}"

    params = instance_method(method_name).parameters
    has_kwargs = params.any? {|t, name| "#{t}".start_with? "key"}
    has_args = params.any? {|t, name| !"#{t}".start_with? "key"}

    args = []

    args << "args" if has_args
    args << "kwargs" if has_kwargs

    args_text = args.map do |n|
      n == "args" ? "*args" : "**kwargs"
    end.join(",")

    class_eval <<~RUBY
      #{cache} = {}
      def #{method_name}(#{args_text})
        found = true
        all_args = #{args.length === 2 ? "[args, kwargs]" : args[0]}
        data = #{cache}.fetch(all_args) { found = false }
        unless found
          #{cache}[all_args] = data = public_send(:#{uncached} #{args.empty? ? "" : ", #{args_text}"})
        end
        data
      end
    RUBY

  end

end

module Methods
  def args_only(a, b)
    sleep 0.1
    "#{a} #{b}"
  end

  def kwargs_only(a:, b: nil)
    sleep 0.1
    "#{a} #{b}"
  end

  def args_and_kwargs(a, b:)
    sleep 0.1
    "#{a} #{b}"
  end
end

class OldMethod
  extend Memoizer
  include Methods

  memoize_26 :args_and_kwargs
  memoize_26 :args_only
  memoize_26 :kwargs_only
end

class NewMethod
  extend Memoizer
  include Methods

  memoize_27 :args_and_kwargs
  memoize_27 :args_only
  memoize_27 :kwargs_only
end

class OptimizedMethod
  extend Memoizer
  include Methods

  memoize_27_v2 :args_and_kwargs
  memoize_27_v2 :args_only
  memoize_27_v2 :kwargs_only
end

OptimizedMethod.new.args_only(1,2)


methods = [
  OldMethod.new,
  NewMethod.new,
  OptimizedMethod.new
]

Benchmark.ips do |x|
  x.warmup = 1
  x.time = 2

  methods.each do |m|
    x.report("#{m.class} args only") do |times|
      while times > 0
        m.args_only(10, b: 10)
        times -= 1
      end
    end

    x.report("#{m.class} kwargs only") do |times|
      while times > 0
        m.kwargs_only(a: 10, b: 10)
        times -= 1
      end
    end

    x.report("#{m.class} args and kwargs") do |times|
      while times > 0
        m.args_and_kwargs(10, b: 10)
        times -= 1
      end
    end
  end

  x.compare!
end


# # Ruby 2.6.5
# #
# OptimizedMethod args only:   974266.9 i/s
#  OldMethod args only:   949344.9 i/s - 1.03x  slower
# OldMethod args and kwargs:   945951.5 i/s - 1.03x  slower
# OptimizedMethod kwargs only:   939160.2 i/s - 1.04x  slower
# OldMethod kwargs only:   868229.3 i/s - 1.12x  slower
# OptimizedMethod args and kwargs:   751797.0 i/s - 1.30x  slower
#  NewMethod args only:   730594.4 i/s - 1.33x  slower
# NewMethod args and kwargs:   727300.5 i/s - 1.34x  slower
# NewMethod kwargs only:   665003.8 i/s - 1.47x  slower
#
# #
# # Ruby 2.7.1
#
# OptimizedMethod kwargs only:  1021707.6 i/s
# OptimizedMethod args only:   955694.6 i/s - 1.07x  (± 0.00) slower
# OldMethod args and kwargs:   940911.3 i/s - 1.09x  (± 0.00) slower
#  OldMethod args only:   930446.1 i/s - 1.10x  (± 0.00) slower
# OldMethod kwargs only:   858238.5 i/s - 1.19x  (± 0.00) slower
# OptimizedMethod args and kwargs:   773773.5 i/s - 1.32x  (± 0.00) slower
# NewMethod args and kwargs:   772653.3 i/s - 1.32x  (± 0.00) slower
#  NewMethod args only:   771253.2 i/s - 1.32x  (± 0.00) slower
# NewMethod kwargs only:   700604.1 i/s - 1.46x  (± 0.00) slower
```

The bottom line is that a generic delegator often needs to make use of all the arguments provided to a method.

```ruby
def count(*args, **kwargs)
  counter[[args, kwargs]] += 1
  orig_count(*args, **kwargs)
end
```

The old pattern meant we could get away with one fewer array allocation per call:

```ruby
def count(*args)
  counter[args] += 1
  orig_count(*args)
end
```

I would like to propose some changes to Ruby 3 that would make it possible to recover this performance.

Perhaps:

```ruby
def count(...)
  args = ...
  counter[args] += 1
  orig_count(...)
end
```

Or:

```ruby
def count(***args)
  counter[args] += 1
  orig_count(***args)
end
```

Thoughts? 



-- 
https://bugs.ruby-lang.org/
