Feature #1873

closed

MatchData#[]: Omits All But Last Captures Corresponding to the Same Named Group

Added by runpaint (Run Paint Run Run) over 16 years ago. Updated about 8 years ago.

Status:
Rejected
Target version:
[ruby-core:24732]

Description

=begin
I suspect that MatchData#[:symbol] should return an Array of values when the same named group has been matched multiple times.

m = 'food'.match(/(?<f>oo)(?<f>d)/)
=> #<MatchData "ood" f:"oo" f:"d">
m[:f]
=> "d"
m.to_a
=> ["ood", "oo", "d"]
=end


Files

re_named_values_at.patch.gz (1015 Bytes), erikh (Erik Hollensbe), 08/16/2009 11:12 PM
re_all_values.patch.gz (1.39 KB), erikh (Erik Hollensbe), 04/11/2012 06:34 PM

Updated by naruse (Yui NARUSE) over 16 years ago Actions #1

=begin
This request is in other words, enable capture history.

A-5. Disabled functions by default syntax
+ capture history
(?@...) and (?@<name>...)
ex. /(?@a)*/.match("aaa") ==> [<0-1>, <1-2>, <2-3>]
see sample/listcap.c file.
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
=end

Updated by runpaint (Run Paint Run Run) over 16 years ago Actions #2

=begin

> This request is in other words, enable capture history.

I'm not sure I understand. The example shows that the history is already captured. Both the #inspect output and #to_a show that the data is already stored inside the MatchData object; it's just not accessible with the #[Symbol] accessor.

IOW:

'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/)
=> #<MatchData "abc" a:"a" a:"b" a:"c">

Group 'a' is shown as having the values 'a', 'b', and 'c', but #[:a] only returns 'c':

'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/)[:a]
=> "c"
'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/).captures
=> ["a", "b", "c"]

This doesn't seem like a disabled function to me; just an assumption that each group name will only appear once.
=end

Updated by naruse (Yui NARUSE) over 16 years ago Actions #3

=begin
Yes, of course, your desired API needs more implementation after the option is enabled.

What I meant by the previous comment is that Ruby 1.9's regexp is based on Oniguruma. So if a function is already implemented in Oniguruma, it is easier to realize than functions which are not implemented in Oniguruma.
=end

Updated by runpaint (Run Paint Run Run) over 16 years ago Actions #4

=begin
Yui,

Thank you for your help. I didn't realise it would be difficult. I'd assumed that as:

/(?<a>a)(?<a>b)(?<a>c)/.named_captures
=> {"a"=>[1, 2, 3]}

And:

'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/)[1..3]
=> ["a", "b", "c"]

It would simply be a matter of mapping one to the other.

How about then if we set this ticket's priority to 'Low', then add a note to the documentation of MatchData#[] that explains this quirk? :-)

(As an aside, that Oniguruma document would make a great start for the RDoc of Regexp. Currently the actual syntax of the patterns doesn't seem to appear anywhere in ri.)
=end
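The mapping described in the comment above (from the indices in Regexp#named_captures to the values stored in the MatchData) can be sketched roughly as follows. The helper name all_named_values is hypothetical and is not part of Ruby or of any patch attached to this ticket; it only illustrates the idea.

```ruby
# Hypothetical helper (not part of Ruby): collect every capture that
# belongs to a given group name, in group-number order.
def all_named_values(match, name)
  # Regexp#named_captures maps each group name to the list of group
  # numbers that share it, e.g. {"a"=>[1, 2, 3]}.
  indices = match.regexp.named_captures.fetch(name.to_s, [])
  indices.map { |i| match[i] }
end

m = 'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/)
p m[:a]                     # => "c" (only the last group named "a")
p all_named_values(m, :a)   # => ["a", "b", "c"]
```

An unknown name yields an empty Array here; that is a choice made for this sketch, not something the ticket specifies.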

Updated by naruse (Yui NARUSE) over 16 years ago Actions #5

=begin
Oh sorry, I misunderstood. I thought you wanted to access "b" with /(?\w)+/.match("abc").

Your proposal may be implementable because the data is already at hand. (You showed that with the inspect output, sorry!)

Anyway, changing the return value from always String to String or Array is difficult because of compatibility.
If you want to access the other matched strings, I suggest adding a new API to access them.
So the question is what the desirable API would be.
=end
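The two cases being distinguished in this exchange are different. A single group repeated by a quantifier keeps only its last iteration, which is what capture history would expose. Several groups sharing one name each keep their own value. A small illustration of the first case, using a group name (ch) chosen only for this example:

```ruby
# One named group repeated by a quantifier: only the final iteration
# is retained in the MatchData.
m = /(?<ch>\w)+/.match("abc")
p m[:ch]       # => "c"
p m.captures   # => ["c"]

# Without capture history, the intermediate iterations have to be
# recovered some other way, for example with String#scan.
p "abc".scan(/\w/)   # => ["a", "b", "c"]
```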

Updated by runpaint (Run Paint Run Run) over 16 years ago Actions #6

=begin

> Oh sorry, I misunderstood. I thought you wanted to access "b" with /(?\w)+/.match("abc").

That's quite alright. :-)

> Anyway, changing the return value from always String to String or Array is difficult because of compatibility.

Hmmm... That's unfortunate; MatchData#[] is a lovely API. I guess one approach is to return an Array when there are multiple matches and a String otherwise. Anybody who currently relies on only the last match being returned is both in the minority and taking advantage of a bug. But I confess not to being overly fond of this solution. :-/ I'm not sure.
=end

Updated by naruse (Yui NARUSE) over 16 years ago Actions #7

=begin

> one approach is to return an Array when there are multiple matches; a String otherwise

There are few APIs which return different types, except true/false or obj/nil.
This is because such an API disturbs duck typing.

So a new API which always returns an Array seems the way to go.
=end

Updated by runpaint (Run Paint Run Run) over 16 years ago Actions #8

=begin
I think adding another method of this form to MatchData will be confusing. How about overloading MatchData#values_at ? It currently takes one or more integer indices and returns an Array of corresponding values. It could be modified to take a list of Symbols (and Strings, if it must) and return an Array of the matches. This is backward compatible, requires no new methods, and uses the same principle as MatchData#[], which previously only accepted Integer arguments, and now accepts Symbols/Strings, too.

>> 'haystack'.match(/(?<h>ay).+(?<h>ack)/).values_at(:h)
['ay','ack']
>> 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/).values_at(:seek)
['ck']
>> 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/).values_at(:seek, 'h')
['ck','ay','ack']

=end
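The semantics proposed in the comment above can be approximated without touching MatchData#values_at itself. The following is only a sketch under that assumption: named_values_at is a hypothetical standalone helper, not the attached patch, and how it treats Integer keys is a guess at the intended behaviour.

```ruby
# Hypothetical helper (not the attached patch): Integer keys behave like
# plain indices, while Symbol/String keys expand to every capture that
# shares that group name.
def named_values_at(match, *keys)
  table = match.regexp.named_captures
  keys.flat_map do |key|
    case key
    when Integer then [match[key]]
    else table.fetch(key.to_s, []).map { |i| match[i] }
    end
  end
end

m = 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/)
p named_values_at(m, :h)          # => ["ay", "ack"]
p named_values_at(m, :seek)       # => ["ck"]
p named_values_at(m, :seek, 'h')  # => ["ck", "ay", "ack"]
```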

Updated by erikh (Erik Hollensbe) over 16 years ago Actions #9

=begin
Attached is a patch which implements the overloaded #values_at functionality. If there is a problem, let me know and I'll alter it. Test cases and docs included.

For now, any argument (be it a named capture, index, or unexpected type) that doesn't yield a value is effectively a no-op: it is ignored and no result appears in the array. This is the behavior I noticed in other areas of the MatchData class and figured it was safest to honor that.
=end

Updated by runpaint (Run Paint Run Run) over 16 years ago Actions #10

=begin
Thanks, Erik. I tried the patch out and it works well. :-)
=end

Updated by mame (Yusuke Endoh) almost 16 years ago Actions #11

=begin
Hi,

> How about overloading MatchData#values_at ?

Why do you attempt to reuse an existing method? :-/
I think it is better to introduce a new method like MatchData#all_values.

> 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/).values_at(:seek, 'h')
> ['ck','ay','ack']

I expect values_at to return an array whose length is equal to the number
of arguments.

By the way, I think it is strange (or even a bug) for MatchData#values_at
to reject Symbols.

--
Yusuke ENDOH
=end
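The length expectation raised above can be illustrated as follows. With Integer indices, values_at returns one element per argument. A name-based variant could keep that property by returning one inner Array per name instead of a flattened list. The helper grouped_values is hypothetical and is not the behaviour of either attached patch.

```ruby
m = 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/)

# Existing behaviour with Integer indices: one result per argument.
p m.values_at(1, 3)   # => ["ay", "ck"]

# Hypothetical alternative: one inner Array per requested name, so the
# outer length always equals the number of arguments.
def grouped_values(match, *names)
  table = match.regexp.named_captures
  names.map { |name| table.fetch(name.to_s, []).map { |i| match[i] } }
end

p grouped_values(m, :seek, 'h')   # => [["ck"], ["ay", "ack"]]
```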

Updated by znz (Kazuhiro NISHIYAMA) almost 16 years ago Actions #12

  • Target version set to 2.0.0

Updated by nahi (Hiroshi Nakamura) almost 14 years ago Actions #13 [ruby-core:43386]

  • Description updated (diff)
  • Assignee set to naruse (Yui NARUSE)

Updated by naruse (Yui NARUSE) almost 14 years ago Actions #14 [ruby-core:43391]

  • Status changed from Open to Feedback

This feature itself is acceptable, but the proposed method name (API) is not acceptable.

Updated by trans (Thomas Sawyer) almost 14 years ago Actions #15 [ruby-core:43446]

This is the first time I've seen regular expression groups, so it's interesting.

It occurs to me that with this addition MatchData is both a sort of Array and a sort of Hash. That being so, consider md.to_h.
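One way to read the md.to_h suggestion is as a Hash from each group name to all of its captures. The following is a rough sketch under that reading; captures_by_name is a hypothetical helper, and the ticket does not define what md.to_h would actually return.

```ruby
# Hypothetical helper: build a Hash from each group name to the list of
# all captures recorded under that name.
def captures_by_name(match)
  match.regexp.named_captures.transform_values do |indices|
    indices.map { |i| match[i] }
  end
end

md = 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/)
p captures_by_name(md)
```

Here the "h" key maps to both "ay" and "ack", while "seek" maps to a one-element Array, so every value has the same type.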

Updated by erikh (Erik Hollensbe) over 13 years ago Actions #16 [ruby-core:44286]

I've attached a new patch -- this implements the same functionality but refers to it as all_values and reverts the old changes to values_at. This is fundamentally the same functionality as values_at with the overridden functionality described in the ticket.

Sorry for the latency on this, it's been a crazy few years. :)

Tests pass, including the new ones.

Updated by Anonymous over 13 years ago Actions