Feature #1873
Closed
MatchData#[]: Omits All But Last Captures Corresponding to the Same Named Group
Description
=begin
I suspect that MatchData#[:symbol] should return an Array of values when the same named group has been matched multiple times.
m = 'food'.match(/(?<f>oo)(?<f>d)/)
=> #<MatchData "ood" f:"oo" f:"d">
m[:f]
=> "d"
m.to_a
=> ["ood", "oo", "d"]
=end
Updated by naruse (Yui NARUSE) over 16 years ago
=begin
This request is in other words, enable capture history.
A-5. Disabled functions by default syntax
+ capture history
(?@...) and (?@<name>...)
ex. /(?@a)*/.match("aaa") ==> [<0-1>, <1-2>, <2-3>]
see sample/listcap.c file.
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
=end
Updated by runpaint (Run Paint Run Run) over 16 years ago
=begin
> This request is in other words, enable capture history.
I'm not sure I understand. The example shows that the history is already captured. Both the #inspect output and #to_a show that the data is already stored inside the MatchData object; it's just not accessible with the #[Symbol] accessor.
IOW:
'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/)
=> #<MatchData "abc" a:"a" a:"b" a:"c">
Group 'a' is shown as having the values 'a', 'b', and 'c', but #[:a] only returns 'c':
'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/)[:a]
=> "c"
'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/).captures
=> ["a", "b", "c"]
This doesn't seem like a disabled function to me; just an assumption that each group name will only appear once.
=end
Updated by naruse (Yui NARUSE) over 16 years ago
=begin
Yes, of course, your desired API needs more implementation work after the option is enabled.
What I meant in my earlier comment is that Ruby 1.9's regexp engine is based on Oniguruma. So if a function is already implemented in Oniguruma, it is easier to realize than one that is not.
=end
Updated by runpaint (Run Paint Run Run) over 16 years ago
=begin
Yui,
Thank you for your help. I didn't realise it would be difficult. I'd assumed that as:
/(?<a>a)(?<a>b)(?<a>c)/.named_captures
=> {"a"=>[1, 2, 3]}
And:
'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/)[1..3]
=> ["a", "b", "c"]
it would simply be a matter of mapping one to the other.
How about then if we set this ticket's priority to 'Low', then add a note to the documentation of MatchData#[] that explains this quirk? :-)
(As an aside, that Oniguruma document would make a great start for the RDoc of Regexp. Currently the actual syntax of the patterns doesn't seem to appear anywhere in ri.)
=end
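The mapping described in the comment above (group name to indices via Regexp#named_captures, then indices to strings via MatchData#[]) can be sketched in plain Ruby. This is only an illustration: `captures_for` is a hypothetical helper name, not a core method and not part of any patch on this ticket.

```ruby
# Hypothetical helper (not a core method): return every capture that belongs
# to the group +name+, in group order, always as an Array.
def captures_for(match, name)
  # Regexp#named_captures maps each group name to the list of its indices,
  # e.g. {"a"=>[1, 2, 3]} for the pattern used below.
  indices = match.regexp.named_captures.fetch(name.to_s, [])
  indices.map { |i| match[i] }
end

m = 'abc'.match(/(?<a>a)(?<a>b)(?<a>c)/)
p captures_for(m, :a)  # => ["a", "b", "c"]
p m[:a]                # => "c"
```

An unknown name yields an empty Array here; that choice is an assumption of the sketch, not something the ticket specifies.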
Updated by naruse (Yui NARUSE) over 16 years ago
=begin
Oh sorry, I misunderstood. I thought you wanted to access "b" with /(?<a>\w)+/.match("abc").
Your proposal can probably be implemented, because that data is already at hand. (You showed as much with the inspect output, sorry!)
However, changing the return value from always a String to either a String or an Array is difficult for compatibility reasons.
If you want to access the other matched strings, I suggest adding a new API for that.
So the question is what the desirable API would be.
=end
Updated by runpaint (Run Paint Run Run) over 16 years ago
=begin
> Oh sorry, I misunderstood. I thought you wanted to access "b" with /(?<a>\w)+/.match("abc").
That's quite alright. :-)
> However, changing the return value from always a String to either a String or an Array is difficult for compatibility reasons.
Hmmm... That's unfortunate; MatchData#[] is a lovely API. I guess one approach is to return an Array when there are multiple matches; a String otherwise. Anybody who currently relies on only the last match being returned is both in the minority and taking advantage of a bug. But I confess not to being overly fond of this solution. :-/ I'm not sure.
=end
Updated by naruse (Yui NARUSE) over 16 years ago
=begin
> one approach is to return an Array when there are multiple matches; a String otherwise
There are few APIs which return different types, except for true/false or obj/nil.
This is because such APIs interfere with duck typing.
So a new API which always returns an Array seems the way to go.
=end
Updated by runpaint (Run Paint Run Run) over 16 years ago
=begin
I think adding another method of this form to MatchData will be confusing. How about overloading MatchData#values_at ? It currently takes one or more integer indices and returns an Array of corresponding values. It could be modified to take a list of Symbols (and Strings, if it must) and return an Array of the matches. This is backward compatible, requires no new methods, and uses the same principle as MatchData#[], which previously only accepted Integer arguments, and now accepts Symbols/Strings, too.
>> 'haystack'.match(/(?<h>ay).+(?<h>ack)/).values_at(:h)
['ay','ack']
>> 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/).values_at(:seek)
['ck']
>> 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/).values_at(:seek, 'h')
['ck','ay','ack']
=end
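A rough Ruby sketch of the behaviour proposed above, written as a free-standing method so that MatchData itself is left untouched. The name `values_at_with_names` is hypothetical, and the handling of unexpected argument types is an assumption of this sketch rather than something the proposal states.

```ruby
# Hypothetical stand-in for the proposed MatchData#values_at overload.
def values_at_with_names(match, *keys)
  names = match.regexp.named_captures
  keys.flat_map do |key|
    case key
    when Integer
      # Same as today: one value per integer index.
      [match[key]]
    when Symbol, String
      # Every capture recorded under that group name, in group order.
      names.fetch(key.to_s, []).map { |i| match[i] }
    else
      # Assumption: any other argument type contributes nothing.
      []
    end
  end
end

m = 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/)
p values_at_with_names(m, :h)          # => ["ay", "ack"]
p values_at_with_names(m, :seek)       # => ["ck"]
p values_at_with_names(m, :seek, 'h')  # => ["ck", "ay", "ack"]
```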
Updated by erikh (Erik Hollensbe) over 16 years ago
=begin
Attached is a patch which implements the overloaded #values_at functionality. If there is a problem, let me know and I'll alter it. Test cases and docs included.
For now, any argument (be it a named capture, an index, or an unexpected type) that doesn't yield a value is effectively a no-op: it is ignored and contributes nothing to the resulting array. This is the behavior I noticed in other areas of the MatchData class, and I figured it was safest to honor that.
=end
Updated by runpaint (Run Paint Run Run) over 16 years ago
=begin
Thanks, Erik. I tried the patch out and it works well. :-)
=end
Updated by mame (Yusuke Endoh) almost 16 years ago
=begin
Hi,
> How about overloading MatchData#values_at ?
Why do you attempt to reuse an existing method? :-/
I think it is better to introduce a new method like MatchData#all_values.
> 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/).values_at(:seek, 'h')
> ['ck','ay','ack']
I expect values_at to return an array whose length is equal to the number of arguments.
By the way, I think it is strange (or even a bug) for MatchData#values_at
to reject Symbols.
--
Yusuke ENDOH
=end
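The length objection raised above can be illustrated with a small sketch. With integer indices, MatchData#values_at returns exactly one element per argument; expanding a repeated group name, as in the proposed overload, breaks that correspondence. The expansion below is written inline and is only an approximation of the proposal, not the code of the attached patch.

```ruby
m = 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/)

# Integer indices: one result per argument.
by_index = m.values_at(1, 3)
p by_index         # => ["ay", "ck"]
p by_index.length  # => 2

# Name-based expansion in the style of the proposal: 'h' names two groups,
# so two arguments produce three values.
names = m.regexp.named_captures
by_name = %w[seek h].flat_map { |n| names.fetch(n, []).map { |i| m[i] } }
p by_name          # => ["ck", "ay", "ack"]
p by_name.length   # => 3
```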
Updated by znz (Kazuhiro NISHIYAMA) almost 16 years ago
- Target version set to 2.0.0
Updated by nahi (Hiroshi Nakamura) almost 14 years ago
- Description updated (diff)
- Assignee set to naruse (Yui NARUSE)
Updated by naruse (Yui NARUSE) almost 14 years ago
- Status changed from Open to Feedback
This feature itself is acceptable, but the proposed method name (API) is not acceptable.
Updated by trans (Thomas Sawyer) almost 14 years ago
This is the first time I've seen regular expression groups, so it's interesting.
It occurs to me that with this addition MatchData is both a sort of Array and a sort of Hash. That being so, consider md.to_h.
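One possible reading of the md.to_h idea, sketched as a free-standing helper: a Hash from each group name to an Array of all its captures. Both the name `captures_by_name` and the choice of Array values are assumptions made for illustration; the comment above does not say what the Hash should contain.

```ruby
# Hypothetical helper: build {"name" => [capture, ...]} for every named group.
def captures_by_name(match)
  match.regexp.named_captures.each_with_object({}) do |(name, indices), hash|
    hash[name] = indices.map { |i| match[i] }
  end
end

m = 'haystack'.match(/(?<h>ay).+(?<h>a(?<seek>ck))/)
result = captures_by_name(m)
p result == { "h" => ["ay", "ack"], "seek" => ["ck"] }  # => true
```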
Updated by erikh (Erik Hollensbe) over 13 years ago
- File re_all_values.patch.gz added
I've attached a new patch -- this implements the same functionality but refers to it as all_values and reverts the old changes to values_at. This is fundamentally the same functionality as values_at with the overridden functionality described in the ticket.
Sorry for the latency on this, it's been a crazy few years. :)
Tests pass, including the new ones.