Age | Commit message | Author |
|
`builder.pair_label` can't be used here because it relies on the local variables that the parser gem has encountered.
Since the prism translator doesn't track that information accurately, it interprets the implicit value in the following code
as a local variable, even though `bar` is not in scope:
```rb
def foo
  bar = 123
end
{ bar: }
```
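The Ruby semantics that the translator has to reproduce can be shown in plain Ruby (Ruby 3.1+ for the shorthand hash syntax). The top-level method `bar` below is a hypothetical addition that gives the implicit value something to resolve to. Because the local `bar` exists only inside `foo`, the shorthand `{ bar: }` is a method call, not a local variable read.

```ruby
# Hypothetical method so that the implicit hash value has something to call.
def bar = :method_result

def foo
  bar = 123 # this local is only in scope inside foo
  bar
end

# No local named `bar` is in scope here, so `{ bar: }` means `{ bar: bar() }`.
pair = { bar: }
```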
https://github.com/ruby/prism/commit/bbeb5b083a
|
|
It turns out this was already almost correct. Apart from `\c`- and `\M`-style escapes, only a single character can be escaped in a regex, so most tests already passed.
There was also a mistake where the wrong value was constructed for the AST; this is now fixed.
One test fails because of this, but I'm fairly sure that is due to a parser bug. For `/\“/`, the backslash is supposed to be removed because `“` is a multibyte character. To be honest, though,
I don't entirely understand all the rules.
This fixes more than half of the remaining AST differences for the RuboCop tests.
|
|
When parent scopes around an eval are forwarding parameters (like
*, **, &, or ...), we need to know that information when we are in
the parser. As such, we need to support passing that information
into the scopes option. Unfortunately, doing this requires a number
of changes.
The scopes option was previously an array of arrays of strings.
These corresponded to the names of the locals in the parent scopes.
We still support this, but now additionally support passing in a
Prism::Scope instance at each index in the array. This Prism::Scope
class holds both the names of the locals and an array of
forwarding parameter names (symbols corresponding to the forwarding
parameters). There is a convenience function, Prism.scope, on the
Prism module that creates a Prism::Scope object.
In JavaScript, we now additionally support an object much the same
as the Ruby side. In Java, we now have a ParsingOptions.Scope class
that holds that information. In the dump APIs, these objects in all
three languages add an additional byte for the forwarding flags in
the middle of the scopes serialization.
All of this is in service of properly parsing the following code:
```ruby
def foo(*) = eval("bar(*)")
```
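The following is a minimal pure-Ruby model of the data described above, and it assumes nothing about prism's internals. `ScopeSketch`, `FORWARDING_BITS` and `normalize_scope` are hypothetical names. The bit values are illustrative only and are not prism's actual serialization format. The model shows how the old array-of-strings entries can coexist with scope objects, and how the forwarding parameter names can collapse into a single flags byte.

```ruby
# Hypothetical bit assignments for the forwarding parameters; prism's real
# values may differ.
FORWARDING_BITS = { :* => 0x1, :** => 0x2, :& => 0x4, :"..." => 0x8 }.freeze

# Stand-in for Prism::Scope: local names plus forwarding parameter names.
ScopeSketch = Struct.new(:locals, :forwarding) do
  # Fold the forwarding names into one flags value (fits in a byte).
  def forwarding_flags
    forwarding.reduce(0) { |flags, name| flags | FORWARDING_BITS.fetch(name) }
  end
end

# Old-style entries (a plain array of local names) are still accepted and
# treated as scopes without any forwarding parameters.
def normalize_scope(entry)
  entry.is_a?(ScopeSketch) ? entry : ScopeSketch.new(entry, [])
end

# Parent scopes for `def foo(*) = eval("bar(*)")`: the method forwards `*`.
scopes = [["a"], ScopeSketch.new([], [:*])].map { |entry| normalize_scope(entry) }
flags_bytes = scopes.map(&:forwarding_flags).pack("C*")
```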
https://github.com/ruby/prism/commit/21abb6b7c4
|
|
https://github.com/ruby/prism/commit/7a93a307ac
|
|
https://github.com/ruby/prism/commit/8ab2532f09
|
|
This makes it possible to pass `freeze: true` to Prism parse
methods and get back a deeply frozen AST that is
Ractor-shareable.
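The benefit of a deeply frozen result can be sketched without prism. The `Node` struct and `deep_freeze` helper below are hypothetical stand-ins for AST nodes. `Ractor.shareable?` (Ruby 3.0+) reports whether an object graph can be passed between Ractors without copying. Freezing only the root is not enough, because every reachable object has to be frozen. Ruby's own `Ractor.make_shareable` also performs this kind of deep freeze.

```ruby
# Hypothetical stand-in for an AST node.
Node = Struct.new(:type, :children, :location)

# Recursively freeze the children array, every child node and every string,
# then the object itself.
def deep_freeze(object)
  case object
  when Node then object.each { |value| deep_freeze(value) }
  when Array then object.each { |element| deep_freeze(element) }
  end
  object.freeze
end

tree = Node.new(:program, [Node.new(:integer, [], "1")], "0...1")

# A shallow copy with only the root frozen still points at mutable children.
shallow_shareable = Ractor.shareable?(tree.dup.freeze)

deep_freeze(tree)
```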
https://github.com/ruby/prism/commit/8e6a93b2d2
|
|
It was incorrectly treated as a single backtick command.
https://github.com/ruby/prism/commit/dd762be590
|
|
```rb
<<~F
foo
#{}
bar
F
```
has zero common whitespace.
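With the `foo` and `bar` lines indented, as in the illustrative variant below, the effect shows up at runtime. The interpolation `#{}` starts at column 0, so the common leading whitespace is zero and `<<~` strips nothing from the other lines.

```ruby
# The interpolation line starts at column 0, so it fixes the common
# indentation at zero even though it expands to an empty string.
text = <<~F
  foo
#{}
  bar
F
```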
https://github.com/ruby/prism/commit/1f3c222a06
|
|
Tests worked around this, but the incompatibility is not hard to fix.
This fixes 17 token incompatibilities in tests here that were previously passing.
https://github.com/ruby/prism/commit/101962526d
|
|
https://github.com/ruby/prism/commit/8d9d429155
|
|
multiline bytes
A rather silly issue with a rather simple fix.
The ranges already use the offset cache, so this effectively double-encoded them.
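This failure mode can be reproduced in isolation. `char_offset` below is a hypothetical helper that converts a byte offset into a character offset. Once a multibyte character precedes the position, applying it to a value that is already a character offset yields a different, wrong position.

```ruby
# Hypothetical helper: number of characters in the first `byte_offset` bytes.
def char_offset(source, byte_offset)
  source.byteslice(0, byte_offset).length
end

source = "é\né\nx"            # "é" is 2 bytes in UTF-8
x_byte = source.bytesize - 1  # byte offset of the final "x"

once = char_offset(source, x_byte) # correct character offset of "x"
twice = char_offset(source, once)  # converting again treats it as bytes
```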
https://github.com/ruby/prism/commit/66b65634c0
|
|
In https://github.com/ruby/prism/pull/3393 I made a mistake.
When there is no previous token, the index wraps around to -1. Oops.
Additionally, if a comment has no newline, then the offset should be kept as is.
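If the lookup is a Ruby array index, a negative index does not raise. It counts from the end of the array, so a missing previous token silently becomes the last token. The following is a minimal sketch with a hypothetical `previous_token` helper, not the translator's code:

```ruby
tokens = [:tIDENTIFIER, :tEQL, :tINTEGER]

# Naive lookup: for index 0 this reads tokens[-1], which Ruby resolves to the
# LAST element instead of failing.
naive_previous = tokens[0 - 1]

# Hypothetical guarded lookup: there is no previous token at index 0.
def previous_token(tokens, index)
  index.zero? ? nil : tokens[index - 1]
end
```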
https://github.com/ruby/prism/commit/3c266f1de4
|
|
Skipping encoding detection is almost always right; only for binary should it actually happen.
A symbol containing escapes that are invalid
in UTF-8 would fail to parse, since symbols must be valid in the script encoding.
Additionally, the parser gem would raise an exception somewhere during string handling.
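The encoding constraint shows up with plain Ruby strings. The same bytes are valid as binary but not as UTF-8, and `String#to_sym` refuses to intern an invalid UTF-8 string. This example only illustrates the constraint. It is not how prism detects the encoding.

```ruby
# "\xFF\xFE" escapes produce bytes that are not valid UTF-8.
raw = "\xFF\xFE"

binary = raw.b                                  # ASCII-8BIT accepts any byte
utf8 = raw.dup.force_encoding(Encoding::UTF_8)

binary_symbol = binary.to_sym                   # fine in binary

utf8_error =
  begin
    utf8.to_sym
    nil
  rescue EncodingError => e
    e
  end
```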
https://github.com/ruby/prism/commit/fa0154d9e4
|
|
There appear to be a number of rules that change the behaviour for
inline comments, multiple consecutive comments, etc.
This seems to line up with reality pretty closely: token differences for the RuboCop tests go from 1129 to 619, which seems pretty impressive.
https://github.com/ruby/prism/commit/2e1b92670c
|
|
`not foo` should be `!foo`
`not()` should be `!nil`
Fixes [Bug #21027]
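In Ruby, `not expr` is sent as the `!` method to `expr`. The sketch below makes this visible with a hypothetical `Flag` class that overrides `!`. Per the note above, `not()` with nothing inside is treated as `!nil`.

```ruby
# Hypothetical class with a custom `!` so we can see which method runs.
class Flag
  def !
    :bang_called
  end
end

via_not = (not Flag.new) # dispatches to Flag#!
via_bang = !Flag.new     # the equivalent explicit form
```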
https://github.com/ruby/prism/commit/871ed4b462
|
|
strings and word arrays
These are not line continuations. They should either be taken literally
or allow the word array to contain the following whitespace (a newline in this case).
Before:
```
0...1: tSTRING_BEG => "'"
1...12: tSTRING_CONTENT => "foobar\\\n"
12...16: tSTRING_CONTENT => "baz\n"
16...17: tSTRING_END => "'"
17...18: tNL => nil
```
After:
```
0...1: tSTRING_BEG => "'"
1...6: tSTRING_CONTENT => "foo\\\n"
6...12: tSTRING_CONTENT => "bar\\\n"
12...16: tSTRING_CONTENT => "baz\n"
16...17: tSTRING_END => "'"
17...18: tNL => nil
```
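The three behaviours can be compared with plain Ruby literals. A single-quoted string keeps the backslash and the newline. A `%w` word array uses the backslash to escape the newline, which then becomes part of the word. A double-quoted string treats backslash-newline as a line continuation.

```ruby
# Single-quoted string: backslash-newline is kept literally (both characters).
single = 'foo\
bar'

# %w word array: the backslash escapes the newline, which stays in the word.
words = %w[foo\
bar]

# Double-quoted string, for contrast: backslash-newline is a line continuation.
double = "foo\
bar"
```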
https://github.com/ruby/prism/commit/b6554ad64e
|
|
This leaves `\c` and `\M` escapes unhandled, but I don't yet understand how these should even work. Maybe later.
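For reference, these escapes operate on a single byte in Ruby string literals. `\c` and `\C-` produce a control character, and `\M-` sets the high bit. The example covers string literals only, and regexp handling may differ.

```ruby
control = "\cA"          # \c keeps the low 5 bits: "A" (0x41) becomes 0x01
control_long = "\C-a"    # \C-x is the long form of \cx
meta = "\M-a"            # \M- sets the high bit: "a" (0x61) becomes 0xE1
meta_control = "\M-\C-a" # combined: 0x01 | 0x80
delete = "\c?"           # special case: \c? is DEL (0x7F)
```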
https://github.com/ruby/prism/commit/13db3e8cb9
|
|
translator
This is a follow-up to #3373, where the implementation
was extracted.
https://github.com/ruby/prism/commit/2637007929
|
|
translator
Much of this logic should be shared between interpolated symbols and regexps.
It's also incorrect when the node contains a literal `\\n` (as is currently the case for plain string nodes).
https://github.com/ruby/prism/commit/561914f99b
|
|
In that specific case, no string node is emitted
https://github.com/ruby/prism/commit/1166db13dd
|
|
It turns out that most of the work was already done when handling the same case for heredocs.
I'm confident this should also apply to actual string nodes (there's even a TODO for it), but
no tests change if I apply it there too, so I can't say for sure whether the logic would be correct.
The individual test files are rather large, so something that currently passes might break.
I'm leaving a closer look at that for later.
https://github.com/ruby/prism/commit/6bba1c54e1
|
|
blocks/lambda
Blocks and lambdas inherit anonymous arguments from the method they are part of.
They cannot introduce new anonymous arguments themselves.
While you can write this:
```rb
def foo(*)
  bar { |**| }
end
```
Referencing the new parameter inside the block is always a syntax error.
https://github.com/ruby/prism/commit/2cbd27e134
|
|
The offset cache contains an entry for each byte, so it can't be indexed by the string's character length.
Adds tests for all variants except for this:
```
"fo
o" "ba
’"
```
For some reason, this still has the wrong offset.
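The length matters because a cache with one entry per byte has `bytesize + 1` slots. Indexing it with the character count lands too early whenever the text contains multibyte characters such as `’` (3 bytes in UTF-8). `build_offset_cache` is a hypothetical helper, not prism's implementation.

```ruby
# Map every byte offset of `source` to the index of the character containing
# it; the extra final slot maps the end of the string.
def build_offset_cache(source)
  cache = Array.new(source.bytesize + 1)
  byte = 0
  source.each_char.with_index do |char, char_index|
    char.bytesize.times { |k| cache[byte + k] = char_index }
    byte += char.bytesize
  end
  cache[source.bytesize] = source.length
  cache
end

text = "fo’"
cache = build_offset_cache(text)
end_by_bytes = cache[text.bytesize] # correct: the end of the string
end_by_length = cache[text.length]  # wrong: points into the middle of "’"
```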
https://github.com/ruby/prism/commit/a651126458
|
|
https://github.com/ruby/prism/commit/a679ee0e5c
|
|
https://github.com/ruby/prism/commit/6b6aa05bfb
|
|
Heredocs that contain "\\n" don't start a new string node.
https://github.com/ruby/prism/commit/61d9d3a15e
|
|
Follow-up to https://github.com/ruby/prism/pull/3336.
Development for Ruby 3.5 has begun on the master branch:
https://github.com/ruby/ruby/commit/2f064b3b4b71f9495bbc4229e7efdbfad494862f
https://github.com/ruby/prism/commit/aa49c1bd78
|
|
https://github.com/ruby/prism/commit/b283a72c88
Notes:
Merged: https://github.com/ruby/ruby/pull/12358
|
|
https://github.com/ruby/prism/commit/34efacc618
Notes:
Merged: https://github.com/ruby/ruby/pull/12358
|
|
https://github.com/ruby/prism/commit/9686897290
Notes:
Merged: https://github.com/ruby/ruby/pull/12358
|
|
https://github.com/ruby/prism/commit/817a8e39d9
Notes:
Merged: https://github.com/ruby/ruby/pull/12358
|
|
https://github.com/ruby/prism/commit/f80026883d
Notes:
Merged: https://github.com/ruby/ruby/pull/12358
|
|
https://github.com/ruby/prism/commit/230c8b8a48
|
|
default gem
* This is notably necessary on TruffleRuby, which is updating to Ruby 3.3, the version that introduces Prism as a default gem.
* Using the existing path is not an option, as the files would end up in truffleruby/lib/build/libprism.so and
"truffleruby/lib/include/#{header}", which are not good places for such files.
https://github.com/ruby/prism/commit/5d16473e69
|
|
https://github.com/ruby/prism/commit/5ea6042408
|
|
Introduce StringQuery to provide methods to access some metadata
about the Ruby lexer.
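The following is a rough, hypothetical plain-Ruby approximation of the kind of question such metadata answers. The method names are made up. The source describes StringQuery as exposing the lexer's own knowledge, whereas this approximation handles only ASCII identifiers.

```ruby
# Hypothetical approximation: ASCII identifiers only, plus Ruby's reserved words.
KEYWORDS = %w[
  __ENCODING__ __LINE__ __FILE__ BEGIN END alias and begin break case class
  def do else elsif end ensure false for if in module next nil not or redo
  rescue retry return self super then true undef unless until when while yield
].freeze

def local_name?(string)
  string.match?(/\A[a-z_][A-Za-z0-9_]*\z/) && !KEYWORDS.include?(string)
end

def constant_name?(string)
  string.match?(/\A[A-Z][A-Za-z0-9_]*\z/) && !KEYWORDS.include?(string)
end
```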
https://github.com/ruby/prism/commit/d3f55b67b9
|
|
Calculating code unit offsets for a source can be very expensive,
especially when the source is large. This commit introduces a new
class that wraps the source and desired encoding into a cache that
reuses pre-computed offsets. It performs quite a bit better.
There are still some problems with this approach, namely character
boundaries and the fact that the cache is unbounded, but both of
these may be addressed in subsequent commits.
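The following minimal sketch assumes a UTF-8 source and uses hypothetical names (`CodeUnitsCacheSketch` is not prism's class). Each lookup is memoized per byte offset, so repeated queries for the same location are cheap. A miss still re-encodes the whole prefix, which a real implementation would likely avoid by working incrementally. The cache is also unbounded and assumes character boundaries, matching the open problems noted above.

```ruby
# Hypothetical sketch: memoize "how many code units precede this byte offset"
# for one UTF-8 source string and one target encoding.
class CodeUnitsCacheSketch
  UNIT_SIZES = {
    Encoding::UTF_16LE => 2, Encoding::UTF_16BE => 2,
    Encoding::UTF_32LE => 4, Encoding::UTF_32BE => 4
  }.freeze

  def initialize(source, encoding)
    @source = source
    @encoding = encoding
    @unit_size = UNIT_SIZES.fetch(encoding, 1)
    @offsets = {} # unbounded memo table
  end

  # `byte_offset` must fall on a character boundary; otherwise the prefix is
  # not valid UTF-8 and encoding it to UTF-16/UTF-32 raises.
  def [](byte_offset)
    @offsets[byte_offset] ||=
      @source.byteslice(0, byte_offset).encode(@encoding).bytesize / @unit_size
  end
end

source = "a😀b" # the emoji is 4 bytes in UTF-8 and 2 UTF-16 code units
utf16 = CodeUnitsCacheSketch.new(source, Encoding::UTF_16LE)
```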
https://github.com/ruby/prism/commit/2e3e1a4d4d
|
|
https://github.com/ruby/prism/commit/343197e4ff
|
|
https://github.com/ruby/prism/commit/25a4cf6794
Co-authored-by: Kevin Newton <[email protected]>
|
|
Followup to https://github.com/ruby/prism/pull/3079
https://github.com/ruby/prism/commit/68f434e356
|
|
continuation
https://github.com/ruby/prism/commit/84a9251915
|
|
https://github.com/ruby/prism/commit/098f1c4607
|
|
https://github.com/ruby/prism/commit/a4fcd5339a
|
|
`Prism::Translation::Parser::Lexer`
## Summary
This PR fixes a `kDO_LAMBDA` token incompatibility between the Parser gem and `Prism::Translation::Parser` for a lambda `do` block.
### Parser gem (Expected)
Returns `kDO_LAMBDA` token:
```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```
### `Prism::Translation::Parser` (Actual)
Previously, the parser returned a `kDO` token when parsing the following:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```
After the update, the parser returns a `kDO_LAMBDA` token for the same input:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```
## Additional Information
Unfortunately, this kind of edge case doesn't work as expected; `kDO` is returned instead of `kDO_LAMBDA`.
However, since `kDO` is already being returned in this case, there is no change in behavior.
### Parser gem
Returns a `kDO_LAMBDA` token:
```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> (foo = -> (bar) {}) do end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.3.5 (2024-09-03 revision https://github.com/ruby/prism/commit/ef084cc8f4) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 4...7>]], [:tEQL, ["=", #<Parser::Source::Range example.rb 8...9>]],
[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 10...12>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 13...14>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 14...17>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 17...18>]],
[:tLAMBEG, ["{", #<Parser::Source::Range example.rb 19...20>]], [:tRCURLY, ["}", #<Parser::Source::Range example.rb 20...21>]],
[:tRPAREN, [")", #<Parser::Source::Range example.rb 21...22>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 23...25>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 26...29>]]]
```
### `Prism::Translation::Parser`
Returns a `kDO` token:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> (foo = -> (bar) {}) do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.3.5 (2024-09-03 revision https://github.com/ruby/prism/commit/ef084cc8f4) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 4...7>]], [:tEQL, ["=", #<Parser::Source::Range example.rb 8...9>]],
[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 10...12>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 13...14>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 14...17>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 17...18>]],
[:tLAMBEG, ["{", #<Parser::Source::Range example.rb 19...20>]], [:tRCURLY, ["}", #<Parser::Source::Range example.rb 20...21>]],
[:tRPAREN, [")", #<Parser::Source::Range example.rb 21...22>]], [:kDO, ["do", #<Parser::Source::Range example.rb 23...25>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 26...29>]]]
```
As the intention is not to address such special cases at this point, a comment has been left indicating that this case still returns `kDO`.
In other words, after this PR `kDO_LAMBDA` is returned in all cases except such edge cases.
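One way to handle the nested case is to record, for each lambda, the parenthesis depth at which its body is expected. The sketch below is hypothetical. It works on a simplified `[type, value]` token list in which every `do` arrives as `:kDO`, and it is not how `Prism::Translation::Parser::Lexer` is implemented.

```ruby
# Returns a copy of `tokens` with lambda `do`s retagged as :kDO_LAMBDA.
def retag_lambda_do(tokens)
  pending = [] # paren depths at which a lambda is still waiting for its body
  depth = 0
  tokens.map do |type, value|
    case type
    when :tLAMBDA then pending.push(depth)
    when :tLPAREN, :tLPAREN2 then depth += 1
    when :tRPAREN then depth -= 1
    when :tLAMBEG then pending.pop # brace body: this lambda is complete
    when :kDO
      if pending.last == depth
        pending.pop
        next [:kDO_LAMBDA, value]
      end
    end
    [type, value]
  end
end

nested = retag_lambda_do([
  [:tLAMBDA, "->"], [:tLPAREN2, "("], [:tIDENTIFIER, "foo"], [:tEQL, "="],
  [:tLAMBDA, "->"], [:tLPAREN2, "("], [:tIDENTIFIER, "bar"], [:tRPAREN, ")"],
  [:tLAMBEG, "{"], [:tRCURLY, "}"], [:tRPAREN, ")"], [:kDO, "do"], [:kEND, "end"]
])
```

Keeping a stack instead of a single flag lets the inner `-> (bar) {}` complete without disturbing the outer lambda.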
https://github.com/ruby/prism/commit/2ee480654c
|
|
https://github.com/ruby/prism/commit/b28877fa4f
|
|
Fixes [Bug #20744]
https://github.com/ruby/prism/commit/f1b8b1b2a2
|
|
https://github.com/ruby/prism/commit/0b527ca93f
|
|
https://github.com/ruby/prism/commit/d68ea29d04
Notes:
Merged: https://github.com/ruby/ruby/pull/11497
|
|
This PR fixes a token incompatibility between the Parser gem and `Prism::Translation::Parser` for the double splat argument.
## Parser gem (Expected)
Returns a `tDSTAR` token:
```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "def f(**foo) end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:kDEF, ["def", #<Parser::Source::Range example.rb 0...3>]], [:tIDENTIFIER, ["f", #<Parser::Source::Range example.rb 4...5>]],
[:tLPAREN2, ["(", #<Parser::Source::Range example.rb 5...6>]], [:tDSTAR, ["**", #<Parser::Source::Range example.rb 6...8>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 8...11>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 11...12>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 13...16>]]]
```
## `Prism::Translation::Parser` (Actual)
Previously, the parser returned a `tPOW` token when parsing the following:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "def f(**foo) end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:kDEF, ["def", #<Parser::Source::Range example.rb 0...3>]], [:tIDENTIFIER, ["f", #<Parser::Source::Range example.rb 4...5>]],
[:tLPAREN2, ["(", #<Parser::Source::Range example.rb 5...6>]], [:tPOW, ["**", #<Parser::Source::Range example.rb 6...8>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 8...11>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 11...12>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 13...16>]]]
```
After the update, the parser returns a `tDSTAR` token for the same input:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "def f(**foo) end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:kDEF, ["def", #<Parser::Source::Range example.rb 0...3>]], [:tIDENTIFIER, ["f", #<Parser::Source::Range example.rb 4...5>]],
[:tLPAREN2, ["(", #<Parser::Source::Range example.rb 5...6>]], [:tDSTAR, ["**", #<Parser::Source::Range example.rb 6...8>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 8...11>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 11...12>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 13...16>]]]
```
With this change, the following code could be removed from test/prism/ruby/parser_test.rb:
```diff
- when :tPOW
- actual_token[0] = expected_token[0] if expected_token[0] == :tDSTAR
```
`tPOW` is the token type for the binary operator in `a ** b`, and its behavior remains unchanged:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "a ** b"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tIDENTIFIER, ["a", #<Parser::Source::Range example.rb 0...1>]], [:tPOW, ["**", #<Parser::Source::Range example.rb 2...4>]],
[:tIDENTIFIER, ["b", #<Parser::Source::Range example.rb 5...6>]]]
```
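The distinction depends on context. The same `**` characters form a double splat in a parameter list and an operator between two operands. The sketch below deliberately simplifies that decision with a hypothetical rule based only on the previous token's type. The real lexers also take whitespace and lexer state into account, for example in `foo **opts`.

```ruby
# Hypothetical, simplified rule: `**` after something that can end an operand is
# the binary power operator; anywhere else it introduces a double splat.
OPERAND_END_TYPES = %i[tIDENTIFIER tCONSTANT tINTEGER tFLOAT tRPAREN tRBRACK].freeze

def retag_double_star(types)
  types.each_with_index.map do |type, index|
    next type unless type == :tPOW

    previous = index.zero? ? nil : types[index - 1]
    OPERAND_END_TYPES.include?(previous) ? :tPOW : :tDSTAR
  end
end

params = retag_double_star(%i[kDEF tIDENTIFIER tLPAREN2 tPOW tIDENTIFIER tRPAREN kEND])
power = retag_double_star(%i[tIDENTIFIER tPOW tIDENTIFIER])
```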
https://github.com/ruby/prism/commit/66bde35a44
|
|
This PR fixes a token incompatibility between the Parser gem and `Prism::Translation::Parser` for the left parenthesis.
## Parser gem (Expected)
Returns a `tLPAREN2` token:
```console
$ bundle exec ruby -Ilib -rparser/ruby33 \
-ve 'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "foo(:bar)"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 0...3>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tSYMBOL, ["bar", #<Parser::Source::Range example.rb 4...8>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 8...9>]]]
```
## `Prism::Translation::Parser` (Actual)
Previously, the parser returned a `tLPAREN` token when parsing the following:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "foo(:bar)"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 0...3>]], [:tLPAREN, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tSYMBOL, ["bar", #<Parser::Source::Range example.rb 4...8>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 8...9>]]]
```
After the update, the parser returns a `tLPAREN2` token for the same input:
```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "foo(:bar)"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master https://github.com/ruby/prism/commit/eb144ef91e) [x86_64-darwin23]
[[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 0...3>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tSYMBOL, ["bar", #<Parser::Source::Range example.rb 4...8>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 8...9>]]]
```
Prism's single `PARENTHESIS_LEFT` token corresponds to either `tLPAREN` or `tLPAREN2` in the Parser gem.
Tokens that were previously all classified as `tLPAREN` are now classified as `tLPAREN2` where appropriate.
With this change, the following code could be removed from `test/prism/ruby/parser_test.rb`:
```diff
- when :tLPAREN
- actual_token[0] = expected_token[0] if expected_token[0] == :tLPAREN2
```
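The sketch below covers just one case, a call like `foo(:bar)`. It uses a hypothetical rule: a `(` that directly follows an identifier, with no whitespace between them, opens call arguments. The Parser gem's real rules involve lexer state, and its output above also shows `tLPAREN2` after `->` and in `def f(`, so this is illustrative only.

```ruby
# Tokens are [type, start_offset, end_offset]. Hypothetical rule: a `(` that
# directly follows an identifier (no whitespace) opens call arguments.
def retag_left_paren(tokens)
  tokens.each_with_index.map do |(type, start_offset, end_offset), index|
    previous = index.zero? ? nil : tokens[index - 1]
    adjacent_call = previous && previous[0] == :tIDENTIFIER && previous[2] == start_offset
    if type == :tLPAREN && adjacent_call
      [:tLPAREN2, start_offset, end_offset]
    else
      [type, start_offset, end_offset]
    end
  end
end

call = retag_left_paren([[:tIDENTIFIER, 0, 3], [:tLPAREN, 3, 4], [:tSYMBOL, 4, 8], [:tRPAREN, 8, 9]])
grouping = retag_left_paren([[:tLPAREN, 0, 1], [:tINTEGER, 1, 2], [:tRPAREN, 2, 3]])
```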
https://github.com/ruby/prism/commit/04d6f3478d
|