Commit Graph

23 Commits

Author SHA1 Message Date
Haojian Wu 1828c75d5f [pseudo] Apply the function-declarator to member functions.
A followup patch of d489b3807f, but for
member functions, this will eliminate a false parse of member
declaration.

Differential Revision: https://reviews.llvm.org/D131720
2022-08-12 13:49:01 +02:00
Haojian Wu a1a1a78ac8 [pseudo] Eliminate an ambiguity for the empty member declaration.
We happened to introduce a `member-declaration := ;` rule
when inlining the `member-declaration := decl-specifier-seq_opt
member-declarator-list_opt ;`.
And with the `member-declaration := empty-declaration` rule, we had two parses of `;`.

This patch is to restrict the grammar to eliminate the
`member-declaration := ;` rule.

Differential Revision: https://reviews.llvm.org/D131724
2022-08-12 13:46:26 +02:00
Haojian Wu 6f6c40a875 [pseudo] Eliminate the false `::` nested-name-specifier ambiguity
The solution is to favor the longest possible nest-name-specifier, and
drop other alternatives by using the guard, per per C++ [basic.lookup.qual.general].

Motivated cases:

```
Foo::Foo() {};
// the constructor can be parsed as:
//  - Foo ::Foo(); // where the first Foo is return-type, and ::Foo is the function declarator
//  + Foo::Foo(); // where Foo::Foo is the function declarator
```

```
void test() {

// a very slow parsing case when there are many qualifers!
X::Y::Z;
// The statement can be parsed as:
//  - X ::Y::Z; // ::Y::Z is the declarator
//  - X::Y ::Z; // ::Z is the declarator
//  + X::Y::Z;  // a declaration without declarator (X::Y::Z is decl-specifier-seq)
//  + X::Y::Z;  // a qualifed-id expression
}
```

Differential Revision: https://reviews.llvm.org/D130511
2022-07-28 11:01:15 +02:00
Sam McCall b2b993a6ae [pseudo] Eliminate multiple-specified-types ambiguities using guards
Motivating case: `foo bar;` is not a declaration of nothing with `foo` and `bar`
both types.

This is a common and critical ambiguity, clangd/AST.cpp has 20% fewer
ambiguous nodes (1674->1332) after this change.

Differential Revision: https://reviews.llvm.org/D130337
2022-07-25 12:57:07 +02:00
Haojian Wu 2a88fb2ecb [pseudo] Eliminate the dangling-else syntax ambiguity.
- the grammar ambiguity is eliminated by a guard;
- modify the guard function signatures, now all parameters are folded in
  to a single object, avoid a long parameter list (as we will add more
  parameters in the near future);

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D130160
2022-07-22 09:13:09 +02:00
Sam McCall 3132e9cd7c [pseudo] Key guards by RuleID, add guards to literals (and 0).
After this, NUMERIC_CONSTANT and strings should parse only one way.

There are 8 types of literals, and 24 valid (literal, TokenKind) pairs.
This means adding 8 new named guards (or 24, if we want to assert the token).

It seems fairly clear to me at this point that the guard names are unneccesary
indirection: the guards are in fact coupled to the rule signature.

(Also add the zero guard I forgot in the previous patch.)

Differential Revision: https://reviews.llvm.org/D130066
2022-07-21 22:42:31 +02:00
Haojian Wu d489b3807f [pseudo] Implement a guard to determine function declarator.
This eliminates some simple-declaration/function-definition false
parses.

- implement a function to determine whether a declarator ForestNode is a
  function declarator;
- extend the standard declarator to two guarded function-declarator and
  non-function-declarator nonterminals;

Differential Revision: https://reviews.llvm.org/D129222
2022-07-19 09:44:45 +02:00
Haojian Wu b94ea8b3eb [pseudo] Add bracket recovery for function parameters. 2022-07-18 10:23:15 +02:00
Sam McCall 7d8e2742d9 [pseudo] Define recovery strategy as grammar extension.
Differential Revision: https://reviews.llvm.org/D129158
2022-07-06 15:03:38 +02:00
Sam McCall 3121167488 [pseudo] Add error-recovery framework & brace-based recovery
The idea is:

- a parse failure is detected when all heads die when trying to shift the next token
- we can recover by choosing a nonterminal we're partway through parsing, and
  determining where it ends through nonlocal means (e.g. matching brackets)
- we can find candidates by walking up the stack from the (ex-)heads
- the token range is defined using heuristics attached to grammar rules
- the unparsed region is represented in the forest by an Opaque node

This patch has the core GLR functionality.
It does not allow recovery heuristics to be attached as extensions to
the grammar, but rather infers a brace-based heuristic.

Expected followups:

- make recovery heuristics grammar extensions (depends on D127448)
- add recovery to our grammar for bracketed constructs and sequence nodes
- change the structure of our augmented `_ := start` rules to eliminate some
  special-cases in glrParse.
- (if I can work out how): avoid some spurious recovery cases described in comments

(Previously mistakenly committed as a0f4c10ae2)

Differential Revision: https://reviews.llvm.org/D128486
2022-07-05 20:49:41 +02:00
Haojian Wu 9ab67cc8bf [pseudo] Implement guard extension.
- Extend the GLR parser to allow conditional reduction based on the
  guard functions;
- Implement two simple guards (contextual-override/final) for cxx.bnf;
- layering: clangPseudoCXX depends on clangPseudo (as the guard function need
  to access the TokenStream);

Differential Revision: https://reviews.llvm.org/D127448
2022-07-05 15:55:15 +02:00
Haojian Wu 70c0d92930 [pseudo] Use the prebuilt cxx grammar for the lit tests, NFC.
Differential Revision: https://reviews.llvm.org/D129074
2022-07-05 15:17:18 +02:00
Sam McCall 743971faaf Revert "[pseudo] Add error-recovery framework & brace-based recovery"
This reverts commit a0f4c10ae2.
This commit hadn't been reviewed yet, and was unintentionally included
on another branch.
2022-06-28 21:11:09 +02:00
Sam McCall a0f4c10ae2 [pseudo] Add error-recovery framework & brace-based recovery
The idea is:
 - a parse failure is detected when all heads die when trying to shift
   the next token
 - we can recover by choosing a nonterminal we're partway through parsing,
   and determining where it ends through nonlocal means (e.g. matching brackets)
 - we can find candidates by walking up the stack from the (ex-)heads
 - the token range is defined using heuristics attached to grammar rules
 - the unparsed region is represented in the forest by an Opaque node

This patch has the core GLR functionality.
It does not allow recovery heuristics to be attached as extensions to
the grammar, but rather infers a brace-based heuristic.

Expected followups:
 - make recovery heuristics grammar extensions (depends on D127448)
 - add recover to our grammar for bracketed constructs and sequence nodes
 - change the structure of our augmented `_ := start` rules to eliminate
   some special-cases in glrParse.
 - (if I can work out how): avoid some spurious recovery cases described
   in comments
 - grammar changes to eliminate the hard distinction between init-list
   and designated-init-list shown in the recovery-init-list.cpp testcase

Differential Revision: https://reviews.llvm.org/D128486
2022-06-28 21:08:43 +02:00
Sam McCall aacefc817d [pseudo] Simplify/loosen the grammar around lambda captures.
Treat captures as a uniform list, rather than default-captures being special
snowflakes that may only appear at the start.

This accepts a larger set of (incorrect) code, and simplifies error-handling
by making this fit into the usual homogeneous-list pattern.

Differential Revision: https://reviews.llvm.org/D128708
2022-06-28 15:56:12 +02:00
Sam McCall 8cf28585a4 [pseudo] Allow mixed designated/undesignated init lists.
This isn't allowed by the standard grammar but is allowed in C, and clang/GCC
permit it as an extension.
It avoids the need to determine which type of list we have in error-recovery.

While here, also support array index designators `{ [4]=1 }` which are
also legal in C, and common extensions in C++.

Differential Revision: https://reviews.llvm.org/D128687
2022-06-28 15:45:41 +02:00
Sam McCall 6b187fdf3b [pseudo] Add xfail tests for a simple-declaration/function-definition ambiguity
I expect to eliminate this ambiguity at the grammar level by use of guards,
because it interferes with brace-based error recvoery.

Differential Revision: https://reviews.llvm.org/D127400
2022-06-23 15:52:22 +02:00
Haojian Wu 28eeea1e27 [pseudo]Pull out the operator< test, NFC
Fix the review comment in https://reviews.llvm.org/D125479.
2022-06-07 11:00:08 +02:00
Haojian Wu cf88150c48 [pseudo] Fix the incorrect parameters-and-qualifiers rule.
The parenthese body should be parameter-declaration-clause, rather
than parameter-declaration-list.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D125479
2022-06-07 10:47:07 +02:00
Haojian Wu ecd7ff53b5 [pseudo] Fix the type-parameter rule.
The IDENTIFIER should be optional.

Differential Revision: https://reviews.llvm.org/D126998
2022-06-07 10:36:45 +02:00
Haojian Wu 90dab0473e [pseudo] Handle the language predefined identifier __func__
The clang lexer lexes it as a dedicated token kind (rather than
identifier), we extend the grammar to handle it.

Differential Revision: https://reviews.llvm.org/D126996
2022-06-07 10:34:37 +02:00
Haojian Wu 58b33bc8c4 [pseudo] Fix noptr-abstract-declarator rule.
The const-expression in the [] can be empty.

Differential Revision: https://reviews.llvm.org/D126992
2022-06-07 10:22:23 +02:00
Haojian Wu 0a6a17a4f9 [pseudo] Fix the member-specification grammar rule.
The grammar rule is not right, doesn't match the standard one.

Differential Revision: https://reviews.llvm.org/D126991
2022-06-07 10:18:18 +02:00