9 Commits

Author SHA1 Message Date
Haojian Wu 9ab67cc8bf [pseudo] Implement guard extension.
- Extend the GLR parser to allow conditional reduction based on the
  guard functions;
- Implement two simple guards (contextual-override/final) for cxx.bnf;
- layering: clangPseudoCXX depends on clangPseudo (as the guard functions need
  to access the TokenStream);
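
The conditional reduction described above can be sketched roughly as follows. This is a minimal illustration, not the clangPseudo implementation: `Token`, `TokenStream`, `Rule`, `Guard`, `GuardTable`, `spellingIs`, and `canReduce` are hypothetical, simplified stand-ins, and the real guard signature may differ. The sketch assumes a guard is a predicate over the tokens covered by a candidate reduction, and that a contextual keyword such as `override` or `final` is accepted only when a single identifier token has the expected spelling.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified types; these are NOT the clangPseudo classes.
struct Token {
  std::string Text;
};
using TokenStream = std::vector<Token>;

// A guard inspects the token range [Begin, End) covered by a candidate
// reduction and decides whether the reduction is allowed.
using Guard = std::function<bool(const TokenStream &, std::size_t Begin,
                                 std::size_t End)>;
using GuardTable = std::unordered_map<std::string, Guard>;

struct Rule {
  std::string Target;
  std::string GuardName; // Empty means the rule is unguarded.
};

// Builds a guard that accepts exactly one token with the given spelling,
// which is how a contextual keyword check could look in this sketch.
Guard spellingIs(const std::string &Expected) {
  return [Expected](const TokenStream &Toks, std::size_t Begin,
                    std::size_t End) {
    return End == Begin + 1 && Begin < Toks.size() &&
           Toks[Begin].Text == Expected;
  };
}

// Returns true if the parser may perform the reduction for rule R over
// the token range [Begin, End).
bool canReduce(const Rule &R, const GuardTable &Guards,
               const TokenStream &Toks, std::size_t Begin, std::size_t End) {
  if (R.GuardName.empty())
    return true; // Unguarded rules always reduce.
  auto It = Guards.find(R.GuardName);
  if (It == Guards.end())
    return false; // Unknown guard: reject conservatively in this sketch.
  return It->second(Toks, Begin, End);
}
```

In a GLR parser, rejecting a reduction this way simply prunes one of the candidate parses, so an ordinary identifier named `override` can still be parsed through the unguarded identifier rules.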

Differential Revision: https://reviews.llvm.org/D127448
2022-07-05 15:55:15 +02:00
Haojian Wu d263447311 [pseudo] Fix the build for the benchmark tool. 2022-07-05 15:42:41 +02:00
Haojian Wu fe66aebd75 [pseudo] Define a clangPseudoCLI library.
- define a common data structure, Language, which is the compiled result of a
  bnf grammar. It is defined in Language.h;
- create a clangPseudoCLI lib which defines a grammar command-line flag and
  exposes a function to get the Language. It supports --grammar=cxx and
  --grammar=/path/to/file.bnf;
- use clangPseudoCLI in the clang-pseudo, fuzzer, and benchmark tools
  (simplifying the code and using the prebuilt cxx grammar);
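
How the two accepted flag values might be told apart can be sketched as below. The names `GrammarSource`, `GrammarChoice`, and `resolveGrammarFlag` are hypothetical and are not the clangPseudoCLI interface. The sketch only assumes what the message states: the value `cxx` selects the prebuilt grammar, and any other value is treated as a path to a bnf file.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration; not part of clangPseudoCLI.
enum class GrammarSource { BuiltinCXX, BNFFile };

struct GrammarChoice {
  GrammarSource Source;
  std::string Path; // Only meaningful when Source == BNFFile.
};

// Interprets the value of a --grammar flag: "cxx" means the prebuilt
// grammar, anything else is assumed to be a path to a .bnf file.
GrammarChoice resolveGrammarFlag(const std::string &Value) {
  if (Value == "cxx")
    return GrammarChoice{GrammarSource::BuiltinCXX, ""};
  return GrammarChoice{GrammarSource::BNFFile, Value};
}
```

Centralizing this decision in one library is what would let the clang-pseudo, fuzzer, and benchmark tools share the same flag handling.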

Split out from https://reviews.llvm.org/D127448.

Differential Revision: https://reviews.llvm.org/D128679
2022-07-01 08:31:34 +02:00
Sam McCall 3f028c02ba [pseudo] Grammar::parseBNF returns Grammar not unique_ptr. NFC 2022-06-28 16:34:21 +02:00
Haojian Wu 70d35fe125 [pseudo] Fix the broken build of ClangPseudoBenchmark, after c70aeaa. 2022-06-09 23:03:54 +02:00
Sam McCall 0360b9f159 [pseudo] (trivial) bracket-matching
Error-tolerant bracket matching enables our error-tolerant parsing strategies.
The implementation here is *not* yet error tolerant: this patch sets up the APIs
and plumbing, and describes the planned approach.

Differential Revision: https://reviews.llvm.org/D125911
2022-05-24 15:13:36 +02:00
Sam McCall e8e00e342c [pseudo] benchmark cleanups. NFC
- add missing benchmark for lex/preprocess steps
- name benchmarks after the function they're benchmarking, when appropriate
- remove unergonomic "run" prefixes from benchmark names
- give a useful error message if --grammar or --source are missing
- use a realistic example of how to run, and run all benchmarks by default
  (for someone who doesn't know the commands, this is the most useful action)
- fix typos and improve wording in comments
- clean up unused vars
- avoid "parseable stream" name, which isn't a great name & not one I expected
  to escape from ClangPseudoMain

Differential Revision: https://reviews.llvm.org/D125312
2022-05-17 20:22:42 +02:00
Haojian Wu 1a65c491be [pseudo] Support parsing variant target symbols.
With this patch, we're able to parse smaller chunks of C++ code (statement,
declaration), rather than translation-unit.

Each start symbol is listed in the grammar in the form `_ := statement`,
and each start symbol has a dedicated state (`_ := • statement`).
We create and track all these separate states in the LRTable. When we
start parsing, we look up the corresponding state to start the parser.
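
The start-state lookup described above might look roughly like this. It is a simplified sketch, not the LRTable code: `StartStates`, `StateID`, `add`, and `lookup` are hypothetical names. The only assumption taken from the message is that each start symbol maps to one dedicated state, which is looked up before parsing begins.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the state identifier type.
using StateID = unsigned;

// Tracks one dedicated start state per start symbol, mirroring the
// `_ := statement` / `_ := declaration` rules in the grammar.
class StartStates {
public:
  // Records the state corresponding to `_ := • Symbol`.
  void add(const std::string &Symbol, StateID State) {
    States[Symbol] = State;
  }

  // Returns the start state for Symbol, or nullptr if the grammar does
  // not list Symbol as a start symbol.
  const StateID *lookup(const std::string &Symbol) const {
    auto It = States.find(Symbol);
    return It == States.end() ? nullptr : &It->second;
  }

private:
  std::map<std::string, StateID> States;
};
```

A caller would ask for the state of the symbol it wants to parse (for example `statement`) and report an error if the lookup fails, instead of always starting from the translation-unit state.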

LR parsing table changes with this patch:
- number of states: 1467 -> 1471
- number of actions: 82891 -> 83578
- size of the table (bytes): 334248 -> 336996

Differential Revision: https://reviews.llvm.org/D125006
2022-05-16 10:38:16 +02:00
Haojian Wu be895d5768 [pseudo] Add benchmarks for pseudoparser.
Running on SemaDecl.cpp with the cxx.bnf grammar:

```
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
runParseBNFGrammar      649389 ns       649365 ns         1013
runBuildLR            34591903 ns     34591380 ns           20
runPreprocessTokens   11418744 ns     11418703 ns           61 bytes_per_second=63.8971M/s
runGLRParse          282996863 ns    282988726 ns            2 bytes_per_second=2.57827M/s
runParseOverall      294969719 ns    294951870 ns            2 bytes_per_second=2.4737M/s
```

Differential Revision: https://reviews.llvm.org/D125226
2022-05-10 14:13:46 +02:00