TokenManager defines Token interfaces for the clang syntax-tree. This is the level
of abstraction that the syntax-tree should use to operate on Tokens.
It decouples the syntax-tree from a particular token implementation (TokenBuffer
previously). This enables us to use a different underlying token implementation
for the syntax Leaf node -- in clang pseudoparser, we want to produce a
syntax-tree with its own pseudo::Token rather than syntax::Token.
Differential Revision: https://reviews.llvm.org/D128411
This reverts commit 049f4e4eab.
The problem was a stray dependency in CLANG_TEST_DEPS which caused cmake
to fail if clang-pseudo wasn't built. This is now removed.
This should make clearer that:
- it's not part of clang proper
- there's no expectation to update it along with clang (beyond green tests)
- clang should not depend on it
This is intended to be expose a library, so unlike other tools has a split
between include/ and lib/.
The main renames are:
clang/lib/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/lib/*
clang/include/clang/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/include/clang-pseudo/*
clang/tools/clang/pseudo/* => clang-tools-extra/pseudo/tool/*
clang/test/Syntax/* => clang-tools-extra/pseudo/test/*
clang/unittests/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/unittests/*
#include "clang/Tooling/Syntax/Pseudo/*" => #include "clang-pseudo/*"
namespace clang::syntax::pseudo => namespace clang::pseudo
check-clang => check-clang-pseudo
clangToolingSyntaxPseudo => clangPseudo
The clang-pseudo and ClangPseudoTests binaries are not renamed.
See discussion around:
https://discourse.llvm.org/t/rfc-a-c-pseudo-parser-for-tooling/59217/50
Differential Revision: https://reviews.llvm.org/D121233
Add a utility function to strip comments from a "raw" tokenstream. The
derived stream will be fed to the GLR parser (for early testing).
Differential Revision: https://reviews.llvm.org/D121092
Without this patch, when End == Start, we access Actions[Actions.end()]
though we return an empty result.
This fixes an assertion failure in MSVC STL debug build.
The TokenStream class is the representation of the source code that will
be fed into the GLR parser.
This patch allows a "raw" TokenStream to be built by reading source code.
It also supports scanning a TokenStream to find the directive structure.
Next steps (with placeholders in the code): heuristically choosing a
path through #ifs, preprocessing the code by stripping directives and comments.
These will produce a suitable stream to feed into the parser proper.
Differential Revision: https://reviews.llvm.org/D119162
This patch introduces a dense implementation of the LR parsing table, which is
used by LR parsers.
We build a SLR(1) parsing table from the LR(0) graph.
Statistics of the LR parsing table on the C++ spec grammar:
- number of states: 1449
- number of actions: 83069
- size of the table (bytes): 334928
Differential Revision: https://reviews.llvm.org/D118196
LRGraph is the key component of the clang pseudo parser, it is a
deterministic handle-finding finite-state machine, which is used to
generated the LR parsing table.
Separate from https://reviews.llvm.org/D118196.
Differential Revision: https://reviews.llvm.org/D119172
This patch introduces the Grammar class, which is a critial piece for constructing
a tabled-based parser.
As the first patch, the scope is limited to:
- define base types (symbol, rules) of modeling the grammar
- construct Grammar by parsing the BNF file (annotations are excluded for now)
Differential Revision: https://reviews.llvm.org/D114790
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.
This patch replaces `std::vector<bool>` with `llvm::BitVector` in the Syntax library and replaces range-based for loop with regular for loop. This is necessary due to `llvm::BitVector` not having `begin()` and `end()` (D117116).
Reviewed By: dexonsmith, dblaikie
Differential Revision: https://reviews.llvm.org/D118109
This avoids an unnecessary copy required by 'return OS.str()', allowing
instead for NRVO or implicit move. The .str() call (which flushes the
stream) is no longer required since 65b13610a5,
which made raw_string_ostream unbuffered by default.
Differential Revision: https://reviews.llvm.org/D115374
`expandedTokens(SourceRange)` used to do a binary search to get the
expanded tokens belonging to a source range. Each binary search uses
`isBeforeInTranslationUnit` to order two source locations. This is
inherently very slow.
By profiling clangd we found out that users like clangd::SelectionTree
spend 95% of time in `isBeforeInTranslationUnit`. Also it is worth
noting that users of `expandedTokens(SourceRange)` majorly use ranges
provided by AST to query this funciton. The ranges provided by AST are
token ranges (starting at the beginning of a token and ending at the
beginning of another token).
Therefore we can avoid the binary search in majority of the cases by
maintaining an index of ExpandedToken by their SourceLocations. We still
do binary search for ranges which are not token ranges but such
instances are quite low.
Performance:
`~/build/bin/clangd --check=clang/lib/Serialization/ASTReader.cpp`
Before: Took 2:10s to complete.
Now: Took 1:13s to complete.
Differential Revision: https://reviews.llvm.org/D99086
OpaqueValueExpr doesn't correspond to the concrete syntax, it has
invalid source location, ignore them.
Reviewed By: kbobyrev
Differential Revision: https://reviews.llvm.org/D96112
The EndLoc of a type loc can be invalid for broken code.
Also extend the existing test to support error code with `error-ok`
annotation.
Differential Revision: https://reviews.llvm.org/D96261
Non-mechanical changes:
- Added FIXME to StringLiteral to cover multi-token string literals.
- LiteralExpression::getLiteralToken() is gone. (It was never called)
This is because we don't codegen methods in Alternatives
It's conceptually suspect if we consider multi-token string literals, though.
Differential Revision: https://reviews.llvm.org/D91277
Similar to the previous patch, this doesn't convert *all* the classes that
could be converted. It also doesn't enforce any new invariants etc.
It *does* include some data we don't use yet: specific token types that are
allowed and optional/required status of sequence items. (Similar to Dmitri's
prototype). I think these are easier to add as we go than later, and serve
a useful documentation purpose.
Differential Revision: https://reviews.llvm.org/D90659
So far, only used to generate Kind and implement classof().
My plan is to have this general-purpose Nodes.inc in the style of AST
DeclNodes.inc etc, and additionally a special-purpose backend generating
the actual class definitions. But baby steps...
Differential Revision: https://reviews.llvm.org/D90540
Rationale:
Children of a syntax tree had forward links only, because there was no
need for reverse links.
This need appeared when we started mutating the syntax tree.
On a forward list, to remove a target node in O(1) we need a pointer to the node before the target. If we don't have this "before" pointer, we have to find it, and that requires O(n).
So in order to remove a syntax node from a tree, we would similarly need to find the node before to then remove. This is both not ergonomic nor does it have a good complexity.
Differential Revision: https://reviews.llvm.org/D90240
This gives us slightly nicer syntax (foreach) for idioms currently expressed
as a loop, and the option to use range algorithms where it makes sense
(e.g. llvm::all_of et al encapsulate the needed flow control in a useful way).
It's also a building block for iteration over filtered views (e.g. iterate over
all Stmt children, with the right type):
for (const Statement &S : filter<Statement>(N.children()))
...
I realize the recent direction has been mostly towards strongly-typed
node-specific facilities, but I think it's important we have convenient
generic facilities too.
Differential Revision: https://reviews.llvm.org/D90023
This is an initial attempt to start using Syntax Trees in clangd while improving state of folding ranges feature and experimenting with Syntax Tree capabilities.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D88553
The patch adjusts the existing `llvm::DenseMap<unsigned, T>` and
`llvm::DenseSet<unsigned>` objects that store source locations, so
that they use `SourceLocation` directly instead of `unsigned`.
This patch relies on the `DenseMapInfo` trait added in D89719.
It also replaces the construction of `SourceLocation` objects from
the constants -1 and -2 with calls to the trait's methods `getEmptyKey`
and `getTombstoneKey` where appropriate.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D69840
After this change all nodes that have a delimited-list are using the
`List` API.
Implementation details:
Let's look at a declaration with multiple declarators:
`int a, b;`
To generate a declarator list node we need to have the range of
declarators: `a, b`:
However, the `ClangAST` actually stores them as separate declarations:
`int a ;`
`int b;`
We solve that by appropriately marking the declarators on each separate
declaration in the `ClangAST` and then for the final declarator `int
b`, shrinking its range to fit to the already marked declarators.
Differential Revision: https://reviews.llvm.org/D88403
There can be Macros that are tagged with `modifiable`. Thus verifying
`canModifyAllDescendants` is not sufficient to avoid macros when deep
copying.
We think the `TokenBuffer` could inform us whether a `Token` comes from
a macro. We'll look into that when we can surface this information
easily, for instance in unit tests for `ComputeReplacements`.
Differential Revision: https://reviews.llvm.org/D88034
* Introduce `TreeTest.cpp` to unit test `Tree.h`
* Add `generateAllTreesWithShape` to generating test cases
* Add tests for `findFirstLeaf` and `findLastLeaf`
* Fix implementations of `findFirstLeaf` and `findLastLeaf` that had
been broken when empty `Tree` were present.
Differential Revision: https://reviews.llvm.org/D87779