Otherwise the brace was detected as a function brace, not wrong per se,
but when directly calling the lambda the calling parens were put on the
next line.
Differential Revision: https://reviews.llvm.org/D129946
After b646f09555,
the added regression test started being formatted as-if the
multiplication `*` was a pointer. This adapts the heuristic to
distinguish between these two cases.
Reviewed By: jackhong12, curdeius, HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D129771
When removing an r_brace that is the first token of an annotated line, if the
line above ends with a line comment, clang-format generates invalid code by
merging the tokens after the r_brace into the line comment.
Fixes#56488.
Differential Revision: https://reviews.llvm.org/D129742
MacroUnexpander applies the structural formatting of expanded lines into
UnwrappedLines to the corresponding unexpanded macro calls, resulting in
UnwrappedLines for the macro calls the user typed.
Differential Revision: https://reviews.llvm.org/D88299
Break after a constructor initializer colon only if it's not followed by a
comment on the same line.
Fixes#41128.
Fixes#43246.
Differential Revision: https://reviews.llvm.org/D129057
This change fixes a clang-format unit test failure introduced by [D124748](https://reviews.llvm.org/D124748). The `countLeadingWhitespace` function was calling `isspace` with values that could fall outside the valid input range. The valid input range for `isspace` is unsigned 0-255. Values outside this range produce undefined behavior, which on Windows manifests as an assertion being raised in the debug runtime libraries. `countLeadingWhitespace` was calling `isspace` with a signed char that could produce a negative value if the underlying byte's value was 128 or above, which can happen for non-ASCII encodings. The fix is to use `StringRef`'s `bytes_begin` and `bytes_end` iterators to read the values as unsigned chars instead.
This bug can be reproduced by building the `check-clang-unit` target with a DEBUG configuration under Windows. This change is already covered by existing unit tests.
Reviewed By: MyDeveloperDay
Differential Revision: https://reviews.llvm.org/D128786
The setLength function checks for the token kind which could be
uninitialized in the previous version.
The problem was introduced in 2e32ff106e.
Reviewed By: MyDeveloperDay, owenpan
Differential Revision: https://reviews.llvm.org/D128607
Verilog uses the backtick instead of the hash. In this revision
backticks are lexed manually and then get labeled as hashes so the logic
for handling C preprocessor stuff don't have to change. Hashes get
labeled as identifiers for Verilog-specific stuff like delays.
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D124749
This patch mainly handles treating `begin` as block openers.
While and for statements will be handled in another patch.
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D123450
The current way of counting whitespace would count backticks as
whitespace. For Verilog stuff we need backticks to be handled
correctly. For JavaScript the current way is to compare the entire
token text to see if it's a backtick. However, when the backtick is the
first token following an escaped newline, the escaped newline will be
part of the tok::unknown token. Verilog has macros and escaped newlines
unlike JavaScript. So we can't regard an entire tok::unknown token as
whitespace. Previously, the start of every token would be matched for
newlines. Now, it is all whitespace instead of just newlines.
The column counting problem has already been fixed for JavaScript in
e71b4cbdd1 by counting columns elsewhere.
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D124748
Turn off RemoveBracesLLVM while analyzing InsertBraces and vice
versa to avoid potential interference of each other and better the
performance.
Differential Revision: https://reviews.llvm.org/D127685
Call mightFitOneOneline() on the line before the closing brace only
if it is at the level of the block.
Differential Revision: https://reviews.llvm.org/D127614
Change the signatures of parseBlock(), parseLevel(), and
parseStructuralElement() to support combining else and if when
removing braces. See #55663.
Differential Revision: https://reviews.llvm.org/D127005
Do not remove braces if the conditional of if/for/while might not
fit on a single line even after the opening brace is removed.
Examples:
// ColumnLimit: 20
// 45678901234567890
if (a) { /* Remove. */
foo();
}
if (-b >= c) { // Keep.
bar();
}
Differential Revision: https://reviews.llvm.org/D126052
We have autogenerated pragma regions in our code
which where awkwardly broken up like this:
```
#pragma region foo(bar : hello)
```
becomes
```
#pragma region foo(bar \
: hello)
```
This fixes the problem by adding region as a keyword
and handling it the same way as pragma mark
Reviewed By: curdeius
Differential Revision: https://reviews.llvm.org/D125961
This reverts commit 50cd52d935.
It provoked regressions in C++ and ObjectiveC as described in https://reviews.llvm.org/D123676#3515949.
Reproducers:
```
MACRO_BEGIN
#if A
int f();
#else
int f();
#endif
```
```
NS_SWIFT_NAME(A)
@interface B : C
@property(readonly) D value;
@end
```
Clean up UnwrappedLineParser for RemoveBracesLLVM to avoid calling
mightFitOnOneLine() as much as possible.
Differential Revision: https://reviews.llvm.org/D125626
The combination of
- AlignConsecutiveAssignments.Enabled = true
- BinPackArguments = false
would result in the first continuation line of a braced-init-list being
improperly indented (missing a shift) when in a continued function call.
Indentation was also wrong for braced-init-lists continuing a
direct-list-initialization. Check for opening braced lists in
continuation and ensure that the correct shift occurs.
Fixes https://github.com/llvm/llvm-project/issues/55360
Reviewed By: curdeius
Differential Revision: https://reviews.llvm.org/D125162
Fixes https://github.com/llvm/llvm-project/issues/55407.
Given configuration:
```
UseTab: Always
PointerAlignment: Right
AlignConsecutiveDeclarations: true
```
Before, the pointer was misaligned in this code:
```
void f() {
unsigned long long big;
char *ptr; // misaligned
int i;
}
```
That was due to the fact that when handling right-aligned pointers, the Spaces were changed but StartOfTokenColumn was not.
Also, a tab was used not only for indentation but for spacing too when using `UseTab: ForIndentation` config option:
```
void f() {
unsigned long long big;
char *ptr; // \t after char
int i;
}
```
Reviewed By: owenpan
Differential Revision: https://reviews.llvm.org/D125528
If a closing brace is followed by a non-trailing comment, the
newline before the closing brace must also be removed.
Differential Revision: https://reviews.llvm.org/D125451
Reimplement the RemoveBracesLLVM feature which handles a
single-statement block that would get wrapped.
Fixes#53543.
Differential Revision: https://reviews.llvm.org/D125137
Due to how parseBracedList always stopped on the first closing angle
bracket and was used in parsing angle bracketed expression inside concept
definition, nested brackets inside concepts were parsed incorrectly.
nextToken() call before calling parseBracedList is required because
we were processing opening angle bracket inside parseBracedList second
time leading to incorrect logic after my fix.
Fixes https://github.com/llvm/llvm-project/issues/54943
Fixes https://github.com/llvm/llvm-project/issues/54837
Reviewed By: HazardyKnusperkeks, curdeius
Differential Revision: https://reviews.llvm.org/D123896
Fixes https://github.com/llvm/llvm-project/issues/54522.
This fixes regression introduced in 5e5efd8a91.
Before the culprit commit, macros in WhitespaceSensitiveMacros were correctly formatted even if their closing parenthesis weren't followed by semicolon (or, to be precise, when they were followed by a newline).
That commit changed the type of the macro token type from TT_UntouchableMacroFunc to TT_FunctionLikeOrFreestandingMacro.
Correct formatting (with `WhitespaceSensitiveMacros = ['FOO']`):
```
FOO(1+2)
FOO(1+2);
```
Regressed formatting:
```
FOO(1 + 2)
FOO(1+2);
```
Reviewed By: HazardyKnusperkeks, owenpan, ksyx
Differential Revision: https://reviews.llvm.org/D123676
The ShouldShiftBeAdded lambda checks if extra space should be
added before the wrapped part of a braced list. If the first
element of the list is wrapped, no extra space should be added.
Fixes#55161.
Differential Revision: https://reviews.llvm.org/D124956
This reverts commit d46fa023ca.
Regressed include order in some cases with trailing comments, see the
comments on https://reviews.llvm.org/D121370. Will add a regression test
in a follow-up commit.
There was some duplicate code in determineStarAmpUsage and
determinePlusMinusCaretUsage
Now a `-` or `+` following `;`, `sizeof`, `co_await`, or `delete` is
regarded as a unary operator.
Now a `*` or `&` following `case` is also a unary operator.
Reviewed By: curdeius, MyDeveloperDay, HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D121754
Fixes [[ https://github.com/llvm/llvm-project/issues/38995 | #38995 ]]
This is an attempt to modify the regular expression to identify
`@import` and `import` alongside the regular `#include`. The challenging
part was not to support `@` in addition to `#` but how to handle
everything that comes after the `include|import` keywords. Previously
everything that wasn't `"` or `<` was consumed. But as you can see in
this example from the issue #38995, there is no `"` or `<` following the
keyword:
```
@import Foundation;
```
I experimented with a lot of fancy and useful expressions in [this
online regex tool](https://regex101.com) only to find out that some
things are simply not supported by the regex implementation in LLVM.
* For example the beginning `[\t\ ]*` should be replacable by the
horizontal whitespace character `\h*` but this will break the
`SortIncludesTest.LeadingWhitespace` test.
That's why I've chosen to come back to the basic building blocks.
The essential change in this patch is the change from this regular
expression:
```
^[\t\ ]*#[\t\ ]*(import|include)[^"<]*(["<][^">]*[">])
~ ~~~~~~~~~~~~~~
^ ^
| |
only support # prefix not @ |
only support "" and <> as
delimiters
no support for C++ modules and ;
ending. Also this allows for ">
or <" or "" or <> which all seems
either off or wrong.
```
to this:
```
^[\t\ ]*[@#][\t\ ]*(import|include)([^"]*("[^"]+")|[^<]*(<[^>]+>)|[\t\
]*([^;]+;))
~~~~ ~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
~~~~~~~~~~~~~~
^ ^ ^ ^ ^
| | | | |
Now support @ and #. Clearly support "" and <> as
well as an
include name without enclosing
characters.
Allows for no mixture of "> or
<" or
empty include names.
```
Here is how I've tested this patch:
```
ninja clang-Format
ninja FormatTests
./tools/clang/unittests/Format/FormatTests
--gtest_filter=SortIncludesTest*
```
And if that worked I doubled checked that nothing else broke by running
all format checks:
```
./tools/clang/unittests/Format/FormatTests
```
One side effect of this change is it should partially support
[C++20 Module](https://en.cppreference.com/w/cpp/language/modules)
`import` lines without the optional `export` in front. Adding
this can be a change on its own that shouldn't be too hard. I say
partially because the `@` or `#` are currently *NOT* optional in the
regular expression.
I see an opportunity to optimized the matching to exclude `@include` for
example. But eventually these should be caught by the compiler, so...
With my change, the matching group is not at a fixed position any
longer. I decided to
choose the last match (group) that is not empty.
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D121370
With D117142, we would now format
```
struct A {
#define A
void f() { a(); }
#endif
};
```
into
```
struct A {
#ifdef A
void f() {
a();
}
#endif
};
```
because we were looking for the record lbrace without skipping preprocess lines.
Fixes https://github.com/llvm/llvm-project/issues/54901.
Reviewed By: curdeius, owenpan
Differential Revision: https://reviews.llvm.org/D123737
[clang-format] Indent import statements in JavaScript.
Take for example this piece of code found at
<https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import>.
```
for (const link of document.querySelectorAll("nav > a")) {
link.addEventListener("click", e => {
e.preventDefault();
import('/modules/my-module.js')
.then(module => {
module.loadPageInto(main);
})
.catch(err => {
main.textContent = err.message;
});
});
}
```
Previously the import line would be unindented, looking like this.
```
for (const link of document.querySelectorAll("nav > a")) {
link.addEventListener("click", e => {
e.preventDefault();
import('/modules/my-module.js')
.then(module => {
module.loadPageInto(main);
})
.catch(err => {
main.textContent = err.message;
});
});
}
```
Actually we were going to fix this along with fixing Verilog import
statements. But the patch got too big.
Reviewed By: MyDeveloperDay, curdeius
Differential Revision: https://reviews.llvm.org/D121906
We currently have all those fields in AnnotatingParser::Context. They
are not inherited from the Context object for the parent scope. They
are exclusive. Now they are replaced with an enum.
`InCpp11AttributeSpecifier` and `InCSharpAttributeSpecifier` are not
handled like the rest in ContextType because they are not exclusive.
Reviewed By: curdeius, MyDeveloperDay, HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D121907
in TokenAnnotator::parseBrace. Left is misleading, because we have a
loop and Left does not move.
Also return early.
Differential Revision: https://reviews.llvm.org/D121558
in TokenAnnotator::parseParens(). Left is misleading since we have a
loop and Left is not adjusted.
Differential Revision: https://reviews.llvm.org/D121557
I have lost count of the number of times this has been reported, but it fundamentally comes down to the fact that the "AlignArrayLeft/Right" function is fundamentally broken for non-square arrays.
As a result, a pointer can end up running off the end of the array structure, I've spent the last 2 weekends trying to rewrite this algorithm but I've struggled to get it aligned correctly.
This is an interim fix, that ignores all array that are non-square and leaves them alone. I think this can allow us to close out most of the crashes (if not all).
I think this can help reduce the number of bugs coming in that are duplicates.
https://github.com/llvm/llvm-project/issues/53748https://github.com/llvm/llvm-project/issues/51767https://github.com/llvm/llvm-project/issues/51277
Reviewed By: curdeius, HazardyKnusperkeks, feg208
Differential Revision: https://reviews.llvm.org/D121069
Before, the code:
```
int Value { get; } = 0;
int Value { init; } = 0;
```
was formatted incoherently:
```
int Value { get; } = 0;
int Value { init; }
= 0;
```
because `init` was not recognised as an accessor specifier.
Reviewed By: MyDeveloperDay, HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D121132
Fixes https://github.com/llvm/llvm-project/issues/54147.
When handling `AllowShortFunctionsOnASingleLine`, we were searching for the last line with a smaller level than the current line. The search was incorrect when the first line had the same level as the current one. This led to an unsatisfied assumption about the existence of a brace (non-comment token).
Reviewed By: HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D120902
https://github.com/llvm/llvm-project/issues/53981
Reorder the qualifiers inside the template argument. This should handle the simple cases of
```
<const T>
<T const>
```
But only by relaxing that single letter capital variables are not possible macros
Fixes: #53981
Reviewed By: HazardyKnusperkeks, curdeius
Differential Revision: https://reviews.llvm.org/D120710
Originally filed at crbug.com/1184570.
When the name of a namespace is a macro that takes arguments,
- It fixed the indentation.
- It fixed the namepsace end comments.
Differential Revision: https://reviews.llvm.org/D120931
We have a little problem. TokenAnnotator::resetTokenMetadata() resets
the type, except for a (growing) whitelist. This is because the
TokenAnnotator visits some tokens multiple times. E.g. trying to
identify if a < is an operator less or a template opener. And in some
runs, which are bascially "reverted" the types are reset.
On the other hand, if the parser does already know the type, it should
be able to set it, without it being reset. So we introduce the ability
to set a type and make that final.
Differential Revision: https://reviews.llvm.org/D120511
This change can be seen as code cleanup but motivation is more performance related.
While browsing perf reports captured during Linux build we can notice unusual portion of instructions executed in std::vector<std::string> copy constructor like:
0.59% 0.58% clang-14 clang-14 [.] std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector
or even:
1.42% 0.26% clang clang-14 [.] clang::LangOptions::LangOptions
|
--1.16%--clang::LangOptions::LangOptions
|
--0.74%--std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector
After more digging we can see that relevant LangOptions std::vector members (*Files, ModuleFeatures and NoBuiltinFuncs)
are constructed when Lexer::LangOpts field is initialized on list:
Lexer::Lexer(..., const LangOptions &langOpts, ...)
: ..., LangOpts(langOpts),
Since LangOptions copy constructor is called by Lexer(..., const LangOptions &LangOpts,...) and local Lexer objects are created thousands times
(in Lexer::getRawToken, Preprocessor::EnterSourceFile and more) during single module processing in frontend it makes std::vector copy constructors surprisingly hot.
Unfortunately even though in current Lexer implementation mentioned std::vector members are unused and most of time empty,
no compiler is smart enough to optimize their std::vector copy constructors out (take a look at test assembly): https://godbolt.org/z/hdoxPfMYY even with LTO enabled.
However there is simple way to fix this. Since Lexer doesn't access *Files, ModuleFeatures, NoBuiltinFuncs and any other LangOptions fields (but only LangOptionsBase)
we can simply get rid of redundant copy constructor assembly by changing LangOpts type to more appropriate const LangOptions reference: https://godbolt.org/z/fP7de9176
Additionally we need to store LineComment outside LangOpts because it's written in SkipLineComment function.
Also FormatTokenLexer need to be adjusted a bit to avoid lifetime issues related to passing local LangOpts reference to Lexer.
After this change I can see more than 1% speedup in some of my microbenchmarks when using Clang release binary built with LTO.
For Linux build gains are not so significant but still nice at the level of -0.4%/-0.5% instructions drop.
Differential Revision: https://reviews.llvm.org/D120334
Fixes https://github.com/llvm/llvm-project/issues/53876.
This is a solution for standard C++ casts: const_cast, dynamic_cast, reinterpret_cast, static_cast.
A general approach handling all possible casts is not possible without semantic information.
Consider the code:
```
static_cast<T>(*function_pointer_variable)(arguments);
```
vs.
```
some_return_type<T> (*function_pointer_variable)(parameters);
// Later used as:
function_pointer_variable = &some_function;
return function_pointer_variable(args);
```
In the latter case, it's not a cast but a variable declaration of a pointer to function.
Without knowing what `some_return_type<T>` is (and clang-format does not know it), it's hard to distinguish between the two cases. Theoretically, one could check whether "parameters" are types (not a cast) and "arguments" are value/expressions (a cast), but that might be inefficient (needs lots of lookahead).
Reviewed By: MyDeveloperDay, HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D120140
Fixes https://github.com/llvm/llvm-project/issues/53962.
Given the config:
```
BasedOnStyle: LLVM
QualifierAlignment: Custom
QualifierOrder: ['constexpr', 'type']
```
The code:
```
template <typename F>
requires std::invocable<F>
constexpr constructor();
```
was incorrectly formatted to:
```
template <typename F>
requires
constexpr std::invocable<F> constructor();
```
because we considered `std::invocable<F> constexpr` as a type, not recognising the requires clause.
This patch avoids moving the qualifier across the boundary of the requires clause (checking `ClosesRequiresClause`).
Reviewed By: HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D120309
Setting a boolean within an if and only using it in the very next if is
a bit confusing. Merge it into one if.
Differential Revision: https://reviews.llvm.org/D120237
In 529aa4b011
by setting the identifier info to nullptr, we started to subtly
interfere with the parts in the beginning of the function,
529aa4b011/clang/lib/Format/UnwrappedLineParser.cpp (L991)
causing the preprocessor nesting to change in some cases. E.g., for the
added regression test, clang-format started incorrectly guessing the
language as C++.
This tries to address this by introducing an internal identifier info
element to use instead.
Reviewed By: curdeius, MyDeveloperDay
Differential Revision: https://reviews.llvm.org/D120315
Adds a new option InsertBraces to insert the optional braces after
if, else, for, while, and do in C++.
Differential Revision: https://reviews.llvm.org/D120217
We can return as early as possible and only calculate IsComparison if we
really need to. Also cache getPrecedence() instead of querying it at
most 4 times.
Differential Revision: https://reviews.llvm.org/D119923
This reverts commit e021987273.
This commit provokes failures in formatting tests of polly.
Cf. https://lab.llvm.org/buildbot/#/builders/205/builds/3320.
That's probably because of `)` being annotated as `CastRParen` instead of `Unknown` before, hence being kept on the same line with the next token.
Fixes https://github.com/llvm/llvm-project/issues/53876.
This is a solution for standard C++ casts: const_cast, dynamic_cast, reinterpret_cast, static_cast.
A general approach handling all possible casts is not possible without semantic information.
Consider the code:
```
static_cast<T>(*function_pointer_variable)(arguments);
```
vs.
```
some_return_type<T> (*function_pointer_variable)(parameters);
// Later used as:
function_pointer_variable = &some_function;
return function_pointer_variable(args);
```
In the latter case, it's not a cast but a variable declaration of a pointer to function.
Without knowing what `some_return_type<T>` is (and clang-format does not know it), it's hard to distinguish between the two cases. Theoretically, one could check whether "parameters" are types (not a cast) and "arguments" are value/expressions (a cast), but that might be inefficient (needs lots of lookahead).
Reviewed By: MyDeveloperDay, HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D120140