Different loop termination conditions resulted in confusion of whether
*Offset was intended to be inside or outside the token.
This ultimately led to constructing an out-of-range SourceLocation.
Fix by making Offset consistently point *after* the token.
Differential Revision: https://reviews.llvm.org/D135356
This change fixes a clang-format unit test failure introduced by [D124748](https://reviews.llvm.org/D124748). The `countLeadingWhitespace` function was calling `isspace` with values that could fall outside the valid input range. The valid input range for `isspace` is unsigned 0-255. Values outside this range produce undefined behavior, which on Windows manifests as an assertion being raised in the debug runtime libraries. `countLeadingWhitespace` was calling `isspace` with a signed char that could produce a negative value if the underlying byte's value was 128 or above, which can happen for non-ASCII encodings. The fix is to use `StringRef`'s `bytes_begin` and `bytes_end` iterators to read the values as unsigned chars instead.
This bug can be reproduced by building the `check-clang-unit` target with a DEBUG configuration under Windows. This change is already covered by existing unit tests.
Reviewed By: MyDeveloperDay
Differential Revision: https://reviews.llvm.org/D128786
The setLength function checks for the token kind which could be
uninitialized in the previous version.
The problem was introduced in 2e32ff106e.
Reviewed By: MyDeveloperDay, owenpan
Differential Revision: https://reviews.llvm.org/D128607
Verilog uses the backtick instead of the hash. In this revision
backticks are lexed manually and then get labeled as hashes so the logic
for handling C preprocessor stuff don't have to change. Hashes get
labeled as identifiers for Verilog-specific stuff like delays.
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D124749
The current way of counting whitespace would count backticks as
whitespace. For Verilog stuff we need backticks to be handled
correctly. For JavaScript the current way is to compare the entire
token text to see if it's a backtick. However, when the backtick is the
first token following an escaped newline, the escaped newline will be
part of the tok::unknown token. Verilog has macros and escaped newlines
unlike JavaScript. So we can't regard an entire tok::unknown token as
whitespace. Previously, the start of every token would be matched for
newlines. Now, it is all whitespace instead of just newlines.
The column counting problem has already been fixed for JavaScript in
e71b4cbdd1 by counting columns elsewhere.
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D124748
This reverts commit 50cd52d935.
It provoked regressions in C++ and ObjectiveC as described in https://reviews.llvm.org/D123676#3515949.
Reproducers:
```
MACRO_BEGIN
#if A
int f();
#else
int f();
#endif
```
```
NS_SWIFT_NAME(A)
@interface B : C
@property(readonly) D value;
@end
```
Fixes https://github.com/llvm/llvm-project/issues/54522.
This fixes regression introduced in 5e5efd8a91.
Before the culprit commit, macros in WhitespaceSensitiveMacros were correctly formatted even if their closing parenthesis weren't followed by semicolon (or, to be precise, when they were followed by a newline).
That commit changed the type of the macro token type from TT_UntouchableMacroFunc to TT_FunctionLikeOrFreestandingMacro.
Correct formatting (with `WhitespaceSensitiveMacros = ['FOO']`):
```
FOO(1+2)
FOO(1+2);
```
Regressed formatting:
```
FOO(1 + 2)
FOO(1+2);
```
Reviewed By: HazardyKnusperkeks, owenpan, ksyx
Differential Revision: https://reviews.llvm.org/D123676
This change can be seen as code cleanup but motivation is more performance related.
While browsing perf reports captured during Linux build we can notice unusual portion of instructions executed in std::vector<std::string> copy constructor like:
0.59% 0.58% clang-14 clang-14 [.] std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector
or even:
1.42% 0.26% clang clang-14 [.] clang::LangOptions::LangOptions
|
--1.16%--clang::LangOptions::LangOptions
|
--0.74%--std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector
After more digging we can see that relevant LangOptions std::vector members (*Files, ModuleFeatures and NoBuiltinFuncs)
are constructed when Lexer::LangOpts field is initialized on list:
Lexer::Lexer(..., const LangOptions &langOpts, ...)
: ..., LangOpts(langOpts),
Since LangOptions copy constructor is called by Lexer(..., const LangOptions &LangOpts,...) and local Lexer objects are created thousands times
(in Lexer::getRawToken, Preprocessor::EnterSourceFile and more) during single module processing in frontend it makes std::vector copy constructors surprisingly hot.
Unfortunately even though in current Lexer implementation mentioned std::vector members are unused and most of time empty,
no compiler is smart enough to optimize their std::vector copy constructors out (take a look at test assembly): https://godbolt.org/z/hdoxPfMYY even with LTO enabled.
However there is simple way to fix this. Since Lexer doesn't access *Files, ModuleFeatures, NoBuiltinFuncs and any other LangOptions fields (but only LangOptionsBase)
we can simply get rid of redundant copy constructor assembly by changing LangOpts type to more appropriate const LangOptions reference: https://godbolt.org/z/fP7de9176
Additionally we need to store LineComment outside LangOpts because it's written in SkipLineComment function.
Also FormatTokenLexer need to be adjusted a bit to avoid lifetime issues related to passing local LangOpts reference to Lexer.
After this change I can see more than 1% speedup in some of my microbenchmarks when using Clang release binary built with LTO.
For Linux build gains are not so significant but still nice at the level of -0.4%/-0.5% instructions drop.
Differential Revision: https://reviews.llvm.org/D120334
This factors out a pattern that comes up from time to time.
Reviewed By: MyDeveloperDay, HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D117769
Fixes https://github.com/llvm/llvm-project/issues/44601.
This patch handles a bug when parsing a below example code :
```
template <class> class S;
template <class T> bool operator<(S<T> const &x, S<T> const &y) {
return x.i < y.i;
}
template <class T> class S {
int i = 42;
friend bool operator< <>(S const &, S const &);
};
int main() { return S<int>{} < S<int>{}; }
```
which parse `< <>` as `<< >`, not `< <>` in terms of tokens as discussed in discord.
1. Add a condition in `tryMergeLessLess()` considering `operator` keyword and `>`
2. Force to leave a whitespace between `tok::less` and a template opener
3. Add unit test
Reviewed By: MyDeveloperDay, curdeius
Differential Revision: https://reviews.llvm.org/D117398
When building clang-format with -Wall on Visual Studio 20119 we see the following, prevent this the only -Wall error
```
..FormatTokenLexer.cpp(45) : warning C4868: compiler may not enforce left-to-right evaluation order in braced initializer list
```
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D113844
This allows to ignore for example Qts emit when
AlignConsecutiveDeclarations is set, otherwise it is parsed as a type
and it results in some misformating:
unsigned char MyChar = 'x';
emit signal(MyChar);
Differential Revision: https://reviews.llvm.org/D93776
Update `Lexer` / `Lexer::Lexer` to use `MemoryBufferRef` instead of
`MemoryBuffer*`. Callers that were acquiring a `MemoryBuffer*` via
`SourceManager::getBuffer` were updated, such that if they checked
`Invalid` they use `getBufferOrNone` and otherwise `getBufferOrFake`.
Differential Revision: https://reviews.llvm.org/D89398
Update clang/lib/Format and clang/lib/Rewrite to use a `MemoryBufferRef`
from `getBufferOrFake` instead of `MemoryBuffer*` from `getBuffer`.
No functionality change here, since the call sites weren't checking if
the buffer was valid.
Differential Revision: https://reviews.llvm.org/D89406
https://bugs.llvm.org/show_bug.cgi?id=47461
The following change {D80940} caused a regression in code which ifdef's around the try and catch block cause incorrect brace placement around the catch
```
try
{
}
catch (...) {
// This is not a small function
bar = 1;
}
}
```
The brace after the catch will be placed on a newline
Reviewed By: curdeius
Differential Revision: https://reviews.llvm.org/D87291
This adds a `AttributeMacros` configuration option that causes certain
identifiers to be parsed like a __attribute__((foo)) annotation.
This is motivated by our CHERI C/C++ fork which adds a __capability
qualifier for pointer/reference. Without this change clang-format parses
many type declarations as multiplications/bitwise-and instead.
I initially considered adding "__capability" as a new clang-format keyword,
but having a list of macros that should be treated as attributes is more
flexible since it can be used e.g. for static analyzer annotations or other language
extensions.
Example: std::vector<foo * __capability> -> std::vector<foo *__capability>
Depends on D86775 (to apply cleanly)
Reviewed By: MyDeveloperDay, jrtc27
Differential Revision: https://reviews.llvm.org/D86782
Summary:
https://bugs.llvm.org/show_bug.cgi?id=46383
When the c preprocessor stringizes tokens, the generated string literals
are affected by the whitespace. This means clang-format can affect
codegen silently, adding spaces and newlines to strings. Practically
speaking, the vast majority of cases will be harmless, only affecting
single identifiers or debug macros.
In the interest of doing no harm in other cases though, this introduces
a blacklist option 'WhitespaceSensitiveMacros', which contains a list of
names of function-like macros whose contents should not be touched by
clang-format, period. Clang-format can't automatically detect these
without a real compile context, so users will have to specify it
explicitly (it still beats clang-format off'ing at every invocation).
Defaults include "STRINGIZE", "PP_STRINGIZE", and "BOOST_PP_STRINGIZE".
Subscribers: kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82620
Summary:
https://bugs.llvm.org/show_bug.cgi?id=33890
This revision allow the microsoft `for each(.... in ...` nonstandard C++ extension which can be used in C++/CLI to be handled as a ForEach macro.
This prevents the breaking between the for and each onto a new line
Reviewed By: JakeMerdichAMD
Subscribers: cfe-commits
Tags: #clang, #clang-format
Differential Revision: https://reviews.llvm.org/D80228
This enables us to intercept changes to the token type via setType(), which
is a precondition for being able to use multi-pass formatting for macro
arguments.
Differential Revision: https://reviews.llvm.org/D67405
Summary:
Disable merging of Type? into a single token.
Merge ?? ?. and ?[ into a single token.
Reviewers: krasimir, MyDeveloperDay
Reviewed By: krasimir
Subscribers: cfe-commits
Tags: #clang-format, #clang
Differential Revision: https://reviews.llvm.org/D75368
Summary:
No longer merge 'name' and ':' into a single token.
Ensure that line breaks cannot be placed before or after a named-argument colon.
Ensure that no space is inserted before a named-argument colon.
Reviewers: krasimir
Reviewed By: krasimir
Subscribers: cfe-commits, MyDeveloperDay
Tags: #clang-format, #clang
Differential Revision: https://reviews.llvm.org/D75244
Summary:
Merge 'argumentName' and ':' into a single token in foo(argumentName: bar).
Add C# named argument as a token type.
Reviewers: krasimir, MyDeveloperDay
Reviewed By: krasimir
Tags: #clang-format
Differential Revision: https://reviews.llvm.org/D74894
Summary: Merge '[', 'target' , ':' into a single token for C# attributes to
prevent the target from being seen as a label.
Reviewers: MyDeveloperDay, krasimir
Reviewed By: krasimir
Tags: #clang-format
Differential Revision: https://reviews.llvm.org/D74043
Summary:
This addresses issues raised in https://bugs.llvm.org/show_bug.cgi?id=44454.
There are outstanding issues with multi-line verbatim strings in C# that will be addressed in a follow-up PR.
Reviewers: krasimir, MyDeveloperDay
Reviewed By: krasimir, MyDeveloperDay
Subscribers: MyDeveloperDay
Tags: #clang-format
Differential Revision: https://reviews.llvm.org/D73492
Summary:
clang-format currently treats the nullish coalescing operator `??` like
the ternary operator. That causes multiple nullish terms to be each
indented relative to the last `??`, as they would in a ternary.
The `??` operator is often used in chains though, and as such more
similar to other binary operators, such as `||`. So to fix the indent,
set its token type to `||`, so it inherits the same treatment.
This opens up the question of operator precedence. However, `??` is
required to be parenthesized when mixed with `||` and `&&`, so this is
not a problem that can come up in syntactically legal code.
Reviewers: krasimir
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73026
Summary:
JavaScript / TypeScript is adding two new operators: the null
propagating operator `?.` and the nullish coalescing operator `??`.
const x = foo ?? 'default';
const z = foo?.bar?.baz;
This change adds support to lex and format both.
Reviewers: krasimir
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D69971