Now that the demangler parenthesizes 'a >> b' inside template
parameters (because of how C++11 parses '>>' there), we don't really
need to add spaces between adjacent template arg closing '>' chars. In
2022, that just looks odd.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D123134
(Excitingly) a fold expression's operators include .* and ->*, but we
failed to demangle them as we categorize those as MemberExprs, not
BinaryExprs.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D123305
GCC emits [some] static symbols with an 'L' mangling, which we attempt
to demangle. But the module mangling changes have exposed that we
were doing so at the wrong level. Such manglings are outside of the
ABI as they are internal-linkage, so a bit of reverse engineering was
needed. This adjusts the demangler along the same lines as the
existing gcc demangler (which is not yet module-aware). 'L' is part
of an unqualified name. As before we merely parse the 'L', and then
ignore it.
Reviewed By: iains
Differential Revision: https://reviews.llvm.org/D123138
Both > and >> expressions need to be parenthesized inside template
argument lists.
Reviewed By: dblaikie, rjmccall
Differential Revision: https://reviews.llvm.org/D122474
The demangler had no concept of operator precedence, and would
parenthesize many more subexpressions than necessary. In particular
it would parenthesize primary-expressions, such as '4', which just
looks strange. It would also parenthesize '>' expressions, just in
case they were inside a template parameter list.
This patch fixes both issues.
* Add operator precedence to the OpInfo structure, and add a
subexpression helper that will parenthesize a lower precedence
subexpression.
* Add a 'greater-than is greater-than' indicator to the output buffer,
so the expression printer knows whether it is immediately inside a
template parameter list (and must therefore parenthesize 'expr >
expr'). This is a counter, so that ...
* Add open and close printers to the output buffer, that increment and
decrement the gt-is-gt indicator.
* Parenthesize comma operators inside comma-separated lists. (probably
a rare case, but still).
This dramatically reduces the extraneous parentheses being printed.
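The approach in the bullets above can be sketched as a toy printer. This is only an illustration under stated assumptions: `Expr`, the `Prec` values, `printExpr` and `printTemplate` are hypothetical names invented for the sketch, not the demangler's real types. Only the ideas come from the description: a precedence per operator, a subexpression helper, and a gt-is-gt counter that the open and close printers adjust.

```cpp
#include <cassert>
#include <string>

// Hypothetical precedence levels for this sketch; a larger value binds tighter.
enum Prec { Relational = 1, Shift = 2, Additive = 3, Multiplicative = 4, Primary = 5 };

struct Expr {
  std::string Text; // literal spelling, or the operator for a binary node
  Prec P;
  const Expr *LHS;  // both operands are null for a primary expression
  const Expr *RHS;
};

struct OutputBuffer {
  std::string Buf;
  // 0 means "directly inside <...>", where a bare '>' would close the list.
  unsigned GtIsGt = 1;
  bool isGtInsideTemplateArgs() const { return GtIsGt == 0; }
  void printOpen(char C = '(') { ++GtIsGt; Buf += C; }
  void printClose(char C = ')') {
    assert(GtIsGt > 0 && "unbalanced close");
    --GtIsGt;
    Buf += C;
  }
};

void print(const Expr &E, OutputBuffer &OB);

// Parenthesize an operand only if it binds less tightly than its parent
// (or equally tightly, when it is the right operand of a left-associative op).
void printAsOperand(const Expr &E, OutputBuffer &OB, Prec Parent, bool ParenOnEqual) {
  bool Paren = E.P < Parent || (ParenOnEqual && E.P == Parent);
  if (Paren) OB.printOpen();
  print(E, OB);
  if (Paren) OB.printClose();
}

void print(const Expr &E, OutputBuffer &OB) {
  if (!E.LHS || !E.RHS) { OB.Buf += E.Text; return; }
  // '>' and '>>' need parentheses only when directly inside template args.
  bool ParenAll = OB.isGtInsideTemplateArgs() && (E.Text == ">" || E.Text == ">>");
  if (ParenAll) OB.printOpen();
  printAsOperand(*E.LHS, OB, E.P, false);
  OB.Buf += " " + E.Text + " ";
  printAsOperand(*E.RHS, OB, E.P, true);
  if (ParenAll) OB.printClose();
}

std::string printExpr(const Expr &E) {
  OutputBuffer OB;
  print(E, OB);
  return OB.Buf;
}

// Print Name<Arg>, zeroing the counter while inside the angle brackets.
std::string printTemplate(const std::string &Name, const Expr &Arg) {
  OutputBuffer OB;
  OB.Buf += Name;
  unsigned Saved = OB.GtIsGt;
  OB.GtIsGt = 0;
  OB.Buf += '<';
  print(Arg, OB);
  OB.Buf += '>';
  OB.GtIsGt = Saved;
  return OB.Buf;
}
```

With these definitions a `>` node prints as `4 > 5` at top level and as `Fn<(4 > 5)>` when it is a template argument, while `4 + 5 * 6` needs no parentheses at all.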
Reviewed By: dblaikie, bruno
Differential Revision: https://reviews.llvm.org/D120905
Add support for module name demangling. We have two new demangler
nodes -- ModuleName and ModuleEntity. The former represents a module
name in a hierarchical fashion. The latter is the combination of a
(name) node and a module name. Because module names and entity
identities use the same substitution encoding, we have to adjust the
flow of how substitutions are handled, and examine the substituted
node to know how to deal with it.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D119933
The demangler doesn't understand 'aw' as an operator name. This adds
the necessary smarts -- you may use this as an operator function name,
but not as an expression operator.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D120143
parseNestedName's main loop allowed parsing a grammar that was more
flexible than the actual grammar. This refactors that to rule out
some more incorrect manglings.
1) The 'L' extension only applies to unqualified-name components, so
check it just there.
2) The 'M' suffix is, AFAICT, removed from the grammar. Rather than
eliminate it, let's parse it after we've parsed a component.
Added some additional bad mangling tests, which are now rejected.
I don't break the 'T' and 'D[tT]' cases out of the loop, even though
they can only appear at first position, as it seems simpler to just
check there is nothing SoFar.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D119542
I discovered some demangler problems:
a) parsing of new expressions was broken, ignoring any 'gs' prefix
b) (when #a is fixed) badly formatted global new expressions
c) formatting of new and delete failed to correctly add whitespace
(a) happens as parseExpr swallows the 'gs' prefix but doesn't pass it
to 'parseNewExpr'. It seems simpler to me to just code the new
expression parsing directly in parseExpr, as is done for delete
expressions.
(b) global new should be rendered something like '::new T' not
'::operator new T'
(c) is resolved by being a bit more careful with whitespace.
Best shown with some examples (don't worry that these symbols are for
impossible instantiations, that's not the point):
Old behaviour:
build/bin/llvm-cxxfilt _ZN2FnIXgsnw_iEEXna_ipiLi4EEEEEvv _ZN2FnIXnwLj4E_iEEXgsnaLj4E_ipiLi4EEEEEvv _ZN2FnIXgsdlLi4EEXdaLi4EEEEvv _ZN2FnIXdlLj4EEXgsdaLj4EEEEvv
void Fn<new int, new[] int(4)>() // No ::new
void Fn<new (4u)int, new[] (4u)int(4)>() // No ::new, poor whitespace
void Fn<::delete4, delete[] 4>() // missing necessary space
void Fn<delete4u, ::delete[] 4u>() // missing necessary space
New behaviour:
build/bin/llvm-cxxfilt _ZN2FnIXgsnw_iEEXna_ipiLi4EEEEEvv _ZN2FnIXnwLj4E_iEEXgsnaLj4E_ipiLi4EEEEEvv _ZN2FnIXgsdlLi4EEXdaLi4EEEEvv _ZN2FnIXdlLj4EEXgsdaLj4EEEEvv
void Fn<::new int, new[] int(4)>()
void Fn<new(4u) int, ::new[](4u) int(4)>()
void Fn<::delete 4, delete[] 4>()
void Fn<delete 4u, ::delete[] 4u>()
Binutils' behaviour:
c++filt _ZN2FnIXgsnw_iEEXna_ipiLi4EEEEEvv _ZN2FnIXnwLj4E_iEEXgsnaLj4E_ipiLi4EEEEEvv _ZN2FnIXgsdlLi4EEXdaLi4EEEEvv _ZN2FnIXdlLj4EEXgsdaLj4EEEEvv
void Fn<::new int, new int(4)>()
void Fn<new (4u) int, ::new (4u) int(4)>()
void Fn<::delete (4), delete[] (4)>()
void Fn<delete (4u), ::delete[] (4u)>()
The new and binutils demanglings are the same modulo some whitespace and optional parens.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D118476
The demangler treats ->* as a BinaryExpr, but .* as a MemberExpr.
That's inconsistent. This makes the former a MemberExpr too.
However, in order to not regress the paren output, MemberExpr::print
is modified to parenthesize the MemberExpr if the operator ends with
'*'. Printing is affected thusly:
Before:
obj.member
obj->member
obj.*member
(obj) ->* (member)
After:
obj.member # Unchanged
obj->member # Unchanged
obj.*(member) # Added paren member operand
obj->*(member) # Removed paren on object operand, less whitespace
The right solution to the paren problem is to add some notion of
precedence (and associativity) to Nodes, but that's a larger change
that would become simpler once the refactoring I'm doing is completed.
FWIW, binutils' demangler's paren algorithm has a small idea of
precedence, and will generally not emit parens when the operand is
unary.
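The 'After' column above can be reproduced with a very small sketch. `printMember` is a hypothetical free function written for illustration, not the real MemberExpr::print; it only encodes the rule stated above, that the member operand is parenthesized when the operator ends with '*'.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: render "Obj Op Member" without extra whitespace,
// parenthesizing the member operand for the pointer-to-member operators.
std::string printMember(const std::string &Obj, const std::string &Op,
                        const std::string &Member) {
  assert(!Op.empty() && "an operator spelling is required");
  bool Paren = Op.back() == '*';
  std::string Out = Obj + Op;
  if (Paren) Out += '(';
  Out += Member;
  if (Paren) Out += ')';
  return Out;
}
```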
Reviewed By: bruno
Differential Revision: https://reviews.llvm.org/D118486
The parsing of nested names is a little lax. This corrects that.
1) The 'L' local name prefix cannot appear before a NestedName -- only
within it. Let's remove that check from parseName, and then adjust
parseUnscopedName to allow it with or without the 'St' prefix.
2) In a nested name, a <template-param>, <decltype> or <substitution>
can only appear as the first element. Let's enforce that. Note I do
not remove these from the loop, to make the change easier to follow
(such a change will come later).
3) Given that, there's no need to special case 'St' outside of the
loop, handle it with the other 'S' elements.
4) There's no need to reset 'EndsWithTemplateArgs' after each
non-template-arg component. Rather, always clear it and then set it
in the template-args case.
5) A template-args cannot immediately follow a template-args.
6) The parsing of a CDtor name with ABITags would attach the tags to
the NestedName node, rather than the CDTor node. This is different to
how ABITags are attached to an unscopedName. Make it consistent.
7) We remain with only CDTor and UnscopedName requiring construction
of a NestedName, so let's drop the PushComponent lambda.
8) Add some tests to catch the new rejected manglings.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D118132
We were dropping the [gs] modifier by parsing it in parseExpr, but not
forwarding it on to parseUnresolvedName. This is the straightforward
fix of forwarding that flag -- parseExpr must see past it.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D118504
The demangler test harness is a little unclear. The failed-demangling
message always causes me to think about 'reality'; changing it to a
simple 'Found' seems clearer.
The expected-to-fail tests abort as soon as one passes, rather than
continuing and then aborting at the end if any passed. This changes
that loop to fail at the end, in a similar manner to the
expected-to-work loop.
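A minimal sketch of the "fail at the end" shape of the expected-to-fail loop. The function name, the predicate parameter and the message text are all hypothetical; the real harness is not reproduced here.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: run every expected-to-fail input, report each one that
// unexpectedly demangles, and return the count so the caller can fail once,
// after the whole list has been checked.
unsigned countUnexpectedPasses(
    const std::vector<std::string> &Bad,
    const std::function<bool(const std::string &)> &Demangles) {
  unsigned Failed = 0;
  for (const std::string &Mangled : Bad) {
    if (Demangles(Mangled)) {
      std::printf("%s was expected to fail, but demangled\n", Mangled.c_str());
      ++Failed; // keep going instead of aborting at the first surprise
    }
  }
  assert(Failed <= Bad.size());
  return Failed;
}
```

A caller would then abort (or return a non-zero exit code) only if the returned count is non-zero, so that one run reports every offending input.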
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D118130
There's some unnecessary code duplication in the parser. This
refactors that and deploys boolean variables to avoid the duplication.
These also happen to help adding module demangling (with an updated
mangling scheme).
1a) The grammar requires some lookahead concerning <template-args>. We
may discover an <unscoped-name> is actually <unscoped-template-name>
<template-args>. (When <unscoped-name> was a substitution, there must
be a following <template-args>.) Refactor parseName to only have one
code path looking for the 'I' indicating <template-args>.
1b) While there I altered the control flow to hold the result in a
variable, rather than tail call. Made it easier to debug (and of
course an optimizer will DTRT here anyway).
2a) An <unscoped-name> can have an St or StL prefix. No need for
completely separate code paths handling the following unqualified-name
though.
2b) Also no need to look for both 'St' and 'StL' separately. Look for
'St' and then conditionally swallow an 'L'.
3) We get a similar issue as #1a when parsing a typeName. Here I just
change the control flow slightly to bring the 'break' out to the end
of the 'S' block and embed the early return inside an if. That's more
in keeping with the code style.
4) Although NFC, there's a new testcase as that's not covered by the
existing demangler tests and is significant in the #1a case above.
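Point 2b can be illustrated with a tiny cursor. `Cursor`, `UnscopedPrefix` and `parseUnscopedPrefix` are hypothetical names for this sketch; only the control flow (look for 'St', then conditionally swallow an 'L') is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical cursor over a mangled string.
struct Cursor {
  std::string Input;
  std::size_t Pos;
  // Consume S if the remaining input starts with it.
  bool consumeIf(const std::string &S) {
    assert(Pos <= Input.size());
    if (Input.compare(Pos, S.size(), S) != 0)
      return false;
    Pos += S.size();
    return true;
  }
};

struct UnscopedPrefix {
  bool IsStd;   // saw 'St'
  bool IsLocal; // saw 'L'
};

// One code path handles 'St', 'StL', 'L' and no prefix at all; the caller
// then parses the unqualified-name that follows in every case.
UnscopedPrefix parseUnscopedPrefix(Cursor &C) {
  UnscopedPrefix P;
  P.IsStd = C.consumeIf("St");
  P.IsLocal = C.consumeIf("L");
  return P;
}
```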
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D117879
We've stopped doing it in libc++ for a while now because these names
would end up rotting as we move things around and copy/paste stuff.
This cleans up all the existing files so as to stop the spreading
as people copy-paste headers around.
A libfuzzer run has discovered some inputs for which the demangler does
not terminate. When minimized, it looks like this: _Zcv1BIRT_EIS1_E
Deciphered:
_Z
cv   - conversion operator
     * result type
1B   - "B"
I    - template args begin
R    - reference type              <.
T_   - forward template reference   |  *
E    - template args end            |  |
                                    |  |
     * parameter type               |  |
I    - template args begin          |  |
S1_  - substitution #1              *  <'
E    - template args end
The reason is: template-parameter refs in conversion operator result type
create forward-references, while substitutions are instantly resolved via
back-references. Together these can create a reference loop. It causes an
infinite loop in ReferenceType::collapse().
I see three possible ways to avoid these loops:
1. check if resolving a forward reference creates a loop and reject the
invalid input (hard to traverse AST at this point)
2. check if a substitution contains a malicious forward reference and
reject the invalid input (hard to traverse AST at this point;
substitutions are quite common: may affect performance; hard to
clearly detect loops at this point)
3. detect loops in ReferenceType::collapse() (cannot reject the input)
This patch implements (3) as seemingly the least-impact change. As a
side effect, such invalid input strings are not rejected and produce
garbage; however, there are already similar guards in the
`if (Printing) return;` checks.
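Option (3) can be sketched on a simplified node type. `Node`, `RefKind` and this free-standing `collapse` are hypothetical stand-ins, not the real ReferenceType code; the sketch assumes a Floyd-style check that compares the current node with the one seen at the halfway point of the walk, and it signals a loop by returning a null pointee.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

enum class RefKind { LValue, RValue };

// Hypothetical simplified node: either a reference to Pointee or a plain type.
struct Node {
  bool IsRef;
  RefKind RK;          // meaningful only when IsRef
  const Node *Pointee; // meaningful only when IsRef
};

// Walk through refs-to-refs, collapsing as we go (any lvalue ref in the chain
// makes the result an lvalue ref). Returns a null pointee if a cycle is found.
std::pair<RefKind, const Node *> collapse(const Node &Start) {
  assert(Start.IsRef && "collapse expects a reference node");
  std::pair<RefKind, const Node *> SoFar(Start.RK, Start.Pointee);
  std::vector<const Node *> Prev; // every pointee visited so far
  for (;;) {
    const Node *N = SoFar.second;
    if (!N || !N->IsRef)
      break;
    SoFar.second = N->Pointee;
    if (N->RK == RefKind::LValue)
      SoFar.first = RefKind::LValue;
    Prev.push_back(SoFar.second);
    // The node seen at half the distance acts as the slow pointer; meeting
    // it again means the chain loops back on itself.
    if (Prev.size() > 1 && SoFar.second == Prev[(Prev.size() - 1) / 2]) {
      SoFar.second = nullptr;
      break;
    }
  }
  return SoFar;
}
```

A printer built on this would emit nothing (or a placeholder) when the pointee comes back null, which matches the note above that such inputs are not rejected.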
Fixes https://llvm.org/PR51407
Differential Revision: https://reviews.llvm.org/D107712
Now that Lit supports regular expressions inside XFAIL & friends, it is
much easier to write Lit annotations based on the triple.
Differential Revision: https://reviews.llvm.org/D104747
This fixes a long standing issue where the triple is not always set
consistently in all configurations. This change also moves the
back-deployment Lit features to using the proper target triple
instead of using something ad-hoc.
This will be necessary for using from-scratch Lit configuration files
in both normal testing and back-deployment testing.
Differential Revision: https://reviews.llvm.org/D102012
Before this patch, we could only link against the back-deployment libc++abi
dylib. This patch allows linking against the just-built libc++abi, but
running against the back-deployment one -- just like we do for libc++.
Also, add XFAIL markup to flag expected errors.
Differential Revision: https://reviews.llvm.org/D91069
The two operations have acted differently since Clang 8, but were
unfortunately mangled the same. The new mangling uses new "vendor
extended expression" syntax proposed in
https://github.com/itanium-cxx-abi/cxx-abi/issues/112
GCC had the same mangling problem, https://gcc.gnu.org/PR88115, and
will hopefully be switching to the same mangling as implemented here.
Additionally, fix the mangling of `__uuidof` to use the new extension
syntax, instead of its previous nonstandard special-case.
Adjusts the demangler accordingly.
Differential Revision: https://reviews.llvm.org/D93922
We used <iostream> in several places where we don't actually need the
full power of <iostream>, and where using basic `std::printf` is enough.
This is better, since `std::printf` can be supported on systems that don't
have a notion of locales, while <iostream> can't.
This is needed when running the tests in Freestanding mode, where main()
isn't treated specially. In Freestanding, main() doesn't get mangled as
extern "C", so whatever runtime we're using fails to find the entry point.
One way to solve this problem is to define a symbol alias from __Z4mainiPPc
to _main, however this requires all definitions of main() to have the same
mangling. Hence this commit.
Summary:
Caught by HWASAN on arm64 Android (which uses ld128 for long double). This
was running the existing fuzzer.
The specific minimized fuzz input to reproduce this is:
__cxa_demangle("1\006ILeeeEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE", 0, 0, 0);
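For trying such inputs by hand, a small wrapper around `abi::__cxa_demangle` from `<cxxabi.h>` may help. `demangleOrEmpty` is a hypothetical helper name; unlike the call above it passes a status pointer, and it assumes a toolchain that ships `<cxxabi.h>` (libc++abi or libstdc++). What the fuzz input itself produces is deliberately not stated here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: return the demangled name, or an empty string when the
// demangler reports failure. The buffer is malloc'ed by __cxa_demangle, so it
// is released with free.
std::string demangleOrEmpty(const char *Mangled) {
  assert(Mangled != nullptr);
  int Status = 0;
  char *Res = abi::__cxa_demangle(Mangled, nullptr, nullptr, &Status);
  std::string Out = (Status == 0 && Res) ? Res : "";
  std::free(Res);
  return Out;
}
```

For a well-formed symbol such as `_Z1fv` this returns `f()`; the minimized fuzz string can be passed the same way when running under a sanitizer.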
Reviewers: eugenis, srhines, #libc_abi!
Subscribers: kristof.beyls, danielkiss, libcxx-commits
Tags: #libc_abi
Differential Revision: https://reviews.llvm.org/D77924
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.
This change is NFC.
This implements demangling support for the mangling extensions specified
in https://github.com/itanium-cxx-abi/cxx-abi/pull/85, much of which is
implemented in Clang r359967 and r371004.
Specifically, this provides demangling for:
* <template-param-decl> in <lambda-sig>
* <template-param> with non-zero level
* lambda-expression literals (not emitted by Clang yet)
* nullptr literals
* string literals
(The final two seem unrelated, but handling them was necessary in order
to disambiguate between lambda expressions and the other forms of
literal for which we have a type but no value.)
When demangling a <lambda-sig>, we form template parameters with no
corresponding argument, so we cannot substitute in the argument in the
demangling. Instead we invent synthetic names for the template
parameters (eg, '[]<typename $T>($T *x)').
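A sketch of how such synthetic names might be generated. Only the '$T' spelling appears in the description above; the '$N' and '$TT' spellings for non-type and template parameters, the index suffix scheme, and the names `TemplateParamKind` and `syntheticName` are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of a lambda's template parameters.
enum class TemplateParamKind { Type, NonType, Template };

// Invent a name for the Index-th parameter of a given kind. Assumed scheme:
// the first one has no suffix, later ones get a zero-based numeric suffix.
std::string syntheticName(TemplateParamKind Kind, unsigned Index) {
  std::string Name;
  switch (Kind) {
  case TemplateParamKind::Type:
    Name = "$T";
    break;
  case TemplateParamKind::NonType:
    Name = "$N"; // assumed spelling
    break;
  case TemplateParamKind::Template:
    Name = "$TT"; // assumed spelling
    break;
  }
  if (Index > 0)
    Name += std::to_string(Index - 1);
  assert(!Name.empty());
  return Name;
}
```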
llvm-svn: 371273
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351648