This is a missing piece for C99 conformance.
This patch handles UCNs by adding a '\\' case to LexTokenInternal and
LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
If the UCN is not syntactically well-formed, we fall back to the old
treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
Because the spelling of an identifier with UCNs still has the UCN in it, we
need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.
Of course, valid code that does *not* use UCNs will see only a minimal
performance hit (checks after each identifier for non-ASCII characters,
checks when converting raw_identifiers to identifiers that they do not
contain UCNs, and checks when getting the spelling of an identifier that it
does not contain a UCN).
This patch also adds basic support for actual UTF-8 in the source. This is
treated almost exactly the same as UCNs except that we consider stray
Unicode characters to be mistakes and offer a fixit to remove them.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@173369 91177308-0d34-0410-b5e6-96231b3b80d8
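The tentative UCN read and the later conversion to UTF-8 can be sketched as follows. This is a minimal illustration, not clang's code: tryReadUCN, appendUTF8 and identifierSpellingToUTF8 are hypothetical names. The sketch checks only UCN syntax, plus the Unicode range so that the encoder stays simple. It ignores the C99 rules about which code points may appear in identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int hexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical helper: tentatively read a UCN at s[pos] (backslash, 'u',
// 4 hex digits; or backslash, 'U', 8 hex digits). On success, store the code
// point and the UCN's length in characters. If it is not well-formed, return
// false so the caller can fall back to treating the backslash as stray.
bool tryReadUCN(const std::string &s, std::size_t pos, std::uint32_t &codePoint,
                std::size_t &length) {
  if (pos + 1 >= s.size() || s[pos] != '\\') return false;
  std::size_t numDigits = 0;
  if (s[pos + 1] == 'u')
    numDigits = 4;
  else if (s[pos + 1] == 'U')
    numDigits = 8;
  else
    return false;
  if (pos + 2 + numDigits > s.size()) return false;
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < numDigits; ++i) {
    int digit = hexDigitValue(s[pos + 2 + i]);
    if (digit < 0) return false;
    value = (value << 4) | static_cast<std::uint32_t>(digit);
  }
  // Sketch assumption: reject values beyond U+10FFFF so appendUTF8 never
  // sees them. Identifier-specific range checks are omitted.
  if (value > 0x10FFFF) return false;
  codePoint = value;
  length = 2 + numDigits;
  return true;
}

// Append the UTF-8 encoding of a code point (assumed <= 0x10FFFF).
void appendUTF8(std::uint32_t cp, std::string &out) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Convert an identifier's spelling, which may still contain UCNs, into the
// UTF-8 form that would be used to look the identifier up.
std::string identifierSpellingToUTF8(const std::string &spelling) {
  std::string result;
  for (std::size_t i = 0; i < spelling.size();) {
    std::uint32_t cp = 0;
    std::size_t len = 0;
    if (spelling[i] == '\\' && tryReadUCN(spelling, i, cp, len)) {
      appendUTF8(cp, result);
      i += len;
    } else {
      result += spelling[i++]; // ordinary character (or malformed UCN)
    }
  }
  return result;
}
```

With this sketch, the spelling caf\u00e9 maps to the UTF-8 bytes for "café". A malformed sequence such as a \u with only two hex digits is copied through unchanged.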
uncovered.
This required manually correcting all of the incorrect main-module
headers I could find, and running the new llvm/utils/sort_includes.py
script over the files.
I also manually added quite a few missing headers that were uncovered by
shuffling the order or moving headers up to be main-module-headers.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@169237 91177308-0d34-0410-b5e6-96231b3b80d8
string literal needs cleaning (because it contains line-splicing in the
encoding prefix or in the ud-suffix), do not clean the section between the
double-quotes -- that's the "raw" bit!
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@168776 91177308-0d34-0410-b5e6-96231b3b80d8
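The rule can be illustrated with a minimal sketch under a simplified token model. Splices are removed before the opening quote and after the closing quote, and the raw part is copied untouched. cleanRawStringSpelling and appendWithoutSplices are hypothetical names, and only backslash-LF and backslash-CR-LF splices are handled. The sketch relies on the fact that neither an encoding prefix nor a ud-suffix can contain a double-quote, so the first and last quotes bound the raw part.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append s[begin, end) to out, dropping line splices
// (a backslash immediately followed by LF, or by CR LF).
static void appendWithoutSplices(const std::string &s, std::size_t begin,
                                 std::size_t end, std::string &out) {
  for (std::size_t i = begin; i < end; ++i) {
    if (s[i] == '\\' && i + 1 < end && s[i + 1] == '\n') {
      i += 1; // skip the backslash; the loop increment skips the LF
    } else if (s[i] == '\\' && i + 2 < end && s[i + 1] == '\r' &&
               s[i + 2] == '\n') {
      i += 2; // skip backslash and CR; the loop increment skips the LF
    } else {
      out += s[i];
    }
  }
}

// Hypothetical cleaner for a raw string literal token whose prefix or
// ud-suffix contains splices. Only those two parts are cleaned.
std::string cleanRawStringSpelling(const std::string &spelling) {
  std::size_t open = spelling.find('"');
  std::size_t close = spelling.rfind('"');
  if (open == std::string::npos || open == close)
    return spelling; // not a raw string literal this sketch understands
  std::string out;
  appendWithoutSplices(spelling, 0, open, out);   // encoding prefix, cleaned
  out.append(spelling, open, close - open + 1);    // quotes and raw part, verbatim
  appendWithoutSplices(spelling, close + 1, spelling.size(), out); // ud-suffix
  return out;
}
```

A splice inside the parentheses therefore survives, exactly as the raw string's contents require.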
This makes LexCharConstant() look more like LexStringLiteral(), which doesn't
have this bug. Add tests for eof after \ for several other cases.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@168269 91177308-0d34-0410-b5e6-96231b3b80d8
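The shape of the fix can be sketched as a scanning loop. It checks for the end of the buffer before skipping the character that follows a backslash. scanCharConstant is a hypothetical stand-in, not LexCharConstant() itself, and it uses an explicit end pointer for simplicity.

```cpp
#include <cassert>

// Hypothetical sketch: scan a character constant's body, starting just after
// the opening quote. Returns the position just past the closing quote, or
// nullptr if the constant is unterminated.
const char *scanCharConstant(const char *cur, const char *end) {
  while (cur != end) {
    char c = *cur++;
    if (c == '\'')
      return cur; // closing quote
    if (c == '\n' || c == '\r')
      return nullptr; // newline before the closing quote
    if (c == '\\') {
      // The backslash may be the last character in the buffer: check for the
      // end before skipping the escaped character, rather than stepping past it.
      if (cur == end)
        return nullptr;
      ++cur;
    }
  }
  return nullptr; // reached end of buffer without a closing quote
}
```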
don't recursively continue lexing.
This avoids a stack overflow with a sequence of many empty #includes.
rdar://11988695
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@167801 91177308-0d34-0410-b5e6-96231b3b80d8
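The change amounts to turning a recursive call into a loop. Here is a hypothetical model of it, assuming each included file is reduced to a list of tokens; FileLexer and lexNext are illustrative names, not clang's. With recursion, every empty #include being left would cost one stack frame, while the loop keeps the depth constant.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: an included file reduced to its tokens.
struct FileLexer {
  std::vector<std::string> tokens;
  std::size_t pos = 0;
};

// Return the next token, or "" once all input is exhausted. When the current
// file runs out, pop back to the includer and loop instead of calling
// lexNext() again, so a run of empty #includes costs no stack depth.
std::string lexNext(std::vector<FileLexer> &includeStack) {
  while (!includeStack.empty()) {
    FileLexer &top = includeStack.back();
    if (top.pos < top.tokens.size())
      return top.tokens[top.pos++];
    includeStack.pop_back(); // end of this file: resume the includer
  }
  return "";
}
```

In this model, even with 100000 empty files stacked on top of the main file, lexNext returns the main file's next token without any recursion.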
undefined behaviour, and move the diagnostic for '' from an Error into
an ExtWarn in this group. This is important for some users of the preprocessor,
and is necessary for gcc compatibility.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@159335 91177308-0d34-0410-b5e6-96231b3b80d8
* Removed docs for Lexer::makeFileCharRange from Lexer.cpp, as they're in
the header file;
* Reworked the documentation for SkipBlockComment so that it doesn't confuse
Doxygen's comment parsing;
* Added another summary with \brief markup.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158618 91177308-0d34-0410-b5e6-96231b3b80d8
1. Teach Lexer that pragma lexers are like macro expansions at EOF.
2. Treat pragmas like #define/#undef when printing.
3. If we just printed a directive, add a newline before any more tokens.
(4. Miscellaneous cleanup in PrintPreprocessedOutput.cpp)
PR10594 and <rdar://problem/11562490> (two separate related problems)
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158571 91177308-0d34-0410-b5e6-96231b3b80d8
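Point 3 can be illustrated with a toy printer. Item and printPreprocessed are hypothetical. The model assumes a directive is emitted without its trailing newline, so the printer has to end that line before printing the next token. In this model, skipping that step would glue the token onto the directive line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical output item: a directive line (printed without its trailing
// newline) or an ordinary token.
struct Item {
  bool isDirective;
  std::string text;
};

std::string printPreprocessed(const std::vector<Item> &items) {
  std::string out;
  bool justPrintedDirective = false;
  for (const Item &item : items) {
    if (item.isDirective) {
      if (!out.empty() && out.back() != '\n')
        out += '\n'; // a directive must start its own line
      out += item.text;
      justPrintedDirective = true;
    } else {
      if (justPrintedDirective) {
        out += '\n'; // end the directive line before any more tokens
        justPrintedDirective = false;
      } else if (!out.empty() && out.back() != '\n') {
        out += ' ';
      }
      out += item.text;
    }
  }
  return out;
}
```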
modes. For languages other than C99/C11, this isn't quite a conforming
extension, and for C++11, it breaks some reasonable code containing
user-defined literals.
In languages which don't officially have hexfloats, pare back this extension
to only apply in cases where the token starts with 0x and does not contain an
underscore. The extension is still not quite conforming, but it's a lot closer
now.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158487 91177308-0d34-0410-b5e6-96231b3b80d8
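The narrowed rule can be written as a small predicate. continueAsHexFloat is a hypothetical helper that decides whether a '+' or '-' after 'p'/'P' stays part of the numeric token. In C99/C11 it always does. Otherwise the token must start with 0x and contain no underscore, so a C++11 literal such as 0x1_p+1 lexes as 0x1_p followed by +1.

```cpp
#include <cassert>
#include <string>

// Hypothetical: 'spelling' is the number lexed so far and 'next' the
// following character. Returns true if 'next' continues the number as the
// sign of a hexfloat exponent.
bool continueAsHexFloat(const std::string &spelling, char next,
                        bool langHasHexFloats) {
  if (spelling.empty() || (next != '+' && next != '-'))
    return false;
  char prev = spelling.back();
  if (prev != 'p' && prev != 'P')
    return false;
  if (langHasHexFloats)
    return true; // C99/C11: always part of the number
  // Extension mode: only when it really looks like a hexfloat...
  bool startsWith0x = spelling.size() >= 2 && spelling[0] == '0' &&
                      (spelling[1] == 'x' || spelling[1] == 'X');
  // ...and not when an underscore suggests a ud-suffix.
  return startsWith0x && spelling.find('_') == std::string::npos;
}
```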
This was a problem for people who write 'return(result);'
Also fix ARCMT's corresponding code, though there's no test case for this
because implicit casts like this are rejected by the migrator for being
ambiguous, and explicit casts have no problem.
<rdar://problem/11577346>
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158130 91177308-0d34-0410-b5e6-96231b3b80d8
The member variable is always "LangOpts" and the member function is always "getLangOpts".
Reviewed by Chris Lattner
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152536 91177308-0d34-0410-b5e6-96231b3b80d8
starting with an underscore is ill-formed.
Since this rule rejects programs that were using <inttypes.h>'s macros, recover
from this error by treating the ud-suffix as a separate preprocessing-token,
with a DefaultError ExtWarn. The approach of treating such cases as two tokens
is under discussion for standardization, but is in any case a conforming
extension and allows existing codebases to keep building while the committee
makes up its mind.
Reword the warning on the definition of literal operators not starting with
underscores (which are, strangely, legal) to more explicitly state that such
operators can't be called by literals. Remove the special-case diagnostic for
hexfloats, since it was both triggering in the wrong cases and incorrect.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152287 91177308-0d34-0410-b5e6-96231b3b80d8
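The recovery can be sketched for a plain string literal without escapes. lexStringWithSuffix is a hypothetical name. An identifier immediately after the closing quote is kept as a ud-suffix only if it starts with an underscore. Otherwise the sketch sets a flag standing in for the DefaultError ExtWarn and returns the identifier as a separate token. This is the "%"PRId64 pattern from <inttypes.h>, where the split lets PRId64 be expanded as a macro.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: src starts with a string literal (no escapes, no prefix)
// that may be directly followed by an identifier. 'warned' stands in for
// the DefaultError ExtWarn.
std::vector<std::string> lexStringWithSuffix(const std::string &src,
                                             bool &warned) {
  warned = false;
  std::size_t close = src.find('"', 1);
  if (close == std::string::npos)
    return {src}; // unterminated; not handled in this sketch
  std::size_t end = close + 1;
  while (end < src.size() &&
         (std::isalnum(static_cast<unsigned char>(src[end])) || src[end] == '_'))
    ++end;
  std::string literal = src.substr(0, close + 1);
  std::string suffix = src.substr(close + 1, end - close - 1);
  if (suffix.empty())
    return {literal};
  if (suffix[0] == '_')
    return {literal + suffix}; // a ud-suffix: one token
  warned = true;               // diagnose, then recover by splitting
  return {literal, suffix};
}
```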
identifiers, in cases where those identifiers would be treated as
user-defined literal suffixes in C++11.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152198 91177308-0d34-0410-b5e6-96231b3b80d8
grammar requires a string-literal and not a user-defined-string-literal. The
two constructs are still represented by the same TokenKind, in order to prevent
a combinatorial explosion of different kinds of token. A flag on Token tracks
whether a ud-suffix is present, in order to prevent clients from needing to look
at the token's spelling.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152098 91177308-0d34-0410-b5e6-96231b3b80d8
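Here is a simplified model of the single-kind-plus-flag design. Token, TokenKind and isPlainStringLiteral are hypothetical stand-ins, not clang's classes. The point is that a grammar rule requiring a plain string-literal, such as the one in a linkage specification like extern "C", can test one bit instead of re-reading the token's spelling.

```cpp
#include <cassert>
#include <cstdint>

enum class TokenKind { StringLiteral, Identifier };

// Hypothetical model: plain and user-defined string literals share one
// kind, and a flag bit records whether a ud-suffix is present.
class Token {
public:
  enum Flag : std::uint8_t { HasUDSuffix = 1 << 0 };

  explicit Token(TokenKind kind) : kind_(kind) {}
  void setFlag(Flag flag) { flags_ = static_cast<std::uint8_t>(flags_ | flag); }
  bool hasUDSuffix() const { return (flags_ & HasUDSuffix) != 0; }
  TokenKind kind() const { return kind_; }

private:
  TokenKind kind_;
  std::uint8_t flags_ = 0;
};

// Rejects user-defined string literals without looking at the spelling.
bool isPlainStringLiteral(const Token &tok) {
  return tok.kind() == TokenKind::StringLiteral && !tok.hasUDSuffix();
}
```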
kinds as the underlying string literals, and we silently drop the ud-suffix;
those issues will be fixed by subsequent patches.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152012 91177308-0d34-0410-b5e6-96231b3b80d8
of macro arguments.
For "MAC1( MAC2(foo) )" and the location of the 'foo' token, it would return
"MAC1" instead of "MAC2".
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148704 91177308-0d34-0410-b5e6-96231b3b80d8
start/end location.
Callers commonly need it after calling the function; returning it this way
avoids recalculating it.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148479 91177308-0d34-0410-b5e6-96231b3b80d8
re-computed rather than the variables be re-used just after the assert.
Just use the variables since we have them already. Fixes an unused
variable warning.
Also fix an 80-column violation.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148212 91177308-0d34-0410-b5e6-96231b3b80d8
\<newline><newline>
don't consume the second newline.
Thanks to David Blaikie for pointing out the crash!
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@147138 91177308-0d34-0410-b5e6-96231b3b80d8
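The distinction can be sketched as a helper that measures the newline following a backslash. spliceNewlineLength is hypothetical. In this sketch, a CR LF or LF CR pair counts as one newline, but two identical newline characters are two separate lines, so only the first one is consumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: p points just past a backslash. Returns how many
// characters form the newline of the line splice (0 if there is none).
std::size_t spliceNewlineLength(const char *p, const char *end) {
  if (p == end || (*p != '\n' && *p != '\r'))
    return 0;
  // A mixed pair (CR LF or LF CR) is one newline; LF LF or CR CR is not.
  if (p + 1 != end && (p[1] == '\n' || p[1] == '\r') && p[1] != p[0])
    return 2;
  return 1;
}
```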