217 Commits

Author SHA1 Message Date
Jordan Rose c7629d9415 Handle universal character names and Unicode characters outside of literals.
This is a missing piece for C99 conformance.

This patch handles UCNs by adding a '\\' case to LexTokenInternal and
LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
If the UCN is not syntactically well-formed, we fall back to the old
treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
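The tentative read with fallback described above can be sketched as follows. This is an illustrative sketch, not Clang's code. `UCNSketch`, `hexDigitValue`, and `tryReadUCNSketch` are hypothetical names. The sketch checks only syntax. The separate C99 rules on which code points may appear in identifiers are not implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// A successfully read UCN (hypothetical type for this sketch).
struct UCNSketch {
  std::uint32_t codePoint;  // value of the hex digits
  std::size_t length;       // characters consumed, including the backslash
};

// Value of a hex digit, or -1 if c is not one.
int hexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Tentatively read \uXXXX or \UXXXXXXXX starting at text[pos]. Returns
// std::nullopt when the UCN is not syntactically well-formed, so the caller
// can fall back to lexing '\' on its own, followed by an identifier.
std::optional<UCNSketch> tryReadUCNSketch(std::string_view text,
                                          std::size_t pos) {
  if (pos + 1 >= text.size() || text[pos] != '\\') return std::nullopt;
  std::size_t numDigits = 0;
  if (text[pos + 1] == 'u') {
    numDigits = 4;
  } else if (text[pos + 1] == 'U') {
    numDigits = 8;
  } else {
    return std::nullopt;
  }
  if (pos + 2 + numDigits > text.size()) return std::nullopt;
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < numDigits; ++i) {
    int digit = hexDigitValue(text[pos + 2 + i]);
    if (digit < 0) return std::nullopt;  // not well-formed: fall back
    value = (value << 4) | static_cast<std::uint32_t>(digit);
  }
  return UCNSketch{value, 2 + numDigits};
}
```

For example, `\user` has too few characters after `\u` to form a UCN. The read therefore fails, and the old treatment applies: a backslash followed by the identifier `user`.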

Because the spelling of an identifier with UCNs still has the UCN in it, we
need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.
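Converting a UCN-containing spelling to UTF-8 amounts to decoding each escape and re-encoding the code point. A minimal sketch follows. It assumes every UCN in the spelling was already validated by the lexer. `appendUTF8` and `identifierSpellingToUTF8` are hypothetical helpers, not the body of `Preprocessor::LookUpIdentifierInfo`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of cp to out. Assumes cp is a valid Unicode
// scalar value (at most 0x10FFFF and not a surrogate).
void appendUTF8(std::uint32_t cp, std::string &out) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Rewrite an identifier spelling so that each \uXXXX or \UXXXXXXXX becomes
// UTF-8. Assumes the lexer already validated every UCN in the spelling.
std::string identifierSpellingToUTF8(const std::string &spelling) {
  std::string out;
  std::size_t i = 0;
  while (i < spelling.size()) {
    if (spelling[i] == '\\' && i + 1 < spelling.size() &&
        (spelling[i + 1] == 'u' || spelling[i + 1] == 'U')) {
      std::size_t numDigits = spelling[i + 1] == 'u' ? 4 : 8;
      auto cp = static_cast<std::uint32_t>(
          std::stoul(spelling.substr(i + 2, numDigits), nullptr, 16));
      appendUTF8(cp, out);
      i += 2 + numDigits;
    } else {
      out += spelling[i++];  // ordinary character: copy as-is
    }
  }
  return out;
}
```

With this conversion, `caf\u00e9` and a literally spelled UTF-8 `café` map to the same byte sequence, so both can resolve to the same identifier.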

Of course, valid code that does *not* use UCNs will see only a very minimal
performance hit (checks after each identifier for non-ASCII characters,
checks when converting raw_identifiers to identifiers that they do not
contain UCNs, and checks when getting the spelling of an identifier that it
does not contain a UCN).

This patch also adds basic support for actual UTF-8 in the source. This is
treated almost exactly the same as UCNs except that we consider stray
Unicode characters to be mistakes and offer a fixit to remove them.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@173369 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-24 20:50:46 +00:00
Dmitri Gribenko cfa88f8939 Remove useless 'llvm::' qualifier from names like StringRef and others that are
brought into 'clang' namespace by clang/Basic/LLVM.h

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@172323 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-12 19:30:44 +00:00
Argyrios Kyrtzidis d93335c43f Pull the bulk of Lexer::MeasureTokenLength() out into a new function,
Lexer::getRawToken().

No functionality change.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@171771 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-07 19:16:18 +00:00
Richard Smith 80ad52f327 s/CPlusPlus0x/CPlusPlus11/g
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@171367 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-02 11:42:31 +00:00
Chandler Carruth 55fc873017 Sort all of Clang's files under 'lib', and fix up the broken headers
uncovered.

This required manually correcting all of the incorrect main-module
headers I could find, and running the new llvm/utils/sort_includes.py
script over the files.

I also manually added quite a few missing headers that were uncovered by
shuffling the order or moving headers up to be main-module-headers.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@169237 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-04 09:13:33 +00:00
Richard Smith 30cddaec99 Teach Lexer::getSpelling about raw string literals. Specifically, if a raw
string literal needs cleaning (because it contains line-splicing in the
encoding prefix or in the ud-suffix), do not clean the section between the
double-quotes -- that's the "raw" bit!
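The rule above can be sketched as follows. Line splices are removed before the first `"` (the encoding prefix) and after the last `"` (the ud-suffix), and everything between the quotes is kept byte-for-byte. Finding the last `"` is enough because a ud-suffix cannot contain one. This is a simplified sketch with hypothetical function names. It handles only a backslash immediately followed by `\n`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Remove backslash-newline line splices from s. Only "\\\n" is handled here;
// "\\\r\n" and whitespace before the newline are left out of this sketch.
std::string removeLineSplices(const std::string &s) {
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == '\\' && i + 1 < s.size() && s[i + 1] == '\n') {
      ++i;  // together with the loop increment, skips backslash and newline
      continue;
    }
    out += s[i];
  }
  return out;
}

// Clean the spelling of a raw string literal token: splices in the encoding
// prefix and in the ud-suffix are removed, but the raw section between the
// double-quotes is left untouched.
std::string cleanRawStringSpelling(const std::string &spelling) {
  std::size_t open = spelling.find('"');
  std::size_t close = spelling.rfind('"');
  if (open == std::string::npos || open == close) {
    return removeLineSplices(spelling);  // not a complete raw literal
  }
  return removeLineSplices(spelling.substr(0, open)) +
         spelling.substr(open, close - open + 1) +
         removeLineSplices(spelling.substr(close + 1));
}
```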

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@168776 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-28 07:29:00 +00:00
Nico Weber 6d926ae667 Fix crash on end-of-file after \ in a char literal, fixes PR14369.
This makes LexCharConstant() look more like LexStringLiteral(), which doesn't
have this bug. Add tests for eof after \ for several other cases.
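The class of bug fixed here is skipping the character after a `\` without first checking for end of input. A bounds-checked scanner over a `std::string_view` might look like the sketch below. It is a hypothetical illustration, not `LexCharConstant()`. Clang's lexer works on NUL-terminated buffers rather than sized views.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Scan a character literal whose opening quote is at buf[pos - 1].
// Returns the index one past the closing quote, or std::nullopt if the
// literal is unterminated (a newline or the end of the buffer comes first).
std::optional<std::size_t> scanCharLiteralSketch(std::string_view buf,
                                                 std::size_t pos) {
  while (pos < buf.size()) {
    char c = buf[pos++];
    if (c == '\'') return pos;
    if (c == '\n') return std::nullopt;
    if (c == '\\') {
      // Check before skipping the escaped character: if '\' is the last
      // byte, skipping blindly would run past the end of the buffer.
      if (pos >= buf.size()) return std::nullopt;
      ++pos;
    }
  }
  return std::nullopt;
}
```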

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@168269 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-17 20:25:54 +00:00
Eli Friedman 35a2b798ef Fix an assertion failure printing the unused-label fixit in files using CRLF line endings. <rdar://problem/12639047>.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@167900 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-14 01:28:38 +00:00
Daniel Dunbar 6e64973789 Revert r167801, "[preprocessor] When #including something that contributes no
tokens at all,". This change broke External/Nurbs in LLVM test-suite.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@167858 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-13 19:12:37 +00:00
Nico Weber dd81731d82 UCNs in char literals are done (in LiteralSupport), remove FIXME. Expand UCN FIXME in LexNumericConstant.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@167818 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-13 06:25:15 +00:00
Argyrios Kyrtzidis 4d10b40ea8 [preprocessor] When #including something that contributes no tokens at all,
don't recursively continue lexing.

This avoids a stack overflow with a sequence of many empty #includes.
rdar://11988695

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@167801 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-13 01:03:15 +00:00
Argyrios Kyrtzidis 3185d4ac30 In Lexer::LexTokenInternal, avoid code duplication; no functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@167800 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-13 01:02:40 +00:00
Nico Weber bb23628148 s/BCPLComment/LineComment/
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@167690 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-11 07:02:14 +00:00
Argyrios Kyrtzidis 1cb7142b66 Take into account that there may be a BOM at the beginning of the file,
when computing the size of the precompiled preamble.
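The adjustment can be sketched as follows, assuming (hypothetically) that the preamble length is measured from where lexing starts, just after any byte order mark. The BOM's three bytes must then be added back for the length to be valid as a prefix of the raw file buffer. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Length in bytes of a UTF-8 byte order mark at the start of buf (0 if none).
std::size_t utf8BOMLength(std::string_view buf) {
  constexpr std::string_view kBOM = "\xEF\xBB\xBF";
  return buf.substr(0, kBOM.size()) == kBOM ? kBOM.size() : 0;
}

// Hypothetical helper: convert a preamble length measured after the BOM
// into a length measured from the start of the file buffer.
std::size_t preambleSizeInFile(std::string_view fileBuf,
                               std::size_t sizeAfterBOM) {
  return utf8BOMLength(fileBuf) + sizeAfterBOM;
}
```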

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@166659 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-25 01:51:45 +00:00
Dmitri Gribenko 374b3837d6 StringRef'ize Preprocessor::CreateString().
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@164555 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-24 21:07:17 +00:00
Roman Divacky 31ba613537 Dont cast away const needlessly. Found by gcc48 -Wcast-qual.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@163325 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-06 15:59:27 +00:00
Eli Friedman e506f8a410 Make a bunch of methods on Lexer private.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@162970 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-31 02:29:37 +00:00
Dmitri Gribenko 60b202c5eb Lexer: remove dead stores. Found by Clang static analyzer!
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@160973 91177308-0d34-0410-b5e6-96231b3b80d8
2012-07-30 17:59:40 +00:00
Richard Smith b6ebd44902 Add warning flag -Winvalid-pp-token for preprocessing-tokens which have
undefined behaviour, and move the diagnostic for '' from an Error into
an ExtWarn in this group. This is important for some users of the preprocessor,
and is necessary for gcc compatibility.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@159335 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-28 07:51:56 +00:00
James Dennett ec7699398b Documentation cleanup:
* Removed docs for Lexer::makeFileCharRange from Lexer.cpp, as they're in
  the header file;
* Reworked the documentation for SkipBlockComment so that it doesn't confuse
  Doxygen's comment parsing;
* Added another summary with \brief markup.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158618 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-17 03:40:43 +00:00
Jordan Rose 0cdd1fe3ec [-E] Emit a rewritten _Pragma on its own line.
1. Teach Lexer that pragma lexers are like macro expansions at EOF.
2. Treat pragmas like #define/#undef when printing.
3. If we just printed a directive, add a newline before any more tokens.
(4. Miscellaneous cleanup in PrintPreprocessedOutput.cpp)

PR10594 and <rdar://problem/11562490> (two separate related problems)

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158571 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-15 23:33:51 +00:00
James Dennett a05369fbb7 Documentation cleanup: escape backslashes in Doxygen comments.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158552 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-15 21:36:54 +00:00
Richard Smith d2e95d1538 PR12717: Clang supports hexadecimal floating-point literals in all language
modes. For languages other than C99/C11, this isn't quite a conforming
extension, and for C++11, it breaks some reasonable code containing
user-defined literals.

In languages which don't officially have hexfloats, pare back this extension
to only apply in cases where the token starts with 0x and does not contain an
underscore. The extension is still not quite conforming, but it's a lot closer
now.
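The narrowed rule can be written as a predicate. The context is that the lexer has just consumed `p` or `P` inside a pp-number and sees `+` or `-`, and must decide whether the sign continues the token. This is a hypothetical sketch of the decision described above, not Clang's `LexNumericConstant`.

```cpp
#include <cassert>
#include <string_view>

// tokenSoFar is the pp-number spelling up to and including the 'p'/'P'.
// Returns true if the following '+' or '-' should stay part of the token.
bool continuesHexFloatSketch(std::string_view tokenSoFar,
                             bool languageHasHexFloats) {
  // C99/C11: "p+" and "p-" are part of the pp-number grammar.
  if (languageHasHexFloats) return true;
  // Elsewhere, only treat it as a hexfloat exponent if the token starts
  // with 0x (or 0X) and contains no underscore, since an underscore
  // suggests a ud-suffix such as _p.
  bool startsWith0x = tokenSoFar.size() >= 2 && tokenSoFar[0] == '0' &&
                      (tokenSoFar[1] == 'x' || tokenSoFar[1] == 'X');
  bool hasUnderscore = tokenSoFar.find('_') != std::string_view::npos;
  return startsWith0x && !hasUnderscore;
}
```

Under this rule, `0x1_p+1` in C++11 lexes as `0x1_p`, `+`, `1`, so a literal operator with suffix `_p` can be used. `0x1p+1` remains a single hexfloat token.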

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158487 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-15 05:07:49 +00:00
David Blaikie 1a8354659a Fix PR13065.
This condition (added in r158093) was overly conservative.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158483 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-15 00:47:13 +00:00
Dmitri Gribenko 092bf67e5c Correct method name in comment: from LexRawToken to LexFromRawLexer, according
to a change done long ago in r57393.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158243 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-08 23:19:37 +00:00
Jordan Rose d880b3aa6d Insert a space if necessary when suggesting CFBridgingRetain/Release.
This was a problem for people who write 'return(result);'

Also fix ARCMT's corresponding code, though there's no test case for this
because implicit casts like this are rejected by the migrator for being
ambiguous, and explicit casts have no problem.

<rdar://problem/11577346>

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158130 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-07 01:10:31 +00:00
David Blaikie 8c0b3787e7 Add a -rewrite-includes option, which is similar to -rewrite-macros, but only expands #include directives.
Patch contributed by Lubos Lunak (l.lunax@suse.cz).
Review by Matt Beaumont-Gay (matthewbg@google.com).

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158093 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-06 18:52:13 +00:00
David Blaikie 80d7c52653 Escape \n and \r in doxycomment.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158091 91177308-0d34-0410-b5e6-96231b3b80d8
2012-06-06 18:43:20 +00:00
Benjamin Kramer 3093b20d82 Lexer::ReadToEndOfLine: Only build the string if it's actually used and do so in a less malloc-intensive way.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@157064 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-18 19:32:16 +00:00
Seth Cantrell 5e6c3f0397 Support -Wc++98-compat-pedantic as requested:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120409/056126.html

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@154655 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-13 03:43:23 +00:00
Seth Cantrell d55522287e C++11 no longer requires files to end with a newline
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@154643 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-13 01:00:34 +00:00
Francois Pichet b0afd5df3c ext_reserved_user_defined_literal must not default to Error in MicrosoftMode. Hence create ext_ms_reserved_user_defined_literal that doesn't default to Error; otherwise MSVC headers won't parse.
Fixes PR12383.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@154273 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-07 23:09:23 +00:00
David Blaikie 4e4d08403c Unify naming of LangOptions variable/get function across the Clang stack (Lex to AST).
The member variable is always "LangOpts" and the member function is always "getLangOpts".

Reviewed by Chris Lattner

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152536 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-11 07:00:24 +00:00
Richard Smith 2fb4ae3682 Implement C++11 [lex.ext]p10 for string and character literals: a ud-suffix not
starting with an underscore is ill-formed.

Since this rule rejects programs that were using <inttypes.h>'s macros, recover
from this error by treating the ud-suffix as a separate preprocessing-token,
with a DefaultError ExtWarn. The approach of treating such cases as two tokens
is under discussion for standardization, but is in any case a conforming
extension and allows existing codebases to keep building while the committee
makes up its mind.
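The rule and its recovery can be sketched as a classification of whatever immediately follows a literal's closing quote. The names are hypothetical. The motivating case is code such as `printf("%"PRId64"\n", x)`, where `PRId64` directly follows a string literal with no intervening space.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

enum class SuffixKind {
  None,                  // nothing identifier-like follows the literal
  UserDefined,           // "..."_x: a valid ud-suffix
  ReservedSeparateToken  // "..."x: ill-formed; recover by splitting
};

// Classify the text that immediately follows the closing quote of a string
// or character literal, under the C++11 rule that a ud-suffix must start
// with an underscore.
SuffixKind classifySuffixSketch(std::string_view after) {
  if (after.empty()) return SuffixKind::None;
  unsigned char c = static_cast<unsigned char>(after[0]);
  if (c == '_') return SuffixKind::UserDefined;
  if (std::isalpha(c)) {
    // Recovery: diagnose (a DefaultError ExtWarn) and lex the identifier
    // as its own preprocessing-token, so "%"PRId64 can still expand and
    // concatenate as it did before C++11.
    return SuffixKind::ReservedSeparateToken;
  }
  return SuffixKind::None;
}
```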

Reword the warning on the definition of literal operators not starting with
underscores (which are, strangely, legal) to more explicitly state that such
operators can't be called by literals. Remove the special-case diagnostic for
hexfloats, since it was both triggering in the wrong cases and incorrect.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152287 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-08 02:39:21 +00:00
Richard Smith e816c717d4 Add -Wc++11-compat warning for string and character literals followed by
identifiers, in cases where those identifiers would be treated as
user-defined literal suffixes in C++11.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152198 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-07 03:13:00 +00:00
Richard Smith 99831e4677 User-defined literals: reject string and character UDLs in all places where the
grammar requires a string-literal and not a user-defined-string-literal. The
two constructs are still represented by the same TokenKind, in order to prevent
a combinatorial explosion of different kinds of token. A flag on Token tracks
whether a ud-suffix is present, in order to prevent clients from needing to look
at the token's spelling.
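The design choice here is one token kind plus a flag, rather than a separate kind for every literal/suffix combination. It can be illustrated with simplified stand-in types. `TokKindSketch`, `TokenSketch`, and `isPlainStringLiteral` are hypothetical and far simpler than Clang's `Token`.

```cpp
#include <cassert>

// Simplified token: string literals with and without a ud-suffix share one
// kind; a flag records whether a suffix is present.
enum class TokKindSketch { StringLiteral, Identifier };

struct TokenSketch {
  TokKindSketch kind;
  bool hasUDSuffix;
};

// Where the grammar requires a plain string-literal (for example the "C" in
// extern "C"), reject user-defined-string-literals by checking the flag
// rather than re-examining the token's spelling.
bool isPlainStringLiteral(const TokenSketch &tok) {
  return tok.kind == TokKindSketch::StringLiteral && !tok.hasUDSuffix;
}
```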

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152098 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-06 03:21:47 +00:00
Richard Smith 5cc2c6eb67 Lexing support for user-defined literals. Currently these lex as the same token
kinds as the underlying string literals, and we silently drop the ud-suffix;
those issues will be fixed by subsequent patches.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152012 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-05 04:02:15 +00:00
Argyrios Kyrtzidis a83f4d2315 Change Lexer::makeFileCharRange() to have it accept a CharSourceRange
instead of a SourceRange, and handle the case where the range is
a char (not token) range.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@149677 91177308-0d34-0410-b5e6-96231b3b80d8
2012-02-03 05:58:29 +00:00
Argyrios Kyrtzidis 7f6cf9764b Improve Lexer::getImmediateMacroName to take into account inner macros
of macro arguments.

For "MAC1( MAC2(foo) )" and location of 'foo' token it would return
"MAC1" instead of "MAC2".

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148704 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-23 16:58:33 +00:00
Argyrios Kyrtzidis d9806c912a Enhance Lexer::makeFileCharRange to check for ranges inside a macro argument
expansion, in which case it returns a file range in the location where the
argument was spelled.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148551 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-20 16:52:43 +00:00
Argyrios Kyrtzidis e64d903765 Introduce Lexer::getSourceText() that returns a string for the source
that the given source range encompasses.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148481 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-19 15:59:19 +00:00
Argyrios Kyrtzidis 11b652d41d Introduce Lexer::makeFileCharRange() that accepts a token source range
and returns a character range with file locations.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148480 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-19 15:59:14 +00:00
Argyrios Kyrtzidis 69bda4c027 For Lexer's isAt[Start/End]OfMacroExpansion add an out parameter for the macro
start/end location.

It is commonly needed after calling the function; this way, we avoid
recalculating it.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148479 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-19 15:59:08 +00:00
Anna Zaks c2a8d6cee0 Refactor: Pull getImmediateMacroName() out of DiagnosticRenderer and
into Lexer and Preprocessor, making it widely available.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148410 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-18 20:17:16 +00:00
Chandler Carruth ae9f85b2c0 Two variables had been added for an assert, but their values were
re-computed just after the assert rather than the variables being re-used.
Just use the variables since we have them already. Fixes an unused
variable warning.

Also fix an 80-column violation.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148212 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-15 09:03:45 +00:00
Argyrios Kyrtzidis 04a94bcc56 In Lexer::getCharAndSizeSlow[NoWarn] if we come up against
\<newline><newline>

don't consume the second newline.
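The fix comes down to when a second newline character belongs to the same escaped newline. `\r\n` and `\n\r` are one newline, but `\n\n` is two. A hypothetical sketch of that decision follows. It omits the whitespace between the backslash and the newline that Clang also tolerates.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// buf[pos] is the character right after a backslash. Return the size of the
// newline there (0 if none). "\r\n" and "\n\r" count as a single newline,
// but "\n\n" is two newlines, so only the first one is consumed.
std::size_t escapedNewlineSizeSketch(std::string_view buf, std::size_t pos) {
  if (pos >= buf.size()) return 0;
  char c = buf[pos];
  if (c != '\n' && c != '\r') return 0;
  if (pos + 1 < buf.size()) {
    char next = buf[pos + 1];
    if ((next == '\n' || next == '\r') && next != c) return 2;
  }
  return 1;
}
```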

Thanks to David Blaikie for pointing out the crash!

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@147138 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-22 04:38:07 +00:00
Argyrios Kyrtzidis f132dcaae8 In Lexer::getCharAndSizeSlow[NoWarn] make sure we don't go over the end of the buffer
when the end of the buffer is immediately after an escaped newline.
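The end-of-buffer case can be illustrated with a loop that skips line splices in front of a character but re-checks the bound after each splice. If the splice is the last thing in the buffer, the loop stops instead of reading the following character. This is a hypothetical sketch, handling only a backslash immediately followed by `\n`, and is not `getCharAndSizeSlow` itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Number of raw bytes that make up the next logical character at buf[pos],
// where any "\\\n" line splices in front of it are skipped. If a splice is
// the last thing in the buffer, stop there instead of reading past the end.
std::size_t logicalCharSizeSketch(std::string_view buf, std::size_t pos) {
  std::size_t start = pos;
  while (pos + 1 < buf.size() && buf[pos] == '\\' && buf[pos + 1] == '\n') {
    pos += 2;  // skip one line splice
  }
  if (pos < buf.size()) ++pos;  // the logical character itself, if any
  return pos - start;
}
```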

Fixes http://llvm.org/PR10153.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@147091 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-21 20:19:55 +00:00
David Blaikie 99ba9e3bd7 Unweaken vtables as per http://llvm.org/docs/CodingStandards.html#ll_virtual_anch
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@146959 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-20 02:48:34 +00:00
Benjamin Kramer 6300f5b438 Remove assert from hot code path and add a clarifying comment.
The assert wasn't adding much value but slowed down Release+Asserts builds.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@145082 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-22 20:39:31 +00:00
Benjamin Kramer 3f6f4e65e7 Lexer: Don't throw away the hard work SSE did to find a slash.
We can reuse the information and avoid looping over all the bytes again.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@145070 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-22 18:56:46 +00:00