This is a missing piece for C99 conformance.
This patch handles UCNs by adding a '\\' case to LexTokenInternal and
LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
If the UCN is not syntactically well-formed, we fall back to the old
treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
Because the spelling of an identifier with UCNs still has the UCN in it, we
need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.
Of course, valid code that does *not* use UCNs will see only a very minimal
performance hit (checks after each identifier for non-ASCII characters,
checks when converting raw_identifiers to identifiers that they do not
contain UCNs, and checks when getting the spelling of an identifier that it
does not contain a UCN).
This patch also adds basic support for actual UTF-8 in the source. This is
treated almost exactly the same as UCNs except that we consider stray
Unicode characters to be mistakes and offer a fixit to remove them.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@173369 91177308-0d34-0410-b5e6-96231b3b80d8
uncovered.
This required manually correcting all of the incorrect main-module
headers I could find, and running the new llvm/utils/sort_includes.py
script over the files.
I also manually added quite a few missing headers that were uncovered by
shuffling the order or moving headers up to be main-module-headers.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@169237 91177308-0d34-0410-b5e6-96231b3b80d8
string literal needs cleaning (because it contains line-splicing in the
encoding prefix or in the ud-suffix), do not clean the section between the
double-quotes -- that's the "raw" bit!
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@168776 91177308-0d34-0410-b5e6-96231b3b80d8
This makes LexCharConstant() look more like LexStringLiteral(), which doesn't
have this bug. Add tests for eof after \ for several other cases.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@168269 91177308-0d34-0410-b5e6-96231b3b80d8
don't recursively continue lexing.
This avoids a stack overflow with a sequence of many empty #includes.
rdar://11988695
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@167801 91177308-0d34-0410-b5e6-96231b3b80d8
undefined behaviour, and move the diagnostic for '' from an Error into
an ExtWarn in this group. This is important for some users of the preprocessor,
and is necessary for gcc compatibility.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@159335 91177308-0d34-0410-b5e6-96231b3b80d8
* Removed docs for Lexer::makeFileCharRange from Lexer.cpp, as they're in
the header file;
* Reworked the documentation for SkipBlockComment so that it doesn't confuse
Doxygen's comment parsing;
* Added another summary with \brief markup.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158618 91177308-0d34-0410-b5e6-96231b3b80d8
1. Teach Lexer that pragma lexers are like macro expansions at EOF.
2. Treat pragmas like #define/#undef when printing.
3. If we just printed a directive, add a newline before any more tokens.
(4. Miscellaneous cleanup in PrintPreprocessedOutput.cpp)
PR10594 and <rdar://problem/11562490> (two separate related problems)
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158571 91177308-0d34-0410-b5e6-96231b3b80d8
modes. For languages other than C99/C11, this isn't quite a conforming
extension, and for C++11, it breaks some reasonable code containing
user-defined literals.
In languages which don't officially have hexfloats, pare back this extension
to only apply in cases where the token starts 0x and does not contain an
underscore. The extension is still not quite conforming, but it's a lot closer
now.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158487 91177308-0d34-0410-b5e6-96231b3b80d8
This was a problem for people who write 'return(result);'
Also fix ARCMT's corresponding code, though there's no test case for this
because implicit casts like this are rejected by the migrator for being
ambiguous, and explicit casts have no problem.
<rdar://problem/11577346>
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@158130 91177308-0d34-0410-b5e6-96231b3b80d8
The member variable is always "LangOpts" and the member function is always "getLangOpts".
Reviewed by Chris Lattner
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152536 91177308-0d34-0410-b5e6-96231b3b80d8
starting with an underscore is ill-formed.
Since this rule rejects programs that were using <inttypes.h>'s macros, recover
from this error by treating the ud-suffix as a separate preprocessing-token,
with a DefaultError ExtWarn. The approach of treating such cases as two tokens
is under discussion for standardization, but is in any case a conforming
extension and allows existing codebases to keep building while the committee
makes up its mind.
Reword the warning on the definition of literal operators not starting with
underscores (which are, strangely, legal) to more explicitly state that such
operators can't be called by literals. Remove the special-case diagnostic for
hexfloats, since it was both triggering in the wrong cases and incorrect.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152287 91177308-0d34-0410-b5e6-96231b3b80d8
identifiers, in cases where those identifiers would be treated as
user-defined literal suffixes in C++11.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152198 91177308-0d34-0410-b5e6-96231b3b80d8
grammar requires a string-literal and not a user-defined-string-literal. The
two constructs are still represented by the same TokenKind, in order to prevent
a combinatorial explosion of different kinds of token. A flag on Token tracks
whether a ud-suffix is present, in order to prevent clients from needing to look
at the token's spelling.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152098 91177308-0d34-0410-b5e6-96231b3b80d8
kinds as the underlying string literals, and we silently drop the ud-suffix;
those issues will be fixed by subsequent patches.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@152012 91177308-0d34-0410-b5e6-96231b3b80d8
of macro arguments.
For "MAC1( MAC2(foo) )" and location of 'foo' token it would return
"MAC1" instead of "MAC2".
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148704 91177308-0d34-0410-b5e6-96231b3b80d8
start/end location.
It is commonly needed after calling the function; with this way we avoid
recalculating it.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148479 91177308-0d34-0410-b5e6-96231b3b80d8
re-computed rather than the variables be re-used just after the assert.
Just use the variables since we have them already. Fixes an unused
variable warning.
Also fix an 80-column violation.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@148212 91177308-0d34-0410-b5e6-96231b3b80d8
\<newline><newline>
don't consume the second newline.
Thanks to David Blaikie for pointing out the crash!
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@147138 91177308-0d34-0410-b5e6-96231b3b80d8
This also adds a -Wc++98-compat-pedantic for warning on constructs which would
be diagnosed by -std=c++98 -pedantic (that is, it warns even on C++11 features
which we enable by default, with no warning, in C++98 mode).
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@142034 91177308-0d34-0410-b5e6-96231b3b80d8
Many of the code now under LangOptions::MicrosoftExt will eventually be moved under the LangOptions::MicrosoftMode flag.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@139987 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we would cut off the source file buffer at the code-completion
point; this impeded code-completion inside C++ inline methods and,
recently, with buffering ObjC methods.
Have the code-completion inserted into the source buffer so that it can
be buffered along with a method body. When we actually hit the code-completion
point the cut-off lexing or parsing.
Fixes rdar://10056932&8319466
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@139086 91177308-0d34-0410-b5e6-96231b3b80d8
The function was only counting lines that included tokens and not empty lines,
but MaxLines (mainly initiated to the line where the code-completion point resides)
is a count of overall lines (even empty ones).
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@139085 91177308-0d34-0410-b5e6-96231b3b80d8
loads the named module. The syntax itself is intentionally hideous and
will be replaced at some later point with something more
palatable. For now, we're focusing on the semantics:
- Module imports are handled first by the preprocessor (to get macro
definitions) and then the same tokens are also handled by the parser
(to get declarations). If both happen (as in normal compilation),
the second one is redundant, because we currently have no way to
hide macros or declarations when loading a module. Chris gets credit
for this mad-but-workable scheme.
- The Preprocessor now holds on to a reference to a module loader,
which is responsible for loading named modules. CompilerInstance is
the only important module loader: it now knows how to create and
wire up an AST reader on demand to actually perform the module load.
- We search for modules in the include path, using the module name
with the suffix ".pcm" (precompiled module) for the file name. This
is a temporary hack; we hope to improve the situation in the
future.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@138679 91177308-0d34-0410-b5e6-96231b3b80d8
etc. With this I think essentially all of the SourceManager APIs are
converted. Comments and random other bits of cleanup should be all thats
left.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@136057 91177308-0d34-0410-b5e6-96231b3b80d8
and various other 'expansion' based terms. I've tried to reformat where
appropriate and catch as many references in comments but I'm going to do
several more passes. Also I've tried to expand parameter names to be
more clear where appropriate.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@136056 91177308-0d34-0410-b5e6-96231b3b80d8
FullSourceLoc::getInstantiationLoc to ...::getExpansionLoc. This is part
of the API and documentation update from 'instantiation' as the term for
macros to 'expansion'.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@135914 91177308-0d34-0410-b5e6-96231b3b80d8
source locations from source locations loaded from an AST/PCH file.
Previously, loading an AST/PCH file involved carefully pre-allocating
space at the beginning of the source manager for the source locations
and FileIDs that correspond to the prefix, and then appending the
source locations/FileIDs used for parsing the remaining translation
unit. This design forced us into loading PCH files early, as a prefix,
whic has become a rather significant limitation.
This patch splits the SourceManager space into two parts: for source
location "addresses", the lower values (growing upward) are used to
describe parsed code, while upper values (growing downward) are used
for source locations loaded from AST/PCH files. Similarly, positive
FileIDs are used to describe parsed code while negative FileIDs are
used to file/macro locations loaded from AST/PCH files. As a result,
we can load PCH/AST files even during parsing, making various
improvemnts in the future possible, e.g., teaching #include <foo.h> to
look for and load <foo.h.gch> if it happens to be already available.
This patch was originally written by Sebastian Redl, then brought
forward to the modern age by Jonathan Turner, and finally
polished/finished by me to be committed.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@135484 91177308-0d34-0410-b5e6-96231b3b80d8
'expand'. Also update the public API it provides to the new term, and
propagate that update to the various clients.
No functionality changed.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@135138 91177308-0d34-0410-b5e6-96231b3b80d8
When a macro instantiation occurs, reserve a SLocEntry chunk with length the
full length of the macro definition source. Set the spelling location of this chunk
to point to the start of the macro definition and any tokens that are lexed directly
from the macro definition will get a location from this chunk with the appropriate offset.
For any tokens that come from argument expansion, '##' paste operator, etc. have their
instantiation location point at the appropriate place in the instantiated macro definition
(the argument identifier and the '##' token respectively).
This improves macro instantiation diagnostics:
Before:
t.c:5:9: error: invalid operands to binary expression ('struct S' and 'int')
int y = M(/);
^~~~
t.c:5:11: note: instantiated from:
int y = M(/);
^
After:
t.c:5:9: error: invalid operands to binary expression ('struct S' and 'int')
int y = M(/);
^~~~
t.c:3:20: note: instantiated from:
\#define M(op) (foo op 3);
~~~ ^ ~
t.c:5:11: note: instantiated from:
int y = M(/);
^
The memory savings for a candidate boost library that abuses the preprocessor are:
- 32% less SLocEntries (37M -> 25M)
- 30% reduction in PCH file size (900M -> 635M)
- 50% reduction in memory usage for the SLocEntry table (1.6G -> 800M)
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@134587 91177308-0d34-0410-b5e6-96231b3b80d8
in case we want to make a world where we can check intermediate instantiations
for this kind of breadcrumb.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@127221 91177308-0d34-0410-b5e6-96231b3b80d8
The previous name was inaccurate as this token in fact appears at
the end of every preprocessing directive, not just macro definitions.
No functionality change, except for a diagnostic tweak.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@126631 91177308-0d34-0410-b5e6-96231b3b80d8