Lex: Don't restrict legal UCNs when preprocessing assembly

The C and C++ standards disallow using universal character names to
refer to some characters, such as basic ASCII and control characters,
so we reject these sequences in the lexer. However, when the
preprocessor isn't being used on C or C++, it doesn't make sense to
apply these restrictions.
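
As a rough illustration of the rule described above, the following minimal sketch models the check as a standalone predicate. It is not the code from this commit: the function name `isUCNAccepted` and its boolean parameter are hypothetical, and it models only the C99 6.4.3p2 restriction, ignoring the diagnostics and C++-specific behavior of the real lexer.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Clang: reports whether a universal
// character name that designates CodePoint would be accepted.
// AsmPreprocessor stands in for LangOpts.AsmPreprocessor.
bool isUCNAccepted(uint32_t CodePoint, bool AsmPreprocessor) {
  // In assembly mode the C family restrictions are skipped entirely.
  if (AsmPreprocessor)
    return true;

  // C99 6.4.3p2: below 00A0, only 0024 ($), 0040 (@), and 0060 (`) are allowed.
  if (CodePoint < 0xA0)
    return CodePoint == 0x24 || CodePoint == 0x40 || CodePoint == 0x60;

  // C99 6.4.3p2: the range D800 through DFFF is not allowed.
  if (CodePoint >= 0xD800 && CodePoint <= 0xDFFF)
    return false;

  return true;
}
```

Under this sketch, `isUCNAccepted(0x20, false)` is false while `isUCNAccepted(0x20, true)` is true, which mirrors the `\u0020` case exercised by the new test line.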

Notably, accepting these characters avoids issues with Unicode escapes
when GHC uses the compiler as a preprocessor on Haskell sources.

Fixes rdar://problem/14742289

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@193067 91177308-0d34-0410-b5e6-96231b3b80d8
Justin Bogner 2013-10-21 05:02:28 +00:00
parent b3b5e6cb65
commit fbfd6426e2
2 changed files with 7 additions and 0 deletions

@@ -2730,6 +2730,10 @@ uint32_t Lexer::tryReadUCN(const char *&StartPtr, const char *SlashLoc,
    StartPtr = CurPtr;
  }
  // Don't apply C family restrictions to UCNs in assembly mode
  if (LangOpts.AsmPreprocessor)
    return CodePoint;
  // C99 6.4.3p2: A universal character name shall not specify a character whose
  // short identifier is less than 00A0 other than 0024 ($), 0040 (@), or
  // 0060 (`), nor one in the range D800 through DFFF inclusive.)

@@ -72,6 +72,9 @@
11: T11(b)
// CHECK-Identifiers-True: 11: #0
// Universal character names can specify basic ascii and control characters
12: \u0020\u0030\u0080\u0000
// CHECK-Identifiers-False: 12: \u0020\u0030\u0080\u0000
// This should not crash
// rdar://8823139