llvm-project

Commit Graph

Author	SHA1	Message	Date
Corentin Jabot	c92056d038	[Clang][C++23] P2071 Named universal character escapes. Implements P2071 Named Universal Character Escapes (https://wg21.link/p2071r1) as an extension in all language modes; the patch to not warn in C++23 mode will be done later, once this paper is plenary approved (in July). We add * A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space-efficient data structure that can be queried in `O(NameLength)` * A set of functions in `Unicode.h` to query that data, including * A function to find an exact match of a given Unicode character name * A function to perform loose matching (ignoring case, space, underscore, medial hyphen) * A function returning the best matching codepoint for a given string per edit distance * Support of `\N{}` escape sequences in string and character literals, with loose-matching and typo diagnostics/fixits * Support of `\N{}` as UCN with loose-matching diagnostics/fixits. Loose matching is considered an error to match closely the semantics of P2071. The generated data contributes 280kB of data to the binaries. `UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process. Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D123064	2022-06-25 19:03:33 +02:00
Corentin Jabot	274adcb866	Implement delimited escape sequences. \x{XXXX}, \u{XXXX} and \o{OOOO} are accepted in all language modes in character and string literals. This is a feature proposed for both C++ (P2290R1) and C (N2785). The papers have been seen by both committees but are not yet adopted into either standard. However, they do have support from both committees.	2021-09-15 09:54:49 -04:00
Corentin Jabot	4e80636db7	Implement P1949. This adds the Unicode 13 data for XID_Start and XID_Continue. The definition of a valid identifier is changed in all C++ modes, as P1949 (https://wg21.link/p1949) was accepted by WG21 as a defect report.	2021-08-18 07:33:14 -04:00
Richard Smith	e87aeb378d	When pretty-printing a C++11 literal operator, don't insert whitespace between the "" and the suffix; that breaks names such as 'operator""if'. For symmetry, also remove the space between the 'operator' and the '""'. llvm-svn: 249641	2015-10-08 00:17:59 +00:00
Alp Toker	b05e0b53b9	Preprocessor: support defined() with operator names for MS compatibility. Also flesh out missing tests, improve diagnostic QOI and fix a couple of corner cases found in the process. Fixes PR10606. llvm-svn: 209276	2014-05-21 06:13:51 +00:00
Richard Smith	4ee696d55c	PR18870: Parse language linkage specifiers properly if the string-literal is spelled in an interesting way. llvm-svn: 201536	2014-02-17 23:25:27 +00:00
Richard Smith	8b7258bdb3	PR18855: Add support for UCNs and UTF-8 encoding within ud-suffixes. llvm-svn: 201532	2014-02-17 21:52:30 +00:00
Richard Smith	6f21206850	DR1473: Do not require a space between operator"" and the ud-suffix in a literal-operator-id. llvm-svn: 166373	2012-10-20 08:41:10 +00:00
Richard Smith	bcc22fc4e1	Support for raw and template forms of numeric user-defined literals, and lots of tidying up. llvm-svn: 152392	2012-03-09 08:00:36 +00:00
Richard Smith	7d182a7909	Fix a couple of issues with literal-operator-id parsing, and provide recovery for a few kinds of error. Specifically: Since we're after translation phase 6, the "" token might be formed by multiple source-level string literals. Checking the token width is not a correct way of detecting empty string literals, due to escaped newlines. Diagnose and recover from a missing space between "" and suffix, and from string literals other than "", which are followed by a suffix. llvm-svn: 152348	2012-03-08 23:06:02 +00:00
Richard Smith	39570d0020	Add support for cooked forms of user-defined-integer-literal and user-defined-floating-literal. Support for raw forms of these literals to follow. llvm-svn: 152302	2012-03-08 08:45:32 +00:00
Richard Smith	75b67d6dc5	User-defined literal support for character literals. llvm-svn: 152277	2012-03-08 01:34:56 +00:00
Richard Smith	c67fdd4eb9	AST representation for user-defined literals, plus just enough of semantic analysis to make the AST representation testable. They are represented by a new UserDefinedLiteral AST node, which is a sugared CallExpr. All semantic properties, including full CodeGen support, are achieved for free by this representation. UserDefinedLiterals can never be dependent, so no custom instantiation behavior is required. They are mangled as if they were direct calls to the underlying literal operator. This matches g++'s apparent behavior (but not its actual mangling, which is broken for literal-operator-ids). User-defined string literals are now fully-operational, but the semantic analysis is quite hacky and needs more work. No other forms of user-defined literal are created yet, but the AST support for them is present. This patch committed after midnight because we had already hit the quota for new kinds of literal yesterday. llvm-svn: 152211	2012-03-07 08:35:16 +00:00
Richard Smith	d67aea28f6	User-defined literals: reject string and character UDLs in all places where the grammar requires a string-literal and not a user-defined-string-literal. The two constructs are still represented by the same TokenKind, in order to prevent a combinatorial explosion of different kinds of token. A flag on Token tracks whether a ud-suffix is present, in order to prevent clients from needing to look at the token's spelling. llvm-svn: 152098	2012-03-06 03:21:47 +00:00

14 Commits
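
The 2012 commits above add the string, character, cooked numeric, raw, and template forms of user-defined literals. As a rough illustration of what those forms look like in user code, here is a minimal C++11 sketch. It is not taken from Clang's test suite: every suffix name (`_km`, `_pct`, `_code`, `_len`, `_digits`, `_count`) and the helper `run_checks` are hypothetical, chosen only for this example. The `'A'_code` check assumes an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Cooked integer form: the compiler parses the literal and passes its value.
constexpr unsigned long long operator""_km(unsigned long long v) {
    return v * 1000ULL;  // kilometres to metres
}

// Cooked floating form: receives the parsed value as long double.
constexpr long double operator""_pct(long double v) { return v / 100.0L; }

// Character form: receives the character value.
constexpr int operator""_code(char c) { return static_cast<int>(c); }

// String form: receives a pointer and the length without the terminator.
constexpr std::size_t operator""_len(const char*, std::size_t n) { return n; }

// Raw form: receives the literal's spelling as a NUL-terminated string.
inline std::size_t operator""_digits(const char* spelling) {
    return std::strlen(spelling);
}

// Template form: each character of the spelling is a template argument.
template <char... Cs>
constexpr std::size_t operator""_count() { return sizeof...(Cs); }

// Compile-time checks for the constexpr operators.
static_assert(12_km == 12000ULL, "cooked integer form");
static_assert('A'_code == 65, "character form (assumes ASCII)");
static_assert("abc"_len == 3, "string form");
static_assert(0x1F_count == 4, "template form sees '0','x','1','F'");

// Run-time checks for the forms that are not constexpr here.
inline void run_checks() {
    assert(123_digits == 3);   // raw form sees the spelling "123"
    assert(50.0_pct == 0.5L);  // cooked floating form
}
```

Each operator is declared with no space between `""` and the suffix, the spelling that the DR1473 commit permits. The raw and template forms use different suffixes because declaring both for the same suffix would make a numeric literal with that suffix ambiguous.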