llvm-project

Commit Graph

Author	SHA1	Message	Date
Corentin Jabot	6882ca9aff	[Clang] Adjust extension warnings for delimited sequences WG21 approved delimited escape sequences and named escape sequences. Adjust the extension warnings accordingly, and update the release notes. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D129664	2022-07-14 07:50:58 +02:00
Corentin Jabot	c92056d038	[Clang][C++23] P2071 Named universal character escapes Implements [[ https://wg21.link/p2071r1 \| P2071 Named Universal Character Escapes ]] - as an extension in all language mode, the patch not warn in c++23 mode will be done later once this paper is plenary approved (in July). We add * A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)` * A set of functions in `Unicode.h` to query that data, including * A function to find an exact match of a given Unicode character name * A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching * A function returning the best matching codepoint for a given string per edit distance * Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits * Support of `\N{}` as UCN with loose matching diagnostics/fixits. Loose matching is considered an error to match closely the semantics of P2071. The generated data contributes to 280kB of data to the binaries. `UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process. Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D123064	2022-06-25 19:03:33 +02:00
Corentin Jabot	274adcb866	Implement delimited escape sequences. \x{XXXX} \u{XXXX} and \o{OOOO} are accepted in all languages mode in characters and string literals. This is a feature proposed for both C++ (P2290R1) and C (N2785). The papers have been seen by both committees but are not yet adopted into either standard. However, they do have support from both committees.	2021-09-15 09:54:49 -04:00
Scott Egerton	a90ea38698	[Lexer] Allow UCN for dollar symbol '\u0024' in identifiers when using -fdollars-in-identifiers flag. Summary: Previously, the -fdollars-in-identifiers flag allows the '$' symbol to be used in an identifier but the universal character name equivalent '\u0024' is not allowed. This patch changes this, so that \u0024 is valid in identifiers. Reviewers: rsmith, jordan_rose Reviewed By: rsmith Subscribers: dexonsmith, simoncook, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D71758	2020-01-15 11:28:57 +00:00
Alp Toker	b05e0b53b9	Preprocessor: support defined() with operator names for MS compatibility Also flesh out missing tests, improve diagnostic QOI and fix a couple of corner cases found in the process. Fixes PR10606. llvm-svn: 209276	2014-05-21 06:13:51 +00:00
Rafael Espindola	925213b0fa	Add 'not' to commands that are expected to fail. This is at least good documentation, but also opens the possibility of using pipefail. llvm-svn: 185652	2013-07-04 16:16:58 +00:00
Jordan Rose	d4960d3eb1	Test fix-it ranges for Unicode characters. Also, remove stray -fdiagnostics-parseable-fixits from ucn-pp-identifier. llvm-svn: 173373	2013-01-24 21:19:51 +00:00
Jordan Rose	62db5066e9	Add a fixit for \U1234 -> \u1234. llvm-svn: 173371	2013-01-24 20:50:52 +00:00
Jordan Rose	7f43dddae0	Handle universal character names and Unicode characters outside of literals. This is a missing piece for C99 conformance. This patch handles UCNs by adding a '\\' case to LexTokenInternal and LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN. If the UCN is not syntactically well-formed, we fall back to the old treatment: a backslash followed by an identifier beginning with 'u' (or 'U'). Because the spelling of an identifier with UCNs still has the UCN in it, we need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo. Of course, valid code that does not use UCNs will see only a very minimal performance hit (checks after each identifier for non-ASCII characters, checks when converting raw_identifiers to identifiers that they do not contain UCNs, and checks when getting the spelling of an identifier that it does not contain a UCN). This patch also adds basic support for actual UTF-8 in the source. This is treated almost exactly the same as UCNs except that we consider stray Unicode characters to be mistakes and offer a fixit to remove them. llvm-svn: 173369	2013-01-24 20:50:46 +00:00

9 Commits