llvm-project

Commit Graph

Author	SHA1	Message	Date
Corentin Jabot	6882ca9aff	[Clang] Adjust extension warnings for delimited sequences WG21 approved delimited escape sequences and named escape sequences. Adjust the extension warnings accordingly, and update the release notes. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D129664	2022-07-14 07:50:58 +02:00
Corentin Jabot	c92056d038	[Clang][C++23] P2071 Named universal character escapes Implements [[ https://wg21.link/p2071r1 \| P2071 Named Universal Character Escapes ]] - as an extension in all language mode, the patch not warn in c++23 mode will be done later once this paper is plenary approved (in July). We add * A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)` * A set of functions in `Unicode.h` to query that data, including * A function to find an exact match of a given Unicode character name * A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching * A function returning the best matching codepoint for a given string per edit distance * Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits * Support of `\N{}` as UCN with loose matching diagnostics/fixits. Loose matching is considered an error to match closely the semantics of P2071. The generated data contributes to 280kB of data to the binaries. `UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process. Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D123064	2022-06-25 19:03:33 +02:00
Aaron Ballman	22db4824b9	Use functions with prototypes when appropriate; NFC A significant number of our tests in C accidentally use functions without prototypes. This patch converts the function signatures to have a prototype for the situations where the test is not specific to K&R C declarations. e.g., void func(); becomes void func(void); This is the third batch of tests being updated (there are a significant number of other tests left to be updated).	2022-02-07 09:25:01 -05:00
Corentin Jabot	274adcb866	Implement delimited escape sequences. \x{XXXX} \u{XXXX} and \o{OOOO} are accepted in all languages mode in characters and string literals. This is a feature proposed for both C++ (P2290R1) and C (N2785). The papers have been seen by both committees but are not yet adopted into either standard. However, they do have support from both committees.	2021-09-15 09:54:49 -04:00
Jordan Rose	7f43dddae0	Handle universal character names and Unicode characters outside of literals. This is a missing piece for C99 conformance. This patch handles UCNs by adding a '\\' case to LexTokenInternal and LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN. If the UCN is not syntactically well-formed, we fall back to the old treatment: a backslash followed by an identifier beginning with 'u' (or 'U'). Because the spelling of an identifier with UCNs still has the UCN in it, we need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo. Of course, valid code that does not use UCNs will see only a very minimal performance hit (checks after each identifier for non-ASCII characters, checks when converting raw_identifiers to identifiers that they do not contain UCNs, and checks when getting the spelling of an identifier that it does not contain a UCN). This patch also adds basic support for actual UTF-8 in the source. This is treated almost exactly the same as UCNs except that we consider stray Unicode characters to be mistakes and offer a fixit to remove them. llvm-svn: 173369	2013-01-24 20:50:46 +00:00

5 Commits