llvm-project

Commit Graph

Author	SHA1	Message	Date
Corentin Jabot	9fec67483d	Encode columnWidthUTF8 tests as UTF-8 sequences. Some platforms do not encode string literals as UTF-8 when building llvm	2022-11-28 15:56:12 +01:00
Corentin Jabot	2903769bf5	Update the list of double width codepoints All east asian width wide and full-width codepoints are considered double width, as well as emojis and symbols commonely rendered as emoji. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D138518	2022-11-28 15:13:37 +01:00
Corentin Jabot	c932cef32a	Update Unicode to 15.0 Unicode 15.0 adds 4,489 characters, for a total of 149,186 characters. These additions include 2 new scripts along with 20 new emoji characters, and 4,193 CJK ideographs. This changes modify most existing tables including - XID_Start/XID_Continue in Clang - The character name database (used by \N{} in Clang) - The list of formattable/printable codepoints - The case folding algorithm (which we had not updated since Unicode 9) - The list of nonspacing/enclosing marks used by the column width computation algorithm. The rest of the column width algorithm is not updated. Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D133807	2022-09-22 05:03:01 +02:00
Fangrui Song	67854f9ed0	Use value_or instead of getValueOr. NFC	2022-06-29 21:55:02 -07:00
Corentin Jabot	c92056d038	[Clang][C++23] P2071 Named universal character escapes Implements [[ https://wg21.link/p2071r1 \| P2071 Named Universal Character Escapes ]] - as an extension in all language mode, the patch not warn in c++23 mode will be done later once this paper is plenary approved (in July). We add * A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)` * A set of functions in `Unicode.h` to query that data, including * A function to find an exact match of a given Unicode character name * A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching * A function returning the best matching codepoint for a given string per edit distance * Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits * Support of `\N{}` as UCN with loose matching diagnostics/fixits. Loose matching is considered an error to match closely the semantics of P2071. The generated data contributes to 280kB of data to the binaries. `UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process. Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D123064	2022-06-25 19:03:33 +02:00
Archibald Elliott	38ac4093d9	[NFCI][Support] Avoid ASSERT_/EXPECT_TRUE(A <op> B) The error messages in tests are far better when a test fails if the test is written using ASSERT_/EXPECT_<operator>(A, B) rather than ASSERT_/EXPECT_TRUE(A <operator> B). This commit updates all of llvm/unittests/Support to use these macros where possible. This change has not been possible in: - llvm/unittests/Support/FSUniqueIDTest.cpp - due to not overloading operators beyond ==, != and <. - llvm/unittests/Support/BranchProbabilityTest.cpp - where the unchanged tests are of the operator overloads themselves. There are other possibilities of this conversion not being valid, which have not applied in these tests, as they do not use NULL (they use nullptr), and they do not use const char* (they use std::string or StringRef). Reviewed By: mubashar_ Differential Revision: https://reviews.llvm.org/D117319	2022-01-21 13:15:04 +00:00
serge-sans-paille	9501419e87	Speedup some unicode rendering Use a fast path for column width computation for ascii characters. Especially relevant for llvm-objdump. before: % time ./bin/llvm-objdump -D -j .text /lib/libc.so.6 >/dev/null ./bin/llvm-objdump -D -j .text /lib/libc.so.6 > /dev/null 0.75s user 0.01s system 99% cpu 0.757 total after: % time ./bin/llvm-objdump -D -j .text /lib/libc.so.6 >/dev/null ./bin/llvm-objdump -D -j .text /lib/libc.so.6 > /dev/null 0.37s user 0.01s system 99% cpu 0.378 total Differential Revision: https://reviews.llvm.org/D92180	2020-12-03 20:11:11 +01:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Alexander Kornienko	9aa60fd6f8	Move generic isPrint and columnWidth implementations to a separate header/source to allow using both generic and system-dependent versions on win32. Summary: This is needed so we can use generic columnWidthUTF8 in clang-format on win32 simultaneously with a separate system-dependent implementations of isPrint/columnWidth in TextDiagnostic.cpp to avoid attempts to print Unicode characters using narrow-character interfaces (which is not supported on Windows, and we'll have to figure out how to handle this). Reviewers: jordan_rose Reviewed By: jordan_rose CC: llvm-commits, klimek Differential Revision: http://llvm-reviews.chandlerc.com/D1559 llvm-svn: 189952	2013-09-04 16:00:12 +00:00

9 Commits