All east asian width wide and full-width codepoints
are considered double width, as well as emojis and
symbols commonely rendered as emoji.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D138518
Unicode 15.0 adds 4,489 characters, for a total of 149,186 characters.
These additions include 2 new scripts along with 20 new emoji characters,
and 4,193 CJK ideographs.
This changes modify most existing tables including
- XID_Start/XID_Continue in Clang
- The character name database (used by \N{} in Clang)
- The list of formattable/printable codepoints
- The case folding algorithm (which we had not updated since Unicode 9)
- The list of nonspacing/enclosing marks used by the column width
computation algorithm. The rest of the column width algorithm
is not updated.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D133807
Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] - as an extension in all language mode, the patch not warn in c++23 mode will be done later once this paper is plenary approved (in July).
We add
* A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)`
* A set of functions in `Unicode.h` to query that data, including
* A function to find an exact match of a given Unicode character name
* A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching
* A function returning the best matching codepoint for a given string per edit distance
* Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits
* Support of `\N{}` as UCN with loose matching diagnostics/fixits.
Loose matching is considered an error to match closely the semantics of P2071.
The generated data contributes to 280kB of data to the binaries.
`UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D123064
The error messages in tests are far better when a test fails if the test
is written using ASSERT_/EXPECT_<operator>(A, B) rather than
ASSERT_/EXPECT_TRUE(A <operator> B).
This commit updates all of llvm/unittests/Support to use these macros
where possible.
This change has not been possible in:
- llvm/unittests/Support/FSUniqueIDTest.cpp - due to not overloading
operators beyond ==, != and <.
- llvm/unittests/Support/BranchProbabilityTest.cpp - where the unchanged
tests are of the operator overloads themselves.
There are other possibilities of this conversion not being valid, which
have not applied in these tests, as they do not use NULL (they use
nullptr), and they do not use const char* (they use std::string or
StringRef).
Reviewed By: mubashar_
Differential Revision: https://reviews.llvm.org/D117319
Use a fast path for column width computation for ascii characters. Especially
relevant for llvm-objdump.
before:
% time ./bin/llvm-objdump -D -j .text /lib/libc.so.6 >/dev/null
./bin/llvm-objdump -D -j .text /lib/libc.so.6 > /dev/null 0.75s user 0.01s system 99% cpu 0.757 total
after:
% time ./bin/llvm-objdump -D -j .text /lib/libc.so.6 >/dev/null
./bin/llvm-objdump -D -j .text /lib/libc.so.6 > /dev/null 0.37s user 0.01s system 99% cpu 0.378 total
Differential Revision: https://reviews.llvm.org/D92180
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
This is needed so we can use generic columnWidthUTF8 in clang-format on
win32 simultaneously with a separate system-dependent implementations of
isPrint/columnWidth in TextDiagnostic.cpp to avoid attempts to print Unicode
characters using narrow-character interfaces (which is not supported on Windows,
and we'll have to figure out how to handle this).
Reviewers: jordan_rose
Reviewed By: jordan_rose
CC: llvm-commits, klimek
Differential Revision: http://llvm-reviews.chandlerc.com/D1559
llvm-svn: 189952