All East Asian Width Wide and Fullwidth codepoints
are considered double width, as are emojis and
symbols commonly rendered as emoji.
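
As a minimal sketch of the rule above, assuming a sorted table of
double-width ranges: the table below is a tiny hand-picked sample, and
the names `DoubleWidthSample`, `isDoubleWidthSample` and
`sampleColumnWidth` are hypothetical, not the identifiers used in the
patch. It ignores nonspacing marks and non-printable characters.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical, heavily abbreviated sample. The real tables are generated
// from the Unicode data files and contain many more ranges.
struct CodepointRange {
  char32_t Lo, Hi; // Inclusive bounds.
};

static const CodepointRange DoubleWidthSample[] = {
    {0x1100, 0x115F},   // Hangul Jamo initial consonants (Wide)
    {0x3041, 0x3096},   // Hiragana (Wide)
    {0x4E00, 0x9FFF},   // CJK Unified Ideographs (Wide)
    {0xFF01, 0xFF60},   // Fullwidth ASCII variants (Fullwidth)
    {0x1F600, 0x1F64F}, // Emoticons (emoji presentation)
};

// Binary search over the sorted, non-overlapping ranges.
bool isDoubleWidthSample(char32_t C) {
  std::size_t Lo = 0;
  std::size_t Hi = sizeof(DoubleWidthSample) / sizeof(DoubleWidthSample[0]);
  while (Lo < Hi) {
    std::size_t Mid = Lo + (Hi - Lo) / 2;
    if (C > DoubleWidthSample[Mid].Hi)
      Lo = Mid + 1;
    else if (C < DoubleWidthSample[Mid].Lo)
      Hi = Mid;
    else
      return true;
  }
  return false;
}

// Sums two columns per double-width codepoint and one column otherwise.
int sampleColumnWidth(std::u32string_view Text) {
  int Width = 0;
  for (char32_t C : Text)
    Width += isDoubleWidthSample(C) ? 2 : 1;
  return Width;
}
```

With this sample table, two CJK ideographs occupy four columns while
three ASCII letters occupy three.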
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D138518
Unicode 15.0 adds 4,489 characters, for a total of 149,186 characters.
These additions include 2 new scripts along with 20 new emoji characters,
and 4,193 CJK ideographs.
This change modifies most existing tables, including:
- XID_Start/XID_Continue in Clang
- The character name database (used by \N{} in Clang)
- The list of formattable/printable codepoints
- The case folding algorithm (which we had not updated since Unicode 9)
- The list of nonspacing/enclosing marks used by the column width
computation algorithm. The rest of the column width algorithm
is not updated.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D133807
Instead of dumping the string literal (which
quotes it and escapes every non-ASCII symbol),
we can use the content of the string when it is an
ordinary narrow string (1-byte characters).
Wide and UTF-8/UTF-16/UTF-32 strings are still completely
escaped until we clarify how these entities should
behave (cf. https://wg21.link/p2361).
`FormatDiagnostic` is modified to escape
non-printable characters and invalid UTF-8.
This ensures that Unicode characters, spaces and
newlines are properly rendered in static messages.
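
A minimal sketch of this kind of escaping, under stated assumptions:
the function names and the `<U+XXXX>` / `<XX>` escape spellings are
illustrative choices, not taken from the patch. Printability is
approximated (only C0 controls other than newline, plus DEL, are
escaped), whereas the real code consults the Unicode tables.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// Returns the length (1-4) of the well-formed UTF-8 sequence at the start
// of S, or 0 if the leading bytes are not valid UTF-8.
static std::size_t validSequenceLength(std::string_view S) {
  auto B = [&](std::size_t I) { return static_cast<unsigned char>(S[I]); };
  auto IsCont = [&](std::size_t I) {
    return I < S.size() && (B(I) & 0xC0) == 0x80;
  };
  if (S.empty())
    return 0;
  unsigned char B0 = B(0);
  if (B0 < 0x80)
    return 1;
  if (B0 >= 0xC2 && B0 <= 0xDF)
    return IsCont(1) ? 2 : 0;
  if (B0 >= 0xE0 && B0 <= 0xEF) {
    if (!IsCont(1) || !IsCont(2))
      return 0;
    if (B0 == 0xE0 && B(1) < 0xA0)
      return 0; // Overlong encoding.
    if (B0 == 0xED && B(1) > 0x9F)
      return 0; // UTF-16 surrogate range.
    return 3;
  }
  if (B0 >= 0xF0 && B0 <= 0xF4) {
    if (!IsCont(1) || !IsCont(2) || !IsCont(3))
      return 0;
    if (B0 == 0xF0 && B(1) < 0x90)
      return 0; // Overlong encoding.
    if (B0 == 0xF4 && B(1) > 0x8F)
      return 0; // Beyond U+10FFFF.
    return 4;
  }
  return 0; // Stray continuation byte or invalid lead byte.
}

// Hypothetical helper: copies valid text through, escapes control
// characters as <U+XXXX> and invalid bytes as <XX>. Newlines are kept.
std::string escapeForDiagnosticSketch(std::string_view In) {
  std::string Out;
  char Buf[16];
  while (!In.empty()) {
    unsigned char C = static_cast<unsigned char>(In[0]);
    std::size_t Len = validSequenceLength(In);
    if (Len == 0) {
      std::snprintf(Buf, sizeof(Buf), "<%02X>", static_cast<unsigned>(C));
      Out += Buf;
      In.remove_prefix(1);
    } else if (Len == 1 && C != '\n' && (C < 0x20 || C == 0x7F)) {
      std::snprintf(Buf, sizeof(Buf), "<U+%04X>", static_cast<unsigned>(C));
      Out += Buf;
      In.remove_prefix(1);
    } else {
      Out.append(In.substr(0, Len));
      In.remove_prefix(Len);
    }
  }
  return Out;
}
```

Escaping invalid bytes one at a time keeps the output valid UTF-8 even
when the input is truncated in the middle of a sequence.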
This makes Clang more consistent with other implementations
and fixes this tweet
https://twitter.com/jfbastien/status/1298307325443231744 :)
Of note, `PaddingChecker` previously printed newlines that were
later removed by the diagnostic printing code.
To be consistent with its tests, the newlines are removed
from the diagnostic.
The Unicode tables are updated to use both the Unicode definitions
and the Unicode 14.0 data.
U+00AD SOFT HYPHEN is still considered a printable character
to match existing practice in terminals, in addition to
being considered a formatting character as per Unicode.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D108469
Use a fast path for column width computation for ASCII characters. This is
especially relevant for llvm-objdump.
before:
% time ./bin/llvm-objdump -D -j .text /lib/libc.so.6 >/dev/null
./bin/llvm-objdump -D -j .text /lib/libc.so.6 > /dev/null 0.75s user 0.01s system 99% cpu 0.757 total
after:
% time ./bin/llvm-objdump -D -j .text /lib/libc.so.6 >/dev/null
./bin/llvm-objdump -D -j .text /lib/libc.so.6 > /dev/null 0.37s user 0.01s system 99% cpu 0.378 total
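
A minimal sketch of such a fast path, assuming a per-sequence loop:
single-byte (ASCII) input is handled inline and only multi-byte
sequences reach the slower path. All names here are hypothetical, and
`slowPathWidthPlaceholder` is a stand-in for the table-driven lookup,
not the real computation.

```cpp
#include <cstddef>
#include <string_view>

constexpr int ErrorNonPrintableSketch = -1;
constexpr int ErrorInvalidUTF8Sketch = -2;

// Number of bytes announced by a UTF-8 lead byte, or 0 if B cannot start
// a sequence.
static std::size_t sequenceLengthFromLeadByte(unsigned char B) {
  if (B < 0x80)
    return 1;
  if ((B & 0xE0) == 0xC0)
    return 2;
  if ((B & 0xF0) == 0xE0)
    return 3;
  if ((B & 0xF8) == 0xF0)
    return 4;
  return 0;
}

// Placeholder for the expensive path (decode, then table lookups).
// Assumption for this sketch: every multi-byte sequence is one column.
static int slowPathWidthPlaceholder(std::string_view) { return 1; }

int columnWidthSketch(std::string_view Text) {
  int Width = 0;
  for (std::size_t I = 0; I < Text.size();) {
    unsigned char B = static_cast<unsigned char>(Text[I]);
    std::size_t Len = sequenceLengthFromLeadByte(B);

    // Fast path: printable ASCII is always one column, no decoding needed.
    if (Len == 1) {
      if (B < 0x20 || B == 0x7F)
        return ErrorNonPrintableSketch;
      ++Width;
      ++I;
      continue;
    }

    if (Len == 0 || I + Len > Text.size())
      return ErrorInvalidUTF8Sketch;
    Width += slowPathWidthPlaceholder(Text.substr(I, Len));
    I += Len;
  }
  return Width;
}
```

Disassembly output is almost entirely ASCII, so nearly every byte takes
the inline branch, which is consistent with the timings above.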
Differential Revision: https://reviews.llvm.org/D92180
Update the file headers to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
This is needed so that we can use the generic columnWidthUTF8 in clang-format
on win32 alongside separate system-dependent implementations of
isPrint/columnWidth in TextDiagnostic.cpp. This avoids attempts to print Unicode
characters using narrow-character interfaces (which is not supported on Windows,
and we'll have to figure out how to handle this).
Reviewers: jordan_rose
Reviewed By: jordan_rose
CC: llvm-commits, klimek
Differential Revision: http://llvm-reviews.chandlerc.com/D1559
llvm-svn: 189952