12 Commits

Author SHA1 Message Date
cor3ntin
03e43cf1c7
[Clang] Update Unicode version to 15.1 (#77147)
This update all of our Unicode tables to Unicode 15.1. This is a minor
version so only a relatively small numbers of characters are added,
mainly ideographs

https://www.unicode.org/versions/Unicode15.1.0/#Appendices_nb
2024-01-17 22:55:58 +01:00
Corentin Jabot
31f4859c3e [Clang] Allow additional mathematical symbols in identifiers.
Implement the proposed UAX Profile
"Mathematical notation profile for default identifiers".

This implements a not-yet approved Unicode for a vetted
UAX31 identifier profile
https://www.unicode.org/L2/L2022/22230-math-profile.pdf

This change mitigates the reported disruption caused
by the implementation of UAX31 in C++ and C2x,
as these mathematical symbols are commonly used in the
scientific community.

Fixes #54732

Reviewed By: tahonermann, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D137051
2022-12-16 10:20:49 +01:00
Corentin Jabot
c932cef32a Update Unicode to 15.0
Unicode 15.0 adds 4,489 characters, for a total of 149,186 characters.
These additions include 2 new scripts along with 20 new emoji characters,
and 4,193 CJK ideographs.

This changes modify most existing tables including
 - XID_Start/XID_Continue in Clang
 - The character name database (used by \N{} in Clang)
 - The list of formattable/printable codepoints
 - The case folding algorithm (which we had not updated since Unicode 9)
 - The list of nonspacing/enclosing marks used by the column width
   computation algorithm. The rest of the column width algorithm
   is not updated.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D133807
2022-09-22 05:03:01 +02:00
Corentin Jabot
afb6223bc5 Support Unicode 14 identifiers
This update the UAX tables to support new Unicode 14 identifiers.
2021-09-16 13:21:27 -04:00
Corentin Jabot
4e80636db7 Implement P1949
This adds the Unicode 13 data for XID_Start and XID_Continue.
The definition of valid identifier is changed in all C++ modes
as P1949 (https://wg21.link/p1949) was accepted by WG21 as a defect
report.
2021-08-18 07:33:14 -04:00
Nico Weber
3ad6cea9bb clang: Fix typo in comment
llvm-svn: 369539
2019-08-21 15:41:29 +00:00
Chandler Carruth
2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Benjamin Kramer
2f5db8b3db Header guard canonicalization, clang part.
Modifications made by clang-tidy with minor tweaks.

llvm-svn: 215557
2014-08-13 16:25:19 +00:00
Joey Gouly
18730ba4ab Fix the range for Malayam UCNs in C99.
llvm-svn: 200845
2014-02-05 15:32:23 +00:00
Alexander Kornienko
37d6b18633 Use new UnicodeCharSet interface.
Summary: This is a Clang part of http://llvm-reviews.chandlerc.com/D1534

Reviewers: jordan_rose, klimek, rsmith

Reviewed By: rsmith

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1535

llvm-svn: 189583
2013-08-29 12:12:31 +00:00
Alexander Kornienko
dfec0b036d Use isCharInSet from llvm/Support/UnicodeCharRanges.h, added a test for double-width characters in FixIt-hints.
Summary: This is a follow-up to r187837.

Reviewers: gribozavr, jordan_rose

Reviewed By: jordan_rose

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1306

llvm-svn: 188056
2013-08-09 06:35:06 +00:00
Jordan Rose
58c61e006f Properly validate UCNs for C99 and C++03 (both more restrictive than C(++)11).
Add warnings under -Wc++11-compat, -Wc++98-compat, and -Wc99-compat when a
particular UCN is incompatible with a different standard, and -Wunicode when
a UCN refers to a surrogate character in C++03.

llvm-svn: 174788
2013-02-09 01:10:25 +00:00