llvm-project/clang/test/Lexer/comment-utf8.c
Corentin Jabot d4892a168f [Clang] Add a warning on invalid UTF-8 in comments.
Introduce an off-by-default `-Winvalid-utf8` warning
that detects invalid UTF-8 code unit sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as it is likely to be somewhat disruptive
otherwise.

This warning allows Clang to conform to the yet-to-be-approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059
2022-07-13 10:19:26 +02:00
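
As an illustration of what the commit message means by an "invalid UTF-8 code unit sequence", here is a minimal sketch of a byte-wise validity check. It is not Clang's implementation (the test below suggests the lexer also has a vectorized path); the names `is_valid_utf8` and `utf8_examples` are hypothetical and exist only for this example. It follows the well-formed byte ranges from the Unicode Standard: no overlong encodings, no surrogates, nothing above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Clang: returns true if the n bytes
   at s form a well-formed UTF-8 sequence. */
bool is_valid_utf8(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t len;
        /* Allowed range for the second byte; narrowed for some lead bytes. */
        unsigned char lo = 0x80, hi = 0xBF;

        if (b <= 0x7F) {            /* ASCII */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            len = 2;                /* 0xC0 and 0xC1 would be overlong */
        } else if (b >= 0xE0 && b <= 0xEF) {
            len = 3;
            if (b == 0xE0) lo = 0xA0;   /* reject overlong 3-byte forms */
            if (b == 0xED) hi = 0x9F;   /* reject UTF-16 surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            len = 4;
            if (b == 0xF0) lo = 0x90;   /* reject overlong 4-byte forms */
            if (b == 0xF4) hi = 0x8F;   /* reject values above U+10FFFF */
        } else {
            return false;           /* stray continuation or invalid lead byte */
        }

        if (n - i < len) return false;  /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi) return false;
        for (size_t k = 2; k < len; k++) {
            if (s[i + k] < 0x80 || s[i + k] > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

/* Usage sketch: a few sequences of the kind such a warning distinguishes. */
static void utf8_examples(void) {
    assert(is_valid_utf8((const unsigned char *)"\xC2\xA7", 2));   /* U+00A7 */
    assert(!is_valid_utf8((const unsigned char *)"\xC0\x80", 2));  /* overlong */
    assert(!is_valid_utf8((const unsigned char *)"\x80", 1));      /* lone continuation */
}
```

Every comment in the test file below contains only well-formed sequences, which is why it expects no diagnostics even with `-Winvalid-utf8` enabled.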

// RUN: %clang_cc1 -fsyntax-only %s -Winvalid-utf8 -verify
// expected-no-diagnostics
//§ § § 😀 你好 ©
/*§ § § 😀 你好 ©*/
/*
§ § § 😀 你好 ©©©
*/
/* § § § 😀 你好 © */
/*
a longer comment to exercise the vectorized code path
----------------------------------------------------
αααααααααααααααααααααα // here is some unicode
----------------------------------------------------
----------------------------------------------------
*/
// The following test checks that a short comment is not merged
// with a subsequent long comment containing utf-8
enum a {
x /* 01234567890ABCDEF*/
};
/*ααααααααα*/