Darwin targets implement -mcmodel=large by forcing all global accesses to use
the GOT, instead of the ELF movz/movk sequence. That means it's compatible with
PIC so the Clang driver shouldn't reject the option.
Reenables b31414bf4f9898f7817a9fcf8a91f62ec26f3eaf.
Also adds a new warning for missing `--symbol-graph-dir` arg when
`--emit-extension-symbol-graphs` is provided. This also reverts the
commit that removed.
We require that the file name of an 'importable module unit' end
with .cppm (or .ccm, .cxxm, .c++m).
But the driver accepts '-fmodule-output' for files with normal
suffixes (e.g., .cpp). This is somewhat inconsistent.
In this patch, we only claim the option `-fmodule-output` is used if
the type of the input file is modules related. The compiler will now
emit 'unused argument' warnings if the input file is not modules
related.
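A minimal sketch of the suffix rule mentioned above, under stated assumptions: the helper name `hasModuleUnitSuffix` is hypothetical, and it only looks at the file suffix, whereas the patch decides based on the driver's input type.

```cpp
#include <string>

// Hypothetical helper, not the driver's code: returns true when FileName
// ends with one of the importable-module-unit suffixes listed above.
bool hasModuleUnitSuffix(const std::string &FileName) {
  const std::string::size_type Dot = FileName.rfind('.');
  if (Dot == std::string::npos)
    return false;  // no suffix at all
  const std::string Suffix = FileName.substr(Dot);
  return Suffix == ".cppm" || Suffix == ".ccm" || Suffix == ".cxxm" ||
         Suffix == ".c++m";
}
```

With this sketch, `hasModuleUnitSuffix("a.cpp")` is false, which corresponds to the case where `-fmodule-output` would now be reported as unused.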
This extends ExtractAPI to take into account symbols defined in categories on types defined in an external module.
This introduces two new command-line flags, `--symbol-graph-dir=DIR` and `--emit-extension-symbol-graphs`. When used together, they generate additional symbol graph files at `DIR/ExtendedModule@ProductName.symbols.json` for each external module that is extended in this way.
Additionally this makes some cleanups to tests to make them more resilient and cleans up the `APISet` data structure.
Defines a subset of attributes and emits them to a section called
.hexagon.attributes.
The current attributes recorded are the attributes needed by
llvm-objdump to automatically determine target features and eliminate
the need to manually pass features.
This defines the basic set of pointer authentication clang builtins
(provided in a new header, ptrauth.h), with diagnostics and IRGen
support. The availability of the builtins is gated on a new flag,
`-fptrauth-intrinsics`.
Note that this only includes the basic intrinsics, and notably excludes
`ptrauth_sign_constant`, `ptrauth_type_discriminator`, and
`ptrauth_string_discriminator`, which need extra logic to be fully
supported.
This also introduces clang/docs/PointerAuthentication.rst, which
describes the ptrauth model in general, in addition to these builtins.
Co-Authored-By: Akira Hatanaka <ahatanaka@apple.com>
Co-Authored-By: John McCall <rjmccall@apple.com>
When `-fno-cx-limited-range` or `-fno-cx-fortran-rules` follows another
complex range option on the command line, it triggers a warning with
an empty option name:
`warning: overriding '-fcx-fortran-rules' option with ''
[-Woverriding-option]`
or
`warning: overriding '-fcx-limited-range' option with ''
[-Woverriding-option]`
This patch fixes that.
Added --offload-compression-level= option to clang and -compression-level=
option to clang-offload-bundler for controlling compression level.
Added support of long distance matching (LDM) for llvm::zstd which is off
by default. Enable it for clang-offload-bundler by default since it
improves compression rate in general.
Change default compression level to 3 for zstd for clang-offload-bundler
since it works well for bundle entry size from 1KB to 32MB, which should
cover most of the clang-offload-bundler usage. Users can still specify
compression level by -compression-level= option if necessary.
This implements the C++23 `[[assume]]` attribute.
Assumption information is lowered to a call to `@llvm.assume`, unless the expression has side-effects, in which case it is discarded and a warning is issued to tell the user that the assumption doesn’t do anything. A failed assumption at compile time is an error (unless we are in `MSVCCompat` mode, in which case we don’t check assumptions at compile time).
Due to performance regressions in LLVM, assumptions can be disabled with the `-fno-assumptions` flag. With it, assumptions will still be parsed and checked, but no calls to `@llvm.assume` will be emitted and assumptions will not be checked at compile time.
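As a small illustration of the attribute described above (a sketch, not taken from the patch; the function name `divide_by_32` is hypothetical), an assumption without side effects can give the optimizer extra information about a value:

```cpp
// Hypothetical example. Build with -std=c++23; older compilers may only
// warn about an unknown attribute and ignore it.
int divide_by_32(int x) {
  // The expression has no side effects, so it is eligible to be lowered
  // to a call to @llvm.assume. Calling this with x < 0 would be undefined.
  [[assume(x >= 0)]];
  return x / 32;
}
```

An expression such as `++x > 0` would instead fall under the side-effect case described above: the assumption is discarded and a warning is issued.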
Summary:
The very first version of the `clang-linker-wrapper` used `--` as a
separator for the host and device arguments. I moved away from this
towards a commandline parsing implementation years ago but never got
around to officially removing this.
When -gsplit-dwarf is passed in, clang emits -ggnu-pubnames, which
results in .debug_gnu_pubnames/.debug_gnu_pubtypes being generated.
This is used by GDB, but not by LLDB.
Changed so that these sections are not emitted for LLDB tuning, unless
the flag is passed explicitly.
Emitting the basic block address map with
`-fbasic-block-sections=labels` is allowed for AArch64 ELF since
7eaf94fefa1250fc8a46982cea8ce99abacae11f. Allow doing so with
`-fbasic-block-address-map`.
When -gsplit-dwarf is passed in, clang emits -ggnu-pubnames, which
results in .debug_gnu_pubnames/.debug_gnu_pubtypes being generated.
This is used by GDB, but not by LLDB.
Changed so that these sections are not emitted for LLDB tuning.
Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/<triple>` for the GPU and the libraries to be in
`lib/<triple>`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
targeting AMDGPU.
```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```
InstallAPI differs from the clang driver in important ways, so much so
that it doesn't make much sense to try to integrate it into the driver.
This patch partially reverts the CC1 action & driver support and replaces
them with a dedicated driver built as a clang tool.
For distribution, we could use `LLVM_TOOL_LLVM_DRIVER_BUILD` mechanism
for integrating the functionality into clang such that the toolchain
size is less impacted.
Remove the `-freroll-loops` flag, which has not had any effect since the
migration to the new pass manager. The underlying pass has been removed
entirely in #80972 due to lack of maintenance and known bugs.
Fixes https://github.com/llvm/llvm-project/issues/59065.
This introduces a basic outline of installapi as a clang driver option.
It captures relevant information as cc1 args, which are common arguments
already passed to the linker to encode into TBD file outputs. This is
effectively an upstream for what already exists as `tapi installapi` in
Xcode toolchains, but directly in Clang. This patch does not handle any
AST traversing on input yet.
InstallAPI is broadly an operation that takes a series of header files
that represent a single dynamic library and generates a TBD file out of
it which represents all the linkable symbols and necessary attributes
for statically linking in clients. It is the linkable object in all
Apple SDKs and when building dylibs in Xcode. `clang -installapi` also
will support verification where it compares all the information recorded
for the TBD files against the already built binary, to catch possible
mismatches like when a declaration is missing a definition for an
exported symbol.
This refactors the fast-math handling in the clang driver, moving the
settings into a lambda that is shared by the -ffp-model=fast and
-ffast-math code. Previously the -ffp-model=fast handler changed the
local option variable and fell through to the -ffast-math handler.
This refactoring is intended to prepare the way for decoupling the
-ffp-model=fast settings from the -ffast-math settings and possibly
introduce a less aggressive fp-model.
Basic block sections "all" doesn't work on AArch64 since branch
relaxation may create new basic blocks. However, the other basic
block section modes should work out of the box since machine function
splitting already uses the basic block sections pass.
This adds GCC-compatible names for code model selection on 64-bit SPARC
with absolute code.
Testing with a 2-stage build then running codegen tests works okay under
all of the supported code models.
(32-bit target does not have selectable code models)
Reviewed By: @brad0, @MaskRay
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.
First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.
| | |
|--|--|
| Address of the function | Function Address |
| Number of basic blocks in this function | NumBlocks |
| BB entry 1 | |
| BB entry 2 | |
| ... | |
| BB entry #NumBlocks | |
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section as before: the base address of the section, its
number of blocks, and BB entries for its basic blocks. The first section
in the BB address map is always the function entry section.
| | |
|--|--|
| Number of sections for this function | NumBBRanges |
| Section 1 begin address | BaseAddress[1] |
| Number of basic blocks in section 1 | NumBlocks[1] |
| BB entries for Section 1 | |
| ... | |
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges | NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges | |
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
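The layout above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the type and function names (`BBEntry`, `BBRange`, `FunctionBBAddrMap`, `blockAddress`) are hypothetical, the real section is a byte stream, and real BB entries carry more fields than the two shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one BB entry.
struct BBEntry {
  uint64_t Offset;  // relative to the begin symbol of the containing BB range
  uint64_t Size;
};

// One basic block section (BB range) of a function.
struct BBRange {
  uint64_t BaseAddress;          // begin address of this section
  std::vector<BBEntry> Entries;  // NumBlocks == Entries.size()
};

// All ranges of one function; Ranges[0] is the function entry section.
struct FunctionBBAddrMap {
  std::vector<BBRange> Ranges;   // NumBBRanges == Ranges.size()
};

// Absolute address of a block: base of its own range plus its offset.
uint64_t blockAddress(const FunctionBBAddrMap &Map, std::size_t Range,
                      std::size_t Block) {
  assert(Range < Map.Ranges.size() && "range index out of bounds");
  const BBRange &R = Map.Ranges[Range];
  assert(Block < R.Entries.size() && "block index out of bounds");
  return R.BaseAddress + R.Entries[Block].Offset;
}
```

Because each offset is resolved against its own range, the ranges of a function no longer need to be contiguous in the binary.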
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compilation
unit, or for none (depending on `TargetOptions::BBAddrMap`).
This patch keeps the old `-fbasic-block-sections=labels` option and its
functionality. A subsequent patch will remove the obsolete option.
We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handling into their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
- New tests added:
  - Two tests, basic-block-address-map-with-basic-block-sections.ll and
    basic-block-address-map-with-mfs.ll, to exercise the combination of
    `-basic-block-address-map` with `-basic-block-sections=list` and
    `-split-machine-functions`.
  - A driver sanity test for the `-fbasic-block-address-map` option
    (basic-block-address-map.c).
  - An LLD test for testing the `--lto-basic-block-address-map` option.
    This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
  address map (`basic-block-sections-labels-functions-sections.ll` and
  `basic-block-sections-labels.ll`).
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
  `SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` versions less than 2
  will happen in a separate PR in a few months.
Close https://github.com/llvm/llvm-project/issues/79240
Cite the comment from @mizvekov in
https://github.com/llvm/llvm-project/issues/79240:
> There are two kinds of bugs / issues relevant here:
>
> Clang bugs that this change hides
> Here we can add a Frontend flag that disables the GMF ODR check, just
> so we can keep tracking, testing and fixing these issues.
> The Driver would just always pass that flag.
> We could add that flag in this current issue.
> Bugs in user code:
> I don't think it's worth adding a corresponding Driver flag for
> controlling the above Frontend flag, since we intend its behavior to
> become default as we fix the problems, and users interested in testing
> the more strict behavior can just use the Frontend flag directly.
This patch follows the suggestion:
- Introduce the CC1 flag `-fskip-odr-check-in-gmf`, which is off by
default, so that every existing test will still be tested with ODR
violation checking.
- Pass `-fskip-odr-check-in-gmf` in the driver to keep the behavior
we intended.
- Edit the documentation to tell users who are still interested in
stricter checks that they can use `-Xclang -fno-skip-odr-check-in-gmf`
to get the existing behavior.
The options `-fcx-limited-range` and `-fcx-fortran-rules` were added in
_https://github.com/llvm/llvm-project/pull/70244_
The code adding the options introduced an erroneous warning.
`$ clang -c -fcx-limited-range t1.c`
`clang: warning: overriding '' option with '-fcx-limited-range'
[-Woverriding-option]`
and
`$ clang -c -fcx-fortran-rules t1.c`
`clang: warning: overriding '' option with '-fcx-fortran-rules'
[-Woverriding-option]`
The warning doesn't make sense. This patch removes it.
GCC supports -mtls-dialect= for several architectures to select TLSDESC.
This patch supports the following values
* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)
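The accepted values listed above can be summarized as a small lookup. This is a sketch only: `Arch` and `isSupportedTLSDialect` are hypothetical names, not the driver's actual code.

```cpp
#include <string>

// Hypothetical architecture tag for this sketch.
enum class Arch { X86, RISCV, Other };

// Returns true when Value is a -mtls-dialect= value that this patch
// accepts for the given architecture, per the list above.
bool isSupportedTLSDialect(Arch A, const std::string &Value) {
  switch (A) {
  case Arch::X86:
    return Value == "gnu";  // "gnu2" (TLSDESC) is not supported yet
  case Arch::RISCV:
    return Value == "trad" || Value == "desc";
  case Arch::Other:
    return false;  // the option is not handled for other targets here
  }
  return false;
}
```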
AArch64 toolchains seem to support TLSDESC from the beginning, and the
general dynamic model has poor support. Nobody seems to use the option
-mtls-dialect= at all, so we don't bother with it.
There also seems to be very little interest in AArch32's TLSDESC support.
TLSDESC does not change IR, but affects object file generation. Without
a backend option the option is a no-op for in-process ThinLTO.
There seems to be no motivation to have fine-grained control mixing trad/desc
for TLS, so we just pass -mllvm, and don't bother with a modules flag
metadata or function attribute.
Co-authored-by: Paul Kirth <paulkirth@google.com>
Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like SplitLTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.
This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.
This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.
The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.
This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode,
which I think is the only difference observable in code generation.
This runtime test checks for the (non-)occurrence of data movement between host
and device. It does one run without the flag and one with the flag to
also see that both versions behave as expected. In the case w/o the new
flag data movement between host and device is expected. In the case with
the flag such data movement should not be present / reported.
[Sema] Add `-fvisibility-global-new-delete=` option (#75364)
By default the implicitly declared replaceable global new and delete
operators are given a default visibility attribute. Previous work, see:
https://reviews.llvm.org/D53787, added
`-fvisibility-global-new-delete-hidden` to change this to a hidden
visibility attribute.
This change adds `-fvisibility-global-new-delete=` which controls
whether (or not) to add an implicit visibility attribute to the implicit
declarations for these functions, and what visibility that attribute
will specify. The option takes 4 values: `force-hidden`,
`force-protected`, `force-default` and `source`. Option values
`force-hidden`, `force-protected` and `force-default` assign hidden,
protected, and default visibilities respectively; the use of the term
force in the value names is designed to imply to a user that the semantics
of this option differ significantly from `-fvisibility=`. An option
value of `source` implies that no implicit attribute is added; without
the attribute the replaceable global new and delete operators behave
normally (like other functions) with respect to visibility attributes,
pragmas and options.
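The four option values described above map onto an implicit visibility as follows. This is a minimal sketch; `ImplicitVisibility` and `parseGlobalNewDeleteVisibility` are hypothetical names used only for illustration.

```cpp
#include <string>

// Hypothetical result type: None means no implicit attribute is added.
enum class ImplicitVisibility { None, Default, Protected, Hidden };

// Translates an option value into the implicit visibility it selects.
// Returns false (leaving Result untouched) for an unrecognized value.
bool parseGlobalNewDeleteVisibility(const std::string &Value,
                                    ImplicitVisibility &Result) {
  if (Value == "force-hidden")
    Result = ImplicitVisibility::Hidden;
  else if (Value == "force-protected")
    Result = ImplicitVisibility::Protected;
  else if (Value == "force-default")
    Result = ImplicitVisibility::Default;
  else if (Value == "source")
    Result = ImplicitVisibility::None;  // behave like any other function
  else
    return false;
  return true;
}
```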
The motivation for the `source` value is to facilitate users who intend
to replace these functions either for a single linkage unit or a limited
set of linkage units. `-fvisibility-global-new-delete=source` can be
applied globally to the compilations in a build where the existing
`-fvisibility-global-new-delete-hidden` cannot, as it conflicts with a
common pattern where these functions are dynamically imported.
The existing `-fvisibility-global-new-delete-hidden` is now a deprecated
spelling of `-fvisibility-global-new-delete=force-hidden`
A release note has been added for these changes.
`-fvisibility-global-new-delete=source` will be set by default for PS5.
PS5 users that want the normal toolchain behaviour will be able to
supply `-fvisibility-global-new-delete=force-default`.
Add a clang flag, "-ftrivial-auto-var-init-max-size=", so that clang
skips auto-initializing a variable if the auto-init memset size exceeds
the flag setting (in bytes). Note that this skipping doesn't apply to
runtime-sized variables like VLAs.
Considerations: "__attribute__((uninitialized))" can be used to manually
opt variables out. However, there are thousands of large variables
(e.g., >=1KB, most of them are arrays and used as buffers) in a big
codebase. Manually opting them out one by one is not efficient.
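The skip rule described above can be sketched as a predicate. The name `skipsAutoInit` is hypothetical, the sketch assumes the flag was given, and the strict comparison is an assumption taken from the word "exceeds".

```cpp
#include <cstdint>

// Hypothetical predicate, not clang's code: decides whether auto-init is
// skipped for a variable, given the flag value MaxSizeInBytes.
bool skipsAutoInit(uint64_t InitSizeInBytes, uint64_t MaxSizeInBytes,
                   bool IsRuntimeSized) {
  if (IsRuntimeSized)
    return false;  // runtime-sized variables such as VLAs are never skipped
  return InitSizeInBytes > MaxSizeInBytes;
}
```

For example, with a limit of 1024 bytes, a 2048-byte buffer would be skipped while a 1024-byte one would still be initialized.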
Summary:
The linker wrapper's job is to sort various embedded inputs into a list
of files that participate in a single link job. So far, this has been
completely 1-to-1, that is, each input file participates in exactly one
link job. However, support for AMD's target-id requires that one input
file may participate in multiple link jobs. For example, if given a
`gfx90a` static library and a `gfx90a:xnack+` object file input, we
should link the `gfx90a` target into the `gfx90a:xnack+` job. These are
considered separate CPUs that can be mutually linked more or less.
This patch adds the necessary logic to make this happen. It primarily
reworks the logic to copy relevant input files into a separate list. So,
it moves construction of the final list of link jobs into the extraction
phase. We also need to copy the files in the case that it is needed more
than once, as the entire workflow expects ownership of said file.
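The compatibility idea in the `gfx90a` example can be sketched as follows. This is not the linker wrapper's code: `TargetID`, `parseTargetID`, and `canLinkInto` are hypothetical names, and the rule used (same processor, and every feature setting the input specifies must match the job) is an assumption made for illustration.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical parsed form of a target-id such as "gfx90a:xnack+".
struct TargetID {
  std::string Processor;
  std::map<std::string, char> Features;  // feature name -> '+' or '-'
};

TargetID parseTargetID(const std::string &Text) {
  TargetID ID;
  std::istringstream Stream(Text);
  std::getline(Stream, ID.Processor, ':');
  std::string Part;
  while (std::getline(Stream, Part, ':')) {
    if (Part.size() < 2)
      continue;  // this sketch ignores malformed pieces
    ID.Features[Part.substr(0, Part.size() - 1)] = Part.back();
  }
  return ID;
}

// Assumed rule: an input may join a link job when the processors match and
// every feature the input pins down has the same setting in the job.
bool canLinkInto(const std::string &Input, const std::string &Job) {
  const TargetID From = parseTargetID(Input);
  const TargetID To = parseTargetID(Job);
  if (From.Processor != To.Processor)
    return false;
  for (const auto &Feature : From.Features) {
    const auto It = To.Features.find(Feature.first);
    if (It == To.Features.end() || It->second != Feature.second)
      return false;
  }
  return true;
}
```

Under this sketch, `canLinkInto("gfx90a", "gfx90a:xnack+")` is true, so the same `gfx90a` input could be copied into more than one job.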
A previous commit (82f75ed) made clang ignore .gch files that were not
Clang AST files. This broke `-gmodules`, which embeds the Clang AST into
an object file containing debug info.
This changes the probing to detect any file format recognized by
`llvm::identify_magic()` as potentially containing a Clang AST.
Previous PR: https://github.com/llvm/llvm-project/pull/69204
Make it apply to x86-64 medium and large code models since that's what
the backend does.
Limit logic to exclude x86-32.
Default to 0, let the driver set it to 65536 for the medium code model
if one is not passed. Set it to 0 for the large code model by default to
match gcc and since some users make assumptions about the large code
model that any small data will break.
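The defaults described above can be written down as a small helper. This is a sketch with a hypothetical name (`defaultLargeDataThreshold`); it only covers the case where no explicit threshold is passed.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: default large data threshold when the user does
// not pass one, following the description above.
uint64_t defaultLargeDataThreshold(const std::string &CodeModel,
                                   bool IsX86_64) {
  if (IsX86_64 && CodeModel == "medium")
    return 65536;  // set by the driver for the medium code model
  return 0;        // large code model (matching gcc) and everything else
}
```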
-mbranch-protection=gcs (enabled by -mbranch-protection=standard) causes
generated objects to be marked with the gcs feature. This is done via
the guarded-control-stack module flag, in a similar way to
branch-target-enforcement and sign-return-address.
Enabling GCS causes the GNU_PROPERTY_AARCH64_FEATURE_1_GCS bit to be set
on generated objects. No code generation changes are required, as GCS
just requires that functions are called using BL and returned from using
RET (or other similar variant instructions), which is already the case.