Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.
This patch adds support for that flag in the driver, as well as setting the
necessary codegen options for the backend. Largely, this means we select
the newly added pass pipeline for generating fat objects.
Users are expected to pass -ffat-lto-objects to clang in addition to one
of the -flto variants. Without the -flto flag, -ffat-lto-objects has no
effect.
// Compile and link. Use the object code from the fat object w/o LTO.
clang -fno-lto -ffat-lto-objects -fuse-ld=lld foo.c
// Compile and link. Select full LTO at link time.
clang -flto -ffat-lto-objects -fuse-ld=lld foo.c
// Compile and link. Select ThinLTO at link time.
clang -flto=thin -ffat-lto-objects -fuse-ld=lld foo.c
// Compile and link. Use ThinLTO with the UnifiedLTO pipeline.
clang -flto=thin -ffat-lto-objects -funified-lto -fuse-ld=lld foo.c
// Compile and link. Use full LTO with the UnifiedLTO pipeline.
clang -flto -ffat-lto-objects -funified-lto -fuse-ld=lld foo.c
// Link separately, using ThinLTO.
clang -c -flto=thin -ffat-lto-objects foo.c
clang -flto=thin -fuse-ld=lld foo.o -ffat-lto-objects # pass --lto=thin --fat-lto-objects to ld.lld
// Link separately, using full LTO.
clang -c -flto -ffat-lto-objects foo.c
clang -flto -fuse-ld=lld foo.o # pass --lto=full --fat-lto-objects to ld.lld
Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977
Depends on D146776
Reviewed By: tejohnson, MaskRay
Differential Revision: https://reviews.llvm.org/D146777
Previously we returned i32 on RV32 and i64 on RV64. The instructions
only consume 32 bits and only produce 32 bits. For RV64, the result
is sign extended to 64 bits like *W instructions.
This patch removes this detail from the interface to improve
portability and consistency. This matches the proposal for scalar
intrinsics here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44
I've included IR autoupgrade support as well.
I'll be doing this for other builtins/intrinsics that currently use
'long' in other patches.
Reviewed By: VincentWu
Differential Revision: https://reviews.llvm.org/D154647
This patch adds a new option -fkeep-persistent-storage-variables to emit
all variables that have a persistent storage duration, including global,
static and thread-local variables. This could be useful in cases where
the presence of all these variables as symbols in the object file are
required, so that they can be directly addressed.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D150221
Unsigned is a better representation for bitmanipulation and cryptography.w
The only exception being the return values for clz and ctz intrinsics is
a signed int. That matches the target independent clz and ctz builtins.
This is consistent with the current scalar crypto proposal
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44
Reviewed By: VincentWu
Differential Revision: https://reviews.llvm.org/D154616
This removes another use of 'long' to mean xlen from builtins.
I've also converted the types to unsigned as proposed in D154616.
clmul_32 is available to RV64 as its emulation is clmul+sext.w
clmulh_32 and clmulr_32 are not available on RV64 as their emulation
is currently 6 instructions in the worst case.
Depends on D154635
For the cover letter of the patch-set, please checkout D154628.
This is the 8th patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154636
Depends on D154634
For the cover letter of the patch-set, please checkout D154628.
This is the 7th patch of the patch-set. This patch includes change to
vfcvt_x_f, vfcvt_xu_f, vfwcvt_x_f, vfwcvt_xu_f, vfncvt_x_f, vfncvt_xu_f
vfcvt_f_x, vfcvt_f_xu, vfncvt_f_x vfncvt_f_xu, vfncvt_f_f
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154635
Depends on D154633
For the cover letter of the patch-set, please checkout D154628.
This is the 6th patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154634
Depends on D154632
For the cover letter of the patch-set, please checkout D154628.
This is the 5th patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154633
Depends on D154631
For the cover letter of the patch-set, please checkout D154628.
This is the 4th patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154632
Depends on D154629
For the cover letter of the patch-set, please checkout D154628.
This is the 3rd patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154631
Depends on D154628
For the cover letter of the patch-set, please checkout D154628.
This is the 2nd patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154629
Depends on D152996.
This patch-set aims to add a variant for the RVV floating-point
intrinsics that controls the rounding mode (`frm`). The rounding mode
variant appends `_rm` before the policy suffix to distinguish from
those without them.
Specification PR: riscv-non-isa/rvv-intrinsic-doc#226
This is the 1st patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154628
Depends on D152879.
Specification PR: riscv-non-isa/rvv-intrinsic-doc#226
This patch adds variant of `vfadd` that models the rounding mode control.
The added variant has suffix `_rm` appended to differentiate from the
existing ones that does not alternate `frm` and uses whatever is inside.
The value `7` is used to indicate no rounding mode change. Reusing the
semantic from the rounding mode encoding for scalar floating-point
instructions.
Additional data member `HasFRMRoundModeOp` is added so we can append
`_rm` suffix for the fadd variants that models rounding mode control.
Additional data member `IsRVVFixedPoint` is added so we can define
pseudo instructions with rounding mode operand and distinguish the
instructions between fixed-point and floating-point.
Reviewed By: craig.topper, kito-cheng
Differential Revision: https://reviews.llvm.org/D152996
AST nodes that may depend on FP options keep them as a difference
relative to the options outside the AST node. At the moment of
instantiation the FP options may be different from the default values,
defined by command-line option. In such case FP attributes would have
unexpected values. For example, the code:
template <class C> void func_01(int last, C) {
func_01(last, int());
}
void func_02() { func_01(0, 1); }
#pragma STDC FENV_ACCESS ON
caused compiler crash, because template instantiation takes place at the
end of translation unit, where pragma STDC FENV_ACCESS is in effect. As
a result, code in the template instantiation would use constrained
intrinsics while the function does not have StrictFP attribute.
To solve this problem, FP attributes in Sema must be set to default
values, defined by command line options.
This change resolves https://github.com/llvm/llvm-project/issues/63542.
Differential Revision: https://reviews.llvm.org/D154359
- Unified LTO default for all PlayStation targets
- Add default case to tests
- SplitLTOUnit not default for PS4
- Remove ignorelist requirement to address buildbot failures
Differential Revision: https://reviews.llvm.org/D123971
The unified LTO pipeline creates a single LTO bitcode structure that can
be used by Thin or Full LTO. This means that the LTO mode can be chosen
at link time and that all LTO bitcode produced by the pipeline is
compatible, from an optimization perspective. This makes the behavior of
LTO a bit more predictable by normalizing the set of LTO features
supported by each LTO bitcode file.
Example usage:
# Compile and link. Select regular LTO at link time.
clang -flto -funified-lto -fuse-ld=lld foo.c
# Compile and link. Select ThinLTO at link time.
clang -flto=thin -funified-lto -fuse-ld=lld foo.c
# Link separately, using ThinLTO.
clang -c -flto -funified-lto foo.c # -flto={full,thin} are identical in
terms of compilation actions
clang -flto=thin -fuse-ld=lld foo.o # pass --lto=thin to ld.lld
# Link separately, using regular LTO.
clang -c -flto -funified-lto foo.c
clang -flto -fuse-ld=lld foo.o # pass --lto=full to ld.lld
The RFC discussing the details and rational for this change is here:
https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774
This restores commit b4a82b62258c5f650a1cccf5b179933e6bae4867, reverted
in 3ab7ef28eebf9019eb3d3c4efd7ebfd160106bb1 because it was thought to
cause a bot failure, which ended up being unrelated to this patch set.
Differential Revision: https://reviews.llvm.org/D154856
Builtin floating-point number classification functions:
- __builtin_isnan,
- __builtin_isinf,
- __builtin_finite, and
- __builtin_isnormal
now are implemented using `llvm.is_fpclass`.
This change makes the target callback `TargetCodeGenInfo::testFPKind`
unneeded. It is preserved in this change and should be removed later.
Differential Revision: https://reviews.llvm.org/D112932
Previously the MemProf profile was expected to be in the same profile
file as a normal PGO profile, passed via the usual -fprofile-use=
option, and was matched in the same pass. To simplify profile
preparation, since the raw MemProf profile requires the binary for
symbolization and may be simpler to index separately from the raw PGO
profile, and also to enable providing a MemProf profile for a SamplePGO
build, separate out the MemProf feedback option and matching pass.
This patch adds the -fmemory-profile-use=${file} option, and the
provided file is passed down to LLVM and ultimately used in a new
MemProfUsePass which performs the matching of just the memory profile
contents of that file.
Note that a single profile file containing both normal PGO and MemProf
profile data is still supported, and the relevant profile data is
matched by the appropriate matching pass(es) based on which option(s)
the profile is provided with (the same profile file can be supplied to
both feedback options).
Differential Revision: https://reviews.llvm.org/D154856
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
The attribute is a proper enum attribute, strictfp. We were getting
strictfp and "strictfp" set on every function with
-fexperimental-strict-floating-point.
https://reviews.llvm.org/D139629
With `-fsanitize=kcfi`, Clang currently won't emit type hashes for
C++ member functions, which leads to check failures if they are
indirectly called. As there's no reason to exclude member functions
in CodeGenModule::setKCFIType, emit type hashes also for them to fix
member function pointer calls with KCFI, and add a test to confirm
that types are emitted correctly.
This adds new intrisics to support the LDAP1 and STL1 Advanced SIMD
(Neon) instructions introduced as part of FEAT_LRCPC3.
The new intrinsics `vldap1(q)_lane`/`vstl1(q)_lane` generate IR code
similar to the existing `vld1(q)_lane/st1(q)_lane` ones, but capturing
the difference in the atomic release/acquire memory model.
The LLVM code generation changes to ensure that this instruction pair
is lowered to the correct LDAP1/STL1 instructions will be covered in a
separate commit.
Based on a patch by Sam Elliott.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D153128
Debug info emission for extern variables in C++ was previously disabled
when the functionality was added in https://reviews.llvm.org/D71818 and
originally in https://reviews.llvm.org/D70696, because there was no use
case. We are enabling it now, as we start to deploy BPF programs
compiled from C++, leveraging C++ features like templates to reduce code
complexity. This patch is required so that we can still use kconfig in
such BPF programs compiled from C++.
Reviewed By: rnk, dblaikie, MaskRay, yonghong-song
Differential Revision: https://reviews.llvm.org/D153898
This matches the data type of the intrinsics. This case be seen
from the removal of sext and trunc instructions from the IR.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D154577
This matches the data type of the intrinsics. This case be seen
from the removal of sext and trunc instructions from the IR.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D154572
These builtins were recently changed to return 'int' like the
similar __builtin_clz/__builtin_ctz builtins, but the IR generation
was not updated to use a truncate.
This is recommit of 98390ccb80569e8fbb20e6c996b4b8cff87fbec6, reverted
in 82a3969d710f5fb7a2ee4c9afadb648653923fef, because it caused
https://github.com/llvm/llvm-project/issues/63542. Although the problem
described in the issue is independent of the reverted patch, fail of
PCH/late-parsed-instantiations.cpp indeed obseved on PowerPC and is
likely to be caused by wrong serialization of `LateParsedTemplate`
objects. In this patch the serialization is fixed.
Original commit message is below.
Previously function template instantiations occurred with FP options
that were in effect at the end of translation unit. It was a problem
for late template parsing as these FP options were used as attributes of
AST nodes and may result in crash. To fix it FP options are set to the
state of the point of template definition.
Differential Revision: https://reviews.llvm.org/D143241
This refactor patch means to remove CPU_SPECIFIC* MACROs in X86TargetParser.def
and move those information into ProcInfo of X86TargetParser.cpp. Since these
two files both maintain a table with redundant info such as cpuname and its
features supported. CPU_SPECIFIC* MACROs define some different information. This
patch dealt with them in these ways when moving:
1.mangling
This is now moved to Mangling in ProcInfo and directly initialized at array of
Processors. CPUs don't support cpu_dispatch/specific are assigned '\0' as
mangling.
2.CPU alias
The alias cpu will also be initialized in array of Processors, its attributes
will be same as its alias target cpu. Same feature list, same mangling.
3.TUNE_NAME
Before my change, some cpu names support cpu_dispatch/specific are not
supported in X86.td, which means optimizer/backend doesn't recognize them. So
they use a different TUNE_NAME to generate in IR. In this patch, I added these
missing cpu support at X86.td by utilizing existing Features and XXXTunings, so
that each cpu name can directly use its own name as TUNE_NAME to be supported
by optimizer/backend.
4.Feature list
The feature list of one CPU maintained in X86TargetParser.def is not same as
the one in X86TargetParser.cpp. It only maintains part of features of one CPU
(features defined by X86_FEATURE_COMPAT). While X86TargetParser.cpp maintains
a complete one. This patch abandons the feature list maintained by CPU_SPECIFIC*
MACROs because assigning a CPU with a complete one doesn't affect the
functionality of cpu_dispatch/specific.
Except these four info, since some of CPUs supported by cpu_dispatch/specific
doesn's support clang options like -march, -mtune before, this patch also kept
this behavior still by adding another member OnlyForCPUDispatchSpecific in
ProcInfo.
Reviewed By: pengfei, RKSimon
Differential Revision: https://reviews.llvm.org/D151696
Similar to the existing _mm_load1_pd/_mm_loaddup_pd and broadcast vector loads, these intrinsic should ensure the loads are unaligned and not assume type alignment
Fixes#62325
This is a follow-up to 2e275e24355cb224981f9beb2b026a3169fc7232 and
1395cde24b3641e284bb1daae7d56c189a2635e3 which corrects a missed case:
initializing an _Atomic(T *) from a null pointer constant in the form
of the integer literal 0.
Fixes https://github.com/llvm/llvm-project/issues/63550
Differential Revision: https://reviews.llvm.org/D154284
This caused asserts in some Android and Windows builds:
SelectionDAGNodes.h:1138: llvm::SDValue::SDValue(SDNode *, unsigned int):
Assertion `(!Node || !ResNo || ResNo < Node->getNumValues()) && "Invalid result number for the given node!"' failed.
See comment on 85bdea023f
Also revert "HIP: Use frexp builtins in math headers"
which seems to depend on this change.
This reverts commit 85bdea023f5116f789095b606554739403042a21.
This reverts commit bf8e92c0e792cbe3c9cc50607a1e33c6912ffd0e.
The existing code assumes that both `DeclareRISCVVBuiltins` and
`DeclareRISCVSiFiveVectorBuiltins` are set when coming into the if-statement
under SemaLookup.cpp.
This is not the case and causes issue #63571.
This patch resolves the issue.
Reviewed By: 4vtomat, kito-cheng
Differential Revision: https://reviews.llvm.org/D154050