When emitting a vfcvt with a rounding mode, we end up generating an unnecessary
vmset because the only rounding mode pseudos have a mask operand. This patch
adds a pseudo without a mask, and marks the masked variant with the
MaskedPseudo class so the doPeepholeMergeVMV optimisation knows to remove the
redundant vmset.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D154266
These are often used when we're packing vXi64 comparison results, but we don't have PACKSSQD so have to bitcast, which doesn't work well with num sign bits value tracking.
If we know bits are already 0, we will not need to clear them again with a BIC.
So we can use KnownBits to shrink the size of the constant in the creation BIC
from And, potentially undoing the known-bits folds that happen during
compilation.
BIC only has a single register operand for input and output, so has less
scheduling freedom than a AND, but usually saves the materialization of a
constant.
Differential Revision: https://reviews.llvm.org/D154217
Reapply - the issue was that the `< %s` was missing in the RUN lines,
which didn't impact update_llc_test_checks but of course caused issues
for lit.
The test file is copied from X86 (which is also mostly shared with Arm,
PowerPC) rather than integrated into float-intrinsics.ll and
double-intrinsics.ll.
There's currently a compiler crash for the soft float cases (expect this
is the issue in <https://github.com/llvm/llvm-project/issues/63661>)
which will be a addressed with a follow-on patch posted for review.
This refactor patch means to remove CPU_SPECIFIC* MACROs in X86TargetParser.def
and move those information into ProcInfo of X86TargetParser.cpp. Since these
two files both maintain a table with redundant info such as cpuname and its
features supported. CPU_SPECIFIC* MACROs define some different information. This
patch dealt with them in these ways when moving:
1.mangling
This is now moved to Mangling in ProcInfo and directly initialized at array of
Processors. CPUs don't support cpu_dispatch/specific are assigned '\0' as
mangling.
2.CPU alias
The alias cpu will also be initialized in array of Processors, its attributes
will be same as its alias target cpu. Same feature list, same mangling.
3.TUNE_NAME
Before my change, some cpu names support cpu_dispatch/specific are not
supported in X86.td, which means optimizer/backend doesn't recognize them. So
they use a different TUNE_NAME to generate in IR. In this patch, I added these
missing cpu support at X86.td by utilizing existing Features and XXXTunings, so
that each cpu name can directly use its own name as TUNE_NAME to be supported
by optimizer/backend.
4.Feature list
The feature list of one CPU maintained in X86TargetParser.def is not same as
the one in X86TargetParser.cpp. It only maintains part of features of one CPU
(features defined by X86_FEATURE_COMPAT). While X86TargetParser.cpp maintains
a complete one. This patch abandons the feature list maintained by CPU_SPECIFIC*
MACROs because assigning a CPU with a complete one doesn't affect the
functionality of cpu_dispatch/specific.
Except these four info, since some of CPUs supported by cpu_dispatch/specific
doesn's support clang options like -march, -mtune before, this patch also kept
this behavior still by adding another member OnlyForCPUDispatchSpecific in
ProcInfo.
Reviewed By: pengfei, RKSimon
Differential Revision: https://reviews.llvm.org/D151696
The test file is copied from X86 (which is also mostly shared with Arm,
PowerPC) rather than integrated into float-intrinsics.ll and
double-intrinsics.ll.
There's currently a compiler crash for the soft float cases (expect this
is the issue in <https://github.com/llvm/llvm-project/issues/63661>)
which will be a addressed with a follow-on patch posted for review.
Summary: Currently, if there are multiple definitions of the same symbol declared has weak linkage, the linker may choose the wrong one when they are compiled with integrated-as. This patch fixes the issue. If the target symbol is a weak label we must not attempt to resolve the fixup directly. Emit a relocation and leave resolution of the final target address to the linker.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D153839
GCNHazardRecognizer::fixVcmpxExecWARHazard() mitigates a specific hazard
by inserting a wait on sa_sdst==0 if such a wait isn't already present.
Unfortunately, the check for an existing wait incorrectly checks for one
that doesn't actually care about sa_sdst itself, but requires that no
other counters are waited for.
Once the check is performed correctly, a lit test needs to be updated,
since it is currently testing for the incorrect behaviour.
Differential Revision: https://reviews.llvm.org/D154438
SIInsertWaitcnts inserts waitcnt instructions to resolve data
dependencies. The GFX10+ vscnt (VMEM store count) counter is never used
in this way. It is only used to resolve memory dependencies, and that is
handled by SIMemoryLegalizer. Hence there is no need to conservatively
wait for vscnt to be 0 on function entry and before returns.
Differential Revision: https://reviews.llvm.org/D153537
In llvm/test/CodeGen/ARM/large-stack.ll, the C in FileCheck wasn't
uppercased. This wasn't spotted in development as MacOS's HFS+ fs is apparently
often configured case-insensitive.
* SVE pseudos don't pick up the right latency information during MI
scheduling as the regex do not match with instruction name.
* Move UNDEF, PSEUDO, and ZERO to the end of actual SVE instruction
* Some CPUs *td files will be fixed in the next commit
Differential Revision: https://reviews.llvm.org/D154232
The ARM backend codebase is dotted with places where armv6-m will generate
constant pools. Now that we can generate execute-only code for armv6-m, we need
to make sure we use the movs/lsls/adds/lsls/adds/lsls/adds pattern instead of
these.
Big stacks is one of the obvious places. In this patch we take care of two
sites:
1. take care of big stacks in prologue/epilogue
2. take care of save/tSTRspi nodes, which implicitly fixes
emitThumbRegPlusImmInReg which is used in several frame lowering fns
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D154233
This reverts commit 92a9c30c61da7f973d55cd84fade424159b9cac9.
This has caused a test failure in the 2nd stage of Linaro's
Arm 32 bit buildbots.
LLVM::simplified-template-names.s
7: error: Simplified template DW_AT_name could not be reconstituted:
check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8: original: f3<unsigned char, (unsigned char)'\x00'>
check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9: reconstituted: f3<unsigned char, (unsigned char)'\x7f'>
check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I suspect a load/store is slightly off.
In the revision https://reviews.llvm.org/D138990 was enabled select
optimize pass for AArch64.
We were doing some benchmarking on the Neoverse V1 and were
experimenting with select optimize heuristics. We found out that there
are some additional profitable transformations to predictable branches
(with prediction rate > 75% according to Agner Fog's rule of thumb) can
be done by base heuristic from SelectOptimize pass or by
optimizeSelectInst form CodeGenPrepare pass. But they are blocked on the
Neoverse V1, since PredictableSelectIsExpensive feature is not set for
that subtarget.
Note that to achieve this results we also changed predictable branch
threshold from 99% to 75%
Looks like it makes sense to add this feature to all targets where was
enabled select optimize pass in the https://reviews.llvm.org/D138990.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D143162
Add mir test cases for D154193, which tend to remove test16rr in possible and32ri+test16rr, similar to what we did for and32*+test64rr.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D154322
In change https://reviews.llvm.org/D152790, it was discovered that the
alignment requirement calculation for LDRD/STRD codegen was suboptimal
and the calculation for volatile loads and stores was adjusted.
This change here adopts the calculation for the remaining non-volatile
occurances.
Differential Revision: https://reviews.llvm.org/D153800
The aarch64 backend will benefit from expanding 64vector sdiv/udiv into mulh
using shift(mul(ext, ext)), as the larger type size is legal and the mul(ext,
ext) can efficiently use smull/umull instructions. This extends the existing
code in GetMULHS to handle vector types for it.
Differential Revision: https://reviews.llvm.org/D154049
Reuse the existing div/rem selection code to also handle mul/imul to support G_MUL/G_SMULH/G_UMULH, as they have a similar pattern using rDX/rAX for mulh/mul results, plus the AH/AL support for i8 multiplies.
The instruction-precise, or asynchronous, unwind tables usually take up
much more space than the synchronous ones. If a user is concerned about
the load size of the program and does not need the features provided
with the asynchronous tables, the compiler should be able to generate
the more compact variant.
This patch changes the generation of CFI instructions for these cases so
that they all come in one chunk in the prolog; it emits only one
`.cfi_def_cfa*` instruction followed by `.cfi_offset` ones after all
stack adjustments and register spills, and avoids generating CFI
instructions in the epilog(s) as well as any other exceeding CFI
instructions like `.cfi_remember_state` and `.cfi_restore_state`.
Effectively, it reverses the effects of D111411 and D114545 on functions
with the `uwtable(sync)` attribute. As a side effect, it also restores
the behavior on functions that have neither `uwtable` nor `nounwind`
attributes.
Differential Revision: https://reviews.llvm.org/D153098
Some signed opcodes were being lowered to their unsigned counterparts and
vice-versa.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154234
For OpenMP offload, Clang emits global symbols containing the string
'<captured>', which contains characters that are invalid in PTX.
Extend the existing pass that replaces '.' and '@' characters with
'_$_' to also replace '<' and '>' characters.
Reviewed By: cchen
Differential Revision: https://reviews.llvm.org/D154241
This reverts commit 20f0c68fd83a0147a8ec1722bd2e848180610288.
https://reviews.llvm.org/D153966#4464594 reports an optimization
regression in Rust.
Additionally this change has caused an unexpected 0.3% compile-time
regression.
The codegen routine introduced in 18077e9fd688 did not account for vectors with
more than 16 lanes. Remove the incorrect assertion and bail out of the
optimization when encountering this case. Add test cases that previously
triggered the assertion. Unfortunately, these test cases now have terrible
codegen, but that is at least better than crashing.
Fixes#63500.
Differential Revision: https://reviews.llvm.org/D154124
Fix https://github.com/llvm/llvm-project/issues/63579
```
% cat a.c
void foo() {}
% clang --target=arm-none-eabi -mthumb -mno-unaligned-access -fsanitize=kcfi a.c -S -o - | grep p2align
.p2align 1
% clang --target=armv6m-none-eabi -fsanitize=function a.c -S -o - | grep p2align
.p2align 1
```
Ensure that -fsanitize={function,kcfi} instrumented functions are aligned by at
least 4, so that loading the type hash before the function label will not cause
a misaligned access. This is especially important for -mno-unaligned-access
configurations that don't set `setMinFunctionAlignment` to 4 or greater.
With this patch, the generated assembly for the examples above will contain `.p2align 2`
before the type hash.
If `__attribute__((aligned(N)))` or `-falign-functions=N` is specified, the
larger alignment will be used.
Reviewed By: simon_tatham, samitolvanen
Differential Revision: https://reviews.llvm.org/D154125
As introduced in D99148, RISC-V uses the softPromoteHalf legalisation
for fp16 values without zfh, with logic ensuring that f16 values are
passed in lower bits of FPRs (see D98670) when F or D support is
present. This legalisation produces ISD::FP_TO_FP16 and ISD::FP16_TO_FP
nodes which (as described in ISDOpcodes.h) provide a "semi-softened
interface for dealing with f16 (as an i16)". i.e. the return type of the
FP_TO_FP16 is an integer rather than a float (and the arg of FP16_TO_FP
is an integer). The remainder of the description focuses primarily on
FP_TO_FP16 for ease of explanation.
FP_TO_FP16 is lowered to a libcall to `__truncsfhf2 (float)` or
`__truncdfhf2 (double)`. As of D92241, `_Float16` is used as the return
type of these libcalls if the host compiler accepts `_Float16` in a test
input (i.e. dst_t is set to `_Float16`). `_Float16` is enabled for the
RISC-V target as of D105001 and so the return value should be passed in
an FPR on hard float ABIs.
This patch fixes the ABI issue in what appears to be a minimally
invasive way - leaving the softPromoteHalf logic undisturbed, and
lowering FP_TO_FP16 to an f32-returning libcall, converting its result
to an XLen integer value.
As can be seen in the test changes, the custom lowering for FP16_TO_FP
means the libcall is no longer tail-callable.
Although this patch fixes the issue, there are two open items:
* Redundant fmv.x.w and fmv.w.x pairs are now somtimes produced during
lowering (not a correctness issue).
* Now coverage for STRICT variants of FP16 conversion opcodes.
Differential Revision: https://reviews.llvm.org/D151284
There wasn't previous coverage for rv32id-ilp32, rv64id-lp64,
rv32id-ilp32d, or rv64id-lp64d. This is needed as D151284 fixes a bug
related to the ABI used for libcalls for fp<->fp16 conversion when hard
FP support is present.
These tests will be optimized with INCT32/INCF32/DECT32/DECF32
in the future.
Reviewed By: zixuan-wu
Differential Revision: https://reviews.llvm.org/D153434