52796 Commits

Author SHA1 Message Date
Amaury Séchet
7b54626bac [NFC] Fix indentation in addcarry.ll 2023-07-05 16:45:14 +00:00
Luke Lau
60be17a685 [RISCV] Add VFCVT pseudos with no mask
When emitting a vfcvt with a rounding mode, we end up generating an unnecessary
vmset because the only rounding mode pseudos have a mask operand. This patch
adds a pseudo without a mask, and marks the masked variant with the
MaskedPseudo class so the doPeepholeMergeVMV optimisation knows to remove the
redundant vmset.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D154266
2023-07-05 17:28:43 +01:00
Simon Pilgrim
38721f29f8 [X86] ComputeNumSignBitsForTargetNode - attempt to recognise PACKSSDW(PACKSSDW(X,Y),PACKSSDW(Z,W)) patterns
These are often used when we're packing vXi64 comparison results, but we don't have PACKSSQD so have to bitcast, which doesn't work well with num sign bits value tracking.
2023-07-05 16:43:43 +01:00
Simon Pilgrim
a32d14fd4c [X86] Fold BITOP(PACKSS(X,Z),PACKSS(Y,W)) --> PACKSS(BITOP(X,Y),BITOP(Z,W))
Fold allsignbits pack patterns to make better use of cheap (and commutable) logic ops
2023-07-05 16:43:43 +01:00
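A minimal sketch of why the BITOP/PACKSS fold above is safe when every input lane is all sign bits (0 or -1). It is a toy model and not LLVM code: `sat16`, `packss`, `bitop` and `fold_holds` are hypothetical helpers, and `packss` simply concatenates saturated lanes and ignores the per-128-bit-lane layout of the real instructions.

```python
import operator

INT16_MIN, INT16_MAX = -32768, 32767

def sat16(v):
    """Signed saturation of one lane to 16 bits."""
    return max(INT16_MIN, min(INT16_MAX, v))

def packss(a, b):
    """Toy PACKSS: saturate the lanes of a, then the lanes of b, into one vector."""
    return [sat16(v) for v in a + b]

def bitop(op, a, b):
    """Lane-wise logic op on two equal-length vectors."""
    return [op(x, y) for x, y in zip(a, b)]

def fold_holds(op, x, y, z, w):
    """Compare BITOP(PACKSS(X,Z),PACKSS(Y,W)) with PACKSS(BITOP(X,Y),BITOP(Z,W))."""
    before = bitop(op, packss(x, z), packss(y, w))
    after = packss(bitop(op, x, y), bitop(op, z, w))
    return before == after

# All-sign-bits lanes survive saturation unchanged, so the two forms agree.
print(fold_holds(operator.and_, [-1, 0], [0, -1], [-1, -1], [0, 0]))
# Arbitrary lanes can saturate, so the fold is not valid in general.
print(fold_holds(operator.and_, [0x10000], [0x20000], [0], [0]))
```

The second call illustrates why the fold is restricted to all-sign-bits patterns: once saturation changes a lane, the logic op no longer commutes with the pack.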
David Green
ae8f929b93 [AArch64] Use known zero bits when creating BIC
If we know bits are already 0, we will not need to clear them again with a BIC.
So we can use KnownBits to shrink the size of the constant when creating a BIC
from an And, potentially undoing the known-bits folds that happen during
compilation.

BIC only has a single register operand for input and output, so has less
scheduling freedom than an AND, but usually saves the materialization of a
constant.

Differential Revision: https://reviews.llvm.org/D154217
2023-07-05 15:42:33 +01:00
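The known-zero-bits idea in the BIC commit above can be sketched on plain 32-bit integers. This is only an illustration under assumptions: `bic` and `shrink_bic_imm` are hypothetical helpers, not the AArch64 backend's code, and the sketch ignores which immediates the instruction can actually encode.

```python
MASK32 = 0xFFFFFFFF

def bic(x, imm):
    """Bit clear on 32 bits: x AND NOT imm."""
    return x & ~imm & MASK32

def shrink_bic_imm(and_imm, known_zero):
    """Bits a BIC must clear to implement `x & and_imm`,
    skipping bits of x that are already known to be zero."""
    clear = ~and_imm & MASK32
    return clear & ~known_zero & MASK32

# x is known to fit in 8 bits, so its upper 24 bits are known zero.
known_zero = 0xFFFFFF00
imm = shrink_bic_imm(0x0000000F, known_zero)
print(hex(imm))
```

Without the known-zero information the clear mask would be 0xFFFFFFF0; with it, only the bits that can actually be set need clearing, which gives a much smaller constant.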
David Green
86bd9a420f [AArch64] Additional tests for creating BIC from known bits. NFC 2023-07-05 15:42:33 +01:00
Alex Bradbury
2ae71f541e [RISCV][test] Add commented out f128 test for llvm.frexp.ll
This represents the crash reported in <https://github.com/llvm/llvm-project/issues/63661>
2023-07-05 11:18:51 +01:00
Alex Bradbury
7de4c6f8d9 [RISCV][test] Add test coverage for llvm.frexp.*.* intrinsics
Reapply - the issue was that the `< %s` was missing in the RUN lines,
which didn't impact update_llc_test_checks but of course caused issues
for lit.

The test file is copied from X86 (which is also mostly shared with Arm,
PowerPC) rather than integrated into float-intrinsics.ll and
double-intrinsics.ll.

There's currently a compiler crash for the soft float cases (expect this
is the issue in <https://github.com/llvm/llvm-project/issues/63661>)
which will be addressed with a follow-on patch posted for review.
2023-07-05 10:40:39 +01:00
Freddy Ye
7717c0071d [X86] Remove CPU_SPECIFIC* MACROs and add getCPUDispatchMangling
This refactoring patch removes the CPU_SPECIFIC* macros from X86TargetParser.def
and moves their information into ProcInfo in X86TargetParser.cpp, since the
two files both maintain a table with redundant information such as the CPU name
and its supported features. The CPU_SPECIFIC* macros also define some additional
information, which this patch handles as follows when moving it:
1. Mangling
This is now moved to Mangling in ProcInfo and directly initialized in the array
of Processors. CPUs that don't support cpu_dispatch/specific are assigned '\0'
as their mangling.
2. CPU alias
An alias CPU is also initialized in the array of Processors; its attributes
are the same as those of its alias target CPU: same feature list, same mangling.
3. TUNE_NAME
Before this change, some CPU names that support cpu_dispatch/specific were not
supported in X86.td, which means the optimizer/backend didn't recognize them, so
they used a different TUNE_NAME when generating IR. This patch adds the
missing CPU support to X86.td by utilizing existing Features and XXXTunings, so
that each CPU name can directly use its own name as the TUNE_NAME supported
by the optimizer/backend.
4. Feature list
The feature list of a CPU maintained in X86TargetParser.def is not the same as
the one in X86TargetParser.cpp. The former only maintains part of a CPU's
features (those defined by X86_FEATURE_COMPAT), while X86TargetParser.cpp
maintains a complete list. This patch abandons the feature list maintained by
the CPU_SPECIFIC* macros because assigning a CPU the complete list doesn't
affect the functionality of cpu_dispatch/specific.
Apart from these four items, since some of the CPUs supported by
cpu_dispatch/specific didn't support clang options like -march and -mtune
before, this patch keeps that behavior by adding another member,
OnlyForCPUDispatchSpecific, to ProcInfo.

Reviewed By: pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D151696
2023-07-05 17:32:00 +08:00
Alex Bradbury
80c5698ec3 Revert "[RISCV][test] Add test coverage for llvm.frexp.*.* intrinsics"
Reverting due to weird failure.

This reverts commit 4b8162fe9ccb68b5b42f683df8df42ed43bfd5e7.
2023-07-05 10:29:06 +01:00
Alex Bradbury
4b8162fe9c [RISCV][test] Add test coverage for llvm.frexp.*.* intrinsics
The test file is copied from X86 (which is also mostly shared with Arm,
PowerPC) rather than integrated into float-intrinsics.ll and
double-intrinsics.ll.

There's currently a compiler crash for the soft float cases (expect this
is the issue in <https://github.com/llvm/llvm-project/issues/63661>)
which will be addressed with a follow-on patch posted for review.
2023-07-05 10:24:30 +01:00
esmeyi
2d74cf1f24 [XCOFF] Force recording a relocation for weak symbol label.
Summary: Currently, if there are multiple definitions of the same symbol declared with weak linkage, the linker may choose the wrong one when they are compiled with integrated-as. This patch fixes the issue: if the target symbol is a weak label, we must not attempt to resolve the fixup directly. Instead, emit a relocation and leave resolution of the final target address to the linker.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D153839
2023-07-05 01:58:18 -04:00
Lei Huang
c7c3d71414 [PowerPC] add testcase for vector add and shift 2023-07-04 10:45:19 -04:00
Stephen Thomas
2dfb4b56fe [AMDGPU] Fix incorrect hazard mitigation
GCNHazardRecognizer::fixVcmpxExecWARHazard() mitigates a specific hazard
by inserting a wait on sa_sdst==0 if such a wait isn't already present.
Unfortunately, the check for an existing wait incorrectly checks for one
that doesn't actually care about sa_sdst itself, but requires that no
other counters are waited for.

Once the check is performed correctly, a lit test needs to be updated,
since it is currently testing for the incorrect behaviour.

Differential Revision: https://reviews.llvm.org/D154438
2023-07-04 14:42:51 +01:00
Jay Foad
f2c164c815 [AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data
dependencies. The GFX10+ vscnt (VMEM store count) counter is never used
in this way. It is only used to resolve memory dependencies, and that is
handled by SIMemoryLegalizer. Hence there is no need to conservatively
wait for vscnt to be 0 on function entry and before returns.

Differential Revision: https://reviews.llvm.org/D153537
2023-07-04 12:22:38 +01:00
Ties Stuij
d145abcfb3 [ARM] fix typo in large-stack.ll introduced when fixing another typo 2023-07-04 11:23:24 +01:00
Ties Stuij
61bcaae7ab [ARM] fix typo in large-stack.ll test
In llvm/test/CodeGen/ARM/large-stack.ll, the C in FileCheck wasn't
uppercased. This wasn't spotted in development as MacOS's HFS+ fs is apparently
often configured case-insensitive.
2023-07-04 11:18:25 +01:00
Ties Stuij
1f082d2da0 [ARM] make execute only long call test checks more robust
Reviewed By: olista01

Differential Revision: https://reviews.llvm.org/D154355
2023-07-04 10:51:48 +01:00
Harvin Iriawan
c35d2071d8 [AArch64] NFC : Change the way SVE pseudos are appended
* SVE pseudos don't pick up the right latency information during MI
  scheduling as the regexes do not match the instruction names.

* Move UNDEF, PSEUDO, and ZERO to the end of the actual SVE instruction name

* Some CPUs' *.td files will be fixed in the next commit

Differential Revision: https://reviews.llvm.org/D154232
2023-07-04 10:41:56 +01:00
Ties Stuij
112d769e5e [ARM] generate correct code for armv6-m XO big stack operations
The ARM backend codebase is dotted with places where armv6-m will generate
constant pools. Now that we can generate execute-only code for armv6-m, we need
to make sure we use the movs/lsls/adds/lsls/adds/lsls/adds pattern instead of
these.

Big stacks is one of the obvious places. In this patch we take care of two
sites:
1. take care of big stacks in prologue/epilogue
2. take care of save/tSTRspi nodes, which implicitly fixes
   emitThumbRegPlusImmInReg which is used in several frame lowering fns

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D154233
2023-07-04 10:40:06 +01:00
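The movs/lsls/adds/lsls/adds/lsls/adds pattern mentioned above builds a 32-bit constant one byte at a time instead of loading it from a constant pool. A small simulation, assuming 8-bit immediates for movs/adds and a shift of 8 for lsls; `materialize` and `run` are hypothetical helpers for illustration, not the ARM backend's emission code.

```python
def materialize(value):
    """Return a movs/lsls/adds sequence that builds a 32-bit constant
    from four 8-bit immediates, most significant byte first."""
    parts = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    seq = [("movs", parts[0])]
    for byte in parts[1:]:
        seq.append(("lsls", 8))     # make room for the next byte
        seq.append(("adds", byte))  # add the next byte into the low bits
    return seq

def run(seq):
    """Interpret the sequence on a single 32-bit register."""
    reg = 0
    for op, imm in seq:
        if op == "movs":
            reg = imm
        elif op == "lsls":
            reg = (reg << imm) & 0xFFFFFFFF
        elif op == "adds":
            reg = (reg + imm) & 0xFFFFFFFF
    return reg

for op, imm in materialize(0x12345678):
    print(f"{op} r0, #{imm}")
```

The sequence is always seven instructions in this sketch; a real backend could presumably skip steps for zero bytes, but that is not modelled here.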
Igor Kirillov
e13582e9e3 [CodeGen] Precommit tests for D153355
Differential Revision: https://reviews.llvm.org/D153856
2023-07-04 09:29:38 +00:00
Ben Shi
3e6b80b1bd [CSKY] Optimize conditional select with CLRT/CLRF
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D154409
2023-07-04 15:22:18 +08:00
Ben Shi
ef53ec969b [CSKY][test][NFC] Add more tests of conditional select
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D154408
2023-07-04 15:22:18 +08:00
Paul Walker
c9eec3b085 [SVE] Extend incp/decp testing to cover 32-bit use cases. 2023-07-03 15:36:56 +01:00
David Spickett
ab3bb86d44 Revert "[ARM] Adjust strd/ldrd codegen alignment requirements"
This reverts commit 92a9c30c61da7f973d55cd84fade424159b9cac9.

This has caused a test failure in the 2nd stage of Linaro's
Arm 32 bit buildbots.

LLVM::simplified-template-names.s

            7: error: Simplified template DW_AT_name could not be reconstituted:
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            8:  original: f3<unsigned char, (unsigned char)'\x00'>
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            9:  reconstituted: f3<unsigned char, (unsigned char)'\x7f'>
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I suspect a load/store is slightly off.
2023-07-03 14:05:49 +00:00
Aleksandr Popov
22f2173837 [AArch64] Add PredictableSelectIsExpensive feature to all the cpus that have FeatureEnableSelectOptimize
The select optimize pass was enabled for AArch64 in
https://reviews.llvm.org/D138990.

We were doing some benchmarking on the Neoverse V1 and
experimenting with select optimize heuristics. We found that
some additional profitable transformations to predictable branches
(with prediction rate > 75% according to Agner Fog's rule of thumb) can
be done by the base heuristic from the SelectOptimize pass or by
optimizeSelectInst from the CodeGenPrepare pass. But they are blocked on the
Neoverse V1, since the PredictableSelectIsExpensive feature is not set for
that subtarget.

Note that to achieve these results we also changed the predictable branch
threshold from 99% to 75%.

It makes sense to add this feature to all targets where the
select optimize pass was enabled in https://reviews.llvm.org/D138990.

Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D143162
2023-07-02 09:23:43 +02:00
Wang, Xin10
f64e11369f [X86]Precommit test cases for D154193
Add MIR test cases for D154193, which aims to remove the test16rr from and32ri+test16rr sequences where possible, similar to what we did for and32*+test64rr.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D154322
2023-07-03 04:43:50 -04:00
Ben Shi
d53063c3e2 [CSKY] Optimize conditional branch with BLZ32/BLSZ32/BHZ32/BHSZ32
Add more `Pat`s to generate BLZ32/BLSZ32/BHZ32/BHSZ32.

Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D153607
2023-07-03 15:03:43 +08:00
Ben Shi
86829d15f4 [CSKY] Optimize IR pattern icmp-select with DECT32/DECF32
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D153518
2023-07-03 15:03:43 +08:00
Maurice Heumann
92a9c30c61 [ARM] Adjust strd/ldrd codegen alignment requirements
In change https://reviews.llvm.org/D152790, it was discovered that the
alignment requirement calculation for LDRD/STRD codegen was suboptimal
and the calculation for volatile loads and stores was adjusted.

This change adopts the calculation for the remaining non-volatile
occurrences.

Differential Revision: https://reviews.llvm.org/D153800
2023-07-02 14:25:25 -07:00
David Green
f55d96b9a2 [DAG][AArch64] Handle vector types when expanding sdiv/udiv into mulh
The aarch64 backend will benefit from expanding 64-bit vector sdiv/udiv into mulh
using shift(mul(ext, ext)), as the larger type size is legal and the mul(ext,
ext) can efficiently use smull/umull instructions. This extends the existing
code in GetMULHS to handle vector types for it.

Differential Revision: https://reviews.llvm.org/D154049
2023-07-02 15:02:52 +01:00
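As a scalar illustration of the shift(mul(ext, ext)) form described above: the high half of a multiply can be computed by widening both operands, multiplying, and shifting right by the original width, and a division by a constant can then be expressed through that high multiply. The sketch assumes 16-bit lanes and the well-known magic constant for dividing by 3; `mulhu` and `udiv3` are hypothetical names, not the DAG combiner's code.

```python
BITS = 16

def mulhu(a, b):
    """High half of an unsigned BITS x BITS multiply, computed as
    shift(mul(zext(a), zext(b))) on a 2*BITS-wide product."""
    wide = a * b  # Python ints are unbounded, so this is the widened product
    return (wide >> BITS) & ((1 << BITS) - 1)

def udiv3(x):
    """x // 3 for 16-bit unsigned x, using 0xAAAB = (2**17 + 1) // 3."""
    return mulhu(x, 0xAAAB) >> 1

print(udiv3(65535), 65535 // 3)
```

On a vector target the same arithmetic is applied per lane, which is why a legal wider vector type and a widening multiply make the expansion attractive.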
Simon Pilgrim
8269fd2db5 [GlobalIsel][X86] Add initial scalar G_MUL/G_SMULH/G_UMULH instruction selection handling
Reuse the existing div/rem selection code to also handle mul/imul to support G_MUL/G_SMULH/G_UMULH, as they have a similar pattern using rDX/rAX for mulh/mul results, plus the AH/AL support for i8 multiplies.
2023-07-02 12:56:41 +01:00
David Green
878e498f05 [AArch64] Expand typesizes of tests for constant srem/urem. NFC
See D154049.
2023-07-02 12:44:15 +01:00
Igor Kudrin
6e54fccede [AArch64] Emit fewer CFI instructions for synchronous unwind tables
The instruction-precise, or asynchronous, unwind tables usually take up
much more space than the synchronous ones. If a user is concerned about
the load size of the program and does not need the features provided
with the asynchronous tables, the compiler should be able to generate
the more compact variant.

This patch changes the generation of CFI instructions for these cases so
that they all come in one chunk in the prolog; it emits only one
`.cfi_def_cfa*` instruction followed by `.cfi_offset` ones after all
stack adjustments and register spills, and avoids generating CFI
instructions in the epilog(s) as well as any other excess CFI
instructions like `.cfi_remember_state` and `.cfi_restore_state`.
Effectively, it reverses the effects of D111411 and D114545 on functions
with the `uwtable(sync)` attribute. As a side effect, it also restores
the behavior on functions that have neither `uwtable` nor `nounwind`
attributes.

Differential Revision: https://reviews.llvm.org/D153098
2023-07-01 16:31:09 -07:00
Evandro Menezes
6a5da11b87 [AArch64] Add scheduling model for Neoverse N1
Add the scheduling model for Neoverse N1.

Differential revision: https://reviews.llvm.org/D152417
2023-07-01 12:35:22 -05:00
Luke Lau
e8e0f32958 [RISCV] Fix vfwcvt/vfncvt pseudos w/ rounding mode lowering
Some signed opcodes were being lowered to their unsigned counterparts and
vice-versa.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154234
2023-06-30 21:43:19 +01:00
Jeffrey Sandoval
6555c47448 [OpenMP][NVPTX] Handle additional invalid PTX characters
For OpenMP offload, Clang emits global symbols containing the string
'<captured>', which contains characters that are invalid in PTX.
Extend the existing pass that replaces '.'  and '@' characters with
'_$_' to also replace '<' and '>' characters.

Reviewed By: cchen

Differential Revision: https://reviews.llvm.org/D154241
2023-06-30 14:58:37 -05:00
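The renaming rule described above can be sketched in a few lines. This is an illustration only: `sanitize_ptx_name` is a hypothetical helper and not the NVPTX pass itself, and the sample symbol names are made up.

```python
# Characters the commit message lists as invalid in PTX symbol names.
INVALID_PTX_CHARS = ".@<>"

def sanitize_ptx_name(name):
    """Replace each invalid character with the string '_$_'."""
    return "".join("_$_" if c in INVALID_PTX_CHARS else c for c in name)

print(sanitize_ptx_name("foo.<captured>"))
```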
Nikita Popov
bb3763e497 Revert "[SimplifyCFG] Allow dropping block that only contains ephemeral values"
This reverts commit 20f0c68fd83a0147a8ec1722bd2e848180610288.

https://reviews.llvm.org/D153966#4464594 reports an optimization
regression in Rust.

Additionally this change has caused an unexpected 0.3% compile-time
regression.
2023-06-30 21:24:05 +02:00
Matt Arsenault
8f9eee3602 AMDGPU: Fix opaque pointer conversion error in test
The * was in the wrong place so this was missed by the script.
2023-06-30 15:04:03 -04:00
Thomas Lively
4f065fcb57 [WebAssembly] Fix incorrect assertion in SIMD reduction codegen
The codegen routine introduced in 18077e9fd688 did not account for vectors with
more than 16 lanes. Remove the incorrect assertion and bail out of the
optimization when encountering this case. Add test cases that previously
triggered the assertion. Unfortunately, these test cases now have terrible
codegen, but that is at least better than crashing.

Fixes #63500.

Differential Revision: https://reviews.llvm.org/D154124
2023-06-30 11:30:18 -07:00
Simon Pilgrim
5e2f0947c5 [X86] Add common SSE2/SSSE3 check prefix to vector truncation tests
Reduced duplicate checks where we can
2023-06-30 18:49:26 +01:00
Matt Arsenault
53acadafdd Verifier: Verify absolute_symbol metadata
This is the same as !range except for one edge case.
2023-06-30 12:31:32 -04:00
Fangrui Song
afd20587f9 MachineFunction: -fsanitize={function,kcfi}: ensure 4-byte alignment
Fix https://github.com/llvm/llvm-project/issues/63579
```
% cat a.c
void foo() {}
% clang --target=arm-none-eabi -mthumb -mno-unaligned-access -fsanitize=kcfi a.c -S -o - | grep p2align
        .p2align        1
% clang --target=armv6m-none-eabi -fsanitize=function a.c -S -o - | grep p2align
        .p2align        1
```

Ensure that -fsanitize={function,kcfi} instrumented functions are aligned by at
least 4, so that loading the type hash before the function label will not cause
a misaligned access. This is especially important for -mno-unaligned-access
configurations that don't set `setMinFunctionAlignment` to 4 or greater.

With this patch, the generated assembly for the examples above will contain `.p2align 2`
before the type hash.

If `__attribute__((aligned(N)))` or `-falign-functions=N` is specified, the
larger alignment will be used.

Reviewed By: simon_tatham, samitolvanen

Differential Revision: https://reviews.llvm.org/D154125
2023-06-30 09:13:19 -07:00
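The alignment rule in the commit above amounts to taking the largest of several byte alignments and printing its log2 as the `.p2align` operand. A minimal sketch, assuming a target minimum of 2 bytes as in the Thumb examples; `function_alignment` and `p2align` are hypothetical helpers, not MachineFunction's API.

```python
def function_alignment(target_min, sanitized, requested=1):
    """Alignment in bytes: the largest of the target minimum, the 4-byte
    floor for -fsanitize={function,kcfi}, and any requested alignment."""
    sanitizer_floor = 4 if sanitized else 1
    return max(target_min, sanitizer_floor, requested)

def p2align(nbytes):
    """Operand for .p2align: log2 of a power-of-two byte alignment."""
    if nbytes <= 0 or nbytes & (nbytes - 1):
        raise ValueError("alignment must be a positive power of two")
    return nbytes.bit_length() - 1

print(".p2align", p2align(function_alignment(2, sanitized=False)))
print(".p2align", p2align(function_alignment(2, sanitized=True)))
```

A larger request, such as one coming from `__attribute__((aligned(16)))`, would win over the 4-byte floor in this model, which matches the last paragraph of the commit message.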
Alex Bradbury
5ba40c7be3 [RISCV] Custom lower FP_TO_FP16 and FP16_TO_FP to correct ABI of libcall
As introduced in D99148, RISC-V uses the softPromoteHalf legalisation
for fp16 values without zfh, with logic ensuring that f16 values are
passed in lower bits of FPRs (see D98670) when F or D support is
present. This legalisation produces ISD::FP_TO_FP16 and ISD::FP16_TO_FP
nodes which (as described in ISDOpcodes.h) provide a "semi-softened
interface for dealing with f16 (as an i16)". i.e. the return type of the
FP_TO_FP16 is an integer rather than a float (and the arg of FP16_TO_FP
is an integer). The remainder of the description focuses primarily on
FP_TO_FP16 for ease of explanation.

FP_TO_FP16 is lowered to a libcall to `__truncsfhf2 (float)` or
`__truncdfhf2 (double)`. As of D92241, `_Float16` is used as the return
type of these libcalls if the host compiler accepts `_Float16` in a test
input (i.e. dst_t is set to `_Float16`). `_Float16` is enabled for the
RISC-V target as of D105001 and so the return value should be passed in
an FPR on hard float ABIs.

This patch fixes the ABI issue in what appears to be a minimally
invasive way - leaving the softPromoteHalf logic undisturbed, and
lowering FP_TO_FP16 to an f32-returning libcall, converting its result
to an XLen integer value.

As can be seen in the test changes, the custom lowering for FP16_TO_FP
means the libcall is no longer tail-callable.

Although this patch fixes the issue, there are two open items:
* Redundant fmv.x.w and fmv.w.x pairs are now sometimes produced during
  lowering (not a correctness issue).
* No coverage for STRICT variants of FP16 conversion opcodes.

Differential Revision: https://reviews.llvm.org/D151284
2023-06-30 16:41:49 +01:00
Alex Bradbury
ee5aaa8e6c [RISCV][test] Add additional RUN lines to half-convert.ll in preparation for D151284
There wasn't previous coverage for rv32id-ilp32, rv64id-lp64,
rv32id-ilp32d, or rv64id-lp64d. This is needed as D151284 fixes a bug
related to the ABI used for libcalls for fp<->fp16 conversion when hard
FP support is present.
2023-06-30 16:41:49 +01:00
Ben Shi
8099d6c20b [CSKY] Optimize IR pattern icmp-select with INCT32/INCF32
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D153436
2023-06-30 22:55:25 +08:00
Ben Shi
6d254a25cb [CSKY][test][NFC] Add tests of IR pattern icmp-select
These tests will be optimized with INCT32/INCF32/DECT32/DECF32
in the future.

Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D153434
2023-06-30 22:55:24 +08:00
Simon Pilgrim
4742715eb7 [DAG] Fold (*ext (*_extend_vector_inreg x)) -> (*_extend_vector_inreg x) 2023-06-30 14:42:49 +01:00
Nikita Popov
20f0c68fd8 [SimplifyCFG] Allow dropping block that only contains ephemeral values
Perform the TryToSimplifyUncondBranchFromEmptyBlock() transform if
the block is empty except for ephemeral values. The ephemeral values
will be dropped in that case.

This makes sure that assumes don't block this transform, as reported
in https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609.

Differential Revision: https://reviews.llvm.org/D153966
2023-06-30 15:24:01 +02:00
Phoebe Wang
8d0fecd34a [X86][FP16] Pre-commit test to show a mis-combination 2023-06-30 21:08:15 +08:00