52796 Commits

Author SHA1 Message Date
David Green
2861ec84fc [AArch64][GlobalISel] Add lowering for constant BIT/BIF/BSP (#65897)
The non-constant BIT/BIF/BSP cases already work through TableGen patterns; this
patch handles the constant case, mirroring the basic support for
`or(and(X, C), and(Y, ~C))` from ISel tryCombineToBSL. BSP gets expanded
to BIT, BIF or BSL, depending on the best register allocation.
G_BIT can be replaced with G_BSP as a more general alternative.
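The constant case matches the bitwise-select identity named above: each result bit comes from X where C is 1 and from Y where C is 0, which is what BSL/BIT/BIF compute. A minimal Python model, assuming 64-bit values; the helper names are illustrative only and are not LLVM APIs:

```python
MASK64 = (1 << 64) - 1

def bsl(c: int, x: int, y: int) -> int:
    """or(and(X, C), and(Y, ~C)) on 64-bit values."""
    return ((x & c) | (y & ~c)) & MASK64

def bsl_per_bit(c: int, x: int, y: int) -> int:
    """Reference model: choose each bit individually from x or y."""
    out = 0
    for i in range(64):
        src = x if (c >> i) & 1 else y
        out |= ((src >> i) & 1) << i
    return out
```

Checking `bsl` against the per-bit reference is a quick way to convince yourself that the and/or/not form really is a per-bit multiplexer.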
2023-09-17 09:50:12 +01:00
Yingwei Zheng
e042ff7eef
[SDAG][RISCV] Avoid expanding is-power-of-2 pattern on riscv32/64 with zbb
This patch adjusts the legality check for riscv to use `cpop/cpopw` since `isOperationLegal(ISD::CTPOP, MVT::i32)` returns false on rv64gc_zbb.
Clang vs gcc: https://godbolt.org/z/rc3s4hjPh
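For reference, the is-power-of-2 check `ctpop(x) == 1` and one common branch-free expansion used when CTPOP is not legal agree on every input. The sketch below uses illustrative helpers (not the SelectionDAG code) and checks that equivalence; with Zbb, the popcount form maps to a single `cpop/cpopw`, which is why keeping it is attractive.

```python
def is_pow2_ctpop(x: int) -> bool:
    """Form kept when a popcount instruction (e.g. Zbb cpop/cpopw) is available."""
    return bin(x).count("1") == 1

def is_pow2_expanded(x: int) -> bool:
    """A typical expansion when CTPOP is not legal: x != 0 && (x & (x - 1)) == 0."""
    return x != 0 and (x & (x - 1)) == 0
```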

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156390
2023-09-17 02:56:09 +08:00
Yingwei Zheng
b423e1f05d
[SDAG][RISCV] Avoid neg instructions when lowering atomic_load_sub with a constant rhs
This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs.
Comparison with GCC: https://godbolt.org/z/c5zPdP7j4
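RISC-V's A extension provides `amoadd` but no `amosub`, so `atomic_load_sub` is lowered as an atomic add of the negated operand. When the operand is a constant, the negation can be folded at compile time instead of emitting a `sub` from `x0`. A minimal model of the equivalence, assuming XLEN=64 wrapping arithmetic; the helper names are hypothetical:

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def atomic_load_sub(mem: list, addr: int, rhs: int) -> int:
    """Reference semantics: return the old value and store old - rhs (wrapping)."""
    old = mem[addr]
    mem[addr] = (old - rhs) & MASK
    return old

def atomic_load_add(mem: list, addr: int, rhs: int) -> int:
    """What amoadd does: return the old value and store old + rhs (wrapping)."""
    old = mem[addr]
    mem[addr] = (old + rhs) & MASK
    return old

def lower_sub_of_constant(mem: list, addr: int, c: int) -> int:
    # The constant is negated here, at "compile time", so no runtime neg is needed.
    return atomic_load_add(mem, addr, (-c) & MASK)
```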

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158673
2023-09-16 17:09:41 +08:00
Philip Reames
c663401f69 [RISCV] Prefer vrgatherei16 for shuffles (#66291)
If the data type is larger than e16 and requires more than an LMUL1 register
class, prefer vrgatherei16. This has three major benefits:
1) Less work is needed to compute the index constant, e.g. for vid sequences.
Remember that arithmetic generally scales linearly with LMUL.
2) Less register pressure. In particular, the source and index registers
*can* overlap, so using a smaller index can significantly help at m8.
3) Smaller constants. We've got a bunch of tricks for materializing small
constants and, if needed, can use an EEW=16 load.
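The register-pressure argument follows from the RVV rule for index operands, EMUL = (EEW / SEW) * LMUL: `vrgather.vv` indices use SEW, while `vrgatherei16.vv` indices always use EEW=16. A small sketch of that arithmetic; the function name is illustrative:

```python
from fractions import Fraction

def index_emul(sew: int, lmul: Fraction, index_eew: int) -> Fraction:
    """EMUL of an index operand: (EEW / SEW) * LMUL."""
    return Fraction(index_eew, sew) * lmul

# vrgather.vv:     index EEW equals the data SEW.
# vrgatherei16.vv: index EEW is always 16.
```

For e64 data at m8, a `vrgather.vv` index group occupies m8, while the `vrgatherei16` index group needs only m2.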
2023-09-15 15:57:23 -07:00
Philip Reames
ff2622b5ac [RISCV] Optimize gather/scatter to unit-stride memop + shuffle (#66279)
If we have a gather or a scatter whose index describes a permutation of the
lanes, we can lower this as a shuffle + a unit strided memory operation.  For
RISCV, this replaces an indexed load/store with a unit-strided memory operation
and (at worst) a vrgather.

I did not bother to implement the vp.scatter and vp.gather variants of these
transforms because they'd only be legal when EVL was VLMAX.  Given that, they
should have been transformed to the non-vp variants anyway. I haven't checked
whether they actually are.
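The idea can be modeled in a few lines: when the element-scaled indices form a permutation of `0..n-1`, an indexed gather reads exactly the elements covered by one unit-stride load of `n` elements, and the remaining work is a shuffle. The sketch assumes memory is a flat list addressed in element units; the helper names are hypothetical, and the scatter case is the mirror image (shuffle first, then a unit-stride store).

```python
def is_permutation(indices: list) -> bool:
    """True if the indices are exactly 0..n-1 in some order."""
    return sorted(indices) == list(range(len(indices)))

def gather(mem: list, base: int, indices: list) -> list:
    """Indexed load: one memory access per lane."""
    return [mem[base + i] for i in indices]

def gather_via_load_and_shuffle(mem: list, base: int, indices: list) -> list:
    if not is_permutation(indices):
        raise ValueError("indices are not a lane permutation")
    loaded = mem[base:base + len(indices)]  # unit-stride load
    return [loaded[i] for i in indices]     # shuffle (a vrgather at worst)
```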
2023-09-15 15:54:32 -07:00
Craig Topper
ac182deee8 [RISCV][GlobalISel] Select ALU GPR instructions
Some instruction selection patterns required for ALU GPR instructions have
already been automatically imported from existing TableGen descriptions;
this patch simply adds tests for them. The first of the GIComplexPatternEquiv
definitions required to select the shiftMaskXLen ComplexPattern has been added.

Some instructions require special handling because i32 is not a legal
type on RV64 in SelectionDAG, so we can't reuse the SelectionDAG patterns.
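As background for `shiftMaskXLen`: RISC-V shifts read only the low log2(XLEN) bits of the shift amount, so an explicit `and` of the amount with a mask that preserves those bits is redundant. That this is what the ComplexPattern exists to match is an assumption here; the Python helpers are hypothetical models of RV64 `sll`.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def sll(rs1: int, rs2: int) -> int:
    """Model of RV64 SLL: only the low log2(XLEN) bits of rs2 are used."""
    return (rs1 << (rs2 & (XLEN - 1))) & MASK

def and_mask_is_redundant(mask: int) -> bool:
    """(and amt, mask) can be dropped if it keeps every bit that SLL reads."""
    return (mask & (XLEN - 1)) == XLEN - 1
```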

Co-authored-by: Lewis Revill <lewis.revill@embecosm.com>

Reviewed By: nitinjohnraj

Differential Revision: https://reviews.llvm.org/D76445
2023-09-15 15:49:38 -07:00
Philip Reames
37aa07ad31 [RISCV] Move narrowIndex to be a DAG combine over target independent nodes
In D154687, we added a transform to narrow indexed load/store indices of the
form (shl (zext), C).  We can move this into a generic transform over the
target independent nodes instead, and pick up the fixed vector cases with no
additional work required.  This is an alternative to D158163.
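The narrowing rests on a simple bit-width bound: if `x` has N bits, `(shl (zext x), C)` has at most N + C significant bits, so the index can be computed in any type at least that wide. A sketch of the bound, assuming candidate widths of 8/16/32/64; which width the actual combine picks is not stated above, so `narrowed_index_width` is a hypothetical helper:

```python
def narrowed_index_width(src_bits: int, shift: int, widths=(8, 16, 32, 64)):
    """Smallest candidate width that can hold (shl (zext x), shift) for an src_bits-wide x."""
    needed = src_bits + shift
    for w in widths:
        if needed <= w:
            return w
    return None  # no candidate type is wide enough
```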

Performing this transform points out that we weren't eliminating zero_extends
via the generic DAG combine. Adjust the (existing) callbacks so that we
do.

This change *removes* the existing transform on the target specific intrinsic
nodes.  If anyone has a use case this impacts, please speak up.

Note: Reviewed as part of a stack of changes in PR# 66405.
2023-09-15 15:02:14 -07:00
Mircea Trofin
0af95c3262 [mlgo] Fix regalloc tests
Follow-up to D156491 (cbdccb3): just rebased the reference outputs.
2023-09-15 17:27:34 -04:00
Guozhi Wei
cbdccb30c2 [RA] Split a virtual register in cold blocks if it is not assigned preferred physical register
If a virtual register is not assigned its preferred physical register, some
COPY instructions will become real register move instructions. In this
case we can try to split the virtual register in colder blocks; if that succeeds,
the original COPY instructions can be deleted, and the new COPY instructions in
colder blocks will be generated as register move instructions. This results in
fewer register move instructions being executed dynamically.

The new test case split-reg-with-hint.ll gives an example: the hot path contains
24 instructions without this patch and only 4 instructions with it.
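The benefit is measured in dynamic counts: each move costs as much as its block's execution frequency. A toy model with hypothetical frequencies (not taken from the test case) makes the trade-off concrete:

```python
def dynamic_moves(placement: list) -> int:
    """Sum of block_frequency * register_moves over (frequency, moves) pairs."""
    return sum(freq * moves for freq, moves in placement)

# Hypothetical profile: the hot block runs 1000 times, the cold block 10 times.
unsplit = [(1000, 2)]          # COPYs in the hot block became real moves
split = [(1000, 0), (10, 2)]   # after splitting, the moves live in the cold block
```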

Differential Revision: https://reviews.llvm.org/D156491
2023-09-15 19:52:50 +00:00
Philip Reames
52b33ff760 [RISCV] Avoid toggling VL for hidden splat case in constant buildvector lowering
We have the analogous case in the single insert path.  The reasoning here is that if the original VL fits in LMUL1, we'd prefer to clobber a few extra dead lanes than to force two VL toggles.  VTYPE toggles are generally cheaper than VL toggles.
2023-09-15 12:33:21 -07:00
Jon Roelofs
003bcad9a8
[ARM] Always lower direct calls as direct when the outliner is enabled (#66434)
The indirect lowering hinders the outliner's ability to see that
sequences are in fact common, since the sequence similarity is rendered
opaque by the register callee. The size savings from making them
indirect seem to be dwarfed by the outliner's savings from
de-duplication.

rdar://115178034
rdar://115459865
2023-09-15 10:04:56 -07:00
Vladislav Dzhidzhoev
4e970d7bd8
[AArch64][GlobalISel] Select llvm.aarch64.neon.st* intrinsics (#65491)
Similar to llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2023-09-15 16:35:21 +02:00
David Green
b0f0aa852d [AArch64] Guard against an invalid size request in performVecReduceAddCombine
With both +sve and +dotprod, and a scalable vecreduce(sext), we could attempt to
access the number of elements of a scalable vector. Guard against this for now,
until scalable dotprod is properly supported.
2023-09-15 14:04:21 +01:00
Nikita Popov
47324cfd7d Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size
Reapply after fixing a clang bug this exposed in D158972 and
adjusting a number of tests that failed for 32-bit targets.

-----

Add a check that the DILocalVariable fragment size in dbg.declare
does not exceed the size of the alloca.

This would have caught the invalid debuginfo generated by rustc
in https://github.com/llvm/llvm-project/issues/64149.
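The new check reduces to a size comparison. Below is a hypothetical Python model (not the Verifier's code). It assumes sizes are in bits and that the check applies only when the alloca's size is statically known:

```python
from typing import Optional

def declare_fits_alloca(fragment_bits: int, alloca_bits: Optional[int]) -> bool:
    """Hypothetical model of the dbg.declare rule: fragment size <= alloca size."""
    if alloca_bits is None:
        return True  # alloca size not statically known: nothing to compare against
    return fragment_bits <= alloca_bits
```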

Differential Revision: https://reviews.llvm.org/D158743
2023-09-15 14:51:50 +02:00
Vladislav Dzhidzhoev
c464896dbe
[AArch64][GlobalISel] Select llvm.aarch64.neon.ld* intrinsics (#65630)
Similar to llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp.
2023-09-15 14:03:48 +02:00
Benjamin Kramer
3454cf67bd Revert "[MachineLICM] Handle Subloops"
This reverts commit 5ec9699c4d1f165364586d825baef434e2c110b4. It
accesses MI after it has been hoisted.
2023-09-15 13:20:31 +02:00
Matthew Devereau
9bbbfbc7fd
[AArch64][SME] Emit Zero instruction for NewZA functions
[The ACLE](https://github.com/ARM-software/acle/pull/268) demands that
functions with the aarch64_pstate_za_new attribute set all bits of the
ZA register to zero upon entry.
2023-09-15 11:40:30 +01:00
Jay Foad
ceb68eea8c
[AMDGPU] Remove repeated -mtriple options from RUN lines (#66486)
2023-09-15 11:29:24 +01:00
Weining Lu
0a692b6b96 [LoongArch] Fix incorrect instruction 'and' in pattern
It should be `andi`, not `and`.

Address buildbot failure:
https://lab.llvm.org/buildbot/#/builders/42/builds/11634
2023-09-15 16:16:06 +08:00
Martin Storsjö
7a91bbbb00
[GlobalISel] Check for unsupported Windows features on invoke (#65864)
This matches what is done on calls, since
cc981d285d1aa33df201605b9a3e22dd2311ead2 (extended for another case in
5a751e747dbf2c267e944aa961e21de7a815e7eb).

Apply both of those checks to invoke, just as is done for call.

Also update the preexisting comment, which was not updated in
5a751e747dbf2c267e944aa961e21de7a815e7eb.

This fixes github issue #61941.
2023-09-15 11:14:40 +03:00
Pierre van Houtryve
e9e3868707
[AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346)
Addresses the FIXME for both DAGISel and GISel.
2023-09-15 08:11:01 +02:00
Rainer Orth
715fc4fc60
[Sparc] Don't emit __multi3 on 32-bit SPARC (#66362)
LLVM fails to build on 32-bit Solaris/SPARC: several programs fail to
link due to undefined references to `__multi3`. This reference is from
`lib/libLLVMScalarOpts.a(LoopStrengthReduce.cpp.o)`. However, this
function exists neither in the 32-bit `libgcc.a` nor in
`libclang_rt.builtins-sparc.a`. It's only defined in their 64-bit
counterparts.

The same issue affects several 32-bit targets, e.g. 32-bit PowerPC as
described in Issue #54460. The fix is the same: inhibit the libcall for
32-bit compilations. This patch does just that, regenerating the
affected testcases. It allows the build to complete.

Tested on `sparc-sun-solaris2.11`.
2023-09-15 07:31:59 +02:00
Arthur Eubanks
1feb00a28c [X86] Introduce a large data threshold for the medium code model
Currently clang's medium code model treats all data as large, putting them in a large data section and using more expensive instruction sequences to access them.

This follows gcc's -mlarge-data-threshold, which allows putting data under a certain size in a normal data section instead of a large data section. This allows using cheaper code sequences to access some portion of the data in the binary (which will be implemented in LLVM in a future patch).

Under the medium code model, only data above the large data threshold is now put into large data sections, rather than all data.
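Under this scheme, the section choice for a global becomes a size test. A minimal model assuming the x86-64 large data section name `.ldata` and an illustrative threshold value. `section_for` is a hypothetical helper, not the LLVM implementation, and code models other than medium are simplified to always use `.data`:

```python
def section_for(size_bytes: int, code_model: str, large_data_threshold: int) -> str:
    """Pick a data section for a global under a simplified x86-64 model."""
    if code_model == "medium" and size_bytes > large_data_threshold:
        return ".ldata"  # large data: needs the more expensive access sequences
    return ".data"       # normal data: reachable with cheaper sequences
```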

Reviewed By: MaskRay, rnk

Differential Revision: https://reviews.llvm.org/D149288
2023-09-14 15:09:25 -07:00
Kuba (Brecka) Mracek
454cc36630
[AArch64] Relax binary format switch in AArch64MCInstLower::LowerSymbolOperand to allow non-Darwin Mach-O files (#66011)
Trying to use an arm64-apple-none-macho target triple today crashes with
an assertion; this patch fixes that.
2023-09-14 11:12:30 -07:00
Jingu Kang
5ec9699c4d [MachineLICM] Handle Subloops
Following discussion on https://reviews.llvm.org/D154205, make the MachineLICM pass
handle subloops, visiting the outermost loop's blocks only once.

Differential Revision: https://reviews.llvm.org/D154205
2023-09-14 18:07:31 +01:00
David Green
74724902ba
[AArch64] Split Ampere1Write_Arith into rr/ri and rs/rx InstRWs. (#66384)
The ampere1 scheduling model uses IsCheapLSL predicates for ADDXri and
ADDWrr instructions, which only have 3 operands. In attempting to check
that the third is a shift, the predicate can attempt to access an out of
bounds operand, hitting an assert. This splits the rr/ri instructions
(which can never have shifts) from the rs/rx instructions to ensure they
both work correctly. Ampere1Write_1cyc_1AB was chosen for the rr/ri
instructions to match the cheap case.

This also sets CompleteModel = 0 for the ampere1 scheduling model because,
at runtime in debug builds, it checks not only that all instructions
have scheduling info but also that there is information for each
output operand:

DefIdx 1 exceeds machine model writes for
  renamable $w9, renamable $w8 = LDPWi renamable $x8, 0
(Try with MCSchedModel.CompleteModel set to false)incomplete machine
model
2023-09-14 16:29:30 +01:00
Manos Anagnostakis
008f26b12e
[AArch64] New subtarget features to control ldp and stp formation (#66098)
On some AArch64 cores, including Ampere's ampere1 and ampere1a
architectures, load and store pair instructions are faster compared to
simple loads/stores only when the alignment of the pair is at least
twice that of the individual element being loaded.

Based on that, this patch introduces four new subtarget features, two
for controlling ldp and two for controlling stp, to cover the ampere1
and ampere1a alignment needs and to enable optional fine-grained control
over ldp and stp generation in general. The latter can be used by
other CPUs that might benefit from a policy different from the
compiler's default.

More specifically, for each of the ldp and stp respectively we have:

- disable-ldp/disable-stp: Do not emit ldp/stp.
- ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source
pointer is aligned to at least double the alignment of the type.

Therefore, for -mcpu=ampere1 and -mcpu=ampere1a
ldp-aligned-only/stp-aligned-only become the defaults, because of the
benefit from the alignment, whereas for the rest of the cpus the default
behaviour of the compiler is maintained.
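The per-pair decision implied by these features can be sketched as follows. The policy strings mirror the feature names above, while `may_form_pair` itself is a hypothetical helper, with alignments in bytes:

```python
def may_form_pair(ptr_align: int, elem_align: int, policy: str) -> bool:
    """policy: 'default', 'disable' (disable-ldp/stp) or 'aligned-only' (ldp/stp-aligned-only)."""
    if policy == "disable":
        return False
    if policy == "aligned-only":
        # Pair only when the pointer is aligned to at least twice the element alignment.
        return ptr_align >= 2 * elem_align
    return True  # default: the compiler's usual pairing behaviour
```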
2023-09-14 16:58:39 +02:00
Paul Walker
c7d65e4466 [IR] Enable load/store/alloca for arrays of scalable vectors.
Differential Revision: https://reviews.llvm.org/D158517
2023-09-14 13:49:01 +00:00
paulwalker-arm
8ba5820e7a
[SVE] Ensure SVE call operands passed via memory are correctly initialised. (#66070)
The stores created when passing operands via memory don't typically
maintain the chain, because they can be done in any order. Instead,
a new chain is created based on all collated stores. SVE parameters
passed via memory don't follow this idiom and try to maintain the
chain, which unfortunately can result in them being incorrectly
deadcoded when the chain is recreated.

This patch brings the SVE side in line with the non-SVE side to
ensure no stores become lost whilst also allowing greater flexibility
when ordering the stores.
2023-09-14 12:58:58 +01:00
David Green
adc5509186 [AArch64] Add LRINT/LLRINT/LROUND/LLROUND FP16 lowering without fullfp16 (#66174)
We apparently somehow had lowering for the STRICT nodes without any handling
for the normal operations. This makes sure we support the LRINT and LROUND
intrinsics for fp16 when +fullfp16 is not present.
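A common way to support these operations without native half-precision arithmetic is to promote the f16 value to f32, which is exact, and then round. This sketch assumes that strategy. It also shows the semantic difference between the two families: LRINT rounds in the current mode (ties-to-even by default), while LROUND rounds ties away from zero. Helper names are hypothetical, and Python floats stand in for f32 because every f16 value is exactly representable in both:

```python
import math
import struct

def f16_bits_to_float(bits16: int) -> float:
    """Decode an IEEE binary16 bit pattern; the promotion is exact."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits16))
    return value

def lrint(x: float) -> int:
    """Round to nearest, ties to even (the default rounding mode)."""
    return round(x)

def lround(x: float) -> int:
    """Round to nearest, ties away from zero (adequate for f16-sized inputs)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))
```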
2023-09-14 09:36:03 +01:00
Weining Lu
419f90e93a [LoongArch] Support llvm.is.fpclass for f32 and f64
is_fpclass (fj, mask)
->
sltu (r0, and (movfr2gr.[sd] (fclass.[sd] fj), (to_fclass_mask mask)))

[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_fclass_sd
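The final `sltu` against `r0` is a non-zero test: `0 <u t` holds exactly when `t != 0`, so the result is 1 iff the class bit reported by `fclass` intersects the translated mask. The sketch below uses hypothetical one-hot class bits; the real `fclass` bit layout and the `to_fclass_mask` translation are not reproduced here.

```python
MASK64 = (1 << 64) - 1

def sltu(a: int, b: int) -> int:
    """Set-less-than-unsigned on 64-bit register values."""
    return 1 if (a & MASK64) < (b & MASK64) else 0

def is_fpclass(fclass_result: int, class_mask: int) -> int:
    """sltu(r0, and(fclass_result, class_mask)): 1 iff the AND is non-zero."""
    return sltu(0, fclass_result & class_mask)
```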

Reviewed By: wangleiat

Differential Revision: https://reviews.llvm.org/D159183
2023-09-14 15:43:58 +08:00
Jianjian Guan
c31dda4e6e
[RISCV] Update Zicntr and Zihpm to version 2p0 (#66323)
2023-09-14 15:43:50 +08:00
Pierre van Houtryve
3d0353793b
[AMDGPU] Fix HasFP32Denormals check in FDIV32 lowering (#66212)
Fixes SWDEV-403219
2023-09-14 08:47:10 +02:00
daisy202309
9ef15f4109
[AArch64][CodeGen] Fix wrong operand order when creating vcmla intrinsic (#65278)
Co-authored-by: lizhijin <lizhijin3@huawei.com>
2023-09-14 12:10:16 +08:00
Allen
347b3f1209
[ARM][ISel] Fix crash of ISD::FMINNUM/FMAXNUM (#65849)
ISD::FMINNUM/FMAXNUM should be legal if HasFPARMv8 &&
HasNEON.
For the combination armv7+fp-armv8, armv7 implies HasNEON
and fp-armv8 provides HasFPARMv8, so these nodes are legal.

Fixes https://github.com/llvm/llvm-project/issues/65820
2023-09-14 10:35:07 +08:00
Maryam Moghadas
7b021f2e64 [PowerPC] Optimize VPERM and fix code order for swapping vector operands on LE
This patch reverts commit 7614ba0a5db8 to optimize VPERM when one of its
vector operands is XXSWAPD, similar to XXPERM. It also reorganizes the
little-endian swap code, swapping the vector operand after
adjusting the mask operand. This ensures that the vector operand is
swapped at the correct point in the code, resulting in a valid
constant pool for the mask operand.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D149083
2023-09-13 15:00:49 -05:00
Jeffrey Byrnes
372115fadd [AMDGPU] Precommit test for i8 vector CopyToReg handling patch
Adds a test to show the impact on cross-block CopyToReg & CopyFromReg handling for n x i8, and shows NFC on CC

Differential Revision: https://reviews.llvm.org/D159303

Change-Id: Ib6d9802dbebe8e3245e4ccfd4a6f23357de8c480
2023-09-13 11:27:15 -07:00
Philip Reames
2f005df066
[DAG][X86] Fold mgather/mscatter/etc with splat index (#65980)
A splat index means the operation is reading from (writing to) the same
memory location. Generally, zero is the cheapest value to splat. As
such, we'd prefer to add the splatted value to the base, and use a
constant zero as the index operand.
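With a splat index, every lane computes the same address, `base + v * scale`, so the scaled splat value can be folded into the base and the index replaced by zeros. A minimal model with hypothetical helpers, treating addresses as plain integers:

```python
def gather_addresses(base: int, indices: list, scale: int) -> list:
    """Per-lane addresses of a gather/scatter: base + index * scale."""
    return [base + i * scale for i in indices]

def fold_splat_index(base: int, splat: int, num_lanes: int, scale: int):
    """Move a splatted index into the base and use a zero index vector."""
    return base + splat * scale, [0] * num_lanes
```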
2023-09-13 09:26:30 -07:00
Simon Pilgrim
8b479136c9 [X86] SimplifyMultipleUseDemandedBitsForTargetNode - add X86ISD::BLENDV handling
If we know the condition mask sign bit of the demanded elements then we can directly return the LHS/RHS selection
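BLENDV picks each element by the sign bit of its mask element. If the sign bits of all demanded mask elements are known and agree, the node for those lanes is simply one of its inputs. Below is a hypothetical model of that reasoning. Known sign bits are given as `True`, `False`, or `None` (unknown), and the operand names are generic rather than LLVM's:

```python
def blendv(mask: list, if_set: list, if_clear: list, bits: int = 32) -> list:
    """Per element: take if_set when the mask's sign bit is 1, else if_clear."""
    sign = 1 << (bits - 1)
    return [a if m & sign else b for m, a, b in zip(mask, if_set, if_clear)]

def simplify_blendv(known_sign: list, demanded: list):
    """Return the operand a BLENDV reduces to for the demanded lanes, or None."""
    signs = [known_sign[i] for i in demanded]
    if signs and all(s is True for s in signs):
        return "if_set"
    if signs and all(s is False for s in signs):
        return "if_clear"
    return None  # unknown or mixed sign bits: no simplification
```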
2023-09-13 17:23:11 +01:00
Mircea Trofin
c1f8fdbb5c [mlgo] fix 2 regalloc tests post DAG change
The tests check against an expected output file - trivial change.

Reference: either D158068 or e6b85c3
2023-09-13 11:48:47 -04:00
Simon Pilgrim
4a32c48280 [X86] LowerTRUNCATE - ensure we handle cases where we truncate to a sub-128bit type (PR66194)
Fixes #66194
2023-09-13 13:15:42 +01:00
Simon Pilgrim
e6b85c3027 [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case (REAPPLIED)
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
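Because `undef` may be refined to any value, the fold can choose the value of `X` itself. Comparing a value with itself yields true exactly for the predicates that are true when equal. A Python model of that reasoning (not the DAG code), using the LLVM predicate names:

```python
import operator

# icmp predicates applied to a value compared with itself; signedness does not
# matter here because both operands are identical.
PREDICATES = {
    "eq": operator.eq, "ne": operator.ne,
    "ugt": operator.gt, "uge": operator.ge, "ult": operator.lt, "ule": operator.le,
    "sgt": operator.gt, "sge": operator.ge, "slt": operator.lt, "sle": operator.le,
}
TRUE_WHEN_EQUAL = {"eq", "uge", "ule", "sge", "sle"}

def fold_icmp_x_undef(pred: str, x: int) -> bool:
    """Fold icmp pred X, undef by choosing undef := X."""
    return PREDICATES[pred](x, x)
```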

Reapplied after reversion at e1e3c75c7dad72 with a tweak to the pseudo-probe-peep.ll test

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 12:33:39 +01:00
Simon Pilgrim
e1e3c75c7d Revert rG6c56cf71ee82ec3a28e0dfc2b751bd10c16929da "[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case"
Need to address a missed test change
2023-09-13 11:27:47 +01:00
Sander de Smalen
b8ec2832c3
[AArch64][SME] Various tests should work with +sme, just as they do for +sve (#65260)
2023-09-13 11:11:51 +01:00
Simon Pilgrim
6c56cf71ee [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 11:01:58 +01:00
Qiu Chaofan
69b056d563 [PowerPC] Implement SchedModel for Power7
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D158704
2023-09-13 14:55:07 +08:00
Matt Arsenault
231aa0f212 AMDGPU: Avoid creating vector extracts if we aren't going to do anything
Avoid expensive-checks failures caused by reporting no changes
when some dead instructions were introduced.
2023-09-13 09:45:34 +03:00
Matt Arsenault
edecb60481 Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.
2023-09-13 08:38:48 +03:00
Pravin Jagtap
3755ea93b4
[AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082)
[D156301](https://reviews.llvm.org/D156301) introduced atomic
optimizations for FAdd/FSub. For FSub, the reduction/scan needs to be
performed using an add operation (not sub), and the memory location is then
updated with the reduced value by a single lane using an atomic sub.
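The fix can be seen with a scalar model of the lanes. Subtracting each lane's value in turn equals subtracting their sum once, so the scan/reduction must use addition, while reducing with subtraction flips the signs. The values are chosen to be exactly representable so that floating-point reassociation does not affect the comparison. Helper names are hypothetical:

```python
def per_lane_fsub(mem: float, lanes: list) -> float:
    """Reference: every lane performs its own atomic fsub."""
    for v in lanes:
        mem -= v
    return mem

def optimized_fsub(mem: float, lanes: list) -> float:
    """Fixed strategy: reduce with add, then one lane does a single atomic fsub."""
    total = 0.0
    for v in lanes:
        total += v
    return mem - total

def buggy_fsub(mem: float, lanes: list) -> float:
    """Broken strategy: reducing with sub before the atomic fsub."""
    total = 0.0
    for v in lanes:
        total -= v
    return mem - total
```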

---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-09-13 09:57:10 +05:30
Jeffrey Byrnes
db47264ab3
Revert "[AMDGPU]: Allow combining into v_dot4" (#66158)
This reverts commit 7fda1b74be4a173031192d8516869e87e6b7582d.
2023-09-12 16:57:17 -07:00