52796 Commits

Author SHA1 Message Date
Igor Kirillov
c15557d64e [CodeGen] Extend ComplexDeinterleaving pass to recognise patterns using integer types
AArch64 introduced CMLA and CADD instructions as part of SVE2. This
change allows such instructions to be generated when this architecture
feature is available.

Differential Revision: https://reviews.llvm.org/D153808
2023-07-19 11:01:19 +00:00
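For context, the kind of integer pattern involved is a complex multiply-accumulate over interleaved (real, imaginary) data. The Python sketch below models how two CMLA steps, with rotations #0 and #90, together compute `acc += a * b`. It assumes the usual CMLA rotation semantics and uses Python ints, which do not wrap like fixed-width vector lanes. The function names are hypothetical; this is not the pass's implementation.

```python
def cmla(acc, a, b, rot):
    """Model one CMLA step on interleaved [re0, im0, re1, im1, ...] integer lists.

    rot=0 accumulates the products that use a's real part; rot=90 accumulates
    the ones that use a's imaginary part.
    """
    if rot not in (0, 90):
        raise ValueError("only rotations 0 and 90 are modelled here")
    out = list(acc)
    for i in range(0, len(a), 2):
        a_re, a_im = a[i], a[i + 1]
        b_re, b_im = b[i], b[i + 1]
        if rot == 0:
            out[i] += a_re * b_re
            out[i + 1] += a_re * b_im
        else:
            out[i] -= a_im * b_im
            out[i + 1] += a_im * b_re
    return out


def complex_mla_scalar(acc, a, b):
    """Scalar reference loop: acc[k] += a[k] * b[k] for each complex element."""
    out = list(acc)
    for i in range(0, len(a), 2):
        a_re, a_im = a[i], a[i + 1]
        b_re, b_im = b[i], b[i + 1]
        out[i] += a_re * b_re - a_im * b_im
        out[i + 1] += a_re * b_im + a_im * b_re
    return out


acc, a, b = [0, 0, 10, 10], [1, 2, 3, -4], [5, 6, -7, 8]
print(cmla(cmla(acc, a, b, 0), a, b, 90))  # [-7, 16, 21, 62]
print(complex_mla_scalar(acc, a, b))       # [-7, 16, 21, 62]
```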
Simon Pilgrim
98b0f1360d [DAG] hoistLogicOpWithSameOpcodeHands - add support for SIGN_EXTEND_INREG nodes.
This can reuse the existing *_EXTEND node handling (with special handling for the value type argument).
2023-07-19 11:56:32 +01:00
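The hoist relies on `logic_op(sext_inreg(x), sext_inreg(y)) == sext_inreg(logic_op(x, y))`: sign extension only copies one bit upward, and AND/OR/XOR act on each bit independently. A small Python check of that identity, with 8-bit fields inside 32-bit values as an arbitrary choice (the helper names are hypothetical):

```python
import operator


def sext_inreg(x, from_bits, width=32):
    """Sign-extend the low `from_bits` bits of x to a `width`-bit unsigned value."""
    low = x & ((1 << from_bits) - 1)
    if low >> (from_bits - 1):          # sign bit of the narrow field is set
        low -= 1 << from_bits
    return low & ((1 << width) - 1)


def hoist_is_sound(op, x, y, from_bits, width=32):
    """Check op(sext_inreg(x), sext_inreg(y)) == sext_inreg(op(x, y))."""
    before = op(sext_inreg(x, from_bits, width), sext_inreg(y, from_bits, width))
    after = sext_inreg(op(x, y), from_bits, width)
    return before == after


values = [0x00, 0x7F, 0x80, 0xFF, 0x1234, 0xABCD]
print(all(hoist_is_sound(op, x, y, 8)
          for op in (operator.and_, operator.or_, operator.xor)
          for x in values for y in values))  # True
```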
Ivan Kosarev
1b32427213 [AMDGPU] Combine the SDAG and GISel versions of the fmed3.ll test.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D155590
2023-07-19 11:42:36 +01:00
Simon Pilgrim
2167ae93c9 [DAG] hoistLogicOpWithSameOpcodeHands - add support for *_EXTEND_VECTOR_INREG nodes.
This can reuse the existing *_EXTEND node handling.
2023-07-19 10:50:23 +01:00
Jay Foad
7fa7a08f21 [AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
Differential Revision: https://reviews.llvm.org/D155681
2023-07-19 10:33:11 +01:00
Jay Foad
496766840f [ARM] Add a regression test for D154281
This is a reduced version of one of the tests that was broken by the
original commit of D154281 "[CodeGen] Store SP adjustment in
MachineBasicBlock. NFCI.".

Differential Revision: https://reviews.llvm.org/D155471
2023-07-19 10:32:21 +01:00
Jingu Kang
62ed3ff4bb Revert "[MachineLICM] Handle Subloops"
This reverts commit 33e60484d750291e99301e29e60fe72c8fa48ccd.
2023-07-19 10:30:50 +01:00
Simon Pilgrim
5bc836422e [X86] LowerEXTEND_VECTOR_INREG - add sign_extend_vector_inreg fast path for all-signbits source values
If the source operand is already all sign bits, we don't need to create the sign-extended elements; we can just splat the source element to the destination element width.
2023-07-19 10:13:08 +01:00
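Why the fast path is valid: when every source element is either 0 or all ones, sign-extending it gives the same bits as replicating it across the wider destination element. A per-lane Python model (hypothetical helpers, not X86 lowering code), including a counterexample where the source is not all sign bits:

```python
def sext(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide value to to_bits (result as unsigned)."""
    v = value & ((1 << from_bits) - 1)
    if v >> (from_bits - 1):
        v -= 1 << from_bits
    return v & ((1 << to_bits) - 1)


def splat(value, from_bits, to_bits):
    """Replicate a from_bits-wide value across every sub-element of a to_bits lane."""
    v = value & ((1 << from_bits) - 1)
    out = 0
    for shift in range(0, to_bits, from_bits):
        out |= v << shift
    return out


def sext_vector_inreg(lanes, from_bits, to_bits):
    return [sext(v, from_bits, to_bits) for v in lanes]


def splat_vector(lanes, from_bits, to_bits):
    return [splat(v, from_bits, to_bits) for v in lanes]


src = [0x00, 0xFF, 0xFF, 0x00]      # i8 lanes that are all sign bits
print(sext_vector_inreg(src, 8, 32) == splat_vector(src, 8, 32))  # True
print(sext(0x7F, 8, 32) == splat(0x7F, 8, 32))                    # False
```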
Jay Foad
4389f9b2ad [AMDGPU] Regenerate is.fpclass checks
2023-07-19 09:32:22 +01:00
Martin Storsjö
20b7584455 Reland [AArch64] Fix an immediate out of range for large realignments on Windows
Also add a missing FrameSetup flag on the existing add instruction.

This fixes https://github.com/llvm/llvm-project/issues/63701.

Since the previous iteration, change ADDXrr to ADDXrx64, which
works with this use of SP.

Differential Revision: https://reviews.llvm.org/D155447
2023-07-19 11:19:04 +03:00
Patryk Wychowaniec
4e831753b9 [AVR] Expand shifts of all types except int8 and int16
Currently our AVRShiftExpand pass expands only 32-bit shifts, with the
assumption that other kinds of shifts (e.g. 64-bit ones) are
automatically reduced to 8-bit ones by LLVM during ISel.

However, this is not always true, and it causes problems in the rust-lang runtime.

This commit changes the logic a bit, so that instead of expanding only
32-bit shifts, we expand shifts of all types except 8-bit and 16-bit.

This is not an optimal solution, because 64-bit shifts could instead be
expanded to 32-bit shifts, which have been heavily optimized.

I've checked the generated code using rustc + simavr, and all shifts
seem to behave correctly.

Spotted in the wild in rustc:
https://github.com/rust-lang/compiler-builtins/issues/523
https://github.com/rust-lang/rust/issues/112140

Reviewed By: benshi001

Differential Revision: https://reviews.llvm.org/D154785
2023-07-19 11:57:00 +08:00
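To our understanding, AVRShiftExpand replaces such shifts with an inline loop that shifts by one bit per iteration. The Python sketch below models only what that loop computes for a 64-bit value. The function name is hypothetical, and the IR the pass actually emits is not shown.

```python
def shift_loop(op, value, amount, width=64):
    """Compute a shift the way an inline loop would: one single-bit shift per iteration."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    value &= mask
    for _ in range(amount):
        if op == "shl":
            value = (value << 1) & mask
        elif op == "lshr":
            value >>= 1
        elif op == "ashr":
            value = (value >> 1) | (value & sign_bit)   # keep replicating the sign bit
        else:
            raise ValueError(f"unknown shift op {op!r}")
    return value


x = 0x8000_0000_0000_0001
print(hex(shift_loop("shl", x, 4)))   # 0x10
print(hex(shift_loop("lshr", x, 4)))  # 0x800000000000000
print(hex(shift_loop("ashr", x, 4)))  # 0xf800000000000000
```

The loop runs `amount` times, which is why this is compact rather than fast.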
Jianjian GUAN
eb33db4f91 [AVR] Enable verifyInstructionPredicates for AVR
This patch fixes a test failure reported by verifyInstructionPredicates: a JMPk instruction was being added without checking the target predicate.

Reviewed By: benshi001

Differential Revision: https://reviews.llvm.org/D155570
2023-07-19 11:43:39 +08:00
WANG Xuerui
f27017a063 [LoongArch] Align functions and loops better according to uarch
The LA464 micro-architecture is very sensitive to alignment of hot code,
with performance variation of up to ~12% in the go1 benchmark suite of
the Go language (as I observed during my work on the Go loong64
port).
[[ https://go.dev/cl/479816 | Manual alignment of certain loops ]] and [[ https://go.dev/cl/479817 | automatic alignment of loop heads ]]
help a lot there by reducing much of the random variation and
generally increasing performance, so we naturally want to do the same
here.

Practically, LA464 is the only LoongArch micro-architecture in wide use,
and we are currently supporting just that. The first "4" in "LA464"
stands for "4-issue", in particular its instruction fetch and decode
stages are 4-wide, so functions and branch targets should preferably be
aligned to at least 16 bytes for best throughput.

The Loongson team has benchmarked various combinations of function,
loop, and branch target alignments with GCC.
[[ https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619980.html | The results ]]
show that "16-byte label alignment together with 32-byte function
alignment gives best results in terms of SPEC score". A "label" in GCC
means a branch target; while we don't currently align branch targets,
we do align loops, so in this patch we default to 32-byte function
alignment and 16-byte loop alignment.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D148622
2023-07-19 11:25:57 +08:00
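The 16-byte figure follows from 4-wide fetch times 4-byte (fixed 32-bit) LoongArch instructions. The simplified Python model below shows how a misaligned 4-instruction loop body straddles two 16-byte fetch blocks, while an aligned one fits in one. It ignores the real front end's details, and the addresses and helper names are hypothetical.

```python
INSN_BYTES = 4                       # LoongArch instructions are a fixed 32 bits
FETCH_BYTES = 4 * INSN_BYTES         # "4-issue": 4-wide fetch/decode = 16 bytes


def align_up(addr, align):
    """Round addr up to a power-of-two alignment (what alignment padding achieves)."""
    assert align > 0 and align & (align - 1) == 0, "alignment must be a power of two"
    return (addr + align - 1) & ~(align - 1)


def fetch_blocks(start, n_insns):
    """Number of 16-byte fetch blocks a straight-line run of n_insns touches."""
    end = start + n_insns * INSN_BYTES          # one past the last byte
    return (end - 1) // FETCH_BYTES - start // FETCH_BYTES + 1


loop_head = 0x1008
print(fetch_blocks(loop_head, 4))                    # 2
print(hex(align_up(loop_head, 16)))                  # 0x1010
print(fetch_blocks(align_up(loop_head, 16), 4))      # 1
print(hex(align_up(loop_head, 32)))                  # 0x1020 (function alignment)
```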
eopXD
32c257d384 [RISCV] Use the stack for MVT::f16 for fastcc when there are no other registers available
In D155502, we added code for the compiler to check GPRs for f16
under zhinx. This commit adds code to use the stack when we run out of
GPRs.

Together with D155502, this patch resolves #63922

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155507
2023-07-18 19:49:17 -07:00
Daniil Suchkov
6fa66acfad [AArch64] NFC. Add a test exposing a bug in FixupStatepointCallerSaved pass
This pass doesn't take register classes into account, so it ends up
trying to spill a non-GP register onto the stack, which is not correct.
2023-07-18 13:53:28 -07:00
David Green
74c0bdff7d [AArch64][GISel] Additional FPTrunc vector lowering
I was attempting to add llvm.reduce.fminimum/fmaximum support for GlobalISel.
In the process I noticed that llvm.reduce.fmin/fmax was missing, and could do
with being added first. That led on to adding additional vector support for
minnum/maxnum, which in turn led to needing to handle fptrunc and fpext for
some of the fp16 types. So this patch extends the vector handling for fptrunc,
adding support for f16 types which are clamped to 4 elements, and scalarizing
the rest.

I went round in circles a little on how smaller-than-legal vectors should be
handled, but this approach is simple and seems to work, if not always optimally yet.

Differential Revision: https://reviews.llvm.org/D155311
2023-07-18 18:52:19 +01:00
Craig Topper
ea3683e98f [RISCV] Improve type promotion for i32 clmulr/clmulh on RV64.
Instead of zero extending the inputs by masking, we can shift them
left. This is cheaper when we don't have the zext.w instruction.

This does make the case where the inputs are already zero extended
or freely zero extendable worse though.

Reviewed By: wangpc

Differential Revision: https://reviews.llvm.org/D155530
2023-07-18 10:39:25 -07:00
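Shifting both inputs left by 32 works because it moves the carry-less product up by 64 bits. The high or reversed 64-bit result then carries the i32 answer in its upper half, and whatever was in bits 63..32 of the inputs is simply shifted out. The Python sketch below models the Zbc semantics and checks this. Reading the result back with a right shift by 32 is this model's choice; the instruction sequence the patch emits may differ. All function names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def clmul_full(a, b):
    """Untruncated carry-less (XOR) product of two non-negative integers."""
    result, i = 0, 0
    while b >> i:
        if (b >> i) & 1:
            result ^= a << i
        i += 1
    return result


def clmulh(a, b, xlen):
    """Zbc clmulh: upper xlen bits of the 2*xlen-bit carry-less product."""
    return (clmul_full(a, b) >> xlen) & ((1 << xlen) - 1)


def clmulr(a, b, xlen):
    """Zbc clmulr: bits [2*xlen-2 : xlen-1] of the carry-less product."""
    return (clmul_full(a, b) >> (xlen - 1)) & ((1 << xlen) - 1)


def i32_via_rv64(op, a64, b64):
    """i32 clmulh/clmulr on RV64 by shifting inputs left 32 instead of zero-extending.

    The shift discards bits 63..32 of the source registers, so no masking is needed;
    in this model the i32 result is read from bits 63..32 of the output.
    """
    hi_a = (a64 << 32) & MASK64
    hi_b = (b64 << 32) & MASK64
    return op(hi_a, hi_b, 64) >> 32


a, b = 0xDEAD_BEEF_1234_5678, 0x0BAD_F00D_9ABC_DEF0   # junk in the upper halves
for op in (clmulh, clmulr):
    print(i32_via_rv64(op, a, b) == op(a & MASK32, b & MASK32, 32))  # True, True
```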
Simon Pilgrim
d7eb9240c0 [DAG] SimplifyDemandedBits - attempt to use SimplifyMultipleUseDemandedBits for bitcasts from larger element types
Attempt to avoid multi-use ops if the bitcast doesn't need anything from them.
2023-07-18 18:38:03 +01:00
Craig Topper
0c055286b2 [RISCV] Use RISCVISD::CZERO_EQZ/CZERO_NEZ for XVentanaCondOps.
This makes Zicond and XVentanaCondOps use the same code path.
The instructions have identical semantics.

Reviewed By: wangpc

Differential Revision: https://reviews.llvm.org/D155391
2023-07-18 10:18:02 -07:00
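Because the two extensions agree, both can be modelled by the same pair of primitives. The Python sketch below gives their semantics as we understand them (czero.eqz/vt.maskc and czero.nez/vt.maskcn). It also shows one common way to build a branchless select from them: at most one of the two terms is nonzero, so OR combines them. The function names are hypothetical.

```python
def czero_eqz(rs1, rs2):
    """czero.eqz (Zicond) / vt.maskc (XVentanaCondOps): 0 if rs2 == 0, else rs1."""
    return 0 if rs2 == 0 else rs1


def czero_nez(rs1, rs2):
    """czero.nez (Zicond) / vt.maskcn (XVentanaCondOps): 0 if rs2 != 0, else rs1."""
    return 0 if rs2 != 0 else rs1


def select(cond, t, f):
    """Branchless select(cond, t, f): at most one of the two terms is nonzero."""
    return czero_eqz(t, cond) | czero_nez(f, cond)


print(select(1, 5, 7), select(0, 5, 7), select(-3, 5, 7))  # 5 7 5
```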
eopXD
ca72457346 [RISCV] Add test coverage for peephole vmerge optimization of unmasked rvv instruction with a rounding mode (NFC)
No functional change intended.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D155550
2023-07-18 10:03:58 -07:00
eopXD
eb89bf8d0d [RISCV] Do not use FPR registers for fastcc if zfh/f/d is not specified in the architecture
Resolves #63917.

Also lets the compiler check for available GPRs before hitting the stack.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D155502
2023-07-18 10:03:04 -07:00
Craig Topper
cdee88a2e0 [RISCV] Add isMoveReg to vmv1r/vmv2r/vmv4r/vmv8r.v.
This allows TII isCopyInstrImpl to consider them copies.

Reviewed By: wangpc

Differential Revision: https://reviews.llvm.org/D155140
2023-07-18 09:49:23 -07:00
Craig Topper
7767297b58 [RISCV] Test for D155140. NFC
The vmv1r.v v8, v9 in the last block can be removed by late
copy propagation.

Reviewed By: wangpc

Differential Revision: https://reviews.llvm.org/D155527
2023-07-18 09:49:23 -07:00
Zhuojia Shen
b0093e13fc [AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre
This patch optimizes a pair of LDRSWpre and LDRSWui (or LDURSWi)
instructions into a single LDPSWpre instruction.  This is a missing case
in D99272.

MIR test cases in D152564 are updated to verify the optimization.

Differential Revision: https://reviews.llvm.org/D152407
2023-07-18 09:46:47 -07:00
Zhuojia Shen
94f76004d5 [AArch64] Add tests for merging LDRSWpre-LDR pairs
This patch adds MIR test cases that test merging an LDRSWpre-LDR
instruction pair into an LDPSWpre instruction.  This optimization is
currently missing and will be added in a subsequent patch (D152407), so for
now none of the test cases are merged.

Differential Revision: https://reviews.llvm.org/D152564
2023-07-18 09:46:46 -07:00
Simon Pilgrim
3ad4f92f83 [DAG] More aggressively (extract_vector_elt (build_vector x, y), c) iff element is zero constant
We currently don't extract vector elements from multi-use build vectors unless TLI.aggressivelyPreferBuildVectorSources accepts them, which seems a little extreme for constant build vectors (especially as under some cases ComputeKnownBits will indirectly extract the data for us).

This is causing a few regressions in some upcoming SimplifyDemandedBits work I'm looking at, all of which just need to know that the element is zero, so I've tweaked the fold to accept zero elements as well, which will typically fold very easily.

Differential Revision: https://reviews.llvm.org/D155582
2023-07-18 17:31:34 +01:00
Simon Pilgrim
b8bda50932 [Sparc] Regenerate float-constants.ll test checks
2023-07-18 17:31:34 +01:00
Dinar Temirbulatov
fe22b9050c [AArch64] Regenerate a couple of vector-shuffle tests. NFC
As requested in https://reviews.llvm.org/D152205
2023-07-18 16:04:15 +00:00
Martin Storsjö
793a349e6f Revert "[AArch64] Fix an immediate out of range for large realignments on Windows"
This reverts commit b1d0bc0f4395c69097bc11b6ba8f821f621272a9.

Builds with expensive checks show that 'sp' isn't a valid register
in ADDXrr; an object file built without expensive checks enabled
disassembles as "add x15, xzr, x16", instead of the intended
"add x15, sp, x16".
2023-07-18 18:21:23 +03:00
John Brawn
343e204a52 [ARM] Replace TransferImpOps with copyImplicitOps
In most places where TransferImpOps is currently used we just have one
machine instruction, so it's doing the same thing as copyImplicitOps
anyway. In those cases where we have more than one machine
instruction the destination is written to in each instruction so any
implicit defs should appear on all of them (and we shouldn't see any
implicit refs, as these pseudo-instructions don't have any register
inputs), meaning the current use of TransferImpOps is incorrect and
we should be using copyImplicitOps on all of the generated
instructions.

Differential Revision: https://reviews.llvm.org/D155301
2023-07-18 14:01:04 +01:00
Martin Storsjö
b1d0bc0f43 [AArch64] Fix an immediate out of range for large realignments on Windows
Also add a missing FrameSetup flag on the existing add instruction.

This fixes https://github.com/llvm/llvm-project/issues/63701.

Differential Revision: https://reviews.llvm.org/D155447
2023-07-18 15:56:36 +03:00
Luke Lau
5eb7191421 [RISCV] Add VP patterns for vandn.[vv,vx]
This builds upon D155433

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155434
2023-07-18 13:53:17 +01:00
Luke Lau
26ff4c6745 [RISCV] Add SDNode patterns for vandn.[vv,vx]
Unfortunately we can't use the standard splat_vector and vnot PatFrags because
they are preprocessed to vmv.v.x's, so we need to define helpers to catch
those. We can't use SplatPat either because we need to nest another fragment
inside of it.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155433
2023-07-18 13:53:14 +01:00
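For reference, the Zvbb `vandn` semantics are `vd[i] = vs2[i] & ~vs1[i]` (`.vv`) and `vd[i] = vs2[i] & ~rs1` (`.vx`). The patterns match roughly the DAG shape `and(x, xor(y, splat(-1)))`. The Python model below, with hypothetical names and 32-bit elements assumed, checks that the two agree.

```python
def vandn_vv(vs2, vs1, sew=32):
    """vandn.vv: vd[i] = vs2[i] & ~vs1[i], per SEW-bit element."""
    mask = (1 << sew) - 1
    return [(x & ~y) & mask for x, y in zip(vs2, vs1)]


def vandn_vx(vs2, rs1, sew=32):
    """vandn.vx: vd[i] = vs2[i] & ~rs1, with the scalar applied to every element."""
    return vandn_vv(vs2, [rs1] * len(vs2), sew)


def and_not_via_xor(vs2, vs1, sew=32):
    """The shape being matched: and(x, xor(y, splat(-1)))."""
    mask = (1 << sew) - 1
    all_ones = [mask] * len(vs1)                      # splat(-1)
    not_y = [y ^ m for y, m in zip(vs1, all_ones)]
    return [x & ny for x, ny in zip(vs2, not_y)]


x, y = [0xFF00FF00, 0x12345678], [0x0F0F0F0F, 0xFFFF0000]
print(vandn_vv(x, y) == and_not_via_xor(x, y))  # True
print([hex(v) for v in vandn_vx(x, 0xFF)])       # ['0xff00ff00', '0x12345600']
```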
Matt Arsenault
c28e09c8d1 AMDGPU: Preserve flags in fdiv_fast lowering
We were dropping the flags and thus blocking contract into potential
fadd users. GlobalISel was already preserving the flags here.

https://reviews.llvm.org/D155443
2023-07-18 06:57:07 -04:00
Matt Arsenault
4a81283b94 AMDGPU: Generate and add fdiv tests
Prepare for new lowering strategies because we somehow didn't have
enough of them already.
2023-07-18 06:38:05 -04:00
Matt Arsenault
cdfdfe7ccc AMDGPU: Add some additional rcp/rsq tests
2023-07-18 06:37:15 -04:00
Sander de Smalen
08fd44b300 [AArch64] Force streaming-compatible codegen when attributes are set.
Before this patch, the only way to generate streaming-compatible code
was to use the `-force-streaming-compatible-sve` flag, but the compiler
should also avoid the use of instructions invalid in streaming mode
when a function has the aarch64_pstate_sm_enabled/compatible attribute.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D155428
2023-07-18 10:26:00 +00:00
Matt Arsenault
3f8ef57bed MachineSink: Fix sinking VGPR def out of a divergent loop
This fixes sinking a VGPR def out of a loop past the reconvergence
point at the SI_END_CF. There was a prior fix which introduced
blockPrologueInterferes (D121277) to fix the same basic problem for
the post-RA sink. This also had a special isIgnorableUse case,
which was incorrect, because in some contexts the exec use is not
ignorable.

I'm thinking about a new way to represent this which will avoid
needing hasIgnorableUse and isBasicBlockPrologue, which would function
more like the exception handling.

Fixes: SWDEV-407790

https://reviews.llvm.org/D155343
2023-07-18 06:15:50 -04:00
Matt Arsenault
d5ab379506 AMDGPU: Add baseline test for broken machine sinking
2023-07-18 06:15:50 -04:00
David Green
4214f15660 [AArch64] Regenerate a couple of mir GlobalISel tests. NFC
See D155311
2023-07-18 08:28:27 +01:00
LiaoChunyu
65ffcc099c [RISCV] Lower VP_CTLZ_ZERO_UNDEF/VP_CTTZ_ZERO_UNDEF/VP_CTLZ by converting to FP and extracting the exponent.
D111904 and D141585 added RISC-V custom lowering for vector ISD::CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF/CTLZ
by converting to float and using the float result.

This patch applies the same approach to VP_CTLZ_ZERO_UNDEF/VP_CTTZ_ZERO_UNDEF/VP_CTLZ.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155150
2023-07-18 15:25:59 +08:00
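The idea behind the FP trick: the biased exponent field of `float(x)` holds `floor(log2(x))`. So `ctlz = (width - 1) - exponent`, and `cttz` is the exponent of the isolated lowest set bit `x & -x`. The Python sketch below uses f64 for 32-bit values, where every conversion is exact, so rounding does not arise. The real lowering chooses its FP type and handles rounding per element width. The explicit zero check in `ctlz32` is a simplification, not the patch's method. All names are hypothetical.

```python
import struct


def f64_exponent(x):
    """Unbiased exponent of float(x); exact for integers 0 < x < 2**53."""
    bits = struct.unpack("<Q", struct.pack("<d", float(x)))[0]
    return ((bits >> 52) & 0x7FF) - 1023


def ctlz_zero_undef32(x):
    """Leading zeros of a nonzero 32-bit value: 31 - floor(log2(x))."""
    return 31 - f64_exponent(x)


def cttz_zero_undef32(x):
    """Trailing zeros of a nonzero 32-bit value: isolate the lowest set bit first."""
    return f64_exponent(x & -x)


def ctlz32(x):
    """CTLZ with a defined result for zero (handled here by an explicit check)."""
    return 32 if x == 0 else ctlz_zero_undef32(x)


print(ctlz_zero_undef32(1), ctlz_zero_undef32(0x8000_0000))  # 31 0
print(cttz_zero_undef32(12), ctlz32(0))                      # 2 32
```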
Konstantina Mitropoulou
4c42ab1199 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

This first patch handles integer types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153502
2023-07-17 17:13:47 -07:00
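The two rewrites above hold because "either operand compares below C" is the same as "the smaller one does", and "both do" is the same as "the larger one does". The ordering reverses for greater-than predicates. The Python check below covers the four ordered predicates over small signed integers; unsigned predicates would use umin/umax instead. The helper names are hypothetical.

```python
import operator

CMP = {"lt": operator.lt, "le": operator.le, "gt": operator.gt, "ge": operator.ge}


def fold_or(cc, a, b, c):
    """CMP(A,C) || CMP(B,C)  ->  CMP(min(A,B), C) for lt/le, CMP(max(A,B), C) for gt/ge."""
    m = min(a, b) if cc in ("lt", "le") else max(a, b)
    return CMP[cc](m, c)


def fold_and(cc, a, b, c):
    """CMP(A,C) && CMP(B,C)  ->  CMP(max(A,B), C) for lt/le, CMP(min(A,B), C) for gt/ge."""
    m = max(a, b) if cc in ("lt", "le") else min(a, b)
    return CMP[cc](m, c)


r = range(-3, 4)
print(all(fold_or(cc, a, b, c) == (CMP[cc](a, c) or CMP[cc](b, c)) and
          fold_and(cc, a, b, c) == (CMP[cc](a, c) and CMP[cc](b, c))
          for cc in CMP for a in r for b in r for c in r))  # True
```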
Konstantina Mitropoulou
11cd92a70f [NFC] Tests for future commit in DAGCombiner
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153479
2023-07-17 17:08:32 -07:00
Matt Arsenault
04185f0b0b AMDGPU: Fix broken denormal constant folding of canonicalize
This needs to consider the dynamic denormal mode. It should be
possible to implement a runtime DAZ check with a canonicalize.
2023-07-17 19:54:20 -04:00
Matt Arsenault
bd2dca0f74 AMDGPU: Use hex floats instead of ugly bitcasting
2023-07-17 19:54:04 -04:00
Matt Arsenault
467df9c591 AMDGPU: Split and convert some rcp and rsq tests to generated checks
2023-07-17 17:34:29 -04:00
Matt Arsenault
296e24cd2e DAG: Constant fold frexp nodes
Special casing the nonfinite exponent value everywhere is kind of
annoying.
2023-07-17 17:34:29 -04:00
Amy Kwan
8e0e442c1d [AIX][TLS] Account for local-exec accesses in XCOFFObjectWriter
This is a follow up to D149722 and aims to address https://github.com/llvm/llvm-project/issues/63885.
Local-exec accesses were not previously accounted for in XCOFFObjectWriter.
Specifically, the R_TLS_LE relocation was not previously handled, which led to
the incorrect value being written for the relocation target.

Within this patch, the value being written is set to the symbol's virtual
address and extra relocation tests are added.

Differential Revision: https://reviews.llvm.org/D155415
2023-07-17 12:15:44 -05:00
Simon Pilgrim
78be5aebaa [X86] Regenerate tail-call-casts.ll test coverage
2023-07-17 17:58:37 +01:00
Craig Topper
a64b3e92c7 [RISCV] Re-define sha256, Zksed, and Zksh intrinsics to use i32 types.
Previously we returned i32 on RV32 and i64 on RV64. The instructions
only consume 32 bits and only produce 32 bits. For RV64, the result
is sign extended to 64 bits like *W instructions.

This patch removes this detail from the interface to improve
portability and consistency. This matches the proposal for scalar
intrinsics here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44

I've included IR autoupgrade support as well.

I'll be doing this for other builtins/intrinsics that currently use
'long' in other patches.

Reviewed By: VincentWu

Differential Revision: https://reviews.llvm.org/D154647
2023-07-17 08:58:29 -07:00
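Why an i32 interface loses nothing: an operation such as sha256sum0 (SHA-256 Sigma0, a XOR of three 32-bit rotations) reads only 32 bits and produces only 32 bits. On RV64 the register simply holds the sign extension of that result, as with *W instructions. The Python model below illustrates this; the function names are hypothetical and not the intrinsic names.

```python
MASK32 = 0xFFFF_FFFF


def ror32(x, n):
    """Rotate a 32-bit value right by n (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sha256sum0(x):
    """SHA-256 Sigma0: only the low 32 bits of x matter, and 32 bits come out."""
    x &= MASK32
    return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22)


def rv64_register_value(result32):
    """How an RV64 register holds the i32 result: sign-extended, like *W instructions."""
    result32 &= MASK32
    return (result32 | 0xFFFF_FFFF_0000_0000) if result32 & 0x8000_0000 else result32


print(hex(sha256sum0(1)))                          # 0x40080400
print(hex(rv64_register_value(0x8000_0000)))       # 0xffffffff80000000
```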