52796 Commits

Author SHA1 Message Date
jwanggit86
b853988e0d
[AMDGPU] Port AMDGPURewriteUndefForPHI to new pass manager (#66008)
This patch ports the AMDGPURewriteUndefForPHI pass to the new pass
manager. With this, the pass is supported under both the legacy and the
new pass managers.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-09-12 13:32:02 -07:00
Matt Arsenault
c48248d7f9 AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp
https://reviews.llvm.org/D158130
2023-09-12 23:23:10 +03:00
Matt Arsenault
72a7024add AMDGPU: Correctly lower llvm.sqrt.f32
Make codegen emit a correctly rounded sqrt by default.

Emit the fast (but only kind of fast) expansion in AMDGPUCodeGenPrepare
based on !fpmath, like the fdiv case. Hack around visitation-ordering
problems caused by AMDGPUCodeGenPrepare using forward iteration instead
of a well-behaved combiner.

https://reviews.llvm.org/D158129
2023-09-12 23:22:54 +03:00
Jay Foad
928c9d6851
[AMDGPU] Fix some MIR tests (#66090)
Fix some problems in hand-written MIR tests that only showed up when I
tried to run LiveIntervals on them, after which they failed machine
verification with "Use not jointly dominated by defs" errors.
2023-09-12 16:32:41 +01:00
Benjamin Kramer
bc8d85655c [NVPTX] Tighten up legal v2i16 ops a bit
TargetLoweringBase makes almost all ops legal by default, so make ones
that Expand explicit and remove redundant legal settings.
2023-09-12 16:10:20 +02:00
David Green
b4c66f4e33 Revert "[AArch64][GlobalISel] Add lowering for constant BIT/BIF/BSP"
This reverts commit cb5bad2acd7a498761d4979825d6801f5a845135 as the existing
fcopysign code looks like it might be incorrect.
2023-09-12 14:18:44 +01:00
Allen
eaf23b2480
[GIsel][AArch64] Legalize <2 x i16> for G_INSERT_VECTOR_ELT (#65830)
Widen the vector elements to 64 bits to make it legal, instead of
clamping the number of elements. Depends on D153394.

Fixes https://github.com/llvm/llvm-project/issues/63826
2023-09-12 21:15:01 +08:00
Jay Foad
0528dbfe5c
Add some -early-live-intervals RUN lines (#66058)
This adds test coverage for an upcoming change to
TwoAddressInstructionPass::processTiedPairs.
2023-09-12 13:06:10 +01:00
Yingwei Zheng
4793c2c3de
[DAGCombiner][RISCV] Prefer to sext i32 non-negative values (#65984)
By default, `DAGCombiner` folds `sext x` to `zext x` when `x` is
non-negative. On riscv64 this generates a redundant `zext` instruction
sequence (typically `srli (slli x, 32), 32`).
godbolt: https://godbolt.org/z/osf6adP1o
This patch applies the transform iff `zext` is **cheaper** than `sext`.
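A minimal Python model (not LLVM code; all helper names are hypothetical) of why the fold is legal and why it costs more on riscv64: for an i32 whose sign bit is clear, sign- and zero-extension to i64 give identical bits. Sign extension is a single `sext.w` (an alias of `addiw rd, rs, 0`), while zero extension without Zba needs the two-shift sequence modeled below.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def zext32_to_64(x: int) -> int:
    """Zero-extend a 32-bit pattern: the upper 32 bits become 0."""
    return x & MASK32

def sext32_to_64(x: int) -> int:
    """Sign-extend a 32-bit pattern: replicate bit 31 into the upper 32 bits."""
    x &= MASK32
    if x & (1 << 31):
        x |= MASK64 ^ MASK32
    return x

def zext_via_shifts(x: int) -> int:
    """Model of zext on riscv64 without Zba: slli by 32, then srli by 32."""
    shifted = (x << 32) & MASK64  # slli x, 32 on a 64-bit register
    return shifted >> 32          # srli ..., 32 (logical shift right)
```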
2023-09-12 19:02:35 +08:00
Paul Walker
ea42c4ac6a [SVE] Precommit test to show missing initialisation of call operand.
When calling func_f8_and_v0_passed_via_memory the memory used to
hold the first vector operand is allocated but not initialised.
2023-09-12 10:46:41 +00:00
Michal Paszkowski
efe0e10718 [SPIR-V] Support SPV_INTEL_arbitrary_precision_integers_extension, misc utils for other extensions
Differential Revision: https://reviews.llvm.org/D158764
2023-09-12 02:45:15 -07:00
Saiyedul Islam
466a8149b3
Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)
This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
2023-09-12 15:13:59 +05:30
Ivan Kosarev
eaf737a4e0 [AMDGPU] Remove the GFX11 runs in CodeGen/AMDGPU/fma.f16.ll.
It still fails with expensive checks enabled.

This partially reverts:
a1e38e0b8e3e [AMDGPU][GFX11] Add more test coverage for FMA instructions.
2023-09-12 10:30:52 +01:00
David Green
cb5bad2acd
[AArch64][GlobalISel] Add lowering for constant BIT/BIF/BSP (#65897)
The non-constant bit/bif/bsp already work through tablegen patterns; this
patch handles the constant case, mirroring the basic support for
`or(and(X, C), and(Y, ~C))` from ISel tryCombineToBSL. BSP gets expanded
to either BIT, BIF or BSL depending on the best register allocation.
G_BIT can be replaced with G_BSP as a more general alternative.
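As an illustration only, the pattern `or(and(X, C), and(Y, ~C))` is a bitwise select: each result bit comes from X where the mask C is set and from Y where it is clear. BIT, BIF and BSL all compute this same function and differ in which operand register is overwritten. A Python model over one 64-bit lane (function name hypothetical):

```python
MASK64 = (1 << 64) - 1

def bitwise_select(c: int, x: int, y: int) -> int:
    """Return or(and(X, C), and(Y, ~C)) on a 64-bit lane."""
    not_c = ~c & MASK64  # ~C restricted to 64 bits
    return ((x & c) | (y & not_c)) & MASK64
```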
2023-09-12 10:13:32 +01:00
David Green
b7cb18c5eb [AArch64][GISel] Expand test coverage of FPow.
This adds some more extensive test coverage for fpow intrinsics through global
isel, and removes the unused vector libcall types. All types get scalarized,
fp16 will be expanded to fp32 and then we lower to a libcall from there.
2023-09-12 10:08:09 +01:00
Ivan Kosarev
a1e38e0b8e
[AMDGPU][GFX11] Add more test coverage for FMA instructions. (#65935)
This is another attempt to update the tests to run for GFX11. Previously
done in <https://reviews.llvm.org/D153269>, and then reverted in
<https://reviews.llvm.org/rG2d3e6c440244ad94777aa13566b0376eb3c088f1>
due to a failure on a buildbot with expensive checks enabled. Commit
4b1702e87a2687569b197aea4721353f8b788182 fixed the problem.
2023-09-12 09:40:10 +01:00
Saiyedul Islam
0a8d17e79b
[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818
2023-09-12 13:53:31 +05:30
pvanhout
2126a18d86 [AMDGPU] Regen combine-fma-add-mul-pre-legalize.mir
2023-09-12 08:50:12 +02:00
Matt Arsenault
cd4b906e18 RegisterCoalescer: Don't delete IMPLICIT_DEF if it's live into the same block
Live-out implicit_defs need to be kept, but the check for this only
checked if the block parent was the same. This doesn't work if the
parent blocks are the same but the value is live. Fixes verifier error
"Instruction ending live segment doesn't read the register", which
would appear at the coalesced non-implicit_def def.

Fixes #38788

https://reviews.llvm.org/D158882
2023-09-12 09:28:33 +03:00
Matt Arsenault
de5585078e RegisterCoalescer: Correctly set valid lanes when keeping live out implicit defs
This fixes some verifier errors when live out implicit defs are
coalesced with identity copies. Fixes some reduced testcases from
issue #38788 but doesn't solve the original failure.

I was surprised this seems to obviate the special casing in
analyzeValue that's been there since the subregister liveness support
went in.

https://reviews.llvm.org/D158850
2023-09-12 09:28:33 +03:00
liqin.weng
1eec357494 [VP] IR expansion for maxnum/minnum
Add basic handling for VP ops that can expand to non-predicate ops

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D159494
2023-09-12 10:15:52 +08:00
Fangrui Song
cfc1a87878 [test] Change llc -march= to -mtriple= & llvm-mc -arch= to -triple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449
2023-09-11 15:11:01 -07:00
Fangrui Song
806761a762 [test] Change llc -march= to -mtriple=
The issue was uncovered by #47698: for IR files without a target triple,
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense, e.g. riscv64-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign because we recognize $unknown-apple-darwin as ELF
instead of rejecting it outright.
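A rough Python model of the difference, assuming a hypothetical default triple of `x86_64-apple-darwin`: `-march=` only swaps the architecture component, while `-mtriple=` replaces the whole triple. Parsing is simplified to splitting on `-`; the real `llvm::Triple` handles many more forms, and the helper names are hypothetical.

```python
def apply_march(default_triple: str, arch: str) -> str:
    """Model of -march=: keep vendor/OS/env from the default triple, replace arch."""
    parts = default_triple.split("-")
    parts[0] = arch
    return "-".join(parts)

def apply_mtriple(default_triple: str, triple: str) -> str:
    """Model of -mtriple=: the given triple replaces the default entirely."""
    return triple
```

With this model, `apply_march("x86_64-apple-darwin", "riscv64")` yields the nonsensical `riscv64-apple-darwin` described above.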
2023-09-11 14:42:37 -07:00
Philip Reames
5352c79398
[RISCV] Add a combine to form masked.load from unit strided load (#65674)
Add a DAG combine to form a masked.load from a masked_strided_load
intrinsic with stride equal to element size. This covers a couple of
extra test cases, and allows us to simplify and common some existing
code on the concat_vector(load, ...) to strided load transform.
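A Python model of the equivalence this combine relies on (helper names are hypothetical; memory is a flat little-endian byte array): a strided load whose stride equals the element size touches exactly the bytes a contiguous masked load would, so the two produce the same lanes.

```python
from typing import List

def strided_load(mem: bytes, base: int, stride: int, elt_size: int,
                 mask: List[bool], passthru: List[int]) -> List[int]:
    """Lane i reads elt_size bytes at base + i*stride when mask[i] is set."""
    out = []
    for i, active in enumerate(mask):
        if active:
            off = base + i * stride
            out.append(int.from_bytes(mem[off:off + elt_size], "little"))
        else:
            out.append(passthru[i])
    return out

def masked_load(mem: bytes, base: int, elt_size: int,
                mask: List[bool], passthru: List[int]) -> List[int]:
    """Contiguous masked load. This model reads the whole block; a real
    masked load would not access the masked-off lanes."""
    block = mem[base:base + elt_size * len(mask)]
    elts = [int.from_bytes(block[j:j + elt_size], "little")
            for j in range(0, len(block), elt_size)]
    return [e if m else p for e, m, p in zip(elts, mask, passthru)]
```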

This is the first in a mini-patch series to try and generalize our
strided load and gather matching to handle more cases, and common up
different approaches to the same problems in different places.
2023-09-11 13:01:14 -07:00
Vitaly Buka
f106b3f135 Revert "[PHIElimination] Handle subranges in LiveInterval updates"
Leaks memory.

This reverts commit 3bff611068ae70e3273a46bbc72bc66b66f98c1c.
2023-09-11 11:09:26 -07:00
Jeremy Morse
1ce1732f82 [DebugInfo] Use getStableDebugLoc to pick IRBuilder DebugLocs
When IRBuilder is given an insertion position and there is debug-info, it
sets the DebugLoc of newly inserted instructions to the DebugLoc of the
insertion position. Unfortunately, that means if you insert in front of a
debug intrinsic, your "real" instructions get potentially misleading
source locations from the debug intrinsic. Worse, if you compile with -gmlt to
get source locations but no variable locations, you'll get different source
locations from a normal -g build, which is silly.

Rectify this with the getStableDebugLoc method, which skips over any debug
intrinsics to find the next "real" instruction. This is the source location
that you would get if you compile with -gmlt, and it remains stable in the
presence of debug intrinsics. The changed tests show a few locations where
this has been happening, for example selecting line-zero locations for
instrumentation on a perfectly valid call site.
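A simplified Python model of the idea (not the LLVM API; `Inst`, `is_debug` and `stable_debug_loc` are hypothetical names): starting at the insertion point, skip debug intrinsics and use the location of the first real instruction, so a -g block and the equivalent -gmlt block yield the same location.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Inst:
    name: str
    line: int                # source line of this instruction's DebugLoc
    is_debug: bool = False   # True for llvm.dbg.* style intrinsics

def stable_debug_loc(block: List[Inst], pos: int) -> Optional[int]:
    """Return the line of the first non-debug instruction at or after pos."""
    for inst in block[pos:]:
        if not inst.is_debug:
            return inst.line
    return None  # this model returns None if only debug intrinsics remain
```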

Differential Revision: https://reviews.llvm.org/D159485
2023-09-11 19:00:44 +01:00
Philip Reames
299d710e3d [RISCV] Lower fixed vectors extract_vector_elt through stack at high LMUL
This is the extract side of D159332. The goal is to avoid non-linear costing on patterns where an entire vector is split back into scalars. This is an idiomatic pattern for SLP.

Each vslide operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique extracts, each with a cost linear in LMUL, the overall cost is O(LMUL^2) * VLEN/ETYPE. To avoid the degenerate case, fall back to the stack if we're beyond LMUL=2.
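The cost argument can be made concrete with a back-of-the-envelope Python model. The unit costs are assumptions for illustration, not values from the sifive-x280 model: each slide costs LMUL units, a whole-register store costs LMUL units, and each scalar reload costs 1 unit. With VL = LMUL * VLEN / SEW elements, slides cost O(LMUL^2 * VLEN / SEW) while the stack route stays linear.

```python
def explode_cost_slides(lmul: int, vlen: int, sew: int) -> int:
    """One vslidedown per element, each assumed to cost LMUL units."""
    vl = lmul * vlen // sew
    return vl * lmul

def explode_cost_stack(lmul: int, vlen: int, sew: int) -> int:
    """One vector store (assumed LMUL units) plus one scalar load per element."""
    vl = lmul * vlen // sew
    return lmul + vl
```

In this model, doubling LMUL quadruples the slide-based cost but only roughly doubles the stack-based cost.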

There's a subtlety here. For this to work, we're *relying* on an optimization in LegalizeDAG which tries to reuse the stack slot from a previous extract. In practice, this appears to trigger for patterns within a block, but if we ended up with an explode idiom split across multiple blocks, we'd still be in quadratic territory. I don't think that variant is fixable within SDAG.

It's tempting to think we can do better than going through the stack, but if such an approach exists, I haven't found it yet. Here are the results for sifive-x280 on all the variants I wrote (all 16 x i64 with V):

output/sifive-x280/linear_decomp_with_slidedown.mca:Total Cycles:      20703
output/sifive-x280/linear_decomp_with_vrgather.mca:Total Cycles:      23903
output/sifive-x280/naive_linear_with_slidedown.mca:Total Cycles:      21604
output/sifive-x280/naive_linear_with_vrgather.mca:Total Cycles:      22804
output/sifive-x280/recursive_decomp_with_slidedown.mca:Total Cycles:      15204
output/sifive-x280/recursive_decomp_with_vrgather.mca:Total Cycles:      18404
output/sifive-x280/stack_by_vreg.mca:Total Cycles:      12104
output/sifive-x280/stack_element_by_element.mca:Total Cycles:      4304

I am deliberately excluding scalable vectors. It functionally works, but frankly, the code quality for an idiomatic explode loop is so terrible either way that it felt better to leave that for future work.

Differential Revision: https://reviews.llvm.org/D159375
2023-09-11 10:49:17 -07:00
Stanislav Mekhanoshin
070c2570ad
[AMDGPU] Global ISel for packed fp32 instructions (#65803)
2023-09-11 10:48:37 -07:00
Stanislav Mekhanoshin
093aa37744
[AMDGPU] Autogenerate min.ll/max.ll tests. NFC. (#65786)
2023-09-11 10:29:53 -07:00
Luke Lau
e33f3f09b8
[RISCV] Shrink vslidedown when lowering fixed extract_subvector (#65598)
As noted in
https://github.com/llvm/llvm-project/pull/65392#discussion_r1316259471,
when lowering an extract of a fixed length vector from another vector,
we don't need to perform the vslidedown on the full vector type. Instead
we can extract the smallest subregister that contains the subvector to
be extracted and perform the vslidedown with a smaller LMUL. E.g., with
+Zvl128b:

v2i64 = extract_subvector nxv4i64, 2

is currently lowered as

vsetivli zero, 2, e64, m4, ta, ma
vslidedown.vi v8, v8, 2

This patch shrinks the vslidedown to LMUL=2:

vsetivli zero, 2, e64, m2, ta, ma
vslidedown.vi v8, v8, 2

This is valid because we know that there are at least 128*2=256 bits in v8
at LMUL=2, and we only need the first 256 bits to extract a v2i64 at index 2.
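The arithmetic can be sketched in Python (the function name is hypothetical and this mirrors the reasoning above, not the actual RISCVISelLowering code; fractional LMULs are not modeled): the extract needs only the first (index + num_elts) * elt_bits bits, so the smallest power-of-two LMUL whose guaranteed size VLEN_min * LMUL covers them is enough.

```python
def min_lmul_for_extract(idx: int, num_elts: int, elt_bits: int,
                         vlen_min: int, max_lmul: int = 8) -> int:
    """Smallest power-of-two LMUL guaranteed to hold elements [0, idx + num_elts)."""
    needed_bits = (idx + num_elts) * elt_bits
    lmul = 1
    while lmul * vlen_min < needed_bits and lmul < max_lmul:
        lmul *= 2
    return lmul
```

For the example above (v2i64 at index 2, +Zvl128b), this gives `min_lmul_for_extract(2, 2, 64, 128) == 2`.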

lowerEXTRACT_VECTOR_ELT already has this logic, so this extracts it out
and reuses it.

I've split this out into a separate PR rather than include it in #65392,
with the hope that we'll be able to generalize it later.

This patch refactors extract_subvector lowering to lower to
extract_subreg directly, and to shortcut whenever the index is 0 when
extracting a scalable vector. This doesn't change any of the existing
behaviour, but makes an upcoming patch that extends the scalable path
slightly easier to read.
2023-09-11 17:25:12 +01:00
Luke Lau
46f3ea5952
[RISCV] Add extract_subvector tests for a statically-known VLEN. NFC (#65389)
This is partly a precommit for an upcoming patch, and partly to remove
the fixed length LMUL restriction similarly to what was done in
https://reviews.llvm.org/D158270, since it's no longer that relevant.
2023-09-11 16:28:53 +01:00
Simon Pilgrim
f8b04eb6d0 [X86] matchIndexRecursively - add zext(add/addlike(x,c)) -> index: zext(x), disp + zext(c) pattern handling
More restricted alternative to a8cef6b58e2d41f
2023-09-11 15:36:13 +01:00
liqin.weng
3723ede3cf [VP] IR expansion for zext/sext/trunc/fptosi/fptoui/sitofp/uitofp/fptrunc/fpext
Add basic handling for VP ops that can expand to Cast intrinsics

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D159491
2023-09-11 21:14:38 +08:00
liqin.weng
28e74e6180 [VP] IR expansion for abs/smax/smin/umax/umin
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D159495
2023-09-11 21:14:37 +08:00
Max Iyengar
dbeb3d029d Add missing vrnd intrinsics
This patch adds 8 missing intrinsics as specified in section 2.12.1.1 of the Arm ACLE document: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#rounding-3

The intrinsics implemented are:

  - vrnd32z_f64
  - vrnd32zq_f64
  - vrnd64z_f64
  - vrnd64zq_f64
  - vrnd32x_f64
  - vrnd32xq_f64
  - vrnd64x_f64
  - vrnd64xq_f64

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D158626
2023-09-11 12:59:18 +01:00
Simon Pilgrim
ef87d43834 Revert rGa8cef6b58e2d41f04ed4fa63c3f628eac1a28925 "[X86] promoteExtBeforeAdd - add support for or/xor 'addlike' patterns"
Investigating reports of issues with second stage clang builds
2023-09-11 11:52:25 +01:00
Simon Pilgrim
a8cef6b58e [X86] promoteExtBeforeAdd - add support for or/xor 'addlike' patterns
Fold zext(addlike(x, C)) --> add(zext(x), C_zext) if it's likely to help us create LEA instructions
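A Python check of the identity behind the fold (illustrative; the 32-to-64-bit widths and helper names are assumptions): when x and C share no set bits, `or(x, C)` equals `add(x, C)` with no carries at all, so zero-extending before or after the operation gives the same value, which lets the constant become an LEA displacement.

```python
MASK32 = (1 << 32) - 1

def is_addlike_or(x: int, c: int) -> bool:
    """An OR behaves like an ADD when the operands have no common set bits."""
    return (x & c) == 0

def zext_of_or(x: int, c: int) -> int:
    """zext(or(x, c)): OR computed in 32 bits, then widened to 64 bits."""
    return (x | c) & MASK32

def add_of_zexts(x: int, c: int) -> int:
    """add(zext(x), zext(c)): both operands widened first, then added in 64 bits."""
    return (x & MASK32) + (c & MASK32)
```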

Addresses some regressions exposed by D155472
2023-09-11 10:17:34 +01:00
Simon Pilgrim
79941c3a0d [X86] lea-2.ll - add test showing failure to fold shl(zext(or(x,c1)),c2) 'addlike' into LEA instruction
2023-09-11 10:17:33 +01:00
Nathan Gauër
56396b25f1 [SPIR-V] Add SPIR-V logical triple to llc
This commit adds the minimal required bits to build a logical SPIR-V
compute shader using LLC.
- Skip OpenCL-only capabilities & extensions for Logical SPIR-V.
- Generate required metadata for entrypoints from HLSL frontend.
- Fix execution mode to GLCompute in logical.

The main remaining issue is the lack of a "vulkan" bit in the triple.
This might need to be added as a vendor, because as-is, SPIRV32/64
assumes OpenCL and SPIRV assumes Vulkan. This is OK-ish today, but not
correct.

Differential Revision: https://reviews.llvm.org/D156424
2023-09-11 10:31:50 +02:00
Carl Ritson
3bff611068 [PHIElimination] Handle subranges in LiveInterval updates
Add handling for subrange updates in LiveInterval preservation.
This requires extending MachineBasicBlock::SplitCriticalEdge
to also update subrange intervals.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158144
2023-09-11 17:15:09 +09:00
Jingu Kang
5474d49f1f [AArch64] Remove copy instruction between uaddlv and urshr
If there are copy instructions between uaddlv and urshr that transfer from gpr
to fpr and back, try to remove them.

Differential Revision: https://reviews.llvm.org/D159265
2023-09-11 09:06:09 +01:00
Yeting Kuo
1f15155d5e
[RISCV] Disable zcmp push/pop for variadic functions. (#65302)
Variadic functions need a save region for variable arguments, and that
region may overlap with the region used by zcmp push/pop.
2023-09-11 13:09:01 +08:00
Carl Ritson
1d8a94c4ff [AMDGPU] SILowerControlFlow: fix preservation of LiveIntervals
In emitElse, the live interval for the SI_ELSE source must be recalculated
because SI_ELSE is removed and a new user is placed at the block start.
In emitIfBreak, the live interval for the newly created AndReg must be
computed.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158141
2023-09-11 13:46:28 +09:00
Carl Ritson
46ee3b3914
[AMDGPU] SILowerI1Copies: clear kill flags on COPY (#65883)
Clear kill flags on COPY source as it will be reused.
2023-09-11 12:30:08 +09:00
Shengchen Kan
503e3a4130
[X86] Remove _REV instructions from the EVEX2VEX tables (#65752)
_REV instructions should not appear before encoding optimization, so
there is no chance to compress them during MIR optimizations.
2023-09-11 09:54:05 +08:00
Fangrui Song
61c44f1822 [X86] FastISel -fno-pic: emit R_386_PC32 when calling an intrinsic
This matches how a SelectionDAG::getExternalSymbol node is lowered. On x86-32, a
function call in -fno-pic code should emit R_386_PC32 (since ebx is not set up).
When linked as -shared (problematic!), the generated text relocation will work.

Ideally, we should mark IR intrinsics created in
CodeGenFunction::EmitBuiltinExpr as dso_local, but the code structure makes it
not very feasible.

Fix #51078
2023-09-10 15:03:36 -07:00
Simon Pilgrim
63af54a84e [AArch64] ushl_sat.ll - regenerate checks. NFC.
Add missing asm comments to reduce a future diff.
2023-09-10 19:45:20 +01:00
Simon Pilgrim
76c09d9c5e [X86] matchIndexRecursively - don't peek through multiuse sext(add_nsw(x,c)) (PR65895)
Fixes #65895
2023-09-10 16:54:18 +01:00
David Green
4e52fd8468 [AArch64] Add GlobalISel coverage for BIT/BIF/BSL. NFC
Some of the 1x vector types are expanded to scalar, but the others, which do not
require constants, look OK.
2023-09-10 13:02:35 +01:00
Matt Arsenault
17bd80601e AMDGPU: Implement llvm.get.fpmode
Currently s_getreg_b32 is missing the possible mode use. Really we
need separate pseudos for mode-only accesses, but leave this as a
pre-existing issue.

https://reviews.llvm.org/D152710
2023-09-10 10:19:19 +03:00