52796 Commits

Author SHA1 Message Date
Philip Reames
d9942319d7 [RISCV] Adjust check lines to reduce duplication 2023-09-25 11:25:36 -07:00
Philip Reames
0eaf11ff41 [RISCV] Add test coverage for buildvec-of-binops 2023-09-25 11:12:04 -07:00
Matthias Braun
740ee00a4c PPCBranchCoalescing: Fix invalid branch weights (#67211)
Re-normalize branch weights after removing a block successor so that
they still add up to 100%. This changes the MIR for the
`test/CodeGen/PowerPC/branch_coalesce.ll` test like this:

```diff
-  successors: %bb.6(0x40000000); %bb.6(50.00%)
+  successors: %bb.6(0x80000000); %bb.6(100.00%)
```

This doesn't affect codegen on its own, but fixing it avoids
fluctuations I see with some of my upcoming changes.
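A minimal sketch of the re-normalization, assuming LLVM-style 31-bit fixed point in which `0x80000000` denotes 100% (as in the MIR above). The helper `renormalize` is hypothetical and is not the `MachineBasicBlock` API; it scales the remaining weights proportionally and gives any rounding remainder to the last successor.

```python
DENOM = 1 << 31  # 0x80000000 == 100% in the MIR successor notation


def renormalize(numerators):
    """Scale successor probabilities so they sum to exactly DENOM (hypothetical helper)."""
    if not numerators:
        return []
    total = sum(numerators)
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        scaled = [DENOM // len(numerators)] * len(numerators)
    else:
        scaled = [n * DENOM // total for n in numerators]
    # Assign the rounding remainder to the last successor so the sum is exact.
    scaled[-1] += DENOM - sum(scaled)
    return scaled


# Two successors at 50% each; removing one leaves a lone 50% edge.
remaining = [0x40000000]
print(hex(renormalize(remaining)[0]))  # 0x80000000
```

Giving the remainder to one successor is just one simple way to make the sum exact; other distributions of the rounding error would work equally well for this illustration.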
2023-09-25 10:41:04 -07:00
Ivan Kosarev
053478bbd0 [AMDGPU] Switch to using real True16 operands.
The DPP source and e64 destination operands remain unchanged for now.

Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D156104
2023-09-25 18:21:13 +01:00
Austin Kerbow
0455596e1e [AMDGPU] Add DAG ISel support for preloaded kernel arguments
This patch adds the DAG isel changes for kernel argument preloading.
These changes are not usable with older firmware but subsequent patches
in the series will make the codegen backwards compatible. This patch
should only be submitted alongside that subsequent patch.

Preloading starts at the first kernel argument and covers as many
arguments as indicated by the command-line flag
amdgpu-kernarg-preload-count.

Aggregates and arguments passed by-ref are not supported.

Special care is needed for the alignment of the kernarg segment, as
well as for the alignment of addressable SGPR tuples when we cannot
directly use the misaligned large tuples that the arguments are
loaded into.
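A minimal sketch of the argument-selection rule described above, under stated assumptions: `preload_count` stands in for the `amdgpu-kernarg-preload-count` flag, and the sketch assumes preloading stops at the first aggregate or by-ref argument because the preloaded region is a contiguous prefix of the kernarg segment. The `KernArg` type and `select_preloaded` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KernArg:
    # Hypothetical model of a kernel argument; not an LLVM type.
    name: str
    is_aggregate: bool = False
    is_byref: bool = False


def select_preloaded(args, preload_count):
    """Return the leading arguments to preload (sketch).

    Assumption: preloading covers a contiguous prefix, so the first
    unsupported argument (aggregate or by-ref) ends the prefix.
    """
    selected = []
    for arg in args[:preload_count]:
        if arg.is_aggregate or arg.is_byref:
            break
        selected.append(arg)
    return selected


args = [KernArg("ptr"), KernArg("n"), KernArg("s", is_aggregate=True), KernArg("m")]
print([a.name for a in select_preloaded(args, 4)])  # ['ptr', 'n']
```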

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D158579
2023-09-25 09:32:59 -07:00
Austin Kerbow
7b70af297a [AMDGPU] Add IR lowering changes for preloaded kernargs
Preloaded kernel arguments should not be lowered in the IR pass
AMDGPULowerKernelArguments. It is therefore necessary to calculate the
total number of user SGPRs that are available for preloading, and how
many SGPRs each argument would need, to determine whether lowering
should be skipped, i.e. whether the argument will be preloaded
instead.
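The SGPR accounting can be sketched as follows, under assumptions not spelled out in the commit: SGPRs are 32 bits (4 bytes) each, every argument sits at its natural alignment in the kernarg segment, and preloading covers every byte from the start of the segment to the end of the argument, so alignment padding also consumes SGPRs. `free_user_sgprs` stands in for the number of user SGPRs left over for preloading; all names are hypothetical.

```python
def align_to(offset, align):
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def args_preloaded(arg_layout, free_user_sgprs):
    """Return indices of arguments whose IR lowering can be skipped (sketch).

    arg_layout: list of (size_in_bytes, align_in_bytes) in declaration order.
    """
    preloaded = []
    offset = 0
    for index, (size, align) in enumerate(arg_layout):
        offset = align_to(offset, align)
        end = offset + size
        sgprs_needed = (end + 3) // 4  # one SGPR per 4 bytes, rounded up
        if sgprs_needed > free_user_sgprs:
            break  # this and all later arguments are lowered normally
        preloaded.append(index)
        offset = end
    return preloaded


# A pointer (8 bytes), an i32, then another pointer that needs 8-byte alignment.
print(args_preloaded([(8, 8), (4, 4), (8, 8)], free_user_sgprs=4))  # [0, 1]
```

Here the third argument would start at byte 16 after padding, so it needs 6 SGPRs and does not fit in a budget of 4.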

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D156853
2023-09-25 08:54:07 -07:00
Philip Reames
95ce3c23c2 [RISCV] Be more aggressive about shrinking constant build_vector etype (#67175)
If LMUL is greater than m1, we can be more aggressive about narrowing
the build_vector via a vsext, if legal. If the narrow build_vector is
lowered as a load, both the load and the extend are linear in LMUL, but
load uops are generally more expensive than extend uops. If the narrow
build_vector is lowered via dominant values, that work is linear in both
the number of unique elements and LMUL. So provided the number of unique
values is > 2, this is a net win in work performed.
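A rough sketch of the decision described above, under assumptions: constants are narrowed to the smallest element width from which a sign extension (`vsext.vf2`/`vf4`/`vf8`, i.e. at most an 8x narrowing) reproduces every element, and the transform is only taken when LMUL > 1 and there are more than two unique values. The function names are hypothetical; this is not the RISC-V lowering code itself.

```python
def narrowest_sext_width(values, elt_bits):
    """Smallest width w < elt_bits such that sign-extending from w bits preserves every value."""
    for width in (8, 16, 32):
        if width >= elt_bits:
            break
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
        if all(lo <= v <= hi for v in values):
            return width
    return elt_bits  # no narrower width works


def should_shrink(values, elt_bits, lmul):
    """Mirror the heuristic from the commit message (sketch); lmul may be fractional."""
    if lmul <= 1:
        return False
    if narrowest_sext_width(values, elt_bits) == elt_bits:
        return False
    return len(set(values)) > 2


print(narrowest_sext_width([3, -1, 7, 100], elt_bits=64))  # 8
print(should_shrink([3, -1, 7, 100], elt_bits=64, lmul=4))  # True
```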
2023-09-25 08:09:46 -07:00
Mark Harley
eb96d6e2fb [AArch64][GlobalISel] Vector Constant Materialization
Vector constants were being lowered via constant pool loads. This patch
selects MOVI/MVNI in more cases where appropriate.
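As a hedged illustration of when a constant splat can avoid the constant pool, the sketch below checks only the 32-bit shifted-immediate forms of MOVI/MVNI (`#imm8, LSL #0/8/16/24`). The real selector also handles other encodings (for example the MSL and 64-bit byte-mask forms), which this sketch ignores; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def match_shifted_imm8(value):
    """Return (imm8, shift) if value == imm8 << shift for shift in {0, 8, 16, 24}."""
    for shift in (0, 8, 16, 24):
        if (value & ~(0xFF << shift) & MASK32) == 0:
            return (value >> shift, shift)
    return None


def select_v4i32_splat(value):
    """Pick MOVI, MVNI, or a constant-pool load for a v4i32 splat (sketch)."""
    value &= MASK32
    match = match_shifted_imm8(value)
    if match is not None:
        return ("MOVI",) + match
    # MVNI materializes NOT(imm8 << shift), so test the inverted value.
    match = match_shifted_imm8(~value & MASK32)
    if match is not None:
        return ("MVNI",) + match
    return ("LOAD_FROM_CONSTANT_POOL",)


print(select_v4i32_splat(0x00AB0000))  # ('MOVI', 171, 16), i.e. MOVI #0xab, LSL #16
print(select_v4i32_splat(0xFFFFFF00))  # ('MVNI', 255, 0)
```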
2023-09-25 13:40:33 +01:00
Diana Picus
327fdcf789 Revert "AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882)"
This reverts commit a04603993b43e5ebac1531293d288315f1885886 because
it broke the OpenMP buildbot.
2023-09-25 13:40:38 +02:00
Diana
a04603993b AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882)
Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or
INSERT_SUBREG where the result is an SGPR, but some of the inputs are
constants materialized into VGPRs. This may happen in cases where for
instance several instructions use an immediate zero and SelectionDAG
chooses to put it in a VGPR to satisfy all of them. This however causes
the si-fix-sgpr-copies to try to switch the whole chain to VGPR and may
lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into
an SGPR fixes the issue.
2023-09-25 13:20:08 +02:00
Momchil Velikov
c649fd34e9 [MachineSink][AArch64] Sink instruction copies when they can replace copy into hard register or folded into addressing mode
This patch adds a new code transformation to the `MachineSink` pass,
that tries to sink copies of an instruction, when the copies can be folded
into the addressing modes of load/store instructions, or
replace another instruction (currently, copies into a hard register).

The criteria for performing the transformation are:
* the register pressure at the sink destination block must not
  exceed the register pressure limits
* the latency and throughput of the load/store or the copy must not deteriorate
* the original instruction must be deleted

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D152828
2023-09-25 10:49:44 +01:00
David Green
54e5de08d4 [ARM][LSR] Exclude uses outside the loop when favoring postinc. (#67090)
Extra uses for variables outside the loop can mess with the generation
of postinc variables. This patch alters the collection of loop invariant
fixups in LSR when the target is optimizing for PostInc, to exclude the
collection of these extra uses. It is expected that the variable can be
rematerialized, which will lead to a more optimal sequence of
instructions in the loop.
2023-09-25 10:09:36 +01:00
chuongg3
45f51f9f7c [AArch64][GlobalISel] Select UMULL instruction (#65469)
Global ISel now selects `UMULL` and `UMULL2` instructions.
`G_MUL` instructions whose input operands come from `SEXT` or `ZEXT`
operations are turned into `UMULL`.

G_MUL instructions with a v2s64 result type are always scalarised, except for:
`mul(unmerge(ext), unmerge(ext))`

In that case the extend can be unmerged, folding away the unmerge/merge pair in the middle:
`mul(unmerge(ext), unmerge(ext))` =>
`mul(unmerge(merge(ext(unmerge))), unmerge(merge(ext(unmerge))))` =>
`mul(ext(unmerge), ext(unmerge))`
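For reference, a small Python model of the instruction semantics (not of the GlobalISel code): `UMULL Vd.2D, Vn.2S, Vm.2S` multiplies the zero-extended low two 32-bit lanes, and `UMULL2` does the same for the high two lanes of 4 x 32-bit inputs. This is why a v2s64 multiply of two zero-extended operands can map onto a single instruction instead of being scalarised. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def umull(vn, vm):
    """Model of UMULL Vd.2D, Vn.2S, Vm.2S: widen the low two u32 lanes, then multiply."""
    return [(a & MASK32) * (b & MASK32) for a, b in zip(vn[:2], vm[:2])]


def umull2(vn, vm):
    """Model of UMULL2 Vd.2D, Vn.4S, Vm.4S: the same operation on the high two lanes."""
    return umull(vn[2:4], vm[2:4])


def mul_of_zext(a, b):
    """Reference: zero-extend each u32 lane to u64 and multiply modulo 2**64."""
    return [((x & MASK32) * (y & MASK32)) % (1 << 64) for x, y in zip(a, b)]


print([hex(x) for x in umull([0xFFFFFFFF, 2], [0xFFFFFFFF, 3])])  # ['0xfffffffe00000001', '0x6']
```

Because a 32x32-bit unsigned product always fits in 64 bits, the widening multiply never loses information, which is what makes the `mul(zext, zext)` match exact.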
2023-09-25 09:34:51 +01:00
Paulo Matos
0564065709 [SPIRV] Implement support for SPV_KHR_expect_assume (#66217)
Adds new extension SPV_KHR_expect_assume, new capability
ExpectAssumeKHR as well as the new instructions:
  * OpExpectKHR
  * OpAssumeTrueKHR

These are lowered from the llvm.expect.<ty> and llvm.assume
intrinsics, respectively.

Previously https://reviews.llvm.org/D157696
2023-09-25 09:52:42 +02:00
Brandon Wu
408b0810ba [RISCV] Support floating point VCIX (#67094) 2023-09-25 13:19:21 +08:00
Yeting Kuo
7c70e50b8e [RISCV] Fix wrong offset use caused by missing the size of Zcmp push. (#66613)
This fixes two wrong offset uses:
1. The .cfi_offset of callee saves that are not pushed by cm.push.
2. References to frame objects through the frame pointer.
2023-09-25 12:05:05 +08:00
XChy
fc86d031fe [SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275)
This patch extends TryToSimplifyUncondBranchFromEmptyBlock to
handle cases like the one below.

```llvm
define i8 @src(i8 noundef %arg) {
start:
  switch i8 %arg, label %unreachable [
    i8 0, label %case012
    i8 1, label %case1
    i8 2, label %case2
    i8 3, label %end
  ]

unreachable:
  unreachable

case1:
  br label %case012

case2:
  br label %case012

case012:
  %phi1 = phi i8 [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
  br label %end

end:
  %phi2 = phi i8 [ %phi1, %case012 ], [ 4, %start ]
  ret i8 %phi2
}
```
The phis here should be merged into one phi, so that we can better
optimize it:

```llvm
define i8 @tgt(i8 noundef %arg) {
start:
  switch i8 %arg, label %unreachable [
    i8 0, label %end
    i8 1, label %case1
    i8 2, label %case2
    i8 3, label %case3
  ]

unreachable:
  unreachable

case1:
  br label %end

case2:
  br label %end

case3:
  br label %end

end:
  %phi = phi i8 [ 4, %case3 ], [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
  ret i8 %phi
}
```
Proof:
[normal](https://alive2.llvm.org/ce/z/vAWi88)
[multiple stages](https://alive2.llvm.org/ce/z/DDBQqp)
[multiple stages 2](https://alive2.llvm.org/ce/z/nGkeqN)
[multiple phi combinations](https://alive2.llvm.org/ce/z/VQeEdp)

The lookup-table optimization should then convert it into `add %arg, 1`.
This patch simply matches similar CFG structures and merges the phis for
the different cases.
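To spot-check that both functions compute `%arg + 1` on the reachable inputs 0..3, here is a small Python model that transliterates `@src` and `@tgt` by hand. The alive2 links above are the actual proofs; this only checks values, and the function names simply mirror the IR.

```python
def src(arg):
    """Hand transliteration of @src: which edge reaches %end decides %phi2."""
    if arg == 0:
        phi1 = 1  # %start -> %case012
    elif arg == 1:
        phi1 = 2  # %case1 -> %case012
    elif arg == 2:
        phi1 = 3  # %case2 -> %case012
    elif arg == 3:
        return 4  # %start -> %end directly: %phi2 = 4
    else:
        raise ValueError("default destination is unreachable")
    return phi1  # %case012 -> %end: %phi2 = %phi1


def tgt(arg):
    """Hand transliteration of @tgt: a single phi in %end."""
    incoming = {0: 1, 1: 2, 2: 3, 3: 4}  # from %start, %case1, %case2, %case3
    if arg not in incoming:
        raise ValueError("default destination is unreachable")
    return incoming[arg]


for arg in range(4):
    assert src(arg) == tgt(arg) == arg + 1
```

Once the values sit in a single phi, the mapping 0, 1, 2, 3 to 1, 2, 3, 4 is a linear function of the switch condition, which is what lets the lookup-table logic reduce it to an add.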

Such a transform might also apply to situations other than switch,
but I'm not sure whether merging is better there. Therefore, I only
apply it to switch.

Related issue:
#63876

[Migrated](https://reviews.llvm.org/D155940)
2023-09-25 10:13:45 +08:00
Simon Pilgrim
8b36d082c4 [DAG] getNode() - fold (zext (trunc x)) -> x iff the upper bits are known zero - add SRL support
This is part of the work to address the D155472 regressions. There are a number of issues with generalizing this fold, which is why I'm only adding SRL support for now.
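The known-bits reasoning behind the SRL case can be checked exhaustively on small widths. In this sketch `x` is `width` bits wide and is truncated to `narrow` bits and zero-extended back; `(srl x, c)` has its top `c` bits known zero, so the fold should be valid exactly when `c >= width - narrow`. The helper names are hypothetical.

```python
def fold_is_valid(width, narrow, shift):
    """(zext (trunc (srl x, shift))) -> (srl x, shift) iff the dropped bits are known zero."""
    return shift >= width - narrow


def zext_trunc(value, narrow):
    # Truncating to `narrow` bits and zero-extending back just clears the upper bits.
    return value & ((1 << narrow) - 1)


def check(width=8, narrow=4):
    """Exhaustively compare the known-bits predicate against actual behaviour."""
    for shift in range(width):
        always_equal = all(
            zext_trunc(x >> shift, narrow) == x >> shift for x in range(1 << width)
        )
        assert always_equal == fold_is_valid(width, narrow, shift), shift
    return True


print(check())  # True
```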

Differential Revision: https://reviews.llvm.org/D159533
2023-09-24 13:40:07 +01:00
Simon Pilgrim
142efd6d61 [AMDGPU] Add ISD::FSHR Handling to AMDGPUISD::PERM matching
Pulled out of D159533, which encourages (zext (trunc x)) -> x folds, leading to more ISD::FSHR nodes, which were breaking some existing AMDGPUISD::PERM tests.
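One way to see why FSHR fits byte-permute matching (an illustration, not the patch's code): a 32-bit funnel shift right by a multiple of 8 just selects four consecutive bytes of the 64-bit concatenation `a:b`. The selector list below is a plain byte index (0-3 from `b`, 4-7 from `a`) and is not the V_PERM_B32 selector encoding; all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def fshr32(a, b, c):
    """ISD::FSHR on i32: shift the 64-bit concatenation a:b right by c mod 32, keep the low half."""
    concat = ((a & MASK32) << 32) | (b & MASK32)
    return (concat >> (c % 32)) & MASK32


def byte_selector_for_fshr(c):
    """For c a multiple of 8, result byte i is byte (i + c // 8) of a:b."""
    assert c % 8 == 0, "only byte-aligned shifts are pure byte selections"
    return [i + (c % 32) // 8 for i in range(4)]


def select_bytes(a, b, selector):
    """Build a 32-bit value whose byte i is byte selector[i] of a:b."""
    concat = ((a & MASK32) << 32) | (b & MASK32)
    return sum(((concat >> (8 * s)) & 0xFF) << (8 * i) for i, s in enumerate(selector))


a, b = 0xAABBCCDD, 0x11223344
for c in (0, 8, 16, 24):
    assert fshr32(a, b, c) == select_bytes(a, b, byte_selector_for_fshr(c))
```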

Differential Revision: https://reviews.llvm.org/D159533
2023-09-24 13:40:07 +01:00
Ivan Kosarev
fab28e0e14 Reapply "[AMDGPU] Introduce real and keep fake True16 instructions."
Reverts 6cb3866b1ce9d835402e414049478cea82427cf1.

Analysis of failures on buildbots with expensive checks enabled showed
that the problem was triggered by changes in another commit,
469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug
addressed in #67245.
2023-09-23 22:07:41 +01:00
Fangrui Song
e01df8716a [NVPTX] Test crash introduced by #67073
The test is adapted from https://reviews.llvm.org/D46008
2023-09-23 10:42:02 -07:00
Noah Goldstein
bc38c427d4 [DAGCombiner][AArch64] Fix incorrect cast VT in takeInexpensiveLog2
Previously, we were taking `CurVT` before finalizing `ToCast`, which
meant potentially returning an `SDValue` with an illegal `ValueType`
for the operation.

Fix is to just take `CurVT` after we have finalized `ToCast` with
`PeekThroughCastsAndTrunc`.
2023-09-23 09:50:42 -05:00
Zhuojia Shen
bcc5b48b0f Reapply "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre"
This reverts commit 0def4e6b0f638b97a73bd4674365961d8fabda28, applies a
quick fix that disallows merging two pre-indexed loads, and adds MIR
regression tests.

Differential Revision: https://reviews.llvm.org/D152407
2023-09-22 21:08:07 -07:00
Fangrui Song
d9a0163e27 Revert "[NVPTX] Improve lowering of v2i16 logical ops. (#67073)"
This reverts commit 648579006234b7608549cf708c07aac4d6283a1f.

Caused xla/tests:float8_test_gpu to fail
```
LLVM ERROR: Cannot select: t118: v2i16 = or t375, t401
  t375: v2i16 = BUILD_VECTOR t374, t372
    t374: i16 = select t247, Constant:i16<8960>, t360
      t247: i1 = setcc t199, Constant:i16<7>, seteq:ch
        t199: i16 = extract_vector_elt t187, Constant:i64<0>
          t187: v2i16 = and t183, t410
            t183: v2i16 = BUILD_VECTOR t383, t384
            ...
```

Acked by author to revert
2023-09-22 19:24:18 -07:00
Craig Topper
972df2cecc [RISCV][GISel] Emit G_CONSTANT 0 as a copy from X0. (#67202)
We need to use a COPY so the register coalescer can replace reads
of the register we copy to with X0. This is needed so that we use
X0 on instructions that don't have an immediate form.

This was reviewed as #67202.
2023-09-22 17:04:11 -07:00
Craig Topper
7cd01afb73 [RISCV][GISel] Add test showing missed opportunity to use X0 for the LHS of sub for negate.
I had to disable the late copy propagation pass that can see through
the ADDI we were previously emitting. We really want to catch this
in the register coalescer, if not even earlier.
2023-09-22 17:04:11 -07:00
Rahman Lavaee
897a0b01d6 [BasicBlockSections] Split cold parts of custom-section functions. (#66731)
This PR makes `-basic-block-sections` handle functions with custom
non-dot-text sections correctly. Cold parts of such functions must be
placed in the same section (not in `.text.split`) but with a unique id.
2023-09-22 13:49:12 -07:00
Philip Reames
233b6ef66c [RISCV] Handle EltType > XLEN case in VMV_V_X_VL to VMV_S_X_VL fold
I'd guarded this case in D158874 to avoid regressions, and decided to go investigate what was going on. The solution turns out to be a generic splat matching extension to handle INSERT_SUBVECTOR. In theory, we could see these from other sources as well, but for some reason we only seem to see the i64 extract on rv32 case in practice. Not sure why that is to be honest.

Differential Revision: https://reviews.llvm.org/D159230
2023-09-22 13:43:43 -07:00
Craig Topper
98eb28b621 [RISCV][GISel] Implement instruction selection for G_PHI and G_BRCOND.
This uses a naive lowering for G_BRCOND to a BNE instruction comparing
the register to X0.
2023-09-22 13:18:42 -07:00
Artem Belevich
6485790062 [NVPTX] Improve lowering of v2i16 logical ops. (#67073)
Bitwise logical ops on v2i16 can always be done as b32 ops, regardless
of the availability of other v2i16 ops, which would need a newer GPU.
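Why this is safe: bitwise AND/OR/XOR never carry between bit positions, so applying them to the packed 32-bit word gives the same result as applying them lane by lane to the two i16 halves. A quick Python check; the `pack`/`unpack` helpers are hypothetical and assume lane 0 lives in the low half.

```python
import operator
import random


def pack(lo, hi):
    # Two 16-bit lanes packed into one 32-bit register, lane 0 in the low half.
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)


def unpack(word):
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def lanewise_matches_b32(op, x, y):
    """Does op on the packed b32 word equal op applied to each i16 lane?"""
    packed = unpack(op(pack(*x), pack(*y)))
    per_lane = (op(x[0], y[0]) & 0xFFFF, op(x[1], y[1]) & 0xFFFF)
    return packed == per_lane


rng = random.Random(0)
for _ in range(1000):
    x = (rng.randrange(1 << 16), rng.randrange(1 << 16))
    y = (rng.randrange(1 << 16), rng.randrange(1 << 16))
    for op in (operator.and_, operator.or_, operator.xor):
        assert lanewise_matches_b32(op, x, y)
```

By contrast, `operator.add` fails this check for inputs such as `(0xFFFF, 0)` and `(1, 0)`, because the carry out of lane 0 spills into lane 1.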
2023-09-22 13:05:39 -07:00
Matt Harding
64d1ceaa38 Add command line option --no-trap-after-noreturn (#67051)
Add the command line option --no-trap-after-noreturn, which exposes the
pre-existing TargetOption `NoTrapAfterNoreturn`.

This pull request was split off from this one:
https://github.com/llvm/llvm-project/pull/65876
2023-09-22 22:03:21 +02:00
Rahman Lavaee
6ac71a0149 [BasicBlockSections] Introduce the basic block sections profile version 1. (#65506)
This patch introduces a new version for the basic block sections profile
as was requested in D158442, while keeping backward compatibility for
the old version.

The new encoding is as follows:
```
m <module_name>
f <function_name_1> <function_name_2>...
c <bb_id_1> <bb_id_2> <bb_id_3>
c <bb_id_4> <bb_id_5>
...
```
The module name specifier (starting with 'm') is optional and allows
distinguishing profiles for internal-linkage functions with the same
name. If it is not specified, the profile is applied to any function
with the same name.
The function name specifier (starting with 'f') can specify multiple
function name aliases.
Finally, each basic block cluster is specified by a 'c' line, which
lists the basic blocks of the cluster in the order in which they must
be placed in the same section.
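A minimal parser sketch for the encoding shown above, for illustration only. Assumptions not stated in the commit: an `m` line applies to the `f` lines that follow it until the next `m`, each `c` line belongs to the most recent `f` line, blank lines are ignored, and anything else (including any version header the real format may carry) is rejected. This is not LLVM's actual profile reader, and `FunctionProfile`/`parse_profile` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionProfile:
    aliases: List[str]
    module: Optional[str] = None
    clusters: List[List[int]] = field(default_factory=list)


def parse_profile(text):
    module = None
    functions = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        kind, _, rest = line.partition(" ")
        fields = rest.split()
        if kind == "m":
            # Assumed to scope the following 'f' lines until the next 'm'.
            module = fields[0] if fields else None
        elif kind == "f":
            functions.append(FunctionProfile(aliases=fields, module=module))
        elif kind == "c":
            if not functions:
                raise ValueError(f"line {lineno}: cluster before any function")
            functions[-1].clusters.append([int(bb) for bb in fields])
        else:
            raise ValueError(f"line {lineno}: unknown specifier {kind!r}")
    return functions


profile = parse_profile("m foo.cc\nf bar bar_alias\nc 0 1 2\nc 3 4\n")
print(profile[0])
```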
2023-09-22 12:37:04 -07:00
Nemanja Ivanovic
46d5d264fc [PowerPC] Improve kill flag computation and add verification after MI peephole
The MI Peephole pass has grown to include a large number of transformations over the years. Many of the transformations require re-computation of kill flags but don't do a good job of re-computing them. This causes us to have very common failures when the compiler is built with expensive checks. Over time, we added and augmented a function that is supposed to go and fix up kill flags after each transformation but we keep missing cases.
This patch does the following:
- Removes the function to re-compute kill flags
- Adds LiveVariables to compute and maintain kill flags while transforming code
- Adds re-computation of kill flags for the post-RA peepholes for each block that contains a transformed instruction

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D133103
2023-09-22 15:26:39 -04:00
Craig Topper
8e87dc10b8 [RISCV][GISel] Add a post legalizer combiner and enable a couple combines (#67053)

We have an existing test that shows benefit from redundant_and and
identity combines so use them as a starting point.
2023-09-22 10:13:56 -07:00
Craig Topper
ec5b0ef7d7 [RISCV] Truncate constants to eltwidth before checking simm5 when converting VMV_V_X to VMV_X_S (#67062)

Instruction selection knows that the bits past EltWidth are ignored;
we should do the same here.
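A small sketch of the check, assuming the constant may carry bits above EltWidth (for example an `i8` splat of `0xFF` held in a wider scalar): sign-extend the low EltWidth bits, then test the simm5 range -16..15. The function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def fits_simm5(value):
    return -16 <= value <= 15


def fits_simm5_for_elt(constant, elt_bits):
    # Only the low elt_bits bits matter, so truncate (and sign-extend) first.
    return fits_simm5(sign_extend(constant, elt_bits))


print(fits_simm5(255), fits_simm5_for_elt(255, 8))  # False True
```

Without the truncation, `255` looks out of range even though, as an 8-bit element, it is just `-1`.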
2023-09-22 10:12:12 -07:00
Luke Lau
3510552df6 [RISCV] Check for COPY_TO_REGCLASS in usesAllOnesMask (#67037)
Sometimes with mask vectors that have been widened, there is a
CopyToRegClass node in between the VMSET and the CopyToReg.

This is a resurrection of https://reviews.llvm.org/D148524, and is
needed to remove the mask operand when it's extracted from a subvector
as planned in
https://github.com/llvm/llvm-project/pull/66267#discussion_r1331998919
2023-09-22 16:30:43 +01:00
Ivan Kosarev
6cb3866b1c Revert "[AMDGPU] Introduce real and keep fake True16 instructions."
This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to
failures on expensive checks.
2023-09-22 15:40:26 +01:00
Mirko Brkusanin
72e3713009 [IRTranslator] Set NUW flag for inbounds gep and load/store offsets
Patch by: Acim Maravic

Differential Revision: https://reviews.llvm.org/D159515
2023-09-22 16:16:28 +02:00
Simon Pilgrim
5b8204b221 [X86] SandyBridge ymm broadcast loads use port5 + port23
Unlike the per-lane mov*dup broadcast shuffles, broadcastsd/ss need port5 to splat across lanes

Found while reviewing a llvm-exegesis capture (and matches Agner + uops.info numbers) - I can't find any more easy wins from these captures so that will be it for now.
2023-09-22 15:10:27 +01:00
Paulo Matos
e7651e60a2 [SPIRV] Add support for SPV_KHR_bit_instructions (#66215)
Adds support for SPV_KHR_bit_instructions.

It is only used whenever we don't need the whole Shader capability, which is a superset of this extension.
2023-09-22 14:44:21 +02:00
David Green
8b4ca0aa4e [AArch64] Expand log/exp tests. NFC
This adds extra testing for exp, exp2, log, log10 and log2 under GlobalISel.
2023-09-22 13:33:23 +01:00
Anatoly Trosinenko
eb02ee44d3 [AArch64] Move PAuth codegen down the machine pipeline
To simplify handling PAuth in the machine outliner, introduce a
separate AArch64PointerAuth pass that is executed after both
Prologue/Epilogue Inserter and Machine Outliner passes.

After moving to AArch64PointerAuth, signLR and authenticateLR are
not used outside of their class anymore, so make them private and
simplify accordingly.

The new pass is added via AArch64PassConfig::addPostBBSections(),
so that it can change the code size before branch relaxation occurs.
AArch64BranchTargets is placed there too, so it can take into account
any PACI(A|B)SP instructions and not excessively add BTIs at the start
of functions.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D159357
2023-09-22 14:49:14 +03:00
David Green
963268c52b [AArch64] Expand Sin/Cos GlobalISel testing. NFC
This fills out some extra cases for sin/cos testing for various types under
Global ISel, which seem to all do OK. The existing tests in
sincospow-vector-expansion.ll can be removed, as they are now covered
elsewhere.
2023-09-22 12:27:30 +01:00
Mirko Brkusanin
a657deb42e [AMDGPU] Update RUN line in test (NFC) 2023-09-22 12:41:54 +02:00
DianQK
d200bd1a7d Reland "[SimplifyCFG] Hoist common instructions on switch" (#67077)
This relands commit 96ea48ff5dcba46af350f5300eafd7f7394ba606.
2023-09-22 18:29:59 +08:00
Nikita Popov
aa70f4d8cf [StackColoring] Handle fixed object index
This is a followup to #66988. The implementation there did not
account for the possibility of the catch object frame index
referring to a fixed object, which is the case on win64.
2023-09-22 12:28:38 +02:00
Ivan Kosarev
c62f208c05 [AMDGPU] Don't suppress printing the .l and .h register suffixes.
We don't seem to have a use for the -amdgpu-keep-16-bit-reg-suffixes
option anymore. Was introduced in <https://reviews.llvm.org/D79435>.

Reviewed By: Joe_Nash, foad

Differential Revision: https://reviews.llvm.org/D156102
2023-09-22 11:13:05 +01:00
Simon Pilgrim
b61b2426ac [DAG] getNode() - remove oneuse limit from (zext (trunc (assertzext x))) -> (assertzext x) fold (REAPPLIED)
Noticed on D159533 and I've finally dealt with the x86 regressions - MatchingStackOffset wasn't peeking through AssertZext nodes while trying to find CopyFromReg/Load sources, it was only removing them if they were part of a (trunc (assertzext x)) pattern.

Reapplied after being reverted at 4389252c58b783ce5b - which should be addressed by D159537 / 6d2679992e58b
2023-09-22 11:01:38 +01:00
Ivan Kosarev
0f864c7b8b [AMDGPU] Introduce real and keep fake True16 instructions.
The existing fake True16 instructions using 32-bit VGPRs are supposed to
co-exist with real ones until all the necessary True16 functionality is
implemented and relevant tests are updated.

Reviewed By: arsenm, Joe_Nash

Differential Revision: https://reviews.llvm.org/D156101
2023-09-22 10:57:56 +01:00
Nikita Popov
b3cb4f069c [StackColoring] Handle SEH catch object stack slots conservatively
The write to the SEH catch object happens before cleanuppads are
executed, while the first reference to the object will typically
be in a catchpad.

If we make use of first-use analysis, we may end up allocating
an alloca used inside the cleanuppad and the catch object at the
same stack offset, which would be incorrect.

https://reviews.llvm.org/D86673 was a previous attempt to fix it.
It used the heuristic "a slot loaded in a WinEH pad and never
written" to detect catch objects. However, because it checks
for more than one load (while probably more than zero was
intended), the fix does not actually work.

The general approach also seems dubious to me, so this patch
reverts that change entirely, and instead marks all catch object
slots as conservative (i.e. excluded from first-use analysis)
based on the WinEHFuncInfo. As far as I can tell we don't need
any heuristics here, we know exactly which slots are affected.

Fixes https://github.com/llvm/llvm-project/issues/66984.
2023-09-22 11:50:30 +02:00