52796 Commits

Author SHA1 Message Date
Yeting Kuo
ab94fbba57
[RISCV] Prefer Zcmp push/pop instead of save-restore calls. (#66046)
Zcmp push/pop can reduce code size more than save-restore calls. There
are two reasons:
1. A save-restore call needs 4-8 bytes, but Zcmp push/pop only
needs 2 bytes.
2. Zcmp push/pop can also handle a small shift of sp.
2023-09-20 09:16:29 +08:00
Sergei Barannikov
dd477ebd23
[Sparc] Remove LEA instructions (NFCI) (#65850)
LEA_ADDri and LEAX_ADDri are printed / encoded the same way as ADDri. I
had to change the type of simm13Op so that it can be used in both 32-
and 64-bit modes. This required the changes in operands of some
InstAliases.
2023-09-20 03:34:39 +03:00
DianQK
96ea48ff5d
[SimplifyCFG] Hoist common instructions on Switch.
Sinking common instructions is not always performance friendly. We need to implement hoisting of common instructions on switch instructions to solve the following problem:
```
define i1 @foo(i64 %a, i64 %b, i64 %c, i64 %d) {
start:
  %test = icmp eq i64 %a, %d
  br i1 %test, label %switch_bb, label %exit

switch_bb:                                        ; preds = %start
  switch i64 %a, label %bb0 [
    i64 1, label %bb1
    i64 2, label %bb2
  ]

bb0:                                              ; preds = %switch_bb
  %0 = icmp eq i64 %b, %c
  br label %exit

bb1:                                              ; preds = %switch_bb
  %1 = icmp eq i64 %b, %c
  br label %exit

bb2:                                              ; preds = %switch_bb
  %2 = icmp eq i64 %b, %c
  br label %exit

exit:                                             ; preds = %bb2, %bb1, %bb0, %start
  %result = phi i1 [ false, %start ], [ %0, %bb0 ], [ %1, %bb1 ], [ %2, %bb2 ]
  ret i1 %result
}
```
The pre-commit test is D156617.

Reviewed By: XChy, nikic

Differential Revision: https://reviews.llvm.org/D155711
2023-09-20 07:21:49 +08:00
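As a rough illustration of the hoisting idea in 96ea48ff5d, the following Python sketch models each switch successor as a list of instructions and moves their common leading instructions into the predecessor. The function `hoist_common_prefix` and the tuple encoding are hypothetical; this is not SimplifyCFG's actual implementation, which also has to deal with value renaming, side effects, and PHI updates.

```python
def hoist_common_prefix(successors):
    """Split successor blocks into (hoisted, remaining_blocks).

    `successors` maps a block name to a list of (opcode, operands) tuples,
    with the terminator last. Result names are deliberately left out of the
    encoding so that %0, %1 and %2 from the IR example compare equal.
    """
    # Terminators are never hoisted in this toy model.
    bodies = [insts[:-1] for insts in successors.values()]
    hoisted = []
    for candidates in zip(*bodies):
        if any(c != candidates[0] for c in candidates):
            break
        hoisted.append(candidates[0])
    n = len(hoisted)
    remaining = {name: insts[n:] for name, insts in successors.items()}
    return hoisted, remaining


# The three successors of the switch in @foo, in the toy encoding.
blocks = {
    "bb0": [("icmp eq", ("%b", "%c")), ("br", ("%exit",))],
    "bb1": [("icmp eq", ("%b", "%c")), ("br", ("%exit",))],
    "bb2": [("icmp eq", ("%b", "%c")), ("br", ("%exit",))],
}
hoisted, remaining = hoist_common_prefix(blocks)
```

In this model the single `icmp` ends up in `hoisted` (that is, in `switch_bb`), and each successor is left with only its branch.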
Austin Kerbow
60a227c464 [AMDGPU] Use inreg for hint to preload kernel arguments
This patch is the first in a series that adds support for pre-loading
kernel arguments into SGPRs. The command-line argument
'amdgpu-kernarg-preload-count' is used to specify the number of
arguments sequentially from the first that we should attempt to preload,
the default is 0.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156852
2023-09-19 15:13:38 -07:00
Mircea Trofin
d8873df4dc
[AsmPrint] Dump raw frequencies in -mbb-profile-dump (#66818)
We were losing the function entry count, which is useful to check profile quality. For the original cases where we want
entrypoint-relative MBB frequencies, the user would just need to divide these values by the entrypoint (first MBB, with ID=0) value.
2023-09-19 14:37:06 -07:00
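The division described in d8873df4dc can be sketched as follows. This is only an illustration: the dictionary layout and the name `entry_relative` are assumptions, not the actual format written by -mbb-profile-dump.

```python
def entry_relative(raw_freqs):
    """Convert raw MBB frequencies to entrypoint-relative ones.

    `raw_freqs` maps an MBB ID to its raw frequency; ID 0 is the entry block.
    """
    entry = raw_freqs[0]
    if entry == 0:
        raise ValueError("entry block has zero frequency")
    return {mbb: freq / entry for mbb, freq in raw_freqs.items()}


# Hypothetical raw values for a three-block function.
relative = entry_relative({0: 8, 1: 4, 2: 16})
```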
Philip Reames
86b32c4b55
[RISCV] Match strided load via DAG combine (#66800)
This change matches a masked.stride.load from a mgather node whose index
operand is a strided sequence. We can reuse the VID matching from
build_vector lowering for this purpose.

Note that this duplicates the matching done at IR by
RISCVGatherScatterLowering.cpp. Now that we can widen gathers to a wider
SEW, I don't see a good way to remove this duplication. The only obvious
alternative is to move the widening transform to IR, but that's a no-go
as I want other DAGs to run first. I think we should just live with the
duplication - particularly since the reuse of isSimpleVIDSequence means
the duplication is somewhat minimal.
2023-09-19 14:10:52 -07:00
Arthur Eubanks
1a8c69176e [X86] Use RIP-relative addressing for data under large data threshold for medium code model
Since those data are assumed to be within the relocation offset limit.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D150297
2023-09-19 11:14:45 -07:00
Philip Reames
fc95de38d9 [RISCV] Require alignment when forming gather with larger element type
This fixes a bug in my 928564caa5de8b07cede51e45499934777b9938c that didn't get noticed in review.  I found it when looking at the strided load case (upcoming patch), and realized the previous commit was buggy too.

p.s. Sorry for the slightly confusing test diff.  I'd apparently used the wrong mask for the aligned positive test; it was actually unaligned.  Didn't seem worthy of a separate precommit.
2023-09-19 11:00:42 -07:00
Philip Reames
de37d965da [RISCV] Expand test coverage for widening gather and strided load idioms
While I'm here, clean up a few implemented todos.
2023-09-19 10:43:40 -07:00
Craig Topper
bbe3ee061f
[RISCV] Add more instructions for the short forward branch optimization. (#66789)
This adds the shifts and the immediate forms of the instructions that
were already supported.

There are still more instructions that can be predicated, but this is
the rest of what we had in our downstream.
2023-09-19 10:21:39 -07:00
Yingwei Zheng
93fde2ea1b
[RISCV] Add a pass to rewrite rd to x0 for non-computational instrs whose return values are unused
When AMOs are used to implement parallel reduction operations, typically the return value would be discarded.
This patch adds a peephole pass `RISCVDeadRegisterDefinitions`. It rewrites `rd` to `x0` when `rd` is marked as dead.
It may improve the register allocation and reduce pipeline hazards on CPUs without register renaming and OOO.
Comparison with GCC: https://godbolt.org/z/bKaxnEcec

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158759
2023-09-20 01:02:19 +08:00
Craig Topper
82676d49d3
[RISCV] Fix bad isel predicate handling for Ztso. (#66739)
The predicates inside the AMOPat class were being overridden by the
Predicates = [HasStdExtA] at the instantiation.
2023-09-19 08:57:49 -07:00
Jay Foad
44e997a158
[TwoAddressInstruction] Use isPlainlyKilled in processTiedPairs (#65976)
Calling isPlainlyKilled instead of directly checking for a kill flag
should make processTiedPairs behave the same with LiveIntervals
(i.e. when compiling with -early-live-intervals) as it does with
LiveVariables.
2023-09-19 16:44:20 +01:00
Luke Lau
22d0bd8632
[DAGCombiner] Combine vp.strided.store with unit stride to vp.store (#66774)
This is the VP equivalent of #66677. If we have a strided store where
the stride is equal to the element width, we can just use a regular VP
store.
2023-09-19 16:43:50 +01:00
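The equivalence behind 22d0bd8632 can be modeled on a flat byte buffer. The sketch below is a simplified model, not the DAGCombiner code: `strided_store` and `contiguous_store` are hypothetical helpers, and the mask and EVL handling is an assumption about how the VP operations behave (lanes at or beyond EVL, or with a false mask bit, are skipped).

```python
def strided_store(mem, base, stride, elems, mask, evl):
    """Store elems[i] at base + i * stride for each enabled lane i < evl."""
    for i in range(evl):
        if mask[i]:
            off = base + i * stride
            mem[off:off + len(elems[i])] = elems[i]


def contiguous_store(mem, base, elems, mask, evl):
    """Store elems[i] back to back, starting at base."""
    width = len(elems[0])
    for i in range(evl):
        if mask[i]:
            off = base + i * width
            mem[off:off + width] = elems[i]


elems = [bytes([i, i]) for i in range(1, 5)]   # four 2-byte elements
mask = [True, False, True, True]
a, b = bytearray(16), bytearray(16)
strided_store(a, 4, 2, elems, mask, evl=3)     # stride == element width
contiguous_store(b, 4, elems, mask, evl=3)
```

With the stride equal to the element width, both buffers end up identical, which is why the strided form carries no extra information in that case.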
Luke Lau
469f6b9b4c
[DAGCombiner] Combine vp.strided.load with unit stride to vp.load (#66766)
This is the VP equivalent of #65674. We already combine MGATHER loads
with unit stride to MLOAD, so this extends it for
EXPERIMENTAL_VP_STRIDED_LOAD.
2023-09-19 16:39:28 +01:00
Philip Reames
188d5c7442 [RISCV] Add a combine to form masked.store from unit strided store
Add a DAG combine to form a masked.store from a masked_strided_store intrinsic
with stride equal to element size. This is the store analogue of PR #65674.

As seen in the tests, this does pick up a few cases that we'd previously missed
due to selection ordering.  We match strided stores early without going through
the recently added generic mscatter combines, and thus weren't recognizing the
unit strided store.
2023-09-19 07:45:35 -07:00
Mircea Trofin
a21d4abc89 [mlgo] Fix tests post PR #66334 2023-09-19 07:34:20 -07:00
Natalie Chouinard
116f7a2dcb
[SPIRV] Test basic float and int types (#66282)
Add Int16, Int64 and Float64 capabilities as always available for Vulkan
(since 1.0), and add tests covering most of the basic types from
clang/test/CodeGenHLSL/basic_types.hlsl except for half floats.

Depends on D156049
2023-09-19 10:24:53 -04:00
Natalie Chouinard
4abe3f18e2 [SPIRV] Fix bug in emitting GLSL ext inst names
Lookup extended instruction numbers in the given instruction set so that
correct names are now emitted for GLSL.std.450 instructions as well as
OpenCL.std.

Add a single test to verify correct abs intrinsic names are emitted when
targeting logical SPIR-V.

Depends on D156424

Differential Revision: https://reviews.llvm.org/D159227
2023-09-19 13:44:13 +00:00
wangpc
61d819dd52 [RISCV] Add tests for memory constraint A
We should not optimize it in D158062. This adds the test coverage.

The unneeded attributes `nonnull` and `inbounds` are also removed.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D159530
2023-09-19 19:51:04 +08:00
Luke Lau
73c2cb5999 [RISCV] Merge RV32/RV64 CHECK lines in strided vp load/store tests. NFC 2023-09-19 12:24:32 +01:00
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
Jay Foad
1d305f95d6 [AMDGPU] Fix line endings in a test 2023-09-19 11:09:03 +01:00
Michal Paszkowski
2616c279d5 [SPIR-V] Preserve pointer address space for load/gep instructions
Differential Revision: https://reviews.llvm.org/D158761
2023-09-19 01:42:42 -07:00
JinGu Kang
59c3dcafd8
[AArch64] Remove copy instruction between uaddlv with v4i16/v8i16 and dup (#66508)
If there are copy instructions between uaddlv with v4i16/v8i16 and dup
for transfer from gpr to fpr, try to remove them with duplane. It is a
follow-up patch of https://reviews.llvm.org/D159267
2023-09-19 09:05:12 +01:00
Michal Paszkowski
ec7baca17e [SPIR-V] Remove -opaque-pointers=0 from LITs, fixes for opaque pointers support
Differential Revision: https://reviews.llvm.org/D156049
2023-09-19 00:50:42 -07:00
Matt Arsenault
1328a8534b
AMDGPU: Fix handling of -0 in round lowering (#65761) 2023-09-19 09:14:17 +03:00
Fangrui Song
af935cf0ee
[CodeLayout] Fix X1_Y_X2 and Y_X2_X1 testing for jumps from Y (#66592)
The CHECK2 test in code_placement_ext_tsp_large.ll now has the same
result as the CHECK test: when chain(0,2,3,4,1) is merged with chain(8),
the result is now chain(0,2,3,4,8,1).

Ideally we should have test coverage for
-ext-tsp-chain-split-threshold=1, but it seems challenging to craft one.
Perhaps the default value of -ext-tsp-chain-split-threshold can be
decreased as the -ext-tsp-enable-chain-split-along-jumps heuristic is
now more powerful.
2023-09-18 22:50:17 -07:00
Wang Pengcheng
3017545e63
[RISCV] Fix inline asm error for block address (#66640)
After commit cedf2ea, `RISCVMergeBaseOffset` can now handle
`BlockAddress`. But we didn't handle it in `PrintAsmMemoryOperand`, so
we get an `invalid operand in inline asm` error.

This patch fixes the error.
2023-09-19 11:46:43 +08:00
Philip Reames
928564caa5
[RISCV] Combine a gather to a larger element type (#66694)
If we have a gather load whose indices correspond to valid offsets for a
gather with element type twice that of our source, we can reduce the
number of indices and perform the operation at the larger element type.

This is generally profitable since we halve VL - and these operations are
linear in VL. This may require some additional VL/VTYPE toggles, but
this appears to be worthwhile on the whole.
2023-09-18 16:55:38 -07:00
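The index pairing described in 928564caa5 can be sketched on byte offsets. This is a simplified model under stated assumptions: `widen_gather_offsets` and `gather` are hypothetical helpers, masks are ignored, and the alignment requirement added later in fc95de38d9 is not modeled.

```python
def widen_gather_offsets(offsets, elt_size):
    """Return offsets for a gather at twice the element size, or None.

    Adjacent offsets must pair up as (lo, lo + elt_size) so that each pair
    can be fetched as a single element of size 2 * elt_size.
    """
    if len(offsets) % 2:
        return None
    wide = []
    for lo, hi in zip(offsets[0::2], offsets[1::2]):
        if hi != lo + elt_size:
            return None
        wide.append(lo)
    return wide


def gather(mem, offsets, elt_size):
    """Load one elt_size-byte element from each byte offset."""
    return [bytes(mem[o:o + elt_size]) for o in offsets]


mem = bytes(range(16))
narrow_offsets = [0, 1, 4, 5, 10, 11]
wide_offsets = widen_gather_offsets(narrow_offsets, 1)
```

Here six 1-byte loads become three 2-byte loads that read the same bytes, which is the halving of VL the commit message refers to.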
weiguozhi
9a04bc4c43
[AArch64] Move LDR_PXI from isStoreToStackSlot to isLoadFromStackSlot (#65658)
LDR_PXI is a load instruction, so it should be in isLoadFromStackSlot.
2023-09-18 15:52:41 -07:00
Philip Reames
e52c558813 [RISCV] Narrow indices of fixed vector gather/scatter nodes
Doing so allows the use of smaller constants overall, and may allow (for some small vector constants) avoiding the constant pool entirely.  This can result in extra VTYPE toggles if we get unlucky.

This was reviewed under PR #66405.
2023-09-18 11:49:14 -07:00
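The narrowing in e52c558813 amounts to picking the smallest index element width that still represents every constant index. The sketch below assumes non-negative constant indices that are zero-extended when used as offsets; `narrowest_index_bits` is a hypothetical helper, not the function used in the RISC-V backend.

```python
def narrowest_index_bits(indices, candidates=(8, 16, 32, 64)):
    """Return the smallest candidate width that holds every index unsigned."""
    if any(i < 0 for i in indices):
        raise ValueError("this sketch only handles non-negative indices")
    top = max(indices, default=0)
    for bits in candidates:
        if top < (1 << bits):
            return bits
    raise ValueError("index does not fit in the widest candidate")


# Byte offsets for a small fixed-length gather.
bits = narrowest_index_bits([0, 8, 16, 248])
```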
Craig Topper
8677aaa1a3 [RISCV][GISel] Add initial pre-legalizer combiners copying from AArch64. 2023-09-18 10:59:00 -07:00
Jon Roelofs
83e6d2edfc
Revert "[ARM] Always lower direct calls as direct when the outliner is enabled (#66434)"
This reverts commit 003bcad9a8b21e15e3786a52b1dafa844075ab84.

ARM folks say it regresses some of their benchmarks:
https://github.com/llvm/llvm-project/pull/66434#issuecomment-1722424162
2023-09-18 09:45:46 -07:00
Philip Reames
0722800289 [RISCV] Match constant indices of non-index type when forming strided ops (#65777)
When checking to see if our index expressions can be converted into strided
operations, we previously gave up if the index type wasn't an exact match for
the intptrty for the address. Per gep semantics, this mismatch implies a sext
or trunc cast to the respective index type. For constants, go ahead and
evaluate that cast instead of giving up.

Note that the motivation of this is mostly test cleanup. We canonicalize at IR
such that the gep index will match the intptrty. This is mostly useful so that
we can write both RV32 and RV64 tests from the same source. It's also helpful in
preventing confusion - I've stumbled across this at least four times now and
wasted time each one.

Note: The test change for the scatter unit-stride cases contains a minor
regression for rv32 and 64-bit indices.  This is an artifact of the order in
which changes are landing.  This will be addressed in a near future change for
all configurations.
2023-09-18 09:41:34 -07:00
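Evaluating the implied sext or trunc on constant indices, as described in 0722800289, might look like the sketch below. Both helpers are hypothetical; the real matching works on IR constants rather than Python integers.

```python
def cast_to_index_type(value, dst_bits):
    """Apply the implied sext/trunc to a signed constant; result is signed."""
    mask = (1 << dst_bits) - 1
    value &= mask
    sign = 1 << (dst_bits - 1)
    return value - (1 << dst_bits) if value & sign else value


def constant_stride(indices, index_bits):
    """Return the common stride of the cast indices, or None if not strided."""
    cast = [cast_to_index_type(i, index_bits) for i in indices]
    if len(cast) < 2:
        return None
    stride = cast[1] - cast[0]
    if all(b - a == stride for a, b in zip(cast, cast[1:])):
        return stride
    return None


# i64 constants used with a 32-bit index type still form a stride of 4.
stride = constant_stride([0, 4, 8, 12], 32)
```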
Philip Reames
bb7b8726a4 [RISCV] Merge some test checks rvv/fixed-vectors-masked-gather.ll [nfc] 2023-09-18 09:20:12 -07:00
pawosm-arm
be16b03e20
[AArch64] Remove the Z#_HI register definitions (#66353)
The Z#_HI register definitions were created during the very early SVE
enablement work and before the SVE calling convention was locked in.

As they look entirely unused, they need to go.
2023-09-18 17:18:28 +01:00
Craig Topper
8f04d81ede [SelectionDAG][RISCV] Mask constants to narrow size in TargetLowering::expandUnalignedStore.
If the SRL for Hi constant folds, but we don't remove those bits from
the Lo, we can end up with strange constant folding through DAGCombine later.
I've only seen this with constants being lowered to constant pools
during lowering on RISC-V.
2023-09-18 09:10:19 -07:00
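The masking in 8f04d81ede can be pictured as splitting a constant into two halves that do not overlap. The helper `split_constant` below is a hypothetical illustration of that arithmetic, not the code in TargetLowering::expandUnalignedStore.

```python
def split_constant(value, bits):
    """Split a `bits`-wide constant into (lo, hi) halves of bits // 2 each."""
    half = bits // 2
    mask = (1 << half) - 1
    lo = value & mask            # drop the bits that moved into hi
    hi = (value >> half) & mask  # the constant-folded SRL
    return lo, hi


lo, hi = split_constant(0x1122334455667788, 64)
```

Without the mask on `lo`, the low store would keep the full original constant, which is the situation the commit message describes.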
Craig Topper
17a12a27ec [RISCV] Add test case to show bad codegen for unaligned i64 store of a large constant.
On the first split we create two i32 trunc stores and a srl to shift
the high part down. The srl gets constant folded to produce
a new i32 constant, but the truncstore for the low store still uses
the original constant.

This original constant then gets converted to a constant pool
before we revisit the stores to further split them. The constant
pool prevents further constant folding of the additional srls.

After legalization is done, we run DAGCombiner and get some constant
folding of srl via computeKnownBits which can peek through the constant
pool load. This can create new constants that also need a constant pool.
2023-09-18 09:10:19 -07:00
Craig Topper
f71a9e8bb7
[SelectionDAG][RISCV][PowerPC][X86] Use TargetConstant for immediates for ISD::PREFETCH. (#66601)
The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.

This hides the constants from type legalization so we can remove
the promotion support.

isel patterns are updated accordingly.
2023-09-18 08:58:50 -07:00
Nikita Popov
38c59b9f53 Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 47324cfd7d8ca1a2a5cbb9f948ecff66a28ee6bc.

This exposed incorrect debuginfo in rustc. Revert the verification
until this has been fixed.
2023-09-18 17:24:53 +02:00
Jay Foad
d8d0588f66
[TwoAddressInstruction] Update LiveIntervals after INSERT_SUBREG with undef read (#66211)
Update LiveIntervals after rewriting:
  %reg = INSERT_SUBREG undef %reg, %subreg, subidx
to:
  undef %reg:subidx = COPY %subreg

D113044 implemented this for the non-undef case.
2023-09-18 14:51:58 +01:00
Nikita Popov
4491f0b969 [IR] Remove unnecessary bitcast from CreateMalloc()
This bitcast is no longer necessary with opaque pointers. This
results in some annoying variable name changes in tests.
2023-09-18 14:58:16 +02:00
Sergei Barannikov
caaf61eb6e
[SDag] Fold saddo[_carry] with bitwise-not argument to ssubo[_carry] (#66571)
Fold `(saddo (not a), 1)` to `(ssubo 0, a)` and
`(saddo_carry (not a), b, c)` to `(ssubo_carry b, a, !c)`.

Proof: https://alive2.llvm.org/ce/z/Lj49YM

This is the same as https://reviews.llvm.org/D46505 and
https://reviews.llvm.org/D59208, but for signed opcodes.
2023-09-18 14:45:41 +03:00
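Alongside the Alive2 proof linked in caaf61eb6e, the two folds can be checked exhaustively at a small bit width. The sketch below models the nodes on 8-bit signed integers; the function names mirror the ISD opcodes, but the modeling itself (overflow means the exact result leaves the signed range) is an assumption of this sketch.

```python
BITS = 8
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def wrap(v):
    """Reduce an exact integer to a signed BITS-wide value."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v > HI else v


def saddo_carry(x, y, c):
    exact = x + y + c
    return wrap(exact), not (LO <= exact <= HI)


def ssubo_carry(x, y, b):
    exact = x - y - b
    return wrap(exact), not (LO <= exact <= HI)


def bitnot(a):
    return wrap(~a)


values = range(LO, HI + 1)
# (saddo (not a), 1) == (ssubo 0, a), modeled with a zero carry/borrow.
fold1 = all(saddo_carry(bitnot(a), 1, 0) == ssubo_carry(0, a, 0) for a in values)
# (saddo_carry (not a), b, c) == (ssubo_carry b, a, !c)
fold2 = all(
    saddo_carry(bitnot(a), b, c) == ssubo_carry(b, a, 1 - c)
    for a in values for b in values for c in (0, 1)
)
```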
Jay Foad
102838d3f6
update_mir_test_checks.py: match undef vreg subreg definitions (#66627)
Following on from D139466 which added support for dead vreg defs, this
patch adds support for "undef" defs of subregs.

Use this to regenerate checks for amx-greedy-ra-spill-shape.ll which
previously required manual tweaks to the autogenerated checks to fix an
EXPENSIVE_CHECKS failure; see commit
8b7c1fbd9647a5a6ef246a6b5b2543ea0f5a2337
2023-09-18 12:14:46 +01:00
Piyou Chen
b83a1ed594 [RISCV] Only emit .option when extension is supported
The .option directive may be emitted without any follow-up. Only emit .option push/pop when the supported extensions differ between the function and the module.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D159399
2023-09-18 00:30:13 -07:00
Piyou Chen
d861b3183c [RISCV][NFC] precommit for D159399
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D159400
2023-09-18 00:18:08 -07:00
wangpc
cedf2ea7b5 [RISCV] Teach RISCVMergeBaseOffset to handle BlockAddress
We can get `BlockAddress` in user code via `Labels as Values` so
we should be able to merge the access to `BlockAddress`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D159429
2023-09-18 11:47:14 +08:00
wangpc
28efe4d38e [RISCV] Add tests for merging base offset of BlockAddress
We can get `BlockAddress` in user code via `Labels as Values`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D159428
2023-09-18 11:47:13 +08:00
Vettel
ddae50d1e6
[RISCV] Combine trunc (sra sext (x), zext (y)) to sra (x, smin (y, scalarsizeinbits(y) - 1)) (#65728)
For RVV, if we want to perform an i8 or i16 element-wise vector
arithmetic right shift in the upper C/C++ program, the value to be
shifted is first sign-extended to i32, and the shift amount is
zero-extended to i32, to perform the vsra.vv instruction. This is
followed by a truncate to get the final result. Such a pattern is later
expanded to a series of "vsetvli" and "vnsrl" instructions, because the
RVV spec only supports 2 * SEW -> SEW truncates. But for vectors, the
shift amount can also be determined by
smin (Y, ScalarSizeInBits(Y) - 1). Also, for the vsra instruction, we
only care about the low lg2(SEW) bits as the shift amount.

- Alive2: https://alive2.llvm.org/ce/z/u3-Zdr
- C++ Test cases : https://gcc.godbolt.org/z/q1qE7fbha
2023-09-17 17:11:28 +08:00
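In addition to the Alive2 link in ddae50d1e6, the combine can be checked by brute force for i8 elements. The sketch below is a scalar model with hypothetical helper names. It only covers shift amounts from 0 to 31, where the original i32 shift is well defined and where a signed and an unsigned minimum give the same answer.

```python
def wrap(v, bits):
    """Truncate an integer to a signed `bits`-wide value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def wide_then_trunc(x, y, narrow_bits=8):
    """trunc (sra (sext x), (zext y)) with an i32 intermediate type."""
    # Python's >> on a negative int is an arithmetic shift, so a plain int
    # already behaves like the sign-extended value.
    return wrap(x >> y, narrow_bits)


def narrow_shift(x, y, narrow_bits=8):
    """sra x, min(y, narrow_bits - 1), done directly at the narrow type."""
    return x >> min(y, narrow_bits - 1)


same = all(
    wide_then_trunc(x, y) == narrow_shift(x, y)
    for x in range(-128, 128)
    for y in range(32)
)
```

Clamping works because any shift amount of 7 or more leaves only copies of the sign bit in the low 8 bits.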