52796 Commits

Author SHA1 Message Date
Mircea Trofin
cda388c440 [mlgo] Fix test post PR #76697
Opcode values changed, trivial fix.
2024-01-03 19:38:03 -08:00
Micah Weston
7df28fd61a
[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Implements PGOAnalysisMap emitting in AsmPrinter with tests. (#75202)
Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF
section. Implements filecheck tests to verify emitting new fields.

This patch emits optional PGO related analyses into the bb-addr-map ELF
section during AsmPrinter. This currently supports Function Entry Count,
Machine Block Frequencies. and Machine Branch Probabilities. Each is
independently enabled via the `feature` byte of `bb-addr-map` for the given
function.

A part of [RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis](https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902).
2024-01-03 19:17:44 -05:00
Nicolai Hähnle
49b492048a
AMDGPU: Fix packed 16-bit inline constants (#76522)
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and lead to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.

Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.

Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.

Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.

Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable to use of opsel on src2 as a future optimization.

A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.

Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
2024-01-04 00:10:15 +01:00
Ahmed Bougacha
155d5849da [AArch64] Avoid jump tables in swiftasync clobber-live-reg test. NFC.
The upstream test relies on jump-tables, which are lowered in
dramatically different ways with later arm64e/ptrauth patches.

Concretely, it's failing for at least two reasons:
- ptrauth removes x16/x17 from tcGPR64 to prevent indirect tail-calls
  from using either register as the callee, conflicting with their usage
  as scratch for the tail-call LR auth checking sequence.  In the
  1/2_available_regs_left tests, this causes the MI scheduler to move
  the load up across some of the inlineasm register clobbers.

- ptrauth adds an x16/x17-using pseudo for jump-table dispatch, which
  looks somewhat different from the regular jump-table dispatch codegen
  by itself, but also prevents compression currently.

They seem like sensible changes.  But they mean the tests aren't really
testing what they're intented to, because there's always an implicit
x16/x17 clobber when using jump-tables.

This updates the test in a way that should work identically regardless
of ptrauth support, with one exception, #1 above, which merely reorders
the load/inlineasm w.r.t. eachother.
I verified the tests still fail the live-reg assertions when
applicable.
2024-01-03 13:51:46 -08:00
Craig Topper
bdcd7c0ba0
[DAGCombiner][RISCV] Preserve disjoint flag in folding (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) (#76860)
Since we are shifting both inputs to the original Or by the same amount
and inserting zeros in the LSBs, the result should still be disjoint.
2024-01-03 13:14:13 -08:00
Craig Topper
f64d1c810a [RISCV] Add test cases for folding disjoint Or into a scalar load address. NFC
After 47a1704ac94c8aeb1aa7e0fc438ff99d36b632c6 we are able to
reassociate a disjoint Or used as a GEP index to get the constant
closer to a load to fold it. This is show by the first test.

We are not able to do this if the GEP created a shift left to scale
the index as the second test shows.

To make this work, we need to preserve the disjoint flag when pulling
the Or through the shift.
2024-01-03 12:17:57 -08:00
Craig Topper
47a1704ac9
[SelectionDAG][X86] Use disjoint flag in SelectionDAG::isADDLike. (#76847)
Keep the haveNoCommonBitsSet check because we haven't started inferring
the flag yet.

I've added tests for two transforms, but these are not the only
transforms that use isADDLike.
2024-01-03 11:54:29 -08:00
Arthur Eubanks
c4146121e9 Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit 0e46b49de43349f8cbb2a7d4c6badef6d16e31ae.

Causes crashes, see repro on 0e46b49de4.
2024-01-03 17:09:46 +00:00
Arthur Eubanks
ece1359857 Revert "[PowerPC] Add test after #75271 on PPC. NFC. (#75616)"
This reverts commit 5cfc7b3342ce4de0bbe182b38baa8a71fc83f8f8.

This depends on 0e46b49de43349f8cbb2a7d4c6badef6d16e31ae which is being reverted.
2024-01-03 17:09:45 +00:00
Mirko Brkušanin
82e33d6203
[AMDGPU] Add VDSDIR instructions for GFX12 (#75197) 2024-01-03 16:32:00 +01:00
Simon Pilgrim
1d27669e8a [X86] combineConcatVectorOps - fold 512-bit concat(blendi(x,y,c0),blendi(z,w,c1)) to AVX512BW mask select
Yet another yak shaving regression fix for #73509
2024-01-03 12:38:26 +00:00
Saiyedul Islam
df1b5ae31d
[AMDGPU][GlobalISel] Update tests to check for COV5 (#76257)
Update GlobalISel tests to assume ABI to be code object version 5.
2024-01-03 17:53:47 +05:30
Simon Pilgrim
39be138cb7 [X86] combineTargetShuffle - fold SHUF128(CONCAT(),CONCAT()) to peek through upper subvectors
If SHUF128 is accessing only the upper half of a vector source that is a concatenation/insert_subvector then try to access the subvector directly and adjust the element mask accordingly.
2024-01-03 11:09:14 +00:00
Simon Pilgrim
58a335a3f1 [X86] Fold concat_vectors(permq(x),permq(x)) -> permq(concat_vectors(x,x))
Handle a common subvector shuffle pattern in combineConcatVectorOps
2024-01-03 11:09:14 +00:00
David Green
771fd1ad2a
[DAG] Extend input types if needed in combineShiftToAVG. (#76791)
This atempts to fix #76734 which is a crash in invalid TRUNC nodes types
from unoptimized input code in combineShiftToAVG. The NVT can be VT if
the larger type was legal and the adds will not overflow, in which case
the inputs should be extended.

From what I can tell this appears to be valid (if not optimal for this
case): https://alive2.llvm.org/ce/z/fRieHR

The result has also been changed to getExtOrTrunc in case that VT==NVT,
which is not handled by SEXT/ZEXT.
2024-01-03 10:52:01 +00:00
Stanislav Mekhanoshin
3bcee8568a
[AMDGPU] Global ISel for llvm.prefetch (#76183) 2024-01-03 01:02:55 -08:00
David Green
d659bd1635
[GlobalISel][AArch64] Tail call libcalls. (#74929)
This tries to allow libcalls to be tail called, using a similar method
to DAG where the type is checked to make sure they match, and if so the
backend, through lowerCall checks that the tailcall is valid for all
arguments.
2024-01-03 07:59:36 +00:00
David Green
5b5614c92f
[AArch64][GlobalISel] Add legalization for vecreduce.fmul (#73309)
There are no native operations that we can use for floating point mul,
so lower by splitting the vector into chunks multiple times. There is
still a missing fold for fmul_indexed, that could help the gisel test
cases a bit.
2024-01-03 07:49:20 +00:00
brendaso1
923f6ac018
[FastISel][AArch64] Compare Instruction Miscompilation Fix (#75993)
When shl is folded in compare instruction, a miscompilation occurs when
the CMP instruction is also sign-extended. For the following IR:

  %op3 = shl i8 %op2, 3
  %tmp3 = icmp eq i8 %tmp2, %op3

It used to generate

   cmp w8, w9, sxtb #3

which means sign extend w9, shift left by 3, and then compare with the
value in w8. However, the original intention of the IR would require
`%op2` to first shift left before extending the operands in the
comparison operation . Moreover, if sign extension is used instead of
zero extension, the sample test would miscompile. This PR creates a fix
for the issue, more specifically to not fold the left shift into the CMP
instruction, and to create a zero-extended value rather than a
sign-extended value.
2024-01-03 13:49:05 +08:00
Craig Topper
4e347b4e38 Revert "[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension (#72340)"
This reverts most of commit 5b155aea0e529b7b5c807e189fef6ea5cd5faec9.
I have left the new test file, but regenerated the checks.

This causes failures in our downstream testing. The input types
to the extends need to be checked so we don't create RISCVISD::VZEXT_VL
with illegal or unsupported input type.
2024-01-02 19:49:42 -08:00
Kai Luo
8ae73fea3a [PowerPC] Precommit test for #72845. NFC. 2024-01-03 03:03:48 +00:00
Craig Topper
bf684a97f3
[RISCV] Don't emit vxrm writes for vnclip(u).wi with shift of 0. (#76578)
If there's no shift being performed, the rounding mode doesn't matter.

We could do the same for vssra and vssrl, but they are no-ops with a
shift of 0 so would be better off being removed earlier.
2024-01-02 09:50:06 -08:00
Thorsten Schütt
4b9194952d
[GlobalIsel] Combine selects with constants (#76089)
A first small step at combining selects.
2024-01-02 17:26:39 +01:00
Pierre van Houtryve
33565750e4
[AMDGPU] Fix moveToValu for copy to phys SGPRs (#76715)
Fixes #76031
2024-01-02 14:45:33 +01:00
Jay Foad
cf025c767e
[AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (#76149) 2024-01-02 13:02:20 +00:00
David Green
d714be978c
[AArch64] Check for exact size when finding 1's for CMLE. (#76452)
This is a fix for the second half of #75822, where smaller constants can
also be bitcast to larger types. We should be checking the size is what
we expect it to be when matching ones.
2024-01-02 09:24:08 +00:00
Jim Lin
9e1ad3cff6 [RISCV] Remove blank lines at the end of testcases. NFC. 2024-01-02 13:13:04 +08:00
David Green
d4a6995e94 [AArch64][GlobalISel] Legalize large G_SEXT_INREG
These come from the legalization of other operations, but it makes sense to
split the operations into legal sizes before lowering them.
2024-01-01 15:28:08 +00:00
Matt Arsenault
25cd249355 AMDGPU: Don't assert on select of v32i16/v32f16 2024-01-01 21:24:41 +07:00
Matt Arsenault
459270934b AMDGPU: Add more select bf16 vector tests 2024-01-01 21:24:41 +07:00
David Green
90c397fc56 [AArch64] Add icmp and fcmp tests for GlobalISel. NFC 2023-12-31 18:45:01 +00:00
Yingwei Zheng
1228becf7d
[FuncAttrs] Deduce noundef attributes for return values (#76553)
This patch deduces `noundef` attributes for return values.
IIUC, a function returns `noundef` values iff all of its return values
are guaranteed not to be `undef` or `poison`.
Definition of `noundef` from LangRef:
```
noundef
This attribute applies to parameters and return values. If the value representation contains any 
undefined or poison bits, the behavior is undefined. Note that this does not refer to padding 
introduced by the type’s storage representation.
```
Alive2: https://alive2.llvm.org/ce/z/g8Eis6

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|+0.01%|-0.01%|+0.01%|+0.03%|-0.04%|+0.01%|

The motivation of this patch is to reduce the number of `freeze` insts
and enable more optimizations.
2023-12-31 20:44:48 +08:00
Phoebe Wang
a384cd5012
[X86][BF16] Add subvec_zero_lowering patterns (#76507) 2023-12-31 11:14:41 +08:00
Mikhail Gudim
69bc371835
[RISCV][GlobalISel] Zbkb support for G_ROTL and G_ROTR (#76599)
These instructions are legal in the presence of Zbkb extension.
2023-12-30 00:45:18 -05:00
Min-Yih Hsu
4bd79ea3fe [M68k] Add pc-relative displacement (PCD) addressing mode for MOVSX
And disable offset folding altogether since we cannot always gain the
precise offset there to see if that fits into a certain size of
displacement.
2023-12-29 11:52:49 -08:00
Ivan Kosarev
b6daac023a
[AMDGPU][True16] Remove the VGPR_LO/HI16 register classes. (#76500) 2023-12-29 12:13:24 +00:00
yingopq
e13e95bc44
[Mips] Optimize (shift x (and y, BitWidth - 1)) to (shift x, y) (#73889)
Do optimization to turn x >> (shift & 31/63) into a single srlv instead
of andi + srlv, since the mips variable shift instruction already
implicitly masks the shift, like x86, wasm and AMDGPU. Copy the
X86DAGToDAGISel::isUnneededShiftMask() function to MIPS for checking
whether need combine two instructions to one.
2023-12-29 14:53:55 +05:30
Chia
87779fd823
[RISCV][ISel] Remove redundant min/max in saturating truncation (#75145)
This patch closed #73424, which is also a missed-optimization case
similar to #68466 on X86.

## Source Code
```
define void @trunc_sat_i8i16(ptr %x, ptr %y) {
  %1 = load <8 x i16>, ptr %x, align 16
  %2 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %1, <8 x i16> <i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128>)
  %3 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %2, <8 x i16> <i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127>)
  %4 = trunc <8 x i16> %3 to <8 x i8>
  store <8 x i8> %4, ptr %y, align 8
  ret void
}
```
## Before this patch: 
```
trunc_sat_i8i16:                  # @trunc_maxmin_id_i8i16
        vsetivli        zero, 8, e16, m1, ta, ma
        vle16.v v8, (a0)
        li      a0, -128
        vmax.vx v8, v8, a0
        li      a0, 127
        vmin.vx v8, v8, a0
        vsetvli zero, zero, e8, mf2, ta, ma
        vnsrl.wi        v8, v8, 0
        vse8.v  v8, (a1)
        ret
```

## After this patch: 
```
trunc_sat_i8i16:                  # @trunc_maxmin_id_i8i16
	vsetivli	zero, 8, e8, mf2, ta, ma
	vle16.v	v8, (a0)
	csrwi	vxrm, 0
	vnclip.wi	v8, v8, 0
	vse8.v	v8, (a1)
	ret
```
2023-12-29 16:15:47 +08:00
wanglei
da5378e87e [LoongArch] Fix incorrect pattern [X]VBITSELI_B instructions
Adjusted the operand order of [X]VBITSELI_B to correctly match vselect.
2023-12-29 14:44:29 +08:00
Chia
5b155aea0e
[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension (#72340)
This PR mainly aims at resolving the below missed-optimization case,
while it could also be considered as an extension of the previous patch
https://reviews.llvm.org/D133739?id=

## Missed-Optimization Case
Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh
### Source Code: 
```
define <vscale x 2 x i16> @multiple_users(ptr  %x, ptr  %y, ptr %z) {
  %a = load <vscale x 2 x i8>, ptr %x
  %b = load <vscale x 2 x i8>, ptr %y
  %b2 = load <vscale x 2 x i8>, ptr %z
  %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16>
  %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16>
  %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16>
  %e = mul <vscale x 2 x i16> %c, %d
  %f = add <vscale x 2 x i16> %c, %d2
  %g = sub <vscale x 2 x i16> %c, %d2
  %h = or <vscale x 2 x i16> %e, %f
  %i = or <vscale x 2 x i16> %h, %g
  ret <vscale x 2 x i16> %i
}
```
### Before This Patch
```
# %bb.0:
        vsetvli a3, zero, e16, mf2, ta, ma
        vle8.v  v8, (a0)
        vle8.v  v9, (a1)
        vle8.v  v10, (a2)
        svf2       v11, v8
        vsext.vf2       v8, v9
        vsext.vf2       v9, v10
        vmul.vv v8, v11, v8
        vadd.vv v10, v11, v9
        vsub.vv v9, v11, v9
        vor.vv  v8, v8, v10
        vor.vv  v8, v8, v9
        ret
```
###  After This Patch 
```
# %bb.0:
	vsetvli	a3, zero, e8, mf4, ta, ma
	vle8.v	v8, (a0)
	vle8.v	v9, (a1)
	vle8.v	v10, (a2)
	vwmul.vv	v11, v8, v9
	vwadd.vv	v9, v8, v10
	vwsub.vv	v12, v8, v10
	vsetvli	zero, zero, e16, mf2, ta, ma
	vor.vv	v8, v11, v9
	vor.vv	v8, v8, v12
	ret
```
We can see Add/Sub/Mul are combined with the Sign Extension.

## Relation to the Patch D133739
The patch D133739 introduced an optimization for folding `ADD_VL`/
`SUB_VL` / `MUL_V` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did
not consider the case of non-fixed length vector case, thus this PR
could also be considered as an extension for the D133739.

Furthermore, in the current `SelectionDAG`, we represent scalable vector
add (or any binary operator) as a normal `ADD` operation. It might be
better to use an Opcode like `ADD_VL`, which needs further conversation
and decision.
2023-12-29 14:36:38 +08:00
wanglei
c7367f985e [LoongArch] Fix incorrect pattern XVREPL128VEI_{W/D} instructions
Remove the incorrect patterns for `XVREPL128VEI_{W/D}` instructions,
and add correct patterns for XVREPLVE0_{W/D} instructions
2023-12-29 14:03:53 +08:00
wanglei
47c88bcd5d [LoongArch] Fix LASX vector_extract codegen
Custom lowering `ISD::EXTRACT_VECTOR_ELT` with lasx.
2023-12-29 13:48:53 +08:00
Qiu Chaofan
c97a7675ee
[PowerPC] Expand FSINCOS of fp128 (#76494) 2023-12-29 11:27:06 +08:00
Phoebe Wang
6c87f46795 [X86][NFC] Remove meaningless FIXME
Solved by #76485.
2023-12-29 10:45:42 +08:00
David Green
1c87d5c4fc [AArch64][GlobalISel] Lower fminnm/fmaxnm through Global ISel
Whilst this might technically not be correct if a combine treats signed zeroes
differently, where the neon operations are more defined than the minnum/maxnum
nodes. It mirrors what SDAG does, which allows us to lower aarch64_neon_fminnm
and aarch64_neon_fmaxnm through the existing selection patterns.
2023-12-28 20:02:30 +00:00
Mircea Trofin
dc1931a8c5 [mlgo] Fix post PR #76319
Some opcodes changed.
2023-12-28 08:28:51 -08:00
Craig Topper
2dccf11b92
[RISCV] Remove gp and tp from callee saved register lists. (#76483)
This appears to match gcc behavior.

Resolves
https://discourse.llvm.org/t/risc-v-calling-convention-implementation-in-clang-tp-and-gp-registers/75757
2023-12-27 21:36:03 -08:00
Phoebe Wang
e499ae53b3
[X86][BF16] Support INSERT_SUBVECTOR and CONCAT_VECTORS (#76485) 2023-12-28 13:29:01 +08:00
Wang Pengcheng
13cdee9047
[RISCV][MC] Add support for experimental Zcmop extension (#76395)
This implements experimental support for the Zcmop extension as
specified here:
https://github.com/riscv/riscv-isa-manual/blob/main/src/zimop.adoc.

This change adds only MC support.
2023-12-28 13:03:16 +08:00
Phoebe Wang
3081bacb60
[X86][BF16] Add X86SubVBroadcastld patterns (#76479) 2023-12-28 10:08:27 +08:00