52796 Commits

Author SHA1 Message Date
Min-Yih Hsu
1e39575a98
[RISCV] CSE by swapping conditional branches (#71111)
DAGCombiner, as well as InstCombine, tend to canonicalize GE/LE into
GT/LT, namely:
```
X >= C --> X > (C - 1)
```
This sometimes generates off-by-one constants that could have been CSE'd
with surrounding constants.
Instead of changing such canonicalization, this patch tries to swap
those branch conditions post-isel, in the hope of resurfacing more
constant CSE opportunities. More specifically, it performs the following
optimization:

For two constants C0 and C1 from
```
li Y, C0
li Z, C1
```
To remove the redundant `li Y, C0`:
 1. if C1 = C0 + 1 we can turn: 
    (a) blt Y, X -> bge X, Z
    (b) bge Y, X -> blt X, Z
 2. if C1 = C0 - 1 we can turn: 
    (a) blt X, Y -> bge Z, X
    (b) bge X, Y -> blt Z, X

This optimization will be done by PeepholeOptimizer through
RISCVInstrInfo::optimizeCondBranch.
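
The four rewrites listed above can be sanity-checked by brute force. The Python sketch below is not part of the patch. It models `blt a, b` as `a < b` and `bge a, b` as `a >= b` on unbounded integers (an assumption that ignores register-width overflow), and the helper names are hypothetical.

```python
# Hypothetical helpers modelling the signed RISC-V branch conditions.
def blt(a, b):
    return a < b

def bge(a, b):
    return a >= b

def rewrites_hold(c0, xs):
    """Check all four rewrites for the constant c0 over the sample values xs."""
    for x in xs:
        y = c0
        z = c0 + 1  # case 1: C1 = C0 + 1
        if blt(y, x) != bge(x, z):  # 1(a)
            return False
        if bge(y, x) != blt(x, z):  # 1(b)
            return False
        z = c0 - 1  # case 2: C1 = C0 - 1
        if blt(x, y) != bge(z, x):  # 2(a)
            return False
        if bge(x, y) != blt(z, x):  # 2(b)
            return False
    return True

all_ok = all(rewrites_hold(c0, range(-20, 21)) for c0 in range(-10, 11))
print(all_ok)  # True
```

In each pair the rewritten branch tests the same condition using Z instead of Y, which is what lets the `li Y, C0` go away.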
2023-11-03 09:03:52 -07:00
Philip Reames
f6f769203d [tests] Autogenerate a couple of tests
As usual, making it easier for an upcoming test delta to be seen.

Note that several of these are examples of extremely bad testing practice.
Checking internal debug output (for no real purpose), and checking the
result of a fully O2 + llc run instead of reducing the specific problematic
pass.
2023-11-03 08:42:23 -07:00
Paul Walker
de88371d9d
[LLVM][AArch64] Add ASM constraints for reduced GPR register ranges. (#70970)
The patch adds the following ASM constraints:
  Uci => w8-w11/x8-x11
  Ucj => w12-w15/x12-x15
    
These constraints are required for SME load/store instructions
where a reduced set of GPRs is used to specify ZA array vectors.
    
NOTE: GCC has agreed to use the same constraint syntax.
2023-11-03 15:34:45 +00:00
Jessica Del
6e4692c9ee
[AMDGPU] - Add s_wqm intrinsics (#71048)
Add intrinsics to generate `s_wqm_b32` and `s_wqm_b64`.

Support VGPR arguments by inserting a `v_readfirstlane`.
2023-11-03 14:48:59 +01:00
Paul Walker
17970df6dc
[LLVM][SVE] Move ADDVL isel patterns under UseScalarIncVL feature flag. (#71173)
Also removes a duplicate pattern.
2023-11-03 13:23:02 +00:00
Nikita Popov
e4a4122eb6
[IR] Remove zext and sext constant expressions (#71040)
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.

There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-03 10:46:07 +01:00
Nikita Popov
060de415af Reapply [InstCombine] Simplify and/or of icmp eq with op replacement (#70335)
Relative to the first attempt, this contains two changes:

First, we only handle the case where one side simplifies to true or
false, instead of calling simplification recursively. The previous
approach would return poison if one operand simplified to poison
(under the equality assumption), which is incorrect.

Second, we do not fold llvm.is.constant in simplifyWithOpReplaced().
We may be assuming that a value is constant, if the equality holds,
but it may not actually be constant. This is nominally just a QoI
issue, but the std::list implementation in libstdc++ relies on the
precise behavior in a way that causes miscompiles.

-----

and/or in logical (select) form benefit from generic simplifications via
simplifyWithOpReplaced(). However, the corresponding fold for plain
and/or currently does not exist.

Similar to selects, there are two general cases for this fold
(illustrated with `and`, but there are `or` conjugates).

The basic case is something like `(a == b) & c`, where the replacement
of a with b or b with a inside c allows it to fold to true or false.
Then the whole operation will fold to either false or `a == b`.

The second case is something like `(a != b) & c`, where the replacement
inside c allows it to fold to false. In that case, the operand can be
replaced with c, because in the case where a == b (and thus the icmp is
false), c itself will already be false.
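
The two cases above can be illustrated with a small exhaustive check. This Python sketch is only an illustration over a small integer range, not the InstCombine implementation. The concrete choices of `c` (`a < b` and `a <= b`) are examples picked here, not taken from the patch.

```python
from itertools import product

VALUES = range(-4, 5)

def equivalent(lhs, rhs):
    """True if lhs and rhs agree for every (a, b) pair in the sample range."""
    return all(lhs(a, b) == rhs(a, b) for a, b in product(VALUES, repeat=2))

# Basic case, c folds to false after replacing a with b: (b < b) is false.
case_false = equivalent(lambda a, b: (a == b) and (a < b), lambda a, b: False)

# Basic case, c folds to true: (b <= b) is true, leaving just a == b.
case_true = equivalent(lambda a, b: (a == b) and (a <= b), lambda a, b: a == b)

# Second case: c folds to false under a == b, so (a != b) & c is just c.
case_ne = equivalent(lambda a, b: (a != b) and (a < b), lambda a, b: a < b)

print(case_false, case_true, case_ne)  # True True True
```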

As the test diffs show, this catches quite a lot of patterns in existing
test coverage. This also obsoletes quite a few existing special-case
and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst),
but I haven't removed anything as part of this patch in the interest of
risk mitigation.

Fixes #69050.
Fixes #69091.
2023-11-03 10:16:15 +01:00
Brandon Wu
74f38df1d1
[RISCV] Support Xsfvfnrclipxfqf extensions (#68297)
FP32-to-int8 Ranged Clip Instructions

https://sifive.cdn.prismic.io/sifive/0aacff47-f530-43dc-8446-5caa2260ece0_xsfvfnrclipxfqf-spec.pdf
2023-11-03 10:52:37 +08:00
Brandon Wu
945d2e6e60
[RISCV] Support Xsfvfwmaccqqq extensions (#68296)
Bfloat16 Matrix Multiply Accumulate Instruction

https://sifive.cdn.prismic.io/sifive/c391d53e-ffcf-4091-82f6-c37bf3e883ed_xsfvfwmaccqqq-spec.pdf
2023-11-03 10:08:26 +08:00
Michael Maitland
801a30aa8f
[CodeGen][MIR] Support parsing of scalable vectors in MIR (#70893)
This patch builds on the support for vectors by adding the ability to parse
scalable vectors in MIR and updating error messages to reflect that ability.
2023-11-02 21:49:18 -04:00
Brandon Wu
65dc96c2cf
[RISCV] Fix wrong implication for zvknhb. (#66860) 2023-11-03 09:32:21 +08:00
Zhaoxuan Jiang
1f54ef78d5
[AArch64] Only clear kill flags if necessary when merging str (#69680)
Previously the kill flags of the source register were unconditionally
cleared when a `str` pair was merged, which resulted in suboptimal
register allocation and inhibited some renaming opportunities that might
allow further `str` merging.
2023-11-02 17:03:21 -07:00
Kai Luo
7b5505b0d5
[PowerPC] Change registers used in test due to ABI breakage. NFC. (#70758)
Using `r30` and `r31` breaks the current traceback table convention
on AIX. Avoid using CSRs in the livein list.
2023-11-03 07:08:33 +08:00
Craig Topper
9769026858 [RISCV] Add (i32 (and GPR:, TrailingOnesMask:)) pattern for RV64 with legal i32. 2023-11-02 15:03:05 -07:00
Craig Topper
014390d937
[RISCV] Implement cross basic block VXRM write insertion. (#70382)
This adds a new pass to insert VXRM writes for vector instructions, with
the goal of avoiding redundant writes.

The pass runs two dataflow analyses. The first is a forward dataflow to
calculate where a VXRM value is available. The second is a backward
dataflow to determine where a VXRM value is anticipated.

Finally, we use the results of these two dataflows to insert VXRM writes
where a value is anticipated, but not available.

The pass does not split critical edges so we aren't always able to
eliminate all redundancy.

The pass will only insert vxrm writes on paths that always require it.
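
The "anticipated but not available" idea can be sketched on a toy CFG. The Python below is a simplified model, not the pass itself. It assumes each block needs at most one VXRM value for the whole block, that the entry block has no predecessors, and it uses a conservative placement rule chosen for this example. All names are hypothetical.

```python
UNINIT, UNKNOWN = "uninit", "unknown"  # lattice top and bottom; real values are ints

def meet(a, b):
    if a == UNINIT:
        return b
    if b == UNINIT:
        return a
    return a if a == b else UNKNOWN

def solve(blocks, edges_in, need, boundary):
    """Iterate to a fixpoint; a block's own need overrides the incoming value."""
    inc = {b: UNINIT for b in blocks}
    out = {b: UNINIT for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = UNINIT if edges_in[b] else boundary
            for p in edges_in[b]:
                new_in = meet(new_in, out[p])
            new_out = need[b] if need[b] is not None else new_in
            if (new_in, new_out) != (inc[b], out[b]):
                inc[b], out[b] = new_in, new_out
                changed = True
    return inc, out

def vxrm_writes(succs, need, entry):
    blocks = list(succs)
    preds = {b: [p for p in blocks if b in succs[p]] for b in blocks}
    avail_in, _ = solve(blocks, preds, need, UNKNOWN)  # forward: availability
    _, antic_in = solve(blocks, succs, need, UNKNOWN)  # backward: anticipation
    writes = {}
    for b in blocks:
        v = antic_in[b]
        if not isinstance(v, int) or avail_in[b] == v:
            continue  # nothing anticipated on every path, or already available
        # Write at block entry unless every predecessor already anticipates v.
        if b == entry or any(antic_in[p] != v for p in preds[b]):
            writes[b] = v
    return writes

diamond = {"entry": ["a", "b"], "a": ["m"], "b": ["m"], "m": []}
same = vxrm_writes(diamond, {"entry": None, "a": 1, "b": None, "m": 1}, "entry")
split = vxrm_writes(diamond, {"entry": None, "a": 1, "b": 2, "m": None}, "entry")
print(same, split)  # {'entry': 1} {'a': 1, 'b': 2}
```

In the first CFG every path needs value 1, so a single write at the entry suffices. In the second the two arms disagree, so nothing is anticipated at the entry and each arm gets its own write.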
2023-11-02 14:09:27 -07:00
qcolombet
839f1e40b1
[X86][SDAG] Improve the lowering of s|uitofp i8|i16 to half (#70834)
Prior to this patch, vector `s|uitofp` from narrow types (`<= i16`) were
scalarized when the hardware doesn't support fp16 conversions natively.
This patch fixes that by avoiding using `i16` as an intermediate type
when there is no hardware support for conversion from this type to half. In
other words, when the target doesn't support `avx512fp16`, we avoid
using intermediate `i16` vectors for `s|uitofp` conversions.

Instead, we extend the narrow type to `i32`, which is then converted to
`float` and downcast to `half`.
Put differently, we go from:
```
s|uitofp iNarrow %src to half
```
To
```
%tmp = s|zext iNarrow %src to i32
%tmpfp = s|uitofp i32 %tmp to float
fptrunc float %tmpfp to half
```
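
One way to see that the detour through `i32` and `float` loses nothing: every `i16` value is exactly representable in `float`, so the only rounding happens in the final truncation to `half`. The Python sketch below checks this for all signed `i16` inputs using the standard `struct` module. It models the arithmetic only, not the X86 lowering code, and the helper names are hypothetical.

```python
import struct

def to_f32(x):
    """Round a Python float to IEEE binary32 and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_f16(x):
    """Round a Python float to IEEE binary16 and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def via_i32_and_float(n):
    widened = int(n)                   # sext i16 -> i32 (value unchanged)
    as_float = to_f32(float(widened))  # sitofp i32 -> float
    return to_f16(as_float)            # fptrunc float -> half

def direct(n):
    return to_f16(float(n))            # sitofp i16 -> half in one step

inexact = [n for n in range(-32768, 32768) if to_f32(float(n)) != n]
mismatches = [n for n in range(-32768, 32768) if via_i32_and_float(n) != direct(n)]
print(len(inexact), len(mismatches))  # 0 0
```

Only signed `i16` inputs are covered here; the largest unsigned `i16` values exceed the finite range of `half` and would need separate handling in this model.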

Note that this patch:
- Doesn't change the actual lowering of i32 to half. I.e., the `float`
intermediate step and the final downcasting are what existed for this
input type to half.
- Changes only the intermediate type for the lowering of `s|uitofp`.
I.e., the first `s|zext` from i16 to i32.

Remark: The vector and scalar lowering of `s|uitofp` don't use the same
code path. Not super happy about that, but I'm not planning to fix that,
at least in this PR.

This fixes https://github.com/llvm/llvm-project/issues/67080
2023-11-02 21:25:36 +01:00
Amara Emerson
d62c6ad2b0 Fix more RISCV GISel tests using -march instead of -mtriple 2023-11-02 12:42:00 -07:00
Nico Weber
6acd1671e6 Revert "[AMDGPU] Generate wwm-reserved.ll (NFC)"
This reverts commit b3523d7e6d8834468cfcb66e629adbe17da90ea5.
Breaks tests on mac, see:
https://github.com/llvm/llvm-project/commit/b3523d7e6d88344#commitcomment-131547708
2023-11-02 14:55:41 -04:00
Simon Pilgrim
4c41e7ce20 [X86] Add the second test case mentioned on Issue #65895 2023-11-02 16:19:34 +00:00
Ramkumar Ramachandra
5e1d81ac68
LegalizeIntegerTypes: implement PromoteIntRes for xrint (#71055)
Recently, 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT;
custom RISCV lowering) introduced vector variants of llvm.lrint,
llvm.llrint, and bundled several tests along with the code change.
However, it forgot to test lrint and llrint on fixed vectors on RISC-V,
and it turns out that fixed-vectors-lrint.ll requires
PromoteIntRes_XRINT to be implemented. Implement it, and add tests for
fixed-vector lrint, llrint.
2023-11-02 15:53:56 +00:00
Paul Walker
c95253b1ba [LLVM][SVE] Clean VLS tests to not use wide vectors as function return types. 2023-11-02 12:41:37 +00:00
Jay Foad
b90cfe4601
[AMDGPU] New ttracedata intrinsics (#70235)
Add llvm.amdgcn.s.ttracedata and llvm.amdgcn.s.ttracedata.imm which map
directly to the corresponding instructions s_ttracedata and
s_ttracedata_imm. These are inherently whole-wave operations so any
non-uniform inputs are readfirstlaned.
2023-11-02 10:35:15 +00:00
Jay Foad
65bad23e43 [AMDGPU] Fix test for #70532 (Implement moveToVALU for S_CSELECT_B64) 2023-11-02 10:31:02 +00:00
Jay Foad
1590cac494
[AMDGPU] Implement moveToVALU for S_CSELECT_B64 (#70352)
moveToVALU previously only handled S_CSELECT_B64 in the trivial case
where it was semantically equivalent to a copy. Implement the general
case using V_CNDMASK_B64_PSEUDO and implement post-RA expansion of
V_CNDMASK_B64_PSEUDO with immediate as well as register operands.
2023-11-02 10:08:09 +00:00
Jessica Del
41cf94e6b8
[AMDGPU] - Add s_quadmask intrinsics (#70804)
Add intrinsics to generate `s_quadmask_b32`
and `s_quadmask_b64`.

Support VGPR arguments by inserting a `v_readfirstlane`.
2023-11-02 10:37:52 +01:00
Thomas Symalla
18839aec4e
[AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. (#68293)
During the SIOptimizeExecMasking pass, we try to form V_CMPX
instructions by detecting S_AND_SAVEEXEC and V_MOV instructions.
Generally, we require the input operand of the V_MOV, which is the input
operand to the to-be-formed V_CMPX, to be alive. This is forced by
clearing the kill flags on the operand after V_CMPX has been generated.

However, if we have a kill of a register set that contains said
register, this will not be detected by clearKillFlags.
With this change, possible additional kill-flag candidates will be
detected during the final call to findInstrBackwards and then, the kill
flag will be removed to keep all registers in the set alive.

Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>
2023-11-02 10:36:27 +01:00
Carl Ritson
b3523d7e6d [AMDGPU] Generate wwm-reserved.ll (NFC) 2023-11-02 17:50:42 +09:00
Carl Ritson
0eb516817d
[AMDGPU] Remove dom tree requirements from SIWholeQuadMode pass (#71012)
SIWholeQuadMode preserves dominator and post dominator trees, but does
not require them.
2023-11-02 17:16:19 +09:00
Matt Arsenault
5a9b99630b X86: Move ExpandLargeFpConvert tests to test/Transforms 2023-11-02 15:50:31 +09:00
Matt Arsenault
d636d73f94 Revert "Move ExpandLargeDivRem to llvm/test/CodeGen/X86 because they need a triple"
This reverts commit 6bf1b4e8e0776e6f27013434d8b632016ccc795c.

Requiring a triple does not require moving these to a codegen test directory.
Move these to an x86 specific subdirectory of a transforms test.
2023-11-02 14:38:12 +09:00
Tobias Stadler
ba0763e4cb [GlobalISel][M68k] Update test after 373c343
Missed test case in experimental target, which was not covered by pre-merge checks.
2023-11-02 03:32:47 +01:00
Tobias Stadler
373c343a77 Reland: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND
Reland 3686a0b after fixing an exposed miscompile in #68840

Differential Revision: https://reviews.llvm.org/D159140
2023-11-02 00:18:19 +01:00
Valery Pykhtin
e808f8a616
[AMDGPU] GCNRegPressurePrinter pass to print GCNRegPressure values for testing. (#70031)
Using GCNDownwardRPTracker or GCNUpwardRPTracker, the pass collects register pressure values for a function and prints these values next to instructions. The output can be used to generate FileCheck rules in MIR tests.
2023-11-01 23:01:39 +01:00
Craig Topper
cfb791aa4b [RISCV] Add RV64 i32 patterns for bseti/bclri/binvi.
Needed for -riscv-experimental-rv64-legal-i32 and probably GISel.
2023-11-01 13:30:47 -07:00
Jay Foad
86f2e09250
[AMDGPU] Tweak handling of GlobalAddress operands in SI_PC_ADD_REL_OFFSET (#70960)
When SI_PC_ADD_REL_OFFSET is expanded to S_GETPC/S_ADD/S_ADDC, the
GlobalAddress operands have to be adjusted by 4 or 12 bytes to account
for the offset from the end of the S_GETPC instruction to the literal
operands. Do this all in SIInstrInfo::expandPostRAPseudo instead of
duplicating the adjustment code in both AMDGPULegalizerInfo and
SITargetLowering. NFCI.
2023-11-01 19:48:30 +00:00
Craig Topper
c4649d05cf [RISCV] Teach RISCVOptWInstrs that 'bset x0, 30-0' satisfies isSignExtendingOpW.
Constant materialization can use bset x0, 11 to create 2048.
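
A quick check of why the bit index stops at 30: setting a single bit in zero gives a value that equals its own 32-bit sign extension only when the bit is below bit 31. This Python sketch models the arithmetic only, and the helper names are hypothetical.

```python
def bset_x0(n):
    """Value produced by setting bit n in the zero register x0."""
    return 1 << n

def sext32(value):
    """Sign-extend the low 32 bits of value."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low

sign_extended = [n for n in range(64) if sext32(bset_x0(n)) == bset_x0(n)]
print(bset_x0(11), sign_extended == list(range(31)))  # 2048 True
```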
2023-11-01 12:29:37 -07:00
Fangrui Song
a62b86a3e6
[AArch64,ELF] Restrict MOVZ/MOVK to non-PIC large code model (#70178)
There is no PIC support for -mcmodel=large

(https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst)
and Clang recently rejects -mcmodel= with PIC (#70262).

The current backend code assumes that the large code model is non-PIC.
This patch adds `!getTargetMachine().isPositionIndependent()` conditions
to clarify that the support is non-PIC only. In addition, add some tests
as change detectors in case PIC large code model is supported in the
future.

If other front-ends/JITs use the large code model with PIC, they will
get the small code model code sequence instead of a potentially incorrect
MOVZ/MOVK sequence, which is only suitable for non-PIC. The MOVZ/MOVK
sequence would cause text relocations with ELF linkers.

(The small code model code sequence is usually sufficient as ADRP+ADD or
ADRP+LDR targets [-2**32,2**32), which is double the range of x86-64
R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 [-2**31,2**31).)
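
The "doubled range" remark can be reproduced with a little arithmetic. ADRP takes a signed 21-bit immediate counted in 4 KiB pages, while the x86-64 relocations named above hold a signed 32-bit byte offset. The Python sketch below just evaluates those bounds, and the helper name is hypothetical.

```python
def signed_range(bits):
    """Half-open range [lo, hi) of a signed two's-complement field."""
    return -(1 << (bits - 1)), 1 << (bits - 1)

PAGE = 4096  # ADRP addresses 4 KiB pages
adrp_lo, adrp_hi = (bound * PAGE for bound in signed_range(21))
pc32_lo, pc32_hi = signed_range(32)

print(adrp_hi == 2**32, pc32_hi == 2**31)           # True True
print(adrp_hi - adrp_lo == 2 * (pc32_hi - pc32_lo))  # True
```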
2023-11-01 12:10:44 -07:00
Craig Topper
5570d3250f [RISCV] Don't promote i32 and/or/xor with -riscv-experimental-rv64-legal-i32.
Some test improvements, but also some regressions that need to be
fixed.
2023-11-01 11:36:46 -07:00
Craig Topper
8912200966
[RISCV] Add experimental support for making i32 a legal type on RV64 in SelectionDAG. (#70357)
This will select i32 operations directly to W instructions without
custom nodes. Hopefully this can allow us to be less dependent on
hasAllNBitUsers to recover i32 operations in RISCVISelDAGToDAG.cpp.

This support is enabled with a command line option that is off by
default.

Generated code is still not optimal.

I've duplicated many test cases for this, but it's not complete. Enabling this runs all existing lit tests without crashing.
2023-11-01 09:36:41 -07:00
Sander de Smalen
7dc20abed0 [AArch64] Fix spillfill-sve.mir with expensive checks.
This fixes an issue introduced by PR #70679.

Using constrainRegClass() is not strong enough to actually force
the use of a register to be a PPR register class. It will need an
actual COPY to do the conversion.

The downside is that this introduces an extra register, which is an
issue we may want to fix at a later point using a custom copy operation
where the register allocator uses the same register when it can.
2023-11-01 16:29:44 +00:00
Shengchen Kan
860f9e5170
[NFC][X86] Reorder the registers to reduce unnecessary iterations (#70222)
* Introduce field `PositionOrder` for class `Register` and
`RegisterTuples`
* If register A's `PositionOrder` < register B's `PositionOrder`, then A
is placed before B in the enum in X86GenRegisterInfo.inc
* The new order of registers in the enum for X86 will be
      1. Registers before AVX512,
      2. AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
      3. AMX registers (TMM)
      4. APX registers (R16-R31)
* Add a new target hook `getNumSupportedRegs()` to return the number of
registers for the function (may overestimate).
* Replace `getNumRegs()` with `getNumSupportedRegs()` in LiveVariables
to eliminate iterations on unsupported registers

This patch can reduce the 0.3% instruction count regression for sqlite3
at compile time (O3) by not iterating over APX registers,
for #67702
2023-11-02 00:12:05 +08:00
Nikita Popov
57384aeb37 [ConstantFold] Avoid creating undesirable cast expressions
Similar to what we do for binops, for undesirable casts we should
call the constant folding API instead of the constant expr API,
to avoid indirect creation of undesirable cast ops.
2023-11-01 16:50:52 +01:00
Nikita Popov
7a5c14cb27 [X86] Regenerate test checks (NFC) 2023-11-01 16:37:30 +01:00
Sander de Smalen
2efea512c2
[AArch64] Fix spilling/filling of virtual registers in PNR regclass. (#70679)
We made the assumption that the registers were always physical
registers, which doesn't have to be true.
2023-11-01 10:57:12 +00:00
Simon Pilgrim
f471f6ff2f [X86] combineTruncateWithSat - relax minimum truncation size for PACKSS/PACKUS
truncateVectorWithPACK handling of sub-128-bit result types was improved some time ago, so remove the old 64-bit limit

Fixes #68466
2023-11-01 10:33:35 +00:00
Simon Pilgrim
432e11478a [X86] fpclamptosat_vec.ll - add AVX2/AVX512 test coverage 2023-11-01 10:04:28 +00:00
Simon Pilgrim
dc5e6e4c07 [X86] Add fpclamptosat to vXi8 test coverage
Adds additional test coverage for Issue #68466
2023-11-01 10:04:28 +00:00
Craig Topper
2862d17b30 [RISCV][GISel] Add test case for FP load/store legalization. NFC 2023-10-31 23:49:56 -07:00
Qiu Chaofan
b46e768455
[DAGCombine] Fold setcc_eq infinity into is.fpclass (#67829) 2023-11-01 11:51:15 +09:00
Min-Yih Hsu
87f671756d
[RISCV] Use FLI + FNEG to materialize some negative FP constants (#70825)
Most of the FP constants supported by FLI are positive. For negative FP
constants X whose positive value is supported by FLI, we can use `(FNEG
(FLI -X))` to materialize X.
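
The selection rule can be sketched as a lookup. The Python below is a toy model, not the RISC-V backend code. `FLI_SUBSET` is an illustrative subset of the loadable constants (an assumption, not the full table), and the returned strings are pseudo-mnemonics rather than real assembly syntax.

```python
# Illustrative subset of FLI-loadable constants (assumption: not the full table).
FLI_SUBSET = {-1.0, 0.25, 0.5, 1.0, 2.0}

def materialize(x):
    """Return a pseudo-instruction list for x, or None if neither form applies."""
    if x in FLI_SUBSET:
        return ["fli " + repr(x)]
    if x < 0 and -x in FLI_SUBSET:
        # Negative constant whose positive counterpart is loadable: FLI then FNEG.
        return ["fli " + repr(-x), "fneg"]
    return None

print(materialize(-0.5))  # ['fli 0.5', 'fneg']
```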
2023-10-31 17:52:50 -07:00