48997 Commits

Author SHA1 Message Date
Craig Topper
a64b3e92c7 [RISCV] Re-define sha256, Zksed, and Zksh intrinsics to use i32 types.
Previously we returned i32 on RV32 and i64 on RV64. The instructions
only consume 32 bits and only produce 32 bits. For RV64, the result
is sign extended to 64 bits like *W instructions.

This patch removes this detail from the interface to improve
portability and consistency. This matches the proposal for scalar
intrinsics here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44

I've included IR autoupgrade support as well.

I'll be doing this for other builtins/intrinsics that currently use
'long' in other patches.

Reviewed By: VincentWu

Differential Revision: https://reviews.llvm.org/D154647
2023-07-17 08:58:29 -07:00
Craig Topper
fda45d9198 [RISCV] Add FP compare test to condops.ll to show a missed opportunity to remove an xori. NFC
This is a case that D155288 won't get.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D155327
2023-07-17 08:47:42 -07:00
Simon Pilgrim
e9caa37e9c [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits
Inspired by some of the cases from D145468

Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.

A future patch will propose the equivalent shl narrowing combine.

Differential Revision: https://reviews.llvm.org/D146121
2023-07-17 15:50:09 +01:00
Amaury Séchet
a23d6c760c [NFC] Add test case for D154533. 2023-07-17 14:19:15 +00:00
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Jay Foad
a2453c6130 [AMDGPU] Add test case for zext of f16 to i32
Preserve the test case from this abandoned review:
D51925 [AMDGPU] Fix issue for zext of f16 to i32
2023-07-17 12:55:29 +01:00
Simon Pilgrim
fd2de54920 [X86] Canonicalize vXi64 SIGN_EXTEND_INREG vXi1 to use v2Xi32 splatted shifts instead
If somehow a vXi64 bool sign_extend_inreg pattern has been lowered to vector shifts (without PSRAQ support), then try to canonicalize to vXi32 shifts to improve likelihood of value tracking being able to fold them away.

Using a PSLLQ and bitcasted PSRAD node make it very difficult for later fold to recover from this.
2023-07-17 10:18:03 +01:00
David Sherwood
cc68e05bd2 [SVE][CodeGen] Improve codegen for some zero-extends of masked loads
When doing a masked load of an illegal unpacked type and then
zero-extending to some illegal wider types we sometimes end up
with pointless 'and' instructions that are trying to zero bits
that we already know are zero. This patch fixes that by adding
more cases to performSVEAndCombine.

Differential Revision: https://reviews.llvm.org/D155281
2023-07-17 08:19:27 +00:00
Luke Lau
b5bcd4f60b [RISCV] Add VL nodes and VP patterns for unary zvbb instructions
This follows the pattern of lowering VP nodes to equivalent
RISCVISD::*_VL nodes. The nodes are modelled after the VP ISD nodes rather
than the actual zvbb instructions, and I've included a merge operand to be
consistent with the underlying pseudos that were recently refactored.

I've defined the nodes in RISCVInstrInfoVVLpatterns.td as the nodes aren't Zvk
specific, but the patterns are in RISCVInstrInfoZvk.td.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155229
2023-07-17 09:17:58 +01:00
David Sherwood
6a036316b3 [SVE][CodeGen] Add more test cases for zero-extends of masked loads
This patch adds test cases for extending masked loads of illegal
unpacked types into illegal wider types.

Pre-commits tests for D155281
2023-07-17 08:06:15 +00:00
Piyou Chen
7ce4e933ea [RISCV] Implement prefetch locality by NTLH
We add the MemOperand then backend will generate NTLH automatically.

```
__builtin_prefetch(ptr,  0 /* rw==read */, 0 /* locality */); => ntl.all + prefetch.r (ptr)
__builtin_prefetch(ptr,  0 /* rw==read */, 1 /* locality */); => ntl.pall + prefetch.r (ptr)
__builtin_prefetch(ptr,  0 /* rw==read */, 2 /* locality */); => ntl.p1 + prefetch.r (ptr)
__builtin_prefetch(ptr,  0 /* rw==read */, 3 /* locality */); => prefetch.r (ptr)
```

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154691
2023-07-16 20:32:46 -07:00
Jay Foad
a1a9c53ae7 [GlobalISel] Fix infinite loop in reassociation combine
Don't reassociate (C1+C2)+Y -> C1+(C2+Y).

Fixes https://github.com/llvm/llvm-project/issues/63849

Differential Revision: https://reviews.llvm.org/D155284
2023-07-16 14:15:24 +01:00
Jim Lin
348c67e254 [RISCV] Merge rv32/rv64 vector single-width shift intrinsic tests that have the same content. NFC. 2023-07-16 13:09:10 +08:00
Brad Smith
7973d51965 [Mips] Set setMaxAtomicSizeInBitsSupported
Set setMaxAtomicSizeInBitsSupported for Mips. Set the value as appropriate for 64-bit MIPS vs 32-bit.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D141189
2023-07-15 17:29:25 -04:00
Jon Chesterfield
6043d4dfec [amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in PromoteAlloca 2023-07-15 21:37:21 +01:00
Stephen Peckham
ac5d5351d4 Use empty symbol name for XCOFF text csect
When generating XCOFF, the compiler generates a csect with an internal
name.  Each function results in a label within the csect.  This patch
replaces the internal name ".text" with an empty string "".  This avoids
adding special code to handle a function text() in the source file, and
works better with some XCOFF tools that are confused when the csect and
the first function have the same address.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D154854
2023-07-15 16:13:48 -04:00
Nitin John Raj
6a35ceaacf [RISCV][GlobalISel] Legalize add, sub and binary logical instructions for narrow types
For rv32, we test the legalization of i8, i16 and i32. For rv64, we additionally test the legalization of i64.

This is the first of a series of commits aiming to legalize arithmetic instructions for RISCV.

Reviewed By: craig.topper, arsenm

Differential Revision: https://reviews.llvm.org/D154978
2023-07-14 18:22:53 -07:00
Matt Arsenault
ef4a2b6096 AMDGPU: Expand testing of AMDGPUCodeGenPrepare fdiv handling
- Switch to generated checks
- Use a different run line per denormal mode to reduce test duplication
- Add test coverage for rsqrt cases
- Add test coverage for repeated arcp denominator
- Fix the optnone test
2023-07-14 18:57:40 -04:00
Maurice Heumann
a1cdb323e2 [ARM] Adjust strd/ldrd codegen alignment requirements
In change https://reviews.llvm.org/D152790, it was discovered that the
alignment requirement calculation for LDRD/STRD codegen was suboptimal
and the calculation for volatile loads and stores was adjusted.

This change here adopts the calculation for the remaining non-volatile
occurances.

Recommitting after undefined behavior fix in D155093.

Differential Revision: https://reviews.llvm.org/D153800
2023-07-14 12:54:18 -07:00
Konstantina Mitropoulou
21ca892f69 [NFC][AMDGPU] Add automated tests in or.ll
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155265
2023-07-14 12:39:14 -07:00
Mikhail Gudim
c158ddd99e Reapply [RISCV] Fold binary op into select if profitable.
This fixes some bugs in the original commit:
  (1) Operands are passed in correct order when creating new constant
  and the binary operator. New tests were added to cover these cases.
  (2) Check was added to see if it is safe to commute the select and the binary operator.

Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D152147
2023-07-14 15:30:54 -04:00
Kamau Bridgeman
62c1cf7c63 [PowerPC][Future] Enable __builtin_mma_xxm[t|f]acc
Future cpu instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract
quad vectors from the new wide accumulator(wacc) register class.
The introduction of these new instructions renders the p10 instructions
xxmtacc and xxmfacc obsolete since the new wacc register class is a better
choice for handing quad vector operations. This patch ensures that, for
future cpu, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated
by custom lowering the intrinsics for xxm[t|f]acc to produce no instructions.

Reviewed By: amyk, lei

Differential Revision: https://reviews.llvm.org/D153034
2023-07-14 13:38:40 -05:00
Craig Topper
3a0a25f9b6 [RISCV] Support i32 clmul* intrinsics on RV64.
We can use an i64 clmul to emulate i32 clmul.
For clmulh and clmulr we need to zero extend the 32 bit input
to 64 bits then extract either bits [63:32] or [62:31].

Unfortunately, without Zba we need to use 2 shifts for the
zero extends. These can be optimized out later if the producing
instruction already zeroed the upper bits or if we can use lwu.

There are alternative sequences we can use for clmulh/clmulr
when the zero extend isn't free, but those are best handled by
a DAG combine to give the best opportunity for removing the extend.

This allows us to implement i32 clmul C intrinsics proposed in
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D154729
2023-07-14 11:20:03 -07:00
Simon Cook
4083ecfd7f [RISCV] Cleanups in CORE-V (xcv) extensions
This is a mostly NFC change cleaning up and clarifying components of the
in-tree CORE-V (xcv*) extensions following discussions on the remaining
extensions.

This makes the following changes to the xcbitmanip and xcvmac support:

1. Add missing extensions from RISCVISAInfo, such that they can be
   supported in clang's -march option.
2. Clarify the extension version number is 1.0.0 in documentation.
3. Clarify the extensions are by OpenHW Group, and the capitilization
   of the CORE-V extension family.
4. Add CORE-V to extension name in RISCVFeatures, both to be consistent
   with other vendors, and also better distinguish e.g. CORE-V bit
   manipulation vs RISC-V's standard Zb extensions.

Differential Revision: https://reviews.llvm.org/D155283
2023-07-14 18:21:08 +01:00
Serge Pavlov
be794e3d92 [X86][FPEnv] Lowering of {get,set,reset}_fpenv
The change implements lowering of `get_fpenv`, `set_fpenv` and
`reset_fpenv`.

Differential Revision: https://reviews.llvm.org/D81833
2023-07-14 22:10:53 +07:00
Serge Pavlov
25a81a1871 Precommit tests on lowering *_fpenv on X86 2023-07-14 22:10:12 +07:00
Alex Bradbury
95075d3d2c [RISCV][test] Add RV32I and RV64I RUN lines to condops.ll test
Some of these test cases will be changed by upcoming combines, even in
the non-zicond case.
2023-07-14 13:29:40 +01:00
Alex Bradbury
5c5a1a2927 [RISCV] Introduce RISCVISD::CZERO_{EQZ,NEZ} nodes produce them when zicond is present in lowerSELECT
This patch is a step towards altering how we handle the emission of
condops. Marking ISD::SELECT as legal is a major change in the codegen
path, and gives few options for maintaining the old codegen path when
it is believed to be better (e.g. a better branchless sequence is
possible using non-zicond instructions, or the branch-based sequence is
preferable).

This removes the existing SelectionDAG patterns and moves the logic into
lowerSELECT. Along some small codegen changes you'll note a few minor
regressions in the generated code quality - this are due to the fact
that by lowering the SELECT node early we miss out on combines that
would kick in later when setcc condcodes that aren't natively supported
have been expanded (thus exposing opportunities for optimisation by
performing logical negation and swapping truev/falsev). I've opted to
split out work that addresses these into follow-on patches (especially
as zicond is still 'experimental').

matchSetCC is a straight-forward translation from the version in
RISCVISelDAGToDAG. Ideally, in the future it can be converted to a
helper shared between both files.

Differential Revision: https://reviews.llvm.org/D155083
2023-07-14 11:31:27 +01:00
Simon Pilgrim
720debcf64 [X86] Fold PACKSS(NOT(X),NOT(Y)) -> NOT(PACKSS(X,Y)) 2023-07-14 10:36:21 +01:00
David Green
edf9f88566 [AArch64] Handle 64bit vector s/umull from extracts
This is similar to D153632, but for mul nodes instead of add/sub. They get
recognised in LowerMUL in order to detect the mul(ext, ext), in a way that will
work for i64 nodes as well as i16/i32. This extends it to look for
mul(subvector_extract(ext(x), 0), subvector_extract(ext(y), 0)), generating a
subvector_extract(mull(x,y)) if it matches.

Differential Revision: https://reviews.llvm.org/D154063
2023-07-14 10:25:12 +01:00
Yeting Kuo
2ac99205ee [RISCV] Narrow types of index operand matched pattern (shl (zext), C).
(shl (zext to iXLenVec), C) is a possible pattern in auto-vectorized code for
indexed loads/stores. But extending to iXLen might be too aggressive, RVV
indexed load/store instructions zero extend their indexed operand to XLEN.
The patch tries to narrow the type of the zero extension. It's benefit to
decrease register pressure.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154687
2023-07-14 15:45:44 +08:00
XinWang10
2d6a5ab5eb [X86]Recommit D154193 - Remove TEST in AND32ri+TEST16rr in peephole-opt
Previously we remove a pattern like:
  %reg = and32ri %in_reg, 5
  ...                         // EFLAGS not changed.
  %src_reg = subreg_to_reg 0, %reg, %subreg.sub_index
  test64rr %src_reg, %src_reg, implicit-def $eflags
We can remove test64rr since it has same functionality as and subreg_to_reg avoid the opt in previous code, so we handle this case specially.
And this case is also can be opted for the same reason, like:
  %reg = and32ri %in_reg, 5
  ...                         // EFLAGS not changed.
  %src_reg = copy %reg.sub_16bit:gr32
  test16rr %src_reg, %src_reg, implicit-def $eflags
The COPY from gr32 to gr16 prevent the opt in previous code too, just handle it specially as what we did for test64rr.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D154193
2023-07-14 03:42:42 -04:00
pvanhout
e5296c52e5 [AMDGPU] Relax restrictions on unbreakable PHI users in BreakLargePHis
The previous heuristic rejected a PHI if one of its user was an unbreakable PHI, no matter what the other users were.

This worked well in most cases, but there's one case in rocRAND where
it doesn't work. In that case, a PHI node has 2 PHI users where one is
breakable but not the other. When that PHI node isn't broken performance falls by 35%.

Relaxing the restriction to "require that  half of the PHI node users are breakable" fixes the issue, and seems like a sensible change.

Solves SWDEV-409648, SWDEV-398393

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D155184
2023-07-14 09:02:51 +02:00
Weining Lu
ef33d6cbfc [XRay] Add initial support for loongarch64
Only support patching FunctionEntry/FunctionExit/FunctionTailExit for now.

Reviewed By: MaskRay, xen0n
Co-Authored-By: zhanglimin <zhanglimin@loongson.cn>

Differential Revision: https://reviews.llvm.org/D140727
2023-07-14 09:27:13 +08:00
Sean Fertile
5e28d30f1f [XCOFF][AIX] Peephole optimization for toc-data.
Followup to D101178 - peephole optimization that converts a
load address instruction and a consuming load/store into just the
load/store when its safe to do so.

eg: converts the 2 instruction code sequence
  la 4, i[TD](2)
  stw 3, 0(4)
to
  stw 3, i[TD](2)

Differential Revision: https://reviews.llvm.org/D101470
2023-07-13 20:40:09 -04:00
Jon Chesterfield
d3316bc111 [amdgpu] Delete elide-module-lds attribute
Requires D155190

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155238
2023-07-14 00:36:33 +01:00
Jon Chesterfield
74e928a081 [amdgpu][lds] Remove recalculation of LDS frame from backend
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.

Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.

After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.

Deleted the now dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155190
2023-07-13 23:54:38 +01:00
Craig Topper
0aecddcee9 [RISCV] Add Zce extension.
According to the spec, Zce is an alias for Zca, Zcb, Zcmp, and Zcmt.
If F is enabled on RV32 it also includes Zcf.

This patch adds the Zce and the implication rule which unfortunately
requires custom handling for adding Zcf.

I've also made all the Zc* extensions imply Zca.

I've also added an error for Zcf without RV32.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D153742
2023-07-13 12:22:06 -07:00
Nemanja Ivanovic
329b8cd3e3 [PowerPC] Improve code gen for vector add
Improve codegen for vectors modulo additions.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D154447
2023-07-13 15:21:49 -04:00
Jeffrey Byrnes
6b7805fcb1 [AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms
This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering.

The kernels of interest have these specific phases:
NT: 1, 2a, 2c
NN: 1, 2a, 2b
TT: 1, 2b, 2c
TN: 1, 2b

The general approach taken was to have a long SchedGroup pipeline. In this way the scheduler will have less capability of doing the wrong thing. In order to resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver.

Differential Revision: https://reviews.llvm.org/D149773

Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96
2023-07-13 12:03:04 -07:00
Ivan Kosarev
289ae6525d [AMDGPU][MC] Fix handling of A16 operands in intersect_ray instructions.
The patch adds the support for 'noa16' operands in non-A16 variants of
the instructions, fixes validation of A16 operands and eliminates the
custom conversion to MCInst.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155057
2023-07-13 19:46:03 +01:00
Luke Lau
55e2772e9f [RISCV] Add initial SDNode patterns for unary zvbb instructions
This patch adds pseudos and SDNode patterns for vbrev.v, vrev8.v, vclz.v,
vctz.v and vcpop.v.
I've only added them for integer element types so far since we're lacking tests
for floats.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155216
2023-07-13 19:39:04 +01:00
Amara Emerson
432338a673 Don't assert on a non-pointer value being used for a "p" inline asm constraint.
GCC and existing codebases allow the use of integral values to be used
with this constraint. A recent change D133914 in this area started causing asserts.
Removing the assert is enough as the rest of the code works fine.

rdar://109675485

Differential Revision: https://reviews.llvm.org/D155023
2023-07-13 10:45:56 -07:00
Simon Pilgrim
c660a2f0ab [X86] Fold ANDNP(X,NOT(Y)) -> NOT(OR(X,Y))
Removing the x86-specific node helps further folding and improves commutativity
2023-07-13 16:56:20 +01:00
Mateja Marjanovic
fa46feb314 [AMDGPU] Use V_FMA_MIX* more often
Combine mul (f32) + fptrunc (f32->f16) to "v_fma_mixlo_f16 mulSrc1, mulSrc2, 0".

Differential Revision: https://reviews.llvm.org/D153544
Reviewers: arsenm, foad
2023-07-13 16:56:16 +02:00
Mateja Marjanovic
d3140f9363 Precommit for more usage of V_FMA/MAD_MIX*
Make fdiv.f16.ll autogenerated.
2023-07-13 16:26:21 +02:00
pvanhout
07c5920487 Reland "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64"
This time without the extra `->dump()`

A recent addition to the device libs, `__ockl_dm_trim`, caused a series of
failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function.

The quick fix for this is to support codegen for this rare case.
A proper long-term fix for this type of issue is still being discussed.

Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D155050
2023-07-13 15:58:48 +02:00
pvanhout
aec971adec Revert "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64"
This reverts commit cfa2d0a3aa0beb5422107dc9943cb0eae6d93896.
2023-07-13 15:52:27 +02:00
Mateja Marjanovic
701c4adcea Check for denormal flushing when selecting V_FMA/MAD_MIX* 2023-07-13 15:26:20 +02:00
Oliver Stannard
aea8db8eb9 Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI."
This reverts commit 58d1eaa3b6ce4f7285c51f83faff7a3ac374c746.
2023-07-13 14:25:39 +01:00