55141 Commits

Author SHA1 Message Date
Craig Topper
644899addd [RISCV][GISel] Port portions of float-intrinsics.ll and double-intrinsics.ll. NFC
Remove the legalizer test for the same intrinsics as it is no longer
interesting with end to end tests.
2024-09-18 13:13:34 -07:00
Craig Topper
d5d1417659
[RISCV][GISel] Use libcalls for rint, nearbyint, trunc, round, and roundeven intrinsics. (#108779) 2024-09-18 12:07:44 -07:00
Craig Topper
8e4909aa19
[RISCV] Remove unnecessary vand.vi from vXi1 and nvXvi1 VECTOR_REVERSE codegen. (#109071)
Use a setne with 0 instead of a trunc. We know we zero extended the node
so we can get by with a non-zero check only. The truncate lowering
doesn't know that we zero extended so has to mask the lsb.

I don't think DAG combine sees the trunc before we lower it to RISCVISD
nodes so we don't get a chance to use computeKnownBits to remove the
AND.
2024-09-18 09:43:48 -07:00
Sarah Spall
67518a44fe
[HLSL] Implement elementwise popcount (#108121)
Add new elementwise popcount builtin to support HLSL function
'countbits'.
elementwise popcount only accepts integer types.
Add hlsl intrinsic 'countbits'
Closes #99094
2024-09-18 08:19:52 -07:00
Phoebe Wang
a10c9f994b
Revert "[X86][BF16] Add libcall for F80 -> BF16" (#109140)
Reverts llvm/llvm-project#109116
2024-09-18 21:35:38 +08:00
Phoebe Wang
76eda76f9f
[X86][BF16] Add libcall for F80 -> BF16 (#109116)
This fixes #108936, but the calling convention doesn't match with GCC. I
doubt we have such a lib function for now, so leave the calling
convention as is.
2024-09-18 21:23:10 +08:00
Mahesh-Attarde
311e4e3245
[X86][AVX10.2] Support AVX10.2 MOVZXC new Instructions. (#108537)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

Chapter 14 INTEL® AVX10 ZERO-EXTENDING PARTIAL VECTOR COPY INSTRUCTIONS

---------

Co-authored-by: mattarde <mattarde@intel.com>
2024-09-18 21:01:51 +08:00
Simon Pilgrim
4b529f840c [X86] Fold extractsubvector(permv3(src0,mask,src1),c) -> extractsubvector(permv3(src0,widensubvector(extractsubvector(mask,c)),src1),0) iff c != 0
For cross-lane shuffles, extract the mask operand (uppper) subvector directly, and make use of the free implicit extraction of the lowest subvector of the result.
2024-09-18 12:06:08 +01:00
Piotr Sobczak
adf02ae41f
[AMDGPU] Simplify lowerBUILD_VECTOR (#109094)
Simplify `lowerBUILD_VECTOR` by commoning up the way the vectors
are split.
Also reorder the checks to avoid a long condition inside `if`.
2024-09-18 12:58:16 +02:00
Mahesh-Attarde
f5ad9e1ca5
[X86][AVX10.2] Support AVX10.2-COMEF new instructions. (#108063)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

Chapter 8  AVX10 COMPARE SCALAR FP WITH ENHANCED EFLAGS INSTRUCTIONS

---------

Co-authored-by: mattarde <mattarde@intel.com>
2024-09-18 17:55:29 +08:00
Luke Lau
edac1b2d63
[RISCV] Promote bf16 ops to f32 with zvfbfmin (#108937)
For f16 with zvfhmin, we promote most ops and VP ops to f32. This does
the same for bf16 with zvfbfmin, so the two fp types should now be in
sync.

There are a few places in the custom lowering where we need to check for
a LMUL 8 f16/bf16 vector that can't be promoted and must be split, this
extracts that out into isPromotedOpNeedingSplit.

In a follow up NFC we can deduplicate the code that sets up the
promotions.
2024-09-18 17:39:40 +08:00
Luke Lau
8d7d4c25cb
[RISCV] Split fp rounding ops with zvfhmin nxv32f16 (#108765)
This adds zvfhmin test coverage for fceil, ffloor, fnearbyint, frint,
fround and froundeven and splits them at nxv32f16 to avoid crashing,
similarly to what we do for other nodes that we promote.

This also sets ftrunc to promote which was previously missing. We
already promote the VP version of it, vp_froundtozero.
Marking it as promoted affects some of the cost model tests since
they're no longer expanded.
2024-09-18 16:36:13 +08:00
Aditi Medhane
5a8d2dd1f9
[AMDGPU] Handle subregisters properly in generic operand legalizer (#108496)
Fix for the issue found during COPY introduction during legalization of
PHI operands for sgpr to vgpr copy when subreg is involved.
2024-09-18 13:14:49 +05:30
Ahmed S. Taei
22a2d74c0c
[NVPTX] Emit ld.v4.b16 for loading <4 x bfloat> (#109069)
This PR enables emitting a single load instruction for <4 x bfloat>,
otherwise, 2 ld.b32 loads are generated.
2024-09-17 21:06:46 -07:00
Heejin Ahn
08bba6503b
[WebAssembly] Support binary generation for new EH (#109027)
This adds support for binary generation for the new EH proposal.

So far the only case that we emitted variable immediate operands in
binary has been `br_table`'s destinations. (Other `variable_ops` uses in
TableGen files are register operands, such as the operands of `call`, so
they don't get emitted in binary as a part of the same instruction.)

With this PR, variable immediate operands can include `try_table`'s
operands:
- The number of of catch clauses
- catch clauses sub-opcodes
  - `catch`: 0x00
  - `catch_ref`: 0x01
  - `catch_all`: 0x02
  - `catch_all_ref`: 0x03
- catch clauses' destinations

With `try_table`, we now have variable expr operands for `try_table`'s
catch clauses' tags. We treat their fixups in the same way we do for
tags in other instructions such as in `throw`.

Diff without whitespace will be easier to view.
2024-09-17 14:58:19 -07:00
Michael Maitland
1ebe16bf43 [RISCV] Add VL optimization related tests
These tests are good candidate for VL optimization. This is a pre-commit for
PR #108640, but can could probably also be improved by the peephole VL
optimizations.
2024-09-17 11:14:20 -07:00
Simon Pilgrim
6153582c9c [X86] combineX86ShuffleChainWithExtract - peek through insert_subvector(undef,vec,0) widening patterns when tracking subvector sources
Helps replace a number of X86ISD::VPERMV3 nodes that are shuffling subvectors from the same source with X86ISD::VPERMV equivalents.
2024-09-17 18:43:32 +01:00
Stephen Tozer
51a29b5f16 Revert2 "[DebugInfo][DWARF] Set is_stmt on first non-line-0 instruction in BB (#105524)"
Reverted due to large .debug_line size regressions for some
configurations; work currently in place to improve the output of this
behaviour in PR #108251.

This patch also modifies two tests that were created or modified after
the original commit landed and are affected by the revert:

  llvm/test/CodeGen/X86/pseudo_cmov_lower2.ll
  llvm/test/DebugInfo/X86/empty-line-info.ll

This reverts commit 5fef40c2c477e92187bd4e5c18091eca6b8465cc.
2024-09-17 18:29:20 +01:00
Michael Maitland
64972834c1
[RISCV][GISEL] Introduce the RISCVPostLegalizerLowering pass (#108991)
This is mostly a copy of the AArch64PostLegalizerLoweringPass, except it
removes all of the AArch64 combines.

This pass allows us to lower instructions after the generic
post-legalization combiner has had a chance to run.

We will be adding combines to this pass in future patches.
2024-09-17 13:18:35 -04:00
Craig Topper
da46244e49 Revert "[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific."
This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a.

Regression was reported in Halide for arm32.
2024-09-17 09:04:43 -07:00
Philip Reames
594579b7af [RISCV] Autogenerate compress-opt-select.ll
I realized after spending way too much time looking at this, that
we can avoid objdump entirely here by having the assembly simply
not print the aliases.  Once we do that, we can simply autogen
this test, and updates become trivial and understandable.
2024-09-17 08:47:20 -07:00
Farzon Lotfi
0f97b4824a
[Scalarizer][DirectX] Add support for scalarization of Target intrinsics (#108776)
Since we are using the Scalarizer pass in the backend we needed a way to
allow this pass to operate on Target intrinsics.
We achieved this by adding `TargetTransformInfo ` to the Scalarizer
pass. This allowed us to call a function available to the DirectX
backend to know if an intrinsic is a target intrinsic that should be
scalarized.
2024-09-17 11:35:42 -04:00
Philip Reames
c532e6db27 [RISCV] Restructure compress-opt-select.ll
Two major changes:
- Remove use of sed preprocessing - this was being used to create two
  versions of each test, and the result is much more readable if we
  just duplicate the tests.
- Use a regex for matching the condition.  An upcoming change causes
  us to reverse the branch direction (which doesn't matter to the
  purpose of these tests at all), so using the regex makes the test
  more stable.
2024-09-17 08:29:03 -07:00
Philip Reames
b153cc5c2b [RISCV] Fix boundary error in compress-opt-select.ll
Per the comment, this test is intending to test the first constant which
can't be encoded via a c.addi.  However, -32 *can* be encoded as in a
c.addi, and all that's preventing it from doing so is the register
allocators choice to use a difference destination register on the
add than it's source.  (Which compressed doesn't support.)

The current LLC codegen for this test looks like:

	addi	a1, a0, -32
	li	a0, -99
	bnez	a1, .LBB0_2
	li	a0, 42
.LBB0_2:
	ret

After https://github.com/llvm/llvm-project/pull/108889, we sink the LI, and
the register allocator picks the same source and dest register for the addi
resulting in the c.addi form being emitted.  So, to avoid a confusing diff
let's fix the test to check what was originally intended.
2024-09-17 07:43:28 -07:00
David Green
2242cd2b6a
[DAG] Fold vecreduce.or(sext(x)) to sext(vecreduce.or(x)) (#108959)
The same is true for and / xor reductions, where the sext / zext can be
sank down through the bitwise operation.
https://alive2.llvm.org/ce/z/TvzCd5
2024-09-17 15:24:00 +01:00
Mikhail R. Gadelha
d2125e1db6
[RISCV] Support STRICT_UINT_TO_FP and STRICT_SINT_TO_FP (#102503)
This patch adds support for the missing STRICT_UINT_TO_FP and
STRICT_SINT_TO_FP for riscv and adds a test case for rv32 which was
previously crashing.

The code is in line with how other strict_* nodes are handled
(e.g., getting op(1) instead of op(0) when it's a strict node, as op(0)
in a strict node is the entry token).
2024-09-17 11:21:52 -03:00
Michael Maitland
ee2add0683
[GISEL] Fix bugs and clarify spec of G_EXTRACT_SUBVECTOR (#108848)
The implementation was missing the fact that `G_EXTRACT_SUBVECTOR`
destination and source vector can be different types.

Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate
the correct opcode.

Clarify the G_EXTRACT_SUBVECTOR specification.
2024-09-17 10:08:39 -04:00
Simon Pilgrim
b222ec1865 [X86] vector-reduce-add-mask.ll - regenerate vpmulhuw asm comments. NFC 2024-09-17 11:40:38 +01:00
Simon Pilgrim
742e04de96 [X86] combineConcatVectorOps - handle *_EXTEND nodes 2024-09-17 11:40:37 +01:00
David Green
8411214c56 [AArch64] Tests for vecreduce.or(sext(x)), with or/and/xor and sext/zext. NFC 2024-09-17 11:27:05 +01:00
Csanád Hajdú
72901fe19e
[AArch64] Fold UBFMXri to UBFMWri when it's an LSR or LSL alias (#106968)
Using the LSR or LSL aliases of UBFM can be faster on some CPUs, so it
is worth changing 64 bit UBFM instructions, that are equivalent to 32
bit LSR/LSL operations, to 32 bit variants.

This change folds the following patterns:
* If `Imms == 31` and `Immr <= Imms`:
   `UBFMXri %0, Immr, Imms`  ->  `UBFMWri %0.sub_32, Immr, Imms`
* If `Immr == Imms + 33`:
   `UBFMXri %0, Immr, Imms`  ->  `UBFMWri %0.sub_32, Immr - 32, Imms`
2024-09-17 11:21:23 +01:00
Benjamin Kramer
3e32e45591 [NVPTX] Verify ptx in the right version 2024-09-17 10:40:21 +02:00
SpencerAbson
79d380f2ca
[AArch64][SVE2] Add codegen patterns for SVE2 FAMINMAX (#107284)
Tablegen patterns were previously added to lower the following sequences
from generic IR to NEON FAMIN/FAMAX instructions

- `fminimum((abs(a), abs(b)) -> famin(a, b)`
- `fmaximum((abs(a)), abs(b)) -> famax(a, b)`
- https://github.com/llvm/llvm-project/pull/103027
- `fminnum[nnan](abs(a), abs(b)) -> famin(a, b)`
- `fmaxnum[nnan](abs(a), abs(b)) -> famax(a, b)`
- https://github.com/llvm/llvm-project/pull/104766 

The same idea has been applied for the scalable vector variants of
[FAMIN](https://developer.arm.com/documentation/ddi0602/2024-06/SVE-Instructions/FAMIN--Floating-point-absolute-minimum--predicated--)/[FAMAX](https://developer.arm.com/documentation/ddi0602/2024-06/SVE-Instructions/FAMAX--Floating-point-absolute-maximum--predicated--).
('nnan' documenatation:
https://llvm.org/docs/LangRef.html#fast-math-flags).

- Changes to LLVM
	- lib/target/AArch64/AArch64SVEInstrInfo.td
- Add 'AArch64fminnm_p_nnan' and 'AArch64fmaxnm_p_nnan' patfrags
(patterns predicated on the 'nnan' flag).
		- Add 'AArch64famax_p' and 'AArch64famin_p'
	- test/CodeGen/AArch64/aarch64-sve2-faminmax.ll
- Add tests to verify the new patterns, including both positive and
negative tests for 'nnan' predicated behavior.
2024-09-17 09:12:27 +01:00
Simon Pilgrim
c91f2a259f
[X86] Consistently use 'k' for predicate mask registers in instruction names (#108780)
We use 'k' for move instructions and to indicate masked variants of evex instructions, but otherwise we're very inconsistent when we use 'k' vs 'r'.
2024-09-17 08:57:57 +01:00
Thorsten Schütt
acfa294b5e
[GlobalIsel] Canonicalize G_FCMP (#108891)
As a side-effect, we start constant folding fcmps.
2024-09-17 09:42:04 +02:00
Luke Lau
6af2f225a0
[RISCV] Restrict combineOp_VLToVWOp_VL w/ bf16 to vfwmadd_vl with zvfbfwma (#108798)
We currently make sure to check that if folding an op to an f16 widening
op that we have zvfh. We need to do the same for bf16 vectors, but with
the further restriction that we can only combine vfmadd_vl to vfwmadd_vl
(to get vfwmaccbf16.v{v,f}).

The added test case currently crashes because we try to fold an add to a
bf16 widening add, which doesn't exist in zvfbfmin or zvfbfwma

This moves the checks into the extension support checks to keep it one
place.
2024-09-17 13:35:25 +08:00
Craig Topper
884ff9e3f9 [LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific.
Only scalarize single element vectors when vector FSUB is not
supported and scalar FNEG is supported.
2024-09-16 21:48:42 -07:00
bwlodarcz
f99bb02d7d
[SPIR-V] Emit DebugTypeBasic for NonSemantic DI (#106980)
The commit introduces support for fundamental DI instruction. Metadata
handlers required for this instruction is stored inside debug records
(https://llvm.org/docs/SourceLevelDebugging.html) parts of the module
which rises the necessity of it's traversal.
2024-09-16 18:26:22 -07:00
Craig Topper
f022111e64 [RISCV][GISel] Restore s32 support on RV64 for DIV, and REM.
This reverts commit 2599d695128381e6932b43f0e95649c533308d6d.

I was was plannig to remove s32 as a legal type on RV64, but
I'm rethinking that.
2024-09-16 16:29:20 -07:00
Craig Topper
a366323c8d [RISCV][GISel] Restore s32 support for G_ABS on RV64.
This reverts commit 5e6a1987a5d4574d3c3811f878ddbbbf7c35fa01.

I was was plannig to remove s32 as a legal type on RV64, but
I'm rethinking that.
2024-09-16 16:29:20 -07:00
Philip Reames
7e56a09278 [RISCV] Add a testcase for an unprofitable machine-sink issue
This corresponds to an upcoming change which will fully explain
why this is a machine-sink issue.
2024-09-16 14:17:42 -07:00
David Majnemer
49c5cebb29 [X86] Improve support for vXi8 arithmetic shifts, logical left shifts
Use SWAR techniques for arithmetic shifts: we use the same technique as
logical right shift but with an additional step of sign extending the
result.

Also, use the logical shift left technique even on AVX512 as vpmovzxbw
and vpmovwb are actually quite expensive.
2024-09-16 20:33:12 +00:00
Kevin McAfee
62cdc2a347
[NVPTX] Convert calls to indirect when call signature mismatches function signature (#107644)
When there is a function signature mismatch between a call
instruction and the callee, lower the call to an indirect call. The
current behavior is to produce direct calls that may or may not be
valid PTX. Consider the following example with mismatching return
types:

```
%struct.1 = type <{i64}>
%struct.2 = type <{i64}>
declare %struct.1 @callee()
...
%call1 = call %struct.2 @callee()
%call2 = call i64 @callee()
```

The return type of `callee` in PTX is `.b8 _[8]`. The return type of
`%call1` will be the same and so the PTX has no problems. The return
type of `%call2` will be `.b64`, so the types will not match and PTX
will be unacceptable to ptxas. This despite all the types having the
same size. The same is true for mismatching parameter types.

If we instead convert these calls to indirect calls, we will generate
functional PTX when the types have the same size. If they do not have
the same size then the PTX will likely be incorrect, though this will not
necessarily be caught by ptxas. Also, even if the sizes are the same, if
the types differ then it is technically undefined behavior. This change
allows for more flexibility in the bitcode that can be lowered to
functioning PTX, at the cost of sometimes producing PTX that is less
clearly wrong than it would have been previously (i.e. incorrect indirect
calls are not as obviously wrong as incorrect direct calls). We consider
it okay to generate PTX with undefined behavior as the behavior of
calls with mismatching types is not explicitly defined.
2024-09-16 13:08:18 -07:00
Craig Topper
4eb9780261 [RISCV] Fix IR for store_large_offset_no_opt_i16 in make-compressible-zbc.mir. NFC
The IR used loads instead of stores.
2024-09-16 12:51:07 -07:00
Philip Reames
8f023ec81d [RISCV] Add coverage for select C, C1, C2 where (C1-C2)*[0,1] is cheap 2024-09-16 12:46:43 -07:00
Farzon Lotfi
8ee685e601
[NFC][DirectX] fix intrinsics that need IntrNoMem and test typo (#108852)
In the process of adding scalarization support for DirectX target
intrinsics I found that intrinsics that weren't marked with `IntrNoMem`
did not get removed by
`RecursivelyDeleteTriviallyDeadInstructionsPermissive`. So this change
is to make it more clear that our intrinsics don't have side effects.

I only added `IntrNoMem` to the intrinics in `IntrinsicsDirectX.td` I
was involved with. There a potentially a few other cases that might
warrant this attribute, but will need input on the others.
2024-09-16 14:19:29 -04:00
David Green
960c975acd
[AArch64] Expand scmp/ucmp vector operations with sub (#108830)
Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select,
under Neon we can use the arithmetic expansion to generate fewer
instructions. Notably it also prevents the scalarization of vselect
during vector-legalization.
2024-09-16 18:44:52 +01:00
Thorsten Schütt
5c348f692a
[GlobalIsel] Canonicalize G_ICMP (#108755)
As a side-effect, we start constant folding icmps.

Split out from https://github.com/llvm/llvm-project/pull/105991.
2024-09-16 19:25:34 +02:00
David Green
823eab2bd5 [AArch64] Add a selection for vector scmp/ucmp tests. NFC 2024-09-16 14:25:15 +01:00
David Green
feac761f37
[GlobalISel][AArch64] Add G_FPTOSI_SAT/G_FPTOUI_SAT (#96297)
This is an implementation of the saturating fp to int conversions for
GlobalISel. On AArch64 the converstion instrctions work this way,
producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is
ported from SDAG.

AArch64 has a lot of existing tests for fptosi_sat, covering a wide
range of types. I have tried to make most of them work all at once, but
a few fall back due to other missing features such as f128 handling for
min/max.
2024-09-16 10:33:59 +01:00