52796 Commits

Author SHA1 Message Date
Yeting Kuo
ab89cfd0f1
[RISCV] Use vwsll.vi/vx + vwaddu.wv to lower vector.interleave when Zvbb enabled. (#67521)
The replacement can avoid an assignment to a GPR when the type is a vector
of i8/i16, and avoids vwmaccu.wv, which may have a higher cost than vwsll.vi/vx.
2023-09-28 09:10:03 +08:00
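
The arithmetic behind the lowering in ab89cfd0f1 can be sketched in plain Python. This is only a model of the idea, not the backend code: the function names are hypothetical, lanes are treated as unsigned SEW-bit integers, and the low half of each wide lane is assumed to come first in memory order.

```python
def interleave_via_widening(a, b, sew):
    """Interleave two equal-length lists of unsigned SEW-bit lanes.

    Each result lane is 2*SEW bits wide and holds
    zext(a[i]) + (zext(b[i]) << SEW).
    """
    mask = (1 << sew) - 1
    # Models the widening shift (vwsll.vi): zext(b) << SEW
    wide = [(y & mask) << sew for y in b]
    # Models the widening add (vwaddu.wv): wide lane + zext(a)
    return [w + (x & mask) for w, x in zip(wide, a)]


def split_lanes(wide, sew):
    """View each 2*SEW-bit lane as two SEW-bit lanes, low half first."""
    mask = (1 << sew) - 1
    out = []
    for w in wide:
        out.extend([w & mask, (w >> sew) & mask])
    return out


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(split_lanes(interleave_via_widening(a, b, 8), 8))
# -> [1, 10, 2, 20, 3, 30, 4, 40]
```

Because the shifted value has zeros in its low SEW bits, the add never carries, so it behaves like a bitwise OR of the two halves.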
Luke Lau
bd675f5899
[RISCV] Reduce LMUL when index is known when lowering insert_vector_elt (#66087)
Continuing on from #65997, if the index of insert_vector_elt is a
constant then we can work out the minimum number of registers needed
for the slideup and choose a smaller type to operate on.

This reduces the LMUL for not just the slideup but also for the scalar
insert.
2023-09-27 20:26:36 +01:00
Craig Topper
8e6db7e2b2
[RISCV] Fix a crash from trying to truncate an FP type in lowerBuildVectorOfConstants. (#67488)

ComputeNumSignBits can return an answer for FP constants based on
bitcasting them to int.

Check for an integer type so we don't create an illegal truncate.

We could support this case with bitcasts, but I leave that to a separate
patch.
2023-09-27 12:21:10 -07:00
Luke Lau
5ffbdd9ed5 [RISCV] Handle .vx pseudos in hasAllNBitUsers (#67419)
Vector pseudos with scalar operands only use the lower SEW bits (or less in the
case of shifts and clips). This patch accounts for this in hasAllNBitUsers for
both SDNodes in RISCVISelDAGToDAG. We also need to handle this in
RISCVOptWInstrs otherwise we introduce slliw instructions that are less
compressible than their original slli counterpart.

This is a reland of aff6ffc8760b99cc3d66dd6e251a4f90040c0ab9 with the
refactoring omitted.
2023-09-27 19:53:50 +01:00
Philip Reames
487dd5f1e3 Revert "[RISCV] Handle .vx pseudos in hasAllNBitUsers (#67419)"
This reverts commit aff6ffc8760b99cc3d66dd6e251a4f90040c0ab9.  Version landed differs from version reviewed in (stylistic) manner worthy of separate review.
2023-09-27 11:24:49 -07:00
Fangrui Song
e705b37a77 [CodeLayout] Add unittest for cache-directed sort
The function reordering algorithm added by https://reviews.llvm.org/D152834 and
used by BOLT (https://reviews.llvm.org/D153039) is untested.

Add some tests at the appropriate layer.

Depends on D159526

Differential Revision: https://reviews.llvm.org/D159527
2023-09-27 10:52:12 -07:00
Luke Lau
aff6ffc876
[RISCV] Handle .vx pseudos in hasAllNBitUsers (#67419)
Vector pseudos with scalar operands only use the lower SEW bits (or less in the
case of shifts and clips). This patch accounts for this in hasAllNBitUsers for
both SDNodes in RISCVISelDAGToDAG. We also need to handle this in
RISCVOptWInstrs otherwise we introduce slliw instructions that are less
compressible than their original slli counterpart.
2023-09-27 18:12:29 +01:00
Nick Desaulniers
97187e1278
[AArch64] update "rm" inline asm test (#67472)
Because `x0` is not listed in the clobber list, regalloc could (one day
when #20571 is fixed) allocate `$0` to `x0`:

  ldr x0, x0

This will produce an error when validating the instruction. The intent
of this test FWICT is to check that the parameter in w0 is stored to a
stack slot using w0, since this target triple is the exotic arm64_32
(ILP32). Update the test to simply use "m" constraint. The clobber list
is underconstrained otherwise.
2023-09-27 08:30:36 -07:00
Momchil Velikov
eff4ef25b3 Revert "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)"
This reverts commit ace20e24287bf531bb1185e213642c3b49eb293c.

This might be causing a buildbot failure at
https://green.lab.llvm.org/green/job/clang-stage1-RA/35786/
2023-09-27 14:24:59 +01:00
Nikita Popov
47b7f33b13
[IR] Allow llvm.ptrmask of vectors (#67434)
llvm.ptrmask is currently limited to pointers only, and does not accept
vectors of pointers. This is an unnecessary limitation, especially as
the underlying instructions (getelementptr etc) do support vectors of
pointers.

We should relax this sooner rather than later, to avoid introducing code
that assumes non-vectors (#67166).
2023-09-27 15:01:43 +02:00
Simon Pilgrim
57b0194b69 [X86] IsNOT - fold PCMPGT(C, X) -> PCMPGT(X,C-1)
To invert the result, we can profitably commute a PCMPGT node if the LHS was a constant (C > min_signed_value): https://alive2.llvm.org/ce/z/LxcPqm

Allows the constant to fold, and helps reduce register pressure

Fixes #67347
2023-09-27 12:33:55 +01:00
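
The identity used by 57b0194b69 is NOT(C > X) == (X > C-1), which holds for signed integers as long as C-1 does not wrap. A brute-force check over small bit widths, written as a sketch with hypothetical helper names (Python integers stand in for signed vector lanes):

```python
def pcmpgt(a, b):
    """Signed greater-than on a single lane, as a boolean."""
    return a > b


def check_fold(bits=8):
    """Check NOT(PCMPGT(C, X)) == PCMPGT(X, C - 1) for every C > min_signed_value."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    for c in range(lo + 1, hi + 1):  # skip C == min so C - 1 stays in range
        for x in range(lo, hi + 1):
            if (not pcmpgt(c, x)) != pcmpgt(x, c - 1):
                return False
    return True


print(check_fold(8))
```

The C > min_signed_value guard matters: for C equal to the minimum value, C-1 would wrap around to the maximum and the two sides would disagree.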
Ivan Kosarev
be8b559956 [AMDGPU] Test codegen'ing True16 additions.
The GlobalISel part is to be addressed later.

Differential Revision: https://reviews.llvm.org/D156106
2023-09-27 11:10:48 +01:00
Ivan Kosarev
3ff7d51eb8 [AMDGPU][True16] Pre-commit addition tests.
Differential Revision: https://reviews.llvm.org/D156529
2023-09-27 10:27:33 +01:00
Momchil Velikov
ace20e2428
[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432) 2023-09-27 10:05:32 +01:00
Sam McCall
0afbcb20fd Revert "[NVPTX] Add support for maxclusterrank in launch_bounds (#66496)"
This reverts commit dfab31b41b4988b6dc8129840eba68f0c36c0f13.

SemaDeclAttr.cpp cannot depend on Basic's private headers
(lib/Basic/Targets/NVPTX.h)
2023-09-27 10:59:04 +02:00
Jianjian GUAN
5278cc364b [RISCV] Support select/merge like ops for fp16 vectors when only have Zvfhmin
This patch supports VP_MERGE, VP_SELECT, SELECT, and SELECT_CC for fp16 vectors when only Zvfhmin is available.

Reviewed By: michaelmaitland

Differential Revision: https://reviews.llvm.org/D159053
2023-09-27 14:53:14 +08:00
Jakub Chlanda
dfab31b41b
[NVPTX] Add support for maxclusterrank in launch_bounds (#66496)
Since SM_90, CUDA supports specifying an additional argument to the
launch_bounds attribute, maxBlocksPerCluster, to express the maximum
number of CTAs that can be part of the cluster. See
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-dimension-directives-maxclusterrank
and
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds
for details.
2023-09-27 08:51:26 +02:00
Jim Lin
d6b1b998a0
[RISCV] Support fmaximum/fminimum for fp16 vector when only Zvfhmin enabled (#67393)
This patch promotes fmaximum/fminimum for fp16 vector to float
operation.
2023-09-27 13:02:35 +08:00
Jianjian Guan
435da4ef55
[RISCV] Promote SETCC and VP_SETCC of f16 vectors when only have zvfhmin (#66866)
This patch implements the promotion of fp16 vectors SETCC and VP_SETCC
when we only have zvfhmin but no zvfh.
2023-09-27 11:00:19 +08:00
Nick Desaulniers
35a364fa5c
[TargetLowering] fix index OOB (#67494)
I accidentally introduced this in

commit 330fa7d2a4e0 ("[TargetLowering] Deduplicate choosing InlineAsm
constraint between ISels (#67057)")

Fix forward.
2023-09-26 15:50:26 -07:00
Nick Desaulniers
2e2c61ebd7
[x86] precommit test conversion via update_llc_test_checks.py (#67463)
I'm looking to update this test; pre-processing it with
update_llc_test_checks.py makes it clearer what I'm changing in #20571.
2023-09-26 15:22:11 -07:00
Craig Topper
b26157edf0 [RISCV] Correct Zhinx load/store patterns to use AddrRegImm. 2023-09-26 11:54:59 -07:00
Craig Topper
9eebfa80f5 [RISCV] Autogenerate tests to add missing CHECK lines. NFC 2023-09-26 11:35:53 -07:00
Douglas Yung
6716d3dd77 Move test split-deadloop.mir that was added in e3d714f to AArch64 directory instead of ARM. 2023-09-26 09:51:47 -07:00
Jay Foad
e3d714f2cc [AMDGPU] Add gfx1150 test coverage in trans-forwarding-hazards.mir
This demonstrates that gfx1150 does not have FeatureVALUTransUseHazard.
2023-09-26 17:24:43 +01:00
David Green
b10721e941 [AArch64] A few extra rshrn intrinsic tests. NFC 2023-09-26 17:13:27 +01:00
Zhaoxuan Jiang
baf3903218
[AArch64] Bail out of HomogeneousPrologEpilog for functions with swiftasync argument (#67417)

swiftasync introduces a number of frame adjustments which are
incompatible with the current implementation of the
HomogeneousPrologEpilog pass.
2023-09-26 08:42:01 -07:00
weiguozhi
31f81e96a4
[RA] Don't split a register generated from another split (#67351)
Splitting a register generated from another split usually doesn't bring
much benefit. It may also cause an infinite loop, as pr67188 shows, if the
heuristic cost always satisfies the split condition. So prevent such
splitting.

This fixes pr67188.
2023-09-26 08:38:18 -07:00
Philip Reames
e39add89cd
[RISCV] Transform build_vector((binop X_i, C_i)..) to binop (build_vector, build_vector) (#67358)
If we have a build_vector of identical binops, we'd prefer to have a
single vector binop in most cases. We do need to make sure that the two
build_vectors aren't more difficult to materialize than the original
build_vector. To start with, let's restrict ourselves to the case where
one build_vector is a fully constant vector.

Note that we don't need to worry about speculation safety here. We are
not speculating any of the lanes, and thus none of the typical - e.g.
div-by-zero - concerns apply.

I'll highlight that the constant build_vector heuristic is just one we
could choose here. We just need some way to be reasonably sure the cost
of the two build_vectors isn't going to completely outweigh the savings
from the binop formation. I'm open to alternate heuristics here - both
more restrictive and more permissive.

As noted in comments, we can extend this in a number of ways. I decided
to start small as a) that helps keep things understandable in review and
b) it covers my actual motivating case.
2023-09-26 07:53:35 -07:00
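
A rough model of the rewrite in e39add89cd, with Python lists standing in for vectors. All function names here are hypothetical, and the check that one operand vector is fully constant is only implied by the caller passing literal constants:

```python
from operator import add, mul


def scalar_form(xs, cs, op):
    """build_vector((binop X_0, C_0), (binop X_1, C_1), ...): one scalar op per lane."""
    return [op(x, c) for x, c in zip(xs, cs)]


def vector_binop(lhs, rhs, op):
    """Stand-in for a single vector instruction applying op lane-wise."""
    return list(map(op, lhs, rhs))


def vector_form(xs, cs, op):
    """binop(build_vector(X_0, ...), build_vector(C_0, ...))."""
    lhs = list(xs)  # build_vector of the variable operands
    rhs = list(cs)  # build_vector of constants (cheap to materialize)
    return vector_binop(lhs, rhs, op)


xs = [5, 7, 11, 13]
cs = [1, 2, 3, 4]
print(scalar_form(xs, cs, add) == vector_form(xs, cs, add))
```

Every lane is computed in both forms, which is why no speculation-safety concern arises: nothing is executed in the vector form that was not already executed per lane.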
David Green
03647e2e4b [AArch64] Handle scalable vectors in combineFMulOrFDivWithIntPow2.
The transform will still not trigger as takeInexpensiveLog2 will bail out for
any scalable vector, but this guards against a scalable typesize error.
2023-09-26 15:34:34 +01:00
Nathan Gauër
c01b5bbba3
[SPIRV] Add OpAccessChain instruction support (#66253)
This commit adds 2 new instructions in the selector:
 - OpAccessChain
 - OpInBoundsAccessChain.

The choice between the two relies on the `inbounds` marker.

Those instructions are not used for OpenCL, to maintain the same
behavior as previously. They are only added when building for logical
SPIR-V, as it doesn't support the pointer equivalent.

Because logical SPIR-V doesn't support pointer casts either, the
assign_ptr_type intrinsic needs to be generated so OpAccessChain gets
lowered with the correct pointer type, instead of i8*.

Fixes #66107

---------

Signed-off-by: Nathan Gauër <brioche@google.com>
2023-09-26 16:33:17 +02:00
Ivan Kosarev
64482d5766
[AMDGPU] Fix passing CodeGen/AMDGPU/frem.ll on gfx1150. (#67425)
We would currently crash on it trying to use t16 instructions instead of
fake16 ones.
2023-09-26 15:13:23 +01:00
Ivan Kosarev
287f6cdd17 [AMDGPU] Remove the support for non-True16 copies between different register sizes.
Differential Revision: https://reviews.llvm.org/D156985
2023-09-26 14:46:34 +01:00
Jingu Kang
ff68e43c81 [MachineLICM] Handle Subloops
It is a re-commit from reverted commit 3454cf67bd0a650097dc6ca99874a34e1d59b500.

Following discussion on https://reviews.llvm.org/D154205, make the MachineLICM pass
handle subloops while visiting the outermost loop's blocks only once.

Differential Revision: https://reviews.llvm.org/D154205
2023-09-26 14:25:11 +01:00
Momchil Velikov
fe763d8ad4
[AArch64] Limit immediate offsets when folding instructions into addressing modes (#67345)
Don't increase/decrease immediate offsets in folded instructions beyond
the limits of `LDP`.
2023-09-26 14:21:32 +01:00
Muhammad Omair Javaid
431969ede1 Revert "[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275)"
This reverts commit fc86d031fec5e47c6811efd3a871742ad244afdd.

This change breaks LLVM buildbot clang-aarch64-sve-vls-2stage
https://lab.llvm.org/buildbot/#/builders/176/builds/5474
I am going to revert this patch as the bot has been failing for more than a day without a fix.
2023-09-26 15:47:16 +05:00
esmeyi
d7195c57d8 Reland https://reviews.llvm.org/D159073.
The patch failed in test-suite due to a liveness error after rebasing on https://reviews.llvm.org/D133103, and now it's fixed.

```
[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel.

Summary: rldicl/rldicr can be eliminated if it's used to clear the high-order or low-order n bits and all bits cleared will be ANDed with 0 by andi/andis. Or they can be folded to `andi 0` if all bits to AND are already zero in the input.

Reviewed By: qiucf, shchenz

Differential Revision: https://reviews.llvm.org/D159073
```
2023-09-26 06:24:47 -04:00
David Green
cab01a8b49 [AArch64] Additional testing for i128 and non-temporal loads/stores under BE. NFC 2023-09-26 11:01:48 +01:00
Jay Foad
d85d143ad9
[AMDGPU] New image intrinsic optimizer pass (#67151)
Implement a new pass to combine multiple image_load_2dmsaa and
2darraymsaa intrinsic calls into a single image_msaa_load if:

- they refer to the same vaddr except for sample_id,
- they use a constant sample_id and they fall into the same group,
- they have the same dmask and the number of instructions and the
  number of vaddr/vdata dword transfers is reduced by the combine

This should be valid on all GFX11 but a hardware bug renders it
unworkable on GFX11.0.* so it is only enabled for GFX11.5.

Based on a patch by Rodrigo Dominguez!
2023-09-26 09:33:49 +01:00
Kai Luo
5fabc8ba22 [PowerPC] Add test to show wrong target flags printed at MO_TLSGDM_FLAG operand. NFC. 2023-09-26 05:13:26 +00:00
Wang Pengcheng
08165c444e
[RISCV] Add searchable table for tune information (#66193)
There is a lot of information that can be used for tuning, like
alignments, cache line size, etc. But we can't make all of it a
`SubtargetFeature` because some values are not enumerable,
for example, `PrefetchDistance` used by `LoopDataPrefetch`.

In this patch, a searchable table `RISCVTuneInfoTable` is added,
in which each entry contains the CPU name and all tune information
defined in `RISCVTuneInfo`. Each field of `RISCVTuneInfo` should
have a default value and processor definitions can override the
default value via `let` statements.

We don't need to define a `RISCVTuneInfo` for each processor; a
processor uses the default values (those of `generic`) if no
`RISCVTuneInfo` is defined for it.

For processors in the same series, a subclass can inherit from
`RISCVTuneInfo` and override the fields. And we can also override
the fields in processor definitions if there are some differences
in the same processor series.

When initializing `RISCVSubtarget`, we use `TuneCPU` as the
key to search the tune info table. So if we don't specify the
tune CPU, we use the specified `CPU`, which I think is the
expected behavior.

This patch almost undoes 61ab106, in which I added tune features
of preferred function/loop alignments. More tune information can
be added in the future.
2023-09-26 12:26:35 +08:00
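
The lookup scheme described in 08165c444e can be illustrated with a small Python analogue. This is not the TableGen definition: the field names, CPU names, and every numeric value below are invented for illustration, and only the shape (defaults, per-series overrides, per-CPU overrides, lookup keyed on the tune CPU) follows the commit message.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TuneInfo:
    # Illustrative defaults only; not LLVM's real values.
    pref_function_align: int = 1
    pref_loop_align: int = 1
    cache_line_size: int = 64
    prefetch_distance: int = 0


GENERIC = TuneInfo()

# A processor "series" overrides some defaults once...
SERIES_X = replace(GENERIC, cache_line_size=32)

# ...and an individual processor can override further fields.
TUNE_TABLE = {
    "generic": GENERIC,
    "cpu-x1": SERIES_X,
    "cpu-x2": replace(SERIES_X, prefetch_distance=8),
}


def lookup_tune_info(cpu, tune_cpu=None):
    """Use tune_cpu as the key if given, else cpu; fall back to the generic entry."""
    key = tune_cpu if tune_cpu else cpu
    return TUNE_TABLE.get(key, GENERIC)


print(lookup_tune_info("cpu-x2"))
print(lookup_tune_info("cpu-x1", tune_cpu="unknown-cpu"))
```

A frozen dataclass with defaults plays the role of the base record here, and `replace` mimics a `let` override on top of an inherited definition.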
WANG Rui
6417ce4336 [LoongArch] Improve codegen for i8/i16 'atomicrmw xchg a, {0,-1}'
Similar to D156801 for RISCV.

Link: https://github.com/rust-lang/rust/pull/114034
Link: https://github.com/llvm/llvm-project/issues/64090

Reviewed By: SixWeining, xen0n

Differential Revision: https://reviews.llvm.org/D159252
2023-09-26 11:46:07 +08:00
WANG Rui
555e2397aa [LoongArch] Add test cases for atomicrmw xchg {0,-1} {i8,i16}
Add test cases for atomicrmw xchg {0,-1} {i8,i16}.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D159251
2023-09-26 11:46:06 +08:00
esmeyi
77147a95b8 Revert "[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel."
This reverts commit 2de74e1bd4d540063d7495fa6254781abd41e179.

A test-suite failure occurs due to this commit, will fix soon.
2023-09-25 23:31:34 -04:00
esmeyi
2de74e1bd4 [PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel.
Summary: rldicl/rldicr can be eliminated if it's used to clear the high-order or low-order n bits and all bits cleared will be ANDed with 0 by andi/andis. Or they can be folded to `andi 0` if all bits to AND are already zero in the input.

Reviewed By: qiucf, shchenz

Differential Revision: https://reviews.llvm.org/D159073
2023-09-25 23:11:34 -04:00
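
The first case in 2de74e1bd4 can be checked with a small bit-level model. The sketch below assumes a rotate amount of 0 and only covers rldicl followed by andi; the helper names are hypothetical and the real peephole works on machine instructions, not integers.

```python
MASK64 = (1 << 64) - 1


def rotl64(x, sh):
    """Rotate a 64-bit value left by sh bits."""
    sh %= 64
    return ((x << sh) | (x >> (64 - sh))) & MASK64


def rldicl(x, sh, mb):
    """Rotate left by sh, then clear the mb high-order bits."""
    return rotl64(x, sh) & (MASK64 >> mb)


def andi(x, ui):
    """AND with a zero-extended 16-bit immediate."""
    return x & (ui & 0xFFFF)


x = 0xDEADBEEFCAFEF00D
# Clearing up to 48 high-order bits is redundant before a 16-bit AND:
# every bit rldicl clears would be ANDed with 0 anyway.
print(all(andi(rldicl(x, 0, n), 0xFFFF) == andi(x, 0xFFFF) for n in range(49)))
```

Once more than 48 bits are cleared, the rldicl starts to touch bits that the immediate can still select, so it can no longer be dropped unconditionally.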
Jim Lin
5e1f5f4720
[RISCV] Fix the float value to test constantpool lowering under differe… (#67297)
After https://reviews.llvm.org/D142953, the float value 1.0 can be
optimized to lui+fmv.w.x. But this test aims to test the constantpool
lowering under different code models. Change the float value to one that
cannot be optimized to lui+fmv.w.x.
2023-09-26 09:06:49 +08:00
Min-Yih Hsu
de17384c05
[RISCV][GISel] Add RegBank selection for G_SMULH (#67381)
Along with its missing tests in instruction selection and legalizer.
2023-09-25 17:59:23 -07:00
Min-Yih Hsu
0d7c340c2c
[RISCV][GISel] Add instruction selection for G_SEXT, G_ZEXT, and G_SEXT_INREG (#67359)
G_SEXT and G_ZEXT are supported via patterns imported from SDISel;
G_SEXT_INREG is selected using hand-written code as there is no
(functional) rule at this moment to import G_SEXT_INREG from
ISD::SEXT_INREG.

Credit to @topperc for help with G_SEXT and G_ZEXT.
2023-09-25 15:08:21 -07:00
Artem Belevich
671e2ba45b
[NVPTX] Improve lowering of v2i16 logical ops. (#67365)
Bitwise logical ops can always be done as b32, regardless of the
availability of other v2i16 ops, which would need a newer GPU.

Includes the missing lowering for 2-argument register operation variants
and additional tests for `and`.
2023-09-25 14:29:48 -07:00
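
Why a v2i16 bitwise op can always be done as one 32-bit op (671e2ba45b): bitwise operations never move information between bit positions, so operating on the packed word equals operating on each lane. A minimal sketch with hypothetical helper names, assuming lane 0 sits in the low 16 bits:

```python
from operator import and_, or_, xor


def pack_v2i16(lanes):
    """Pack two 16-bit lanes into one 32-bit word, lane 0 in the low half."""
    lo, hi = lanes
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)


def unpack_v2i16(word):
    """Split a 32-bit word back into its two 16-bit lanes."""
    return (word & 0xFFFF, (word >> 16) & 0xFFFF)


def lanewise(op, a, b):
    """Reference result: apply op to each pair of 16-bit lanes."""
    return tuple(op(x, y) & 0xFFFF for x, y in zip(a, b))


a = (0x1234, 0xFF00)
b = (0x0FF0, 0x0F0F)
for op in (and_, or_, xor):
    as_b32 = unpack_v2i16(op(pack_v2i16(a), pack_v2i16(b)))
    print(op.__name__, as_b32 == lanewise(op, a, b))
```

The same argument does not carry over to arithmetic such as addition, where a carry out of the low lane would leak into the high lane.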
Craig Topper
62f5636838 [RISCV] Don't set KILL flag on X0 in RISCVInstrInfo::movImm.
Extracted from #67159.
2023-09-25 13:40:08 -07:00