52796 Commits

Author SHA1 Message Date
Gleb Popov
0356d0cfdc
Print more descriptive error message when trying to link a global with appending linkage (#69613)
This is a proper fix for https://github.com/llvm/llvm-project/issues/40308
2024-04-03 12:26:12 +01:00
Chen Zheng
29c7d1a60c [PPC] [NFC] add testcase for more store forwarding 2024-04-03 04:46:29 -04:00
David Green
6288f36c16
[AArch64][GlobalISel] Basic add_sat and sub_sat vector handling. (#80650)
This tries to fill in the basic vector handling for sadd_sat/uadd_sat
and ssub_sat/usub_sat. It just handles the basics, marking legal types
and clamping illegally sized vectors to legal ones.
2024-04-03 08:44:51 +01:00
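For readers unfamiliar with the saturating operations named in 6288f36c16, the following is a minimal scalar sketch of what a single 8-bit lane of `uadd_sat` and `sadd_sat` computes. The function names and the choice of 8-bit lanes are illustrative assumptions; the commit itself only concerns legalization of vector types.

```cpp
#include <cstdint>

// Unsigned saturating add on one 8-bit lane: clamp to 255 instead of wrapping.
std::uint8_t uadd_sat8(std::uint8_t a, std::uint8_t b) {
    unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// Signed saturating add on one 8-bit lane: clamp to [-128, 127].
std::int8_t sadd_sat8(std::int8_t a, std::int8_t b) {
    int sum = static_cast<int>(a) + static_cast<int>(b);
    if (sum > 127) return 127;
    if (sum < -128) return -128;
    return static_cast<std::int8_t>(sum);
}
```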
Ryotaro KASUGA
ea4a11926b
Reapply "[CodeGen] Fix register pressure computation in MachinePipeliner (#87030)" (#87312)

Fix broken test.

This reverts commit b8ead2198f27924f91b90b6c104c1234ccc8972e.
2024-04-03 09:28:09 +09:00
Craig Topper
a9af66a90e [RISCV] Lower (vector_interleave X, undef) to (vzext_vl X). (#87283)
If the odd vector is undef or poison, the widening add and multiply trick
doesn't work unless we freeze the odd vector.

Unfortunately, freezing doesn't work when the operand is provably
undef/poison. MIR doesn't have a representation for freeze so it
just becomes a COPY from IMPLICIT_DEF which freely propagates undef
to each operand independently.

To work around this, check for undef explicitly and lower to a VZEXT_VL
of the even vector. This produces better code than we'd get from a
freeze anyway.

I've left a FIXME for adding a freeze. I'll do that as a separate patch
as it affects other tests and doesn't help with the new test.
2024-04-02 11:58:41 -07:00
Craig Topper
8c1dc5dd58 [RISCV] Add test for miscompile of vector.interleave when odd vector is literal poison.
The interleave lowering relies on a math trick that requires passing
the odd vector to two math instructions. In order to be correct
these instructions must see the same value.

If the odd vector is provably poison or undef, SelectionDAG will
create a vwadd and vwmaccu where the operand is a copy from IMPLICIT_DEF.
Later this will become just the undef flag on the operand. This
gives the register allocator freedom to pick a different register
for each instruction.
2024-04-02 11:49:08 -07:00
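The "math trick" described in 8c1dc5dd58 and a9af66a90e can be modelled on a single lane. This is a sketch under assumptions: 8-bit elements, the even element in the low half of the 16-bit result, and a widening add followed by a widening multiply-accumulate with the constant 2^8 - 1. The function name and the separate `odd_for_add` / `odd_for_mul` parameters are hypothetical; they exist only to show what goes wrong when the two instructions see different values for the odd operand.

```cpp
#include <cstdint>

// One lane of the interleave trick:
//   (even + odd) + (2^8 - 1) * odd == even + 2^8 * odd
// The first term models the widening add, the second the widening
// multiply-accumulate. Both must read the same odd value.
std::uint16_t interleave_lane(std::uint8_t even,
                              std::uint8_t odd_for_add,
                              std::uint8_t odd_for_mul) {
    unsigned acc = static_cast<unsigned>(even) + static_cast<unsigned>(odd_for_add);
    acc += 0xFFu * static_cast<unsigned>(odd_for_mul);
    return static_cast<std::uint16_t>(acc);
}

// When the odd vector is undef, zero-extending the even element is a valid
// result, which is what lowering to a zero-extend amounts to per lane.
std::uint16_t interleave_lane_undef_odd(std::uint8_t even) {
    return static_cast<std::uint16_t>(even);
}
```

With `even = 0x12` and the same `odd = 0x34` in both positions the lane is `0x3412`. If the two uses see `0x34` and `0x35`, the result is neither `0x3412` nor `0x3512`, so the low half no longer holds the even element.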
Simon Pilgrim
8bc2d19c13 [X86] canonicalizeShuffleWithOp - don't fold VPERMI(BINOP(X,Y)) -> BINOP(VPERMI(X),VPERMI(Y))
VPERMI (VPERMQ/PD) is nearly always lane-crossing and poorly merges with target shuffles (other than itself).

For now, I've restricted VPERMI to only merge with itself, constants, loads and splats.

We might be able to merge with a few other special cases (AND/ANDNP with constant?), which could help the shuffle-vs-trunc-256.ll AVX512VL regression, but since that now gives similar codegen to the other AVX512 variants, I'd prefer to improve the shuffle lowering for that properly.
2024-04-02 18:38:37 +01:00
Michael Maitland
153b8431bb
[RISCV][GISEL] Legalize G_BITCAST for scalable vectors (#85970)
SelectionDAG marks ISD::BITCAST as legal between scalable vector types
and ISelDAGToDAG deletes them.

We mark G_BITCAST between scalable vectors as legal in GISel. A future
patch will handle what to do with them after the legalizer (likely
either drop them in an isel-preprocess or convert them to COPYs).

BITCAST is needed for legalization of G_INSERT and G_EXTRACT. This is a
precommit for legalization of G_INSERT and G_EXTRACT.
2024-04-02 12:30:51 -04:00
Bevin Hansson
cd6434f9ec
[ExpandLargeDivRem] Scalarize vector types. (#86959)
expand-large-divrem cannot handle vector types.
If overly large vector element types survive into
isel, they will likely be scalarized there, but since
isel cannot handle scalar integer types of that size,
it will assert.

Handle vector types in expand-large-divrem by
scalarizing them and then expanding the scalar type
operation. For large vectors, this results in a
*massive* code expansion, but it's better than
asserting.
2024-04-02 16:37:36 +02:00
Farzon Lotfi
82d8a95611
[SPIRV][HLSL] Add HLSL intrinsic tests (#86844)
This PR is part of bookkeeping for #83882.
It also brings the SPIRV HLSL intrinsic tests to
parity with the testing on the DXIL backend.
2024-04-02 10:21:21 -04:00
Kevin P. Neal
737fc353d2 [FPEnv][AArch64] Correct strictfp test.
Correct strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

These tests needed the strictfp attribute added to some function
definitions and some function calls.

Test changes verified with D146845.
2024-04-02 09:35:44 -04:00
Il-Capitano
0ef7437780
[SelectionDAG][Statepoint] Fix truncation of gc.statepoint ID argument (#85908)
The ID argument of `gc.statepoint` gets incorrectly truncated to 32 bits
during code generation.
This is fixed by using `uint64_t` instead of `unsigned` for the `ID`
member in `SelectionDAGBuilder::StatepointLoweringInfo`, and a
`patchpoint` test case is extended to check for 64 bit ID generation in
stackmaps.
2024-04-02 09:28:19 -04:00
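The truncation fixed in 0ef7437780 is easy to reproduce in isolation. In the sketch below, `NarrowInfo` and `WideInfo` are hypothetical stand-ins for the `ID` member before and after the change; `std::uint32_t` is used in place of `unsigned` so the 32-bit width is explicit rather than platform-dependent.

```cpp
#include <cstdint>

struct NarrowInfo { std::uint32_t ID; };  // models the old 32-bit member
struct WideInfo   { std::uint64_t ID; };  // models the new uint64_t member

// Storing a 64-bit ID in the narrow struct silently drops the upper 32 bits.
std::uint64_t through_narrow(std::uint64_t id) {
    NarrowInfo info{static_cast<std::uint32_t>(id)};
    return info.ID;
}

// The wide struct round-trips the full 64-bit value.
std::uint64_t through_wide(std::uint64_t id) {
    WideInfo info{id};
    return info.ID;
}
```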
Vyacheslav Levytskyy
6cce67a8f9
[SPIR-V] Fix validity of atomic instructions (#87051)
This PR fixes the validity of atomic instructions and improves type
inference. More tests are now accepted by `spirv-val`.
2024-04-02 10:59:18 +02:00
Thorsten Schütt
8bb9443333
[GlobalIsel] Combine G_EXTRACT_VECTOR_ELT (#85321)
preliminary steps
2024-04-02 09:01:24 +02:00
Luke Lau
59dd10faf8 [RISCV] Add tests for fixed vector vwsll. NFC
We are missing patterns for fixed vectors, where the sexts and zexts are
legalized to _vl nodes.
2024-04-02 13:02:03 +08:00
Gulfem Savrun Yeniceri
b8ead2198f Revert "[CodeGen] Fix register pressure computation in MachinePipeliner (#87030)"
This reverts commit a4dec9d6bc67c4d8fbd4a4f54ffaa0399def9627
because the test failed in the following builder:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8751864477467126481/overview
2024-04-01 18:27:41 +00:00
Ryotaro KASUGA
a4dec9d6bc
[CodeGen] Fix register pressure computation in MachinePipeliner (#87030)
`RegisterClassInfo::getRegPressureSetLimit` has been changed to return a
smaller value than before so the limit may become negative in later
calculations. As a workaround, change to use
`TargetRegisterInfo::getRegPressureSetLimit`.
Also improve tests.
2024-04-01 17:04:44 +09:00
Vitaly Buka
cbb27bef3e [CodeGen] Fix test after #86049 2024-04-01 00:44:27 -07:00
Vitaly Buka
d76a1233f7 [CodeGen] Fix test after #86049 2024-03-31 23:48:23 -07:00
Vitaly Buka
b890c17892 [CodeGen] Fix test after #86049 2024-03-31 23:22:07 -07:00
Vitaly Buka
289d2cc3f3 [CodeGen] Fix test after #86049 2024-03-31 23:10:21 -07:00
Sameer Sahasrabuddhe
421557974a
[AMDGPU] Use glue for convergence tokens at call-like operations (#86766)
The earlier implementation on AMDGPU used explicit token operands at
SI_CALL and SI_CALL_ISEL. This is now replaced with CONVERGENCECTRL_GLUE
operands, with the following effects:

- The treatment of tokens at call-like operations is now consistent with
the treatment at intrinsics.
- Support for tail calls using implicit tokens at SI_TCRETURN "just
works".
- The extra parameter at call-like instructions is eliminated, thus
restoring those instructions and their handling to the original state.

The new glue node is placed after the existing glue node for the
outgoing call parameters, which seems to not interfere with selection of
the call-like nodes.
2024-04-01 10:51:13 +05:30
Vitaly Buka
20f56e1f8e
[CodeGen] Add default lowering for llvm.allow.{runtime,ubsan}.check() (#86049)
RFC:
https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
2024-03-31 22:19:33 -07:00
Yingchi Long
70deb7bfe9
[BPF] expand cttz, ctlz for i32, i64 (#73668)
Fixes: https://github.com/llvm/llvm-project/issues/62252

Depends on: #73667
2024-04-01 10:57:54 +08:00
Ruiling, Song
216b5e9666
[AMDGPU] Expose RTZ version of f16 interpolation for gfx11+ (#86614) 2024-04-01 09:48:37 +08:00
Austin Kerbow
b5b34dbb27
[AMDGPU] Use directive for kernarg preload header padding (#86004) 2024-03-31 11:03:03 -07:00
Austin Kerbow
0234d90d81
[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768)
It was shown experimentally that this may have some benefit on newer HW.
2024-03-31 10:46:05 -07:00
Jacek Caban
799e1d6a12
[IR] Use EXPORTAS for ARM64EC mangled symbols with dllexport attribute. (#81940)
We currently just use the mangled name. This works fine, because the linker
should detect that and demangle it for the export table. However, on
MSVC, the compiler is more specific and passes the demangled name as well,
with EXPORTAS. This PR aims to match that. MSVC doesn't use quotes in
this case, so I added '#' to the list of characters that don't need them.
2024-03-30 16:48:39 +01:00
Brandon Wu
29e8bfc13c
[RISCV] RISCV vector calling convention (2/2) (#79096)
This commit handles vector arguments and return values for function
definitions and calls. The new class RVVArgDispatcher performs all vector
register assignment, including mask types, data types, and tuple types.
It precomputes the register number for each argument as per
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#standard-vector-calling-convention-variant
and is passed to the calling convention function to handle all vector arguments.

Depends on: #78550
2024-03-30 21:05:33 +08:00
Jay Foad
95258419f6
[AMDGPU] Use AMDGPU::isIntrinsicAlwaysUniform in isSDNodeAlwaysUniform (#87085)
This is mostly just a simplification, but tests show a slight codegen
improvement in code using the deprecated amdgcn.icmp/fcmp intrinsics.
2024-03-30 08:01:18 +00:00
Shilei Tian
3a106e5b2c
[GlobalISel] Fold G_ICMP if possible (#86357)
This patch tries to fold `G_ICMP` if possible.
2024-03-29 15:59:50 -04:00
Helena Kotas
b42fa8645c
[DXIL] Add lowering for ceil (#87043)
Add lowering of llvm.ceil intrinsics to DXIL ops.

Fixes #86984
2024-03-29 15:09:44 -04:00
Alex MacLean
7daa65a088
Reland "[NVPTX] Use .common linkage for common globals" (#86824)
Switch from `.weak` to `.common` linkage for common global variables
where possible. The `.common` linkage is described in
[PTX ISA 11.6.4. Linking Directives: .common](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#linking-directives-common):
> Declares identifier to be globally visible but “common”.
>
> Common symbols are similar to globally visible symbols. However
> multiple object files may declare the same common symbol and they may
> have different types and sizes and references to a symbol get resolved
> against a common symbol with the largest size.
>
> Only one object file can initialize a common symbol and that must have
> the largest size among all other definitions of that common symbol from
> different object files.
>
> .common linking directive can be used only on variables with .global
> storage. It cannot be used on function symbols or on symbols with opaque
> type.

I've updated the logic and tests to only use `.common` for PTX 5.0 or
greater and verified that the new tests now pass with `ptxas`.
2024-03-29 11:58:41 -07:00
Farzon Lotfi
e74332a266
[HLSL][DXIL] HLSL's round should follow roundeven behavior (#87078)
fixes #86999
2024-03-29 13:19:28 -04:00
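The distinction behind e74332a266 is how ties are rounded. A small sketch, using the C++ standard library rather than anything from the HLSL or DXIL code paths: `std::round` rounds halfway cases away from zero, while `std::nearbyint` under the default `FE_TONEAREST` rounding mode rounds them to the nearest even value, which matches roundeven behavior. The wrapper names are illustrative, and the second function assumes the rounding mode has not been changed.

```cpp
#include <cmath>

// Ties away from zero: 2.5 -> 3.0, -2.5 -> -3.0.
double round_half_away(double x) { return std::round(x); }

// Ties to even, assuming the default FE_TONEAREST rounding mode:
// 2.5 -> 2.0, 3.5 -> 4.0.
double round_half_even(double x) { return std::nearbyint(x); }
```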
Shilei Tian
661bb9daae
[GlobalISel] Handle div-by-pow2 (#83155)
This patch adds similar handling of div-by-pow2 as in `SelectionDAG`.
2024-03-29 12:41:47 -04:00
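As background for 661bb9daae, the usual way to divide a signed integer by a power of two without a divide instruction is to add a bias to negative inputs and then shift. The sketch below shows that well-known sequence for 32-bit values; it is not taken from the patch, and the function name and the restriction to `1 <= k <= 31` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Signed division of x by 2^k, rounding toward zero like C++ `/`.
std::int32_t sdiv_pow2(std::int32_t x, unsigned k) {
    assert(k >= 1 && k <= 31 && "shift amount out of range for this sketch");
    // All-ones when x is negative, zero otherwise (the "sign splat").
    std::uint32_t sign = x < 0 ? 0xFFFFFFFFu : 0u;
    // Logical shift leaves 2^k - 1 for negative x and 0 for non-negative x.
    std::uint32_t bias = sign >> (32u - k);
    // Widen so the biased value cannot overflow.
    std::int64_t biased = static_cast<std::int64_t>(x) + static_cast<std::int64_t>(bias);
    // Arithmetic right shift (guaranteed since C++20; mainstream compilers
    // behave this way for earlier standards as well).
    return static_cast<std::int32_t>(biased >> k);
}
```

Without the bias, an arithmetic shift alone would round toward negative infinity, so `-7 >> 1` would give `-4` instead of `-3`.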
Marc Auberer
d3bc9cc99b
[AArch64][GISEL] Regenerate select tests with inline register classes (#87013)
Use inline register class syntax for select test file.
2024-03-29 15:45:06 +01:00
Luke Lau
3f69d90351
[RISCV] Add missing RISCVMaskedPseudo for TIED pseudos (#86787)
This was preventing us from folding away the vmerge into its mask.
2024-03-29 22:21:22 +08:00
Thorsten Schütt
84299df301
[GlobalIsel] add trunc flags (#87045)
https://github.com/llvm/llvm-project/pull/85592
2024-03-29 13:38:08 +01:00
Luke Lau
76ba3c8e64 [RISCV] Add test case for vmerge fold for tied pseudos with rounding mode. NFC 2024-03-29 19:47:09 +08:00
Luke Lau
2a315d800b
[RISCV] Combine (or disjoint ext, ext) -> vwadd (#86929)
DAGCombiner (or InstCombine) will convert an add to an or if the bits
are disjoint, which can prevent what was originally an (add {s,z}ext,
{s,z}ext) from being selected as a vwadd.

This teaches combineBinOp_VLToVWBinOp_VL to recover it by treating it as
an add.
2024-03-29 19:45:24 +08:00
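The equivalence that 2a315d800b relies on can be checked directly: when two operands share no set bits, no carries occur, so `or` and `add` give the same result. The helper names below are illustrative only.

```cpp
#include <cstdint>

// True when a and b have no set bits in common (the "disjoint" condition).
bool no_common_bits(std::uint32_t a, std::uint32_t b) {
    return (a & b) == 0;
}

// True when OR and ADD agree for this pair of operands.
bool or_equals_add(std::uint32_t a, std::uint32_t b) {
    return (a | b) == (a + b);
}
```

For example, `0xF0` and `0x0F` are disjoint and both operations yield `0xFF`, whereas `1` and `1` overlap: `1 | 1` is `1` but `1 + 1` is `2`.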
Luke Lau
131be5de90 [RISCV] Add more disjoint or tests for vwadd[u].{w,v}v. NFC 2024-03-29 19:11:26 +08:00
Wang Pengcheng
610b9e23c5
[SDAG] Use shifts if ISD::MUL is illegal when lowering ISD::CTPOP (#86505)
We can avoid libcalls.

Fixes #86205
2024-03-29 15:38:39 +08:00
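To illustrate the idea in 610b9e23c5, here is the classic bit-parallel population count in two variants. The first finishes with a multiply that sums the per-byte counts; the second replaces that multiply with shifts and adds. This is a sketch of the general technique for 32-bit values, not the exact node sequence the patch emits, and the function names are assumptions.

```cpp
#include <cstdint>

// Shared prefix: after this, each byte of x holds the popcount of that byte.
static std::uint32_t per_byte_counts(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    return (x + (x >> 4)) & 0x0F0F0F0Fu;
}

// Final step using a multiply to sum the four byte counts into the top byte.
unsigned ctpop32_mul(std::uint32_t x) {
    return (per_byte_counts(x) * 0x01010101u) >> 24;
}

// Final step using only shifts and adds, for targets without a legal multiply.
unsigned ctpop32_shift(std::uint32_t x) {
    x = per_byte_counts(x);
    x += x >> 8;    // low byte now holds bytes 0 + 1
    x += x >> 16;   // low byte now holds bytes 0 + 1 + 2 + 3
    return x & 0x3Fu;  // the total is at most 32, so six bits suffice
}
```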
Sudharsan Veeravalli
e005a09df5
[RISCV][TypePromotion] Dont generate truncs if PromotedType is greater than Source Type (#86941)
We currently check if the source and promoted types are not equal before
generating truncate instructions. This does not work for RV64 where the
promoted type is i64 and this led to a crash due to the generation of
truncate instructions from i32 to i64.

Fixes #86400
2024-03-28 21:22:05 -07:00
Philip Reames
9ea0396f16
[RISCV] Extend pattern matches involving shNadd to support disjoint or (#87001)
I tried to add representative tests while not duplicating complete
coverage. If there are other tests you'd like to see, let me know.
2024-03-28 16:34:04 -07:00
Marc Auberer
c482fad2c1
[AArch64][GISEL] Consider fcmp true and fcmp false in cond code selection (#86972)
Fixes #86917

`FCMP_TRUE` and `FCMP_FALSE` were previously not considered and we ended
up in an llvm_unreachable assertion.
2024-03-28 23:08:38 +01:00
Luke Lau
a3c2d8c072
[RISCV] Combine ({s,u}{div,rem} (zext, zext)) -> (zext ({s,u}{div,rem} (zext, zext))) (#86779)
This narrows unsigned and signed div and rem nodes via
combineBinOpOfZExt.

Unlike other binary ops, there are no widening div or rem instructions.
So we will end up with an extra vzext.vf2.

However I'm assuming that div/rem are expensive enough that by reducing
their EMUL we will gain back the cost.

Alive2 proof: https://alive2.llvm.org/ce/z/Et_L6y
2024-03-29 05:55:38 +08:00
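The narrowing in a3c2d8c072 depends on division and remainder of zero-extended operands never producing a result wider than the original element type. A scalar sketch with 8-bit operands widened to 16 bits follows; the function names are hypothetical and division by zero is excluded by assertion.

```cpp
#include <cassert>
#include <cstdint>

// Divide after zero-extending both operands to 16 bits.
std::uint16_t udiv_wide(std::uint8_t a, std::uint8_t b) {
    assert(b != 0 && "division by zero is excluded from this sketch");
    return static_cast<std::uint16_t>(static_cast<std::uint16_t>(a) /
                                      static_cast<std::uint16_t>(b));
}

// Divide in 8 bits, then zero-extend the quotient.
std::uint16_t udiv_narrow_then_zext(std::uint8_t a, std::uint8_t b) {
    assert(b != 0 && "division by zero is excluded from this sketch");
    std::uint8_t q = static_cast<std::uint8_t>(a / b);
    return static_cast<std::uint16_t>(q);
}

// The same holds for remainder.
std::uint16_t urem_narrow_then_zext(std::uint8_t a, std::uint8_t b) {
    assert(b != 0 && "division by zero is excluded from this sketch");
    std::uint8_t r = static_cast<std::uint8_t>(a % b);
    return static_cast<std::uint16_t>(r);
}
```

Because zero-extended values are non-negative in the wider type, the signed and unsigned forms of the wide operation agree, which is why the combine can cover both.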
Craig Topper
23d45e55ed
[MCP] Remove dead copies from basic blocks with successors. (#86973)
Previously we wouldn't remove dead copies from basic blocks with
successors. The comment said we didn't want to trust the live-in lists.
The comment is very old so I'm not sure if that's still a concern today.

This patch checks the live-in lists and removes copies from
MaybeDeadCopies if they are referenced by any live-ins in any
successors. We only do this if the tracksLiveness property is set. If
that property is not set, we retain the old behavior.
2024-03-28 14:43:49 -07:00
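The filtering rule described in 23d45e55ed can be sketched with plain sets. Everything here is a simplified model: registers are bare integers, sub- and super-register aliasing is ignored, and `removableCopies` and its parameters are hypothetical names rather than anything in the pass.

```cpp
#include <set>
#include <vector>

using Reg = unsigned;

// Returns the subset of candidate dead copies (keyed by destination register)
// that may be deleted at the end of a block.
std::set<Reg> removableCopies(const std::set<Reg>& maybeDeadCopies,
                              const std::vector<std::set<Reg>>& successorLiveIns,
                              bool tracksLiveness) {
    // No successors: every candidate can be removed, as before the patch.
    if (successorLiveIns.empty()) return maybeDeadCopies;
    // Successors but no reliable liveness: keep the old, conservative behavior.
    if (!tracksLiveness) return {};
    // Otherwise drop any candidate that some successor lists as live-in.
    std::set<Reg> result = maybeDeadCopies;
    for (const auto& liveIns : successorLiveIns)
        for (Reg r : liveIns) result.erase(r);
    return result;
}
```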
Helena Kotas
62d6beba97
[DXIL] Add lowering for reversebits and trunc (#86909)
Add lowering of `llvm.bitreverse` and `llvm.trunc` intrinsics to DXIL
ops.

Fixes #86582
Fixes #86581
2024-03-28 17:41:33 -04:00
Zaara Syeda
6582509daa
[AIX] Handle toc-data offset overflowing 16-bits (#80092)
When the toc-data offset overflows 16 bits, we can truncate the
value to 16 bits, as the linker will handle the overflow through
fixup code.
2024-03-28 13:55:13 -04:00
Jonas Paulsson
16b7cc69ef
[SystemZ] Eliminate call sequence instructions early. (#77812)
On SystemZ, the outgoing argument area which is big enough for all calls
in the function is created once during the prolog, as opposed to
adjusting the stack around each call. The call-sequence instructions are
therefore no longer useful except to compute the maximum call
frame size, which has so far been done by PEI, but can just as well be
done at an earlier point.

This patch removes the mapping of the CallFrameSetupOpcode and
CallFrameDestroyOpcode and instead computes the MaxCallFrameSize
directly after instruction selection and then removes the ADJCALLSTACK
pseudos. This removes the confusing pseudos and also avoids the problem
of having to keep the call frame size accurate when creating new MBBs.

This fixes #76618 which exposed the need to maintain the call frame size
when splitting blocks (which was not done).
2024-03-28 18:26:38 +01:00