52796 Commits

Author SHA1 Message Date
Simon Pilgrim
170c525d79 [X86] combineExtractVectorElt - fold extract(trunc(x),c) -> trunc(extract(x,c)) 2024-04-08 11:01:19 +01:00
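The fold named in this commit title relies on truncation being lane-wise, so extracting a lane before or after truncating gives the same scalar. A minimal standalone sketch, assuming a 4-lane vector modelled with `std::array`; the type and function names below are hypothetical and are not LLVM APIs:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane vector types used only for this illustration.
using V4i32 = std::array<std::uint32_t, 4>;
using V4i16 = std::array<std::uint16_t, 4>;

// extract(trunc(x), c): truncate every lane, then pick lane c.
std::uint16_t extractOfTrunc(const V4i32 &x, std::size_t c) {
  assert(c < x.size() && "lane index must be in bounds");
  V4i16 t{};
  for (std::size_t i = 0; i < x.size(); ++i)
    t[i] = static_cast<std::uint16_t>(x[i]); // keeps the low 16 bits
  return t[c];
}

// trunc(extract(x, c)): pick lane c, then truncate a single scalar.
std::uint16_t truncOfExtract(const V4i32 &x, std::size_t c) {
  assert(c < x.size() && "lane index must be in bounds");
  return static_cast<std::uint16_t>(x[c]);
}
```

Both functions return the same value for every in-bounds lane index; the second form touches only one element.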
Pengcheng Wang
364028a1a5
[RISCV] Zimop/Zcmop are ratified
Remove them from experimental.

See also:
https://github.com/riscv/riscv-isa-manual/blob/main/src/zimop.adoc

Reviewers: kito-cheng

Reviewed By: kito-cheng

Pull Request: https://github.com/llvm/llvm-project/pull/87966
2024-04-08 16:40:02 +08:00
David Green
9fd2e2c2fd
[DAG][AArch64] Support masked loads/stores with nontemporal flags (#87608)
SVE has some non-temporal masked loads and stores. The metadata coming
from the nodes is not copied to the MMO at the moment though, meaning it
will generate a normal instruction. This patch ensures that the right
flags are set if the instruction has non-temporal metadata.
2024-04-08 08:53:27 +01:00
David Green
ac321cbb03
[AArch64][GlobalISel] Legalize Insert vector element (#81453)
This attempts to standardize and extend some of the insert vector
element lowering. Most notably:
- More types are handled by splitting illegal vectors.
- The index type for G_INSERT_VECTOR_ELT is canonicalized to
  TLI.getVectorIdxTy(), similar to extract_vector_element.
- Some of the existing patterns now have the index type specified to
  make sure they can apply to GISel too.
- The C++ selection code has been removed, relying on tablegen patterns.
- G_INSERT_VECTOR_ELT with small GPR input elements are pre-selected to
  use a i32 type, allowing the existing patterns to apply.
- Variable index inserts are lowered in post-legalizer lowering,
  expanding into a stack store and reload.
2024-04-08 08:44:13 +01:00
Bevin Hansson
110c22fe12
[ExpandLargeFpConvert] Support bfloat. (#87619)
The conversion expansions did not properly handle bfloat types.

I'm not certain that these expansions are completely correct;
I don't have any experience with AMDGPU or the ability to run
anything to test it.

Note that it doesn't seem like AMDGPU with GlobalISel can
handle fptrunc of float to bfloat, which is needed for itofp.
I've omitted the GISEL run for the bfloat case.

This fixes #85379.
2024-04-08 09:07:55 +02:00
Pengcheng Wang
f3b5597364
[RISCV] Use larger copies when register tuples are aligned
When the encoding of a register tuple is aligned, we can use a copy
with a larger LMUL to reduce the number of copies.

Reviewers: preames, topperc, lukel97

Reviewed By: topperc, lukel97

Pull Request: https://github.com/llvm/llvm-project/pull/84455
2024-04-08 13:24:57 +08:00
Haohai Wen
cebf77fb93
[CodeGen][DebugInfo] Add missing DebugLoc for SplitCriticalEdge (#72192)
In SplitCriticalEdge, the DebugLoc of the branch instruction in the newly
created MBB was left empty. It should be set, and in most cases a proper
DebugLoc can be found for it. This patch sets it to the non-empty merged
DebugLoc of the current MBB's branches.
2024-04-08 09:44:34 +08:00
Philip Reames
da675b922c [RISCV] Expand test coverage of stack offsets between 2^11 and 2^15
Adds two sets of tests.  First, one for prolog/epilogue insertions where
the second stack adjustment can be done with shNadd for zba.  Second, a
set of tests with offsets off SP in the same ranges, but also adding
varying alignments.
2024-04-07 15:22:25 -07:00
Jianjian Guan
bc8726b16b
[RISCV] Support codegen of vfmv.v.f for bfloat vector with both Zvfbfmin and Zfbfmin (#87318)
vfmv and vfmerge should support bfloat vectors when both Zvfbfmin and
Zfbfmin are available; this patch adds support for vfmv first.
2024-04-07 10:41:47 +08:00
AtariDreams
8389b3bf60
[X86] Fix typo: QWORD alignment is greater than or equal to 8, not greater than 8 (#87819)
Align(8) is QWORD aligned, but this was checking to see if alignment was
greater than that, when it should have been checking for being greater
than OR EQUAL to Align(8).

This bug was introduced in
https://github.com/llvm/llvm-project/commit/6a6af30d433d7 during the
transition to the Align type.
2024-04-07 08:43:13 +08:00
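The off-by-one described above can be reproduced outside LLVM. A minimal sketch, assuming alignments are plain byte counts; the helper names are hypothetical and stand in for the comparison against `Align(8)`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: alignments are powers of two.
constexpr bool isPowerOfTwo(std::uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// The check as described before the fix: 8-byte alignment is rejected.
bool isQwordAlignedBuggy(std::uint64_t alignment) {
  assert(isPowerOfTwo(alignment) && "alignment must be a power of two");
  return alignment > 8;
}

// The corrected check: QWORD alignment means at least 8 bytes.
bool isQwordAligned(std::uint64_t alignment) {
  assert(isPowerOfTwo(alignment) && "alignment must be a power of two");
  return alignment >= 8;
}
```

The two versions differ only for an alignment of exactly 8, which is the QWORD case the check was meant to accept.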
darkbuck
8e98435ae9
[GISel][Combine] Enhance combining on G_BUILD_VECTOR
Reviewers: aemerson, arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/87831
2024-04-06 18:33:01 -04:00
Sizov Nikita
d38bff460a
[AArch64] SimplifyDemandedBitsForTargetNode - add AArch64ISD::BICi handling (#76644)
Fold BICi if all destination bits are already known to be zeroes

```llvm
define <8 x i16> @haddu_known(<8 x i8> %a0, <8 x i8> %a1) {
  %x0 = zext <8 x i8> %a0 to <8 x i16>
  %x1 = zext <8 x i8> %a1 to <8 x i16>
  %hadd = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %x0, <8 x i16> %x1)
  %res = and <8 x i16> %hadd, <i16 511, i16 511, i16 511, i16 511,i16 511, i16 511, i16 511, i16 511>
  ret <8 x i16> %res
}
declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)
```

```
haddu_known:                            // @haddu_known
        ushll   v0.8h, v0.8b, #0
        ushll   v1.8h, v1.8b, #0
        uhadd   v0.8h, v0.8h, v1.8h
        bic     v0.8h, #254, lsl #8 <-- this one will be removed as we know high bits are zero extended
        ret
```

Fixes #53881
Fixes #53622
2024-04-06 21:41:24 +01:00
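The reasoning behind dropping the `bic` can be sketched with a known-zero mask per 16-bit lane. This is a standalone model, not the `SimplifyDemandedBitsForTargetNode` implementation; all names are hypothetical, and the masks (`0xFF00` known zero after the zero-extends, `0xFE00` cleared by the `bic`) are taken from the example above:

```cpp
#include <cassert>
#include <cstdint>

// Lane semantics of uhadd: (a + b) >> 1, computed without overflow.
std::uint16_t uhadd(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>((std::uint32_t{a} + std::uint32_t{b}) >> 1);
}

// BIC with an immediate: clear every bit that is set in clearMask.
std::uint16_t bic(std::uint16_t v, std::uint16_t clearMask) {
  return static_cast<std::uint16_t>(v & ~clearMask);
}

// The BIC is a no-op when every bit it clears is already known to be zero.
bool bicIsRedundant(std::uint16_t knownZero, std::uint16_t clearMask) {
  return (clearMask & ~knownZero) == 0;
}

// Drop the BIC when it is provably redundant, otherwise apply it.
std::uint16_t foldBic(std::uint16_t v, std::uint16_t knownZero,
                      std::uint16_t clearMask) {
  assert((v & knownZero) == 0 && "value must honour the known-zero mask");
  return bicIsRedundant(knownZero, clearMask) ? v : bic(v, clearMask);
}
```

With both inputs zero-extended from 8 bits, the halving add never exceeds 255, so the top byte is known zero and `bicIsRedundant(0xFF00, 0xFE00)` holds.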
Matt Arsenault
4cb110a84f
[RFC] IR: Support atomicrmw FP ops with vector types (#86796)
Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of
floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x
bfloat> on some targets and address spaces.

Note this only supports the proper floating-point operations; float
vector typed xchg is still not supported. cmpxchg still only supports
integers, so this inserts bitcasts for the loop expansion.

I have support for fp vector typed xchg, and vector of int/ptr
separately implemented but I don't have an immediate need for those
beyond feature consistency.
2024-04-06 15:27:45 -04:00
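The "bitcasts for the loop expansion" mentioned above follow a familiar pattern: because compare-exchange works on integers, the floating-point value is reinterpreted as an integer for the exchange and back again for the arithmetic. A rough sketch of that pattern for a single `float`, using `std::atomic` rather than LLVM IR; the function names are hypothetical, and a `<2 x half>` vector packed into 32 bits would go through the same loop:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// The "bitcast" step: reinterpret the bits without changing them.
std::uint32_t toBits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

float fromBits(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Atomic floating-point add built from an integer compare-exchange loop.
// Returns the value stored before the update, like atomicrmw does.
float atomicFAddViaCmpXchg(std::atomic<std::uint32_t> &cell, float operand) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");
  assert(cell.is_lock_free() && "sketch assumes a lock-free 32-bit atomic");
  std::uint32_t oldBits = cell.load();
  for (;;) {
    const float updated = fromBits(oldBits) + operand;
    // On failure, oldBits is refreshed with the current contents and we retry.
    if (cell.compare_exchange_weak(oldBits, toBits(updated)))
      return fromBits(oldBits);
  }
}
```

The loop only ever compares and exchanges integer bit patterns; the floating-point add happens on a private copy between attempts.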
Amara Emerson
60fc4ac67a [GlobalISel] Don't form anyextending atomic loads.
Until we can reliably check the legality and improve our selection of these,
don't form them at all.
2024-04-05 13:34:59 -07:00
Craig Topper
4abb722ffa [RISCV] Add tests for opportunities to reassociate to form more shXadd instructions. NFC
These tests consist of patterns like (sh3add Z, (add X, (slli Y, 6)))
that can be reassociated to form (sh3add (sh3add Y, Z), X).
2024-04-05 12:50:48 -07:00
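The reassociation is a plain arithmetic identity modulo 2^64, which can be checked directly. A small sketch, assuming the Zba definition `sh3add rs1, rs2 = (rs1 << 3) + rs2`; the function names other than `sh3add` are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Zba sh3add: (rs1 << 3) + rs2, with 64-bit wrap-around.
constexpr std::uint64_t sh3add(std::uint64_t rs1, std::uint64_t rs2) {
  return (rs1 << 3) + rs2;
}

// Pattern from the tests: (sh3add Z, (add X, (slli Y, 6))).
constexpr std::uint64_t original(std::uint64_t x, std::uint64_t y, std::uint64_t z) {
  return sh3add(z, x + (y << 6));
}

// Reassociated form: (sh3add (sh3add Y, Z), X).
constexpr std::uint64_t reassociated(std::uint64_t x, std::uint64_t y, std::uint64_t z) {
  return sh3add(sh3add(y, z), x);
}

// Spot check, callable from a test or from main().
void checkReassociation(std::uint64_t x, std::uint64_t y, std::uint64_t z) {
  assert(original(x, y, z) == reassociated(x, y, z));
}
```

Both forms compute `(Y << 6) + (Z << 3) + X`; the reassociated one replaces the `slli` and `add` with a second `sh3add`.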
Craig Topper
0a6a40d62e [RISCV] Add Zca predicate to BrccCompressOpt patterns used for MinSize.
Previously we only checked for C.
2024-04-05 12:39:39 -07:00
Craig Topper
e7e78274a6 [RISCV] Remove uses of sed from compress-opt-branch.ll. NFC
sed was being used to use the same test functions with eq/ne branch
condition.

This commit duplicates the test functions so that we have a version
with each condition. This allows us to remove 2 RUN lines.

I plan to add Zca testing to this file, which now requires 1 new
RUN line instead of 2.
2024-04-05 12:35:46 -07:00
Craig Topper
3c37f926a1 [RISCV] Fix comment in compress-opt-branch.ll to match description. NFC
Test description says constant does not fit in 12 bits, but the constant
used was -2048 which does fit in 12 bits. Update to -2049.

Also remove uses of -NOT in favor of positive checks. One of the -NOT
should have been using RESBROPT instead of "c.beqz" so that it would
check for the absence of the correct instruction based on the sed
replacement on the RUN line.
2024-04-05 11:52:46 -07:00
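The constant change from -2048 to -2049 follows from the range of a 12-bit two's-complement immediate, which is [-2048, 2047]. A minimal range check, with a hypothetical helper name (this is not LLVM's `isInt`):

```cpp
#include <cassert>
#include <cstdint>

// True when `value` fits in a `bits`-wide two's-complement immediate.
bool fitsSigned(std::int64_t value, unsigned bits) {
  assert(bits > 0 && bits < 64 && "width must be in 1..63");
  const std::int64_t limit = std::int64_t{1} << (bits - 1);
  return value >= -limit && value < limit;
}
```

`fitsSigned(-2048, 12)` is true while `fitsSigned(-2049, 12)` is false, so only the latter matches a test description that says the constant does not fit in 12 bits.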
Simon Pilgrim
b861e2736a [X86] pr45995.ll - add nounwind to silence cfi noise 2024-04-05 16:36:35 +01:00
Simon Pilgrim
6a6335fa39 [X86] bool-vector.ll - add nounwind to silence cfi noise 2024-04-05 16:36:34 +01:00
Michael Liao
a1b2f0cc44 Reland "[GlobalISel] Fix the infinite loop issue in commute_int_constant_to_rhs"
- That test needs to disable combine rules by name and hence requires `asserts`.
2024-04-05 10:34:12 -04:00
AtariDreams
c5d000b1a8
[Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' (#81908)
Consider the following:

        ldr     r0, [r4]
        ldr     r7, [r0, #4]
        cmp     r7, r3
        bhi     .LBB0_6
        cmp     r0, r2
        push    {r0}
        pop     {r4}
        bne     .LBB0_3
        movs    r0, r6
        pop     {r4, r5, r6, r7}
        pop     {r1}
        bx      r1

This is a snippet of the THUMB1 code that clang currently generates for
the K&R malloc function.

The push {r0} is immediately undone by the pop {r4}; the pair only copies
r0 into r4.

A movs r4, r0 would destroy the flags set by the cmp right above.

The compiler has only one alternative in this case: transfer the value
through a high register.

However, LLVM does not seem to consider this a valid approach, even
though clobbering a high register is free.

This patch addresses the FIXME so the compiler can do that when r10,
r11, or r12 is available.
2024-04-05 10:18:22 +01:00
Koakuma
697dd93ae3
[SPARC] Implement L and H inline asm argument modifiers (#87259)
This adds support for using the L and H argument modifiers for twinword
operands in inline asm code, such as in:

```
%1 = tail call i64 asm sideeffect "rd %pc, ${0:L} ; srlx ${0:L}, 32, ${0:H}", "={o4}"()
```

This is needed by the Linux kernel.
2024-04-05 04:34:07 +07:00
Victor Campos
74373c1bef
Revert "[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries" (#87699)
Reverts llvm/llvm-project#79173

The testcase fails in non-asserts builds.
2024-04-04 21:29:21 +01:00
Eli Friedman
c83f23d6ab
[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894)
The existing heuristics were assuming that every core behaves like an
Apple A7, where any extend/shift costs an extra micro-op... but in
reality, nothing else behaves like that.

On some older Cortex designs, shifts by 1 or 4 cost extra, but all other
shifts/extensions are free. On all other cores, as far as I can tell,
all shifts/extensions for integer loads are free (i.e. the same cost as
an unshifted load).

To reflect this, this patch:

- Enables aggressive folding of shifts into loads by default.

- Removes the old AddrLSLFast feature, since it applies to everything
except A7 (and even if you are explicitly targeting A7, we want to
assume extensions are free because the code will almost always run on a
newer core).

- Adds a new feature AddrLSLSlow14 that applies specifically to the
Cortex cores where shifts by 1 or 4 cost extra.

I didn't add support for AddrLSLSlow14 on the GlobalISel side because it
would require a bunch of refactoring to work correctly. Someone can pick
this up as a followup.
2024-04-04 11:25:44 -07:00
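The cost model described in this message reduces to a small predicate. The sketch below is only an illustration of that description, not the code in the AArch64 backend: `CoreTraits`, its field, and `shiftFoldIsFree` are hypothetical names, and the shift range 0..4 is an assumption based on the access sizes of scaled register-offset loads.

```cpp
#include <cassert>

// Hypothetical model of a subtarget; addrLSLSlow14 mirrors the new feature.
struct CoreTraits {
  bool addrLSLSlow14; // shifts by 1 or 4 cost extra on this core
};

// Returns true when folding "lsl #shiftAmount" into a load/store address
// costs the same as an unshifted access, per the commit description.
bool shiftFoldIsFree(const CoreTraits &core, unsigned shiftAmount) {
  assert(shiftAmount <= 4 && "sketch assumes scaled-offset shifts of 0..4");
  if (core.addrLSLSlow14 && (shiftAmount == 1 || shiftAmount == 4))
    return false;
  return true;
}
```

On a core without the feature every shift amount is treated as free, which corresponds to enabling aggressive folding by default.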
Daniil Kovalev
d97d560fbf
[AArch64][PAC][MC][ELF] Support PAuth ABI compatibility tag (#85236)
Depends on #87545

Emit `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` property in
`.note.gnu.property` section depending on
`aarch64-elf-pauthabi-platform` and `aarch64-elf-pauthabi-version` llvm
module flags.
2024-04-04 21:05:03 +03:00
Gulfem Savrun Yeniceri
be8fd86f6a Revert "[GlobalISel] Fix the infinite loop issue in commute_int_constant_to_rhs"
This reverts commit 1f01c580444ea2daef67f95ffc5fde2de5a37cec
because combine-commute-int-const-lhs.mir test failed in
multiple builders.
https://lab.llvm.org/buildbot/#/builders/124/builds/10375
https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8751607530180046481/overview
2024-04-04 16:39:31 +00:00
Craig Topper
51f1cb5355
[X86] Add or_is_add patterns for INC. (#87584)
Should fix the cases noted in #86857
2024-04-04 08:04:21 -07:00
Piotr Sobczak
5b59ae423a
[DAG] Preserve NUW when reassociating (#87621)
Similarly to the generic case below, preserve the NUW flag when
reassociating adds with constants.
2024-04-04 16:47:25 +02:00
Simon Pilgrim
c1742525d0 [X86] evex-to-vex-compress.mir - update test checks missed in #87636 2024-04-04 15:42:29 +01:00
Victor Campos
5ad320abe3
[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries (#79173)
Following https://github.com/llvm/llvm-project/pull/68313 this patch
extends the idea to M-profile PACBTI.

The Machine Scheduler can reorder instructions within a scheduling
region depending on the scheduling policy set. If a BTI-clearing
instruction happens to partake in one such region, it might be moved
around, therefore ending up where it shouldn't.

The solution is to mark all BTI-clearing instructions as scheduling
region boundaries. This essentially means that they must not be part of
any scheduling region, and as a consequence never get moved:

 - PAC
 - PACBTI
 - BTI
 - SG

Note that PAC isn't BTI-clearing, but it's replaced by PACBTI late in
the compilation pipeline.

As far as I know, currently it isn't possible to organically obtain code
that's susceptible to the bug:

- Instructions that write to SP are region boundaries. PAC seems to
always be followed by the pushing of r12 to the stack, so essentially
PAC is always by itself in a scheduling region.
- CALL_BTI is expanded into a machine instruction bundle. Bundles are
unpacked only after the last machine scheduler run. Thus setjmp and BTI
can be separated only if someone deliberately runs the scheduler once
more.
- The BTI insertion pass is run late in the pipeline, only after the
last machine scheduling has run. So once again it can be reordered only
if someone deliberately runs the scheduler again.

Nevertheless, one can reasonably argue that we should prevent the bug in
spite of the compiler not being able to produce the required conditions
for it. If things change, the compiler will be robust against this
issue.

The tests written for this are contrived: bogus MIR instructions have
been added adjacent to the BTI-clearing instructions in order to have
them inside non-trivial scheduling regions.
2024-04-04 12:44:32 +01:00
Luke Lau
4e0b8eae4c [RISCV] Add tests for vwsll for extends > .vf2. NFC
These cannot be picked up by TableGen patterns alone and need to be handled
by combineBinOp_VLToVWBinOp_VL
2024-04-04 18:43:15 +08:00
Simon Pilgrim
2d0087424f
[DAG] Remove extract_vector_elt(freeze(x)), idx -> freeze(extract_vector_elt(x), idx) fold (#87480)
Reverse the fold with handling inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds.

This exposed a number of regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve demandedelts support to handle its undef upper elements.

There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads?

Fixes #86968
2024-04-04 11:10:55 +01:00
Jay Foad
3cf539fb04
[AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539)
Call generateWaitcnt unconditionally at the end of
SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to
generate a new waitcnt instruction it has the effect of combining or
removing redundant waitcnts that were already present. Tests show
various small improvements in waitcnt placement.
2024-04-04 10:14:16 +01:00
Vyacheslav Levytskyy
47e996d89d
[SPIR-V] Fix OpVariable instructions place in a function (#87554)
This PR:
* fixes the placement of OpVariable instructions in a function (see
https://github.com/llvm/llvm-project/issues/66261),
* improves type inference,
* helps avoid unneeded bitcasts when validating function call's

This allows improving existing test cases and adding new ones with
stricter checks. The OpVariable fix refers to the "All OpVariable
instructions in a function must be the first instructions in the first
block" requirement from the SPIR-V spec.
2024-04-04 10:50:35 +02:00
Luke Lau
3a7b5223a6
[DAGCombiner][RISCV] Handle truncating splats in isNeutralConstant (#87338)
On RV64, we legalize zexts of i1s to (vselect m, (splat_vector i64 1),
(splat_vector i64 0)), where the splat_vectors are implicitly
truncating.

When the vselect is used by a binop we want to pull the vselect out via
foldSelectWithIdentityConstant. But because vectors with an element size
< i64 will truncate, isNeutralConstant will return false.

This patch handles truncating splats by getting the APInt value and
truncating it. We almost don't need to do this since most of the neutral
elements are either one/zero/all ones, but it will make a difference for
smax and smin.

I wasn't able to figure out a way to write the tests in terms of select,
since we need the i1 zext legalization to create a truncating
splat_vector.

This supersedes #87236. Fixed vectors are unfortunately not handled by
this patch (since they get legalized to _VL nodes), but they don't seem
to appear in the wild.
2024-04-04 12:36:15 +08:00
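The truncation step matters because the same 64-bit splat operand can be neutral at one element width and not at another. A standalone sketch of the idea using plain integers instead of `APInt`; the enum, the function names, and the set of operations are hypothetical and cover only a few cases:

```cpp
#include <cassert>
#include <cstdint>

enum class Op { Add, Mul, And, SMax, SMin }; // hypothetical subset of binops

// Keep only the low `bits` bits, as an implicitly truncating splat would.
std::uint64_t truncToWidth(std::uint64_t v, unsigned bits) {
  assert(bits >= 1 && bits <= 64 && "element width must be in 1..64");
  return bits == 64 ? v : v & ((std::uint64_t{1} << bits) - 1);
}

// Bit pattern of the neutral element of `op` at the given element width.
std::uint64_t neutralBits(Op op, unsigned bits) {
  const std::uint64_t allOnes = truncToWidth(~std::uint64_t{0}, bits);
  switch (op) {
  case Op::Add:  return 0;
  case Op::Mul:  return 1;
  case Op::And:  return allOnes;
  case Op::SMax: return std::uint64_t{1} << (bits - 1); // signed minimum
  case Op::SMin: return allOnes >> 1;                   // signed maximum
  }
  return 0;
}

// Compare after truncating the wide splat operand to the element width.
bool isNeutralSplat(Op op, std::uint64_t splatOperand, unsigned elementBits) {
  return truncToWidth(splatOperand, elementBits) == neutralBits(op, elementBits);
}
```

For `smax` on 8-bit elements, a 64-bit operand of -128 truncates to `0x80`, the signed minimum, so it is recognised as neutral; without the truncation the comparison would fail.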
Luke Lau
07d5f49186
[RISCV] Add patterns for fixed vector vwsll (#87316)
Fixed vectors have their sext/zext operands legalized to _VL nodes, so
we need to handle them in the patterns.

This adds a riscv_ext_vl_oneuse pattern since we don't care about the
type of extension used for the shift amount, and extends
Low8BitsSplatPat to handle other _VL nodes. We don't actually need to
check the mask or VL there since none of the _VL nodes have passthru
operands.

The remaining test cases that are widening from i8->i64 need to be
handled by extending combineBinOp_VLToVWBinOp_VL.

This also fixes Low8BitsSplatPat incorrectly checking the vector size
instead of the element size to determine if the splat value might have
been truncated below 8 bits.
2024-04-04 11:30:23 +08:00
darkbuck
1f01c58044
[GlobalISel] Fix the infinite loop issue in commute_int_constant_to_rhs
- When both operands are constant, the matcher runs into an infinite
  loop as the commutation should be applied only when LHS is a constant
  and RHS is not.

Reviewers: arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/87426
2024-04-03 20:52:21 -04:00
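The termination argument can be seen in a toy rewrite loop: if the rule fires whenever the LHS is constant, two constant operands keep swapping forever, whereas requiring a non-constant RHS makes the rule fire at most once. A minimal sketch with hypothetical types, unrelated to the actual GlobalISel matcher code:

```cpp
#include <cassert>
#include <utility>

// Hypothetical operand model: either a constant or some non-constant value.
struct Operand {
  bool isConstant;
  long value;
};

// Commute only when the LHS is a constant and the RHS is not.
bool shouldCommuteConstantToRHS(const Operand &lhs, const Operand &rhs) {
  return lhs.isConstant && !rhs.isConstant;
}

// Apply the rule until it no longer matches; returns the number of rewrites.
unsigned canonicalize(Operand &lhs, Operand &rhs) {
  unsigned rewrites = 0;
  while (shouldCommuteConstantToRHS(lhs, rhs)) {
    std::swap(lhs, rhs);
    ++rewrites;
    assert(rewrites <= 1 && "rule must not fire twice on the same pair");
  }
  return rewrites;
}
```

After one swap the LHS is non-constant, so the predicate is false and the loop stops; with two constants it never fires at all.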
Michael Maitland
63c925ca80 [RISCV][GISEL] Instruction selection for G_ZEXT, G_SEXT, and G_ANYEXT with scalable vector type 2024-04-03 15:56:08 -07:00
Michael Maitland
188ca374ee [RISCV][GISEL] Regbankselect for G_ZEXT, G_SEXT, and G_ANYEXT with scalable vector type 2024-04-03 15:56:04 -07:00
Michael Maitland
35a9393a3f [RISCV][GISEL] Instruction selection for G_ICMP 2024-04-03 15:47:34 -07:00
Michael Maitland
05f673bcef [RISCV][GISEL] Regbank select for scalable vector G_ICMP 2024-04-03 15:47:34 -07:00
Michael Maitland
8aa3a77eaf [RISCV][GISEL] Legalize G_ZEXT, G_SEXT, and G_ANYEXT, G_SPLAT_VECTOR, and G_ICMP for scalable vector types
This patch legalizes G_ZEXT, G_SEXT, and G_ANYEXT. If the type is a
legal mask type, then the instruction is legalized as the element-wise
select, where the condition on the select is the mask typed source
operand, and the true and false values are 1 or -1 (for
zero/any-extension and sign extension) and zero. If the type is a legal integer
or vector integer type, then the instruction is marked as legal.

The legalization of the extends may introduce a G_SPLAT_VECTOR, which
needs to be legalized in this patch for the extend test cases to pass.

A G_SPLAT_VECTOR is legal if the vector type is a legal integer or
floating point vector type and the source operand is sXLen type. This is
because the SelectionDAG patterns only support sXLen typed
ISD::SPLAT_VECTORS, and we'd like to reuse those patterns. A
G_SPLAT_VECTOR is custom legalized if it has a legal s1 element vector
type and s1 scalar operand. It is legalized to G_VMSET_VL or G_VMCLR_VL
if the splat is all ones or all zeros respectively. In the case of a
non-constant mask splat, we legalize by promoting the scalar value to
s8.

In order to get the s8 element vector back into s1 vector, we use a
G_ICMP. In order for the splat vector and extend tests to pass, we also
need to legalize G_ICMP in this patch.

A G_ICMP is legal if the destination type is a legal bool vector and the LHS and
RHS are legal integer vector types.
2024-04-03 15:27:15 -07:00
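The mask-extension rule described above (an element-wise select between a "true" value of 1 or -1 and zero) can be sketched on ordinary containers. This is an illustration of the semantics only, with hypothetical names; the legalizer itself operates on generic MIR, not on `std::vector`:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class ExtKind { Zero, Any, Sign }; // G_ZEXT, G_ANYEXT, G_SEXT

// Extend a mask vector as select(mask, trueValue, 0), lane by lane.
std::vector<std::int64_t> extendMask(const std::vector<bool> &mask, ExtKind kind) {
  // Sign extension of a set bit gives -1; zero/any extension gives 1.
  const std::int64_t trueValue = (kind == ExtKind::Sign) ? -1 : 1;
  std::vector<std::int64_t> result;
  result.reserve(mask.size());
  for (bool bit : mask)
    result.push_back(bit ? trueValue : 0);
  assert(result.size() == mask.size() && "one output lane per mask lane");
  return result;
}
```

The two constant operands of the select are splats, which is why the message notes that G_SPLAT_VECTOR has to be legalized in the same patch.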
David Green
52ae02db40 [AArch64] Add a test for non-temporal masked loads / stores. NFC 2024-04-03 19:31:25 +01:00
Michael Maitland
07d3f2a8de [RISCV][GISEL] Run update_mir_test_checks on llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-xor.mir 2024-04-03 10:37:44 -07:00
Amaury Séchet
1aedf949e0 [NFC] Automatically generate indirect-branch-tracking-eh2.ll 2024-04-03 15:22:23 +00:00
Weining Lu
0f5f931a9b [CodeGen] Fix test after #86049 2024-04-03 22:28:02 +08:00
aniplcc
d650fcd6bf
[DAG] SimplifyDemandedVectorElts - add ISD::AVGCEILS/AVGCEILU/AVGFLOORS/AVGFLOORU nodes (#86284)
Fixes #84768
2024-04-03 15:00:50 +01:00
Simon Pilgrim
2bf7ddf06f [X86] Add vector truncation tests for nsw/nuw flags
Based off #85592 - our truncation -> PACKSS/PACKUS folds should be able to use the nsw/nuw flags to recognise when we don't need to mask/sext_inreg prior to the PACKSS/PACKUS nodes.
2024-04-03 13:35:55 +01:00
AinsleySnow
52b18430ae
[VP][DAGCombine] Use simplifySelect when combining vp.select. (#87342)
Hi all,

This patch is a follow-up of #79101. It migrates logic from
`visitVSELECT` to `visitVP_SELECT` to simplify `vp.select`. With this
patch we can do the following combinations:

```
vp.select undef, T, F --> T (if T is a constant), F otherwise
vp.select <condition>, undef, F --> F
vp.select <condition>, T, undef --> T
vp.select false, T, F --> F
vp.select <condition>, T, T --> T
```

I'm a total newbie to llvm and I'm sure there's room for improvements in
this patch. Please let me know if you have any advice. Thank you in
advance!
2024-04-03 07:45:50 -04:00