52796 Commits

Author SHA1 Message Date
wanglei
edd4c6c6dc
[LoongArch] Make sure that the LoongArchISD::BSTRINS node uses the correct MSB value (#84454)
The `MSB` must not be greater than `GRLen`. Without this patch, newly
added test cases will crash with LoongArch32, resulting in a 'cannot
select' error.
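The constraint can be illustrated with a minimal Python sketch. It assumes that BSTRINS replaces bits [msb:lsb] of the destination with the low (msb - lsb + 1) bits of the source, and that valid bit indices lie in [0, GRLen - 1]. `bstrins` is a hypothetical model written for illustration, not LLVM code.

```python
def bstrins(rd: int, rj: int, msb: int, lsb: int, grlen: int) -> int:
    """Hypothetical model of BSTRINS: insert rj[msb-lsb:0] into rd[msb:lsb]."""
    # Both bit indices must address a bit inside a GRLen-bit register
    # (assumed range [0, GRLen - 1]).
    if not 0 <= lsb <= msb < grlen:
        raise ValueError(f"invalid field [{msb}:{lsb}] for GRLen={grlen}")
    width = msb - lsb + 1
    field = ((1 << width) - 1) << lsb        # mask covering bits msb..lsb
    inserted = (rj << lsb) & field           # low bits of rj, moved into place
    return ((rd & ~field) | inserted) & ((1 << grlen) - 1)


# On LoongArch32, bits 31..0 is the widest field; an MSB of 32 is rejected.
print(hex(bstrins(0xFFFFFFFF, 0, 15, 8, grlen=32)))  # 0xffff00ff
```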
2024-03-11 08:59:17 +08:00
Simon Pilgrim
862c7e0218 [X86] combineAndShuffleNot - ensure the type is legal before create X86ISD::ANDNP target nodes
Fixes #84660
2024-03-10 16:23:51 +00:00
Simon Pilgrim
92d7aca441
[X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496)
Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-03-09 16:21:25 +00:00
Jay Foad
fd3eaf76ba
[GISel] Enforce G_PTR_ADD RHS type matching index size for addr space (#84352) 2024-03-09 09:07:22 +00:00
Craig Topper
6b270358c7
[SelectionDAG] Allow FREEZE to be hoisted before FP SETCC. (#84358)
Handling no-NaNs/no-infs in SelectionDAG is complicated. Hopefully I've captured
all of the cases. I've only applied ConsiderFlags to the SDNodeFlags
since those are the only ones that will be dropped by hoisting. The
condition code and TargetOptions would still be in effect.
    
Recovers some regression from #84232.
2024-03-08 17:21:21 -08:00
yingopq
755b439694
[Mips] Fix missing sign extension in expansion of sub-word atomic max (#77072)
Add sign extension "SEB/SEH" before the compare.

Fix #61881
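The failure mode can be reproduced with a simplified Python model of one iteration of the sub-word LL/SC loop. The model assumes the old byte is extracted with a shift and a mask, so it is zero-extended, while the incoming operand is already sign-extended. `atomic_max_i8` is a hypothetical helper; the real expansion operates on machine instructions.

```python
def seb(x: int) -> int:
    """Sign-extend the low 8 bits, as Mips SEB does."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def atomic_max_i8(word: int, shift: int, val: int, sign_extend: bool) -> int:
    """Signed max of the byte at `shift` in `word` and `val`; returns the new byte."""
    old = (word >> shift) & 0xFF      # shift+mask leaves the byte zero-extended
    if sign_extend:
        old = seb(old)                # the fix: compare as a signed value
    new = old if old > val else val   # stands in for the signed slt compare
    return new & 0xFF                 # byte written back into the word


# Byte 0xFF (-1) vs 1: without SEB, 255 > 1 and -1 wrongly "wins".
print(hex(atomic_max_i8(0xFF00, 8, 1, sign_extend=False)))
print(hex(atomic_max_i8(0xFF00, 8, 1, sign_extend=True)))
```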
2024-03-08 15:41:31 -05:00
David Majnemer
edc1c3d24e [AArch64] Make more vector f16 operations legal
v8f16 is a legal type but promoting to v8f32 would result in an illegal
type.

Let's legalize these by a combination of splitting+promoting resulting
in a pair of v4f16.

Also, we were being overly cautious with different v4f16 nodes. Mark
more of them safe to promote to v4f32.
2024-03-08 19:52:54 +00:00
David Majnemer
5f935e9181 [AArch64] Optimize fp64 <-> fp16 SIMD conversions
Legalization would result in needless scalarization. Add some
DAGCombines to fix this up.
2024-03-08 19:52:53 +00:00
Shilei Tian
e963d0740e
[AMDGPU] Replace isInlinableLiteral16 with specific version (#84402)
The current implementation of `isInlinableLiteral16` assumes that a 16-bit
inlinable literal is either an `i16` or an `fp16`. This is not always true
because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just
looking at the value. This patch splits `isInlinableLiteral16` into three
versions, for `i16`, `fp16`, and `bf16` respectively, and calls the
corresponding version.
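The ambiguity is easy to see in Python: the same 16-bit pattern decodes to different values depending on whether it is read as IEEE half precision or as bfloat16, which is assumed here to be the upper half of an IEEE single. The helper names are illustrative, and which values AMDGPU actually accepts as inline constants is not modeled.

```python
import struct


def as_fp16(bits: int) -> float:
    """Decode a 16-bit pattern as IEEE half precision (1/5/10 bits)."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def as_bf16(bits: int) -> float:
    """Decode a 16-bit pattern as bfloat16 (upper half of an IEEE float32)."""
    return struct.unpack("<f", (bits << 16).to_bytes(4, "little"))[0]


# 0x3C00 is 1.0 as fp16 but 2**-7 as bf16; bf16 1.0 is 0x3F80 instead.
print(as_fp16(0x3C00), as_bf16(0x3C00), as_bf16(0x3F80))  # 1.0 0.0078125 1.0
```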
2024-03-08 14:49:52 -05:00
Craig Topper
a456885efc
[SelectionDAG] Allow FREEZE to be hoisted before integer SETCC. (#84241)
Teach canCreateUndefOrPoison that ISD::SETCC with integer operands can
never create undef/poison. FP SETCC is more complicated and will be
handled in a future patch.

Teach isGuaranteedNotToBeUndefOrPoison that ISD::CONDCODE is not
poison/undef. It's a special constant only used by setcc/select_cc-like
nodes. This is needed since the hoisting will only hoist if exactly one
operand might be poison, and setcc has 3 operands, including the condition
code.
    
Recovers some regression from #84232.
2024-03-08 10:17:54 -08:00
Lukacma
2b4d8188b2
[Clang][LLVM][SVE2.1] Created intrinsics for DUPQ instr. (#83260)
This patch adds clang and llvm support for the following intrinsic and maps
it to the DUPQ instruction:
```
   // Variants are also available for:
   // _s8, _u16, _s16, _u32, _s32, _u64, _s64
   // _bf16, _f16, _f32, _f64
   svuint8_t svdup_laneq[_u8](svuint8_t zn, uint64_t imm_idx);
```
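A Python model of what the intrinsic computes, assuming DUPQ broadcasts the element at `imm_idx` within each 128-bit segment of the vector independently. The 256-bit vector length below is only an example, since the SVE vector length is implementation-defined; `dupq` is a hypothetical name.

```python
def dupq(zn: list, imm_idx: int, elem_bits: int = 8) -> list:
    """Model of DUPQ: broadcast zn[imm_idx] within every 128-bit segment."""
    per_segment = 128 // elem_bits            # 16 lanes for _u8, 2 for _u64
    if len(zn) % per_segment:
        raise ValueError("vector length must be a multiple of 128 bits")
    if not 0 <= imm_idx < per_segment:
        raise ValueError("imm_idx must select a lane within one segment")
    out = []
    for base in range(0, len(zn), per_segment):
        out.extend([zn[base + imm_idx]] * per_segment)
    return out


zn = list(range(32))                     # 32 x u8 = a 256-bit vector
print(dupq(zn, 3))                       # sixteen 3s, then sixteen 19s
```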
2024-03-08 15:35:48 +00:00
Paul Walker
bd6eb54886
[LLVM][CodeGen] Teach SelectionDAG how to expand FREM to a vector math call. (#83859)
This removes, at least when a vector library is available, a failure
case for scalable vectors. Doing so means we can confidently cost vector
FREM instructions without making an assumption that later passes will
transform the IR before it gets to the code generator.

NOTE: Whilst only FREM has been implemented the same mechanism
can be used for the other libm related ISD nodes.
2024-03-08 12:09:05 +00:00
zhongyunde 00443407
a110a1c0ed [AArch64] MachineCombiner msub matching for i64 2024-03-08 18:14:26 +08:00
zhongyunde 00443407
3a62edcf52 [AArch64] MachineCombiner msub matching
Patterns should be sorted in priority order since the pattern evaluator
stops checking as soon as it finds a faster sequence.
So for a * b - c * d, we prefer to match the second operand of the sub,
which can be folded using msub.
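As a sketch of why matching the second operand pays off: AArch64 MSUB computes Xa - Xn * Xm, so after a single MUL for a * b, the remaining c * d and the subtraction fold into one MSUB. The Python below models only the 64-bit wrapping arithmetic; the function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def mul(n: int, m: int) -> int:
    return (n * m) & MASK64                 # MUL Xd, Xn, Xm


def msub(n: int, m: int, a: int) -> int:
    return (a - n * m) & MASK64             # MSUB Xd, Xn, Xm, Xa: Xa - Xn*Xm


def a_mul_b_minus_c_mul_d(a: int, b: int, c: int, d: int) -> int:
    # Two instructions (MUL + MSUB) instead of MUL, MUL, SUB.
    return msub(c, d, mul(a, b))


print(a_mul_b_minus_c_mul_d(7, 6, 3, 4))    # 42 - 12 = 30
```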

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix https://github.com/llvm/llvm-project/issues/84152
2024-03-08 18:14:25 +08:00
Sizov Nikita
ef1eb0315e
[AArch64] Add neon bici test for haddu and shadd (#84073)
Add neon bici test for haddu and shadd, prerequisite for #76644
2024-03-08 09:45:58 +00:00
Pierre van Houtryve
4b1910b11d
[GlobalISel][AMDGPU] Import patterns with multiple defs (#84171)
Fixes #63216
2024-03-08 09:39:10 +01:00
Vyacheslav Levytskyy
fb1be9b33c
[SPIR-V] Insert a bitcast before load/store instruction to keep SPIR-V code valid (#84069)
This PR introduces a step after instruction selection where instructions
can be traversed to check their validity against the
specification. The PR also adds a way to correct
load/store when there is a type mismatch contradicting the specification:
an additional bitcast is inserted to keep types consistent.
Corresponding test cases are added and existing test cases are
corrected.

This PR makes it possible to successfully validate, with the `spirv-val` tool
(https://github.com/KhronosGroup/SPIRV-Tools), some output that
previously led to validation errors, and avoids crashes when translating
from SPIR-V back to LLVM IR with the SPIRV-LLVM Translator
(https://github.com/KhronosGroup/SPIRV-LLVM-Translator).

The added step of bringing instructions into the type correspondence
required by the specification can be (should be and will be) extended beyond
load/store instructions to ensure validity rules of other SPIR-V
instructions related to type inference.
2024-03-08 08:31:56 +01:00
Amara Emerson
f6b825f51e Revert "Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load.""
Attempt 2. The first one was trying to call isa<> on an MI reference that was freed.

This reverts commit ee24409c40ff35c3221892d9723331c233ca9f0e.
2024-03-07 23:28:33 -08:00
Fangrui Song
66bd3cd75b [AMDGPU,test] Change llc -march= to -mtriple=
PR #75982 had been created before these tests were added; therefore,
some tests were not updated.
2024-03-07 19:09:18 -08:00
Chen Zheng
cc34e56b86 [PPC][NFC] add an option to expose the bug in 74951 2024-03-07 20:52:44 -05:00
Chen Zheng
e7a22e72de [PPC] precommit cases for issue 74915 2024-03-07 20:22:26 -05:00
Igor Kudrin
0cd7942c7f
[llvm-dwarfdump] Fix parsing DW_CFA_AARCH64_negate_ra_state (#84128)
The saved state of the AARCH64_DWARF_PAUTH_RA_STATE register was not
updated, so `llvm-dwarfdump` continued to dump it as `reg34=1` even if
the correct value is `0`:

```
> llvm-dwarfdump -v test.o
...
0000002c 00000024 00000030 FDE cie=00000000 pc=00000030...00000064
  Format:       DWARF32
  DW_CFA_advance_loc: 4
  DW_CFA_AARCH64_negate_ra_state:
  DW_CFA_advance_loc: 4
  DW_CFA_def_cfa_offset: +16
  DW_CFA_offset: W30 -16
  DW_CFA_remember_state:
  DW_CFA_advance_loc: 16
  DW_CFA_def_cfa_offset: +0
  DW_CFA_advance_loc: 4
  DW_CFA_AARCH64_negate_ra_state:
  DW_CFA_restore: W30
  DW_CFA_advance_loc: 4
  DW_CFA_restore_state:
  DW_CFA_advance_loc: 12
  DW_CFA_def_cfa_offset: +0
  DW_CFA_advance_loc: 4
  DW_CFA_AARCH64_negate_ra_state:
  DW_CFA_restore: W30
  DW_CFA_nop:

  0x30: CFA=WSP
  0x34: CFA=WSP: reg34=1
  0x38: CFA=WSP+16: W30=[CFA-16], reg34=1
  0x48: CFA=WSP: W30=[CFA-16], reg34=1
  0x4c: CFA=WSP: reg34=1               <--- should be '=0'
  0x50: CFA=WSP+16: W30=[CFA-16], reg34=1
  0x5c: CFA=WSP: W30=[CFA-16], reg34=1
  0x60: CFA=WSP: reg34=1               <--- should be '=0'
```
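The expected rows can be checked against a toy Python interpreter for the CFA program above. It assumes that DW_CFA_AARCH64_negate_ra_state toggles a single return-address-signing bit (reg34) in the current row, and that DW_CFA_remember_state/DW_CFA_restore_state push and pop the whole row. The opcode spellings are simplified stand-ins, not llvm-dwarfdump APIs.

```python
# The FDE above, with DW_CFA_ prefixes dropped and the trailing nop omitted.
FDE_OPS = [
    ("advance", 4), ("negate_ra_state",),
    ("advance", 4), ("def_cfa_offset", 16), ("offset_w30",),
    ("remember_state",),
    ("advance", 16), ("def_cfa_offset", 0),
    ("advance", 4), ("negate_ra_state",), ("restore_w30",),
    ("advance", 4), ("restore_state",),
    ("advance", 12), ("def_cfa_offset", 0),
    ("advance", 4), ("negate_ra_state",), ("restore_w30",),
]


def unwind_rows(ops, start_pc=0x30):
    """Return {pc: row} where each row records CFA offset, W30 rule and reg34."""
    row = {"cfa_offset": 0, "w30_saved": False, "reg34": 0}
    saved = []                    # stack for remember_state/restore_state
    pc = start_pc
    rows = {pc: dict(row)}
    for op, *args in ops:
        if op == "advance":
            pc += args[0]
        elif op == "def_cfa_offset":
            row["cfa_offset"] = args[0]
        elif op == "offset_w30":
            row["w30_saved"] = True
        elif op == "restore_w30":
            row["w30_saved"] = False
        elif op == "negate_ra_state":
            row["reg34"] ^= 1     # the bit must flip in the current row
        elif op == "remember_state":
            saved.append(dict(row))
        elif op == "restore_state":
            row = saved.pop()
        rows[pc] = dict(row)      # the last op at a given pc wins
    return rows


for pc, r in unwind_rows(FDE_OPS).items():
    print(hex(pc), r)
```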
2024-03-08 07:34:20 +07:00
Craig Topper
0d4978f3cf [RISCV] Update some tests I missed in 909ab0e0d1903ad2329ca9fdf248d21330f9437f. NFC 2024-03-07 16:21:41 -08:00
Amara Emerson
26fa440957 [GlobalISel] Fix yet another pointer type invalid combining issue, this time in tryFoldSelectOfConstants() 2024-03-07 15:58:28 -08:00
Amara Emerson
a01e9ce86f [AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT
We should moreElements <3 x s1> to <4 x s1> before we try to widen the element,
otherwise we end up with a <3 x s21> nonsense type.
2024-03-07 15:40:19 -08:00
Evgenii Kudriashov
10edabbcf3
[X86][GlobalISel] Enable G_SDIV/G_UDIV/G_SREM/G_UREM (#81615)
* Create a libcall for s64 type for 32 bit targets.
* Fix a bug in REM selection: SUBREG_TO_REG is not intended to produce a
value from super registers.
* Replace selector tests by end-to-end tests. Other passes
check the selected MIR better.
2024-03-08 00:10:53 +01:00
Craig Topper
909ab0e0d1
[RISCV] Insert a freeze before converting select to AND/OR. (#84232)
Select blocks poison, but AND/OR do not. We need to insert a freeze
to block poison propagation.
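The poison argument can be illustrated with a small Python model of i1 values, where POISON is a sentinel. It assumes the select is rewritten as (c & t) | (~c & f) and that the freeze is placed on the value operand that may not be selected (only f in this sketch); the exact operand the patch freezes is not stated in the message.

```python
POISON = object()                      # sentinel standing in for LLVM poison


def select(c, t, f):
    # select only propagates poison from the condition or the chosen arm
    if c is POISON:
        return POISON
    return t if c else f


def and_(a, b):
    return POISON if a is POISON or b is POISON else a & b


def or_(a, b):
    return POISON if a is POISON or b is POISON else a | b


def not_(a):
    return POISON if a is POISON else a ^ 1


def freeze(x, pick=0):
    # freeze turns poison into an arbitrary but fixed value (0 here)
    return pick if x is POISON else x


def select_as_and_or(c, t, f, insert_freeze):
    if insert_freeze:
        f = freeze(f)
    return or_(and_(c, t), and_(not_(c), f))


print(select(1, 1, POISON))                                    # 1
print(select_as_and_or(1, 1, POISON, insert_freeze=False) is POISON)  # True
```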

This creates suboptimal codegen which I will try to fix with other
patches. I'm prioritizing the correctness fix since we have 2 bug reports.

Fixes #84200 and #84350
2024-03-07 15:03:51 -08:00
Amara Emerson
641b98a0d1 [GlobalISel] Fix crash in tryFoldAndOrOrICmpsUsingRanges() with pointer types. 2024-03-07 12:56:40 -08:00
Noah Goldstein
9f96db8e31 [X86] Fold (icmp ult (add x,-C),2) -> (or (icmp eq X,C), (icmp eq X,C+1)) for Vectors
This is undoing a middle-end transform which does the opposite. Since
X86 doesn't have unsigned vector comparison instructions pre-AVX512,
the simplified form gets worse codegen.

Fixes #66479

Proofs: https://alive2.llvm.org/ce/z/UCz3wt
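The Alive2 link above is the actual proof. As an informal sanity check, the scalar identity can also be verified exhaustively for 8-bit lanes in Python; a vector version applies it per lane. `lhs` and `rhs` are illustrative names.

```python
def lhs(x: int, c: int, bits: int = 8) -> bool:
    mask = (1 << bits) - 1
    return ((x - c) & mask) < 2                # icmp ult (add x, -C), 2


def rhs(x: int, c: int, bits: int = 8) -> bool:
    mask = (1 << bits) - 1
    return x == c or x == ((c + 1) & mask)     # (icmp eq x, C) | (icmp eq x, C+1)


# Every 8-bit x and C, including the wrap-around case C = 255.
assert all(lhs(x, c) == rhs(x, c) for x in range(256) for c in range(256))
```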

Closes #84104
Closes #66479
2024-03-07 13:12:09 -06:00
Noah Goldstein
3e73a080fa [X86] Add tests for folding (icmp ult (add x,-C),2) -> (or (icmp eq X,C), (icmp eq X,C+1)); NFC 2024-03-07 13:12:09 -06:00
Florian Mayer
ee24409c40 Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load."
This reverts commit 7524ad9aa7b1b5003fe554a6ac8e434d50027dfb.

Broke sanitizer build bots, e.g. https://lab.llvm.org/buildbot/#/builders/5/builds/41588/steps/9/logs/stdio
2024-03-07 09:43:21 -08:00
Michael Maitland
96049fcf4e [GISEL] Add IRTranslation for shufflevector on scalable vector types (#80378)
Recommits llvm/llvm-project#80378 which was reverted in
llvm/llvm-project#84330. The problem was that the change in
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir used
217 as an opcode instead of a regex.
2024-03-07 09:10:03 -08:00
Jay Foad
8f79cdd8da [AArch64] Add -verify-machineinstrs to a test
This would have helped identify problems with #83905 which only showed
up in an LLVM_ENABLE_EXPENSIVE_CHECKS build.
2024-03-07 17:06:16 +00:00
Michael Maitland
552da24843
Revert "[GISEL] Add IRTranslation for shufflevector on scalable vector types" (#84330)
Reverts llvm/llvm-project#80378

This was causing buildbot failures that did not show up with check-llvm or CI.
2024-03-07 10:16:31 -05:00
SahilPatidar
9e0f5909d0
[DAG] Fix Failure to reassociate SMAX/SMIN/UMAX/UMIN (#82175)
Resolve #58110
2024-03-07 15:15:17 +00:00
Michael Maitland
2b8aaef09e
[GISEL] Add IRTranslation for shufflevector on scalable vector types (#80378)
This patch is stacked on
https://github.com/llvm/llvm-project/pull/80372,
https://github.com/llvm/llvm-project/pull/80307, and
https://github.com/llvm/llvm-project/pull/80306.

ShuffleVector on scalable vector types gets IRTranslate'd to
G_SPLAT_VECTOR, since a ShuffleVector that operates on scalable
vectors is a splat vector whose value is the 0th
element of the first operand, because the index mask operand is the
zeroinitializer (undef and poison are treated as zeroinitializer here).
This is analogous to what happens in SelectionDAG for ShuffleVector.
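A Python model of this reasoning, assuming shufflevector indexes into the concatenation of its two operands and treating undef/poison mask lanes (None below) as 0, as the message describes. With the zeroinitializer mask, the result is a splat of element 0 for every runtime vector length.

```python
def shufflevector(v1: list, v2: list, mask: list) -> list:
    """Model of IR shufflevector; a None mask lane stands in for undef/poison."""
    both = v1 + v2
    return [both[0 if i is None else i] for i in mask]


for n in (2, 4, 8):                       # stand-ins for unknown vscale values
    v1 = [10 + i for i in range(n)]
    v2 = [20 + i for i in range(n)]
    assert shufflevector(v1, v2, [0] * n) == [v1[0]] * n   # a splat of v1[0]
```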

`buildSplatVector` is renamed to `buildBuildVectorSplatVector`. I did not
make this a separate patch because it would cause problems to revert
that change without reverting this change too.
2024-03-07 09:50:29 -05:00
ostannard
503c55e170
[AArch64] Move SLS later in pass pipeline (#84210)
Currently, the SLS hardening pass is run before the machine outliner,
which means that the outliner creates new functions and calls which do
not have the SLS hardening applied.

The fix for this is to move the SLS passes to after the outliner, as has
recently been done for the return address signing pass.

This also avoids a bug where the SLS outliner emits code with
instructions after a return, which the outliner doesn't correctly
handle.
2024-03-07 09:28:49 +00:00
Luke Lau
c59129a7c7
[RISCV] Recursively split concat_vector into smaller LMULs (#83035)
This is the concat_vector equivalent of #81312, in that we recursively
split concat_vectors with more than two operands into smaller
concat_vectors.

This allows us to break up the chain of vslideups, as well as perform
the vslideups at a smaller LMUL, which in turn reduces register pressure
as the previous lowering performed N vslideups at the highest result
LMUL. For now, it stops splitting past MF2.

This is done as a DAG combine so that any undef operands are combined
away: if we do this during lowering, then we end up with unnecessary
vslideups of undefs.
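The recursive split can be sketched in Python as a tree of two-operand concats. The sketch assumes a power-of-two operand count and that each two-operand concat lowers to one vslideup; the MF2 stopping point and the undef handling from the message are not modeled, and the helper names are hypothetical.

```python
def split_concat(ops: list):
    """Turn concat_vectors(ops...) into a balanced tree of 2-operand concats."""
    if len(ops) == 1:
        return ops[0]
    half = len(ops) // 2
    return ("concat", split_concat(ops[:half]), split_concat(ops[half:]))


def flatten(node) -> list:
    """Recover the operand order from a concat tree."""
    if isinstance(node, tuple):
        return flatten(node[1]) + flatten(node[2])
    return [node]


def depth(node) -> int:
    """Concat levels, i.e. the longest dependent vslideup chain (assumed)."""
    if isinstance(node, tuple):
        return 1 + max(depth(node[1]), depth(node[2]))
    return 0


tree = split_concat(list("abcdefgh"))
print(tree)   # 8 operands: a chain of depth 3 instead of 7 sequential slideups
```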
2024-03-07 16:50:26 +08:00
Jay Foad
7a0e222a17 Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905)"
This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c.

It was causing test failures on the expensive check builders.
2024-03-07 08:20:26 +00:00
Amara Emerson
7524ad9aa7 [AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load.
This load isn't selected by tablegen due to the anyext, but wasn't generating
a subreg_to_reg. Maybe it shouldn't be formed at all during the combiner, but to stop
crashes later in codegen, select it manually for now.
2024-03-07 00:12:17 -08:00
Fangrui Song
e63ea9d6f7 [CommandFlags] Rename option -relax-elf-relocations to -x86-relax-relocations
relax-elf-relocations is misleading and there were AMDGPU/SystemZ tests
misusing this x86-specific option.
2024-03-06 23:03:11 -08:00
Amara Emerson
00efb34352 [AArch64][GlobalISel] Fix crash during G_SHUFFLE_VECTOR legalization.
A new widening rule was running before the shuffle was canonicalized into a
homogeneous form. Moving the rules around to ensure it's done before the
widening fixes the crash, although this particular test still falls back.
2024-03-06 22:43:00 -08:00
David Green
44be5a7fdc
[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on their own are not very large if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO, it can help be more accurate in case they
are Unknown (and, in the future, scalable).
2024-03-06 17:40:13 +00:00
Simon Pilgrim
0bd9255f8a
[X86] Improve KnownBits for X86ISD::PSADBW nodes (#83830)
Don't just return the known-zero upper bits; compute the absdiff KnownBits and perform the horizontal sum.

Add implementations that handle both the X86ISD::PSADBW nodes and the INTRINSIC_WO_CHAIN intrinsics (pre-legalization).
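A rough Python illustration of why the horizontal sum gives a tighter bound: each 64-bit PSADBW lane is the sum of eight byte absolute differences, so if every input byte is known to fit in k bits, the lane fits in about k + 3 bits rather than a fixed 16-bit bound (assumed here to be what only knowing the zero upper bits gave). This is a range-based approximation, not the KnownBits arithmetic in the patch.

```python
def psadbw_lane(a: bytes, b: bytes) -> int:
    """One 64-bit lane of PSADBW: sum of |a[i] - b[i]| over 8 unsigned bytes."""
    assert len(a) == len(b) == 8
    return sum(abs(x - y) for x, y in zip(a, b))


def active_bits_bound(input_bits: int) -> int:
    """Bits needed for a lane when every input byte is < 2**input_bits."""
    max_absdiff = (1 << input_bits) - 1     # |a - b| never exceeds the larger input
    return (8 * max_absdiff).bit_length()   # horizontal sum of 8 such terms


print(active_bits_bound(8), active_bits_bound(4))   # 11 7
```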
2024-03-06 17:23:15 +00:00
Craig Topper
c161720ab4
[RISCV] Slightly improve expanded multiply emulation in getVLENFactoredAmount. (#84113)
Instead of initializing the accumulator to 0, initialize it on first
assignment with a mv from the register that holds VLENB << ShiftAmount.
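A hypothetical Python sketch of a shift-and-add expansion of VLENB * Amount, showing the change: the first set bit initializes the accumulator with a move instead of starting from 0 and adding. It assumes a plain binary decomposition of the amount; the step tuples are illustrative, not the RISC-V backend's actual instruction sequence.

```python
def expand_multiply(amount: int) -> list:
    """Emit ('mv' | 'add', shift) steps computing VLENB * amount."""
    assert amount > 0
    steps = []
    shift = 0
    while amount:
        if amount & 1:
            # First term initializes the accumulator; later terms are added.
            steps.append(("add" if steps else "mv", shift))
        amount >>= 1
        shift += 1
    return steps


def evaluate(steps: list, vlenb: int) -> int:
    acc = None
    for op, shift in steps:
        term = vlenb << shift               # VLENB << ShiftAmount
        acc = term if op == "mv" else acc + term
    return acc


print(expand_multiply(5))                   # [('mv', 0), ('add', 2)]
```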

Fix a missing kill flag on the final Add.

I have no real interest in this case, just an easy optimization I
noticed.
2024-03-06 08:56:37 -08:00
Krzysztof Drewniak
6540f1635a
[AMDGPU] Add IR-level pass to rewrite away address space 7 (#77952)
This commit adds the -lower-buffer-fat-pointers pass, which is
applicable to all AMDGCN compilations.

The purpose of this pass is to remove the type `ptr addrspace(7)` from
incoming IR. This must be done at the LLVM IR level because `ptr
addrspace(7)`, as a 160-bit primitive type, cannot be correctly handled
by SelectionDAG.

The detailed operation of the pass is described in comments, but, in
summary, the removal proceeds by:
1. Rewriting loads and stores of ptr addrspace(7) to loads and stores of
i160 (including vectors and aggregates). This is needed because the
in-register representation of these pointers will stop matching their
in-memory representation in step 2, and so ptrtoint/inttoptr operations
are used to preserve the expected memory layout.

2. Mutating the IR to replace all occurrences of `ptr addrspace(7)` with
the type `{ptr addrspace(8), ptr addrspace(6) }`, which makes the two
parts of a buffer fat pointer (the 128-bit address space 8 resource and
the 32-bit address space 6 offset) visible in the IR. This also impacts
the argument and return types of functions.

3. *Splitting* the resource and offset parts. All instructions that
produce or consume buffer fat pointers (like GEP or load) are rewritten
to produce or consume the resource and offset parts separately. For
example, GEP updates the offset part of the result and a load uses the
resource and offset parts to populate the relevant
llvm.amdgcn.raw.ptr.buffer.load intrinsic call.

At the end of this process, the original mutated instructions are
replaced by their new split counterparts, ensuring no invalidly-typed IR
escapes this pass. (For operations like call, where the struct form is
needed, insertelement operations are inserted).
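The split representation can be sketched in Python. The sketch assumes the i160 form packs the 128-bit resource in the high bits and the 32-bit offset in the low bits (the message does not state the bit order) and that offset arithmetic wraps at 32 bits. The helper names are hypothetical; the real pass rewrites IR rather than integers.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


def join_fat_pointer(resource: int, offset: int) -> int:
    """Pack {ptr addrspace(8), ptr addrspace(6)} into one i160 (assumed layout)."""
    return ((resource & MASK128) << 32) | (offset & MASK32)


def split_fat_pointer(p160: int) -> tuple:
    """Separate the resource and offset parts, as in step 3."""
    return (p160 >> 32) & MASK128, p160 & MASK32


def gep(resource: int, offset: int, byte_delta: int) -> tuple:
    """A GEP only updates the offset; the resource passes through unchanged."""
    return resource, (offset + byte_delta) & MASK32


rsrc, off = split_fat_pointer(join_fat_pointer(0xABC, 16))
print(gep(rsrc, off, 8))                    # (2748, 24)
```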

Compared to LGC's PatchBufferOp (32cda89776/lgc/patch/PatchBufferOp.cpp), this pass:
- Also handles vectors of ptr addrspace(7)s
- Also handles function boundaries
- Includes the same uniform buffer optimization for loops and
conditionals
- Does *not* handle memcpy() and friends (this is future work)
- Does *not* break up large loads and stores into smaller parts. This
should be handled by extending the legalization
of *.buffer.{load,store} to handle larger types by producing multiple
instructions (the same way ordinary LOAD and STORE are legalized). That
work is planned for a followup commit.
- Does *not* have special logic for handling divergent buffer
descriptors. The logic in LGC is, as far as I can tell, incorrect in
general, and, per discussions with @nhaehnle, isn't widely used.
Therefore, divergent descriptors are handled with waterfall loops later
in legalization.

As a final matter, this commit updates atomic expansion to treat buffer
operations analogously to global ones.

(One question for reviewers: is the new pass in the right place? Should
it be later in the pipeline?)

Differential Revision: https://reviews.llvm.org/D158463
2024-03-06 09:49:58 -06:00
Mirko Brkušanin
1fd1f4c0e1
[AMDGPU] Handle amdgpu.last.use metadata (#83816)
Convert !amdgpu.last.use metadata into MachineMemOperand for last use
and handle it in SIMemoryLegalizer similar to nontemporal and volatile.
2024-03-06 16:33:52 +01:00
Emma Pilkington
4490003a22
[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)
The previous name, 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.
2024-03-06 09:51:48 -05:00
yandalur
f7d354af57
[Hexagon] Fix shift value when folding shl DAG node (#83853)
When folding (or (shl xx, s), (zext y)) to (COMBINE (shl xx, s-32), y),
fix resulting shift value in HexagonISD::COMBINE node to not generate
negative values.
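A Python check of the identity behind the fold, assuming xx is a 64-bit value, y is a 32-bit value zero-extended to 64 bits, s is in [32, 63], and COMBINE(hi, lo) places its first operand in the high word. Under those assumptions the high word needs a shift of s - 32, which is never negative.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def or_shl_zext(xx: int, s: int, y: int) -> int:
    """(or (shl xx, s), (zext y)) on 64-bit values."""
    return ((xx << s) | (y & MASK32)) & MASK64


def combine_form(xx: int, s: int, y: int) -> int:
    """(COMBINE (shl xx, s-32), y): high word from the shift, low word is y."""
    assert 32 <= s < 64, "shift amount s - 32 must stay non-negative"
    hi = (xx << (s - 32)) & MASK32
    return (hi << 32) | (y & MASK32)


print(hex(combine_form(0xDEADBEEF, 40, 0x1234)))
```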

---------

Co-authored-by: Yashas Andaluri <yandalur@qti.qualcomm.com>
2024-03-06 08:17:02 -06:00
Joseph Huber
1fc5e50ceb
[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906)
Summary:
This patch implements the LLVM floating point environment control
intrinsics and also exposes it through clang. We encode the floating
point environment as a 64-bit value that simply concatenates the values
of the mode registers and the current trap status. We only fetch the
bits relevant for floating point instructions. That is, rounding mode,
denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16
overflow, and active exceptions.
2024-03-06 08:11:54 -06:00