52796 Commits

Author SHA1 Message Date
Craig Topper
ffa2810f7b
[RISCV] Optimize lowering of VECREDUCE_FMINIMUM/VECREDUCE_FMAXIMUM. (#85165)
Use a normal min/max reduction that doesn't propagate nans and force
the result to nan at the end if any elements were nan.
2024-03-14 12:51:29 -07:00
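A rough illustration of the approach described in the VECREDUCE_FMINIMUM commit above: reduce with a minimum that ignores NaNs, then force the result to NaN if any input element was NaN. This is only a scalar sketch, not the RISC-V lowering itself. The function names are hypothetical, and the ordering of signed zeros is not modelled.

```python
import math
from functools import reduce


def fmin_ignore_nan(a, b):
    """Minimum of two floats that does not propagate NaN (hypothetical helper)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def vecreduce_fminimum(values):
    """NaN-propagating minimum built from a non-propagating reduction.

    Assumes `values` is a non-empty sequence of floats.
    """
    # Step 1: remember whether any element was NaN.
    any_nan = any(math.isnan(v) for v in values)
    # Step 2: ordinary min reduction that skips NaNs.
    result = reduce(fmin_ignore_nan, values)
    # Step 3: force the result to NaN at the end if needed.
    return math.nan if any_nan else result
```

The NaN check and the reduction are independent of each other, so the fix-up can be applied once at the end instead of at every reduction step.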
Thorsten Schütt
4f873730d6 precommit test 2024-03-14 19:18:40 +01:00
Florian Mayer
58a20a0b96
[MTE] fix bug that prevented stack coloring with MTE (#84422) 2024-03-14 09:26:58 -07:00
Jonas Paulsson
6588ac3017
[MachineCombiner] Don't ignore PHI depths (#82025)
The depths of the Root and the NewRoot are to be compared in
MachineCombiner::improvesCriticalPathLen(), and while the call to
BlockTrace.getInstrCycles(*Root) includes the Depth of a PHI, for some
reason PHI nodes have been ignored in getOperandDef(). 

This patch removes the special handling of PHIs in getOperandDef() so that
Root and NewRoot get a fair comparison. This does not affect loop headers
as MachineTraceMetrics handles that case by ignoring incoming PHI edges.
2024-03-14 11:47:29 -04:00
Craig Topper
23323e2837
[TargetLowering][RISCV] Propagate fastmath flags for the vector operations emitted in expandVecReduce. (#85164)
We used the fastmath flags for any scalar ops created, but not vector.
2024-03-14 08:39:32 -07:00
Michael Maitland
818e0272f5 [RISCV] Model integer min max instructions from Zbb execute in late-B ALU
We don't model the early vs late ALU so we just need to remove usage of
SiFivePipeA for these instructions.
2024-03-14 06:02:53 -07:00
Thorsten Schütt
5f774619ea
[GlobalIsel] Combine ADDO (#82927)
Perform the requested arithmetic and produce a carry output in addition
to the normal result.

Clang has them as builtins (__builtin_add_overflow_p). The middle end
has intrinsics for them (sadd_with_overflow).

AArch64: ADDS Add and set flags

On Neoverse V2, they run at half the throughput of basic arithmetic and
have a limited set of pipelines.
2024-03-14 12:45:19 +01:00
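To make the "arithmetic plus carry output" description in the ADDO commit above concrete, here is a small model of add-with-overflow for fixed-width integers. It is a sketch of the semantics only, not of the GlobalISel combine. The Python function names are hypothetical and the 32-bit default width is an assumption.

```python
def uadd_with_overflow(a, b, bits=32):
    """Unsigned add: returns (wrapped result, carry-out flag)."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total > mask


def sadd_with_overflow(a, b, bits=32):
    """Signed add: returns (wrapped two's-complement result, overflow flag)."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    total = a + b
    # Wrap the exact sum back into the signed range [lo, hi].
    wrapped = ((total - lo) % (1 << bits)) + lo
    return wrapped, not (lo <= total <= hi)
```

Both functions return the normal result and the extra flag together, mirroring the two outputs the commit message describes.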
Vyacheslav Levytskyy
afec257d36
[SPIRV] Add type inference of function parameters by call instances (#85077)
This PR adds type inference of function parameters by call instances.
Two use cases that demonstrate the problem are added.
2024-03-14 10:50:11 +01:00
paperchalice
edc2066465
[CodeGen][GC] Skip function without GC in GCLoweringPass (#84421) 2024-03-14 13:07:41 +08:00
Carl Ritson
c29b265eb9 Reapply "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)"
This reverts commit 7d508eb5d38f4bbbab4230a666d9e742e271af61.
2024-03-14 10:56:43 +09:00
Kolya Panchenko
aa68e2814d
[RISCV] Support llvm.masked.compressstore intrinsic (#83457)
The changeset enables lowering of `llvm.masked.compressstore(%data,
%ptr, %mask)` for RVV for fixed vector type into:
```
%0 = vcompress %data, %mask, %vl
%new_vl = vcpop %mask, %vl
vse %0, %ptr, %1, %new_vl
```
Such lowering is only possible when `%data` fits into available LMULs
and otherwise `llvm.masked.compressstore` is scalarized by
`ScalarizeMaskedMemIntrin` pass.
Even though the RVV spec in section `15.8` provides an alternative sequence
for compressstore, use of `vcompress + vcpop` should be a proper
canonical form to lower `llvm.masked.compressstore`. If a RISC-V target
finds the sequence from `15.8` better, a peephole optimization can
transform `vcompress + vcpop` into that sequence.
2024-03-13 15:18:51 -04:00
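The three-step lowering in the compressstore commit above can be modelled on plain lists. This is a sketch of the semantics only: the function name and the list-based memory are hypothetical stand-ins, and the comments map each step to the instruction named in the commit message.

```python
def masked_compressstore(data, mask, memory, ptr):
    """Store the active elements of `data` contiguously at memory[ptr:].

    `data` and `mask` are equal-length lists. `memory` is a list standing in
    for addressable storage (an assumption of this sketch).
    Returns the number of elements written.
    """
    # vcompress: pack the elements whose mask bit is set to the front.
    compressed = [d for d, m in zip(data, mask) if m]
    # vcpop: count the set mask bits to get the new vector length.
    new_vl = sum(1 for m in mask if m)
    # vse: store only the first new_vl elements.
    memory[ptr:ptr + new_vl] = compressed[:new_vl]
    return new_vl
```

For example, with `data = [10, 20, 30, 40]` and `mask = [1, 0, 1, 1]`, three elements are written and the memory slots after them are left untouched.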
Usman Nadeem
0b46884036
Revert "Revert "[AArch64] Improve lowering of truncating uzp1"" (#85119)
Reverts llvm/llvm-project#85115
The fix was already merged in
79cd2c0bb9
2024-03-13 11:58:10 -07:00
Mehdi Amini
06e310fee1
Revert "[AArch64] Improve lowering of truncating uzp1" (#85115)
Reverts llvm/llvm-project#82457

The bot is broken, likely because of mid-air collision.
2024-03-13 11:32:53 -07:00
Nadeem, Usman
79cd2c0bb9 [AArch64] Fix tests after PR82457
Change-Id: I44a7e4a10af750b3339d6564c6ce6c2e5c17778e
2024-03-13 09:55:22 -07:00
Usman Nadeem
57b991ab39
[AArch64] Improve lowering of truncating uzp1 (#82457)
There were two existing patterns:
`concat_vectors(trunc(x), trunc(y)) -> uzp1(x, y)`
`concat_vectors(assertzext(trunc(x)), assertzext(trunc(y))) -> uzp1(x, y)`

Move them into a class and add the following `assertsext` pattern to it:
`concat_vectors(assertsext(trunc(x)), assertsext(trunc(y))) -> uzp1(x, y)`

Add the following transform for v8i8 and v4i16 result types to help with
pattern matching:
  `truncating uzp1(x, y) -> trunc(concat(x, y))`
And a pattern to go with it:
  `trunc(concat_vectors(x, y)) -> uzp1 (x, y)`

Add another isel pattern for v8i8 and v4i16 result vector types, similar
to the existing concat pattern, but with a trunc node in the beginning:
`trunc(concat_vectors(assertext_trunc(x), assertext_trunc(y))) -> xtn(uzp1(x, y))`
2024-03-13 09:05:55 -07:00
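The first pattern in the uzp1 commit above, `concat_vectors(trunc(x), trunc(y)) -> uzp1(x, y)`, relies on the fact that the even-numbered narrow lanes of a wide vector are the low halves of its elements. The sketch below checks that equivalence on lists of integers. It assumes little-endian lane order, and the helper names are hypothetical.

```python
def trunc_lanes(vec, bits):
    """Truncate every element to its low `bits` bits."""
    mask = (1 << bits) - 1
    return [e & mask for e in vec]


def split_lanes(vec, bits):
    """Reinterpret each 2*bits-wide element as two bits-wide lanes.

    Little-endian lane order is assumed: low half first, then high half.
    """
    mask = (1 << bits) - 1
    lanes = []
    for e in vec:
        lanes.append(e & mask)
        lanes.append((e >> bits) & mask)
    return lanes


def uzp1(a, b):
    """Even-indexed lanes of `a` followed by even-indexed lanes of `b`."""
    return a[0::2] + b[0::2]


def concat_of_truncs(x, y, bits):
    """Left-hand side of the pattern: concat_vectors(trunc(x), trunc(y))."""
    return trunc_lanes(x, bits) + trunc_lanes(y, bits)


def uzp1_of_inputs(x, y, bits):
    """Right-hand side: uzp1 applied to x and y viewed as narrow lanes."""
    return uzp1(split_lanes(x, bits), split_lanes(y, bits))
```

With 16-bit inputs and `bits=8`, both sides produce the same eight 8-bit lanes, which is why a single `uzp1` can replace the two truncations and the concatenation.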
Zaara Syeda
cc761a7c35
[PowerPC][NFC] Rename ADDItocL to match the 64-bit naming convention (#85099)
In preparation for adding a similar instruction for large code model on
AIX for 32-bit, rename the existing 64-bit ADDItocL instruction to ADDItocL8
to match the naming convention of other instructions with 32-bit and
64-bit variants.
2024-03-13 11:57:07 -04:00
Zaara Syeda
37b5eb0a0a
[AIX][TOC] Add -mtocdata/-mno-tocdata options on AIX (#67999)
This patch enables support that the XL compiler had for AIX under
-qdatalocal/-qdataimported.
2024-03-13 10:26:31 -04:00
Harald van Dijk
ceb744eb2f
[AMDGPU] Fix canonicalization of truncated values. (#83054)
We were relying on roundings to implicitly canonicalize, which is
generally safe, except with roundings that may be optimized away.

Fixes #82937.
2024-03-13 12:08:39 +00:00
Simon Pilgrim
a7af53e99b [DAG] visitSUB - convert some folds to use SDPatternMatch
General cleanup and allows us to handle several commutable matches with a single pattern
2024-03-13 12:00:24 +00:00
Nikita Popov
20b15e645c [Tests] Drop inrange attribute from some tests (NFC)
These don't actually test anything related to inrange, so drop the
attribute.
2024-03-13 11:49:16 +01:00
Sander de Smalen
e42e97a4ad
[AArch64][SME] Don't mark 'smstart za' as using/defining VG. (#84775)
VG is only used/defined when changing the streaming mode, using 'smstart
sm' or plainly 'smstart' (same for smstop).
2024-03-13 08:21:33 +00:00
Vyacheslav Levytskyy
0a443f13b4
[SPIR-V] Add implementation of G_SPLAT_VECTOR opcode and fix invalid types processing (#84766)
This PR:
* adds support for G_SPLAT_VECTOR generic opcode that may be legally
generated instead of G_BUILD_VECTOR by previous passes of the translator
(see https://github.com/llvm/llvm-project/pull/80378 for the source of
breaking changes);
* improves deduction of types for opaque pointers.

This PR also fixes the following issues:
* if a function has ptr argument(s), two functions that have different
SPIR-V type definitions may get identical LLVM function types and break
agreements of global register and duplicate checker;
* checks for pointer types do not account for TypedPointerType.

Update of tests:
* A test case is added to cover the issue with function ptr parameters.
* The first case, that is support for G_SPLAT_VECTOR generic opcode, is
covered by existing test cases.
* Multiple additional checks by `spirv-val` are added to cover more
possibilities of generation of invalid code.
2024-03-13 08:32:01 +01:00
Lu Weining
e4edbae0aa
Revert "[llvm][LoongArch] Improve loongarch_lasx_xvpermi_q instrinsic" (#84708)
Reverts llvm/llvm-project#82984

See the discussion in https://github.com/llvm/llvm-project/pull/83540.
2024-03-13 11:51:47 +08:00
4ast
2aacb56e83
BPF address space insn (#84410)
This commit aims to support BPF arena kernel side
[feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/):
- arena is a memory region accessible from both BPF program and
userspace;
- base pointers for this memory region differ between kernel and user
spaces;
- `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)`
translates src_reg, a pointer in src_addr_space to dst_reg, equivalent
pointer in dst_addr_space, {src,dst}_addr_space are immediate constants;
- number 0 is assigned to kernel address space;
- number 1 is assigned to user address space.

On the LLVM side, the goal is to make load and store operations on arena
pointers "transparent" for BPF programs:
- assume that pointers with non-zero address space are pointers to
  arena memory;
- assume that arena is identified by address space number;
- assume that address space zero corresponds to kernel address space;
- assume that every BPF-side load or store from arena is done via
pointer in user address space, thus convert base pointers using
`addr_space_cast(src_reg, 0, 1)`;

Only load, store, cmpxchg and atomicrmw IR instructions are handled by
this transformation.

For example, the following C code:

```c
   #define __as __attribute__((address_space(1)))
   void copy(int __as *from, int __as *to) { *to = *from; }
```

Compiled to the following IR:

```llvm
    define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
    entry:
      %0 = load i32, ptr addrspace(1) %from, align 4
      store i32 %0, ptr addrspace(1) %to, align 4
      ret void
    }
```

Is transformed to:

```llvm
    %to2 = addrspacecast ptr addrspace(1) %to to ptr     ;; !
    %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
    %0 = load i32, ptr %from1, align 4, !tbaa !3
    store i32 %0, ptr %to2, align 4, !tbaa !3
    ret void
```

And compiled as:

```asm
    r2 = addr_space_cast(r2, 0, 1)
    r1 = addr_space_cast(r1, 0, 1)
    r1 = *(u32 *)(r1 + 0)
    *(u32 *)(r2 + 0) = r1
    exit
```

Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>
2024-03-13 02:27:25 +02:00
Michael Maitland
2f400a2fd7
[GISEL] Add G_VSCALE instruction (#84542) 2024-03-12 20:22:49 -04:00
S. Bharadwaj Yadavalli
54f631d116
[DirectX][NFC] Model precise overload type specification of DXIL Ops (#83917)
Implement an abstraction to specify precise overload types supported by
DXIL ops. These overload types are typically a subset of LLVM
intrinsics.

Implement the corresponding changes in DXILEmitter backend.

Add tests to verify expected errors for unsupported overload types at
code generation time.

Add tests to check for correct overload error output.
2024-03-12 16:51:18 -04:00
Arthur Eubanks
6bbb73b4cb
[X86] Fix determining if globals with size <8 bits are large (#84975)
Previously any global under 8 bits would accidentally be considered 0
sized, which is considered a large global.
2024-03-12 12:43:29 -07:00
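One way a sub-byte global can end up looking zero-sized, as described in the X86 commit above, is a bits-to-bytes conversion that truncates. The sketch below contrasts truncating and rounding-up conversions. It is an assumption about the general shape of the problem, not the actual X86 code: all names are hypothetical, and the `>=` threshold comparison is assumed as well.

```python
def size_in_bytes_truncating(size_bits):
    """Bits to bytes with truncating division: anything under 8 bits becomes 0."""
    return size_bits // 8


def size_in_bytes_rounded_up(size_bits):
    """Bits to bytes rounded up, so a 1-bit value still occupies 1 byte."""
    return (size_bits + 7) // 8


def is_large_global(size_bytes, threshold_bytes):
    """Hypothetical check: a size of 0 is treated as large, per the commit message."""
    return size_bytes == 0 or size_bytes >= threshold_bytes
```

With a 1-bit global and a 65536-byte threshold, the truncating conversion yields 0 and the global is classified as large, while the rounded-up conversion yields 1 and it is not.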
Arthur Eubanks
45219702e7 [test][X86] Precommit test for large data threshold and i1 global 2024-03-12 19:08:40 +00:00
Simon Pilgrim
c1af6ab505 [X86] getFauxShuffleMask - recognise CONCAT(SUB0, SUB1) style patterns
Handles the INSERT_SUBVECTOR(INSERT_SUBVECTOR(UNDEF,SUB0,0),SUB1,N) pattern

Currently limited to v8i64/v8f64 cases as only AVX512 has decent cross lane 2-input shuffles, the plan is to relax this as I deal with some regressions
2024-03-12 17:40:19 +00:00
Jun Wang
c4e517f59c
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-03-12 10:30:39 -07:00
Matt Arsenault
bd72ebd8d1
AMDGPU: Add some more mfma hazard recognizer tests (#84727) 2024-03-12 22:05:47 +05:30
Jake Egan
fa1d13590c [AIX][tests] Disable failing tests on AIX
These new tests are failing on the AIX bot because the -I option isn't supported.

Disable these tests for now until they can be fixed.
2024-03-12 12:11:18 -04:00
Nemanja Ivanovic
08dd645c15
[RISC-V] Bad immediate value for Zcmp instructions with E extension (#84925)
When we are using the Zcmp extension together with the E extension in
32-bit mode and we need to spill both callee-saved registers as well as
needing a couple of 32-bit stack slots, we emit a meaningless stack
adjustment with cm.push/cm.popret. Furthermore this leads to the stack
slot for the ra being clobbered so control returns to a random location.

This is just a pre-commit test so that the PR for the fix shows the
difference in code generation.
2024-03-12 16:26:49 +01:00
Bjorn Pettersson
4d0f79e346 Pre commit test cases SRL/SRA support in canCreateUndefOrPoison. NFC
Add test cases to show that we can't push freeze through SRA/SRL with
'exact' flag when there are multiple uses.
2024-03-12 16:03:18 +01:00
Danial Klimkin
afd4758703
Revert "[NVPTX] Add support for atomic add for f16 type" (#84918)
Reverts llvm/llvm-project#84295 due to breakages.
2024-03-12 15:01:18 +01:00
Adrian Kuegel
8e0f4b943f
[NVPTX] Add support for atomic add for f16 type (#84295)
atom.add.noftz.f16 is supported since SM 7.0
2024-03-12 09:12:44 +01:00
Dhruv Chawla (work)
1d900e2984
[AArch64][GlobalISel] Avoid generating inserts for undefs when selecting G_BUILD_VECTOR (#84452)
It is safe to ignore undef values when selecting G_BUILD_VECTOR as undef
values choose random registers for copying values from.
2024-03-12 11:57:07 +05:30
Phoebe Wang
e89b4bcf32
[X86] Remove SlowDivide tuning from GRTTuning (#84676)
The DIV32/64 throughput was improved since Goldmont in the Atom
architecture. The Alder Lake-E shows similar numbers too. So we shouldn't
add such tunings to Gracemont and later products.

Checked from Agner Fog's table and uops.info.
2024-03-12 13:41:49 +08:00
Craig Topper
884b051a42 Recommit "[TypePromotion] Support positive addition amounts in isSafeWrap. (#81690)"
With a special case for when the Add constant is 0.

Original message:
We can support these by changing the sext promotion to -zext(-C) and
replacing a sgt check with ugt. This reframes the logic in terms of how the
unsigned ranges are affected. More comments in the patch.

The new cases check isLegalAddImmediate to avoid some
regressions in lit tests.
2024-03-11 12:39:38 -07:00
Michael Maitland
034cc2f5d0
[GISEL] Add G_INSERT_SUBVECTOR and G_EXTRACT_SUBVECTOR (#84538)
G_INSERT and G_EXTRACT are not sufficient to use to represent both
INSERT/EXTRACT on a subregister and INSERT/EXTRACT on a vector.

We would like to be able to INSERT/EXTRACT on vectors in cases that
INSERT/EXTRACT on vector subregisters are not sufficient, so we add
these opcodes.

I tried to do a patch where we treated G_EXTRACT as both
G_EXTRACT_SUBVECTOR and G_EXTRACT_SUBREG, but ran into an infinite loop
at this
[point](8b5b294ec2/llvm/lib/Target/RISCV/RISCVISelLowering.cpp (L9932))
in the SDAG equivalent code.
2024-03-11 13:47:30 -04:00
Simon Pilgrim
6cd68c2f87 [X86] Add base SSE2 coverage to SRL/SRA combines tests 2024-03-11 16:25:05 +00:00
Simon Pilgrim
7dc4d5f6a0 [X86] Add AVX512 (x86-64-v4) coverage to generic shift combines tests 2024-03-11 16:22:47 +00:00
Sivan Shani
5e688f0dbd [llvm][arm] add T1 and T2 assembly options for vlldm and vlstm
Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

T1 allows for an optional register list,
the register list must be {d0-d15}.
T2 defines a mandatory register list,
the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-03-11 14:27:28 +00:00
Pierre van Houtryve
d4569d42b5
[AMDGPU] Let LowerModuleLDS run twice on the same module (#81729)
If all variables in the module are absolute, this means we're running
the pass again on an already lowered module, and that works.
If none of them are absolute, lowering can proceed as usual.
Only diagnose cases where we have a mix of absolute/non-absolute GVs,
which means we added LDS GVs after lowering, which is broken.

See #81491
Split from #75333
2024-03-11 09:20:01 +01:00
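The three cases in the LowerModuleLDS commit above amount to a small decision rule over the LDS global variables in a module. A minimal sketch follows. The function name and the returned labels are hypothetical, and each variable is reduced to a single boolean saying whether it is absolute.

```python
def classify_lds_module(is_absolute_flags):
    """Decide what to do with a module given one boolean per LDS variable.

    Returns one of three hypothetical labels:
      "lower"           - no variable is absolute, lowering proceeds as usual
      "already_lowered" - every variable is absolute, the pass ran before
      "mixed_error"     - a mix, meaning LDS variables were added after lowering
    """
    flags = list(is_absolute_flags)
    if not any(flags):
        # Also covers a module with no LDS variables at all.
        return "lower"
    if all(flags):
        return "already_lowered"
    return "mixed_error"
```

Only the mixed case is diagnosed, which is what makes a second run of the pass on the same module harmless.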
Craig Topper
561ddb1687 Revert "[TypePromotion] Support positive addition amounts in isSafeWrap. (#81690)"
This reverts commit 0813b90ff5d195d8a40c280f6b745f1cc43e087a.

Fixes miscompile reported in #84718.
2024-03-11 00:51:21 -07:00
AtariDreams
4e0e9b17c6
[SelectionDAG] Switch to LiveRegUnits (#84197) 2024-03-11 12:47:39 +05:30
Carl Ritson
4a21e3afa2
[LiveIntervals] repairIntervalsInRange: recompute width changes (#78564)
Extend repairIntervalsInRange to completely recompute the interval for a
register if subregister defs exist without precise subrange matches
(LaneMask exactly matching subregister).
This occurs when register sequences are lowered to copies such that the
size of the copies do not match any uses of the subregisters formed
(i.e. during twoaddressinstruction).

The subranges without this change are probably legal, but do not match
those generated by live interval computation. This creates problems with
other code that assumes subranges precisely cover all subregisters
defined, e.g. shrinkToUses().
2024-03-11 15:24:17 +09:00
Kito Cheng
b7f97d3661
[RISCV] Place mergeable small read only data into srodata section (#82214)
Small mergeable read-only data was previously placed in the sdata section,
which meant it lost its mergeable property and, with it, some code
size optimization opportunities at link time.
2024-03-11 13:57:06 +08:00
Carl Ritson
d9e6aa7048
[AMDGPU] Update LiveInterval def index for early-clobber (#79285)
On converting an instruction to an early-clobber definition in
convertToThreeAddress, we must also update live intervals for the
register to start at the early-clobber index.
2024-03-11 14:54:11 +09:00
Craig Topper
d8d2dea7fc
[RISCV] Handle FP riscv_masked_strided_load with 0 stride. (#84576)
Previously, we tried to create an integer extending load. We need a
non-extending FP load instead.

Fixes #84541.
2024-03-10 21:22:37 -07:00