52796 Commits

Author SHA1 Message Date
Krzysztof Drewniak
88871784fd
[AMDGPU] Allow buffer intrinsics to be marked volatile at the IR level (#77847)
In order to ensure the correctness of ptr addrspace(7) lowering, we need
a backwards-compatible way to flag buffer intrinsics as volatile that
can't be dropped (unlike metadata).

To achieve this in a backwards-compatible way, we use bit 31 of the
auxiliary immediates of buffer intrinsics as the volatile flag. When
this bit is set, the MachineMemOperand for said intrinsic is marked
volatile. Existing code will ensure that this results in the appropriate
use of flags like glc and dlc.

This commit also harmonizes the handling of the auxiliary immediate for
atomic intrinsics, which now go through extract_cpol like loads and
stores, which masks off the volatile bit.
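
The flag layout described in this message can be sketched in C++ as follows. This is an illustration only, not the AMDGPU backend code: the names `kVolatileBit`, `isVolatileAux`, and `cachePolicyBits` are hypothetical, and the only fact taken from the message is that bit 31 of the auxiliary immediate carries the volatile flag and is masked off before the remaining bits are used.

```cpp
#include <cstdint>

// Hypothetical names for illustration; only "bit 31 is the volatile flag"
// comes from the commit message.
constexpr std::uint32_t kVolatileBit = std::uint32_t{1} << 31;

// True when the auxiliary immediate requests volatile semantics.
constexpr bool isVolatileAux(std::uint32_t Aux) {
  return (Aux & kVolatileBit) != 0;
}

// Mask off the volatile bit, leaving the remaining auxiliary bits untouched.
constexpr std::uint32_t cachePolicyBits(std::uint32_t Aux) {
  return Aux & ~kVolatileBit;
}
```

Because the flag lives in the immediate itself rather than in metadata, a pass that drops metadata cannot lose it.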
2024-01-12 11:20:01 -06:00
Jay Foad
dec74a8347
[AMDGPU] Fix VS_CNT overflow assertion (#77935)
Always set the upper bound for VS_CNT higher than the lower bound.
Before #77439 this code was only executed on function entry where the
lower bound was 0 so it was not a problem.

Fixes #77931
2024-01-12 17:11:19 +00:00
Philip Reames
e4d01bb227
[SCEV] Special case sext in isKnownNonZero (#77834)
The existing logic in isKnownNonZero relies on unsigned ranges, which
can be problematic when our range calculation is imprecise. Consider the
following:
  %offset.nonzero = or i32 %offset, 1
  -->  %offset.nonzero U: [1,0) S: [1,0)
  %offset.i64 = sext i32 %offset.nonzero to i64
  -->  (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648)
                                         S: [-2147483648,2147483648)

Note that the unsigned range for the sext does contain zero in this case
despite the fact that it can never actually be zero.

Instead, we can push the query down one level - relying on the fact that
the sext is an invertible operation and that the result can only be zero
if the input is. We could likely generalize this reasoning for other
invertible operations, but special casing sext seems worthwhile.
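
A small C++ sketch of the reasoning, under the assumption that plain integers stand in for SCEV expressions. The helper names are hypothetical and this is not the `isKnownNonZero` implementation; it only shows that the printed range contains zero while the sign extension is zero exactly when its operand is.

```cpp
#include <cstdint>

// Sign-extend a 32-bit value to 64 bits (what `sext i32 ... to i64` computes).
constexpr std::int64_t signExtend(std::int32_t V) {
  return static_cast<std::int64_t>(V);
}

// Membership test for a non-wrapping half-open range [Lo, Hi).
constexpr bool rangeContains(std::int64_t Lo, std::int64_t Hi, std::int64_t V) {
  return Lo <= V && V < Hi;
}

// Hypothetical helper: answer "is sext(V) non-zero?" by asking about V itself.
constexpr bool sextIsNonZero(std::int32_t V) {
  return V != 0;
}
```

The range-based query fails because `[-2147483648, 2147483648)` includes 0, whereas the operand-based query gives the precise answer.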
2024-01-12 07:45:28 -08:00
Maciej Gabka
5dbf178154
[TLI][NFC] Fix ordering of ArmPL and SLEEF tests (#77609)
This patch sorts the tests that check whether SLEEF and ArmPL mappings
are used, ordering them by the math functions' base names.
2024-01-12 15:06:25 +00:00
Natalie Chouinard
4f47372f8c
[SPIR-V] Add Float16 support when targeting Vulkan (#77115)
Add Float16 to Vulkan's available capabilities, and guard Float16Buffer
(Kernel-only capability) against being added outside OpenCL
environments.

Add tests to verify half and half vector types, and validate with
spirv-val.

Fixes #66398
2024-01-12 10:03:48 -05:00
Matthew Devereau
a8f83cc159
[AArch64][SME] Fix multi vector cvt builtins (#77656)
This fixes cvt multi vector builtins that erroneously had inverted
return vectors and vector parameters. This caused the incorrect
instructions to be emitted.
2024-01-12 09:55:52 +00:00
Fangrui Song
7e604485e1 [test] Improve x86 inline asm tests
Reorganize *asm-modifier* and make other cleanups.
2024-01-11 23:35:46 -08:00
Amara Emerson
1833e3fafa Fix test failure introduced in 3baedb411121c188c4bb07f47efb755bf4d4cf87 2024-01-11 22:06:01 -08:00
Shengchen Kan
4f71068b72 [X86] Correct the asm comment for compression NF_ND -> NF 2024-01-12 12:55:11 +08:00
Carl Ritson
6752f1517d
[TwoAddressInstruction] Recompute live intervals for partial defs (#74431)
Force live interval recomputation for a register if its definition is
narrowed to become partial. The live interval repair process cannot
otherwise detect these changes.
2024-01-12 13:26:01 +09:00
Emil J
3baedb4111
[GISel] Fix #77762: extend correct source registers in combiner helper rule extend_through_phis (#77765)
Since we already know which register we want to extend, we don't have to
ask its defining MI about it.

---------

Co-authored-by: Emil Tywoniak <Emil.Tywoniak@hightec-rt.com>
2024-01-12 12:09:58 +08:00
Dávid Ferenc Szabó
d1d1e7d6d0
[NFC] Updating the tests for combine-ext.mir (#77756) 2024-01-12 10:20:53 +07:00
Shengchen Kan
9095eec052
[X86][CodeGen] Support EVEX compression: NDD to nonNDD (#77731) 2024-01-12 10:03:30 +08:00
Philip Reames
5ce067d592 Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V."
This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus
re-enables terminator folding for RISCV.  The reported miscompile has
been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.
2024-01-11 13:20:02 -08:00
Usman Nadeem
c3e3aa9c33
[AArch64][SVE2] Generate XAR (#77160)
Bitwise exclusive OR and rotate right by immediate

Select xar (x, y, imm) for the following pattern:
    or (shl (xor x, y), nBits-imm), (shr (xor x, y), imm)

This is essentially:
    rotr (xor(x, y), imm)
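
The equivalence between the two forms can be checked with a short C++ sketch. It assumes 64-bit elements and an immediate in the range 1..63 (a shift by 64 would be undefined in C++); the function names are made up for the illustration and say nothing about how the selection pattern is written in the backend.

```cpp
#include <cstdint>

// The matched pattern: or (shl (xor x, y), 64 - imm), (shr (xor x, y), imm).
// Valid for Imm in 1..63.
constexpr std::uint64_t xarViaShifts(std::uint64_t X, std::uint64_t Y,
                                     unsigned Imm) {
  const std::uint64_t T = X ^ Y;
  return (T << (64 - Imm)) | (T >> Imm);
}

// Reference rotate-right, done one bit at a time.
constexpr std::uint64_t rotr64(std::uint64_t V, unsigned Imm) {
  for (unsigned I = 0; I < Imm; ++I)
    V = (V >> 1) | ((V & 1u) << 63);
  return V;
}
```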
2024-01-11 10:56:29 -08:00
Luke Lau
114e6d7ba0 [RISCV] Add test for strided gather with recursive disjoint or. NFC
This already gets converted to a strided intrinsic because we currently call
haveNoCommonBitsSet when checking or instructions, but an upcoming patch will
change this logic and we want to preserve this case.

Note that this IR is in the form that comes from instcombine. The splats need
to be inline constexprs, otherwise isSplatValue() will fail. (It can't
currently handle splats where the shufflevector is an instruction, and the
insertelement is a constexpr.)
2024-01-12 00:02:28 +07:00
Mirko Brkušanin
3867e6689e
[AMDGPU] Add new GFX12 image atomic float instructions (#76946) 2024-01-11 17:28:04 +01:00
HaohaiWen
b6fc463d4c
[SEH] Redirect test output to /dev/null (#77784) 2024-01-11 23:31:57 +08:00
Luke Lau
3b3ee1f534 [RISCV] Add test for strided gather with disjoint or. NFC 2024-01-11 22:08:57 +07:00
Nikita Popov
13b5882ee6 [PowerPC] Add test for #77748 (NFC) 2024-01-11 15:45:52 +01:00
HaohaiWen
f892cc36fd
[BranchFolding] Fix missing predecessors of landing-pad (#77608)
When removing an empty machine basic block, all of its successors should
be inherited by its fall-through MBB. This keeps the CFG with a single
entry, which is required by LiveDebugValues.

Reland #77441 as LiveDebugValues test.
2024-01-11 22:09:41 +08:00
Jay Foad
b120dae9bb
[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628)
Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc
operand in gfx12 DS_PARAM_LOAD or DS_DIRECT_LOAD instructions is set
appropriately depending on whether a hazard is found or not, rather than
inserting an S_WAITCNT_DEPCTR instruction if a hazard needs to be
mitigated.

Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>
2024-01-11 13:20:19 +00:00
John Brawn
40d5c2bcd4
[clang][AArch64] Add a -mbranch-protection option to enable GCS (#75486)
-mbranch-protection=gcs (enabled by -mbranch-protection=standard) causes
generated objects to be marked with the gcs feature. This is done via
the guarded-control-stack module flag, in a similar way to
branch-target-enforcement and sign-return-address.

Enabling GCS causes the GNU_PROPERTY_AARCH64_FEATURE_1_GCS bit to be set
on generated objects. No code generation changes are required, as GCS
just requires that functions are called using BL and returned from using
RET (or other similar variant instructions), which is already the case.
2024-01-11 12:53:23 +00:00
Amara Emerson
bbbe8ecc17
[GlobalISel][Localizer] Allow localization of a small number of repeated phi uses. (#77566)
We previously had a heuristic that if a value V was used multiple times
in a single PHI, then we bail out to avoid potentially rematerializing into many
predecessors. The phi uses only counted as a single use in the shouldLocalize() hook
because it counted the PHI as a single instruction use, not factoring in that it may
have many incoming edges.

It turns out this heuristic is slightly too pessimistic, and allowing a small number
of these uses to be localized can improve code size due to shortening live ranges,
especially if those ranges span a call.

This change results in some improvements in size on CTMark -Os:
```
Program                                       size.__text
                                              before         after           diff
kimwitu++/kc                                  451676.00      451860.00       0.0%
mafft/pairlocalalign                          241460.00      241540.00       0.0%
tramp3d-v4/tramp3d-v4                         389216.00      389208.00      -0.0%
7zip/7zip-benchmark                           587528.00      587464.00      -0.0%
Bullet/bullet                                 457424.00      457348.00      -0.0%
consumer-typeset/consumer-typeset             405472.00      405376.00      -0.0%
SPASS/SPASS                                   410288.00      410120.00      -0.0%
lencod/lencod                                 426396.00      426108.00      -0.1%
ClamAV/clamscan                               380108.00      379756.00      -0.1%
sqlite3/sqlite3                               283664.00      283372.00      -0.1%
                           Geomean difference                               -0.0%
```
I experimented with different variations and thresholds. Using 3 instead
of 2 resulted in a further 0.1% improvement on ClamAV but also regressed
sqlite3 by the same %.
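
A rough C++ sketch of a threshold-based version of the heuristic. Everything here is an assumption made for illustration: values are modelled as integer ids, the names are hypothetical, and the limit of 2 is inferred from the remark about trying 3 instead of 2. It is not the Localizer code.

```cpp
#include <cstddef>

// Hypothetical threshold; the message mentions comparing 2 against 3.
constexpr std::size_t kMaxPhiUses = 2;

// Count how many incoming operands of one PHI refer to value `V`.
// Values are modelled as plain integer ids.
constexpr std::size_t countPhiUses(const int *Incoming, std::size_t N, int V) {
  std::size_t Count = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (Incoming[I] == V)
      ++Count;
  return Count;
}

// Localize only when the value appears on a small number of incoming edges.
constexpr bool shouldLocalizeIntoPhi(const int *Incoming, std::size_t N,
                                     int V) {
  return countPhiUses(Incoming, N, V) <= kMaxPhiUses;
}

// Example PHI: value 7 arrives on two edges, value 9 on three.
constexpr int kExamplePhi[] = {7, 9, 7, 9, 9};
```

With this example PHI, value 7 would be localized and value 9 would not.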
2024-01-11 18:57:37 +08:00
Shengchen Kan
e4e0b65838 [X86][test] Pre-commit test for #77731 2024-01-11 18:51:34 +08:00
Sjoerd Meijer
75d820dcdd
[AArch64] MI Scheduler: create more LDP/STP pairs (#77565)
Target hook `canPairLdStOpc` is missing quite a few opcodes for which
LDPs/STPs can be created. I was hoping that it would not be necessary to
add these missing opcodes here and that the attached motivating test
case would be handled by the LoadStoreOptimiser (especially after
#71908), but it's not. The problem is that after register allocation
some things are a lot harder to do. Consider this for the motivating
example:

```
[1] renamable $q1 = LDURQi renamable $x9, -16 :: (load (s128) from %ir.r51, align 8, !tbaa !0)
[2] renamable $q2 = LDURQi renamable $x0, -16 :: (load (s128) from %ir.r53, align 8, !tbaa !4)
[3] renamable $q1 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMLSv2f64 killed renamable $q1(tied-def 0), killed renamable $q2, renamable $q0, implicit $fpcr
[4] STURQi killed renamable $q1, renamable $x9, -16 :: (store (s128) into %ir.r51, align 1, !tbaa !0)
[5] renamable $q1 = LDRQui renamable $x9, 0 :: (load (s128) from %ir.r.G0001_609.0, align 8, !tbaa !0)
```
We can't combine the load in line [5] into the load on [1]:
register q1 is used in between. And we can't combine [1] into
[5]: it is aliasing with the STR on line [4].

So, adding some missing opcodes here seems the best/easiest approach.
I will follow up to add some more missing cases here.
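
Loads [1] and [5] both address `$x9`, at byte offsets -16 and 0 with 16-byte accesses, which is what makes them look like a pairing candidate in the first place. A minimal C++ sketch of that adjacency test, assuming a hypothetical `MemAccess` model. It is not the logic of `canPairLdStOpc` or the LoadStoreOptimiser, and it deliberately ignores the register-reuse and aliasing constraints described in the message.

```cpp
#include <cstdint>

// Hypothetical model of a memory access: base register id, byte offset, size.
struct MemAccess {
  int BaseReg;
  std::int64_t Offset;
  std::int64_t Size;
};

// Two accesses are pairing candidates in this sketch when they share a base
// register and size and sit back to back in memory.
constexpr bool areAdjacent(MemAccess A, MemAccess B) {
  return A.BaseReg == B.BaseReg && A.Size == B.Size &&
         (A.Offset + A.Size == B.Offset || B.Offset + B.Size == A.Offset);
}
```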
2024-01-11 09:46:47 +00:00
Simon Pilgrim
7bf13fe812
[DAG] Fold (sext (sext_inreg x)) -> (sext (trunc x)) if the trunc is free (#77616) 2024-01-11 09:39:30 +00:00
Thorsten Schütt
d7642b2200
[GlobalIsel] Combine select to integer minmax (second attempt). (#77520)
Instcombine canonicalizes selects to floating point and integer minmax.
This and the dag combiner canonicalize to floating point minmax. None of
them canonicalizes to integer minmax. On Neoverse V2 basic integer
arithmetic and integer minmax have the same costs.
2024-01-11 09:50:33 +01:00
Diana Picus
16945bc16d
[AMDGPU] Don't send DEALLOC_VGPRs after calls (#77439)
Calls do not have to wait for VsCnt, so after they return there might
still be scratch stores in progress. It's important that we don't send
the DEALLOC_VGPR message in that case, since that might release the
VGPRs and scratch allocation before those stores are complete.
2024-01-11 09:14:52 +01:00
Luke Lau
e8790027b1
[RISCV] Allow vsetvlis with same register AVL in doLocalPostpass (#76801) 2024-01-11 12:12:46 +07:00
Shengchen Kan
1fe7bdb87b
[X86][CodeGen] Support lowering for NDD ADD/SUB/ADC/SBB/OR/XOR/NEG/NOT/INC/DEC/IMUL (#77564)
We supported encoding/decoding for these instructions in

https://github.com/llvm/llvm-project/pull/76319
https://github.com/llvm/llvm-project/pull/76721
https://github.com/llvm/llvm-project/pull/76919
2024-01-11 12:15:17 +08:00
Min-Yih Hsu
03be448cce
[RISCV][AMDGPU] Mark test/CodeGen/Generic/live-debug-label.ll XFAIL for RISCV and AMDGPU (#77631)
Both RISC-V and AMDGPU(GCN) deploy two VirtRegRewriter passes in their codegen
pipeline. This test prematurely stops at the first one, which doesn't
clean up the virtual register map and causes an assertion failure. Ideally
we can solve this by teaching `-stop-after` how to stop at the last
instance of a Pass, but we're just marking XFAIL for these two targets
for now.
2024-01-10 16:47:34 -08:00
Craig Topper
3378514a4d
[RISCV] Use any_extend for type legalizing atomic_compare_swap with Zacas. (#77669)
With Zacas we will use amocas.w which doesn't require the input to be
sign extended.
2024-01-10 12:41:11 -08:00
Craig Topper
0a1b066bba
[RISCV] Support isel for Zacas for XLen and i32. (#77666)
This adds new isel patterns for Zacas that take priority over the
pseudoinstructions we use for the A extension.

Support for 2x XLen types will come in a separate patch since they need
to be done differently.
2024-01-10 12:00:40 -08:00
CarolineConcatto
14e7dac92a
[Clang][LLVM][AArch64]SVE2.1 update the intrinsics according to acle[1] (#76844)
This patch changes the following intrinsics:

```
 svst1uwq[_{d}]  replaced by svst1wq[_{d}]
 svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}]
 svst1udq[_{d}]  replaced by svst1dq[_{d}]
 svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}]
```
Drops 'u' from the quadword stores because it is simply truncating the
quadwords to 32 bits

```
 svextq_lane[_{d}] replaced by  svextq[_{d}]
```
EXTQ follows the previous defined EXT intrinsics

```
 svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}]
```
Introduced with the latest SME2 ACLE change

[1]https://github.com/ARM-software/acle/pull/257
2024-01-10 17:12:14 +00:00
Sander de Smalen
d7ac412333
[AArch64][SME] Fix definition of uclamp/sclamp instructions. (#77619)
For some reason the arguments were in the wrong order.
2024-01-10 17:07:03 +00:00
HaohaiWen
9bde5becb4
[BranchFolding][SEH] Add test to track SEH CFG optimization (#77598)
This test tracks the BranchFolding pass, which removes a fall-through jump
and leaves the landing pad as a machine basic block with no predecessors.
It would trigger the bug introduced in #77441.
2024-01-10 22:34:18 +08:00
Ulrich Weigand
9aa8c82748 [SystemZ] Fix 256-bit shifts when i128 is legal
When i128 is a legal type, SelectionDAG now attempts to use
SRL_PARTS etc. with type i128, which is not implemented.  Fix
by marking those as Expand, just like we do for i64.

Fixes https://github.com/llvm/llvm-project/issues/77132
2024-01-10 15:12:19 +01:00
Simon Pilgrim
cc21aa1922 [X86] lower1BitShuffle - fold permute(setcc(x,y)) -> setcc(permute(x),permute(y)) for 32/64-bit element vectors
Noticed in #77459 - for wider element types, it's usually better to pre-shuffle the comparison arguments if we can, like we already do for broadcasts
2024-01-10 12:35:50 +00:00
Simon Pilgrim
78cf2c041b [X86] pr77459.ll - add missing AVX512 check prefixes
Missed these in 3210ce276350a247220b193db12a9b45d1034724 for the #77459 fix
2024-01-10 12:09:38 +00:00
Jay Foad
08da7ac80c
[AMDGPU] Fix broken sign-extended subword buffer load combine (#77470) 2024-01-10 10:50:13 +00:00
Ivan Kosarev
084f1c2ee0
[AMDGPU][True16] Support V_CEIL_F16. (#73108)
As not all fake instructions have their real counterparts implemented
yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow
both fake and real True16 instructions in assembler and disassembler
tests in the -mattr=+real-true16 mode during the transition period.

Source DPP and destination VOPDstOperand_t16 operands are still not
supported and will be addressed separately.
2024-01-10 08:46:19 +00:00
Craig Topper
b788692fa5 [RISCV][NFC] Remove unused CHECK prefixes to fix buildbots. NFC 2024-01-09 23:37:18 -08:00
Serge Pavlov
7fc7ef1434
[GlobalISel] Lowering of {get,set,reset}_fpenv (#75086)
The intrinsics get_fpenv, set_fpenv and reset_fpenv in this change are
implemented as calls to math library functions. Target specific lowering
will be implemented later on.
2024-01-10 14:18:00 +07:00
Juneyoung Lee
7388b7422f
[WebAssembly] Correctly consider signext/zext arg flags at function declaration (#77281)
This patch fixes WebAssembly's FastISel pass to correctly consider
signext/zeroext parameter flags at function declaration.
Previously, only the flags at call sites were considered during code
generation, which caused an interesting bug report, #63388.
This is problematic especially because in WebAssembly's ABI, either
signext or zeroext can be tagged to a function argument, and it must be
correctly reflected in the generated code. Unit test
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/WebAssembly/signext-zeroext.ll
shows that `i8 zeroext %t` and `i8 signext %t`'s code gen are different.
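
To see why the two attributes must not be confused, here is a small C++ sketch of the two ways an `i8` argument can be widened to 32 bits. The function names are hypothetical and the code models only the arithmetic, not the FastISel lowering.

```cpp
#include <cstdint>

// Widen an 8-bit pattern to 32 bits the way a `signext` argument expects:
// the top bit of the byte is replicated into the upper 24 bits.
constexpr std::uint32_t widenSignExt(std::uint8_t Bits) {
  return (Bits & 0x80u) ? (0xFFFFFF00u | Bits)
                        : static_cast<std::uint32_t>(Bits);
}

// Widen an 8-bit pattern the way a `zeroext` argument expects:
// the upper 24 bits are cleared.
constexpr std::uint32_t widenZeroExt(std::uint8_t Bits) {
  return static_cast<std::uint32_t>(Bits);
}
```

For the byte `0x80` the two results differ (`0xFFFFFF80` versus `0x00000080`), so using the wrong extension changes the value the callee observes.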
2024-01-09 23:54:43 -06:00
jiahanxie353
e42a70afab [RISCV][GISel] IRTranslate and Legalize some instructions with scalable vector type
* Add IRTranslate tests for ADD, SUB, AND, OR, and XOR with scalable
  vector types to show that they work as expected.
* Legalize G_ADD, G_SUB, G_AND, G_OR, and G_XOR of scalable vector
  type for the RISC-V vector extension.
2024-01-09 21:51:30 -07:00
Chia
a79d13f12a
[RISCV][ISel] Use vaaddu with rounding mode rnu for ISD::AVGCEILU. (#77473)
Similar to #76550, but for `ISD::AVGCEILU`.
Specifically, this patch aims to use `vaaddu` with rounding mode rnu
(i.e. `vxrm[1:0] = 0b00`) for `ISD::AVGCEILU`.

### Source code 
```
define <vscale x 8 x i8> @vaaddu_vv_nxv8i8_ceil(<vscale x 8 x i8> %x, <vscale x 8 x i8> %y) {
  %xzv = zext <vscale x 8 x i8> %x to <vscale x 8 x i16>
  %yzv = zext <vscale x 8 x i8> %y to <vscale x 8 x i16>
  %add = add nuw nsw <vscale x 8 x i16> %xzv, %yzv
  %one = insertelement <vscale x 8 x i16> poison, i16 1, i32 0
  %splat = shufflevector <vscale x 8 x i16> %one, <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer
  %add1 = add nuw nsw <vscale x 8 x i16> %add, %splat
  %div = lshr <vscale x 8 x i16> %add1, %splat
  %ret = trunc <vscale x 8 x i16> %div to <vscale x 8 x i8>
  ret <vscale x 8 x i8> %ret
}
```

### Before this patch 
```
vaaddu_vv_nxv8i8_ceil:
        vsetvli a0, zero, e8, m1, ta, ma
        vwaddu.vv       v10, v8, v9
        vsetvli zero, zero, e16, m2, ta, ma
        vadd.vi v10, v10, 1
        vsetvli zero, zero, e8, m1, ta, ma
        vnsrl.wi        v8, v10, 1
        ret
```
### After this patch 
```
vaaddu_vv_nxv8i8_ceil:
        vsetvli a0, zero, e8, m1, ta, ma
        csrwi vxrm, 0
        vaaddu.vv v8, v8, v9
        ret
```
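
For reference, the arithmetic that the IR above computes per element can be written out in C++. This is a scalar sketch with hypothetical function names, not part of the patch. The first function mirrors the widening sequence from the source IR; the second uses a well-known overflow-free identity and is included only to show that the result fits in the narrow type.

```cpp
#include <cstdint>

// Ceiling average via widening, mirroring the IR above:
// zext to 16 bits, add, add 1, shift right by 1, truncate.
constexpr std::uint8_t avgCeilUWide(std::uint8_t X, std::uint8_t Y) {
  const std::uint16_t Sum = static_cast<std::uint16_t>(X) + Y + 1;
  return static_cast<std::uint8_t>(Sum >> 1);
}

// Same result without a wider type: (X | Y) - ((X ^ Y) >> 1).
constexpr std::uint8_t avgCeilUNarrow(std::uint8_t X, std::uint8_t Y) {
  return static_cast<std::uint8_t>((X | Y) - ((X ^ Y) >> 1));
}
```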
2024-01-10 12:08:16 +09:00
HaohaiWen
c9124adfd8
Revert "[SEH][CodeGen] Add test to track CFG optimization bug for SEH" (#77542)
Reverts llvm/llvm-project#77441
I'll land it with fix.
2024-01-10 09:25:45 +08:00
Kai Luo
6615581526
[PowerPC] Make verifier happy when lowering llvm.trap (#77266)
`llvm.trap` is lowered to `PPC::TRAP`, and `PPC::TRAP` is marked as a
terminator. The verifier complains that a terminator should not lie in
the middle of an MBB. See #77095.

Fix it by removing `isTerminator` and `isBarrier` and then setting `isTrap`,
which was introduced by https://reviews.llvm.org/D48836# and is being
used by X86 and AArch64.

`PPC::TRAP` is not a hardware memory barrier and `llvm.trap` doesn't
indicate a memory barrier either.
2024-01-10 09:23:30 +08:00
Zequan Wu
4e8986fc58
[Coverage] Mark coverage sections as metadata sections on COFF. (#76834)
Mark `.lcovmap$M`, `.lcovfun$M`, `.lcovd` and `.lcovn` as metadata
sections on COFF so they are not loaded into memory.
2024-01-09 16:58:28 -05:00