55501 Commits

Author SHA1 Message Date
Michael Maitland
c2c4db8d8f
[RISCV][VLOPT] Add support for 11.11 div instructions (#112201)
This adds support for these instructions and also tests getOperandInfo
for these instructions as well.
2024-10-14 14:44:33 -04:00
Michael Maitland
82e89c0271
[RISCV][VLOPT] Add support for 11.9 min/max instructions (#112198)
This adds support for these instructions and also tests getOperandInfo
for these instructions as well.
2024-10-14 14:43:56 -04:00
Yuta Saito
d4efc3e097
[Coverage][WebAssembly] Add initial support for WebAssembly/WASI (#111332)
Currently, WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:

1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
    - [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
    - [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested

See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.
2024-10-15 02:41:43 +09:00
Michael Maitland
2f077ece2f [RISCV][VLOPT] Enable VLOptimizer for vl-opt.ll test file 2024-10-14 10:36:06 -07:00
Michael Marjieh
b5600c6f85
[TargetLowering][SelectionDAG] Exploit nneg Flag in UINT_TO_FP (#108931)
1. Propagate the nneg flag in WidenVecRes
2. Use SINT_TO_FP in expandUINT_TO_FP when possible.
2024-10-14 20:55:48 +04:00
Shilei Tian
a74659445d
[AMDGPU] Skip terminators when forcing emit zero flag (#112116)
When forcing emit zero, we need to skip terminators of a MBB; otherwise
the terminator list of the MBB would be broken.
2024-10-14 11:46:18 -04:00
Michael Maitland
a31e834ba8 [RISCV][VLOPT] Update test cases to use riscv-enable-vl-optimizer and better formatting 2024-10-14 08:44:16 -07:00
Albert Huang
aa2c0f35a1
[ARM] [AArch32] Add support for Arm China STAR-MC1 CPU (#110085)
STAR-MC1 is an Armv8m CPU.

Technical specifications available at:

https://www.armchina.com/download/Documents/Application-Notes/Technical-Reference-Manual?infoId=160
2024-10-14 15:48:12 +01:00
Simon Pilgrim
d81c2f16a3 [X86] canCreateUndefOrPoisonForTargetNode - X86ISD::VPERMV3 shuffles don't create undef/poison
The operands might contain an undef/poison element, but the shuffle node itself will not create one by itself.

Improves test case from #109272
2024-10-14 14:54:03 +01:00
Simon Pilgrim
fd8a4b0073 [X86] combineAndnp - fold ANDN(SEXT(SETCC()),X) -> SELECT(NOT(SETCC()),X,0) on AVX512 targets
Reverse the generic foldVSelectToSignBitSplatMask fold on AVX512 targets where we can use the SETCC result directly in predicated moves/instructions.

Fixes #109272
2024-10-14 14:54:03 +01:00
Akshat Oke
cd6c2b80be
[NewPM][CodeGen] Port StackColoring to NPM (#111812) 2024-10-14 19:23:34 +05:30
c8ef
a3b0c31ebc
Revert "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112200)
Reverts llvm/llvm-project#111774

This appears to be causing some tests to fail.
2024-10-14 21:43:49 +08:00
c8ef
11f625cb87
[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes. (#111774)
Closes #108218.

This patch adds icmp+select patterns for integer min/max matchers in
SDPatternMatch, similar to those in IR PatternMatch.
2024-10-14 21:19:34 +08:00
Akshat Oke
8b20f1b924
[MIR] Fix tests for flags in register info (#112179)
[MIR] Serialize virtual register flags #110228 introduces register flags
which appear empty in .mir dumps. Future tests should use
`-simplify-mir`.
2024-10-14 18:28:54 +05:30
Simon Pilgrim
f7788618dd [X86] vselect-packss.ll - regenerate test checks with vpternlog comments 2024-10-14 12:11:30 +01:00
Simon Pilgrim
ccb9835edb
[X86] LowerShift - lower vXi8 shifts of an uniform constant using PSHUFB (#112175)
If each 128-bit vXi8 lane is shifting the same constant value, we can pre-compute the 8 valid shift results and use PSHUFB to act as a LUT with the shift amount.

Fixes #110317
2024-10-14 12:10:41 +01:00
Serge Pavlov
52e5683ddd
[GlobalISel][ARM] Legalization of G_CONSTANT using constant pool (#98308)
ARM uses complex encoding of immediate values using small number of
bits. As a result, some values cannot be represented as immediate
operands, they need to be synthesized in a register. This change
implements legalization of such constants with loading values from
constant pool.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2024-10-14 16:40:21 +07:00
Akshat Oke
bec839d8ee
[AMDGPU] Serialize WWM_REG vreg flag (#110229) 2024-10-14 14:37:21 +05:30
Akshat Oke
dbfca24b99
[MIR] Serialize virtual register flags (#110228)
[MIR] Serialize virtual register flags

This introduces target-specific vreg flag serialization. Flags are represented as `uint8_t` and the `TargetRegisterInfo` override provides methods `getVRegFlagValue` to deserialize and `getVRegFlagsOfReg` to serialize.
2024-10-14 14:19:53 +05:30
David Green
a07639f4bb
[AArch64] Increase inline memmove limit to 16 stored registers (#111848)
The memcpy inline limit has been 16 for a long time, this patch makes
the memmove inline limit the same, allowing small-constant sized
memmoves to be emitted inline. The 16 is the number of registers stored,
which equates to a limit of 256 bytes.
2024-10-14 08:57:32 +01:00
YunQiang Su
c01ddbe916
RISC-V: Select FCANONICALIZE (#112083)
We can use `FMIN.x OP,OP` to canonlize a float.
2024-10-14 14:12:36 +08:00
Jim Lin
dba54fb074
[RISCV] Add support for inline asm constraint vd (#111653)
It constrains vector registers excluding v0. Refer to
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html RISC-V part.

This patch also adds a testcase for constraints vr, vd and vm.
2024-10-14 10:47:59 +08:00
duk
464a7ee79e
[CodeGen] Generalize trap emission after SP check fail (#109744)
Generalize and improve some target-specific code that emits traps after
stack protector failure in SelectionDAG & GlobalIsel.
2024-10-12 20:01:22 -04:00
Miguel Saldivar
6fd229a655
[X86] Invert (and X, ~(and ~Y, Z)) back into (and X, (or Y, ~Z)) (#109215)
When `andn` is available, we should avoid switching `s &= ~(z & ~y);` into `s &= ~z | y;`

This patch turns this assembly from:
```
foo:
        not     rcx
        and     rsi, rdx
        andn    rax, rsi, rdi
        or      rcx, rdx
        and     rax, rcx
        ret
```
into:
```
foo:
        and     rsi, rdx
        andn    rcx, rdx, rcx
        andn    rax, rsi, rdi
        andn    rax, rcx, rax
        ret
```
Fixes #108731
2024-10-12 11:28:39 +01:00
Matt Arsenault
cb2f161957
AArch64: Remove incorrect REQUIRES arm-registered-target from test (#111983) 2024-10-12 13:26:17 +04:00
Craig Topper
902520256b [RISCV] Make (sext_inreg X, i1) legal for XTHeadBb to cover the existing isel pattern.
I just happened to notice the untested isel pattern.
2024-10-11 16:16:07 -07:00
Tex Riddell
82b40fd4fd
Fix scalar overload name constructed by ReplaceWithVeclib.cpp (#111095)
ReplaceWithVeclib.cpp would construct overload name using all the
arguments in the intrinsic, but overloads should only be constructed
from arguments for which isVectorIntrinsicWithOverloadTypeAtArg returns
true, including the return type first (index -1).

Additionally,
- skip when `Intrinsic::not_intrinsic`, otherwise
`isVectorIntrinsicWithOverloadTypeAtArg` asserts for some
IntrinsicCalls.

Unblocks translation for pow and atan2 intrinsics.

Fixes #111093
2024-10-11 14:38:35 -07:00
Craig Topper
8b46d40221 [RISCV] Re-generate orc-b-patterns.ll for store clustering. NFC
The patch added orc-b-patterns.ll landed while store clustering was
still in review.
2024-10-11 14:28:51 -07:00
Alex Bradbury
2967e5f800
[RISCV] Enable store clustering by default (#73796)
Builds on #73789, enabling store clustering by default using the same
heuristic.
2024-10-11 20:25:53 +01:00
Simon Pilgrim
03447ab98d [X86] Add test coverage for #110317
Add tests showing potential to use PSHUFB for shifts of constant uniform values by using a pre-computed LUT of all legal shift amounts
2024-10-11 17:22:56 +01:00
Janek van Oirschot
50866e84d1
Revert "[AMDGPU] Avoid resource propagation for recursion through multiple functions" (#112013)
Reverts llvm/llvm-project#111004
2024-10-11 17:10:28 +01:00
Juan Manuel Martinez Caamaño
2d5f3b0a61
[AMDGPU][SIPreEmitPeephole] mustRetainExeczBranch: use BranchProbability and TargetSchedmodel (#109818)
Remove s_cbranch_execnz branches if the transformation is
profitable according to `BranchProbability` and `TargetSchedmodel`.
2024-10-11 17:45:59 +02:00
Janek van Oirschot
67160c5ab5
[AMDGPU] Avoid resource propagation for recursion through multiple functions (#111004)
Avoid constructing recursive MCExpr definitions when multiple functions
cause a recursion.

Fixes #110863
2024-10-11 16:42:50 +01:00
Michael Maitland
1c94388f38
[RISCV] Introduce VLOptimizer pass (#108640)
The purpose of this optimization is to make the VL argument, for
instructions that have a VL argument, as small as possible. This is
implemented by visiting each instruction in reverse order and checking
that if it has a VL argument, whether the VL can be reduced.

By putting this pass before VSETVLI insertion, we see three kinds of
changes to generated code:
1. Eliminate VSETVLI instructions
2. Reduce the VL toggle on VSETVLI instructions that also change vtype
3. Reduce the VL set by a VSETVLI instruction

The list of supported instructions is currently whitelisted for safety.
In the future, we could add more instructions to `isSupportedInstr` to
support even more VL optimization.

We originally wrote this pass because vector GEP instructions do not
take a VL, which leads us to emit code that uses VL=VLMAX to implement
GEP in the RISC-V backend. As a result, some of the vector instructions
will write to lanes, specifically between the intended VL and VLMAX,
that will never be read. As an alternative to this pass, we considered
adding a vector predicated GEP instruction, but this would not fit well
into the intrinsic type system since GEP has a variable number of
arguments, each with arbitrary types. The second approach we considered
was to put this pass after VSETVLI insertion, but we found that it was
more difficult to recognize optimization opportunities, especially
across basic block boundaries -- the data flow analysis was also a bit
more expensive and complex.

While this pass solves the GEP problem, we have expanded it to handle
more cases of VL optimization, and there is opportunity for the analysis
to be improved to enable even more optimization. We have a few follow up
patches to post, but figured this would be a good start.

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
2024-10-11 09:45:35 -04:00
Benjamin Maxwell
c3a10dc849
[AArch64] Disable consecutive store merging when Neon is unavailable (#111519)
Lowering fixed-size BUILD_VECTORS without Neon may introduce stack
spills, leading to more stores/reloads than if the stores were not
merged. In some cases, it can also prevent using paired store
instructions.

In the future, we may want to relax when SVE is available, but
currently, the SVE lowerings for BUILD_VECTOR are limited to a few
specific cases.
2024-10-11 14:15:01 +01:00
Emilio Cota
9a696b68b7 Revert "[NVPTX] Prefer prmt.b32 over bfi.b32 (#110766)"
This reverts commit 3f9998af4f79e95fe8be615df9d6b898008044b9.

It breaks downstream tests with egregious numerical differences.

Unfortunately no upstream tests are broken, but the fact that
a prior iteration of the commit (pre-optimization) does work
with our downstream tests (coming from the Triton repo) supports
the claim that the final version of the commit is incorrect.

Reverting now so that the original author can evaluate.
2024-10-11 08:42:33 -04:00
Daniel Mokeev
26b832a9ec
[RISCV] Add DAG combine to turn (sub (shl X, 8-Y), (shr X, Y)) into orc.b (#111828)
This patch generalizes the DAG combine for `(sub (shl X, 8), X) =>
(orc.b X)`
into the more general form of `(sub (shl X, 8 - Y), (srl X, Y)) =>
(orc.b X)`.

Alive2 generalized proof: https://alive2.llvm.org/ce/z/dFcf_n
Related issue: https://github.com/llvm/llvm-project/issues/96595
Related PR: https://github.com/llvm/llvm-project/pull/96680
2024-10-11 20:41:47 +08:00
Matt Arsenault
14705a912f
CodeGen: Remove redundant REQUIRES registered-target from tests (#111982)
These are already in target specific test directories.
2024-10-11 16:16:12 +04:00
Petar Avramovic
7b0d56be1d
AMDGPU/GlobalISel: Fix inst-selection of ballot (#109986)
Both input and output of ballot are lane-masks:
result is lane-mask with 'S32/S64 LLT and SGPR bank'
input is lane-mask with 'S1 LLT and VCC reg bank'.
Ballot copies bits from input lane-mask for
all active lanes and puts 0 for inactive lanes.
GlobalISel did not set 0 in result for inactive lanes
for non-constant input.
2024-10-11 11:40:27 +02:00
Fabian Ritter
173c68239d
[AMDGPU] Enable unaligned scratch accesses (#110219)
This allows us to emit wide generic and scratch memory accesses when we
do not have alignment information. In cases where accesses happen to be
properly aligned or where generic accesses do not go to scratch memory,
this improves performance of the generated code by a factor of up to 16x
and reduces code size, especially when lowering memcpy and memmove
intrinsics.

Also: Make the use of the FeatureUnalignedScratchAccess feature more
consistent: FeatureUnalignedScratchAccess and EnableFlatScratch are now
orthogonal, whereas, before, code assumed that the latter implies the
former at some places.

Part of SWDEV-455845.
2024-10-11 08:50:49 +02:00
Yingwei Zheng
ec3e0a5900
Revert "[CodeGenPrepare] Convert ctpop(X) ==/!= 1 into ctpop(X) u</u> 2/1" (#111932)
Reverts llvm/llvm-project#111284 to fix clang stage2 builds.
Investigating...

Failed buildbots:
https://lab.llvm.org/buildbot/#/builders/76/builds/3576
https://lab.llvm.org/buildbot/#/builders/168/builds/4308
https://lab.llvm.org/buildbot/#/builders/127/builds/1087
2024-10-11 11:08:07 +08:00
Phoebe Wang
9882b35a3a
[X86][StrictFP] Combine fcmp + select to fmin/fmax for some predicates (#109512)
X86 maxss/minss etc. instructions won't turn SNaN to QNaN, so we can
combine fcmp + select to them for some predicates.
2024-10-11 10:18:40 +08:00
Yingwei Zheng
e3894f58e1
[CodeGenPrepare] Convert ctpop(X) ==/!= 1 into ctpop(X) u</u> 2/1 (#111284)
Some targets have better codegen for `ctpop(X) u< 2` than `ctpop(X) ==
1`. After https://github.com/llvm/llvm-project/pull/100899, we set the
range of ctpop's return value to indicate the argument/result is
non-zero.

This patch converts `ctpop(X) ==/!= 1` into `ctpop(X) u</u> 2/1` in CGP
to fix https://github.com/llvm/llvm-project/issues/95255.
2024-10-11 09:08:38 +08:00
YunQiang Su
72fb379225
AArch64: Select FCANONICALIZE (#104429)
FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use
them to canonicalize a floating point number. And
FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding
FMINIMUMNUM/FMAXIMUMNUM, so let's define them.

---------

Co-authored-by: Your Name <you@example.com>
2024-10-11 08:45:14 +08:00
Finn Plummer
2647505027
[HLSL] Implement the degrees intrinsic (#111209)
- add degrees builtin
    - link degrees api in hlsl_intrinsics.h
    - add degrees intrinsic to IntrinsicsDirectX.td
    - add degrees intrinsic to IntrinsicsSPIRV.td
- add lowering from clang builtin to dx/spv intrinsics in CGBuiltin.cpp
    - add semantic checks to SemaHLSL.cpp
- add expansion of directx intrinsic to llvm fmul for DirectX in
DXILIntrinsicExpansion.cpp
    - add mapping to spir-v intrinsic in SPIRVInstructionSelector.cpp

    - add test coverage:
- degrees.hlsl -> check hlsl lowering to dx/spv degrees intrinsics
- degrees-errors.hlsl/half-float-only-errors -> check semantic warnings
- hlsl-intrinsics/degrees.ll -> check lowering of spir-v degrees
intrinsic to SPIR-V backend
- DirectX/degrees.ll -> check expansion and scalarization of directx
degrees intrinsic to fmul
      
Resolves #99104
2024-10-10 16:34:26 -07:00
Justin Fargnoli
d832a1c744
[NVPTX] Only run LowerUnreachable when necessary (#109868)
Before CUDA 12.3 `ptxas` did not recognize that the trap instruction
terminates a basic block. Instead, it would assume that control flow
continued to the next instruction. The next instruction could be in the
block that's lexically below it. This would lead to phantom CFG edges
being created within ptxas.

[NVPTX: Lower unreachable to exit to allow ptxas to accurately
reconstruct the
CFG.](1ee4d880e8)
added the LowerUnreachable pass to NVPTX to work around this. Several
other WAR patches followed.

This bug in `ptxas` was fixed in CUDA 12.3 and is thus impossible to
encounter when targeting PTX ISA v8.3+

This commit reverts the WARs for the `ptxas` bug when targeting PTX ISA
v8.3+

CC @maleadt
2024-10-10 12:57:43 -07:00
Finn Plummer
d36cef0b17
[HLSL][DXIL] Implement WaveGetLaneIndex Intrinsic (#111576)
- add additional lowering for directx backend in CGBuiltin.cpp
    - add directx intrinsic to IntrinsicsDirectX.td
    - add semantic check of arguments in SemaHLSL.cpp
    - add mapping to DXIL op in DXIL.td

    - add testing of semantics in WaveGetLaneIndex-errors.hlsl
    - add testing of dxil lowering in WaveGetLaneIndex.ll
  
Resolves #70105
2024-10-10 11:44:44 -07:00
Jay Foad
62b3a4bc70
[AMDGPU] Improve codegen for s_barrier_init (#111866) 2024-10-10 19:40:02 +01:00
Justin Fargnoli
3f9998af4f
[NVPTX] Prefer prmt.b32 over bfi.b32 (#110766)
In [[NVPTX] Improve lowering of
v4i8](cbafb6f2f5)
@Artem-B add the ability to lower ISD::BUILD_VECTOR with bfi PTX
instructions. @Artem-B did this because:
([source](https://github.com/llvm/llvm-project/pull/67866#discussion_r1343066911))

> Under the hood byte extraction/insertion ends up as BFI/BFE
instructions, so we may as well do that in PTX, too.
https://godbolt.org/z/Tb3zWbj9b

However, the example that @Artem-B linked was targeting sm_52. On modern
architectures, ptxas uses prmt.b32.
[Example](https://godbolt.org/z/Ye4W1n84o).

Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
2024-10-10 10:24:02 -07:00
Ellis Hoag
cb5fbd2f60
[CodeLayout] Do not verify after assigning blocks (#111754)
Rather than invariantly running `F->verify()` when asserts are enabled,
run machine IR verification in LIT tests only.

Swap `CHECK-PERF` and `CHECK-SIZE` in `code_placement_ext_tsp_large.ll`.

Remove `={0,1,true,false}` from flags in tests.
2024-10-10 09:01:50 -07:00