54875 Commits

Author SHA1 Message Date
yonghong-song
7852ebc088
[BPF] Make -mcpu=v3 as the default (#107008)
Before llvm20, (void)__sync_fetch_and_add(...) always generates locked
xadd insns. In linux kernel upstream discussion [1], it is found that
for arm64 architecture, the original semantics of
(void)__sync_fetch_and_add(...), i.e., __atomic_fetch_add(...), is
preferred in order for jit to emit proper native barrier insns.

In llvm commits [2] and [3], (void)__sync_fetch_and_add(...) will
generate the following insns:
  - for cpu v1/v2: locked xadd insns to keep backward compatibility
  - for cpu v3/v4: __atomic_fetch_add() insns

To ensure proper barrier semantics for (void)__sync_fetch_and_add(...),
cpu v3/v4 is recommended.

This patch enables cpu=v3 as the default cpu version. For users wanting
to use cpu v1, -mcpu=v1 needs to be explicitly added to clang/llc
command line.

  [1]
https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@google.com/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f
  [2] https://github.com/llvm/llvm-project/pull/101428
  [3] https://github.com/llvm/llvm-project/pull/106494
2024-09-03 07:15:18 -07:00
Him188
0748f4227c
[AArch64][GlobalISel] Legalize 128-bit types for FABS (#104753)
This patch adds a common lower action for `G_FABS`, which generates `and
x8, x8, #0x7fffffffffffffff` to reset the sign bit. The action does not
support vectors since `G_AND` does not support fp128.


This approach is different than what SDAG is doing. SDAG stores the
value onto stack, clears the sign bit in the most significant byte, and
loads the value back into register. This involves multiple memory ops
and sounds slower.
2024-09-03 12:47:26 +01:00
Simon Pilgrim
377045ece6 [X86] canCreateUndefOrPoisonForTargetNode - X86ISD::CMPP (CMPPS/D) nodes do not generate poison 2024-09-03 10:33:04 +01:00
Simon Pilgrim
6c59dfb018 [X86] Add test showing failure to remove freeze from all_of pattern 2024-09-03 10:15:51 +01:00
Brandon Wu
7e6bad112c
[RISCV] Rename vcix_state register to sf_vcix_state. NFC (#106995)
Since it's SiFive VCIX specific register, it's better to have a prefix
so that it's more understandable.
2024-09-03 15:55:39 +08:00
Michael Marjieh
00c198b2ca
[MachinePipeliner] Make Recurrence MII More Accurate (#105475)
Current RecMII calculation is bigger than it needs to be. The
calculation was refined in this patch.
2024-09-03 16:15:17 +09:00
Christudasan Devadasan
042104985c
[AMDGPU][NewPM] Port SIShrinkInstructions to new pass manager. (#106967) 2024-09-03 10:52:50 +05:30
Craig Topper
9a1eded9b9
[RISCV] Custom legalize f16/bf16 FCOPYSIGN with Zfhmin/Zbfmin. (#107039)
The LegalizeDAG expansion will go through memory since i16 isn't a legal
type. Avoid this by using FMV nodes.

Similar to what we did for #106886 for FNEG and FABS. Special care is
needed to handle the Sign operand being a different type.
2024-09-02 22:04:09 -07:00
Craig Topper
dc19b59ea2 [RISCV] Rename test cases in bfloat-arith.ll and half-arith.ll. NFC
Use _bf16 or _h instead of _s. The _s was copied from float-arith.ll
2024-09-02 19:00:38 -07:00
Shilei Tian
cb949b74e8 [NFC][FIX] Work around update_test_checks bug 2024-09-02 12:33:24 -04:00
Shilei Tian
f32f0289fd [NFC] Update check lines of the test case llvm/test/CodeGen/AMDGPU/remove-no-kernel-id-attribute.ll 2024-09-02 12:23:26 -04:00
Sam Tebbs
44cfbef1b3
[AArch64] Lower partial add reduction to udot or svdot (#101010)
This patch introduces lowering of the partial add reduction intrinsic to
a udot or svdot for AArch64. This also involves adding a
`shouldExpandPartialReductionIntrinsic` target hook, which AArch64 will
return false from in the cases that it can be lowered.
2024-09-02 14:06:14 +01:00
Nikita Popov
224112f833 [ARM] Regenerate test checks (NFC) 2024-09-02 14:15:03 +02:00
Simon Pilgrim
f19dff1b80 [X86] scmp/ucmp - add SSE42/AVX2/AVX512 test coverage to show current state of vector legalization/lowering 2024-09-02 12:17:21 +01:00
Oliver Stannard
9cf68679c4
[ARM] Fix failure to register-allocate CMP_SWAP_64 pseudo-inst (#106721)
This test case was failing to compile with a "ran out of registers
during register allocation" error at -O0. This was because CMP_SWAP_64
has 3 operands which must be an even-odd register pair, and two other
GPR operands. All of the def operands are also early-clobber, so
registers can't be shared between uses and defs. Because the function
has an over-aligned alloca it needs frame and base pointers, so r6 and
r11 are both reserved. That leaves r0/r1, r2/r3, r4/r5 and r8/r9 as the
only valid register pairs, and if the two individual GPR operands happen
to get allocated to registers in different pairs then only 2 pairs will
be available for the three GPRPair operands.

To fix this, I've merged the two GPR operands into a single GPRPair
operand. This means that the instruction now has 4 GPRPair operands,
which can always be allocated without relying on luck. This does
constrain register allocation a bit more, but this pseudo instruction is
only used at -O0, so I don't think that's a problem.
2024-09-02 08:54:10 +01:00
Craig Topper
c950ecb90e
[RISCV] Remove zfbfmin.ll. NFC (#106937)
Most of it is redundant with bfloat-convert.ll. One testcase is found in
bfloat-imm.ll. The load and stores are more thoroughly tested in
bfloat-mem.ll.
2024-09-02 00:18:52 -07:00
Akshat Oke
da13754103
AMDGPU/NewPM Port SILoadStoreOptimizer to NPM (#106362) 2024-09-02 11:41:56 +05:30
Craig Topper
776aef1a5a [RISCV] Correct the rounding mode for llvm.lround.i64.f32 with RV64+Zfinx.
We should use RMM instead of DYN.
2024-09-01 13:26:38 -07:00
Marina Taylor
747d89a897
[AArch64] Add tests for fused FP literals. NFC (#106731)
This is for an upcoming change to the threshold on Apple targets for
using a constant pool for FP literals versus building them with integer
moves.

This file is based on literal_pools_float.ll. I tried to bolt on to the
existing test, but it got messy as that file is already testing a matrix
of combinations, so creating this new file instead.
2024-09-01 21:11:37 +01:00
Craig Topper
5aa83eb677 [RISCV] Add test for llvm.round.i32.f16 RV64+Zfhmin/Zhinxmin. NFC
We have special handling for this in type legalization, but we
didn't have a test.
2024-09-01 12:15:00 -07:00
Yingwei Zheng
affc0c64b6
[SDAG] Expand vector [u|s]cmp in VectorLegalizer (#106883)
Address comment
https://github.com/llvm/llvm-project/pull/106747#issuecomment-2322922855.
2024-09-01 22:35:52 +08:00
Craig Topper
3bdec31316
[RISCV] Custom legalize f16/bf16 FNEG/FABS with Zfhmin/Zbfmin. (#106886)
The LegalizeDAG expansion will go through memory since i16 isn't a legal
type. Avoid this by using FMV nodes.
2024-08-31 23:57:40 -07:00
S. Bharadwaj Yadavalli
8aa8c0590c
[DXIL][Analysis] Collect Function properties in Metadata Analysis (#105728)
Basic infrastructure to collect Function properties in Metadata Analysis
- Add a `SmallVector` of entry properties to the metadata information.
- Add a structure to represent function properties. Currently
`numthreads` and shader kind properties of shader entry functions are
represented.
2024-08-31 17:56:06 -04:00
Craig Topper
8638fe1444
[X86] Fix livein handling in emitStackProbeInlineWindowsCoreCLR64. (#106828)
Stop adding liveins for virtual registers. In the livein interface, the
register goes through a MCPhysReg which is uint16_t. This causes the
virtual register bit to be dropped making it alias to some nonsense
physical register.

Recompute the liveins for the continue block to handle any live
registers that are needed by instructions that were spliced from the
original block. This fixing the machine verifier error so we can remove
that fixme now.
2024-08-31 10:11:59 -07:00
Brandon Wu
22f98740b6
[llvm][RISCV] Support RISCV vector tuple CodeGen and Calling Convention (#97995)
This patch handles target lowering and calling convention.

For target lowering, the vector tuple type represented as multiple
scalable vectors is now changed to a single `MVT`, each `MVT` has a
corresponding register class.

The load/store of vector tuples are handled as the same way but need
another vector insert/extract instructions to get sub-register group.

Inline assembly constraint for vector tuple type can directly be modeled
as "vr" which is identical to normal vector registers.

For calling convention, it no longer needs an alternative algorithm to
handle register allocation, this makes the code easier to maintain and
read.

Stacked on https://github.com/llvm/llvm-project/pull/97994
2024-08-31 19:28:36 +08:00
Brandon Wu
db67a66e8e
Revert "[RISCV] RISCV vector calling convention (2/2)" (#97994)
This reverts commit 91dd844aa499d69c7ff75bf3156e2e3593a88057.

Stacked on https://github.com/llvm/llvm-project/pull/97993
2024-08-31 19:02:35 +08:00
Luke Lau
58e1c0e416
[RISCV] Discard the false operand in vmerge.vvm -> vmv.v.v peephole (#106688)
vmerge.vvm needs to have an all ones mask, so nothing is taken from the
false operand. So instead of checking that the passthru is the same as
false, just use the passthru directly for the tail elements.

This supersedes the convertVMergeToVMv part of #105788, as noted in
https://github.com/llvm/llvm-project/pull/105788/files#r1731683971
2024-08-31 13:20:53 +08:00
Luke Lau
8e972efb58
[RISCV] Add scalable vector patterns for vfwmaccbf16.v{v,f} (#106771)
We can reuse the patterns for vfwmacc.v{v,f} as long as we swap out
fpext_oneuse for riscv_fpextend_bf16 in the scalar case.
2024-08-31 13:20:19 +08:00
Craig Topper
e6e429179e [RISCV] Cleanup CHECK prefixes in half-arith.ll. NFC
Remove prefixes that donn't appear on RUN lines.
Rename prefixes for consistency.
Add RV32/RV64 prefixes where necessary to fix a conflict.
2024-08-30 18:57:50 -07:00
yonghong-song
06c531e808
BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 (#106494)
This patch contains two pars:
- first to revert the patch https://github.com/llvm/llvm-project/pull/101428.
- second to remove `atomic_fetch_and_*()` to `atomic_<op>()`
  conversion (when return value is not used), but preserve 
  `__sync_fetch_and_add()` to locked insn with cpu v1/v2.
2024-08-30 14:00:33 -07:00
Vitaly Buka
982d2445f2
Revert "AtomicExpand: Allow incrementally legalizing atomicrmw" (#106792)
Reverts llvm/llvm-project#103371

There is `heap-use-after-free`, commented on
206b5aff44a95754f6dd7a5696efa024e983ac59

Maybe `if (Next == E || BB != Next->getParent()) {` is enough,
but not sure, what was the intent there,
2024-08-30 13:51:53 -07:00
Luke Lau
0efa38699a
[RISCV] Check VL dominates and potentially move in tryReduceVL (#106753)
Similar to what we do in foldVMV_V_V with the passthru, if we end up
changing the Src's VL in tryReduceVL we need to make sure it dominates.

Fixes #106735
2024-08-31 01:50:24 +08:00
Craig Topper
c25293c6dd
[LegalizeVectorOps][RISCV] Don't promote VP_FABS/FNEG/FCOPYSIGN. (#106659)
Promoting canonicalizes NaNs which changes the semantics. Bitcast to
integer and use logic ops instead.
2024-08-30 09:44:51 -07:00
Craig Topper
688843bda8
[RISCV] Add constant folding combine for FMV_X_ANYEXTW/H. (#106653) 2024-08-30 09:43:42 -07:00
Brendan Dahl
5703d8572f
[WebAssembly] Add intrinsics to wasm_simd128.h for all FP16 instructions (#106465)
Getting this to work required a few additional changes:
- Add builtins for any instructions that can't be done with plain C
currently.
- Add support for the saturating version of fp_to_<s,i>_I16x8. Other
vector sizes supported this already.
- Support bitcast of f16x8 to v128. Needed to return a __f16x8 as
v128_t.
2024-08-30 08:42:37 -07:00
Matt Arsenault
206b5aff44
AtomicExpand: Allow incrementally legalizing atomicrmw (#103371)
If a lowering changed control flow, resume the legalization
loop at the first newly inserted block.

This will allow incrementally legalizing atomicrmw and cmpxchg.

The AArch64 test might be a bugfix. Previously it would lower
the vector FP case as a cmpxchg loop, but cmpxchgs get lowered
but previously weren't. Maybe it shouldn't be reporting cmpxchg
for the expand type in the first place though.
2024-08-30 19:11:45 +04:00
Philip Reames
924907bc6a
[DAG] Prefer 0.0 over -0.0 as neutral value for FADD w/NoSignedZero (#106616)
When getting a neutral value, we can prefer using a positive zero over a
negative zero if nsz is set on the FADD (or reduction). A positive zero
should be cheaper to materialize on basically all targets.

Arguably, we should be doing this kind of canonicalization in
DAGCombine, but we don't do that for any of the other reduction
variants, so this seems like path of least resistance. This does mean
that we can only do this for "fast" reductions. Just nsz isn't enough,
as that goes through the SEQ_FADD path where the IR level start value
isn't folded away.

If folks think this is to RISCV specific, let me know. There's a trivial
RISCV specific implementation. I went with the generic one as I through
this might benefit other targets.
2024-08-30 07:56:14 -07:00
Patryk Wychowaniec
86a60e7f1e
[AVR] Fix parsing & emitting relative jumps (#106722)
Ever since 6859685a87ad093d60c8bed60b116143c0a684c7 (or, precisely,
84428dafc0941e3a31303fa1b286835ab2b8e234) relative jumps emitted by the
AVR codegen are off by two bytes - this pull request fixes it.

## Abstract

As compared to absolute jumps, relative jumps - such as rjmp, rcall or
brsh - have an implied `pc+2` behavior; that is, `jmp 100` is `pc =
100`, but `rjmp 100` gets understood as `pc = pc + 100 + 2`.

This is not reflected in the AVR codegen:


f95026dbf6/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp (L89)

... which always emits relative jumps that are two bytes too far - or
rather it _would_ emit such jumps if not for this check:


f95026dbf6/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp (L517)

... which causes most of the relative jumps to be actually resolved
late, by the linker, which applies the offsetting logic on its own,
hiding the issue within LLVM.

[Some time
ago](697a162fa6)
we've had a similar "jumps are off" problem that got solved by touching
`shouldForceRelocation()`, but I think that has worked only by accident.
It's exploited the fact that absolute vs relative jumps in the parsed
assembly can be distinguished through a "side channel" check relying on
the existence of labels (i.e. absolute jumps happen to named labels, but
relative jumps are anonymous, so to say). This was an alright idea back
then, but it got broken by 6859685a87ad093d60c8bed60b116143c0a684c7.

I propose a different approach:
- when emitting relative jumps, offset them by `-2` (well, `-1`,
strictly speaking, because those instructions rely on right-shifted
offset),
- when parsing relative jumps, treat `.` as `+2` and read `rjmp .+1234`
as `rjmp (1234 + 2)`.

This approach seems to be sound and now we generate the same assembly as
avr-gcc, which can be confirmed with:

```cpp
// avr-gcc test.c -O3 && avr-objdump -d a.out

int main() {
    asm(
"      foo:\n\t"
"        rjmp  .+2\n\t"
"        rjmp  .-2\n\t"
"        rjmp  foo\n\t"
"        rjmp  .+8\n\t"
"        rjmp  end\n\t"
"        rjmp  .+0\n\t"
"      end:\n\t"
"        rjmp .-4\n\t"
"        rjmp .-6\n\t"
"      x:\n\t"
"        rjmp x\n\t"
"        .short 0xc00f\n\t"
);
}
```

avr-gcc is also how I got the opcodes for all new tests like `inst-brbc.s`, so we should be good.
2024-08-30 15:25:54 +02:00
wanglei
eaf87d3275 [LoongArch] Optimize for immediate value materialization using BSTRINS_D instruction
Reviewed By: heiher, SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/106332
2024-08-30 16:38:42 +08:00
wanglei
5b77e254e8
[LoongArch] Pre-commit test for immediate value materialization using BSTRINS_D
Reviewed By: SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/106331
2024-08-30 16:37:20 +08:00
Max Beck-Jones
1693d8eb9a
[AArch64][SelectionDAG] Vector splitting and promotion for histogram intrinsic (#103037)
Adds support for wider-than-legal vector types for the histogram
intrinsic (llvm.experimental.vector.histogram.add) by splitting the
vector. Also adds integer promotion for the Inc operand.
2024-08-30 08:54:12 +01:00
Alex MacLean
e004566547
[NVPTX][AA] Traverse use-def chain to find non-generic addrspace (#106477)
Address space information may be encoded anywhere along the use-def
chain. Take advantage of this by traversing the chain until we find a
non-generic addrspace.
2024-08-29 21:26:58 -07:00
Florian Mayer
ddaf2e2d29
[HWASan] add OptimizationRemark for alloca safety (#105872) 2024-08-29 20:50:51 -07:00
Justin Fargnoli
cdaebf6f0e
[NVPTX] Fix crash caused by ComputePTXValueVTs (#104524)
When [lowering return
values](99a10f1fe8/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (L3422))
from LLVM IR to SelectionDAG, we check that [the number of values
`SelectionDAG` tells us to return is equal to the number of values that
`ComputePTXValueVTs()` tells us to
return](99a10f1fe8/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (L3441)).
However, this check can fail on valid IR. For example:

```
define <6 x half> @foo() {
  ret <6 x half> zeroinitializer
}
```

`ComputePTXValueVTs()` tells us to return ***3*** `v2f16` values, while
`SelectionDAG` tells us to return ***6*** `f16` values. Thus, the
compiler will crash.

`ComputePTXValueVTs()` [supports all `half` element vectors with an even
number of
elements](99a10f1fe8/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (L213)).
Whereas `SelectionDAG` [only supports power-of-2 sized
vectors](4e078e3797/llvm/lib/CodeGen/TargetLoweringBase.cpp (L1580)).
This is the root of the discrepancy.

Assuming that the developers who added the code to
`ComputePTXValueVTs()` overlooked this, I've restricted
`ComputePTXValueVTs()` to compute the same number of return values as
`SelectionDAG`, instead of extending `SelectionDAG` to support
non-power-of-2 sized vectors.
2024-08-29 18:18:51 -07:00
Luke Lau
dbbfc952f0
[RISCV] Separate ActiveElementsAffectResult into VL and Mask flags (#106517)
In #106110 we had to mark v[f]slide1down.vx as
ActiveElementsAffectResult since the elements in the body depend on VL.
However it doesn't depend on the mask, so this was overly conservative
and broke the vmerge peephole.

We can recover this by splitting up ActiveElementsAffectResult into VL
and Mask bits, so we can more accurately model v[f]slide1down.vx and
re-enable the peephole.
2024-08-30 07:46:06 +08:00
Craig Topper
aa91d90cb0
[LegalizeVectorOps][PowerPC] Use xor to expand fneg. (#106595)
This preserves the semantis of fneg and matches what we do in
LegalizeDAG.

I kept the legal FSUB check to force unrolling for some targets that
don't have FSUB but have XOR. On Aarch64, using xor broke some tests that
expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg
(extractvectorelt X)))))) pattern.
2024-08-29 15:00:23 -07:00
Stephen Tozer
412e3e394d [ExtendLifetimes][NFC] Add explicit triple to remaining fake-use tests
One of the tests for the new fake use intrinsic are failing on darwin
buildbots due to relying on behaviour for their expected triple; this
commit adds explicit triples to the few remaining fake-use tests that
didn't have them.

Fixes commit 3d08ade (#86149).

Buildbot failures:
https://lab.llvm.org/buildbot/#/builders/23/builds/2505
2024-08-29 22:27:23 +01:00
Craig Topper
4ca817d051
[GlobalISel] Add bail outs for scalable vectors to some combines. (#106496)
These combines call getNumElements() which isn't valid for scalable
vectors.
2024-08-29 14:02:53 -07:00
Craig Topper
d5c292d8ef
[GISel][RISCV] Correctly handle scalable vector shuffles of pointer vectors in IRTranslator. (#106580) 2024-08-29 12:35:50 -07:00
Philip Reames
59762a0ecf [RISCV] Add coverage for <3 x float> reduction with neutral start
We can do slightly better on the neutral value when we have nsz.
2024-08-29 12:21:35 -07:00