llvm-project

Author	SHA1	Message	Date
yonghong-song	7852ebc088	[BPF] Make -mcpu=v3 as the default (#107008 ) Before llvm20, (void)__sync_fetch_and_add(...) always generates locked xadd insns. In linux kernel upstream discussion [1], it is found that for arm64 architecture, the original semantics of (void)__sync_fetch_and_add(...), i.e., __atomic_fetch_add(...), is preferred in order for jit to emit proper native barrier insns. In llvm commits [2] and [3], (void)__sync_fetch_and_add(...) will generate the following insns: - for cpu v1/v2: locked xadd insns to keep backward compatibility - for cpu v3/v4: __atomic_fetch_add() insns To ensure proper barrier semantics for (void)__sync_fetch_and_add(...), cpu v3/v4 is recommended. This patch enables cpu=v3 as the default cpu version. For users wanting to use cpu v1, -mcpu=v1 needs to be explicitly added to clang/llc command line. [1] https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@google.com/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f [2] https://github.com/llvm/llvm-project/pull/101428 [3] https://github.com/llvm/llvm-project/pull/106494	2024-09-03 07:15:18 -07:00
Him188	0748f4227c	[AArch64][GlobalISel] Legalize 128-bit types for FABS (#104753 ) This patch adds a common lower action for `G_FABS`, which generates `and x8, x8, #0x7fffffffffffffff` to reset the sign bit. The action does not support vectors since `G_AND` does not support fp128. This approach is different than what SDAG is doing. SDAG stores the value onto stack, clears the sign bit in the most significant byte, and loads the value back into register. This involves multiple memory ops and sounds slower.	2024-09-03 12:47:26 +01:00
Simon Pilgrim	377045ece6	[X86] canCreateUndefOrPoisonForTargetNode - X86ISD::CMPP (CMPPS/D) nodes do not generate poison	2024-09-03 10:33:04 +01:00
Simon Pilgrim	6c59dfb018	[X86] Add test showing failure to remove freeze from all_of pattern	2024-09-03 10:15:51 +01:00
Brandon Wu	7e6bad112c	[RISCV] Rename `vcix_state` register to `sf_vcix_state`. NFC (#106995 ) Since it's SiFive VCIX specific register, it's better to have a prefix so that it's more understandable.	2024-09-03 15:55:39 +08:00
Michael Marjieh	00c198b2ca	[MachinePipeliner] Make Recurrence MII More Accurate (#105475 ) Current RecMII calculation is bigger than it needs to be. The calculation was refined in this patch.	2024-09-03 16:15:17 +09:00
Christudasan Devadasan	042104985c	[AMDGPU][NewPM] Port SIShrinkInstructions to new pass manager. (#106967 )	2024-09-03 10:52:50 +05:30
Craig Topper	9a1eded9b9	[RISCV] Custom legalize f16/bf16 FCOPYSIGN with Zfhmin/Zbfmin. (#107039 ) The LegalizeDAG expansion will go through memory since i16 isn't a legal type. Avoid this by using FMV nodes. Similar to what we did for #106886 for FNEG and FABS. Special care is needed to handle the Sign operand being a different type.	2024-09-02 22:04:09 -07:00
Craig Topper	dc19b59ea2	[RISCV] Rename test cases in bfloat-arith.ll and half-arith.ll. NFC Use _bf16 or _h instead of _s. The _s was copied from float-arith.ll	2024-09-02 19:00:38 -07:00
Shilei Tian	cb949b74e8	[NFC][FIX] Work around update_test_checks bug	2024-09-02 12:33:24 -04:00
Shilei Tian	f32f0289fd	[NFC] Update check lines of the test case `llvm/test/CodeGen/AMDGPU/remove-no-kernel-id-attribute.ll`	2024-09-02 12:23:26 -04:00
Sam Tebbs	44cfbef1b3	[AArch64] Lower partial add reduction to udot or svdot (#101010 ) This patch introduces lowering of the partial add reduction intrinsic to a udot or svdot for AArch64. This also involves adding a `shouldExpandPartialReductionIntrinsic` target hook, which AArch64 will return false from in the cases that it can be lowered.	2024-09-02 14:06:14 +01:00
Nikita Popov	224112f833	[ARM] Regenerate test checks (NFC)	2024-09-02 14:15:03 +02:00
Simon Pilgrim	f19dff1b80	[X86] scmp/ucmp - add SSE42/AVX2/AVX512 test coverage to show current state of vector legalization/lowering	2024-09-02 12:17:21 +01:00
Oliver Stannard	9cf68679c4	[ARM] Fix failure to register-allocate CMP_SWAP_64 pseudo-inst (#106721 ) This test case was failing to compile with a "ran out of registers during register allocation" error at -O0. This was because CMP_SWAP_64 has 3 operands which must be an even-odd register pair, and two other GPR operands. All of the def operands are also early-clobber, so registers can't be shared between uses and defs. Because the function has an over-aligned alloca it needs frame and base pointers, so r6 and r11 are both reserved. That leaves r0/r1, r2/r3, r4/r5 and r8/r9 as the only valid register pairs, and if the two individual GPR operands happen to get allocated to registers in different pairs then only 2 pairs will be available for the three GPRPair operands. To fix this, I've merged the two GPR operands into a single GPRPair operand. This means that the instruction now has 4 GPRPair operands, which can always be allocated without relying on luck. This does constrain register allocation a bit more, but this pseudo instruction is only used at -O0, so I don't think that's a problem.	2024-09-02 08:54:10 +01:00
Craig Topper	c950ecb90e	[RISCV] Remove zfbfmin.ll. NFC (#106937 ) Most of it is redundant with bfloat-convert.ll. One testcase is found in bfloat-imm.ll. The load and stores are more thoroughly tested in bfloat-mem.ll.	2024-09-02 00:18:52 -07:00
Akshat Oke	da13754103	AMDGPU/NewPM Port SILoadStoreOptimizer to NPM (#106362 )	2024-09-02 11:41:56 +05:30
Craig Topper	776aef1a5a	[RISCV] Correct the rounding mode for llvm.lround.i64.f32 with RV64+Zfinx. We should use RMM instead of DYN.	2024-09-01 13:26:38 -07:00
Marina Taylor	747d89a897	[AArch64] Add tests for fused FP literals. NFC (#106731 ) This is for an upcoming change to the threshold on Apple targets for using a constant pool for FP literals versus building them with integer moves. This file is based on literal_pools_float.ll. I tried to bolt on to the existing test, but it got messy as that file is already testing a matrix of combinations, so creating this new file instead.	2024-09-01 21:11:37 +01:00
Craig Topper	5aa83eb677	[RISCV] Add test for llvm.round.i32.f16 RV64+Zfhmin/Zhinxmin. NFC We have special handling for this in type legalization, but we didn't have a test.	2024-09-01 12:15:00 -07:00
Yingwei Zheng	affc0c64b6	[SDAG] Expand vector [u|s]cmp in VectorLegalizer (#106883 ) Address comment https://github.com/llvm/llvm-project/pull/106747#issuecomment-2322922855.	2024-09-01 22:35:52 +08:00
Craig Topper	3bdec31316	[RISCV] Custom legalize f16/bf16 FNEG/FABS with Zfhmin/Zbfmin. (#106886 ) The LegalizeDAG expansion will go through memory since i16 isn't a legal type. Avoid this by using FMV nodes.	2024-08-31 23:57:40 -07:00
S. Bharadwaj Yadavalli	8aa8c0590c	[DXIL][Analysis] Collect Function properties in Metadata Analysis (#105728 ) Basic infrastructure to collect Function properties in Metadata Analysis - Add a `SmallVector` of entry properties to the metadata information. - Add a structure to represent function properties. Currently `numthreads` and shader kind properties of shader entry functions are represented.	2024-08-31 17:56:06 -04:00
Craig Topper	8638fe1444	[X86] Fix livein handling in emitStackProbeInlineWindowsCoreCLR64. (#106828 ) Stop adding liveins for virtual registers. In the livein interface, the register goes through a MCPhysReg which is uint16_t. This causes the virtual register bit to be dropped making it alias to some nonsense physical register. Recompute the liveins for the continue block to handle any live registers that are needed by instructions that were spliced from the original block. This fixes the machine verifier error so we can remove that fixme now.	2024-08-31 10:11:59 -07:00
Brandon Wu	22f98740b6	[llvm][RISCV] Support RISCV vector tuple CodeGen and Calling Convention (#97995 ) This patch handles target lowering and calling convention. For target lowering, the vector tuple type, previously represented as multiple scalable vectors, is now a single `MVT`, and each `MVT` has a corresponding register class. The load/store of vector tuples is handled the same way as before but needs additional vector insert/extract instructions to get the sub-register group. The inline assembly constraint for the vector tuple type can be modeled directly as "vr", which is identical to normal vector registers. The calling convention no longer needs an alternative algorithm to handle register allocation, which makes the code easier to maintain and read. Stacked on https://github.com/llvm/llvm-project/pull/97994	2024-08-31 19:28:36 +08:00
Brandon Wu	db67a66e8e	Revert "[RISCV] RISCV vector calling convention (2/2)" (#97994 ) This reverts commit 91dd844aa499d69c7ff75bf3156e2e3593a88057. Stacked on https://github.com/llvm/llvm-project/pull/97993	2024-08-31 19:02:35 +08:00
Luke Lau	58e1c0e416	[RISCV] Discard the false operand in vmerge.vvm -> vmv.v.v peephole (#106688 ) vmerge.vvm needs to have an all ones mask, so nothing is taken from the false operand. So instead of checking that the passthru is the same as false, just use the passthru directly for the tail elements. This supersedes the convertVMergeToVMv part of #105788, as noted in https://github.com/llvm/llvm-project/pull/105788/files#r1731683971	2024-08-31 13:20:53 +08:00
Luke Lau	8e972efb58	[RISCV] Add scalable vector patterns for vfwmaccbf16.v{v,f} (#106771 ) We can reuse the patterns for vfwmacc.v{v,f} as long as we swap out fpext_oneuse for riscv_fpextend_bf16 in the scalar case.	2024-08-31 13:20:19 +08:00
Craig Topper	e6e429179e	[RISCV] Cleanup CHECK prefixes in half-arith.ll. NFC Remove prefixes that don't appear on RUN lines. Rename prefixes for consistency. Add RV32/RV64 prefixes where necessary to fix a conflict.	2024-08-30 18:57:50 -07:00
yonghong-song	06c531e808	BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 (#106494 ) This patch contains two parts: - first to revert the patch https://github.com/llvm/llvm-project/pull/101428. - second to remove `atomic_fetch_and_*()` to `atomic_<op>()` conversion (when return value is not used), but preserve `__sync_fetch_and_add()` to locked insn with cpu v1/v2.	2024-08-30 14:00:33 -07:00
Vitaly Buka	982d2445f2	Revert "AtomicExpand: Allow incrementally legalizing atomicrmw" (#106792 ) Reverts llvm/llvm-project#103371. There is a `heap-use-after-free`, commented on 206b5aff44a95754f6dd7a5696efa024e983ac59. Maybe `if (Next == E || BB != Next->getParent()) {` is enough, but I am not sure what the intent was there.	2024-08-30 13:51:53 -07:00
Luke Lau	0efa38699a	[RISCV] Check VL dominates and potentially move in tryReduceVL (#106753 ) Similar to what we do in foldVMV_V_V with the passthru, if we end up changing the Src's VL in tryReduceVL we need to make sure it dominates. Fixes #106735	2024-08-31 01:50:24 +08:00
Craig Topper	c25293c6dd	[LegalizeVectorOps][RISCV] Don't promote VP_FABS/FNEG/FCOPYSIGN. (#106659 ) Promoting canonicalizes NaNs which changes the semantics. Bitcast to integer and use logic ops instead.	2024-08-30 09:44:51 -07:00
Craig Topper	688843bda8	[RISCV] Add constant folding combine for FMV_X_ANYEXTW/H. (#106653 )	2024-08-30 09:43:42 -07:00
Brendan Dahl	5703d8572f	[WebAssembly] Add intrinsics to wasm_simd128.h for all FP16 instructions (#106465 ) Getting this to work required a few additional changes: - Add builtins for any instructions that can't be done with plain C currently. - Add support for the saturating version of fp_to_<s,i>_I16x8. Other vector sizes supported this already. - Support bitcast of f16x8 to v128. Needed to return a __f16x8 as v128_t.	2024-08-30 08:42:37 -07:00
Matt Arsenault	206b5aff44	AtomicExpand: Allow incrementally legalizing atomicrmw (#103371 ) If a lowering changed control flow, resume the legalization loop at the first newly inserted block. This will allow incrementally legalizing atomicrmw and cmpxchg. The AArch64 test might be a bugfix. Previously it would lower the vector FP case as a cmpxchg loop, but the resulting cmpxchgs were not themselves lowered; now they are. Maybe it shouldn't be reporting cmpxchg for the expand type in the first place though.	2024-08-30 19:11:45 +04:00
Philip Reames	924907bc6a	[DAG] Prefer 0.0 over -0.0 as neutral value for FADD w/NoSignedZero (#106616 ) When getting a neutral value, we can prefer using a positive zero over a negative zero if nsz is set on the FADD (or reduction). A positive zero should be cheaper to materialize on basically all targets. Arguably, we should be doing this kind of canonicalization in DAGCombine, but we don't do that for any of the other reduction variants, so this seems like the path of least resistance. This does mean that we can only do this for "fast" reductions. Just nsz isn't enough, as that goes through the SEQ_FADD path where the IR level start value isn't folded away. If folks think this is too RISCV specific, let me know. There's a trivial RISCV specific implementation. I went with the generic one as I thought this might benefit other targets.	2024-08-30 07:56:14 -07:00
Patryk Wychowaniec	86a60e7f1e	[AVR] Fix parsing & emitting relative jumps (#106722 ) Ever since 6859685a87ad093d60c8bed60b116143c0a684c7 (or, precisely, 84428dafc0941e3a31303fa1b286835ab2b8e234) relative jumps emitted by the AVR codegen are off by two bytes - this pull request fixes it. ## Abstract As compared to absolute jumps, relative jumps - such as rjmp, rcall or brsh - have an implied `pc+2` behavior; that is, `jmp 100` is `pc = 100`, but `rjmp 100` gets understood as `pc = pc + 100 + 2`. This is not reflected in the AVR codegen: `f95026dbf6/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp (L89)` ... which always emits relative jumps that are two bytes too far - or rather it _would_ emit such jumps if not for this check: `f95026dbf6/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp (L517)` ... which causes most of the relative jumps to be actually resolved late, by the linker, which applies the offsetting logic on its own, hiding the issue within LLVM. [Some time ago](`697a162fa6`) we've had a similar "jumps are off" problem that got solved by touching `shouldForceRelocation()`, but I think that has worked only by accident. It's exploited the fact that absolute vs relative jumps in the parsed assembly can be distinguished through a "side channel" check relying on the existence of labels (i.e. absolute jumps happen to named labels, but relative jumps are anonymous, so to say). This was an alright idea back then, but it got broken by 6859685a87ad093d60c8bed60b116143c0a684c7. I propose a different approach: - when emitting relative jumps, offset them by `-2` (well, `-1`, strictly speaking, because those instructions rely on right-shifted offset), - when parsing relative jumps, treat `.` as `+2` and read `rjmp .+1234` as `rjmp (1234 + 2)`. 
This approach seems to be sound and now we generate the same assembly as avr-gcc, which can be confirmed with:
```cpp
// avr-gcc test.c -O3 && avr-objdump -d a.out
int main() {
    asm(
        " foo:\n\t"
        " rjmp .+2\n\t"
        " rjmp .-2\n\t"
        " rjmp foo\n\t"
        " rjmp .+8\n\t"
        " rjmp end\n\t"
        " rjmp .+0\n\t"
        " end:\n\t"
        " rjmp .-4\n\t"
        " rjmp .-6\n\t"
        " x:\n\t"
        " rjmp x\n\t"
        " .short 0xc00f\n\t"
    );
}
```
avr-gcc is also how I got the opcodes for all new tests like `inst-brbc.s`, so we should be good.	2024-08-30 15:25:54 +02:00
wanglei	eaf87d3275	[LoongArch] Optimize for immediate value materialization using BSTRINS_D instruction Reviewed By: heiher, SixWeining Pull Request: https://github.com/llvm/llvm-project/pull/106332	2024-08-30 16:38:42 +08:00
wanglei	5b77e254e8	[LoongArch] Pre-commit test for immediate value materialization using BSTRINS_D Reviewed By: SixWeining Pull Request: https://github.com/llvm/llvm-project/pull/106331	2024-08-30 16:37:20 +08:00
Max Beck-Jones	1693d8eb9a	[AArch64][SelectionDAG] Vector splitting and promotion for histogram intrinsic (#103037 ) Adds support for wider-than-legal vector types for the histogram intrinsic (llvm.experimental.vector.histogram.add) by splitting the vector. Also adds integer promotion for the Inc operand.	2024-08-30 08:54:12 +01:00
Alex MacLean	e004566547	[NVPTX][AA] Traverse use-def chain to find non-generic addrspace (#106477 ) Address space information may be encoded anywhere along the use-def chain. Take advantage of this by traversing the chain until we find a non-generic addrspace.	2024-08-29 21:26:58 -07:00
Florian Mayer	ddaf2e2d29	[HWASan] add OptimizationRemark for alloca safety (#105872 )	2024-08-29 20:50:51 -07:00
Justin Fargnoli	cdaebf6f0e	[NVPTX] Fix crash caused by ComputePTXValueVTs (#104524 ) When [lowering return values](`99a10f1fe8/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (L3422)`) from LLVM IR to SelectionDAG, we check that [the number of values `SelectionDAG` tells us to return is equal to the number of values that `ComputePTXValueVTs()` tells us to return](`99a10f1fe8/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (L3441)`). However, this check can fail on valid IR. For example: ``` define <6 x half> @foo() { ret <6 x half> zeroinitializer } ``` `ComputePTXValueVTs()` tells us to return *3* `v2f16` values, while `SelectionDAG` tells us to return *6* `f16` values. Thus, the compiler will crash. `ComputePTXValueVTs()` [supports all `half` element vectors with an even number of elements](`99a10f1fe8/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (L213)`). Whereas `SelectionDAG` [only supports power-of-2 sized vectors](`4e078e3797/llvm/lib/CodeGen/TargetLoweringBase.cpp (L1580)`). This is the root of the discrepancy. Assuming that the developers who added the code to `ComputePTXValueVTs()` overlooked this, I've restricted `ComputePTXValueVTs()` to compute the same number of return values as `SelectionDAG`, instead of extending `SelectionDAG` to support non-power-of-2 sized vectors.	2024-08-29 18:18:51 -07:00
Luke Lau	dbbfc952f0	[RISCV] Separate ActiveElementsAffectResult into VL and Mask flags (#106517 ) In #106110 we had to mark v[f]slide1down.vx as ActiveElementsAffectResult since the elements in the body depend on VL. However it doesn't depend on the mask, so this was overly conservative and broke the vmerge peephole. We can recover this by splitting up ActiveElementsAffectResult into VL and Mask bits, so we can more accurately model v[f]slide1down.vx and re-enable the peephole.	2024-08-30 07:46:06 +08:00
Craig Topper	aa91d90cb0	[LegalizeVectorOps][PowerPC] Use xor to expand fneg. (#106595 ) This preserves the semantics of fneg and matches what we do in LegalizeDAG. I kept the legal FSUB check to force unrolling for some targets that don't have FSUB but have XOR. On AArch64, using xor broke some tests that expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg (extractvectorelt X)))))) pattern.	2024-08-29 15:00:23 -07:00
Stephen Tozer	412e3e394d	[ExtendLifetimes][NFC] Add explicit triple to remaining fake-use tests One of the tests for the new fake use intrinsic is failing on darwin buildbots due to relying on behaviour for their expected triple; this commit adds explicit triples to the few remaining fake-use tests that didn't have them. Fixes commit 3d08ade (#86149). Buildbot failures: https://lab.llvm.org/buildbot/#/builders/23/builds/2505	2024-08-29 22:27:23 +01:00
Craig Topper	4ca817d051	[GlobalISel] Add bail outs for scalable vectors to some combines. (#106496 ) These combines call getNumElements() which isn't valid for scalable vectors.	2024-08-29 14:02:53 -07:00
Craig Topper	d5c292d8ef	[GISel][RISCV] Correctly handle scalable vector shuffles of pointer vectors in IRTranslator. (#106580 )	2024-08-29 12:35:50 -07:00
Philip Reames	59762a0ecf	[RISCV] Add coverage for <3 x float> reduction with neutral start We can do slightly better on the neutral value when we have nsz.	2024-08-29 12:21:35 -07:00

54875 Commits