llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	21704a685d	[AMDGPU] Fix printing hasInitWholeWave in mir (#123232 )	2025-01-17 03:00:02 -08:00
Phoebe Wang	fbb9d49506	[X86][APX] Support APX + AMX-MOVRS/AMX-TRANSPOSE (#123267 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/784266	2025-01-17 17:51:42 +08:00
Will Froom	c8ba551da1	[AArch64] Return early rather than asserting when Size of value passed to targetShrinkDemandedConstant is not 32 or 64 (#123084 ) See https://github.com/llvm/llvm-project/issues/123029 for details.	2025-01-17 08:41:33 +00:00
Phoebe Wang	1274bca2ad	[X86][APX] Support APX + MOVRS (#123264 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/784266	2025-01-17 16:06:31 +08:00
Vikram Hegde	225fc4f356	[AMDGPU][SDAG] Try folding "lshr i64 + mad" to "mad_u64_u32" (#119218 ) The intention is to use a "copy" instead of a "sub" to handle the high parts of 64-bit multiply for this specific case. This unlocks copy prop use cases where the copy can be reused by later multiply+add sequences if possible. Fixes: SWDEV-487672, SWDEV-487669	2025-01-17 11:09:39 +05:30
Matt Arsenault	7475f0a345	DAG: Avoid forming shufflevector from a single extract_vector_elt (#122672 ) This avoids regressions in a future AMDGPU commit. Previously we would have a build_vector (extract_vector_elt x), undef with free access to the elements bloated into a shuffle of one element + undef, which has much worse combine support than the extract. Alternatively could check aggressivelyPreferBuildVectorSources, but I'm not sure it's really different than isExtractVecEltCheap.	2025-01-17 08:44:43 +07:00
Matt Arsenault	ca95519704	AMDGPU: Implement isExtractVecEltCheap (#122460 ) Once again we have excessive TLI hooks with bad defaults. Permit this for 32-bit element vectors, which are just use-different-register. We should permit 16-bit vectors as cheap with legal packed instructions, but I see some mixed improvements and regressions that need investigation.	2025-01-17 08:38:01 +07:00
Matt Arsenault	4431106630	DAG: Fix vector bin op scalarize defining a partially undef vector (#122459 ) This avoids some of the pending regressions after AMDGPU implements isExtractVecEltCheap. In a case like shl <value, undef>, splat k, because the second operand was fully defined, we would fall through and use the splat value for the first operand, losing the undef high bits. This would result in an additional instruction to handle the high bits. Add some reduced testcases for different opcodes for one of the regressions.	2025-01-17 08:34:03 +07:00
Luke Lau	a761e26b23	[RISCV] Allow non-loop invariant steps in RISCVGatherScatterLowering (#122244 ) The motivation for this is to allow us to match strided accesses that are emitted from the loop vectorizer with EVL tail folding (see #122232) In these loops the step isn't loop invariant and is based off of @llvm.experimental.get.vector.length. We can relax this as long as we make sure to construct the updates after the definition inside the loop, instead of the preheader. I presume the restriction was previously added so that the step would dominate the insertion point in the preheader. I can't think of why it wouldn't be safe to calculate it in the loop otherwise.	2025-01-17 08:58:56 +08:00
Philip Reames	bb6e94a05d	[RISCV] Custom legalize <N x i128>, <4 x i256>, etc.. shuffles (#122352 ) I have a particular user downstream who likes to write shuffles in terms of unions involving _BitInt(128) types. This isn't completely crazy because there's a bunch of code in the wild which was written with SSE in mind, so 128 bits is a common data fragment size. The problem is that generic lowering scalarizes this to ELEN, and we end up with really terrible extract/insert sequences if the i128 shuffle is between other (non-i128) operations. I explored trying to do this via generic lowering infrastructure, and frankly got lost. Doing this as a target specific DAG combine is a bit ugly - really, there's nothing hugely target specific here - but oh well. If reviewers prefer, I could probably phrase this as a generic DAG combine, but I'm not sure that's hugely better. If reviewers have a strong preference on how to handle this, let me know, but I may need a bit of help. A couple notes: * The argument passing weirdness is due to a missing combine to turn a build_vector of adjacent i64 loads back into a vector load. I'm a bit surprised we don't get that, but the isel output clearly has the build_vector at i64. * The splat case I plan to revisit in another patch. That's a relatively common pattern, and the fact I have to scalarize that to avoid an infinite loop is non-ideal.	2025-01-16 14:55:45 -08:00
Brox Chen	8a0c2e7567	[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736 ) Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and fake16 flow. Since we are replacing `v_cndmask_b16` with `v_cndmask_b16_t16/fake16`, we have to at least update the fake16 codegen to get the codegen tests passing. For this case, we have to update true16 and fake16 together, otherwise some of the true16 tests will fail.	2025-01-16 17:18:28 -05:00
Princeton Ferro	3ba339b5e7	[NVPTX] Improve support for {ex2,lg2}.approx (#120519 ) - Add support for `@llvm.exp2()`: - LLVM: `float` -> PTX: `ex2.approx{.ftz}.f32` - LLVM: `half` -> PTX: `ex2.approx.f16` - LLVM: `<2 x half>` -> PTX: `ex2.approx.f16x2` - LLVM: `bfloat` -> PTX: `ex2.approx.ftz.bf16` - LLVM: `<2 x bfloat>` -> PTX: `ex2.approx.ftz.bf16x2` - Any operations with non-native vector widths are expanded. On targets not supporting f16/bf16, values are promoted to f32. - Add CONDITIONAL support for `@llvm.log2()` [^1]: - LLVM: `float` -> PTX: `lg2.approx{.ftz}.f32` - Support for f16/bf16 is emulated by promoting values to f32. [1]: CUDA implements `exp2()` with `ex2.approx` but `log2()` is implemented differently, so this is off by default. To enable, use the flag `-nvptx-approx-log2f32`.	2025-01-16 12:21:32 -08:00
Raphael Moreira Zinsly	01d7f434d2	[RISCV] Stack clash protection for dynamic alloca (#122508 ) Create a probe loop for dynamic allocation and add the corresponding SelectionDAG support in order to use it.	2025-01-16 11:58:42 -08:00
Adam Yang	4446a9849a	[HLSL][SPIRV][DXIL] Implement `WaveActiveSum` intrinsic (#118580 ) ``` - add clang builtin to Builtins.td - link builtin in hlsl_intrinsics - add codegen for spirv intrinsic and two directx intrinsics to retain signedness information of the operands in CGBuiltin.cpp - add semantic analysis in SemaHLSL.cpp - add lowering of spirv intrinsic to spirv backend in SPIRVInstructionSelector.cpp - add lowering of directx intrinsics to WaveActiveOp dxil op in DXIL.td - add test cases to illustrate passespendent pr merges. ``` Resolves #70106 --------- Co-authored-by: Finn Plummer <canadienfinn@gmail.com>	2025-01-16 10:35:23 -08:00
Craig Topper	fc7a1ed0ba	[RISCV] Fold vp.reverse(vp.load(ADDR, MASK)) -> vp.strided.load(ADDR, -1, MASK). (#123115 ) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>	2025-01-16 08:20:17 -08:00
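The fold in fc7a1ed0ba can be pictured with a small list-based model. This is only a sketch: memory is a Python list indexed by element, the mask is assumed to be all-true, and the strided load is assumed to start at the last active element and step backwards by one element. The function names are hypothetical stand-ins for the `vp.*` intrinsics, not LLVM APIs.

```python
def vp_load(memory, base, evl):
    """Unit-stride load of `evl` elements starting at element index `base`."""
    return [memory[base + i] for i in range(evl)]


def vp_reverse(values):
    """Reverse the active elements."""
    return values[::-1]


def vp_strided_load(memory, base, stride, evl):
    """Load `evl` elements, stepping `stride` elements between accesses."""
    return [memory[base + i * stride] for i in range(evl)]


def reversed_load_as_strided(memory, base, evl):
    # Assumption: start at the last active element and walk backwards.
    return vp_strided_load(memory, base + evl - 1, -1, evl)


memory = list(range(100, 116))
two_step = vp_reverse(vp_load(memory, 4, 5))
one_step = reversed_load_as_strided(memory, 4, 5)
```

Under these assumptions `two_step` and `one_step` hold the same elements, so the separate reverse is no longer needed.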
Luke Lau	437e1a70ca	[RISCV][VLOPT] Handle tied pseudos in getOperandInfo (#123170 ) For .wv widening instructions when checking if the operand is vs1 or vs2, we take into account whether or not it has a passthru. For tied pseudos though their passthru is the vs2, and we weren't taking this into account.	2025-01-16 23:00:13 +08:00
peterbell10	5e5fd0e6fc	[NVPTX] Select bfloat16 add/mul/sub as fma on SM80 (#121065 ) SM80 has fma for bfloat16 but not add/mul/sub. Currently these ops incur a promotion to f32, but we can avoid this by writing them in terms of the fma: ``` FADD(a, b) -> FMA(a, 1.0, b) FMUL(a, b) -> FMA(a, b, -0.0) FSUB(a, b) -> FMA(b, -1.0, a) ``` Unfortunately there is no `fma.ftz` so when ftz is enabled, we still fall back to promotion.	2025-01-16 14:53:24 +00:00
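The three rewrites in 5e5fd0e6fc can be checked numerically. The sketch below uses Python's double-precision floats instead of bfloat16, and `fma` is an unfused stand-in (`a * b + c`), which is enough here because each identity involves at most one inexact step. It also shows one reason the multiply plausibly uses `-0.0` rather than `+0.0` as the addend: adding `+0.0` would turn a `-0.0` product into `+0.0`.

```python
import math


def fma(a, b, c):
    # Unfused stand-in for a hardware FMA (assumption for illustration only).
    return a * b + c


def fadd(a, b):
    return fma(a, 1.0, b)    # a * 1.0 + b


def fmul(a, b):
    return fma(a, b, -0.0)   # a * b + (-0.0) keeps the sign of a zero product


def fsub(a, b):
    return fma(b, -1.0, a)   # b * -1.0 + a == a - b


def fmul_with_positive_zero(a, b):
    # Counter-example: a +0.0 addend loses the sign of a -0.0 product.
    return fma(a, b, 0.0)


def sign_of(x):
    return math.copysign(1.0, x)
```

For example, `sign_of(fmul(-1.0, 0.0))` is negative, matching a plain multiply, while `sign_of(fmul_with_positive_zero(-1.0, 0.0))` is positive.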
Simon Pilgrim	95ff3b5167	[X86] vector-compress.ll - regenerate with missing AVX2 test coverage Shows some really poor codegen for the maskbit extraction that we should address.	2025-01-16 14:17:33 +00:00
Vyacheslav Levytskyy	6ada0022ce	[SPIR-V] Fix --target-env version value in the test case (#123191 ) This PR fixes `--target-env` version value in the test case `llvm/test/CodeGen/SPIRV/validate/sycl-tangle-group-algorithms.ll`: the issue was introduced in https://github.com/llvm/llvm-project/pull/122755	2025-01-16 14:26:29 +01:00
Simon Pilgrim	24df8f5da4	[X86] vector-compress.ll - add nounwind attribute to remove cfi noise	2025-01-16 10:13:28 +00:00
Oliver Stannard	9e436c2daa	[MachineCP] Correctly handle register masks and sub-registers (#122734 ) When passing an instruction with a register mask, the machine copy propagation pass was dropping the information about some copy instructions which define a register which is preserved by the mask, because that register overlaps a register which is partially clobbered by it. This resulted in a miscompilation for AArch64, because this caused a live copy to be considered dead. The fix is to clobber register masks by finding the set of reg units which is preserved by the mask, and clobbering all units not in that set. This is based on #122472, and fixes the compile time performance regressions which were caused by that.	2025-01-16 09:39:27 +00:00
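A toy model of the difference described in 9e436c2daa, not the actual MachineCP code. Registers are modeled as sets of register units. The register names, unit names, and both helper functions are hypothetical. The first helper drops every tracked copy whose destination overlaps any register the mask does not preserve. The second clobbers only the units outside the preserved set.

```python
# Hypothetical register file: each Q register covers its D register plus one more unit.
REG_UNITS = {
    "D0": {"u0"},
    "Q0": {"u0", "u1"},
    "D1": {"u2"},
    "Q1": {"u2", "u3"},
}


def overlaps(a, b):
    return bool(REG_UNITS[a] & REG_UNITS[b])


def surviving_copies_per_register(copy_defs, preserved):
    """Drop copies overlapping any register not preserved by the mask."""
    clobbered_regs = [r for r in REG_UNITS if r not in preserved]
    return [d for d in copy_defs
            if not any(overlaps(d, c) for c in clobbered_regs)]


def surviving_copies_per_unit(copy_defs, preserved):
    """Clobber only the register units that no preserved register covers."""
    preserved_units = set().union(*(REG_UNITS[r] for r in preserved))
    all_units = set().union(*REG_UNITS.values())
    clobbered_units = all_units - preserved_units
    return [d for d in copy_defs if not (REG_UNITS[d] & clobbered_units)]


# The mask preserves D0 only; Q0 is therefore partially clobbered.
per_register = surviving_copies_per_register(["D0", "D1"], {"D0"})
per_unit = surviving_copies_per_unit(["D0", "D1"], {"D0"})
```

In this model the per-register approach forgets the copy into `D0` because `D0` overlaps the partially clobbered `Q0`, while the per-unit approach keeps it and still drops the copy into `D1`.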
David Green	ccd8d0b548	[AArch64][GlobalISel] Add gisel coverage for double-reductions. NFC The extra tests are simpler for GISel to detect.	2025-01-16 09:24:09 +00:00
Pedro Lobo	c23f2417dc	[CodeGenPrepare] Replace `undef` use with `poison` [NFC] (#123111 ) When generating a constant vector, if `UseSplat` is false, the indices different from the index of the extract can be filled with `poison` instead of `undef`.	2025-01-16 08:17:55 +00:00
Christudasan Devadasan	1797fb6b23	[AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (#123045 )	2025-01-16 11:06:38 +05:30
LiqinWeng	d2484127cd	[VP] IR expansion to Int Func Call (#122867 ) Add basic handling for VP ops that can expand to Int intrinsics, which includes: ctpop/cttz/ctlz/sadd.sat/uadd.sat/ssub.sat/usub.sat/fshl/fshr	2025-01-16 10:12:29 +08:00
Ashley Coleman	4f48abff0f	[HLSL] Implement elementwise firstbitlow builtin (#116858 ) Closes https://github.com/llvm/llvm-project/issues/99116 Implements `firstbitlow` by extracting common functionality from `firstbithigh` into a shared function while also fixing a bug for an edge case where `u64x3` and larger vectors will attempt to create vectors larger than the SPIR-V max of 4. --------- Co-authored-by: Steven Perron <stevenperron@google.com>	2025-01-15 15:36:50 -07:00
Brian Cain	d8a68fe680	[Hexagon] Omit calls to specialized {float,fix} routines (#117423 ) These were introduced in 1213a7a57fdc (Hexagon backend support, 2011-12-12) but they aren't present in libclangrt.builtins-hexagon. The generic versions of these functions are present in the builtins, though. So it should suffice to call those instead.	2025-01-15 14:21:48 -06:00
peterbell10	0068078dca	[NVPTX] Remove `NVPTX::IMAD` opcode, and rely on instruction selection only (#121724 ) I noticed that NVPTX will sometimes emit `mad.lo` to multiply by 1, e.g. in https://gcc.godbolt.org/z/4j47Y9W4c. This happens when DAGCombiner operates on the add before the mul, so the imad contraction happens regardless of whether the mul could have been simplified. To fix this, I remove `NVPTXISD::IMAD` and only combine to mad during selection. This allows the default DAGCombiner patterns to simplify the graph without any NVPTX-specific intervention.	2025-01-15 20:09:18 +00:00
Philip Reames	e19bc76812	[RISCV] Precommit test coverage for pr118873	2025-01-15 10:18:56 -08:00
jofrn	c8bbbaa5c7	[SelectionDAG][AMDGPU] Negative offset when selecting scratch sv offsets (#122251 ) APInt will fail when given a negative offset. SelectScratchSVAddr utilizes this function and can be given a negative offset as well, so this change modifies it to use APSInt instead.	2025-01-15 06:56:28 -05:00
David Green	9025c269aa	[AArch64] Add an extra test case for adds and subs combines. NFC	2025-01-15 10:51:44 +00:00
Mariusz Sikora	b3924cb9ec	[AMDGPU] Set Convergent property for image.(getlod/sample*) intrinsics which use WQM (#122908 ) This change adds the IntrConvergent property to the image.getlod intrinsic and to several image.sample intrinsics. All image.sample intrinsics apart from LOD(_L), Level 0(_LZ), Derivative(_D) will be marked as Convergent.	2025-01-15 10:23:28 +01:00
Simon Pilgrim	8ac00ca486	[X86] lowerShuffleWithUndefHalf - don't split vXi8 unary shuffles if the 128-bit source lanes are already in place (#122919 ) Allows us to use PSHUFB to shuffle the lanes, and then perform a sub-lane permutation down to the lower half Fixes #116815	2025-01-15 08:19:54 +00:00
Luke Lau	02403f4e45	[RISCV] Split strided-load-store.ll tests into EVL and VP. NFC None of the changes in #122232 or the upcoming #122244 are specific to the EVL, so split out the EVL tail-folded loops into separate "integration tests" that reflect the output of the loop vectorizer.	2025-01-15 13:42:53 +08:00
Alex MacLean	273a94b3d5	[NVPTX] Add some more immediate instruction variants (#122746 ) While this likely won't impact the final SASS, it makes for more compact PTX.	2025-01-14 21:28:29 -08:00
Shoreshen	b665dddd70	[AMDGPU] Add tests for v_sat_pk_u8_i16 codegen (#122438 ) Preparation for #121124 This PR provides tests added into [PR](https://github.com/llvm/llvm-project/pull/121124) that add selection patterns for instruction `v_sat_pk`, in order to specify the change of the tests before and after the commit. Pre-commit tests PR for #121124 : Add selection patterns for instruction `v_sat_pk`	2025-01-14 19:26:46 -05:00
Florian Hahn	0b3912622e	[ARM] Update LV test in test/Codegen/ARM after 1de3dc7d23.	2025-01-14 22:41:31 +00:00
Steven Perron	e511b3e24a	[SPIRV] Fix graphic test to use correct triple. (#122738 )	2025-01-14 14:17:17 -05:00
S. Bharadwaj Yadavalli	a4b7a2d021	[DirectX] Propagate shader flags mask of callees to callers (#118306 ) Propagate shader flags mask of callees to callers. Add tests to verify propagation of shader flags	2025-01-14 13:18:16 -05:00
Vyacheslav Levytskyy	9ba27ca5c7	[SPIR-V] Ensure no uses of intrinsic global variables after module translation (#122729 ) Ensure that the backend satisfies the requirement of the verifier that disallows uses of intrinsic global variables. This PR fixes https://github.com/llvm/llvm-project/issues/110495	2025-01-14 17:45:10 +01:00
Vyacheslav Levytskyy	b74d3e179d	[SPIR-V] Specify target environment in tests referring to the BuiltIn WorkgroupSize variable (#122755 ) https://github.com/KhronosGroup/SPIRV-Tools/pull/5407 introduces a check that the WorkgroupSize variable is a 3-component 32-bit int vector, and indeed, we see this requirement in https://registry.khronos.org/vulkan/specs/latest/man/html/WorkgroupSize.html#VUID-WorkgroupSize-WorkgroupSize-04427 However, OpenCL imposes different requirements, documented here: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_built_in_variables The OpenCL environment requires the WorkgroupSize variable to have components of size_t size, which is 32 or 64 bits depending on the target. This is how the SPIR-V Backend implements it, by querying the pointer size of the current platform/target. To allow spirv-val to account for the difference between target environments, this PR adds `--target-env <env>` to test cases referring to the BuiltIn WorkgroupSize variable.	2025-01-14 17:44:07 +01:00
Brox Chen	f1b1c7f3c1	[AMDGPU][True16][CodeGen] Undo sub(x,c) to add in true16 flow (#118854 ) Undo sub x, c -> add x, -c canonicalization in true16 flow. This duplicates the pattern from fake16 and implements the same pattern in true16 format.	2025-01-14 10:57:33 -05:00
Brox Chen	5e26ff35c1	[AMDGPU][True16][MC] true16 for v_cmp_lt_f16 (#122499 ) True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.	2025-01-14 10:03:36 -05:00
Simon Pilgrim	0bd098b1cc	[X86] Fold VPERMV3(WIDEN(X),M,WIDEN(Y)) -> VPERMV(CONCAT(X,Y),M') iff the CONCAT is free (#122750 ) Minor followup to #122485 - if the source operands were widened half-size subvectors, then attempt to concatenate the subvectors directly, and then adjust the shuffle mask so references to the second operand now refer to the upper half of the concat result.	2025-01-14 13:26:38 +00:00
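The mask adjustment described in 0bd098b1cc can be modeled with plain lists. This is a sketch under stated assumptions: vectors are Python lists, `None` stands for an undefined lane, and the mask only references defined lanes of the widened operands. The helper names are hypothetical, not the X86 lowering code.

```python
UNDEF = None


def widen(half, n):
    """Place a half-size vector in the low lanes of an n-lane vector."""
    return list(half) + [UNDEF] * (n - len(half))


def vpermv3(a, mask, b):
    """Two-source permute: indices 0..n-1 pick from a, n..2n-1 pick from b."""
    both = a + b
    return [both[m] for m in mask]


def vpermv(src, mask):
    """Single-source permute."""
    return [src[m] for m in mask]


def remap_mask(mask, n):
    """Redirect second-operand references to the upper half of CONCAT(X, Y)."""
    half = n // 2
    return [m if m < n else m - half for m in mask]


n = 8
x = [10, 11, 12, 13]
y = [20, 21, 22, 23]
mask = [0, 8, 1, 9, 2, 10, 3, 11]  # interleave the defined lanes of X and Y

before = vpermv3(widen(x, n), mask, widen(y, n))
after = vpermv(x + y, remap_mask(mask, n))
```

Here `remap_mask(mask, n)` yields `[0, 4, 1, 5, 2, 6, 3, 7]`, and `before` and `after` contain the same lanes.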
Jay Foad	e87f94a6a8	[llvm-project] Fix typos mutli and mutliple. NFC. (#122880 )	2025-01-14 11:59:41 +00:00
Acim Maravic	cc3aab580b	[AMDGPU] Handle nontemporal and amdgpu.last.use metadata in amdgpu-lower-buffer-fat-pointers (#120139 )	2025-01-14 11:22:20 +01:00
Piotr Sobczak	40fa7f5e8b	[AMDGPU] Fix computed kill mask (#122736 ) Replace S_XOR with S_ANDN2 when computing the kill mask in demote/kill lowering. This has the effect of AND'ing the demote/kill condition with exec, which is needed for a proper live mask update. The S_XOR is inadequate because it may return true for a lane with exec=0. This patch fixes an image corruption in a game. I think the issue went unnoticed because the demote/kill condition is often naturally dependent on exec, so AND'ing with exec is usually not required.	2025-01-14 10:00:40 +01:00
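The difference between the two instructions in 40fa7f5e8b shows up on a 4-lane bitmask. The operand roles are an assumption for illustration: `exec_mask` holds the active lanes, `keep_cond` holds the lanes whose condition says "stay alive", and the kill mask is built from those two. The commit message does not spell out the exact operands.

```python
def s_xor(a, b):
    return a ^ b


def s_andn2(a, b):
    # S_ANDN2 computes a AND (NOT b).
    return a & ~b


exec_mask = 0b0011   # lanes 0 and 1 are active
keep_cond = 0b0110   # lane 2 is set even though it is inactive

kill_xor = s_xor(exec_mask, keep_cond)      # 0b0101: wrongly includes lane 2
kill_andn2 = s_andn2(exec_mask, keep_cond)  # 0b0001: only active lane 0

# When the condition is already a subset of exec, both forms agree,
# consistent with the issue going unnoticed.
subset_cond = 0b0010
agree = s_xor(exec_mask, subset_cond) == s_andn2(exec_mask, subset_cond)
```

With XOR, inactive lane 2 ends up in the kill mask. With ANDN2 the result is always a subset of `exec_mask`.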
Guy David	1a935d7a17	[llvm] Mark scavenging spill-slots as spilled stack objects. (#122673 ) This seems like an oversight when copying code from other backends.	2025-01-14 10:18:31 +02:00
Piotr Fusik	cfe5a0847a	[RISCV] Enable Zbb ANDN/ORN/XNOR for more 64-bit constants (#122698 ) This extends PR #120221 to 64-bit constants that don't match the 12-low-bits-set pattern.	2025-01-14 09:15:14 +01:00
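The bitwise identities that make such a rewrite possible can be checked directly. This is a sketch of the arithmetic only, assuming the transform swaps a constant `C` for its complement when the complement is cheaper to materialize. The helpers model the Zbb instruction semantics on 64-bit values and are not the RISC-V backend code.

```python
MASK64 = (1 << 64) - 1


def andn(rs1, rs2):
    return rs1 & ~rs2 & MASK64       # rs1 AND (NOT rs2)


def orn(rs1, rs2):
    return (rs1 | ~rs2) & MASK64     # rs1 OR (NOT rs2)


def xnor(rs1, rs2):
    return ~(rs1 ^ rs2) & MASK64     # NOT (rs1 XOR rs2)


def inverted(c):
    return ~c & MASK64


x = 0x0123456789ABCDEF
c = 0xFFFFFFFF7FFFFFFF               # example constant; its complement is 0x80000000

and_ok = (x & c) == andn(x, inverted(c))
or_ok = (x | c) == orn(x, inverted(c))
xor_ok = (x ^ c) == xnor(x, inverted(c))
```

All three identities hold for any 64-bit `x` and `c`, so the choice between `C` and its complement can be made purely on materialization cost.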
Piotr Fusik	87d7aebdd4	[RISCV][test] Add more 64-bit tests in zbb-logic-neg-imm.ll	2025-01-14 08:28:58 +01:00

56971 Commits