llvm-project

Author	SHA1	Message	Date
Matt Arsenault	f4598194b5	DAG: Fold bitcast of scalar_to_vector to anyext (#122660 ) scalar_to_vector is difficult to make appear and test, but I found one case where this makes an observable difference. It fires more often than this in the test suite, but most of them have no net result in the final code. This helps reduce regressions in a future commit.	2025-01-13 19:38:58 +07:00
Matt Arsenault	e9a55770dc	AMDGPU: Add gfx9 run line to scalar_to_vector test (#122659 )	2025-01-13 19:35:56 +07:00
Akshat Oke	73b0e8a191	[AMDGPU][NewPM] Port AMDGPUOpenCLEnqueuedBlockLowering to NPM (#122434 )	2025-01-13 17:52:30 +05:30
Sander de Smalen	3efe83291f	[AArch64] Fix chain for calls from agnostic-ZA functions. The lowering code was using the wrong chain value, which meant that the 'smstart' after the call from streaming agnostic-ZA functions -> non-streaming private-ZA functions was incorrectly removed from the DAG.	2025-01-13 12:06:50 +00:00
Simon Pilgrim	6c5941b09f	[X86] subvectorwise-store-of-vector-splat.ll - regenerate VPTERNLOG comments	2025-01-13 11:36:58 +00:00
Momchil Velikov	5315f3f8cb	Handle leading underscores in update_cc_test_checks.py (#121800 ) For some ABIs `update_cc_test_checks.py` is unable to generate tests because of the mismatch between the mangled function names reported by clang's `-ast-dump` and the function names in LLVM IR. This patch fixes it by stripping the leading underscore from the mangled name for global functions if the data layout string says they have one.	2025-01-13 11:24:05 +00:00
Sam Tebbs	795e35a653	Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744 ) This relands the reverted #120721 with a fix for cases where neither reduction operand is the reduction phi. Only 63114239cc8d26225a0ef9920baacfc7cc00fc58 and 63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted PR. --------- Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>	2025-01-13 11:20:35 +00:00
quic_hchandel	171d3edd05	[RISCV] Add Qualcomm uC Xqciint (Interrupts) extension (#122256 ) This extension adds eleven instructions to accelerate interrupt servicing. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support. --------- Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>	2025-01-13 16:36:05 +05:30
Haojian Wu	d2ba364440	Fix an unused-variable warning in release build.	2025-01-13 12:03:35 +01:00
Durgadoss R	7e2eb0f83e	[NVPTX] Add float to tf32 conversion intrinsics (#121507 ) This patch adds the missing variants of float to tf32 conversion intrinsics, with their corresponding lit tests. PTX Spec link: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-01-13 16:17:42 +05:30
Jay Foad	a3b3c26048	[TableGen] Use assert instead of PrintFatalError in TGLexer. NFC. (#122303 ) Do not use the PrintFatalError diagnostic machinery for conditions that can never happen with any input.	2025-01-13 10:30:55 +00:00
Oliver Stannard	e2a071ece5	[MachineCP] Correctly handle register masks and sub-registers (#122472 ) When passing an instruction with a register mask, the machine copy propagation pass was dropping the information about some copy instructions which define a register which is preserved by the mask, because that register overlaps a register which is partially clobbered by it. This resulted in a miscompilation for AArch64, because this caused a live copy to be considered dead. The fix is to clobber register masks by finding the set of reg units which is preserved by the mask, and clobbering all units not in that set.	2025-01-13 09:55:08 +00:00
Akshat Oke	4f96fb5fb3	Reapply "Spiller: Detach legacy pass and supply analyses instead (#119181 )" (#122665 ) Makes Inline Spiller amenable to the new PM. This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted because of two unused private members reported on sanitizer bots.	2025-01-13 14:14:13 +05:30
Mel Chen	56a37a3c76	[SLPVectorizer] Refactor HorizontalReduction::createOp (NFC) (#121549 ) This patch simplifies select-based integer min/max reductions by utilizing `llvm::getMinMaxReductionPredicate`, and generates intrinsic-based min/max reductions by utilizing `llvm::getMinMaxReductionIntrinsicOp`.	2025-01-13 16:11:31 +08:00
Kazu Hirata	76af93fbea	Partially revert "[TableGen] Avoid repeated hash lookups (NFC) (#122586 )" This partially reverts commit 07ff786e39e2190449998d3af1000454dee501be. The hunk being reverted in this patch seems to break: tools/llvm-gsymutil/ARM_AArch64/macho-merged-funcs-dwarf.yaml under LLVM_ENABLE_EXPENSIVE_CHECKS.	2025-01-12 23:50:58 -08:00
Akshat Oke	f431f93a77	[CodeGen][NewPM] Use proper NPM AtomicExpandPass in AMDGPU (#122086 ) `PassRegistry.def` already has this entry, but the dummy definition was being pulled instead. I couldn't reproduce the build failures that FIXME referenced, maybe the Dummy pass getting in the way was part of the cause.	2025-01-13 10:38:24 +05:30
Akshat Oke	7bf1cb702b	[AMDGPU][NewPM] Port AMDGPURemoveIncompatibleFunctions to NPM (#122261 )	2025-01-13 10:11:40 +05:30
Shilei Tian	f15da5fb78	[AMDGPU] Fix an invalid cast in `AMDGPULateCodeGenPrepare::visitLoadInst` (#122494 ) Fixes: SWDEV-507695	2025-01-12 23:40:25 -05:00
Sameer Sahasrabuddhe	77e6f434ec	[SPIRV] convergence anchor intrinsic does not have a parent token (#122230 )	2025-01-13 09:54:57 +05:30
Pengcheng Wang	681c4a2068	Reapply "[RISCV] Rework memcpy test (#120364 )" Use descriptive names and add more cases. This recommits 59bba39 which was reverted in 4637c77.	2025-01-13 12:06:26 +08:00
Pengcheng Wang	4637c77746	Revert "[RISCV] Rework memcpy test" (#122662 ) Reverts llvm/llvm-project#120364 The test should be updated due to some recent changes.	2025-01-13 11:36:37 +08:00
Pengcheng Wang	59bba39a69	[RISCV] Rework memcpy test (#120364 ) Use descriptive names and add more cases.	2025-01-13 11:28:24 +08:00
Justin Bogner	0e51b54b7a	[DirectX] Implement the resource.store.rawbuffer intrinsic (#121282 ) This introduces `@llvm.dx.resource.store.rawbuffer` and generalizes the buffer store docs under DirectX/DXILResources. Fixes #106188	2025-01-12 18:52:20 -07:00
Florian Hahn	8df64ed777	[LV] Don't consider IV increments uniform if exit value is used outside. In some cases, there might be a chain of uniform instructions producing the exit value. To generate correct code in all cases, consider the IV increment not uniform, if there are users outside the loop. Instead, let VPlan narrow the IV, if possible using the logic from 3ff1d01985752. Test case from #122602 verified with Alive2: https://alive2.llvm.org/ce/z/bA4EGj Fixes https://github.com/llvm/llvm-project/issues/122496. Fixes https://github.com/llvm/llvm-project/issues/122602.	2025-01-12 22:03:21 +00:00
Florian Hahn	f5a35a31bf	[LV] Add test cases with incorrect IV live-outs. Add test cases for https://github.com/llvm/llvm-project/issues/122496 and https://github.com/llvm/llvm-project/issues/122602.	2025-01-12 20:55:20 +00:00
Florian Hahn	3ff1d01985	Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes." This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67. Re-applies commit with typos fixed.	2025-01-12 20:10:28 +00:00
Florian Hahn	0ebb3ac7c9	Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes." This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac. Typo breaking the build	2025-01-12 19:37:45 +00:00
Florian Hahn	1afba19913	[VPlan] Try to narrow wide and replicating recipes to uniform recipes. Use the existing VPlan-based analysis to identify recipes that only have their first lane demanded and transform them to uniform replicate recipes. This simplifies the generated code in some places and prepares for fixing https://github.com/llvm/llvm-project/issues/122496.	2025-01-12 19:32:01 +00:00
Kazu Hirata	43fdd6e81d	[memprof] Migrate away from PointerUnion::is (NFC) (#122622 ) Note that PointerUnion::is has been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> In this patch, I'm calling call().getBase() for an instance of PointerUnion. call() alone would return an instance of IndexCall, which wraps PointerUnion. Note that isa<> cannot directly accept an instance of IndexCall, at least without defining CastInfo. I'm not touching PointerUnion::dyn_cast for now because it's a bit complicated; we could blindly migrate it to dyn_cast_if_present, but we should probably use dyn_cast when the operand is known to be non-null.	2025-01-12 11:06:42 -08:00
eleviant	d047dbd95e	Add function merger to be run during LTO link with gold plugin (#121343 ) Patch adds 'merge-functions' plugin option for this purpose.	2025-01-12 17:31:45 +01:00
Simon Pilgrim	be6c752e15	[X86] X86FixupVectorConstantsPass - use VPMOVSX/ZX extensions for PS/PD domain moves (#122601 ) For targets with free domain moves, or AVX512 support, allow the use of VPMOVSX/ZX extension loads to reduce the load sizes. I've limited this to extension to i32/i64 types as we're mostly interested in shuffle mask loading here, but we could include i16 types as well just as easily. Inspired by a regression on #122485	2025-01-12 15:59:05 +00:00
Ruhung	4f7dc1b55a	[InstCombine] Fold (add (add A, 1), (sext (icmp ne A, 0))) to call umax(A, 1) (#122491 ) Transform (add (add A, 1), (sext (icmp ne A, 0))) into call umax(A, 1). Fixes #121853. Alive2: https://alive2.llvm.org/ce/z/TweTan	2025-01-12 16:51:58 +01:00
Ramkumar Ramachandra	66badf224a	VT: teach a special-case optz about samesign (#122590 ) There is a narrow special-case in isImpliedCondICmps that can benefit from being taught about samesign. Since it costs us nothing to implement it, teach it about samesign, for completeness. This patch marks the completion of the effort to teach ValueTracking about samesign.	2025-01-12 15:19:29 +00:00
eleviant	26b4a0ac7e	Add 'unifiedlto' option to gold plugin (#121336 ) Option allows using full LTO when linking bitcode files compiled with unified LTO pipeline.	2025-01-12 16:18:26 +01:00
LLVM GN Syncbot	7532958355	[gn build] Port 8ebc35f8d041	2025-01-12 10:05:24 +00:00
Kareem Ergawy	42da12063f	[flang][OpenMP] Extend delayed privatization for `omp.simd` (#122156 ) Adds support for delayed privatization for `simd` directives. This PR includes PFT down to LLVM IR lowering.	2025-01-12 07:46:58 +01:00
Daniel Paoliello	d997a722c1	Fix build break in MIRPrinter (#122630 )	2025-01-11 21:56:59 -08:00
Daniel Paoliello	5ee0a71df9	[aarch64][win] Add support for import call optimization (equivalent to MSVC /d2ImportCallOptimization) (#121516 ) This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag). Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same [Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement `/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618). The change to the obj file is to add a new `.impcall` section with the following layout: ```cpp // Per section that contains calls to imported functions: // uint32_t SectionSize: Size in bytes for information in this section. // uint32_t Section Number // Per call to imported function in section: // uint32_t Kind: the kind of imported function. // uint32_t BranchOffset: the offset of the branch instruction in its // parent section. // uint32_t TargetSymbolId: the symbol id of the called function. ``` NOTE: If the import call optimization feature is enabled, then the `.impcall` section must be emitted, even if there are no calls to imported functions. The implementation is split across a few parts of LLVM: * During AArch64 instruction selection, the `GlobalValue` for each call to a global is recorded into the Extra Information for that node. * During lowering to machine instructions, the called global value for each call is noted in its containing `MachineFunction`. * During AArch64 asm printing, if the import call optimization feature is enabled: - A (new) `.impcall` directive is emitted for each call to an imported function. - The `.impcall` section is emitted with its magic header (but is not filled in). 
* During COFF object writing, the `.impcall` section is filled in based on each `.impcall` directive that was encountered. The `.impcall` section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing). I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a `MachineFunctionPass` (as suggested [on the forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3)) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as `ADRP` + `LDR` (to load the symbol) then a `BR` (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.	2025-01-11 21:30:17 -08:00
Kazu Hirata	07ff786e39	[TableGen] Avoid repeated hash lookups (NFC) (#122586 )	2025-01-11 13:15:30 -08:00
goldsteinn	17ef436e3d	[ValueTracking] Take into account whether zero is poison when computing CR for `ct{t,l}z` (#122548 )	2025-01-11 15:11:11 -06:00
goldsteinn	cc995ad064	[InstSimplify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X` (#122552 ) - [InstSimplify] Add tests for simplifying `(xor (sub C_Mask, X), C_Mask)`; NFC - [InstSimplify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X` Helps address regressions with folding `clz(Pow2)`. Proof: https://alive2.llvm.org/ce/z/zGwUBp	2025-01-11 15:10:42 -06:00
Kazu Hirata	bfe93aedcc	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private field 'DAG' is not used [-Werror,-Wunused-private-field]	2025-01-11 13:06:37 -08:00
Florian Hahn	7f59b4e998	[VPlan] Skip non-induction phi recipes in legalizeAndOptimizeInductions. The body of the loop only applies to wide induction recipes, skip any other header phi recipes up-front.	2025-01-11 20:33:02 +00:00
Austin Kerbow	657fb4433e	[AMDGPU] Add target hook to isGlobalMemoryObject (#112781 ) We want special handling for IGLP instructions in the scheduler but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.	2025-01-11 09:57:57 -08:00
David Green	ab9a80a3ad	[DAG] Allow AssertZExt to scalarize. (#122463 ) With range and undef metadata on a call we can have vector AssertZExt generated on a target with no vector operations. The AssertZExt needs to scalarize to a normal `AssertZext tin, ValueType`. I have added AssertSext too, although I do not have a test case. Fixes #110374	2025-01-11 16:29:06 +00:00
Marius Kamp	1eed46960c	[AArch64] Eliminate Common Subexpression of CSEL by Reassociation (#121350 ) If we have a CSEL instruction that depends on the flags set by a (SUBS x c) instruction and the true and/or false expression is (add (add x y) -c), we can reassociate the latter expression to (add (SUBS x c) y) and save one instruction. Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb We can extend this transformation for slightly different constants. For example, if we have (add (add x y) -(c-1)) and the comparison x <u c, we can transform the comparison to x <=u c-1 to eliminate the comparison instruction, too. Similarly, we can transform (x == 0) to (x <u 1). Proofs for the transformations that alter the constants: https://alive2.llvm.org/ce/z/3nVqgR Fixes #119606.	2025-01-11 16:26:11 +00:00
Simon Pilgrim	70f37321de	[X86] avx512-build-vector.ll - regenerate VPTERNLOG comments	2025-01-11 15:02:53 +00:00
Simon Pilgrim	6078815498	[X86] avx512-mask-op.ll - regenerate VPTERNLOG comments	2025-01-11 15:02:52 +00:00
Simon Pilgrim	7b184687dd	[X86] vselect-avx.ll - regenerate VPTERNLOG comments	2025-01-11 15:02:52 +00:00
Simon Pilgrim	78953433a5	[X86] vector popcnt tests - regenerate VPTERNLOG comments	2025-01-11 15:02:52 +00:00
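
The InstCombine entry above (4f7dc1b55a) folds `(add (add A, 1), (sext (icmp ne A, 0)))` to `umax(A, 1)`. As an informal sanity check, separate from the Alive2 proof linked in that commit, the identity can be verified exhaustively at a small bit width. This is only a sketch: the 8-bit width and the names `BITS`, `original`, `folded`, and `fold_holds` are assumptions made for illustration and do not come from the LLVM sources.

```python
# Illustrative model of the fold on BITS-wide unsigned integers.
# BITS and all function names here are hypothetical, not LLVM APIs.
BITS = 8
MASK = (1 << BITS) - 1


def original(a: int) -> int:
    """Model (add (add A, 1), (sext (icmp ne A, 0))) with wrap-around."""
    # sext of an i1 'true' is all-ones (i.e. -1); 'false' is 0.
    sext_cmp = MASK if a != 0 else 0
    return ((a + 1) + sext_cmp) & MASK


def folded(a: int) -> int:
    """Model umax(A, 1): the unsigned maximum of A and 1."""
    return max(a, 1)


def fold_holds() -> bool:
    """Check that both forms agree for every BITS-wide input."""
    return all(original(a) == folded(a) for a in range(1 << BITS))


if __name__ == "__main__":
    print("fold holds for all inputs:", fold_holds())
```

The intuition is that for nonzero `A` the `+1` and the sign-extended `-1` cancel, leaving `A`, while for `A == 0` only the `+1` remains, which is exactly `umax(A, 1)`.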


281220 Commits