llvm-project

Author	SHA1	Message	Date
Thorsten Schütt	71ac1eb509	Revert "[GlobalISel] Combine [s,z]ext of undef into 0" (#118746 ) Reverts llvm/llvm-project#117439	2024-12-05 07:48:20 +01:00
Craig Topper	3e0e1c13ce	[RISCV][GISel] Support fp128 arithmetic and conversion for RV64. (#118707 ) We can support these via libcalls in libgcc/compiler-rt or integer operations for fneg/fabs/fcopysign. fp128 values will be passed in two 64-bit GPRs according to the psABI. Supporting RV32 requires sret which is not supported by libcall handling in LegalizerHelper.cpp yet. It doesn't call canLowerReturn.	2024-12-04 21:43:29 -08:00
Ben Shi	dba0861cd7	[AVR] Simplify encoding of load/store instructions (#118279 ) Fixes https://github.com/llvm/llvm-project/issues/113774	2024-12-05 13:05:25 +08:00
hev	00d8ea3a4c	[LoongArch] Supports FP_TO_SINT operation for fp16 (#118303 ) Fixes #118301	2024-12-05 10:46:23 +08:00
Philip Reames	1ef9410a96	Revert "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647 )" This reverts commit e6aec2c12095cc7debd1a8004c8535eef41f4c36. Commit breaks "ninja check-llvm" on x86 host.	2024-12-04 15:37:25 -08:00
Philip Reames	758107f70a	[RISCV] Improve spread(N) shuffle testing Rework them now that spread(2) is special cased to ensure we still have non-zero shift coverage.	2024-12-04 15:21:08 -08:00
Jun Wang	e6aec2c120	[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647 ) The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and "amdgpu-stack-objects" attributes, which are used to infer whether we need to initialize flat scratch. This is, however, not precise. Instead, we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on kernels. Refer to https://github.com/llvm/llvm-project/issues/63586 .	2024-12-04 14:10:15 -08:00
Luke Quinn	261d4bbb3b	[RISCV] f32 roundeven pattern missed for Zfa (#118672 ) f32 roundeven pattern was missing from RISCVInstrInfoZfa.td. Tests for roundeven.f32/f16/f64 were missing.	2024-12-04 13:52:20 -08:00
Matt Arsenault	e0f52538c9	AMDGPU: Change bitop3 intrinsic operand to i32 (#118647 )	2024-12-04 15:44:04 -05:00
Vitaly Buka	d57892a2a1	Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc" (#118693 ) Reverts llvm/llvm-project#117566 Breaks libc++ tests with HWASAN https://lab.llvm.org/buildbot/#/builders/55/builds/3959	2024-12-04 12:36:46 -08:00
Sander de Smalen	048fc2bc10	[LiveIntervals] Ignore artificial regs when adding kill flags (#116963 ) If parts of a physical register for a given liverange, as assigned by the register allocator, can be used to store other values not represented by this liverange, then `LiveIntervals::addKillFlags` normally avoids adding a kill flag on the use of this register when the value's liverange ends. However, if all the other regunits are artificial, then we can still safely add the kill flag, since those parts of the register can never be accessed independently.	2024-12-04 20:25:31 +00:00
Philip Reames	a6e7749ea9	[RISCV] Improve lowering of spread(2) shuffles (#118658 ) A spread(2) shuffle is just an interleave with an undef lane. The existing lowering was reusing the even lane for the undef value. This was entirely legal, but non-optimal.	2024-12-04 12:21:08 -08:00
Craig Topper	4cf2cf18c9	[RISCV][GISel] Stop over promoting G_SITOFP/UITOFP libcalls on RV64. (#118597 ) When we have legal instructions we want to promote to sXLen and let isel pattern matching remove the and/sext_inreg. When using a libcall we want to use a 'si' libcall for small types instead of 'di'. To match the RV64 ABI, we need to sign extend `unsigned int` arguments. We reuse the shouldSignExtendTypeInLibCall hook from SelectionDAG.	2024-12-04 10:42:49 -08:00
Simon Pilgrim	2567feaa13	[X86] Add fabs/fneg rmw style test coverage for #117557 Missed opportunity to avoid use of fpu for store(fabs(load()) style patterns	2024-12-04 18:16:30 +00:00
Simon Pilgrim	77908345d0	[X86] fsxor-alignment.ll - add nounwind to prevent cfi noise in an upcoming change	2024-12-04 18:16:30 +00:00
Brian Cain	7748492c37	[hexagon] Add support for llvm.debugtrap (#117049 ) Also: set `hasSideEffects` on `Y2_break` instruction.	2024-12-04 12:10:29 -06:00
Dmitry Sidorov	d057b53a7d	[SPIR-V] Add SPV_INTEL_joint_matrix extension (#118578 ) The spec is available here: https://github.com/intel/llvm/pull/12497 The PR doesn't add the OpCooperativeMatrixApplyFunctionINTEL instruction as it's still experimental and not properly tested E2E. The PR also fixes a few bugs in the related code: 1. The CooperativeMatrixMulAddKHR optional operand must be a literal, not a constant; 2. Fixed creation of the available capabilities table for the case when a single extension adds several capabilities that occupy non-contiguous opcodes. --------- Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>	2024-12-04 19:00:19 +01:00
Jon Roelofs	e51a0b2e26	[llvm][AArch64] Fix a crash in performPostLD1Combine (#118538 ) rdar://138004275	2024-12-04 09:49:09 -08:00
Krzysztof Drewniak	87c21bf064	[AMDGPU] Preserve `noundef` and `range` during kernel argument loads (#118395 ) This commit ensures that noundef (which is frequently a prerequisite for other annotations) and range() annotations on kernel arguments are copied onto their corresponding load from the kernel argument structure.	2024-12-04 11:04:03 -06:00
Oliver Stannard	99b862efba	[DAGISel][ARM] Fix vector truncate combine for big-endian (#118101 ) This DAG combine was incorrect for big-endian targets, because it assumes that when a bitcast changes the lane width, the least-significant bits of the wider lanes are in the lower-numbered lanes of the smaller type, which is only true for little-endian.	2024-12-04 14:32:15 +00:00
Thorsten Schütt	148fdc519c	[GlobalISel] Add G_ABDS and G_ABDU instructions (#118122 ) The DAG has the same instructions: the signed and unsigned absolute difference of its inputs. For AArch64, they map to uabd and sabd for Neon and SVE. The Neon and SVE instructions will require custom patterns. They are pseudo opcodes and are not imported by the IRTranslator. We need combines to create them. PowerPC, ARM, and AArch64 have native instructions. /// i.e. trunc(abs(sext(Op0) - sext(Op1))) becomes abds(Op0, Op1) /// or trunc(abs(zext(Op0) - zext(Op1))) becomes abdu(Op0, Op1) For GlobalISel, we are going to write the combines in MIR patterns. See: llvm/test/CodeGen/AArch64/abd-combine.ll - [ ] combine into abd - [ ] legalize and add td patterns	2024-12-04 12:53:15 +01:00
David Sherwood	4675db5f39	[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566 ) For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.	2024-12-04 10:26:51 +00:00
Mariusz Sikora	455b4fd01a	[AMDGPU] Emit amdgcn.if.break in the same BB as amdgcn.loop (#118081 ) Before this change if.break was placed in wrong loop level which resulted in accumulating values only from last iteration of the inner loop.	2024-12-04 08:42:04 +01:00
Simon Pilgrim	b1a48af56a	[DAG] SimplifyDemandedVectorElts - add handling for INT<->FP conversions (#117884 )	2024-12-04 07:37:01 +00:00
Brandon Wu	109e4a147f	[RISCV] Handle zeroinitializer of vector tuple Type (#113995 ) It doesn't make sense to add a new generic ISD to handle riscv tuple type. Instead we use `SPLAT_VECTOR` for ISD and further lower to `VMV_V_X`. Note: If there's `visitSPLAT_VECTOR` in generic DAG combiner, it needs to skip riscv vector tuple type. Stack on https://github.com/llvm/llvm-project/pull/114329	2024-12-04 13:40:02 +08:00
Kyungwoo Lee	4f41862c5a	Reapply "[StructuralHash] Global Variable (#118412 )" This reverts commit 6a0d6fc2e92bcfb7cb01a4c6cdd751a9b4b4c159.	2024-12-03 21:33:03 -08:00
Philip Reames	f947d5afd9	[RISCV] Reduce redundancy in vnsrl tests Triggered by discussion on pr118509.	2024-12-03 19:47:54 -08:00
Philip Reames	c1afcaf33b	[RISCV] Match deinterleave(4,8) shuffles to SHL/TRUNC when legal (#118509 ) We can extend the existing SHL+TRUNC lowering used for deinterleave2 for deinterleave4, and deinterleave8 when the result types are small enough to allow the shift to be legal. On RV64, this means i8 and i16 results for deinterleave4 and i8 results for deinterleave8.	2024-12-03 19:35:52 -08:00
Kyungwoo Lee	6a0d6fc2e9	Revert "[StructuralHash] Global Variable (#118412 )" This reverts commit 1afb81dfaf902c1c42bd91fec1a7385e6e1529d3.	2024-12-03 17:19:30 -08:00
Shilei Tian	68bcba6d7a	Revert "[AMDGPU] Use COV6 by default (#118515 )" This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some buildbots are not ready yet.	2024-12-03 20:17:06 -05:00
Shilei Tian	410cbe3cf2	[AMDGPU] Use COV6 by default (#118515 )	2024-12-03 19:38:35 -05:00
Dan Gohman	35cce408ee	[WebAssembly] Support the new "Lime1" CPU (#112035 ) This adds WebAssembly support for the new [Lime1 CPU]. First, this defines some new target features. These are subsets of existing features that reflect implementation concerns: - "call-indirect-overlong" - implied by "reference-types"; just the overlong encoding for the `call_indirect` immediate, and not the actual reference types. - "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and `memory.fill`, and not the other instructions in the bulk-memory proposal. Next, this defines a new target CPU, "lime1", which enables mutable-globals, bulk-memory-opt, multivalue, sign-ext, nontrapping-fptoint, extended-const, and call-indirect-overlong. Unlike the default "generic" CPU, "lime1" is meant to be frozen, and followed up by "lime2" and so on when new features are desired. [Lime1 CPU]: https://github.com/WebAssembly/tool-conventions/blob/main/Lime.md#lime1 --------- Co-authored-by: Heejin Ahn <aheejin@gmail.com>	2024-12-03 16:35:23 -08:00
Kyungwoo Lee	1afb81dfaf	[StructuralHash] Global Variable (#118412 ) This update enhances the implementation of structural hashing for global variables, using their initial contents. Private global variables or constants are often used for metadata, where their names are not unique. This can lead to the creation of different hash results although they could be merged by the linker as they are effectively identical. - Refine the hashing of GlobalVariables for strings or certain Objective-C metadata cases that have section names. This can be further extended to other scenarios. - Expose StructuralHash for GlobalVariable so that this API can be utilized by MachineStableHashing, which is also employed in the global function outliner. This change significantly improves size reduction by an additional 1% on the LLD binary when the global function outliner and merger are enabled together. As discussed in the RFC https://discourse.llvm.org/t/loh-conflicting-with-machineoutliner/83279/8?u=kyulee-com, if we disable or relocate the LOH pass, the size impact could increase to 4%.	2024-12-03 16:01:50 -08:00
David Green	1e7171f692	[AArch64] Add tablegen patterns for concat(extract-high, extract-high) (#118286 ) A `concat(extract-high(x), extract-high(y))` is the top half of x inserted into the bottom half of y. This patch adds a tablegen pattern to make sure that we generate a single i64 lane insert.	2024-12-03 22:13:40 +00:00
Petar Avramovic	fef54d0393	AMDGPU/GlobalISel: Add skeletons for new register bank select passes (#112862 ) New register bank select for AMDGPU will be split in two passes: - AMDGPURegBankSelect: select banks based on machine uniformity analysis - AMDGPURegBankLegalize: lower instructions that can't be inst-selected with register banks assigned by AMDGPURegBankSelect. AMDGPURegBankLegalize is similar to legalizer but with context of uniformity analysis. Does not change already assigned banks. Main goal of AMDGPURegBankLegalize is to provide high level table-like overview of how to lower generic instructions based on available target features and uniformity info (uniform vs divergent). See RegBankLegalizeRules. Summary of new features: At the moment register bank select assigns register bank to output register using simple algorithm: - one of the inputs is vgpr output is vgpr - all inputs are sgpr output is sgpr. When function does not contain divergent control flow propagating register banks like this works. In general, first point is still correct but second is not when function contains divergent control flow. Examples: - Phi with uniform inputs that go through divergent branch - Instruction with temporal divergent use. To fix this AMDGPURegBankSelect will use machine uniformity analysis to assign vgpr to each divergent and sgpr to each uniform instruction. But some instructions are only available on VALU (for example floating point instructions before gfx1150) and we need to assign vgpr to them. Since we are no longer propagating register banks we need to ensure that uniform instructions get their inputs in sgpr in some way. In AMDGPURegBankLegalize uniform instructions that are only available on VALU will be reassigned to vgpr on all operands and read-any-lane vgpr output to original sgpr output.	2024-12-03 16:02:00 -05:00
Philip Reames	4dd5ac906f	[RISCV] Improve coverage for spread(N) shuffles I'd already included a few cases for spread(N) in the decompress(N) variants, but rename for clarity and add a couple more edge cases. i.e. spread(N, 0) produces a, undef, b, undef, ...	2024-12-03 10:42:17 -08:00
Zaara Syeda	935bbbbde4	[PPC] Remove missed cases of ppc-merge-string-pool (#117626 ) PPCMergeStringPool was replaced with GlobalMerge with commit aaa37d6. Some cases of option ppc-merge-string-pool were missed being removed.	2024-12-03 13:31:26 -05:00
Vyacheslav Levytskyy	1f20eee6dc	[SPIR-V] Emit OpConstant instead of OpConstantNull to conform to NonSemantic.Shader.DebugInfo.100 DebugTypeBasic's flags definition (#118333 ) This PR is to fix https://github.com/llvm/llvm-project/issues/118011 by emitting OpConstant instead of OpConstantNull to conform to NonSemantic.Shader.DebugInfo.100 DebugTypeBasic's flags definition.	2024-12-03 17:55:26 +01:00
Vyacheslav Levytskyy	874b4fb6ad	[SPIR-V] Fix emission of debug and annotation instructions and add SPV_EXT_optnone SPIR-V extension (#118402 ) This PR fixes: * emission of OpNames (added newly inserted internal intrinsics and basic blocks) * emission of function attributes (SRet is added) * implementation of SPV_INTEL_optnone so that it emits OptNoneINTEL Function Control flag, and add implementation of the SPV_EXT_optnone SPIR-V extension.	2024-12-03 16:18:06 +01:00
Vyacheslav Levytskyy	db4cbe5069	[SPIR-V] Fix generation of invalid SPIR-V in cases of bitcasts between pointers and multiple null pointers used in the input LLVM IR (#118298 ) This PR resolved the following issues: (1) There are rare but possible cases when there are bitcasts between pointers intertwined in a sophisticated way with loads, stores, function calls and other instructions that are part of type deduction. In this case we must account for inserted bitcasts between pointers rather than just ignore them. (2) Null pointers have the same constant representation but different types. Type info from Intrinsic::spv_track_constant() refers to the opaque (untyped) pointer, so that each MF/v-reg pair would fall into the same Const record in Duplicate Tracker and would be represented by a single OpConstantNull instruction, unless we use precise pointee type info. We must be able to distinguish one constant (null) pointer from another to avoid generating invalid code with inconsistent types of operands.	2024-12-03 16:08:25 +01:00
Vyacheslav Levytskyy	c7e14689dd	[SPIR-V] Add XFAIL to the broken test (#118487 ) The test case llvm/test/CodeGen/SPIRV/debug-info/debug-type-basic.ll fails due to https://github.com/llvm/llvm-project/issues/118011	2024-12-03 15:41:21 +01:00
Viktoria Maximova	4a6ecd3821	Add support for SPIR-V extension: SPV_INTEL_media_block_io (#118024 ) This change implements the SPV_INTEL_media_block_io extension in the SPIR-V backend.	2024-12-03 13:47:18 +01:00
Nathan Gauër	5f99eb9b13	[SPIR-V] Fixup storage class for global private (#118318 ) Re-land of #116636 Adds a new address space: hlsl_private. Variables in this address space will be emitted with a Private storage class. This is useful for variables global to a SPIR-V module, since up to now, they were still emitted with a Function storage class, which is wrong. --------- Signed-off-by: Nathan Gauër <brioche@google.com>	2024-12-03 13:42:02 +01:00
Nikita Popov	b2df007413	[FastISel] Support unreachable with NoTrapAfterNoReturn (#118296 ) Currently FastISel triggers a fallback if there is an unreachable terminator and the TrapUnreachable option is enabled (the ISD::TRAP selection does not actually work). Add handling for NoTrapAfterNoReturn, in which case we don't actually need to emit a trap. The test is just there to make sure there is no FastISel fallback (which is why I'm not testing the case without noreturn). We have other tests that check the actual unreachable codegen variations.	2024-12-03 12:54:26 +01:00
Oliver Stannard	7d72525909	[AArch64] Fix STG instruction being moved past memcpy (#117191 ) When merging STG instructions used for AArch64 stack tagging, we were stopping on reaching a load or store instruction, but not calls, so it was possible for an STG to be moved past a call to memcpy. This test case (reduced from fuzzer-generated C code) was the result of StackColoring merging allocas A and B into one stack slot, and StackSafetyAnalysis proving that B does not need tagging, so we end up with tagged and untagged objects in the same stack slot. The tagged object (A) is live first, so it is important that its memory is restored to the background tag before it gets reused to hold B.	2024-12-03 10:32:52 +00:00
Craig Topper	9692242f51	[RISCV][GISel] Support f64->f32 fptrunc and f32->f64 fpext without D extension. Add RUN lines to float-convert.ll and double-convert.ll without F extension.	2024-12-02 23:50:32 -08:00
Thorsten Schütt	45162635bf	[GlobalISel] Combine [s,z]ext of undef into 0 (#117439 ) Alternative for https://github.com/llvm/llvm-project/pull/113764 It builds on a minimalistic approach with the legality check in match and a blind apply. The precise patterns are used for better compile-time and modularity. It also moves the pattern check into combiner. While unary_undef_to_zero and propagate_undef_any_op rely on custom C++ code for pattern matching. Is there a limit on the number of patterns? G_ANYEXT of undef -> undef G_SEXT of undef -> 0 G_ZEXT of undef -> 0 The combine is not a member of the post legalizer combiner for AArch64. Test: llvm/test/CodeGen/AArch64/GlobalISel/combine-cast.mir	2024-12-03 07:14:49 +01:00
Philip Reames	2af2634c64	[RISCV] Use vcompress in deinterleave2 intrinsic lowering (#118325 ) This is analogous to febbf91 which added shuffle lowering using vcompress; we can do the same thing in the deinterleave2 lowering path which is used for scalable vectors. Note that we can further improve this for high lmul usage by adjusting how we materialize the mask (whose result is at most m1 with a known bit pattern). I am deliberately staging the work so that the changes to reduce register pressure are more easily evaluated on their own merit.	2024-12-02 18:37:32 -08:00
fengfeng	7907292daa	[DAG] Apply Disjoint flag. (#118045 ) or disjoint (or disjoint (x, c0), c1) --> or disjoint x, or (c0, c1) Alive2: https://alive2.llvm.org/ce/z/3wPth5 --------- Signed-off-by: feng.feng <feng.feng@iluvatar.com>	2024-12-03 09:21:03 +08:00
Dan Gohman	c3536b263f	[WebAssembly] Define call-indirect-overlong and bulk-memory-opt features (#117087 ) This defines some new target features. These are subsets of existing features that reflect implementation concerns: - "call-indirect-overlong" - implied by "reference-types"; just the overlong encoding for the `call_indirect` immediate, and not the actual reference types. - "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and `memory.fill`, and not the other instructions in the bulk-memory proposal. This is split out from https://github.com/llvm/llvm-project/pull/112035. --------- Co-authored-by: Heejin Ahn <aheejin@gmail.com>	2024-12-02 17:08:07 -08:00
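
The rewrite rules quoted in commit 148fdc519c (G_ABDS and G_ABDU) can be checked with a small model. The following is a minimal sketch in Python, not LLVM code: the helper names (`to_signed`, `abds`, `abdu`, `widened_signed`, `widened_unsigned`) and the 8-bit lane width are assumptions chosen for illustration. It models `trunc(abs(sext(Op0) - sext(Op1)))` and `trunc(abs(zext(Op0) - zext(Op1)))` with unbounded integers standing in for the wider type, and compares them against a direct max-minus-min definition of the absolute difference.

```python
BITS = 8                      # hypothetical lane width for the model
MASK = (1 << BITS) - 1

def to_signed(x, bits=BITS):
    """Interpret the low `bits` of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def abds(a, b):
    """Signed absolute difference, result truncated to BITS."""
    sa, sb = to_signed(a), to_signed(b)
    return (max(sa, sb) - min(sa, sb)) & MASK

def abdu(a, b):
    """Unsigned absolute difference, result truncated to BITS."""
    ua, ub = a & MASK, b & MASK
    return (max(ua, ub) - min(ua, ub)) & MASK

def widened_signed(a, b):
    """trunc(abs(sext(a) - sext(b))): Python ints play the wider type."""
    return abs(to_signed(a) - to_signed(b)) & MASK

def widened_unsigned(a, b):
    """trunc(abs(zext(a) - zext(b)))."""
    return abs((a & MASK) - (b & MASK)) & MASK

def patterns_agree():
    """Exhaustively compare both forms over all 8-bit operand pairs."""
    return all(
        abds(a, b) == widened_signed(a, b) and abdu(a, b) == widened_unsigned(a, b)
        for a in range(1 << BITS)
        for b in range(1 << BITS)
    )
```

Calling `patterns_agree()` walks all 65,536 operand pairs. The check only covers this toy model at one width; it says nothing about legality checks or how the actual combines are implemented.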


56426 Commits