llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	fa9881d6a9	[X86] vector-bitreverse.ll - add AVX512BW+AVX512VL test coverage	2024-05-17 15:04:22 +01:00
Momchil Velikov	a0cc1ab978	[AArch64] Add intrinsics for multi-vector to ZA array vector accumulators (#91606 ) [Recommit of e88ba6d975d887ca001cae30bfa0c53d91165148] According to the specification in https://github.com/ARM-software/acle/pull/309 this adds the intrinsics void_svadd_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn) __arm_streaming __arm_inout("za"); void_svadd_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn) __arm_streaming __arm_inout("za"); void_svsub_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn) __arm_streaming __arm_inout("za"); void_svsub_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn) __arm_streaming __arm_inout("za"); as well as the corresponding `bf16` variants.	2024-05-17 15:02:53 +01:00
Sander de Smalen	3a32590f25	[AArch64] Avoid using NEON FCVTXN in Streaming-SVE mode. (#91981 )	2024-05-17 14:11:28 +01:00
Jay Foad	ac092925c3	[SelectionDAG] Widen cttz to cttz_zero_undef (#92514 ) Instead of widening e.g. i8 cttz(x) to i16 cttz(x | 0x100), use the more optimizable form cttz_zero_undef(x | 0x100) since the widened operand is definitely not zero.	2024-05-17 12:39:40 +01:00
Matt Arsenault	ddb87e0f96	SystemZ: Use REG_SEQUENCE for PAIR128 (#90640 ) PAIR128 should probably just be removed entirely Depends #90638	2024-05-17 13:16:34 +02:00
David Green	4349ffb3fa	[SelectOpt] Add tests for not select conditions. NFC	2024-05-17 11:34:01 +01:00
Johannes Reifferscheid	698cf0176b	Fix i1 array global crash in NVPTXAsmPrinter. (#92506 ) See the test file. At head, this crashes with ``` assertion failed at llvm/lib/Support/APInt.cpp:492 in uint64_t llvm::APInt::extractBitsAsZExtValue(unsigned int, unsigned int) const: bitPosition < BitWidth && (numBits + bitPosition) <= BitWidth && "Illegal bit extraction" ```	2024-05-17 12:06:35 +02:00
Vyacheslav Levytskyy	e3e06135eb	[SPIR-V] Ensure that we don't have dangling BlockAddress constants after internal intrinsic 'spv_switch' is processed (#92390 ) After internal intrinsic 'spv_switch' is processed we need to delete G_BLOCK_ADDR instructions that were generated to keep track of the corresponding basic blocks. If we just delete G_BLOCK_ADDR instructions with BlockAddress operands, this leaves their BasicBlock counterparts in an "address taken" status. This would make AsmPrinter generate a series of unneeded labels of the `"Address of block that was removed by CodeGen"` kind. This PR ensures that we don't have dangling BlockAddress constants by zapping the BlockAddress nodes, and only after that proceeds with erasing G_BLOCK_ADDR instructions. See also https://github.com/llvm/llvm-project/pull/87823 for more details.	2024-05-17 11:43:02 +02:00
Vyacheslav Levytskyy	2ed8ff3bf8	[SPIR-V] Fix types of internal intrinsic functions and add a test case for __builtin_alloca() (#92265 ) This PR fixes generation of argument types of internal intrinsic functions `spv_const_composite` and `spv_track_constant`, so that composite constants of ConstantVector type preserve their correct type in transformation passes and can be successfully used further by LLVM intrinsic functions. The added test case serves two purposes: it checks the above-mentioned fix and demonstrates that a call to __builtin_alloca() maps to instructions from SPV_INTEL_variable_length_array when this extension is available.	2024-05-17 11:42:37 +02:00
CarolineConcatto	c4bac7f7dc	[LLVM][AArch64]Use load/store with consecutive registers in SME2 or SVE2.1 for spill/fill (#77665 ) When possible the spill/fill register in Frame Lowering uses the ld/st consecutive pairs available in sme or sve2.1.	2024-05-17 09:25:21 +01:00
Vyacheslav Levytskyy	37d00635c4	[SPIR-V] Ensure that internal intrinsic functions for PHI's operand are inserted at the correct positions (#92316 ) This PR is to ensure that internal intrinsic functions for PHI's operand are inserted at the correct positions and don't break rules of instruction domination and PHI nodes grouping at top of basic block.	2024-05-17 09:01:29 +02:00
Phoebe Wang	bc9823cf60	[X86][BF16] Change MVT to EVT in combineFP_EXTEND Fixes: #92471	2024-05-17 13:41:30 +08:00
Thorsten Schütt	9bffe79049	[GlobalIsel] Speedup select to integer min/max (#92378 ) https://github.com/llvm/llvm-project/issues/92309	2024-05-17 07:32:18 +02:00
wanglei	bf1d417233	[LoongArch] Suppress the unnecessary extensions for arguments in makeLibCall Reviewed By: SixWeining, heiher Pull Request: https://github.com/llvm/llvm-project/pull/92376	2024-05-17 09:13:51 +08:00
wanglei	5a204a5f0a	[LoongArch] Use sign extend for i32 arguments in makeLibCall on LA64 The 32-bit arguments and returns on LA64 are always sign extended to i64, so we should take this into account around libcalls. Reviewed By: heiher, SixWeining Pull Request: https://github.com/llvm/llvm-project/pull/92375	2024-05-17 09:13:05 +08:00
wanglei	96d2db4ba9	[LoongArch] Pre-commit test for lib call arguments extension Reviewed By: SixWeining Pull Request: https://github.com/llvm/llvm-project/pull/92374	2024-05-17 09:12:12 +08:00
James Y Knight	d6f9278ae9	[X86] Use plain load/store instead of cmpxchg16b for atomics with AVX (#74275 ) In late 2021, both Intel and AMD finally documented that every AVX-capable CPU has always been guaranteed to execute aligned 16-byte loads/stores atomically, and further, guaranteed that all future CPUs with AVX will do so as well. Therefore, we may use normal SSE 128-bit load/store instructions to implement atomics, if AVX is enabled. Per AMD64 Architecture Programmer's manual, 7.3.2 Access Atomicity: > Processors that report [AVX] extend the atomicity for cacheable, > naturally-aligned single loads or stores from a quadword to a double > quadword. Per Intel's SDM: > Processors that enumerate support for Intel(R) AVX guarantee that the > 16-byte memory operations performed by the following instructions will > always be carried out atomically: > - MOVAPD, MOVAPS, and MOVDQA. > - VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128. > - VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded with > EVEX.128 and k0 (masking disabled). This was also confirmed to be true for Zhaoxin CPUs with AVX, in https://gcc.gnu.org/PR104688	2024-05-16 18:24:23 -04:00
Fangrui Song	997eae3673	[AsmPrinter] Increase upper bound for size in global structs This is part of the fixes to address #57353 https://reviews.llvm.org/D133845 Pull Request: https://github.com/llvm/llvm-project/pull/92334	2024-05-16 14:41:19 -07:00
Eli Friedman	b28766eb3f	[Arm64EC] Correctly handle sret in entry thunks. (#92326 ) I accidentally left out the code to transfer sret attributes to entry thunks, so values weren't being passed in the right registers, and the sret pointer wasn't returned in the correct register. Fixes #90229	2024-05-16 09:15:17 -07:00
Simon Pilgrim	117d755b1b	[DAG] SimplifyDemandedBits - use ComputeKnownBits instead of getValidShiftAmountConstant to check for constant shift amounts. (#92412 ) This allows us to handle cases where the constant has already been type legalized behind a bitcast Despite calling ComputeKnownBits I'm not seeing any notable change in compile time.	2024-05-16 17:04:30 +01:00
Jacek Caban	93c02b7dc3	[CodeGen][ARM64EC] Use MCSymbolRefExpr::VK_None for function aliases. (#92100 )	2024-05-16 15:47:39 +02:00
Jacek Caban	4a5dffc674	[CodeGen][ARM64EC][NFC] Add ARM64EC alias symbols test. (#92100 )	2024-05-16 15:15:17 +02:00
Hassnaa Hamdi	f7392f40f3	[AArch64] Add intrinsics for bfloat16 min/max/minnm/maxnm (#90105 ) According to specifications in [ARM-software/acle/pull/309](https://github.com/ARM-software/acle/pull/309) Add following intrinsics: ``` // svmax single,multi svbfloat16x2_t svmax_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm) svbfloat16x4_t svmax_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm) svbfloat16x2_t svmax_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm) svbfloat16x4_t svmax_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm) ``` ``` // svmin single,multi svbfloat16x2_t svmin_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm) svbfloat16x4_t svmin_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm) svbfloat16x2_t svmin_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm) svbfloat16x4_t svmin_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm) ``` ``` // svmaxnm single,multi svbfloat16x2_t svmaxnm_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm) svbfloat16x4_t svmaxnm_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm) svbfloat16x2_t svmaxnm_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm) svbfloat16x4_t svmaxnm_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm) ``` ``` // svminnm single,multi svbfloat16x2_t svminnm_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm) svbfloat16x4_t svminnm_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm) svbfloat16x2_t svminnm_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm) svbfloat16x4_t svminnm_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm) ``` - Variations other than bfloat16 are already supported.	2024-05-16 13:56:02 +01:00
Simon Pilgrim	80fac30a09	[X86] rot32.ll - remove old shld check prefixes This was missed in 8dbd745b09c9f65fefc2ffac14e8f7f288766861	2024-05-16 13:53:25 +01:00
Simon Pilgrim	311339e25c	[DAG] SimplifyDemandedBits - ISD::AND - only request DemandedElts when looking for a splat constant Limit the isConstOrConstSplat call to the vector elements we care about Noticed while investigating regressions in #92096	2024-05-16 13:05:35 +01:00
wanglei	70608c24fa	[LoongArch] Refactor LoongArchABI::computeTargetABI The previous logic did not consider whether the architectural features meet the requirements of the ABI, resulting in the generation of incorrect object files in some cases. For example: ``` llc -mtriple=loongarch64 -filetype=obj test/CodeGen/LoongArch/ir-instruction/fadd.ll -o t.o llvm-readelf -h t.o ``` The object file indicates the ABI as lp64d, however, the generated code is lp64s. The new logic introduces the `feature-implied` ABI. When both target-abi and triple-implied ABI are invalid, the feature-implied ABI is used. Reviewed By: SixWeining, xen0n Pull Request: https://github.com/llvm/llvm-project/pull/92223	2024-05-16 17:15:21 +08:00
wanglei	30410018d3	[LoongArch] Enable all -target-abi options This is a pre-commit for modifying `computeTargetABI` logic. This patch will provide warning prompts when using those ABIs that have not yet been standardized. Reviewed By: xen0n, SixWeining Pull Request: https://github.com/llvm/llvm-project/pull/92222	2024-05-16 16:54:18 +08:00
Craig Topper	f2d74002fd	[LegalizeVectorOps][X86] Add ISD::ABDS/ABDU to the list of opcodes handled by LegalizeVectorOps. (#92332 ) The expand code is present, but we were missing the type query code so the nodes would be ignored until LegalizeDAG.	2024-05-15 21:46:31 -07:00
Dhruv Chawla	1dd0d3cf40	[AArch64][GISel] Fold COPY(y:gpr, DUP(x:fpr, i)) -> UMOV(y:gpr, x:fpr, i) (#89017 ) This patch adds a peephole to AArch64PostSelectOptimize for codegen that is caused by RegBankSelect limiting G_EXTRACT_VECTOR_ELT only to FPR registers in both the input and output registers. This can cause a generation of COPY from FPR to GPR when, for example, the output register of the G_EXTRACT_VECTOR_ELT is used in a branch condition. This was noticed when looking at codegen differences between SDAG and GI for the s1279 kernel in the TSVC benchmark.	2024-05-16 08:08:06 +05:30
Amara Emerson	1daa7fd3fa	[AArch64][SME] Remove Darwin compile error for ABI support routine calls. These are allowed for Darwin and use the same ABI.	2024-05-15 14:47:17 -07:00
Nicolai Hähnle	ec1f28dc97	AMDGPU/gfx12: avoid crashing on legacy waitcnt intrinsics (#92306 ) They are still accepted by the HW but have a conservative effect. Leave them untouched since handling them would complicate the logic a bit, and developers who code to such a low level really need to revisit what they're doing anyway.	2024-05-15 22:23:18 +02:00
Alex Bradbury	891d687137	[RISCV] Gate unratified profiles behind -menable-experimental-extensions (#92167 ) As discussed in the last sync-up call, because these profiles are not yet finalised they shouldn't be exposed to users unless they opt in to them (much like experimental extensions). We may later want to add a more specific flag, but reusing `-menable-experimental-extensions` solves the immediate problem. This is implemented using the new support for marking profiles as experimental added in #91993 to move the unratified profiles to RISCVExperimentalProfile and making the necessary changes to logic in RISCVISAInfo to handle this.	2024-05-15 21:09:43 +01:00
Patrick O'Neill	4ab2ac22d0	[DAGCombiner] Mark vectors as not AllAddOne/AllSubOne on type mismatch (#92195 ) Fixes #92193.	2024-05-15 12:39:28 -07:00
Luke Lau	9ae2177843	[RISCV] Handle undef AVLs in RISCVInsertVSETVLI Before #91440 a VSETVLIInfo would have had an IMPLICIT_DEF defining instruction, but now we look up a VNInfo which doesn't exist, which triggers an assertion failure. Mark these undef AVLs as AVLIsIgnored.	2024-05-16 02:46:31 +08:00
Simon Pilgrim	e2d74a25eb	[X86] EmitCmp - always use cmpw with foldable loads (#92251 ) By default, EmitCmp avoids cmpw with i16 immediates due to 66/67h length-changing prefixes causing stalls, instead extending the value to i32 and using a cmpl with an i32 immediate, unless it has the TuningFastImm16 flag or we're building for optsize/minsize. However, if we're loading the value for comparison, the performance costs of the decode stalls are likely to be exceeded by the impact of the load latency of the folded load, the shorter encoding and not needing an extra register to store the ext-load. This matches the behaviour of gcc and msvc. Fixes #90355	2024-05-15 17:46:49 +01:00
Luke Lau	ff313ee70a	[RISCV] Remove hasSideEffects=1 for vsetvli pseudos (#91319 ) In a similar vein to #90049, we currently model all of the effects of a vsetvli pseudo: * VL and VTYPE are marked as defs * VL preserving x0,x0 vsetvlis don't get emitted until RISCVInsertVSETVLI, and when they are they have implicit uses on VL * Regular vector pseudos are fully modelled too: Before RISCVInsertVSETVLI they can be moved between vsetvli pseudos because we will eventually insert vsetvlis to correct VL and VTYPE. Afterwards, they will have implicit uses on VL and VTYPE. Since we model everything we can remove hasSideEffects=1. This gives us some improvements like sinking in vsetvli-insert-crossbb.ll. We need to update RISCVDeadRegisterDefinitions to keep handling vsetvli pseudos since it only operates on instructions with unmodelled side effects.	2024-05-15 23:37:31 +08:00
Phoebe Wang	b576a6b045	[X86][AMX] Fix a bug after #83628 (#91207 ) We need to check if `GR64Cand` is a valid register before using it. Test is not needed since it's covered in llvm-test-suite. Fixes #90954	2024-05-15 23:15:48 +08:00
Jay Foad	466d266945	[AMDGPU] Fix GFX90x check prefixes in tests (#92254 )	2024-05-15 15:13:53 +01:00
Simon Pilgrim	f8395f8420	[X86] Cleanup check prefixes identified in #92248 Avoid using leading numbers in check prefixes - replace with actual triple config names.	2024-05-15 14:25:29 +01:00
Simon Pilgrim	3f07430c38	[X86] avoid-sfb-g-no-change.mir - cleanup check prefixes identified in #92248 Don't include "-LABEL" (or any other FileCheck modifier) in the core check prefix name	2024-05-15 14:23:46 +01:00
Simon Pilgrim	e26eacf771	[X86] prefetch.ll - cleanup check prefixes identified in #92248 Avoid using leading numbers in check prefixes - replace with actual triple config names (and makes it easier to add X64 test coverage in a future commit).	2024-05-15 14:11:24 +01:00
Simon Pilgrim	96ac2e3af7	[X86] cmpxchg-clobber-flags.ll - cleanup check prefixes identified in #92248 Avoid using numbers as check prefix - replace with actual triple config names	2024-05-15 14:11:24 +01:00
Simon Pilgrim	97418bb519	[X86] patchable functions - cleanup check prefixes identified in #92248 Avoid using numbers as check prefix - replace with actual triple config names	2024-05-15 14:11:23 +01:00
Simon Pilgrim	8987369465	[X86] sibcall - cleanup check prefixes identified in #92248 Avoid using numbers as check prefix - replace with actual triple config names	2024-05-15 13:49:39 +01:00
Jay Foad	1650f1b3d7	Fix typo "indicies" (#92232 )	2024-05-15 13:10:16 +01:00
Paul Walker	7621a0d364	[LLVM][CodeGen][SVE] Improve custom lowering for EXTRACT_SUBVECTOR. (#90963 ) We can extract any legal fixed length vector from a scalable vector by using VECTOR_SPLICE.	2024-05-15 11:27:06 +01:00
Jonas Paulsson	d6ee7e8481	[SystemZ] Handle address clobbering in splitMove(). (#92105 ) When expanding an L128 (which is used to reload i128) it is possible that the quadword destination register clobbers an address register. This patch adds an assertion against the case where both of the expanded parts clobber the address and, in the case where one of the expanded parts does so, puts that part last. Fixes #91437	2024-05-15 08:36:26 +02:00
Luke Lau	77047e3cd2	[RISCV] Make vsetvli in test not loop invariant. NFC (#92094 ) The middle end will remove the inner vsetvli otherwise, and it's more typical to set the AVL to the remaining VL. This also prevents the test from showing up as a regression in #91319	2024-05-15 12:32:26 +08:00
Luke Lau	1a58e88690	[RISCV] Move RISCVInsertVSETVLI to after phi elimination (#91440 ) Split off from #70549, this patch moves RISCVInsertVSETVLI to after phi elimination where we exit SSA and need to move to LiveVariables. The motivation for splitting this off is to avoid the large scheduling diffs from moving completely to after regalloc, and instead focus on converting the pass to work on LiveIntervals. The two main changes required are updating VSETVLIInfo to store VNInfos instead of MachineInstrs, which allows us to still check for PHI defs in needVSETVLIPHI, and fixing up the live intervals of any AVL operands after inserting new instructions. On O3 the pass is inserted after the register coalescer, otherwise we end up with a bunch of COPYs around eliminated PHIs that trip up needVSETVLIPHI. Co-authored-by: Piyou Chen <piyou.chen@sifive.com>	2024-05-15 11:44:32 +08:00
Craig Topper	e417e61532	[RISCV][LegalizeTypes] Add additional test coverage for type promotion of VP_FSHL/FSHR. NFC There's a special path when the promoted type has an element size more than twice the size of the original type.	2024-05-14 16:25:07 -07:00
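
The cttz widening described in commit ac092925c3 above can be checked with a small model. The following is a minimal sketch in Python, not LLVM code: the helper names `cttz` and `widened_cttz_i8` are hypothetical, and `cttz` here assumes the convention that a zero input returns the bit width. It only illustrates why OR-ing in bit 8 keeps the i16 result equal to the i8 result while guaranteeing a non-zero operand, which is the property that makes the zero-undef form safe to use.

```python
def cttz(x: int, bits: int) -> int:
    """Count trailing zeros of x viewed as a `bits`-wide unsigned integer.

    Assumed convention for this sketch: a zero input returns `bits`.
    """
    x &= (1 << bits) - 1
    if x == 0:
        return bits
    # x & -x isolates the lowest set bit; its index is the trailing-zero count.
    return (x & -x).bit_length() - 1


def widened_cttz_i8(x: int) -> int:
    """Model an i8 cttz widened to i16 by OR-ing in bit 8 (0x100)."""
    widened = (x & 0xFF) | 0x100
    # The widened operand always has bit 8 set, so it is never zero and the
    # zero-input case of the 16-bit operation is never reached.
    assert widened != 0
    return cttz(widened, 16)


if __name__ == "__main__":
    # Exhaustively compare the widened form against the plain i8 result.
    for value in range(256):
        assert widened_cttz_i8(value) == cttz(value, 8)
    print("widened cttz matches i8 cttz for all 256 inputs")
```

For a zero i8 input the widened operand is exactly 0x100, so the result is 8, which matches the i8 bit width.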


53324 Commits