llvm-project

Author	SHA1	Message	Date
Samuel Tebbs	72a60e770c	[AArch64][NFC] Use regexes in register class tests Some MIR and IR tests include checks for register class IDs, which are unnecessary since the register class name is also checked for and that doesn't change when new classes are added. This patch replaces the hard-coded register class ID checks with regexes so they don't have to be updated every time a new class is added.	2024-02-29 11:46:07 +00:00
Matt Arsenault	6cfd3439d4	APFloat: Fix signed zero handling in minnum/maxnum (#83376 ) Follow the 2019 rules and order -0 as less than +0 and +0 as greater than -0. As currently defined this isn't required for the intrinsics, but is a better QoI. This will avoid the workaround in libc added by #83158	2024-02-29 16:51:33 +05:30
Simon Pilgrim	7ff3f9760d	[X86] getFauxShuffleMask - handle insert_vector_elt(bitcast(extract_vector_elt(x))) shuffle patterns If the bitcast is between types of equal scalar size (i.e. fp<->int bitcasts), then we can safely peek through them Fixes #83289	2024-02-29 10:32:49 +00:00
Simon Pilgrim	30b63def50	[X86] Regenerate tests to add missing avx512 constant comments	2024-02-29 10:32:48 +00:00
Chen Zheng	3196005f6b	[NFC][PowerPC] use script to regenerate the CHECK lines	2024-02-29 04:49:37 -05:00
Tomas Matheson	03420f570e	Revert "[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116 )" This reverts commit 634b0243b8f7acc85af4f16b70e91d86ded4dc83. Failing EXPENSIVE_CHECKS builds with "undefined physical register".	2024-02-29 09:48:29 +00:00
Craig Topper	95aab69c10	[RISCV] Remove experimental from Zacas. (#83195 ) Document that we don't use the double compare and swap instructions due to ABI concerns.	2024-02-28 21:46:58 -08:00
Dávid Ferenc Szabó	71c06bbb25	[GlobalISel] Combine (X == 0) & (Y == 0) -> (X \| Y) == 0 (#71949 ) Also combine (X != 0) \| (Y != 0) -> (X \| Y) != 0	2024-02-29 10:58:17 +05:30
Yeting Kuo	14d8c4563e	[RISCV] Add more intrinsics into canSplatOperand. (#83106 ) This patch adds smin/smax/umin/umax/sadd_sat/ssub_sat/uadd_sat/usub_sat into canSplatOperand. It can help llvm fold vv instructions with one splat operand to vx instructions.	2024-02-29 12:57:34 +08:00
Shilei Tian	191fd2d9db	[NFC][AMDGPU] Move the rem tests in `div_i128.ll` into `rem_i128.ll` (#83307 )	2024-02-28 18:47:02 -05:00
David Green	b339c88120	[AArch64] Add some base aes intrinsic tests. NFC Including commutative tests.	2024-02-28 20:31:26 +00:00
Simon Pilgrim	b4bc19e2e6	[X86] Add tests showing failure to demand only the sign bit of a sitofp/uitofp node sitofp - if we only demand the signbit, then we can try to use the source integer uitofp - signbit is guaranteed to be zero Noticed while reviewing #82290	2024-02-28 18:49:24 +00:00
SivanShani-Arm	634b0243b8	[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116 ) T1 allows for an optional registers list, the register list must be {d0-d15}. T2 defines a mandatory register list, the register list must be {d0-d31}. The requirements for T1/T2 are as follows: T1 requires v8-M.Main and secure state; T2 requires v8.1-M.Main and secure state. 16 D regs: valid for T1, valid for T2. 32 D regs: UNDEFINED for T1, valid for T2. No D regs: NOP for T1, NOP for T2.	2024-02-28 17:02:51 +00:00
Lukacma	26402777eb	[AArch64] Optimized generated assembly for bool to svbool_t conversions (#83001 ) In certain cases Legalizer was generating `AND(WHILELO, SPLAT 1)` instruction pattern, when `WHILELO` would be sufficient.	2024-02-28 16:45:39 +00:00
Petar Avramovic	3e35ba53e2	AMDGPU/GFX12: Insert waitcnts before stores with scope_sys (#82996 ) Insert waitcnts for loads and atomics before stores with system scope. Scope is a field in the instruction encoding and corresponds to the desired coherence level in the cache hierarchy. Intrinsic stores can set scope in the cache policy operand. If the volatile keyword is used on generic stores, the memory legalizer will set scope to system. Generic stores, by default, get the lowest scope level. Waitcnts are not required if it is guaranteed that memory is cached. For example, vulkan shaders can guarantee this. TODO: implement a flag for frontends to give us a hint not to insert waits. Expecting the vulkan flag to be implemented as vulkan:private MMRA.	2024-02-28 16:18:04 +01:00
chuongg3	8e51b22ce2	[AArch64][GlobalISel] Legalize G_LOAD for v4s8 Vector (#82989 ) Lowers `v4s8 = G_LOAD %ptr ptr` into `s32 = G_LOAD %ptr ptr` `v4s8 = G_BITCAST s32`	2024-02-28 13:55:27 +00:00
Valery Pykhtin	a845ea3878	[AMDGPU] Fix SDWA 'preserve' transformation for instructions in different basic blocks. (#82406 ) This fixes crash when operand sources for V_OR instruction reside in different basic blocks.	2024-02-28 14:47:33 +01:00
AtariDreams	0a54b36d5e	[X86] Resolve FIXME: Create cld only when needed (#82415 ) Only use cld when we also have rep instructions, are calling a function, or contain inline asm.	2024-02-28 12:32:58 +00:00
Simon Pilgrim	6287b7b9e9	[X86] combineEXTRACT_SUBVECTOR - extract 256-bit comparisons if only one subvector is required. If only one subvector extraction will be necessary (i.e. because the other is constant etc.) then extract the source operands and perform as a 128-bit comparison. Ideally DAGCombiner's narrowExtractedVectorBinOp would handle this, but it's tricky to confirm when a target opcode can be safely extracted and performed as a different vector type. Partially improves an outstanding regression in #82290	2024-02-28 12:24:34 +00:00
Simon Pilgrim	37daff028f	[X86] setcc-lowering.ll - regenerate with AVX2 test coverage Added while triaging a regression from #82290	2024-02-28 11:19:48 +00:00
Tuan Chuong Goh	fd336c33b6	[AArch64][GlobalISel] Pre-Commit Test for Legalize G_LOAD v4i8 (#82989 )	2024-02-28 10:57:10 +00:00
Sander de Smalen	41427b0e8e	[AArch64] Disable FastISel/GlobalISel for ZT0 state (#82768 ) For __arm_new("zt0") we need to have special setup code in the prologue. For calls that don't preserve zt0, we need to emit code to preserve ZT0 around the call. This is only emitted by SelectionDAG ISel at the moment.	2024-02-28 10:42:16 +00:00
chuongg3	686ec7c2e9	[AArch64][GlobalISel] Legalize G_STORE for v4s8 vector (#82498 ) Lowers `G_STORE v4s8, ptr` into `s32 = G_BITCAST v4s8` `G_STORE s32, ptr`	2024-02-28 10:26:41 +00:00
Tuan Chuong Goh	ba692301f1	[AArch64][GlobalISel] Pre-Commit Test for G_STORE v4s8 (#82498 )	2024-02-28 09:52:08 +00:00
David Green	6e41d60a71	[SelectionDAG] Change computeAliasing signature from optional<uint64> to LocationSize. (#83017 ) This is another smaller step of #70452, changing the signature of computeAliasing() from optional<uint64_t> to LocationSize, and follow-up changes in DAGCombiner::mayAlias(). There are some test changes due to the previous AA->isNoAlias call incorrectly using an unknown size (~UINT64_T(0)). This should then be improved again in #70452 when the types are known to be scalable.	2024-02-28 09:43:05 +00:00
Luke Lau	9617da88ab	[RISCV] Use a ta vslideup if inserting over end of InterSubVT (#83230 ) The description in #83146 is slightly inaccurate: it relaxes a tail undisturbed vslideup to tail agnostic if we are inserting over the entire tail of the vector and we didn't shrink the LMUL of the vector being inserted into. This handles the case where we did shrink down the LMUL via InterSubVT by checking if we inserted over the entire tail of InterSubVT, the actual type that we're performing the vslideup on, not VecVT.	2024-02-28 15:58:55 +08:00
Luke Lau	28c29fbec3	[RISCV] Add exact VLEN RUNs for insert_subvector and concat_vector tests. NFC Also update the RUNs in the extract_subvector tests to be consistent. Using the term VLS/VLA here as it's more succinct than KNOWNVLEN/UNKNOWNVLEN.	2024-02-28 14:44:42 +08:00
Luke Lau	91d23370cd	[RISCV] Use a tail agnostic vslideup if possible for scalable insert_subvector (#83146 ) If we know that an insert_subvector inserting a fixed subvector will overwrite the entire tail of the vector, we use a tail agnostic vslideup. This was added in https://reviews.llvm.org/D147347, but we can do the same thing for scalable vectors too. The `Policy` variable is defined in a slightly weird place but this is to mirror the fixed length subvector code path as closely as possible. I think we may be able to deduplicate them in future.	2024-02-28 10:26:54 +08:00
Jeffrey Byrnes	cf1c97b2d2	[AMDGPU] Do not attempt to fallback to default mutations (#83208 ) IGLP itself will be in SavedMutations via mutations added during Scheduler creation, thus falling back results in reapplying IGLP. In PostRA scheduling, if we have multiple regions with IGLP instructions, then we may have infinite loop. Disable the feature for now.	2024-02-27 18:04:59 -08:00
Heejin Ahn	8506a63bf7	Revert "[WebAssembly] Disable multivalue emission temporarily (#82714 )" This reverts commit 6e6bf9f81756ba6655b4eea8dc45469a47f89b39. It turned out the multivalue feature had active outside users and it could cause some disruptions to them, so I'd like to investigate more about the workarounds before doing this.	2024-02-28 01:02:39 +00:00
Heejin Ahn	d4cdb516ee	[WebAssembly] Add RefTypeMem2Local pass (#81965 ) This adds `WebAssemblyRefTypeMem2Local` pass, which changes the address spaces of reference type `alloca`s to `addrspace(1)`. This in turn changes the address spaces of all `load` and `store` instructions that use the `alloca`s. `addrspace(1)` is `WASM_ADDRESS_SPACE_VAR`, and loads and stores to this address space become `local.get`s and `local.set`s, thanks to the Wasm local IR support added in `82f92e35c6`. In a follow-up PR, I am planning to replace the usage of mem2reg pass with this to solve the reference type `alloca` problems described in #81575.	2024-02-27 14:00:43 -08:00
David Green	f42e321b9f	[AArch64] Use FMOVDr for clearing upper bits (#83107 ) This adds some tablegen patterns for generating FMOVDr from concat(X, zeroes), as the FMOV will implicitly zero the upper bits of the register. An extra AArch64MIPeepholeOpt is needed to make sure we can remove the FMOV in the same way we would remove the insert code.	2024-02-27 19:45:43 +00:00
Sumanth Gundapaneni	f44c3facca	Revert "[Hexagon] Optimize post-increment load and stores in loops. (#82418)" (#83151 ) This reverts commit d62ca8def395ac165f253fdde1d93725394a4d53.	2024-02-27 12:50:22 -06:00
Billy Laws	abc693fb40	[AArch64] Skip over shadow space for ARM64EC entry thunk variadic calls (#80994 ) When in an entry thunk the x64 SP is passed in x4 but this cannot be directly passed through since x64 varargs calls have a 32 byte shadow store at SP followed by the in-stack parameters. ARM64EC varargs calls on the other hand expect x4 to point to the first in-stack parameter.	2024-02-27 10:32:15 -08:00
Michael Maitland	9106b58ce4	[CodeGen][MISched] Add misched post-regalloc bottom-up scheduling There is the possibility that the bottom-up direction will lead to performance improvements on certain targets, as this is certainly the case for the pre-regalloc GenericScheduler. This patch will give people the opportunity to experiment for their sub-targets. However, this patch keeps the top-down approach as the default for the PostGenericScheduler since that is what subtargets expect today.	2024-02-27 09:56:28 -08:00
choikwa	04db60d150	[AMDGPU] Prevent hang in SIFoldOperands by caching uses (#82099 ) foldOperands() for REG_SEQUENCE has recursion that can trigger an infinite loop as the method can modify the operand order, which messes up the range-based for loop. This patch fixes the issue by caching the uses for processing beforehand, and then iterating over the cache rather than using the instruction iterator.	2024-02-27 09:13:59 -06:00
Simon Pilgrim	13c359aa9b	[X86] ReplaceNodeResults - truncate sub-128-bit vectors as shuffles directly (#83120 ) We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform the truncation as a shuffle directly (which will usually lower as a PACK or PSHUFB). For the cases where the widening and shuffle isn't legal we can leave it to generic legalization to scalarize for us. Fixes #81883	2024-02-27 15:03:42 +00:00
Paul Walker	900bea9b1c	[LLVM][test] Convert remaining instances of ConstantExpr based splats to use splat(). This is mostly NFC but some output does change due to consistently inserting into poison rather than undef and using i64 as the index type for inserts.	2024-02-27 13:37:23 +00:00
Paul Walker	dbb65dd330	[LLVM][tests/CodeGen/RISCV] Convert instances of ConstantExpr based splats to use splat(). This is mostly NFC but some output does change due to consistently inserting into poison rather than undef and using i64 as the index type for inserts.	2024-02-27 13:37:23 +00:00
Paul Walker	d6ff986dd2	[LLVM][tests/CodeGen/AArch64] Convert instances of ConstantExpr based splats to use splat(). This is mostly NFC but some output does change due to consistently inserting into poison rather than undef and using i64 as the index type for inserts.	2024-02-27 13:37:23 +00:00
Matt Arsenault	ca66f7469f	AMDGPU: Merge tests for llvm.amdgcn.dispatch.id	2024-02-27 18:42:40 +05:30
Matt Arsenault	2e4643a53e	AMDGPU: Regenerate baseline test checks	2024-02-27 18:42:40 +05:30
michaelselehov	56ad6d1939	[MachineLICM] Hoist COPY instruction only when user can be hoisted (#81735 ) befa925acac8fd6a9266e introduced preliminary hoisting of COPY instructions when the user of the COPY is inside the same loop. That optimization appeared to be too aggressive and hoisted too many COPYs, greatly increasing register pressure and causing performance regressions for the AMDGPU target. This is intended to fix the regression by hoisting a COPY instruction only if either: (1) the user of the COPY can be hoisted (other args are invariant), or (2) hoisting the COPY doesn't bring high register pressure.	2024-02-27 12:31:29 +00:00
Dhruv Chawla (work)	2c9b6c1b36	[AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN,SMAX,UMIN,UMAX} for odd-sized vectors (#82740 ) i8 vectors do not have their sizes changed as I noticed regressions in some tests when that was done. This patch also adds support for most G_VECREDUCE_* operations to moreElementsVector in LegalizerHelper.cpp. The code for getting the "neutral" element is taken almost exactly as it is in SelectionDAG, with the exception that support for G_VECREDUCE_{FMAXIMUM,FMINIMUM} was not added. The code for SelectionDAG is located at SelectionDAG::getNeutralElement().	2024-02-27 15:57:46 +05:30
Vyacheslav Levytskyy	ada70f50a5	[SPIR-V]: add SPIR-V extension: SPV_INTEL_variable_length_array (#83002 ) This PR adds SPIR-V extension SPV_INTEL_variable_length_array that allows allocating local arrays whose number of elements is unknown at compile time: add a new SPIR-V internal intrinsic: int_spv_alloca_array; legalize G_STACKSAVE and G_STACKRESTORE; implement allocation of arrays (previously getArraySize() of AllocaInst was not used); add tests.	2024-02-27 10:58:45 +01:00
Vyacheslav Levytskyy	9796b0e9f9	Add support for the 'freeze' instruction (#82979 ) This PR is to add support for the 'freeze' instruction: https://llvm.org/docs/LangRef.html#freeze-instruction There is no way to implement `freeze` correctly without support on the SPIR-V standard side, but we may at least address a simple (static) case when undef/poison value presence is obvious. The main benefit of even incomplete `freeze` support is preventing translation from crashing due to lack of support in the legalization and instruction selection steps.	2024-02-27 10:58:04 +01:00
leecheechen	d7c80bba69	[llvm][LoongArch] Improve loongarch_lasx_xvpermi_q intrinsic (#82984 ) For instruction xvpermi.q, only [1:0] and [5:4] bits of operands[3] are used. The unused bits in operands[3] need to be set to 0 to avoid causing undefined behavior.	2024-02-27 15:38:11 +08:00
YunQiang Su	c88beb4112	MIPS: Fix asm constraints "f" and "r" for softfloat (#79116 ) This includes 2 fixes: 1. Disallow 'f' for softfloat. 2. Allow 'r' for softfloat. Currently, 'f' is accepted by clang, then LLVM meets an internal error. 'r' is rejected by LLVM by: couldn't allocate input reg for constraint 'r'. Fixes: #64241, #63632 --------- Co-authored-by: Fangrui Song <i@maskray.me>	2024-02-26 22:08:36 -08:00
Matt Arsenault	e7900e695e	AMDGPU: Regenerate baseline mir tests	2024-02-27 10:44:53 +05:30
Craig Topper	62d0c01c2c	[SelectionDAG] Remove pointer from MMO for VP strided load/store. (#82667 ) MachineIR alias analysis assumes that only bytes after the pointer will be accessed. This is incorrect if the stride is negative. This is causing miscompiles in our downstream after SLP started making strided loads. Fixes #82657	2024-02-26 16:15:34 -08:00


52796 Commits