llvm-project

Author	SHA1	Message	Date
Thomas Symalla	256343a0e9	Revert "Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226" (#86273) Reverts llvm/llvm-project#81394. This reverts commit 3ac243bc0d7922d083af2cf025247b5698556062. It does not handle the RSrc registers s0-s3 correctly. This leads to a broken test, which expects s0-s3 as function arguments and also uses them as RSrc registers. We need to revisit the patch, but apparently we only want to use s0-s3 as argument registers if we don't need them as RSrc registers.	2024-03-26 11:01:08 +01:00
David Green	fbc247367a	[AArch64][GlobalISel] Legalization for small anyext/sext/zext (#86438) Similar to #85625, some of the codegen is still far from optimal, but this helps fix quite a few fallback cases.	2024-03-26 09:48:06 +00:00
David Green	4d315ff382	[GlobalISel] Add CTLZ known bits. (#86436) Replicated from SDAG.	2024-03-26 09:11:35 +00:00
Bevin Hansson	14c30189fb	[ExpandLargeFpConvert] Fix incorrect values in fp-to-int conversion. (#86514) The IR for a double-to-i129 conversion looks like this in one of the blocks in compiler-rt: `%cmp5.i = icmp ult i16 %3, -129, !dbg !24`. But in ExpandLargeFpConvert, it looks like: `%13 = icmp ult i129 %12, 4294967167, !dbg !19`. ExpandLargeFpConvert is wrong; the value should have been signed before negating, but instead we get a very large unsigned value (4294967167 is 2^32 - 129, i.e. -129 reinterpreted as an unsigned 32-bit value). Another value in the same pass also has this issue.	2024-03-26 10:08:22 +01:00
Philip Reames	a6b870db09	[RISCV] Enable sub(max, min) lowering for ABDS and ABDU (#86592) We have the ISD nodes for representing signed and unsigned absolute difference. For RISCV, we have vector min/max in the base vector extension, so we can expand to the sub(max,min) lowering. We could almost use the default expansion, but since fixed-length min/max are custom (not legal), the default expansion doesn't cover the fixed vector cases. The expansion here is just a copy of the generic code, specialized to allow the custom min/max nodes to be created so they can in turn be legalized to the _vl variants. Existing DAG combines handle the recognition of absolute difference idioms and conversion into the respective ISD::ABDS and ISD::ABDU nodes. This change does have the net effect of potentially pushing a free-floating zero/sign extend after the expansion, and we don't do a great job of folding that into later expressions. However, since narrowing can in general reduce required work (by reducing LMUL), this seems like the right general tradeoff.	2024-03-25 20:13:53 -07:00
YAMAMOTO Takashi	6420f37926	[WebAssembly] Implement an alternative translation for -wasm-enable-sjlj (#84137) Instead of maintaining per-function-invocation malloc()'ed tables to track which functions each label belongs to, store the equivalent info in the jump buffers (jmp_buf) themselves. Also, use less Emscripten-looking ABI symbols: `saveSetjmp` -> `__wasm_setjmp`, `testSetjmp` -> `__wasm_setjmp_test`, `getTempRet0` -> (removed), `__wasm_longjmp` -> (no change). While I want to use this for WASI, it should work for Emscripten as well. An example runtime and a few tests: https://github.com/yamt/garbage/tree/wasm-sjlj-alt2/wasm/longjmp; wasi-libc version of the runtime: https://github.com/WebAssembly/wasi-libc/pull/483; Emscripten version of the runtime: https://github.com/emscripten-core/emscripten/pull/21502; discussion: https://docs.google.com/document/d/1ZvTPT36K5jjiedF8MCXbEmYjULJjI723aOAks1IdLLg/edit	2024-03-25 18:11:56 -07:00
Changpeng Fang	350bda4419	AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313) Rename the intrinsics to be closer to the instruction mnemonic names: use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr. This patch also removes the f16/bf16 versions of the builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types when implementing builtins. We can always use bitcast.	2024-03-25 16:55:22 -07:00
Farzon Lotfi	4cea2d049f	[HLSL][DXIL] implement `sqrt` intrinsic (#86560) Completes #86187: fix hlsl_intrinsic to cover the correct cases; move to using `__builtin_elementwise_sqrt`; add lowering of `Intrinsic::sqrt` to dxilop 24.	2024-03-25 18:02:30 -04:00
Farzon Lotfi	060df78cdb	[DXIL] Add Float `Dot` Intrinsic Lowering (#86071) Completes #83626. `CGBuiltin.cpp`: modify `getDotProductIntrinsic` to be able to emit `dot2`, `dot3`, and `dot4` intrinsics based on element count; `IntrinsicsDirectX.td`: for floating point, add `dot2`, `dot3`, and `dot4` intrinsics; `DXIL.td`: add dxilop intrinsic lowering for `dot2`, `dot3`, and `dot4`; `DXILOpLowering.cpp`: add vector arg flattening for dot product; `DXILOpBuilder.h`: modify `createDXILOpCall` to take a small vector instead of an iterator; `DXILOpBuilder.cpp`: modify `createDXILOpCall` by moving the small vector up to the calling function in `DXILOpLowering.cpp`. Moving one function up gives us access to the `CallInst` and `Function`, which were needed to distinguish the dot product intrinsics and get the operands without using the iterator.	2024-03-25 18:01:46 -04:00
Philip Reames	07ee9bd215	[RISCV] Add fixed vector coverage for sum-absolute-difference (sad) pattern This builds on the previously added absolute difference cases, and adds the reduction at the end. This is mostly interesting for examining the impact of extend placement when changing the abdu lowering.	2024-03-25 13:27:09 -07:00
Philip Reames	4b941ff4b4	[RISCV] Add coverage for abdu and abds (absolute difference) Test copied from AArch64 with minimal adaptation. We likely need additional coverage, but this is a reasonable starting point.	2024-03-25 13:27:09 -07:00
Jeffrey Byrnes	b761137049	[AMDGPU] Use correct VGPR threshold for flagging ExcessRP regions in unified register file case (#85860) `ST.getMaxNumVGPRs(MF)` lowers to `AMDGPUBaseInfo.cpp:getTotalNumVGPRs`, which returns 512 for gfx90a. This is subsequently limited by `AMDGPUBaseInfo:getAddressableNumVGPRs()`, which also returns 512 for gfx90a. The ISA states we can have a total of 512 registers, but a maximum of only 256 each of AGPRs and VGPRs (gfx90a 3.6.4). Therefore, in the unified register file case, `ST.getMaxNumVGPRs(MF)` calculates the maximum combined number of VGPRs + AGPRs. However, it is currently used both as the limit for accvgpr and as the limit for archvgpr. This patch uses it as the combined limit, and accounts for the maximum addressable arch/acc VGPRs when calculating the per-RegClass limits. It is not unreasonable to think other clients of getTotalNumVGPRs are using it in the wrong way.	2024-03-25 13:11:58 -07:00
Michael Maitland	8b9c3b57b1	Revert "[RISCV][GISEL] Add instruction select tests for G_VSCALE" This reverts commit c00a5ab8c4be14f63735ec61c5c9245c233cbcfc. It is not consistent with SelectionDAG.	2024-03-25 11:50:57 -07:00
Michael Maitland	668687f8a8	Revert "[RISCV][GISEL] Add regbankselect tests for G_VSCALE" This reverts commit a2476c99b745381380eab245fc9499a4ecf0b39e. It is not consistent with SelectionDAG.	2024-03-25 11:50:21 -07:00
Michael Maitland	9056ce8804	Revert "[RISCV][GISEL] Legalize G_VSCALE" This reverts commit 47681506ded30fada68f180b5e80f740bc76abcd. It is not consistent with SelectionDAG.	2024-03-25 11:46:02 -07:00
Craig Topper	ce37a7131f	[RISCV] Add integer RISCVISD::SELECT_CC to canCreateUndefOrPoison and isGuaranteedNotToBeUndefOrPoison. (#84693) Integer RISCVISD::SELECT_CC doesn't create poison. If none of the operands are poison, the result is not poison. This allows ISD::FREEZE to be hoisted above RISCVISD::SELECT_CC.	2024-03-25 11:10:58 -07:00
Michael Maitland	c00a5ab8c4	[RISCV][GISEL] Add instruction select tests for G_VSCALE	2024-03-25 10:44:59 -07:00
Michael Maitland	a2476c99b7	[RISCV][GISEL] Add regbankselect tests for G_VSCALE	2024-03-25 10:44:59 -07:00
Michael Maitland	47681506de	[RISCV][GISEL] Legalize G_VSCALE G_VSCALE should be lowered using VLENB.	2024-03-25 10:44:58 -07:00
Michael Maitland	05840c8714	[RISCV][GISEL] Instruction select for scalable G_SELECT SelectionDAG has SELECT and VSELECT. SELECT restricts the condition operand to an i1, and the true and false operands can be vectors. The result of a SELECT has the same type as the true and false operands. VSELECT has a vector condition operand, and the true and false operands must be vectors. The result of a VSELECT is a vector. GlobalISel has G_SELECT, whose condition operand is an i1 if the true and false operands are scalar, and a vector type with i1 elements if the true and false operands are vectors. A G_SELECT acts like an ISD::SELECT when the operands are all scalar, and like an ISD::VSELECT when the operands are vectors. Because of the type system, a G_SELECT cannot act like an ISD::SELECT with an i1 condition and vector operands. In this patch, we take advantage of the patterns written for SELECT and VSELECT by marking G_SELECT as equivalent to both SELECT and VSELECT, so the patterns can be reused. Since we cannot write a `G_SELECT (s1), (vector-ty), (vector-ty)`, we don't have to worry about accidentally matching the SDAG patterns of that nature. We will probably need a way to represent an i1 condition with vector true and false operands in the future. That can be the topic of another patch.	2024-03-25 10:35:22 -07:00
Michael Maitland	973e9dbd57	[RISCV][GISEL] Regbank select for scalable G_SELECT	2024-03-25 10:35:21 -07:00
Michael Maitland	8c51ac9ddb	[RISCV][GISEL] Legalize G_SELECT for scalable vectors	2024-03-25 10:35:21 -07:00
Michael Maitland	ea798a7900	[RISCV][GISEL] Legalize and regbankselect vector typed G_IMPLICIT_DEF	2024-03-25 10:19:14 -07:00
David Green	96819daa3d	[AArch64] Handle v2i16 and v2i8 in concat load combine. (#86264) This extends the concat load patch from https://reviews.llvm.org/D121400, which was later moved to a combine, to handle v2i8 and v2i16 concat loads too.	2024-03-25 17:10:23 +00:00
Evgenii Kudriashov	fb394562a3	[X86][GlobalISel] Fix referencing nonexistent operand in G_ICMP (#86221) Fixes #86203	2024-03-25 16:46:12 +01:00
David Stuttard	06cfbe3cfd	[AMDGPU] Add support for idxen and bothen buffer load/store merging in SILoadStoreOptimizer (#86285) Added more buffer instruction merging support.	2024-03-25 14:44:22 +00:00
Michael Maitland	865294b2e6	[CodeGen][MISched] Add misched post-regalloc bidirectional scheduling (#77138) This PR is stacked on #76186. This PR keeps the default strategy as top-down, since that is what existing targets expect. It can be enabled using `-misched-postra-direction=bidirectional`. It is up to targets to decide whether they would like to enable this option for themselves.	2024-03-25 10:10:35 -04:00
AtariDreams	f5a067bb90	[SelectionDAG]: Deduce KnownNeverZero from SMIN and SMAX (#85722)	2024-03-25 10:35:28 +00:00
Nathan Gauër	f0eb908340	[SPIR-V] Add WaveGetLaneIndex() intrinsic support (#85979) Add support to generate valid SPIR-V for the WaveGetLaneIndex() HLSL builtin. To implement this, I had to fix a few small issues in the backend, like the i8* pointer type being emitted, even if we have the type information elsewhere. Signed-off-by: Nathan Gauër <brioche@google.com>	2024-03-25 11:30:47 +01:00
Vyacheslav Levytskyy	b0d03ccc08	[SPIR-V] Fix illegal OpConstantComposite instruction with non-const constituents in SPIR-V Backend (#86352) This PR fixes illegal use of OpConstantComposite with non-constant constituents. The test attached to the PR is now able to satisfy the `spirv-val` check. Before the fix, SPIR-V Backend produced a pattern like `%a = OpVariable %_ptr_CrossWorkgroup_uint CrossWorkgroup %uint_123` followed by `%11 = OpConstantComposite %_struct_6 %a %a` for the attached test case, so `spirv-val` complained with `error: line 25: OpConstantComposite Constituent <id> '10[%a]' is not a constant or undef. %11 = OpConstantComposite %_struct_6 %a %a`.	2024-03-25 10:14:46 +01:00
Vyacheslav Levytskyy	1d250d9099	[SPIR-V] Improve type inference in SPIR-V Backend for opaque pointers (#86283) This PR improves type inference in SPIR-V Backend for opaque pointers, accounting for a case where a chain of function calls allows formal parameter types to be deduced from actual arguments. The attached test demonstrates the case.	2024-03-25 10:14:08 +01:00
Vyacheslav Levytskyy	99c40f6ba6	[SPIR-V] Introduce a command line option to support compatibility with Khronos SPIRV Translator (#86101) The SPIRV-LLVM-Translator project (https://github.com/KhronosGroup/SPIRV-LLVM-Translator) from the Khronos Group is a tool and a library for bi-directional translation between SPIR-V and LLVM IR. In its backward translation from SPIR-V to LLVM IR, SPIRV-LLVM-Translator isn't necessarily able to cover the same set of SPIR-V patterns/instructions that SPIRV Backend produces, even if we target the same SPIR-V version in both the SPIRV-LLVM-Translator and SPIRV Backend projects. To improve interoperability and the ability to use SPIRV Backend output in different products, this PR introduces a mode of SPIR-V output that is compatible with the subset of SPIR-V supported by SPIRV-LLVM-Translator. This includes a new command line option that doesn't influence the default behavior of SPIRV Backend, and one test case that demonstrates how this option can be used to get the practical benefit of producing, out of two similar possible outputs, the one that SPIRV-LLVM-Translator can understand.	2024-03-25 10:13:42 +01:00
David Stuttard	75e528fdd9	[AMDGPU] Extend zero initialization of return values for TFE (#85759) buffer_load instructions that use TFE also need to zero-initialize return values, similar to how the image instructions currently work. Add support for this with the standard zero init of all results, and zero init of just the TFE flag when the enable-prt-strict-null subtarget feature is disabled.	2024-03-25 09:01:46 +00:00
Pierre van Houtryve	babbdad15b	[AMDGPU] Handle non-register operands for S_SUB/ADD_U64_PSEUDO (#86104) This pseudo uses SSrc_b64, so it allows either an immediate or a register, but the lowering crashed on immediate operands.	2024-03-25 09:23:40 +01:00
Luke Lau	373e77b4c0	[RISCV] Generalize (sub zext, zext) -> (sext (sub zext, zext)) to add (#86248) This generalizes the combine added in #82455 to other binary ops, beginning with adds in this patch. Because the two zext operands are always non-negative when treated as signed, and the add cannot overflow since it is carried out in at least N * 2 bits of the narrow type, the result of the add will always be non-negative. So we can use a zext for the outer extend, unlike sub, which may produce a negative result from two non-negative operands. Although we could still use sext for add, I plan to add support for other binary ops like mul in a later patch, and mul requires zext to be correct (because the maximum value will take up the full N * 2 bits). So I've opted to use zext here too for consistency. Alive2 proof: https://alive2.llvm.org/ce/z/PRNsUM	2024-03-25 13:08:56 +08:00
Wang Pengcheng	6af6416e89	[RISCV] Add a tune feature to disable stripping W suffix (#86255) We have a hidden option to disable it, but I'd like to make it a tune feature. For some implementations, instructions with the W suffix may be less costly, as they only operate on 32-bit data, though we may lose some compression opportunities.	2024-03-25 11:44:16 +08:00
Phoebe Wang	2e4e04c590	[X86][BF16] Do not lower to VCVTNEPS2BF16 without AVX512VL (#86395) Fixes: #86305	2024-03-25 10:06:12 +08:00
houndlord	9632e1515c	Match fixed width ISD::AVGFLOORS + ISD::AVGCEILS patterns (#86222)	2024-03-24 15:33:16 +00:00
David Green	e8d5223ce4	[AArch64] Additional GISel test coverage. NFC	2024-03-24 12:32:47 +00:00
Simon Pilgrim	6c6fe4b2ae	[X86] known-never-zero.ll - add 32-bit test coverage Enabled vector coverage as well: i686+SSE2 and x86_64+AVX. Should improve test quality for #85722	2024-03-24 11:33:51 +00:00
yingopq	5d7fd6a04a	[Mips] Restore wrong deletion of instruction 'and' in unsigned min/max processing. (#85902) Fix #61881	2024-03-24 02:35:42 -04:00
Owen Anderson	7c9b5228da	Only check assertions that were meant to apply to the normal case of non-splat vector SREM expansion when we aren't hitting the special case. (#86238) Fixes https://github.com/llvm/llvm-project/issues/84830 Introduced in https://github.com/llvm/llvm-project/pull/82706	2024-03-23 21:49:29 -05:00
Felix (Ting Wang)	90a7fc366a	[PowerPC][NFC] Add base test case for small-local-dynamic-tls on AIX (#84711)	2024-03-24 08:46:45 +08:00
Harvin Iriawan	57146daeaa	[CodeGen] Update for scalable MemoryType in MMO (#70452) Remove the getSizeOrUnknown call when a MachineMemOperand is created. For a scalable TypeSize, the MemoryType created becomes a scalable_vector. Two MMOs that have scalable memory accesses can then use the updated BasicAA that understands scalable LocationSize. Original patch by Harvin Iriawan. Co-authored-by: David Green <david.green@arm.com>	2024-03-23 12:56:25 +00:00
Evgenii Kudriashov	d365a45cb3	[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941) Here we introduce three new GMIR instructions to cover a set of trap intrinsics. The idea behind it is that generic intrinsics shouldn't be used with the G_INTRINSIC opcode. These new instructions match perfectly with the existing trap ISD nodes. This allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for selection and avoid manual selection. However, AMDGPU is an exception: it selects traps during legalization regardless of whether SelectionDAG or GlobalISel is used. Since there are not many places where traps are used, this change attempts to clean up all the usages of G_INTRINSIC with trap intrinsics, so there is no stage when both G_TRAP and G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.	2024-03-23 13:12:44 +01:00
XChy	d7c672834e	[CodeGen][NFC] Update tests in AArch64/and-sink.ll	2024-03-23 19:01:33 +08:00
paperchalice	ef57977f2a	[NewPM][Hexagon] Add HexagonPassRegistry.def (#86244) Prepare for dag-isel; also migrate some test cases.	2024-03-23 15:02:27 +08:00
Florian Mayer	215f105ca5	[MTE] Fix test (#85875) llc runs the stack tagging instrumentation, so if we also run opt beforehand, we instrument twice.	2024-03-22 14:14:43 -07:00
Ulrich Weigand	4b907414d2	[SystemZ] Add support for llvm.readcyclecounter The llvm.readcyclecounter intrinsic can be implemented via the STORE CLOCK FAST (STCKF) instruction.	2024-03-22 20:01:02 +01:00
David Green	f82d0187a7	[AArch64] Add a test to show incorrect latencies into Bundle instructions. NFC	2024-03-22 14:00:21 +00:00
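
The sub(max, min) expansion in commit a6b870db09 works because the distance between two N-bit values always fits in N unsigned bits, so a wrapping N-bit subtraction of the smaller value from the larger one yields the exact absolute difference. The Python sketch below is only an illustration, not LLVM code: it models 8-bit values as plain ints holding raw bit patterns, assumes ABDS/ABDU are defined as trunc(abs(sext(a) - sext(b))) and trunc(abs(zext(a) - zext(b))), and checks the expansion against that definition for every pair of i8 bit patterns. BITS, MASK and all helper names are hypothetical.

```python
# Sketch: model N-bit integers as Python ints holding the raw bit pattern.
# BITS, MASK and all helper names are illustrative, not LLVM APIs.
BITS = 8
MASK = (1 << BITS) - 1


def to_signed(x: int) -> int:
    """Interpret the low BITS bits of x as a two's-complement value."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def abds_ref(a: int, b: int) -> int:
    """Assumed reference: trunc(abs(sext(a) - sext(b)))."""
    return abs(to_signed(a) - to_signed(b)) & MASK


def abdu_ref(a: int, b: int) -> int:
    """Assumed reference: trunc(abs(zext(a) - zext(b)))."""
    return abs((a & MASK) - (b & MASK)) & MASK


def abds_expand(a: int, b: int) -> int:
    """sub(smax(a, b), smin(a, b)) using an N-bit wrapping subtraction."""
    a_wins = to_signed(a) >= to_signed(b)
    hi, lo = (a, b) if a_wins else (b, a)  # bit patterns of smax / smin
    return (hi - lo) & MASK


def abdu_expand(a: int, b: int) -> int:
    """sub(umax(a, b), umin(a, b)) using an N-bit wrapping subtraction."""
    a, b = a & MASK, b & MASK
    return (max(a, b) - min(a, b)) & MASK


# Exhaustive check over every pair of 8-bit bit patterns.
for a in range(1 << BITS):
    for b in range(1 << BITS):
        assert abds_expand(a, b) == abds_ref(a, b)
        assert abdu_expand(a, b) == abdu_ref(a, b)
print("sub(max, min) matches ABDS/ABDU for all i8 pairs")
```

The signed case is the interesting one: for a = 0x80 (-128) and b = 0x7F (127), smax picks 0x7F and smin picks 0x80, and the 8-bit subtraction 0x7F - 0x80 wraps to 0xFF, which is the true distance 255 truncated to 8 bits.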

52796 Commits