llvm-project

Author	SHA1	Message	Date
Luke Lau	208edf7672	[RISCV] Fix assertion in lowerEXTRACT_SUBVECTOR This fixes a crash when lowering an extract_subvector like: t0:v1i64 = extract_subvector t1:v2i64, 1 Whilst we never need a vslidedown with M1 on scalable vector types, we might need to do it for v1i64/v1f64, since the smallest container type for it is nxv1i64/nxv1f64. The lowering code is still correct for this case, but the assertion was too strict. The actual invariant we're relying on is that ContainerSubVecVT's LMUL <= M1, not < M1. Hence why we handled v2i32 fine, because its container type was nxv1i32 and MF2.	2024-02-13 20:40:31 +08:00
Orlando Cazalet-Hyams	d860ea96b1	[HWASAN] Update dbg.assign intrinsics in HWAsan pass (#79864 ) llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to update the second expression. Fixes #76545. This is #78606 rebased and with the addition of DPValue handling. Note the addition of --try-experimental-debuginfo-iterators in the tests and some shuffling of code in MemoryTaggingSupport.cpp.	2024-02-13 09:11:09 +00:00
Nikita Popov	070848c17c	[AArch64][GISel] Don't pointlessly lower G_TRUNC (#81479 ) If we have something like G_TRUNC from v2s32 to v2s16, then lowering this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from v2s16 to v2s8 does not bring us any closer to legality. In fact, the first part of that is a G_BUILD_VECTOR whose legalization will produce a new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get combined to the original, causing a legalization cycle. Make the lowering condition more precise, by requiring that the original vector is >128 bits, which is I believe the only case where this specific splitting approach is useful. Note that this doesn't actually produce a legal result (the alwaysLegal is a lie, as before), but it will cause a proper globalisel abort instead of an infinite legalization loop. Fixes https://github.com/llvm/llvm-project/issues/81244.	2024-02-13 09:29:56 +01:00
Pierre van Houtryve	87d7711934	[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450 ) Fixes SWDEV-443292	2024-02-13 09:07:51 +01:00
sstipanovic	785eddd7a7	[AMDGPU][GlobalIsel] Introduce isRegisterClassType to check for legal types, instead of checking bit width. (#68189 ) In D151116 it was suggested to have a set of classes to cover every possible case. This does it for bitcast first. closes #79578	2024-02-13 08:26:10 +01:00
Austin Kerbow	4bcbeaed63	[AMDGPU] Enable kernel arg preloading with gfx90a (#81180 ) Add a trap instruction to the beginning of the kernel prologue to handle cases where preloading is attempted on HW loaded with incompatible firmware.	2024-02-12 22:33:29 -08:00
Luke Lau	bb77047a3b	[RISCV] Handle fixed length vectors with exact VLEN in loweringEXTRACT_SUBVECTOR (#79949 ) This is a revival of #65392. When we lower an extract_subvector, we extract the subregister that the subvector is contained in first and then do a vslidedown with LMUL=1. We can currently only do this for scalable vectors though because the index is scaled by vscale and thus we will know what subregister the subvector lies in. For fixed length vectors, the index isn't scaled by vscale and so the subvector could lie in any arbitrary subregister, so we have to do a vslidedown with the full LMUL. The exception to this is when we know the exact VLEN: in which case, we can still work out the exact subregister and do the LMUL=1 vslidedown on it. This patch handles this case by scaling the index by 1/vscale before computing the subregister, and extending the LMUL=1 path to handle fixed length vectors.	2024-02-13 14:29:08 +08:00
Fangrui Song	78f2eb8d0f	[test] Replace aarch64-*-{eabi,gnueabi}{,hf} with aarch64	2024-02-12 18:36:31 -08:00
Fangrui Song	3d18c8cd26	[test] Replace aarch64-*-{eabi,gnueabi}{,hf} with aarch64 Similar to d39b4ce3ce8a3c256e01bdec2b140777a332a633 Using "eabi" or "gnueabi" for aarch64 targets is a common mistake and warned by Clang Driver. We want to avoid them elsewhere as well. Just use the common "aarch64" without other triple components.	2024-02-12 18:29:55 -08:00
Artem Belevich	61a0fc7947	[NVPTX] pass correct GPU arch to ptxas test (#81535 )	2024-02-12 13:18:08 -08:00
Artem Belevich	8799d7143f	[NVPTX] Fix the error in a pattern match in v4i8 comparisons. (#81308 ) The replacement should've had BFE() as the arguments for the comparison, not the source register. While at that, tighten the patterns a bit, and expand them to cover variants with immediate arguments. Also change the default lowering of bfe() to use unsigned variant, so the value of the upper bits is predictable.	2024-02-12 12:59:03 -08:00
Craig Topper	1114ac4399	[RISCV] Remove stale comment from test. NFC (#81098 ) The bug mentioned in the comment has been committed and did change the cfi_offset.	2024-02-12 09:19:28 -08:00
Andrei Safronov	b5046a7fa9	[Xtensa] Initial codegen support from IR (#78548 ) This PR provides implementation of the basic codegen infra such as TargetFrameLowering, MCInstLower, AsmPrinter, RegisterInfo, InstructionInfo, TargetLowering, SelectionDAGISel. Migrated from https://reviews.llvm.org/D145658	2024-02-12 17:41:59 +01:00
Nikita Popov	69ddf1eb4d	[X86] Add test for #80911 (NFC)	2024-02-12 16:40:43 +01:00
Antonio Frighetto	8373ceef8f	[CGP] Extend `dupRetToEnableTailCallOpts` to known intrinsics Hint further tail call optimization opportunities when the examined returned value is the return value of a known intrinsic or library function, and it appears as first function argument. Fixes: https://github.com/llvm/llvm-project/issues/75455.	2024-02-12 14:17:02 +01:00
Antonio Frighetto	d1c481d27d	[CGP] Precommit tests for PR76613 (NFC)	2024-02-12 14:17:02 +01:00
Joseph Huber	2ac8e6b7f5	[NVPTX] Implement `__builtin_readcyclecounter` on NVPTX (#81344 ) Summary: This patch simply states that `__builtin_readcyclecounter` is legal on NVPTX and makes it return the value from the `clock64` sreg. The timer intrinsics are marked as having side effects, which is desirable for timing primitives and required to pattern match the intrinsic DAG.	2024-02-12 07:07:48 -06:00
Serge Pavlov	213b0ae497	[GlobalISel][ARM] legalize G_FPENV_RESET for soft-float mode (#81456 )	2024-02-12 17:46:59 +07:00
Vyacheslav Levytskyy	d153ef6a34	Add support for SPIR-V extension: SPV_INTEL_function_pointers (#80759 ) This PR adds initial support for "SPV_INTEL_function_pointers" SPIR-V extension: https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc The goal of the extension is to support indirect function calls and translation of function pointers into SPIR-V.	2024-02-12 11:22:48 +01:00
Pierre van Houtryve	f93aa5157a	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955 ) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPUs, at the cost of fewer optimization opportunities. Note that this is just doing the compiler side of things, device libs and runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.	2024-02-12 10:18:20 +01:00
Vyacheslav Levytskyy	b221b97336	Add support for SPIR-V extension: SPV_INTEL_subgroups (#81023 ) The goal of this PR is to implement SPV_INTEL_subgroups extension in SPIR-V Backend.	2024-02-12 10:05:21 +01:00
Pierre van Houtryve	1e36d92b70	[LowerMemIntrinsics] Avoid udiv/urem when type size is a power of 2 (#81238 ) See #64620 - does not fix the issue but improves the generated code a bit.	2024-02-12 10:01:22 +01:00
Nikita Popov	92d7992205	[AArch64] Only apply bool vector bitcast opt if result is scalar (#81256 ) This optimization tries to optimize bitcasts from `<N x i1>` to iN, but currently also triggers for `<N x i1>` to `<M x iK>` bitcasts, if custom lowering has been requested for these for an unrelated reason. Fix this by explicitly checking that the result type is scalar. Fixes https://github.com/llvm/llvm-project/issues/81216.	2024-02-12 10:00:34 +01:00
Adrian Kuegel	da9559d69a	Do not use PerformEXTRACTCombine for v8i8 types (#81242 ) Same as with v4i8 types, we should not be using PerformEXTRACTCombine for v8i8 types.	2024-02-12 07:31:31 +01:00
David Green	b1771475da	[AArch64][GlobalISel] Additional insert and extract GISel tests. NFC	2024-02-11 22:25:16 +00:00
Jon Roelofs	ffab5a089b	Add a test for the A16/A17 parts of eb1b428750181ea742c547db0bc7136cd5b8f732 There are a couple of open questions on what we should do for A14, so I'll leave that off for now. https://github.com/llvm/llvm-project/pull/81325#issuecomment-1937489565	2024-02-11 10:51:51 -08:00
Simon Pilgrim	b45de48be2	[MVE] Expand64BitShift - handle all constant shift amounts less than 32 (#81261 ) Expand64BitShift was always dropping to generic shift legalization if the shift amount type was larger than i64, even if the constant shift amount was actually very small. I've adjusted the constant bounds checks to work with APInt types so we can always perform the comparison. This results in the MVE long shift instructions being used more often, and it looks like this is preventing some additional combines from happening. This could be addressed in the future. This came about while I was trying to extend the DAGTypeLegalizer::ExpandShift* helpers and need to move to consistently using the legal shift amount types instead of reusing the shift amount type from the original wider shift.	2024-02-11 15:02:27 +00:00
David Green	c3dfbb6f49	[AArch64][GlobalISel] Add commute_constant_to_rhs to post legalizer combiners (#81103 ) This helps the fp reductions, moving the constant operands to the RHS which in turn helps simplify away fadd -0.0 and fmul 1.0.	2024-02-11 11:20:11 +00:00
Koakuma	c2f9885a8a	[SPARC] Support reserving arbitrary general purpose registers (#74927 ) This adds support for marking arbitrary general purpose registers - except for those with special purpose (G0, I6-I7, O6-O7) - as reserved, as needed by some software like the Linux kernel.	2024-02-11 02:04:18 -05:00
darkbuck	d0f4663f48	[GlobalISel][Mips] Global ISel for `brcond` - Enable equivalent between `brcond` and `G_BRCOND`. - Remove the manual selection of `G_BRCOND` in Mips. Revise test cases. Reviewers: petar-avramovic, bcardosolopes, arsenm Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/81306	2024-02-10 21:44:05 -05:00
Ikhlas Ajbar	76e3759d8d	[Hexagon] Order objects on the stack by their alignments (#81280 ) This patch sorts stack objects by their alignment value from the largest to the smallest. If two objects have the same alignment, then they are sorted by their size from the largest to the smallest. This minimizes padding and reduces run time stack size.	2024-02-10 14:42:50 -06:00
Yeting Kuo	59037c0975	[RISCV] Add Zicfiss support to the shadow call stack implementation. (#68075 ) This patch enables hardware shadow stack with `Zicfiss` and `mno-forced-sw-shadow-stack`. New feature forced-sw-shadow-stack disables hardware shadow stack even when `Zicfiss` is enabled.	2024-02-10 22:18:46 +08:00
Craig Topper	c08b90c50b	[RISCV] Lower the TransientStackAlignment to the ABI alignment for rv32e/rv64e. I don't think the transient alignment needs to be larger than the ABI alignment.	2024-02-09 21:48:11 -08:00
Mikhail Gudim	7192c22ee4	[GlobalISel][RISCV] Use constant pool for large integer constants. (#81101 ) We apply custom lowering to 64 bit constants where we use the same logic as in non-global isel: if materializing in registers is too expensive, we emit a load from constant pool. Later, during instruction selection, constant pool address is generated using `selectAddr`.	2024-02-10 00:42:33 -05:00
Philipp Tomsich	fbba818a78	[AArch64] Add the Ampere1B core (#81297 ) The Ampere1B is Ampere's third-generation core implementing a superscalar, out-of-order microarchitecture with nested virtualization, speculative side-channel mitigation and architectural support for defense against ROP/JOP style software attacks. Ampere1B is an ARMv8.7+ implementation, adding support for the FEAT WFxT, FEAT CSSC, FEAT PAN3 and FEAT AFP extensions. It also includes all features of the second-generation Ampere1A, such as the Memory Tagging Extension and SM3/SM4 cryptography instructions.	2024-02-09 15:22:09 -08:00
choikwa	0b77b19292	[AMDGPU] Add test to show s_cselect generation from uniform select (#79384 )	2024-02-09 14:10:04 -08:00
Joseph Huber	3c707310a3	[NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (#81277 ) Summary: Some recent support made usage of `__nvvm_reflect` more consistent. We should expose it as a builtin rather than forcing users to externally define the function.	2024-02-09 14:11:01 -06:00
Joseph Huber	bb180856ec	[NVPTX][Fix] Update minimum CPU for NVPTX intrinsics test Summary: This test requires at least sm_30 to run, but that is still below the minimum supported version of sm_52 currently. Just set this to sm_60 so the tests pass in the future.	2024-02-09 14:05:40 -06:00
Craig Topper	7ad7db0d99	[RISCV] Fix typo in ABI name in test. NFC ilp64->lp64.	2024-02-09 11:46:23 -08:00
Joseph Huber	07dc85ba0c	[NVVMReflect] Improve folding inside of the NVVMReflect pass (#81253 ) Summary: The previous patch did very simple folding that only worked for directly used branches. This patch improves this by traversing the use-def chain to simplify every constant subexpression until it reaches a terminator we can delete. The support should work for all expected cases now.	2024-02-09 13:39:03 -06:00
Philip Reames	5948d4de1d	[RISCV] Add test coverage for buildvectors with long vslidedown sequences In advance of an upcoming change.	2024-02-09 11:10:35 -08:00
Pranav Kant	2e4d2762b5	[X86][CodeGen] Emit float128 libcalls for math functions (#79611 ) Make LLVM emit libcalls to proper float128 variants for float128 types.	2024-02-09 10:55:56 -08:00
Simon Pilgrim	9ba265636f	[X86] ReplaceNodeResults - shrink i64 CTPOP to (shifted) CTPOP i32 if 32 or less active bits to avoid SSE2 codegen 32-bit targets perform i64 CTPOP as a v2i64 CTPOP - if we can perform this as a i32 CTPOP by shifting the source bits, then do so to avoid the gpr<->xmm transfer. This also triggers on non-SSE2 capable targets, as can be seen with the minor codegen diffs in ctpop_shifted_mask16	2024-02-09 12:24:09 +00:00
Simon Pilgrim	047f8321f1	[X86] ctpop-mask.ll - add 32-bit with SSE2 test coverage 32-bit targets will try to use SSE2 <2 x i64> CTPOP expansion for i64 CTPOP	2024-02-09 12:24:09 +00:00
Jan Patrick Lehr	f661057865	Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234 ) Reverts llvm/llvm-project#79586 This broke the AMDGPU OpenMP Offload buildbot. The typical error message was that the GPU attempted to read beyond the largest legal address. Error message: AMDGPU fatal error 1: Received error in queue 0x7f8363f22000: HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address.	2024-02-09 09:57:38 +01:00
Diana Picus	bc6955f18c	[AMDGPU] Don't fix the scavenge slot at offset 0 (#79136 ) At the moment, the emergency spill slot is a fixed object for entry functions and chain functions, and a regular stack object otherwise. This patch adopts the latter behaviour for entry/chain functions too. It seems this was always the intention [1] and it will also save us a bit of stack space in cases where the first stack object has a large alignment. [1] `34c8b835b1`	2024-02-09 09:20:25 +01:00
DianQK	ccb46e8365	Reapply "[RegisterCoalescer] Clear instructions not recorded in `ErasedInstrs` but erased (#79820 )" This reverts commit 8316bf34ac21117f35bc8e6fafa2b3e7da75e1d5.	2024-02-09 15:58:48 +08:00
DianQK	8316bf34ac	Revert "[RegisterCoalescer] Clear instructions not recorded in `ErasedInstrs` but erased (#79820 )" This reverts commit 95b14da678f4670283240ef4cf60f3a39bed97b4.	2024-02-09 15:54:54 +08:00
Quentin Dian	95b14da678	[RegisterCoalescer] Clear instructions not recorded in `ErasedInstrs` but erased (#79820 ) Fixes #79718. Fixes #71178. The same instructions may exist in an iteration. We cannot immediately delete instructions in `ErasedInstrs`.	2024-02-09 15:29:05 +08:00
Craig Topper	db88f30158	[RISCV] Add test for saving s10 with cm.push. NFC If cm.push saves s10, it must also save s11 due to an encoding limitation. We handle this in the code, but had no test for it.	2024-02-08 21:03:41 -08:00
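
Note on 76e3759d8d ([Hexagon] Order objects on the stack by their alignments): the ordering rule in that message (largest alignment first, ties broken by largest size first) can be illustrated with a small model. This is only a sketch, not the Hexagon backend code: `StackObject`, `frame_size` and `order_by_alignment` are hypothetical names, and the layout model (objects placed at increasing offsets, each padded up to its own alignment) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackObject:
    """Hypothetical stand-in for a stack object: size and alignment in bytes."""
    name: str
    size: int
    align: int  # assumed to be a power of two


def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def frame_size(objects) -> int:
    """Place objects in the given order, padding each one to its alignment."""
    offset = 0
    for obj in objects:
        offset = align_up(offset, obj.align) + obj.size
    return offset


def order_by_alignment(objects):
    """Largest alignment first; equal alignments ordered by largest size first."""
    return sorted(objects, key=lambda o: (-o.align, -o.size))


objs = [
    StackObject("a", size=1, align=1),
    StackObject("b", size=8, align=8),
    StackObject("c", size=2, align=2),
    StackObject("d", size=4, align=4),
]

unsorted_size = frame_size(objs)
sorted_size = frame_size(order_by_alignment(objs))
print(unsorted_size, sorted_size)
```

In this toy layout the unsorted order needs padding before `b` and `d`, while the sorted order needs none, which is the effect the commit message describes as minimizing padding.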

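Note on 1e36d92b70 ([LowerMemIntrinsics] Avoid udiv/urem when type size is a power of 2): the arithmetic identity behind that title is that, for unsigned values, dividing by 2^k equals a right shift by k and the remainder equals a mask with 2^k - 1. The sketch below shows only that identity. `split_copy_length` is a hypothetical helper, and reading the two results as a loop trip count and a residual byte count is an assumption about how the pass uses them, not something stated in the log.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (n must be positive)."""
    return n > 0 and (n & (n - 1)) == 0


def split_copy_length(length: int, op_size: int):
    """Return (length // op_size, length % op_size) for non-negative length.

    When op_size is a power of two, use a shift and a mask instead of
    division and remainder; otherwise fall back to the general form.
    """
    if is_power_of_two(op_size):
        shift = op_size.bit_length() - 1        # log2(op_size)
        return length >> shift, length & (op_size - 1)
    return length // op_size, length % op_size


print(split_copy_length(37, 16))
print(split_copy_length(37, 12))
```

Both branches return the same pair as `divmod(length, op_size)`; the power-of-two branch just reaches it with cheaper operations.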

52796 Commits