llvm-project

Author	SHA1	Message	Date
Hubert Tong	0812cde3bf	NFC: Make isPPC64 const and use member initializer	2024-11-01 20:41:25 -04:00
Alex MacLean	8ff60c4d47	[NVPTX] Add support for nvvm.flo.[us] intrinsics (#114489 ) Add support for '`llvm.nvvm.flo.[su].*`' intrinsics which correspond to a PTX `bfind` instruction. See [PTX ISA 9.7.1.16. Integer Arithmetic Instructions: bfind] (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind) The '`llvm.nvvm.flo.u`' family of intrinsics identifies the bit position of the leading one, returning its offset from either the most or the least significant bit. The '`llvm.nvvm.flo.s`' family of intrinsics identifies the bit position of the leading non-sign bit, returning its offset from either the most or the least significant bit.	2024-11-01 16:35:43 -07:00
Alex MacLean	57183b6fe1	[NVPTX] Add support for stacksave, stackrestore intrinsics (#114484 ) Add support for the '`@llvm.stacksave`' and '`@llvm.stackrestore`' intrinsics to NVPTX. These are implemented with the `stacksave` and `stackrestore` PTX instructions respectively. See [PTX ISA 9.7.17. Stack Manipulation Instructions] (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#stack-manipulation-instructions).	2024-11-01 13:26:28 -07:00
Simon Pilgrim	8634e358eb	[AArch64][ARM] Avoid some APFloat copies in tablegen patterns. NFC. (#114416 ) Either the N->getValueAPF() was being unused or we were failing to make use of it returning a const APFloat&	2024-11-01 18:14:59 +00:00
Matt Arsenault	8e61aaa021	AMDGPU: Fix illegal commute with frame index (#114497 ) In ca409892c5396fa3fbb8ea4dbf53d0e952f36d09, frame indexes started being treated more like registers, rather than immediates. Update the commute logic to avoid failing the verifier by moving illegal SGPR operands in place of a frame index.	2024-11-01 10:02:29 -07:00
c8ef	b57b3f6425	[NFC] Simple typo correction. (#114548 )	2024-11-02 00:40:57 +08:00
Shilei Tian	10a1ea9b53	[NFC][AMDGPU] Remove the empty FPM as well as the adaptor to MPM (#114558 )	2024-11-01 12:21:26 -04:00
Krzysztof Drewniak	ea33af63de	Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" v3 (#114443 ) This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b. It seems I missed a spot when trying to ensure the code in the instruction selection tests were actually legalized MIR.	2024-11-01 11:13:29 -05:00
David Green	95fb7f8cb8	[AArch64] Move FeatureUseFixedOverScalableIfEqualCost with other tuning features. NFC This was in with the armv9 architecture extensions.	2024-11-01 15:47:47 +00:00
peterbell10	b74e588e1f	[NVPTX] Don't use stack memory when bitcasting to/from v2i8 (#113928 ) `v2i8` is an unsupported type, so we hit the default legalization rules, which perform the bitcast in stack memory; this is very inefficient on GPU. This adds a custom lowering where we pack `v2i8` into `i16` and from there use another bitcast node to reach the final desired type. It also adds the inverse, unpacking `i16` into `v2i8`.	2024-11-01 08:02:43 -07:00
Wang Qiang	b77e40265c	[llvm][NFC] Fix typos: replace “avaliable” with “available” across various files (#114524 ) This pull request corrects multiple occurrences of the typo "avaliable" to "available" across the LLVM and Clang codebase. These changes improve the clarity and accuracy of comments and documentation. Specific modifications are in the following files: 1. clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp: Updated comments in readability checks for cognitive complexity. 2. llvm/include/llvm/ExecutionEngine/Orc/ExecutionUtils.h: Corrected documentation for JITDylib responsibilities. 3. llvm/include/llvm/Target/TargetMacroFusion.td: Fixed descriptions for FusionPredicate variables. 4. llvm/lib/CodeGen/SafeStack.cpp: Improved comments on DominatorTree availability. 5. llvm/lib/Target/RISCV/RISCVSchedSiFive7.td: Enhanced resource usage descriptions for vector units. 6. llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp: Updated invariant description in shift-detect idiom logic. 7. llvm/test/MC/ARM/mve-fp-registers.s: Amended ARM MVE register availability notes. 8. mlir/lib/Bytecode/Reader/BytecodeReader.cpp: Adjusted forward reference descriptions for bytecode reader operations. These changes have no impact on code functionality, focusing solely on documentation clarity. Co-authored-by: wangqiang <wangqiang1@kylinos.cn>	2024-11-01 13:25:04 +00:00
Jay Foad	550501f21c	[AMDGPU] Simplify GFX12 VBUFFER definitions. NFC. (#114403 ) For GFX12 hasTFE is always true because it does not have the buffer load to LDS instructions.	2024-11-01 10:06:45 +00:00
Oliver Stannard	33411d5207	[ARM] Fix CMSE S->NS calls when CONTROL_S.SFPA==0 (CVE-2024-7883) (#114433 ) When doing a call from CMSE secure state to non-secure state for v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and restore the FP registers around the call. These instructions both check the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents of the FP registers are not secret) they execute as no-ops. This causes a problem when CONTROL_S.SFPA==0 before the call, which happens if there are no floating-point instructions executed between entry to secure state and the call. If this is the case, then the VLSTM instruction will do nothing, leaving the save area in the stack uninitialised. If the called function returns a value in floating-point registers, the call sequence includes an instruction to copy the return value from a floating-point register to a GPR, which must be before the VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM will fully execute, and load the uninitialised stack memory into the FP registers. This causes two problems: * The FP register file is clobbered, including all of the callee-saved registers, which might contain live values. * The stack region might contain secret values, which will be leaked to non-secure state through the floating-point registers if/when we return to non-secure state. The fix is to insert a `vmov s0, s0` instruction before the VLSTM instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and VLSTM instruction. CVE: https://www.cve.org/cverecord?id=CVE-2024-7883 Security bulletin: https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability	2024-11-01 09:36:13 +00:00
Daniil Kovalev	da083e358e	[PAC][CodeGen][ELF][AArch64] Support signed GOT (#113811 ) This re-applies #96164 after revert in #102434. Support the following relocations and assembly operators: - `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`) - `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`) - `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`) `LOADgotAUTH` pseudo-instruction is introduced which is later expanded to actual instruction sequence like the following. ``` adrp x16, :got_auth:sym add x16, x16, :got_auth_lo12:sym ldr x0, [x16] autia x0, x16 ``` If a resign is requested, like below, `LOADgotPAC` pseudo is used, and GOT load is lowered similarly to `LOADgotAUTH`. ``` @var = global i32 0 define ptr @resign_globalvar() { ret ptr ptrauth (ptr @var, i32 3, i64 43) } ``` If FPAC bit is not set and auth instruction is emitted, a check+trap sequence similar to one used for `AUT` pseudo is emitted to ensure auth success. Both SelectionDAG and GlobalISel are supported. For FastISel, we fall back to SelectionDAG. Tests starting with 'ptrauth-' have corresponding variants w/o this prefix. See also specification https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got	2024-11-01 12:21:10 +03:00
Phoebe Wang	c72a751dab	[X86][AMX] Support AMX-TRANSPOSE (#113532 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-01 16:45:03 +08:00
Thorsten Schütt	8e3772744d	[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114470 ) There are patterns for: * {nxv2s32, s32, s64}, * {nxv4s16, s16, s64}, * {nxv2s16, s16, s64}	2024-11-01 06:10:26 +01:00
Hervé Poussineau	6fa1647a47	[MC][Mips] Rename MipsMCAsmInfo to MipsELFMCAsmInfo (#112592 ) Also change MipsAsmPrinter::emitStartOfAsmFile to emit ELF-related sections only when using ELF output file format.	2024-11-01 08:42:34 +08:00
Justin Fargnoli	a1987beac5	Reland "[NVPTX] Prefer prmt.b32 over bfi.b32" (#114326 ) Fix [failure](https://github.com/llvm/llvm-project/pull/110766#discussion_r1796832635) identified by @akuegel. --- In [[NVPTX] Improve lowering of v4i8](`cbafb6f2f5`) @Artem-B added the ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B did this because: (https://github.com/llvm/llvm-project/pull/67866#discussion_r1343066911) Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well do that in PTX, too. https://godbolt.org/z/Tb3zWbj9b However, the example that @Artem-B linked was targeting sm_52. On modern architectures, ptxas uses prmt.b32. [Example](https://godbolt.org/z/Ye4W1n84o). Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.	2024-10-31 16:09:20 -07:00
Thorsten Schütt	aa70d846b0	[GlobalISel][AArch64] Legalize G_SPLAT_VECTOR (#114006 ) {nxv8s16, s16} fails to select. {nxv16s8, s8} no patterns available.	2024-10-31 22:20:08 +01:00
zhijian lin	674574d25c	Promote 32bit pseudo instr that infer extsw removal to 64bit in PPCMIPeephole (#85451 ) Fixes: https://github.com/llvm/llvm-project/issues/71030 Bug only happens in 64bit involving spills. Since we don't know when the spill will happen, all instructions in the chain used to deduce sign extension for eliminating 'extsw' will need to be promoted to 64-bit pseudo instructions. The following instructions will be promoted in PPCMIPeephole: EXTSH, LHA, ISEL to EXTSH8, LHA8, ISEL8	2024-10-31 15:49:36 -04:00
Min-Yih Hsu	b9d7117ebd	[RISCV] Assign separate PseudoVSHA2MS_VV opcodes for each SEW (#114317 ) The vsha2ms.vv from Zvknh[ab] currently supports both SEW=32 and SEW=64. It might have different performance characteristics depending on the SEW on some processors. This patch splits these two different SEWs into their own VPseudo opcodes and scheduling classes. This is effectively an NFC change.	2024-10-31 10:21:14 -07:00
WÁNG Xuěruì	f246b5f547	[LoongArch] Support bswap for LSX/LASX VTs (#114171 ) On top of #114170	2024-11-01 00:38:13 +08:00
Artem Belevich	8129b6b53b	[NVPTX, InstCombine] instcombine known pointer AS checks. (#114325 ) The change improves the code in general and, as a side effect, avoids crashing on impossible address space casts guarded by `__isGlobal/__isShared`, which partially fixes https://github.com/llvm/llvm-project/issues/112760 It's still possible to trigger the issue by using explicit AS casts w/o AS checks, but LLVM should no longer crash on valid code. This is #112964 + a small fix for the crash on unintended argument access which was the root cause for reverting the earlier version of the patch.	2024-10-31 09:24:51 -07:00
hev	f7a96dc664	[LoongArch] Ensure pcaddu18i and jirl adjacency in tail calls for correct relocation (#113932 ) Prior to this patch, both `pcaddu18i` and `jirl` were marked as scheduling boundaries to prevent instruction reordering that would disrupt their adjacency. However, in certain cases, epilogues were still being inserted between these two instructions, breaking the required proximity. This patch ensures that `pcaddu18i` and `jirl` remain adjacent even in the presence of epilogues, maintaining correct relocation behavior for tail calls on LoongArch.	2024-11-01 00:08:15 +08:00
Shilei Tian	9234ae1bbe	[NFC] clang-format -i llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp	2024-10-31 11:44:15 -04:00
Luke Lau	9c7188871c	[RISCV] Cost ordered bf16/f16 w/ zvfhmin reductions as invalid (#114250 ) In #111000 we removed promotion of fadd/fmul reductions for bf16 and f16 without zvfh, and marked the cost as invalid to prevent the vectorizers from emitting them. However it inadvertently didn't change the cost for ordered reductions, so this moves the check earlier to fix this. This also uses BasicTTIImpl instead which now assigns a valid but expensive cost for fixed-length vectors, which reflects how codegen will actually scalarize them.	2024-10-31 23:36:09 +08:00
Zaara Syeda	ccddd13602	Enable aggressive constant merge in GlobalMerge for AIX (#113956 ) Enable merging all constants without looking at use in GlobalMerge by default to replace PPCMergeStringPool pass on AIX.	2024-10-31 11:22:48 -04:00
Matt Arsenault	1d0370872f	AMDGPU: Expand flat atomics that may access private memory (#109407 ) If the runtime flat address resolves to a scratch address, 64-bit atomics do not work correctly. Insert a runtime address space check (which is quite likely to be uniform) and select between the non-atomic and real atomic cases. Consider noalias.addrspace metadata and avoid this expansion when possible (we also need to consider it to avoid infinitely expanding after adding the predication code).	2024-10-31 08:08:48 -07:00
Matt Arsenault	12409024d3	AMDGPU/GlobalISel: Handle atomic sextload and zextload (#111721 ) Atomic loads are handled differently from the DAG, and have separate opcodes and explicit control over the extensions, like ordinary loads. Add new patterns for these. There's room for cleanup and improvement. d16 cases aren't handled. Fixes #111645	2024-10-31 07:44:52 -07:00
SpencerAbson	c485ee1968	[AArch64] Add assembly/disassembly for zeroing SVE REV{B,H,W,D} and RBIT (#114110 ) This patch adds assembly/disassembly for the following SVE2.2 instructions - RBIT (zeroing) - REVB (zeroing) - REVH (zeroing) - REVW (zeroing) - REVD (zeroing) - In accordance with: https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions Co-authored-by: Marian Lukac marian.lukac@arm.com	2024-10-31 14:30:11 +00:00
Momchil Velikov	b185e925ad	[AArch64] Add assembly/disassembly for {S,U,SU,US}TMOPA instructions (#113946 ) The new instructions are described in https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions Co-Authored-By: Marian Lukac <Marian.Lukac@arm.com>	2024-10-31 12:16:17 +00:00
Pengcheng Wang	18f0f70934	[RISCV] Support llvm.masked.expandload intrinsic (#101954 ) We can use `viota`+`vrgather` to synthesize `vdecompress` and lower expanding load to `vcpop`+`load`+`vdecompress`. And if `%mask` is all ones, we can lower expanding load to a normal unmasked load. Fixes #101914.	2024-10-31 20:03:58 +08:00
Momchil Velikov	95c5042db8	[AArch64] Add assembly/disassembly for {S,SU,US,U}MOP4{A,S} instructions (#113349 ) The new instructions are described in https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions Co-Authored-By: Marian Lukac <Marian.Lukac@arm.com>	2024-10-31 11:12:14 +00:00
Sven van Haastregt	22081dc40b	[SPIR-V] Add missing ScalarOpts library (#114384 ) Fixes an "undefined reference to `llvm::createRegToMemWrapperPass()'" linker error introduced by cba70550ccf5 ("[SPIR-V] Fix BB ordering & register lifetime (#111026)", 2024-10-30).	2024-10-31 11:50:01 +01:00
SpencerAbson	0800351da4	[AArch64][SVE] Use INS when moving elements from bottom 128b of SVE type (#114034 ) Moving elements from a scalable vector to a fixed-length vector should use [INS (vector, element)](https://developer.arm.com/documentation/100069/0606/SIMD-Vector-Instructions/INS--vector--element-) when we know that the extracted element is in the bottom 128-bits of the scalable vector. This avoids inserting unnecessary UMOV/FMOV instructions.	2024-10-31 10:36:00 +00:00
dnsampaio	28d0718033	[DAGCombiner] Add combine avg from shifts (#113909 ) This teaches dagcombiner to fold: `(asr (add nsw x, y), 1) -> (avgfloors x, y)` `(lsr (add nuw x, y), 1) -> (avgflooru x, y)` as well as to combine them into a ceil variant: `(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` `(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)` iff valid for the target. Removes some of the ARM MVE patterns that are now dead code. It adds the avg opcodes to `IsQRMVEInstruction` so as to preserve the immediate splatting as before.	2024-10-31 10:57:27 +01:00
Stanislav Mekhanoshin	7cd29741fa	[AMDGPU] Extend mov_dpp8 intrinsic lowering for generic types (#114296 ) The int_amdgcn_mov_dpp8 is overloaded, but we can only select i32. To allow a corresponding builtin to be overloaded the same way as int_amdgcn_mov_dpp we need it to be able to split unsupported values.	2024-10-31 01:15:25 -07:00
Ami-zhang	1897bf61f0	[LoongArch] Enable FeatureExtLSX for generic-la64 processor (#113421 ) This commit makes the `generic` target support FP and LSX, as discussed in #110211. It thereby allows 128-bit vectors to be enabled by default in the loongarch64 backend.	2024-10-31 15:58:15 +08:00
Elvis Wang	a8575c1459	[RISCV] Sink ordered reduction check into FAdd. NFC (#114180 )	2024-10-31 13:35:37 +08:00
Luke Lau	6da5968f5e	[RISCV] Lower scalar_to_vector for supported FP types (#114340 ) In https://reviews.llvm.org/D147608 we added custom lowering for integers, but inadvertently also marked it as custom for scalable FP vectors despite not handling it. This adds handling for floats and marks it as custom lowered for fixed-length FP vectors too. Note that this doesn't handle bf16 or f16 vectors that would need promotion, but these scalar_to_vector nodes seem to be emitted when expanding them.	2024-10-31 13:15:17 +08:00
Craig Topper	50896e7ef5	[ARM] Use getSignedConstant. NFC	2024-10-30 21:43:16 -07:00
Thorsten Schütt	6effab990c	Revert "[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE" (#114353 ) Reverts llvm/llvm-project#114310	2024-10-31 05:41:16 +01:00
Thorsten Schütt	6bf214b7c6	[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114310 ) There are patterns for: * {nxv2s32, s32, s64}, * {nxv4s16, s16, s64}, * {nxv2s16, s16, s64}	2024-10-31 04:56:41 +01:00
Adam Yang	948249d804	Revert "[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic" (#114322 ) Reverts llvm/llvm-project#111884	2024-10-30 20:44:54 -07:00
Craig Topper	55dbacbf07	[RISCV] Remove RISCVISD::VFCVT_X(U)_F_VL by using VFCVT_RM_X(U)_F_VL with DYN rounding mode. NFC (#114306 )	2024-10-30 19:16:23 -07:00
Feng Zou	8127162427	[X86][AMX] Support AMX-FP8 (#113850 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-31 10:14:25 +08:00
Yingwei Zheng	cf9d1c1486	[SDAG] Simplify `SDNodeFlags` with bitwise logic (#114061 ) This patch allows using enumeration values directly and simplifies the implementation with bitwise logic. It addresses the comment in https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.	2024-10-31 08:10:07 +08:00
Craig Topper	51628faa01	[RISCV] Sink hasPostISelHook = 1 for vector pseudos into the subclasses that set HasRoundModeOp. NFC (#114294 )	2024-10-30 16:33:18 -07:00
Craig Topper	847f4ef21b	[X86] Use getAllOnesConstant instead of getConstant(-1). NFC (#114299 )	2024-10-30 16:22:23 -07:00
Artem Belevich	04e876e6c6	Revert "[NVPTX] instcombine known pointer AS checks." (#114319 ) Reverts llvm/llvm-project#112964 Crashes MLIR: https://lab.llvm.org/buildbot/#/builders/138/builds/5665	2024-10-30 15:34:08 -07:00
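
The shift-to-average folds described in 28d0718033 (`(asr (add nsw x, y), 1) -> (avgfloors x, y)` and the `nuw`/`lsr` counterpart) depend on the add not wrapping. The following is a minimal sketch, not LLVM code, that models 8-bit integers in Python and checks the identity exhaustively. All helper names (`wrap_signed`, `asr1_of_wrapped_add`, `lsr1_of_wrapped_add`, `avgfloor`, `count_mismatches`) and the choice of 8 bits are assumptions made for illustration; `avgfloor` is modelled as the exact floor of the infinitely precise average.

```python
BITS = 8                                  # illustrative width, chosen so the check is exhaustive
MASK = (1 << BITS) - 1
SMIN, SMAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def wrap_signed(v):
    """Reduce v modulo 2**BITS and reinterpret it as a signed BITS-bit value."""
    v &= MASK
    return v - (1 << BITS) if v > SMAX else v


def asr1_of_wrapped_add(x, y):
    """Model of (asr (add x, y), 1) on BITS-bit signed values with a wrapping add."""
    return wrap_signed(x + y) >> 1        # Python's >> on a negative int is an arithmetic shift


def lsr1_of_wrapped_add(x, y):
    """Model of (lsr (add x, y), 1) on BITS-bit unsigned values with a wrapping add."""
    return ((x + y) & MASK) >> 1


def avgfloor(x, y):
    """Exact floor((x + y) / 2), computed without any overflow."""
    return (x + y) >> 1


def count_mismatches(signed, require_no_wrap):
    """Count input pairs where the shift form disagrees with the exact average.

    With require_no_wrap=True, pairs whose sum leaves the representable range
    are skipped, which mirrors the nsw/nuw precondition on the add.
    """
    lo, hi = (SMIN, SMAX) if signed else (0, MASK)
    shifted = asr1_of_wrapped_add if signed else lsr1_of_wrapped_add
    bad = 0
    for x in range(lo, hi + 1):
        for y in range(lo, hi + 1):
            wraps = not (lo <= x + y <= hi)
            if require_no_wrap and wraps:
                continue
            if shifted(x, y) != avgfloor(x, y):
                bad += 1
    return bad


if __name__ == "__main__":
    print("signed, no wrap:  ", count_mismatches(signed=True, require_no_wrap=True))
    print("unsigned, no wrap:", count_mismatches(signed=False, require_no_wrap=True))
    print("signed, any input:", count_mismatches(signed=True, require_no_wrap=False))
```

Under this model the shift form agrees with the exact average for every pair whose sum does not wrap, and disagrees once wrapping is allowed (for example, `x = y = 100` as signed 8-bit values), which is consistent with the folds being stated only for `add nsw` and `add nuw`.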


80797 Commits