llvm-project

Author	SHA1	Message	Date
Michael Maitland	c2c4db8d8f	[RISCV][VLOPT] Add support for 11.11 div instructions (#112201 ) This adds support for these instructions and also tests getOperandInfo for these instructions as well.	2024-10-14 14:44:33 -04:00
Michael Maitland	82e89c0271	[RISCV][VLOPT] Add support for 11.9 min/max instructions (#112198 ) This adds support for these instructions and also tests getOperandInfo for these instructions as well.	2024-10-14 14:43:56 -04:00
Yuta Saito	d4efc3e097	[Coverage][WebAssembly] Add initial support for WebAssembly/WASI (#111332 ) Currently, WebAssembly/WASI target does not provide direct support for code coverage. This patch set fixes several issues to unlock the feature. The main changes are: 1. Port `compiler-rt/lib/profile` to WebAssembly/WASI. 2. Adjust profile metadata sections for Wasm object file format. - [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections instead of data segments. - [lld] Align the interval space of custom sections at link time. - [llvm-cov] Copy misaligned custom section data if the start address is not aligned. - [llvm-cov] Read `__llvm_prf_names` from data segments 3. [clang] Link with profile runtime libraries if requested See each commit message for more details and rationale. This is part of the effort to add code coverage support in Wasm target of Swift toolchain.	2024-10-15 02:41:43 +09:00
Michael Maitland	2f077ece2f	[RISCV][VLOPT] Enable VLOptimizer for vl-opt.ll test file	2024-10-14 10:36:06 -07:00
Michael Marjieh	b5600c6f85	[TargetLowering][SelectionDAG] Exploit nneg Flag in UINT_TO_FP (#108931 ) 1. Propagate the nneg flag in WidenVecRes 2. Use SINT_TO_FP in expandUINT_TO_FP when possible.	2024-10-14 20:55:48 +04:00
Shilei Tian	a74659445d	[AMDGPU] Skip terminators when forcing emit zero flag (#112116 ) When forcing emit zero, we need to skip terminators of a MBB; otherwise the terminator list of the MBB would be broken.	2024-10-14 11:46:18 -04:00
Michael Maitland	a31e834ba8	[RISCV][VLOPT] Update test cases to use riscv-enable-vl-optimizer and better formatting	2024-10-14 08:44:16 -07:00
Albert Huang	aa2c0f35a1	[ARM] [AArch32] Add support for Arm China STAR-MC1 CPU (#110085 ) STAR-MC1 is an Armv8m CPU. Technical specifications available at: https://www.armchina.com/download/Documents/Application-Notes/Technical-Reference-Manual?infoId=160	2024-10-14 15:48:12 +01:00
Simon Pilgrim	d81c2f16a3	[X86] canCreateUndefOrPoisonForTargetNode - X86ISD::VPERMV3 shuffles don't create undef/poison The operands might contain an undef/poison element, but the shuffle node itself will not create one by itself. Improves test case from #109272	2024-10-14 14:54:03 +01:00
Simon Pilgrim	fd8a4b0073	[X86] combineAndnp - fold ANDN(SEXT(SETCC()),X) -> SELECT(NOT(SETCC()),X,0) on AVX512 targets Reverse the generic foldVSelectToSignBitSplatMask fold on AVX512 targets where we can use the SETCC result directly in predicated moves/instructions. Fixes #109272	2024-10-14 14:54:03 +01:00
Akshat Oke	cd6c2b80be	[NewPM][CodeGen] Port StackColoring to NPM (#111812 )	2024-10-14 19:23:34 +05:30
c8ef	a3b0c31ebc	Revert "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112200 ) Reverts llvm/llvm-project#111774 This appears to be causing some tests to fail.	2024-10-14 21:43:49 +08:00
c8ef	11f625cb87	[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes. (#111774 ) Closes #108218. This patch adds icmp+select patterns for integer min/max matchers in SDPatternMatch, similar to those in IR PatternMatch.	2024-10-14 21:19:34 +08:00
Akshat Oke	8b20f1b924	[MIR] Fix tests for flags in register info (#112179 ) [MIR] Serialize virtual register flags #110228 introduces register flags which appear empty in .mir dumps. Future tests should use `-simplify-mir`.	2024-10-14 18:28:54 +05:30
Simon Pilgrim	f7788618dd	[X86] vselect-packss.ll - regenerate test checks with vpternlog comments	2024-10-14 12:11:30 +01:00
Simon Pilgrim	ccb9835edb	[X86] LowerShift - lower vXi8 shifts of a uniform constant using PSHUFB (#112175 ) If each 128-bit vXi8 lane is shifting the same constant value, we can pre-compute the 8 valid shift results and use PSHUFB to act as a LUT with the shift amount. Fixes #110317	2024-10-14 12:10:41 +01:00
Serge Pavlov	52e5683ddd	[GlobalISel][ARM] Legalization of G_CONSTANT using constant pool (#98308 ) ARM uses complex encoding of immediate values using small number of bits. As a result, some values cannot be represented as immediate operands, they need to be synthesized in a register. This change implements legalization of such constants with loading values from constant pool. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2024-10-14 16:40:21 +07:00
Akshat Oke	bec839d8ee	[AMDGPU] Serialize WWM_REG vreg flag (#110229 )	2024-10-14 14:37:21 +05:30
Akshat Oke	dbfca24b99	[MIR] Serialize virtual register flags (#110228 ) [MIR] Serialize virtual register flags This introduces target-specific vreg flag serialization. Flags are represented as `uint8_t` and the `TargetRegisterInfo` override provides methods `getVRegFlagValue` to deserialize and `getVRegFlagsOfReg` to serialize.	2024-10-14 14:19:53 +05:30
David Green	a07639f4bb	[AArch64] Increase inline memmove limit to 16 stored registers (#111848 ) The memcpy inline limit has been 16 for a long time, this patch makes the memmove inline limit the same, allowing small-constant sized memmoves to be emitted inline. The 16 is the number of registers stored, which equates to a limit of 256 bytes.	2024-10-14 08:57:32 +01:00
YunQiang Su	c01ddbe916	RISC-V: Select FCANONICALIZE (#112083 ) We can use `FMIN.x OP,OP` to canonicalize a float.	2024-10-14 14:12:36 +08:00
Jim Lin	dba54fb074	[RISCV] Add support for inline asm constraint vd (#111653 ) It constrains vector registers excluding v0. Refer to https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html RISC-V part. This patch also adds a testcase for constraints vr, vd and vm.	2024-10-14 10:47:59 +08:00
duk	464a7ee79e	[CodeGen] Generalize trap emission after SP check fail (#109744 ) Generalize and improve some target-specific code that emits traps after stack protector failure in SelectionDAG & GlobalIsel.	2024-10-12 20:01:22 -04:00
Miguel Saldivar	6fd229a655	[X86] Invert (and X, ~(and ~Y, Z)) back into (and X, (or Y, ~Z)) (#109215 ) When `andn` is available, we should avoid switching `s &= ~(z & ~y);` into `s &= ~z | y;` This patch turns this assembly from: ``` foo: not rcx and rsi, rdx andn rax, rsi, rdi or rcx, rdx and rax, rcx ret ``` into: ``` foo: and rsi, rdx andn rcx, rdx, rcx andn rax, rsi, rdi andn rax, rcx, rax ret ``` Fixes #108731	2024-10-12 11:28:39 +01:00
Matt Arsenault	cb2f161957	AArch64: Remove incorrect REQUIRES arm-registered-target from test (#111983 )	2024-10-12 13:26:17 +04:00
Craig Topper	902520256b	[RISCV] Make (sext_inreg X, i1) legal for XTHeadBb to cover the existing isel pattern. I just happened to notice the untested isel pattern.	2024-10-11 16:16:07 -07:00
Tex Riddell	82b40fd4fd	Fix scalar overload name constructed by ReplaceWithVeclib.cpp (#111095 ) ReplaceWithVeclib.cpp would construct overload name using all the arguments in the intrinsic, but overloads should only be constructed from arguments for which isVectorIntrinsicWithOverloadTypeAtArg returns true, including the return type first (index -1). Additionally, - skip when `Intrinsic::not_intrinsic`, otherwise `isVectorIntrinsicWithOverloadTypeAtArg` asserts for some IntrinsicCalls. Unblocks translation for pow and atan2 intrinsics. Fixes #111093	2024-10-11 14:38:35 -07:00
Craig Topper	8b46d40221	[RISCV] Re-generate orc-b-patterns.ll for store clustering. NFC The patch added orc-b-patterns.ll landed while store clustering was still in review.	2024-10-11 14:28:51 -07:00
Alex Bradbury	2967e5f800	[RISCV] Enable store clustering by default (#73796 ) Builds on #73789, enabling store clustering by default using the same heuristic.	2024-10-11 20:25:53 +01:00
Simon Pilgrim	03447ab98d	[X86] Add test coverage for #110317 Add tests showing potential to use PSHUFB for shifts of constant uniform values by using a pre-computed LUT of all legal shift amounts	2024-10-11 17:22:56 +01:00
Janek van Oirschot	50866e84d1	Revert "[AMDGPU] Avoid resource propagation for recursion through multiple functions" (#112013 ) Reverts llvm/llvm-project#111004	2024-10-11 17:10:28 +01:00
Juan Manuel Martinez Caamaño	2d5f3b0a61	[AMDGPU][SIPreEmitPeephole] mustRetainExeczBranch: use BranchProbability and TargetSchedmodel (#109818 ) Remove s_cbranch_execnz branches if the transformation is profitable according to `BranchProbability` and `TargetSchedmodel`.	2024-10-11 17:45:59 +02:00
Janek van Oirschot	67160c5ab5	[AMDGPU] Avoid resource propagation for recursion through multiple functions (#111004 ) Avoid constructing recursive MCExpr definitions when multiple functions cause a recursion. Fixes #110863	2024-10-11 16:42:50 +01:00
Michael Maitland	1c94388f38	[RISCV] Introduce VLOptimizer pass (#108640 ) The purpose of this optimization is to make the VL argument, for instructions that have a VL argument, as small as possible. This is implemented by visiting each instruction in reverse order and checking that if it has a VL argument, whether the VL can be reduced. By putting this pass before VSETVLI insertion, we see three kinds of changes to generated code: 1. Eliminate VSETVLI instructions 2. Reduce the VL toggle on VSETVLI instructions that also change vtype 3. Reduce the VL set by a VSETVLI instruction The list of supported instructions is currently whitelisted for safety. In the future, we could add more instructions to `isSupportedInstr` to support even more VL optimization. We originally wrote this pass because vector GEP instructions do not take a VL, which leads us to emit code that uses VL=VLMAX to implement GEP in the RISC-V backend. As a result, some of the vector instructions will write to lanes, specifically between the intended VL and VLMAX, that will never be read. As an alternative to this pass, we considered adding a vector predicated GEP instruction, but this would not fit well into the intrinsic type system since GEP has a variable number of arguments, each with arbitrary types. The second approach we considered was to put this pass after VSETVLI insertion, but we found that it was more difficult to recognize optimization opportunities, especially across basic block boundaries -- the data flow analysis was also a bit more expensive and complex. While this pass solves the GEP problem, we have expanded it to handle more cases of VL optimization, and there is opportunity for the analysis to be improved to enable even more optimization. We have a few follow up patches to post, but figured this would be a good start. --------- Co-authored-by: Craig Topper <craig.topper@sifive.com> Co-authored-by: Kito Cheng <kito.cheng@sifive.com>	2024-10-11 09:45:35 -04:00
Benjamin Maxwell	c3a10dc849	[AArch64] Disable consecutive store merging when Neon is unavailable (#111519 ) Lowering fixed-size BUILD_VECTORS without Neon may introduce stack spills, leading to more stores/reloads than if the stores were not merged. In some cases, it can also prevent using paired store instructions. In the future, we may want to relax when SVE is available, but currently, the SVE lowerings for BUILD_VECTOR are limited to a few specific cases.	2024-10-11 14:15:01 +01:00
Emilio Cota	9a696b68b7	Revert "[NVPTX] Prefer prmt.b32 over bfi.b32 (#110766 )" This reverts commit 3f9998af4f79e95fe8be615df9d6b898008044b9. It breaks downstream tests with egregious numerical differences. Unfortunately no upstream tests are broken, but the fact that a prior iteration of the commit (pre-optimization) does work with our downstream tests (coming from the Triton repo) supports the claim that the final version of the commit is incorrect. Reverting now so that the original author can evaluate.	2024-10-11 08:42:33 -04:00
Daniel Mokeev	26b832a9ec	[RISCV] Add DAG combine to turn (sub (shl X, 8-Y), (shr X, Y)) into orc.b (#111828 ) This patch generalizes the DAG combine for `(sub (shl X, 8), X) => (orc.b X)` into the more general form of `(sub (shl X, 8 - Y), (srl X, Y)) => (orc.b X)`. Alive2 generalized proof: https://alive2.llvm.org/ce/z/dFcf_n Related issue: https://github.com/llvm/llvm-project/issues/96595 Related PR: https://github.com/llvm/llvm-project/pull/96680	2024-10-11 20:41:47 +08:00
Matt Arsenault	14705a912f	CodeGen: Remove redundant REQUIRES registered-target from tests (#111982 ) These are already in target specific test directories.	2024-10-11 16:16:12 +04:00
Petar Avramovic	7b0d56be1d	AMDGPU/GlobalISel: Fix inst-selection of ballot (#109986 ) Both input and output of ballot are lane-masks: result is lane-mask with 'S32/S64 LLT and SGPR bank' input is lane-mask with 'S1 LLT and VCC reg bank'. Ballot copies bits from input lane-mask for all active lanes and puts 0 for inactive lanes. GlobalISel did not set 0 in result for inactive lanes for non-constant input.	2024-10-11 11:40:27 +02:00
Fabian Ritter	173c68239d	[AMDGPU] Enable unaligned scratch accesses (#110219 ) This allows us to emit wide generic and scratch memory accesses when we do not have alignment information. In cases where accesses happen to be properly aligned or where generic accesses do not go to scratch memory, this improves performance of the generated code by a factor of up to 16x and reduces code size, especially when lowering memcpy and memmove intrinsics. Also: Make the use of the FeatureUnalignedScratchAccess feature more consistent: FeatureUnalignedScratchAccess and EnableFlatScratch are now orthogonal, whereas, before, code assumed that the latter implies the former at some places. Part of SWDEV-455845.	2024-10-11 08:50:49 +02:00
Yingwei Zheng	ec3e0a5900	Revert "[CodeGenPrepare] Convert `ctpop(X) ==/!= 1` into `ctpop(X) u</u> 2/1`" (#111932 ) Reverts llvm/llvm-project#111284 to fix clang stage2 builds. Investigating... Failed buildbots: https://lab.llvm.org/buildbot/#/builders/76/builds/3576 https://lab.llvm.org/buildbot/#/builders/168/builds/4308 https://lab.llvm.org/buildbot/#/builders/127/builds/1087	2024-10-11 11:08:07 +08:00
Phoebe Wang	9882b35a3a	[X86][StrictFP] Combine fcmp + select to fmin/fmax for some predicates (#109512 ) X86 maxss/minss etc. instructions won't turn SNaN to QNaN, so we can combine fcmp + select to them for some predicates.	2024-10-11 10:18:40 +08:00
Yingwei Zheng	e3894f58e1	[CodeGenPrepare] Convert `ctpop(X) ==/!= 1` into `ctpop(X) u</u> 2/1` (#111284 ) Some targets have better codegen for `ctpop(X) u< 2` than `ctpop(X) == 1`. After https://github.com/llvm/llvm-project/pull/100899, we set the range of ctpop's return value to indicate the argument/result is non-zero. This patch converts `ctpop(X) ==/!= 1` into `ctpop(X) u</u> 2/1` in CGP to fix https://github.com/llvm/llvm-project/issues/95255.	2024-10-11 09:08:38 +08:00
YunQiang Su	72fb379225	AArch64: Select FCANONICALIZE (#104429 ) FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use them to canonicalize a floating point number. And FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding FMINIMUMNUM/FMAXIMUMNUM, so let's define them. --------- Co-authored-by: Your Name <you@example.com>	2024-10-11 08:45:14 +08:00
Finn Plummer	2647505027	[HLSL] Implement the `degrees` intrinsic (#111209 ) - add degrees builtin - link degrees api in hlsl_intrinsics.h - add degrees intrinsic to IntrinsicsDirectX.td - add degrees intrinsic to IntrinsicsSPIRV.td - add lowering from clang builtin to dx/spv intrinsics in CGBuiltin.cpp - add semantic checks to SemaHLSL.cpp - add expansion of directx intrinsic to llvm fmul for DirectX in DXILIntrinsicExpansion.cpp - add mapping to spir-v intrinsic in SPIRVInstructionSelector.cpp - add test coverage: - degrees.hlsl -> check hlsl lowering to dx/spv degrees intrinsics - degrees-errors.hlsl/half-float-only-errors -> check semantic warnings - hlsl-intrinsics/degrees.ll -> check lowering of spir-v degrees intrinsic to SPIR-V backend - DirectX/degrees.ll -> check expansion and scalarization of directx degrees intrinsic to fmul Resolves #99104	2024-10-10 16:34:26 -07:00
Justin Fargnoli	d832a1c744	[NVPTX] Only run LowerUnreachable when necessary (#109868 ) Before CUDA 12.3 `ptxas` did not recognize that the trap instruction terminates a basic block. Instead, it would assume that control flow continued to the next instruction. The next instruction could be in the block that's lexically below it. This would lead to phantom CFG edges being created within ptxas. [NVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the CFG.](`1ee4d880e8`) added the LowerUnreachable pass to NVPTX to work around this. Several other WAR patches followed. This bug in `ptxas` was fixed in CUDA 12.3 and is thus impossible to encounter when targeting PTX ISA v8.3+ This commit reverts the WARs for the `ptxas` bug when targeting PTX ISA v8.3+ CC @maleadt	2024-10-10 12:57:43 -07:00
Finn Plummer	d36cef0b17	[HLSL][DXIL] Implement WaveGetLaneIndex Intrinsic (#111576 ) - add additional lowering for directx backend in CGBuiltin.cpp - add directx intrinsic to IntrinsicsDirectX.td - add semantic check of arguments in SemaHLSL.cpp - add mapping to DXIL op in DXIL.td - add testing of semantics in WaveGetLaneIndex-errors.hlsl - add testing of dxil lowering in WaveGetLaneIndex.ll Resolves #70105	2024-10-10 11:44:44 -07:00
Jay Foad	62b3a4bc70	[AMDGPU] Improve codegen for s_barrier_init (#111866 )	2024-10-10 19:40:02 +01:00
Justin Fargnoli	3f9998af4f	[NVPTX] Prefer prmt.b32 over bfi.b32 (#110766 ) In [[NVPTX] Improve lowering of v4i8](`cbafb6f2f5`) @Artem-B added the ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B did this because: ([source](https://github.com/llvm/llvm-project/pull/67866#discussion_r1343066911)) > Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well do that in PTX, too. https://godbolt.org/z/Tb3zWbj9b However, the example that @Artem-B linked was targeting sm_52. On modern architectures, ptxas uses prmt.b32. [Example](https://godbolt.org/z/Ye4W1n84o). Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.	2024-10-10 10:24:02 -07:00
Ellis Hoag	cb5fbd2f60	[CodeLayout] Do not verify after assigning blocks (#111754 ) Rather than invariantly running `F->verify()` when asserts are enabled, run machine IR verification in LIT tests only. Swap `CHECK-PERF` and `CHECK-SIZE` in `code_placement_ext_tsp_large.ll`. Remove `={0,1,true,false}` from flags in tests.	2024-10-10 09:01:50 -07:00

55501 Commits