llvm-project

Author	SHA1	Message	Date
Matt Arsenault	88e23eb2cf	DAG: Fix legalization of vector addrspacecasts (#113964 )	2024-10-29 08:08:50 -05:00
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825 ) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
Ellis Hoag	6ab26eab4f	Check hasOptSize() in shouldOptimizeForSize() (#112626 )	2024-10-28 09:45:03 -07:00
Simon Pilgrim	056cf936a7	[DAG] Fold (and X, (bswap/bitreverse (not Y))) -> (and X, (not (bswap/bitreverse Y))) (#112547 ) On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT Fixes #112425	2024-10-28 11:52:44 +00:00
Dimitry Andric	4bce21480f	Ensure !NDEBUG with LLVM_ENABLE_ABI_BREAKING_CHECKS does not segfault (#113588 ) In SelectionDAG, `TargetTransformInfo::hasBranchDivergence()` can be called when both `NDEBUG` and `LLVM_ENABLE_ABI_BREAKING_CHECKS` are enabled. In that case, the class member `TTI` is still initialized to `nullptr`, causing a segfault. Fix this by ensuring that all the calls to `hasBranchDivergence` and `VerifyDAGDivergence` only occur when `NDEBUG` is disabled, and `LLVM_ENABLE_ABI_BREAKING_CHECKS` is enabled.	2024-10-24 19:30:38 +02:00
James Chesterman	11c818816d	[AArch64] Improve index selection for histograms (#111150 ) Removes unnecessary extends on the indices passed into histogram instructions. It also removes the instruction when the mask is zero.	2024-10-22 11:14:00 +01:00
Simon Pilgrim	f0b3b6d15b	[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710 ) (REAPPLIED) Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this. Update isConstantIntBuildVectorOrConstantInt to peek through bitcasts when attempting to find a constant; in particular, this improves canonicalization of constants to the RHS on commutable instructions. X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type. Minor cleanup that helps with #107423. Reapplied after regression fix ba1255def64a9c3c68d97ace051eec76f546eeb0	2024-10-20 14:23:21 +01:00
Simon Pilgrim	ba1255def6	[DAG] Use FoldConstantArithmetic to constant fold (and (ext (and V, c1)), c2) -> (and (ext V), (and c1, (ext c2))) Noticed while triaging the regression from #112710 noticed by @mstorsjo - don't rely on isConstantIntBuildVectorOrConstantInt+getNode to guarantee constant folding (if it fails to constant fold it will infinite loop), use FoldConstantArithmetic instead.	2024-10-20 13:05:23 +01:00
Martin Storsjö	b26df3e463	Revert "[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710 )" This reverts commit a630771b28f4b252e2754776b8f3ab416133951a. This caused compilation to hang for Windows/ARM, see https://github.com/llvm/llvm-project/pull/112710 for details.	2024-10-20 00:49:16 +03:00
Simon Pilgrim	93ec08d629	[DAG] Move SIGN_EXTEND_INREG constant folding inside FoldConstantArithmetic Update visitSIGN_EXTEND_INREG to call FoldConstantArithmetic instead of getNode.	2024-10-19 20:57:07 +01:00
Simon Pilgrim	e1330d96a0	[DAG] visitFMA/FDIV - avoid SDLoc duplication. NFC.	2024-10-18 11:57:41 +01:00
Simon Pilgrim	5c37316b54	[DAG] visitFMA/FMAD - use FoldConstantArithmetic to add missing vector constant folding support	2024-10-18 11:12:06 +01:00
Simon Pilgrim	a630771b28	[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710 ) Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this. Update isConstantIntBuildVectorOrConstantInt to peek through bitcasts when attempting to find a constant; in particular, this improves canonicalization of constants to the RHS on commutable instructions. X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type. Minor cleanup that helps with #107423	2024-10-18 10:52:55 +01:00
Simon Pilgrim	3ec1b1a4dd	[DAG] visitFP_EXTEND - use FoldConstantArithmetic to attempt to constant fold Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.	2024-10-18 10:10:44 +01:00
Simon Pilgrim	3a1df05ca9	[DAG] visitFP_ROUND - use FoldConstantArithmetic to attempt to constant fold Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.	2024-10-18 10:10:43 +01:00
Simon Pilgrim	7a43be1690	[DAG] visitXROUND - use FoldConstantArithmetic to attempt to constant fold Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.	2024-10-18 10:10:43 +01:00
Simon Pilgrim	c72992bf89	[DAG] visitABS - use FoldConstantArithmetic to attempt to constant fold Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us. Cleanup for #112682	2024-10-18 10:10:43 +01:00
Keith Packard	44b020a381	[PowerPC][ISelLowering] Support -mstack-protector-guard=tls (#110928 ) Add support for using a thread-local variable with a specified offset for holding the stack guard canary value. This supports both 32- and 64-bit PowerPC targets. This mirrors changes from #108942 but targeting PowerPC instead of RISCV. Because both of these PRs modify the same driver functions, this series is stacked on top of the RISC-V one. --------- Signed-off-by: Keith Packard <keithp@keithp.com>	2024-10-17 19:06:47 -07:00
Simon Pilgrim	256bbdb3f6	[DAG] visitFCEIL/FTRUNC/FFLOOR/FNEG - use FoldConstantArithmetic to attempt to constant fold Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us. Cleanup for #112682	2024-10-17 16:53:44 +01:00
Simon Pilgrim	cf046c8717	[DAG] visitSIGN_EXTEND_INREG - avoid SDLoc duplication. NFC.	2024-10-17 12:51:11 +01:00
Simon Pilgrim	5692a0c6f8	[DAG] visitFP_TO_SINT/FP_TO_UINT - use FoldConstantArithmetic to attempt to constant fold Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us. Cleanup for #112682	2024-10-17 12:50:09 +01:00
Simon Pilgrim	784c15a282	[DAG] visitSINT_TO_FP/UINT_TO_FP - use FoldConstantArithmetic to attempt to constant fold Don't rely on isConstantIntBuildVectorOrConstantInt followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us. Cleanup for #112682	2024-10-17 12:50:09 +01:00
Simon Pilgrim	8268bc48eb	[DAG] Avoid SDLoc duplication in FP<->INT combines. NFC.	2024-10-17 12:50:09 +01:00
Matt Arsenault	067e8b8dc5	DAG: Lower fcNormal is.fpclass to compare with inf (#100389 )	2024-10-17 15:49:13 +04:00
Nikita Popov	255a99c29f	[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309 ) This fixes all the places that hit the new assertion added in https://github.com/llvm/llvm-project/pull/106524 in tests. That is, cases where the value passed to the APInt constructor is not an N-bit signed/unsigned integer, where N is the bit width and signedness is determined by the isSigned flag. The fixes either set the correct value for isSigned, set the implicitTrunc flag, or perform more calculations inside APInt. Note that the assertion is currently still disabled by default, so this patch is mostly NFC.	2024-10-17 08:48:08 +02:00
Tex Riddell	875afa939d	[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 Based on example PR #96222 and fix PR #101268, with some differences due to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp). - Add llvm.experimental.constrained.atan2 - Intrinsics.td, ConstrainedOps.def, LangRef.rst - Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp - Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp, and LegalizeVectorTypes.cpp - Update isKnownNeverNaN in SelectionDAG.cpp - Update SelectionDAGDumper.cpp - Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp - TargetLoweringBase.cpp - Expand for vectors, promote f16 - X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC Part 4 for Implement the atan2 HLSL Function #70096.	2024-10-16 11:43:17 -07:00
Lewis Crawford	f5f00764ab	[DAGCombiner] Fix check for extending loads (#112182 ) Fix a check for extending loads in DAGCombiner, where if the result type has more bits than the loaded type it should count as an extending load. All backends apart from AArch64 ignore this ExtTy argument to shouldReduceLoadWidth, so this change currently only impacts AArch64.	2024-10-16 13:23:46 +01:00
Simon Pilgrim	25b702f263	[DAG] visitXOR - add missing comment for or/and constant setcc demorgan fold. NFC. Noticed while triaging #112347 which is using this fold - we described the or->and fold, but not the equivalent and->or which is also handled.	2024-10-16 11:15:36 +01:00
Simon Pilgrim	30deb76d46	[DAG] visitXOR - add missing comment for or/and constant demorgan fold. NFC. Noticed while triaging #112347 which is using this fold - we described the or->and fold, but not the equivalent and->or which is also handled.	2024-10-15 16:32:27 +01:00
c8ef	854ded9b24	Reapply "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112203 ) This patch adds icmp+select patterns for integer min/max matchers in SDPatternMatch, similar to those in IR PatternMatch. Reapply #111774. Closes #108218.	2024-10-15 21:07:06 +08:00
Paul Walker	d27394abf0	[LLVM][SelectionDAG] Ensure Constant[FP]SDnode only store references to scalar Constant{Int,FP}. (#111005 ) This fixes a failure path when the use-constant-##-for-###-splat IR options are enabled.	2024-10-15 10:56:41 +01:00
Michael Marjieh	b5600c6f85	[TargetLowering][SelectionDAG] Exploit nneg Flag in UINT_TO_FP (#108931 ) 1. Propagate the nneg flag in WidenVecRes 2. Use SINT_TO_FP in expandUINT_TO_FP when possible.	2024-10-14 20:55:48 +04:00
c8ef	a3b0c31ebc	Revert "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112200 ) Reverts llvm/llvm-project#111774 This appears to be causing some tests to fail.	2024-10-14 21:43:49 +08:00
c8ef	11f625cb87	[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes. (#111774 ) Closes #108218. This patch adds icmp+select patterns for integer min/max matchers in SDPatternMatch, similar to those in IR PatternMatch.	2024-10-14 21:19:34 +08:00
duk	464a7ee79e	[CodeGen] Generalize trap emission after SP check fail (#109744 ) Generalize and improve some target-specific code that emits traps after stack protector failure in SelectionDAG & GlobalIsel.	2024-10-12 20:01:22 -04:00
Kazu Hirata	a62768c427	[CodeGen] Simplify code with *Map::operator[] (NFC) (#112075 )	2024-10-11 23:01:21 -07:00
Oliver Stannard	1e49670b31	[DAGISel] Keep flags when converting FP load/store to integer (#111679 ) This DAG combine replaces a floating-point load/store pair which has no other uses with an integer one, but did not copy the memory operand flags to the new instructions, resulting in it dropping the volatile flag. This optimisation is still valid if one or both of the instructions is volatile, so we can copy over the whole MachineMemOperand to generate volatile integer loads and stores where needed.	2024-10-10 09:17:50 +01:00
YunQiang Su	8d35ab80fc	AArch64: Add FMINNUM_IEEE and FMAXNUM_IEEE support (#107855 ) FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use them to canonicalize a floating point number. And FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding FMINIMUMNUM/FMAXIMUMNUM, so let's define them. Update combine_andor_with_cmps.ll. Add fp-maximumnum-minimumnum.ll, with nnan testcases only. V1F64 is not supported yet. If we set v1f64 as legal, FMINNUM/FMAXNUM will have some problem: both of them use `if (isOperationLegalOrCustom(FMAXNUM_IEEE, VT))`. AArch64 depends on `expandFMINNUM_FMAXNUM` returning `SDValue()` for FMAXNUM and FMINNUM. We should fix this problem, while it will be in future patch.	2024-10-10 15:09:47 +08:00
YunQiang Su	d52c8408ff	SelectionDAG/expandFMINNUM_FMAXNUM: skips vector if SETCC/VSELECT is not legal (#109570 ) If SETCC or VSELECT is not legal for a vector type, we should not expand it; instead we can split the vectors, so that some simple scale instructions can be emitted instead of pairs of comparison+selection.	2024-10-10 08:39:25 +08:00
Matt Arsenault	ced15cd418	DAG: Preserve more flags when expanding gep (#110815 ) This allows selecting the addressing mode for stack instructions in cases where we need to prove the sign bit is zero.	2024-10-09 13:51:52 +04:00
Simon Pilgrim	1dcb6dc757	[DAG] foldVSelectToSignBitSplatMask - pull out repeated code and use getShiftAmountConstant helper. We're assuming shift amount type matches the result type - which is true for vectors, but I'm hoping to generalize this fold in the future.	2024-10-08 17:36:34 +01:00
Ralf Jung	29ec0716a8	Fix comment typo in ExpandFCOPYSIGN (#111489 ) I noticed this while following https://github.com/llvm/llvm-project/pull/111269. It makes little sense that FCOPYSIGN would look at the sign of `x`, right? Surely this must be `y`. Also fix the inconsistency where it's sometimes `x` and sometimes `X`.	2024-10-08 12:47:56 +04:00
Paul Walker	02dd6b1014	[LLVM][CodeGen] Add lowering for scalable vector bfloat operations. (#109803 ) Specifically: fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm, fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt & ftrunc	2024-10-07 13:01:59 +01:00
Luke Lau	c98e41f858	[LegalizeVectorTypes] Always widen fabs (#111298 ) fabs and fneg are similar nodes in that they can always be expanded to integer ops, but currently they diverge when widened. If the widened vector fabs is marked as expand (and the corresponding scalar type is too), LegalizeVectorTypes thinks that it may be turned into a libcall and so will unroll it to avoid the overhead on the undef elements. However unlike the other ops in that list like fsin, fround, flog etc., an fabs marked as expand will never be legalized into a libcall. Like fneg, it can always be expanded into an integer op. This moves it below unrollExpandedOp to bring it in line with fneg, which fixes an issue on RISC-V with f16 fabs being unexpectedly scalarized when there's no zfhmin.	2024-10-07 17:40:32 +08:00
Luke Lau	18d3a5d558	[LegalizeVectorTypes] When widening don't check for libcalls if promoted (#111297 ) When widening some FP ops, LegalizeVectorTypes will check to see if the widened op may be scalarized and then turned into a bunch of libcalls, and if so unroll early to avoid unnecessary libcalls of the padded undef elements. It checks if the widened op is legal or custom to see if it will be scalarized, but promoted ops will also avoid scalarization. This relaxes the check to account for this which fixes some illegal vector types on RISC-V from being scalarized when they could be widened.	2024-10-07 16:42:36 +08:00
Stephen Tozer	d826b0c90f	[LLVM] Add HasFakeUses to MachineFunction (#110097 ) Following the addition of the llvm.fake.use intrinsic and corresponding MIR instruction, two further changes are planned: to add an -fextend-lifetimes flag to Clang that emits these intrinsics, and to have -Og enable this flag by default. Currently, some logic for handling fake uses is gated by the optdebug attribute, which is intended to be switched on by -fextend-lifetimes (and by extension -Og later on). However, the decision was made that a general optdebug attribute should be incompatible with other opt_ attributes (e.g. optsize, optnone), since they all express different intents for how to optimize the program. We would still like to allow -fextend-lifetimes with optsize however (i.e. -Os -fextend-lifetimes should be legal), since it may be a useful configuration and there is no technical reason to not allow it. This patch resolves this by tracking MachineFunctions that have fake uses, allowing us to run passes that interact with them and skip passes that clash with them.	2024-10-04 13:13:30 +01:00
Luke Lau	487686b82e	[SDAG][RISCV] Don't promote VP_REDUCE_{FADD,FMUL} (#111000 ) In https://reviews.llvm.org/D153848, promotion was added for a variety of f16 ops with zvfhmin, including VP reductions. However I don't believe it's correct to promote f16 fadd or fmul reductions to f32 since we need to round the intermediate results. Today if we lower @llvm.vp.reduce.fadd.nxv1f16 on RISC-V, we'll get two different results depending on whether we compiled with +zvfh or +zvfhmin, for example with a 3 element reduction: ; v9 = [0.1563, 5.97e-8, 0.00006104] ; zvfh vsetivli x0, 3, e16, m1, ta, ma vmv.v.i v8, 0 vfredosum.vs v8, v9, v8 vfmv.f.s fa0, v8 ; fa0 = 0.1563 ; zvfhmin vsetivli x0, 3, e16, m1, ta, ma vfwcvt.f.f.v v10, v9 vsetivli x0, 3, e32, m1, ta, ma vmv.v.i v8, 0 vfredosum.vs v8, v10, v8 vfmv.f.s fa0, v8 fcvt.h.s fa0, fa0 ; fa0 = 0.1564 This same thing happens with reassociative reductions e.g. vfredusum.vs, and this also applies for bf16. I couldn't find anything in the LangRef for reductions that suggest the excess precision is allowed. There may be something we can do in Clang with -fexcess-precision=fast, but I haven't looked into this yet. I presume the same precision issue occurs with fmul, but not with fmin/fmax/fminimum/fmaximum. I can't think of another way of lowering these other than scalarizing, and we can't scalarize scalable vectors, so this just removes the promotion and adjusts the cost model to return an invalid cost. (It looks like we also don't currently cost fmul reductions, so presumably they also have an invalid cost?) I think this should be enough to stop the loop vectorizer or SLP from emitting these intrinsics.	2024-10-04 00:17:45 +08:00
Mehdi Amini	6c7a3f80e7	Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef (#110938 ) This macro is always defined as either 0 or 1. The correct pattern is to use #if. Re-apply #110185 with more fixes for debug builds with the ABI breaking checks disabled.	2024-10-03 01:24:14 +02:00
Christopher Di Bella	45ad1ac4a3	Revert "Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if inst… (#110923 ) …ead of #ifdef (#110883)" This reverts commit 1905cdbf4ef15565504036c52725cb0622ee64ef, which causes lots of failures where LLVM doesn't have the right header guards. The errors can be seen on [BuildKite](https://buildkite.com/llvm-project/upstream-bazel/builds/112362#01924eae-231c-4d06-ba87-2c538cf40e04), where the source uses `#ifndef NDEBUG`, but the content in question is defined when `LLVM_ENABLE_ABI_BREAKING_CHECKS == 1`. For example, `llvm/include/llvm/Support/GenericDomTreeConstruction.h` has the following: ```cpp // Helper struct used during edge insertions. struct InsertionInfo { // ... #ifdef LLVM_ENABLE_ABI_BREAKING_CHECKS SmallVector<TreeNodePtr, 8> VisitedUnaffected; #endif }; // ... InsertionInfo II; // ... #ifndef NDEBUG II.VisitedUnaffected.push_back(SuccTN); #endif ```	2024-10-02 13:54:09 -07:00
Mehdi Amini	1905cdbf4e	Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef (#110883 ) This macro is always defined as either 0 or 1. The correct pattern is to use #if. Reapply https://github.com/llvm/llvm-project/pull/110185 with fixes.	2024-10-02 18:43:16 +02:00
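
The `#if` versus `#ifdef` distinction behind the three LLVM_ENABLE_ABI_BREAKING_CHECKS commits above can be illustrated with a small sketch. The macro `DEMO_ENABLE_CHECKS` and the helper names are hypothetical stand-ins, not LLVM code; the only assumption carried over from the commit messages is that the real macro is always defined as 0 or 1.

```cpp
#include <cassert>

// Hypothetical stand-in for a config macro that is always defined as 0 or 1.
#define DEMO_ENABLE_CHECKS 0

// #ifdef only asks whether the macro is defined, so this branch is taken
// even though the value is 0.
#ifdef DEMO_ENABLE_CHECKS
constexpr bool kSeenByIfdef = true;
#else
constexpr bool kSeenByIfdef = false;
#endif

// #if evaluates the macro's value, so 0 disables the branch as intended.
#if DEMO_ENABLE_CHECKS
constexpr bool kSeenByIf = true;
#else
constexpr bool kSeenByIf = false;
#endif

// Small accessors so the two results can be inspected at run time.
constexpr bool seenByIfdef() { return kSeenByIfdef; }
constexpr bool seenByIf() { return kSeenByIf; }

// Run-time check of the mismatch: the two guards disagree when the macro is 0.
inline void runDemoChecks() {
  assert(seenByIfdef());
  assert(!seenByIf());
}
```

With the macro set to 0, the `#ifdef` guard still enables its branch while the `#if` guard does not. A member declared under one kind of guard and used under another can therefore exist in one configuration and be missing in another, which is the kind of mismatch the revert message describes.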

13864 Commits