llvm-project

Author	SHA1	Message	Date
Mirko	4d3c427f33	[CodeGen] Use first EHLabel as a stop gate for live range shrinking (#114195) This fixes issue #114194. The issue happens during the `LiveRangeShrink` pass, which runs early, before phi elimination. LandingPads, which are lowered to EHLabels, need to be the first non-phi instruction in an EHPad. If a phi node is in front of the EHLabel and a use is after the EHLabel, we hoist the use in front of the label. As a result, a portion of the landing pad goes missing because it was hoisted in front of the label.	2024-11-01 19:13:18 -07:00
Phoebe Wang	c72a751dab	[X86][AMX] Support AMX-TRANSPOSE (#113532) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-01 16:45:03 +08:00
Simon Pilgrim	9fb4bc5bf4	[DAG] SimplifyMultipleUseDemandedBits - ignore SRL node if we're just demanding known sign bits (#114389) Check to see if we are only demanding (shifted) signbits from a SRL node that are also signbits in the source node. We can't demand any upper zero bits that the SRL will shift in (up to max shift amount), and the lower demanded bits bound must already be all signbits.	2024-10-31 16:40:29 +00:00
Feng Zou	8127162427	[X86][AMX] Support AMX-FP8 (#113850) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-31 10:14:25 +08:00
Simon Pilgrim	f7b5f0c805	[DAG] Fold (and X, (rot (not Y), Z)) -> (and X, (not (rot Y, Z))) On ANDNOT capable targets we can always do this profitably; without ANDNOT we only attempt this if we don't introduce an additional NOT. Followup to #112547.	2024-10-30 10:46:12 +00:00
Mahesh-Attarde	e61a7dc256	[X86][AVX512] Use comx for compare (#113567) We added the AVX10.2 COMEF ISA in LLVM, but it does not optimize correctly in the scenario below. Summary Input ``` define i1 @oeq(float %x, float %y) { %1 = fcmp oeq float %x, %y ret i1 %1 } define i1 @une(float %x, float %y) { %1 = fcmp une float %x, %y ret i1 %1 } define i1 @ogt(float %x, float %y) { %1 = fcmp ogt float %x, %y ret i1 %1 } // Prior to AVX10.2, default code generation oeq: # @oeq cmpeqss xmm0, xmm1 movd eax, xmm0 and eax, 1 ret une: # @une cmpneqss xmm0, xmm1 movd eax, xmm0 and eax, 1 ret ogt: # @ogt ucomiss xmm0, xmm1 seta al ret ``` This patch removes `cmpeqss` and `cmpneqss`. For the complete transform, check the unit test. This continues what PR https://github.com/llvm/llvm-project/pull/113098 added. Earlier, legalization and combine expanded the `setcc oeq:ch` node into `and`, `setcc eq`, and `setcc o`. Following suggestions from the community, the new internal transform is: ``` Optimized type-legalized selection DAG: %bb.0 'hoeq:' SelectionDAG has 11 nodes: t0: ch,glue = EntryToken t2: f16,ch = CopyFromReg t0, Register:f16 %0 t4: f16,ch = CopyFromReg t0, Register:f16 %1 t14: i8 = setcc t2, t4, setoeq:ch t10: ch,glue = CopyToReg t0, Register:i8 $al, t14 t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1 Optimized legalized selection DAG: %bb.0 'hoeq:' SelectionDAG has 12 nodes: t0: ch,glue = EntryToken t2: f16,ch = CopyFromReg t0, Register:f16 %0 t4: f16,ch = CopyFromReg t0, Register:f16 %1 t15: i32 = X86ISD::UCOMX t2, t4 t17: i8 = X86ISD::SETCC TargetConstant:i8<4>, t15 t10: ch,glue = CopyToReg t0, Register:i8 $al, t17 t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1 ``` The earlier transform is mentioned here: https://github.com/llvm/llvm-project/pull/113098#discussion_r1810307663 --------- Co-authored-by: mattarde <mattarde@intel.com>	2024-10-30 16:17:25 +08:00
David Majnemer	5c12434906	[X86] Emit comments explaining the immediate in vfpclass This makes the assembly a lot more readable at a glance. As an example: ``` vfpclasspd $4, %zmm0, %k0 # k0 = isNegativeZero(zmm0) ```	2024-10-29 19:54:34 +00:00
Craig Topper	635c344dfb	[X86] Add vector_compress patterns with a zero vector passthru. (#113970) We can use the kz form to automatically zero the extra elements. Fixes #113263.	2024-10-28 19:59:00 -07:00
Simon Pilgrim	c5edecbb4b	[X86] Regenerate scmp/ucmp test checks with vpternlog comments	2024-10-28 21:36:59 +00:00
Simon Pilgrim	056cf936a7	[DAG] Fold (and X, (bswap/bitreverse (not Y))) -> (and X, (not (bswap/bitreverse Y))) (#112547) On ANDNOT capable targets we can always do this profitably; without ANDNOT we only attempt this if we don't introduce an additional NOT. Fixes #112425.	2024-10-28 11:52:44 +00:00
Phoebe Wang	fd85761208	[X86][BF16] Customize VSELECT for BF16 under AVX-NECONVERT (#113322) Fixes: https://godbolt.org/z/9abGnE8zs	2024-10-28 15:15:49 +08:00
Serge Pavlov	819abe412d	[Test] Fix usage of constrained intrinsics (#113523) Some tests contain errors in constrained intrinsic usage, such as missing or extra type parameters, wrong type parameter order, and others. --------- Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>	2024-10-28 14:07:32 +07:00
Freddy Ye	5aa1275d03	[X86] Support SM4 EVEX version intrinsics/instructions. (#113402) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-28 10:46:16 +08:00
Phoebe Wang	40fffba9b2	[X86][AVX10.2] Fix wrong predicates for BF16 feature (#113800) Since AVX10.2, we need to enable 128/256-bit vector by default and check for 512 feature for 512-bit vector.	2024-10-28 09:54:29 +08:00
Aiden Grossman	7c9cf0c6f0	[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Emit error on bad option combinations This patch makes it so that specifying all or none for -pgo-analysis-map along with an explicit option causes an error, as this set of options does not really make sense.	2024-10-26 08:15:34 +00:00
Aiden Grossman	38caf282ab	[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Add none and all options to PGO Map (#111221) This patch adds none and all options to the -pgo-analysis-map flag, which do basically what they say on the tin. The none option is added to enable forcing the pgo-analysis-map by overriding an earlier invocation of the flag. The all option is just added for convenience.	2024-10-25 15:39:52 -07:00
Gaëtan Bossu	a0c318938a	[CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM (#113573) Both are based on MachineLICMBase, and the functionality there is "switched" based on a PreRegAlloc flag. This commit is simply about trusting the original value of that flag, defined by the `MachineLICM` and `EarlyMachineLICM` classes. The `PreRegAlloc` flag used to be overwritten based on MRI.isSSA(), which is unreliable due to how it is inferred by the MIRParser. I see that we can now define isSSA in MIR (thanks @gargaroff), meaning the fix isn't really needed anymore, but redefining that flag still feels wrong. Note that I'm looking into upstreaming more changes to MachineLICM, see [the discourse thread](https://discourse.llvm.org/t/extending-post-regalloc-machinelicm/82725).	2024-10-25 11:19:22 -07:00
Freddy Ye	c4248fa3ed	[X86] Support MOVRS and AVX10.2 instructions. (#113274) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-25 09:00:19 +08:00
Simon Pilgrim	b34d64921b	[X86] ReplaceNodeResults - adjust assert to allow XOP or GFNI subtargets to split i64 BITREVERSE nodes on 32-bit targets Fixes #113353 Fixes #113034	2024-10-24 06:39:07 -07:00
Vladimir Radosavljevic	401d123a1f	[MCP] Optimize copies when src is used during backward propagation (#111130) Before this patch, redundant COPY couldn't be removed for the following case: ``` $R0 = OP ... ... // Read of %R0 $R1 = COPY killed $R0 ``` This patch adds support for tracking the users of the source register during backward propagation, so that we can remove the redundant COPY in the above case and optimize it to: ``` $R1 = OP ... ... // Replace all uses of %R0 with $R1 ```	2024-10-23 13:37:02 +02:00
Akshat Oke	c4c60c0db9	[CodeGen][NewPM] Port OptimizePHIs to NPM (#113433)	2024-10-23 16:55:21 +05:30
Miguel Saldivar	49ebe32905	[X86] combineAndNotOrIntoAndNotAnd - don't fold other constant operands (#113264) Looks like having a constant in `Z` also caused infinite loops. This fixes #113240.	2024-10-23 10:00:02 +08:00
Daniel Paoliello	6e1a7ac531	[llvm][x64] Mark win x64 SEH pseudo instruction as meta instructions (again) (#112962) When adding new SEH pseudo instructions in #110024 I noticed that some of the tests were changing their output since these new instructions were counting towards thresholds for branching versus folding decisions. These instructions do not result in real machine instructions being emitted, so they should be marked as meta instructions. This is a re-do of #110889 as we hit an issue where some of the SEH pseudo instructions in the prolog were being duplicated, which resulted in errors being raised as the CodeView generator was seeing prolog directives after an end-prolog directive: <https://github.com/llvm/llvm-project/pull/110889#issuecomment-2393405613>. The fix for this is to mark the prolog-related SEH pseudo instructions as being non-duplicatable.	2024-10-21 13:34:11 -07:00
Simon Pilgrim	f0b3b6d15b	[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710) (REAPPLIED) Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this. Update isConstantIntBuildVectorOrConstantInt to peek through bitcasts when attempting to find a constant; in particular, this improves canonicalization of constants to the RHS on commutable instructions. X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type. Minor cleanup that helps with #107423. Reapplied after regression fix ba1255def64a9c3c68d97ace051eec76f546eeb0.	2024-10-20 14:23:21 +01:00
Martin Storsjö	b26df3e463	Revert "[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710)" This reverts commit a630771b28f4b252e2754776b8f3ab416133951a. This caused compilation to hang for Windows/ARM, see https://github.com/llvm/llvm-project/pull/112710 for details.	2024-10-20 00:49:16 +03:00
Alex Rønne Petersen	5785cbb405	[llvm] Ensure that soft float targets don't emit `fma()` libcalls. (#106615) The previous behavior could be harmful in some edge cases, such as emitting a call to `fma()` in the `fma()` implementation itself. Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`. This was already done for PowerPC; this commit just extends that to Arm, z/Arch, and x86. MIPS and SPARC already got it right, but I added tests for them too, for good measure. Note: I don't have commit access.	2024-10-19 06:13:15 -07:00
Simon Pilgrim	7da0a69852	[X86] andnot-patterns.ll - add non-BMI test coverage Extra test coverage for #112547 to test cases where we don't create a ANDNOT instruction	2024-10-18 17:43:57 +01:00
Simon Pilgrim	5c37316b54	[DAG] visitFMA/FMAD - use FoldConstantArithmetic to add missing vector constant folding support	2024-10-18 11:12:06 +01:00
Simon Pilgrim	a630771b28	[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710) Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this. Update isConstantIntBuildVectorOrConstantInt to peek through bitcasts when attempting to find a constant; in particular, this improves canonicalization of constants to the RHS on commutable instructions. X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type. Minor cleanup that helps with #107423.	2024-10-18 10:52:55 +01:00
Simon Pilgrim	4e0169005e	[X86] Add FMA constant folding test coverage Shows we constant fold scalars but not vectors	2024-10-18 10:48:43 +01:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
Tex Riddell	dea213cb9b	Add atan2 test case for prior change in X86ISelLowering.cpp (#112616) When updating X86ISelLowering.cpp for atan2, based on #96222, it was known that a needed change was missing which was merged later in #101268. However, the corresponding test update to `fp-strict-libcalls-msvc32.ll` was missed. This change rectifies that oversight. This also adds a missing label to the tanh test, since it's produced by update_llc_test_checks.py. Part of: Implement the atan2 HLSL Function #70096.	2024-10-17 10:37:32 -07:00
Simon Pilgrim	3f17da1f45	[X86] Regenerate test checks with vpternlog comments	2024-10-17 14:18:49 +01:00
Simon Pilgrim	d51af6c215	[X86] Regenerate test checks with vpternlog comments	2024-10-17 09:54:24 +01:00
Tex Riddell	875afa939d	[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 Based on example PR #96222 and fix PR #101268, with some differences due to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp). - Add llvm.experimental.constrained.atan2 - Intrinsics.td, ConstrainedOps.def, LangRef.rst - Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp - Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp, and LegalizeVectorTypes.cpp - Update isKnownNeverNaN in SelectionDAG.cpp - Update SelectionDAGDumper.cpp - Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp - TargetLoweringBase.cpp - Expand for vectors, promote f16 - X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC Part 4 for Implement the atan2 HLSL Function #70096.	2024-10-16 11:43:17 -07:00
Simon Pilgrim	b238c2b199	[X86] Regenerate test checks with vpternlog comments	2024-10-16 19:05:02 +01:00
Simon Pilgrim	e839d2a60a	[X86] andnot-patterns.ll - tweak #112425 test patterns to use separate source values for ANDNOT operands	2024-10-16 14:32:51 +01:00
Simon Pilgrim	3e31e30a84	[X86] Add some basic test coverage for #112425	2024-10-16 13:27:04 +01:00
Simon Pilgrim	9ee9e0e3b2	[X86] Extend ANDNOT fold tests to cover all legal scalars and 256-bit vectors Add tests to check what happens on i8/i16/i32 scalars (ANDN only has i32/i64 variants)	2024-10-16 11:15:36 +01:00
Antonio Frighetto	d3a8363bec	[X86] Do not elect to tail call if caller must preserve all registers A miscompilation issue has been addressed with improved checking. Fixes: https://github.com/llvm/llvm-project/issues/97758.	2024-10-16 09:54:08 +02:00
Antonio Frighetto	c137b3ee35	[X86] Introduce test for PR112098 (NFC)	2024-10-16 09:54:08 +02:00
Simon Pilgrim	ec78f0da0e	[X86] combineAndNotOrIntoAndNotAnd - don't attempt with constant operands Don't fold AND(X,OR(NOT(Z),C)) -> AND(X,NOT(AND(Z,C'))) as DAGCombiner will invert it back again. Fixes #112347	2024-10-15 16:54:08 +01:00
Simon Pilgrim	a3a9ba8033	[X86] lowerShuffleAsVTRUNC - ensure we peek through bitcasts when looking for freely-concatable subvectors Fixes #111611	2024-10-15 15:40:52 +01:00
Simon Pilgrim	64421eced2	[X86] shuffle-vs-trunc-512.ll - add missing AVX512BW FAST PERLANE/CROSSLANE check prefixes	2024-10-15 15:40:52 +01:00
Simon Pilgrim	e100e4afc9	[X86] Add test coverage for #111611	2024-10-15 15:23:03 +01:00
Simon Pilgrim	94eb97550a	[X86] shuffle-vs-trunc-256.ll - regenerate test checks with vpternlog comments	2024-10-15 15:23:02 +01:00
Simon Pilgrim	b75f9f7b3a	[X86] fp128-libcalls-strict.ll - add missing fp80 libm declarations for completeness. Noticed while reviewing #110760	2024-10-15 14:53:40 +01:00
Simon Pilgrim	6e86496d2f	[X86] fp80-strict-libcalls.ll - add missing fp80 libm declarations for completeness. Noticed while reviewing #110760	2024-10-15 14:22:01 +01:00
c8ef	854ded9b24	Reapply "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112203) This patch adds icmp+select patterns for integer min/max matchers in SDPatternMatch, similar to those in IR PatternMatch. Reapply #111774. Closes #108218.	2024-10-15 21:07:06 +08:00
Phoebe Wang	08ddbab866	[X86][AMX] Fix missing stride register for tileloadd (#110226) Fixes: #110190	2024-10-15 13:02:00 +08:00


20684 Commits