20684 Commits

Author SHA1 Message Date
Mirko
4d3c427f33
[CodeGen] Use first EHLabel as a stop gate for live range shrinking (#114195)
This fixes issue #114194

The issue happens during the `LiveRangeShrink` pass, which runs early,
before phi elimination. LandingPads, which are lowered to EHLabels, need
to be the first non phi instruction in an EHPad. In case of a phi node
being in front of the EHLabel and a use being after the EHLabel, we
hoist the use in front of the label.

This results in a portion of the landingpad missing due to being hoisted
in front of the label.
2024-11-01 19:13:18 -07:00
Phoebe Wang
c72a751dab
[X86][AMX] Support AMX-TRANSPOSE (#113532)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-01 16:45:03 +08:00
Simon Pilgrim
9fb4bc5bf4
[DAG] SimplifyMultipleUseDemandedBits - ignore SRL node if we're just demanding known sign bits (#114389)
Check to see if we are only demanding (shifted) signbits from a SRL node that are also signbits in the source node.

We can't demand any upper zero bits that the SRL will shift in (up to max shift amount), and the lower demanded bits bound must already be all signbits.
2024-10-31 16:40:29 +00:00
Feng Zou
8127162427
[X86][AMX] Support AMX-FP8 (#113850)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-31 10:14:25 +08:00
Simon Pilgrim
f7b5f0c805 [DAG] Fold (and X, (rot (not Y), Z)) -> (and X, (not (rot Y, Z)))
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT

Followup to #112547
2024-10-30 10:46:12 +00:00
Mahesh-Attarde
e61a7dc256
[X86][AVX512] Use comx for compare (#113567)
We added AVX10.2 COMEF ISA in LLVM, This does not optimize correctly in
scenario mentioned below.
Summary
Input 
```
define i1 @oeq(float %x, float %y) {
    %1 = fcmp oeq float %x, %y
    ret i1 %1
}define i1 @une(float %x, float %y) {
    %1 = fcmp une float %x, %y
    ret i1 %1
}define i1 @ogt(float %x, float %y) {
    %1 = fcmp ogt float %x, %y
    ret i1 %1
}
// Prior AVX10.2, default code generation

oeq:                                    # @oeq
        cmpeqss xmm0, xmm1
        movd    eax, xmm0
        and     eax, 1
        ret
une:                                    # @une
        cmpneqss        xmm0, xmm1
        movd    eax, xmm0
        and     eax, 1
        ret
ogt:                                    # @ogt
        ucomiss xmm0, xmm1
        seta    al
        ret 
```

This patch will remove `cmpeqss` and `cmpneqss`. For complete transform
check unit test.

Continuing on what PR https://github.com/llvm/llvm-project/pull/113098
added

Earlier Legalization and combine expanded `setcc oeq:ch` node into `and`
and `setcc eq` , `setcc o`. From suggestions in community
new internal transform
```
Optimized type-legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
      t2: f16,ch = CopyFromReg t0, Register:f16 %0
      t4: f16,ch = CopyFromReg t0, Register:f16 %1
    t14: i8 = setcc t2, t4, setoeq:ch
  t10: ch,glue = CopyToReg t0, Register:i8 $al, t14
  t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1

Optimized legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 12 nodes:
  t0: ch,glue = EntryToken
        t2: f16,ch = CopyFromReg t0, Register:f16 %0
        t4: f16,ch = CopyFromReg t0, Register:f16 %1
      t15: i32 = X86ISD::UCOMX t2, t4
    t17: i8 = X86ISD::SETCC TargetConstant:i8<4>, t15
  t10: ch,glue = CopyToReg t0, Register:i8 $al, t17
  t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1
```
Earlier transform is mentioned here
https://github.com/llvm/llvm-project/pull/113098#discussion_r1810307663

---------

Co-authored-by: mattarde <mattarde@intel.com>
2024-10-30 16:17:25 +08:00
David Majnemer
5c12434906 [X86] Emit comments explaining the immediate in vfpclass
This makes the assembly a lot more readable at a glance.

As an example:
```
  vfpclasspd $4, %zmm0, %k0 # k0 = isNegativeZero(zmm0)
```
2024-10-29 19:54:34 +00:00
Craig Topper
635c344dfb
[X86] Add vector_compress patterns with a zero vector passthru. (#113970)
We can use the kz form to automatically zero the extra elements.

Fixes #113263.
2024-10-28 19:59:00 -07:00
Simon Pilgrim
c5edecbb4b [X86] Regenerate scmp/ucmp test checks with vpternlog comments 2024-10-28 21:36:59 +00:00
Simon Pilgrim
056cf936a7
[DAG] Fold (and X, (bswap/bitreverse (not Y))) -> (and X, (not (bswap/bitreverse Y))) (#112547)
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT

Fixes #112425
2024-10-28 11:52:44 +00:00
Phoebe Wang
fd85761208
[X86][BF16] Customize VSELECT for BF16 under AVX-NECONVERT (#113322)
Fixes: https://godbolt.org/z/9abGnE8zs
2024-10-28 15:15:49 +08:00
Serge Pavlov
819abe412d
[Test] Fix usage of constrained intrinsics (#113523)
Some tests contain errors in constrained intrinsic usage, such as missed
or extra type parameters, wrong type parameters order and some other.

---------

Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>
2024-10-28 14:07:32 +07:00
Freddy Ye
5aa1275d03
[X86] Support SM4 EVEX version intrinsics/instructions. (#113402)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-28 10:46:16 +08:00
Phoebe Wang
40fffba9b2
[X86][AVX10.2] Fix wrong predicates for BF16 feature (#113800)
Since AVX10.2, we need to enable 128/256-bit vector by default and check
for 512 feature for 512-bit vector.
2024-10-28 09:54:29 +08:00
Aiden Grossman
7c9cf0c6f0 [SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Emit error on bad option combinatons
This patch makes it so that specifying all or none for -pgo-analysis-map
along with an explicit option causes an error as this set of options
does not really make sense.
2024-10-26 08:15:34 +00:00
Aiden Grossman
38caf282ab
[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Add none and all options to PGO Map (#111221)
This patch adds none and all options to the -pgo-analysis-map flag,
which do basically what they say on the tin. The none option is added to
enable forcing the pgo-analysis-map by overriding an earlier invocation
of the flag. The all option is just added for convenience.
2024-10-25 15:39:52 -07:00
Gaëtan Bossu
a0c318938a
[CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM (#113573)
Both are based on MachineLICMBase, and the functionality there is
"switched" based on a PreRegAlloc flag. This commit is simply about
trusting the original value of that flag, defined by the `MachineLICM`
and `EarlyMachineLICM` classes.

The `PreRegAlloc` flag used to be overwritten it based on MRI.isSSA(),
which is un-reliable due to how it is inferred by the MIRParser. I see
that we can now define isSSA in MIR (thanks @gargaroff ), meaning the
fix isn’t really needed anymore, but redefining that flag still feels
wrong.

Note that I'm looking into upstreaming more changes to MachineLICM, see
[the discourse
thread](https://discourse.llvm.org/t/extending-post-regalloc-machinelicm/82725).
2024-10-25 11:19:22 -07:00
Freddy Ye
c4248fa3ed
[X86] Support MOVRS and AVX10.2 instructions. (#113274)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-25 09:00:19 +08:00
Simon Pilgrim
b34d64921b [X86] ReplaceNodeResults - adjust assert to allow XOP or GFNI subtargets to split i64 BITREVERSE nodes on 32-bit targets
Fixes #113353
Fixes #113034
2024-10-24 06:39:07 -07:00
Vladimir Radosavljevic
401d123a1f
[MCP] Optimize copies when src is used during backward propagation (#111130)
Before this patch, redundant COPY couldn't be removed for the following
case:
```
  $R0 = OP ...
  ... // Read of %R0
  $R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register
during backward propagation, so that we can remove the redundant COPY in
the above case and optimize it to:
```
  $R1 = OP ...
  ... // Replace all uses of %R0 with $R1
```
2024-10-23 13:37:02 +02:00
Akshat Oke
c4c60c0db9
[CodeGen][NewPM] Port OptimizePHIs to NPM (#113433) 2024-10-23 16:55:21 +05:30
Miguel Saldivar
49ebe32905
[X86] combineAndNotOrIntoAndNotAnd - don't fold other constant operands (#113264)
Looks like having a constant in `Z` also caused infinite loops. This
fixes #113240.
2024-10-23 10:00:02 +08:00
Daniel Paoliello
6e1a7ac531
[llvm][x64] Mark win x64 SEH pseudo instruction as meta instructions (again) (#112962)
When adding new SEH pseudo instructions in #110024 I noticed that some
of the tests were changing their output since these new instructions
were counting towards thresholds for branching versus folding decisions.

These instructions do not result in real machine instructions being
emitted, so they should be marked as meta instructions.

This is a re-do of #110889 as we hit an issue where some of the SEH
pseudo instructions in the prolog were being duplicated, which resulted
errors being raised as the CodeView generator was seeing prolog
directives after an end-prolog directive:
<https://github.com/llvm/llvm-project/pull/110889#issuecomment-2393405613>.
The fix for this is to mark the prolog related SEH pseudo instructions
as being non-duplicatable.
2024-10-21 13:34:11 -07:00
Simon Pilgrim
f0b3b6d15b [DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710) (REAPPLIED)
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.

Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.

X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type

Minor cleanup that helps with #107423

Reapplied after regression fix ba1255def64a9c3c68d97ace051eec76f546eeb0
2024-10-20 14:23:21 +01:00
Martin Storsjö
b26df3e463 Revert "[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710)"
This reverts commit a630771b28f4b252e2754776b8f3ab416133951a.

This caused compilation to hang for Windows/ARM, see
https://github.com/llvm/llvm-project/pull/112710 for details.
2024-10-20 00:49:16 +03:00
Alex Rønne Petersen
5785cbb405
[llvm] Ensure that soft float targets don't emit fma() libcalls. (#106615)
The previous behavior could be harmful in some edge cases, such as
emitting a call to `fma()` in the `fma()` implementation itself.

Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`.
This was already done for PowerPC; this commit just extends that to Arm,
z/Arch, and x86. MIPS and SPARC already got it right, but I added tests
for them too, for good measure.

Note: I don't have commit access.
2024-10-19 06:13:15 -07:00
Simon Pilgrim
7da0a69852 [X86] andnot-patterns.ll - add non-BMI test coverage
Extra test coverage for #112547 to test cases where we don't create a ANDNOT instruction
2024-10-18 17:43:57 +01:00
Simon Pilgrim
5c37316b54 [DAG] visitFMA/FMAD - use FoldConstantArithmetic to add missing vector constant folding support 2024-10-18 11:12:06 +01:00
Simon Pilgrim
a630771b28
[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710)
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.

Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.

X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type

Minor cleanup that helps with #107423
2024-10-18 10:52:55 +01:00
Simon Pilgrim
4e0169005e [X86] Add FMA constant folding test coverage
Shows we constant fold scalars but not vectors
2024-10-18 10:48:43 +01:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
Tex Riddell
dea213cb9b
Add atan2 test case for prior change in X86SelLowering.cpp (#112616)
When updating X86SelLowering.cpp for atan2, based on #96222, it was
known that a needed change was missing which was merged later in
#101268. However, the corresponding test update to
`fp-strict-libcalls-msvc32.ll` was missed.

This change rectifies that oversight.

This also adds a missing label to the tanh test, since it's produced by
update_llc_test_checks.py

Part of: Implement the atan2 HLSL Function #70096.
2024-10-17 10:37:32 -07:00
Simon Pilgrim
3f17da1f45 [X86] Regenerate test checks with vpternlog comments 2024-10-17 14:18:49 +01:00
Simon Pilgrim
d51af6c215 [X86] Regenerate test checks with vpternlog comments 2024-10-17 09:54:24 +01:00
Tex Riddell
875afa939d
[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

Based on example PR #96222 and fix PR #101268, with some differences due
to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp).

- Add llvm.experimental.constrained.atan2 - Intrinsics.td,
ConstrainedOps.def, LangRef.rst
- Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in
BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp
- Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp,
and LegalizeVectorTypes.cpp
- Update isKnownNeverNaN in SelectionDAG.cpp
- Update SelectionDAGDumper.cpp
- Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp
- TargetLoweringBase.cpp - Expand for vectors, promote f16
- X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC

Part 4 for Implement the atan2 HLSL Function #70096.
2024-10-16 11:43:17 -07:00
Simon Pilgrim
b238c2b199 [X86] Regenerate test checks with vpternlog comments 2024-10-16 19:05:02 +01:00
Simon Pilgrim
e839d2a60a [X86] andnot-patterns.ll - tweak #112425 test patterns to use separate source values for ANDNOT operands 2024-10-16 14:32:51 +01:00
Simon Pilgrim
3e31e30a84 [X86] Add some basic test coverage for #112425 2024-10-16 13:27:04 +01:00
Simon Pilgrim
9ee9e0e3b2 [X86] Extend ANDNOT fold tests to cover all legal scalars and 256-bit vectors
Add tests to check what happens on i8/i16/i32 scalars (ANDN only has i32/i64 variants)
2024-10-16 11:15:36 +01:00
Antonio Frighetto
d3a8363bec [X86] Do not elect to tail call if caller must preserve all registers
A miscompilation issue has been addressed with improved checking.

Fixes: https://github.com/llvm/llvm-project/issues/97758.
2024-10-16 09:54:08 +02:00
Antonio Frighetto
c137b3ee35 [X86] Introduce test for PR112098 (NFC) 2024-10-16 09:54:08 +02:00
Simon Pilgrim
ec78f0da0e [X86] combineAndNotOrIntoAndNotAnd - don't attempt with constant operands
Don't fold AND(X,OR(NOT(Z),C)) -> AND(X,NOT(AND(Z,C'))) as DAGCombiner will invert it back again.

Fixes #112347
2024-10-15 16:54:08 +01:00
Simon Pilgrim
a3a9ba8033 [X86] lowerShuffleAsVTRUNC - ensure we peek through bitcasts when looking for freely-concatable subvectors
Fixes #111611
2024-10-15 15:40:52 +01:00
Simon Pilgrim
64421eced2 [X86] shuffle-vs-trunc-512.ll - add missing AVX512BW FAST PERLANE/CROSSLANE check prefixes 2024-10-15 15:40:52 +01:00
Simon Pilgrim
e100e4afc9 [X86] Add test coverage for #111611 2024-10-15 15:23:03 +01:00
Simon Pilgrim
94eb97550a [X86] shuffle-vs-trunc-256.ll - regenerate test checks with vpternlog comments 2024-10-15 15:23:02 +01:00
Simon Pilgrim
b75f9f7b3a [X86[] fp128-libcalls-strict.ll - add missing fp80 libm declarations for completeness
Noticed while reviewing #110760
2024-10-15 14:53:40 +01:00
Simon Pilgrim
6e86496d2f [X86[] fp80-strict-libcalls.ll - add missing fp80 libm declarations for completeness
Noticed while reviewing #110760
2024-10-15 14:22:01 +01:00
c8ef
854ded9b24
Reapply "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112203)
This patch adds icmp+select patterns for integer min/max matchers in
SDPatternMatch, similar to those in IR PatternMatch.

Reapply #111774.

Closes #108218.
2024-10-15 21:07:06 +08:00
Phoebe Wang
08ddbab866
[X86][AMX] Fix missing stride register for tileloadd (#110226)
Fixes: #110190
2024-10-15 13:02:00 +08:00