36701 Commits

Author SHA1 Message Date
Yingwei Zheng
f74aed7938
[DAGCombiner] Add basic support for trunc nsw/nuw (#113808)
This patch adds basic support for `trunc nsw/nuw` in SDAG. It will allow
DAGCombiner to further eliminate in-reg `zext/sext` instructions.
2024-11-07 00:23:53 +08:00
Benjamin Maxwell
ea6b8fa4b9
[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() (#114792)
This merges the logic for expanding both FFREXP and FSINCOS into one
method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication
and also allows FFREXP to benefit from the stack slot elimination
implemented for FSINCOS. This method will also be used in future to
implement more multiple-result intrinsics (such as modf and sincospi).
2024-11-06 11:06:06 +00:00
Heejin Ahn
492812f613
[WebAssembly] Fix rethrow's index calculation (#114693)
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
  %0 = cleanuppad ...
  call @destructor
  cleanupret from %0 unwind label %catch.dispatch
```

If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
  %0 = cleanuppad ...
  invoke @throwing_func
    to label %unreachale unwind label %catch.dispatch.i

catch.dispatch.i:
  catchswitch ... [ label %catch.start.i ]

catch.start.i:
  %1 = catchpad ...
  invoke @some_function
    to label %invoke.cont.i unwind label %terminate.i

invoke.cont.i:
  catchret from %1 to label %destructor.exit

destructor.exit:
  cleanupret from %0 unwind label %catch.dispatch
```

We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.

The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).

This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:

5091a359d9/llvm/include/llvm/Target/TargetSelectionDAG.td (L700)
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.

This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.

---

- Details about changes in the Wasm backend:

After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.

In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.

---

Fixes #114600.
2024-11-05 21:45:13 -08:00
Jon Roelofs
4c3e1e3c4a
[llvm][AsmPrinter] Add an option to print instruction latencies (#113243)
... matching what we have in the disassembler. This isn't turned on by
default since several of the scheduling models are not completely
accurate, and we don't want to be misleading.
2024-11-05 17:28:52 -08:00
Craig Topper
13b5899c29
[SelectionDAGBuilder][X86] Don't form FMAXNUM for f16 vectors if FMAXNUM needs to be promoted. (#114943)
In #70357, I changed a isLegalOrCustom to isLegalOrCustomOrPromote in
visitSelect to enable integer min/max to be formed when the operation
was promoted. Unfortunately, this also affected floating point. For
floating point, fmaxnum may require a libcall so we also need to check
if the operation on the promoted type is legal or custom.

Other changes to RISC-V have seen made the original change untested so
this patch restores the original isLegalOrCustom.

Fixes #114520.
2024-11-05 15:06:37 -08:00
Kai Nacke
4a37799a48
[SystemZ][XRay] Implement XRay instrumentation for SystemZ (#113253)
Expands pseudo instructions PATCHABLE_FUNCTION_ENTER and PATCHABLE_RET
into a small instruction sequence which calls into the XRay library.
2024-11-05 15:42:55 -05:00
Simon Pilgrim
9540a7ae82
[DAG] SimplifyMultipleUseDemandedBits - bypass ADD nodes if either operand is zero (#112588)
The dpbusd_const.ll test change is due to us losing the expanded add reduction pattern as one of the elements is known to be zero (removing one of the adds from the reduction pyramid). I don't think its of concern.

Noticed while working on #107423
2024-11-05 17:20:41 +00:00
Jonas Paulsson
117e952a53
[LiveRangeEdit] Remove any MemoryOperand on MI when converting it to KILL. (#114407)
When LiveRangeEdit::eliminateDeadDef() converts an MI to a KILL instruction,
it should also call dropMemRefs() in order to erase any MachineMemOperand
present.

This was discovered in testing as the MachineVerifier does not accept an MMO
without the corresponding MI mayLoad/mayStore flag, which the KILL opcode
lacks.
2024-11-05 18:08:27 +01:00
abhishek-kaushik22
6b64f36536
[NFC] Use std::move to avoid copy (#113080) 2024-11-05 14:42:53 +00:00
Simon Pilgrim
aef0e77c76
[DAG] visitAND - Fold (and (srl X, C), 1) -> (srl X, BW-1) for signbit extraction (#114992)
If we're masking the LSB of a SRL node result and that is shifting down an extended sign bit, see if we can change the SRL to shift down the MSB directly.

These patterns can occur during legalisation when we've sign extended to a wider type but the SRL is still shifting from the subreg.

Alternative to #114967

Fixes the remaining regression in #112588
2024-11-05 14:42:15 +00:00
Simon Pilgrim
5df387227e [DAG] visitAND - refactor "and (sub 0, ext(bool X)), 1 --> zext(bool X)" to use SDPatternMatch. 2024-11-05 14:30:21 +00:00
David Green
5dc9c39ac1
[GlobalISel] Check the correct register in sextload OneUse check. (#114763)
This fixes a bug that started triggering after #111730, where we could
remove a load with multiple uses. It looks like the match should be
checking the other register in a one-use check.

  %SrcReg = load..
  %DstReg = sign_extend_inreg %SrcReg
2024-11-05 10:18:07 +00:00
Matt Arsenault
ea859005b5
SafeStack: Respect alloca addrspace (#112536)
Just insert addrspacecast in cases where the alloca uses a
different address space, since I don't know what else you
could possibly do.
2024-11-04 17:51:30 -08:00
Craig Topper
999dfb2067
[GISel][AArch64][AMDGPU][RISCV] Canonicalize (sub X, C) -> (add X, -C) (#114309)
This matches InstCombine and DAGCombine.

RISC-V only has an ADDI instruction so without this we need additional
patterns to do the conversion.

Some of the AMDGPU tests look like possible regressions. Maybe some
patterns from isel aren't imported.
2024-11-04 17:20:11 -08:00
Craig Topper
97b7474970
[SelectionDAG] Remove unneeded assert from SelectionDAG::getSignedConstant. NFC (#114336)
This assert is also present inside the APInt constructor after #114539.
2024-11-04 14:43:08 -08:00
Kazu Hirata
186dc9a4df
[CodeGen] Avoid repeated hash lookups (NFC) (#113414) 2024-11-04 09:41:09 -08:00
Matt Arsenault
30dd1297fa
AMDGPU: Custom expand flat cmpxchg which may access private (#109410)
64-bit flat cmpxchg instructions do not work correctly for scratch
addresses, and need to be expanded as non-atomic.

Allow custom expansion of cmpxchg in AtomicExpand, as is
already the case for atomicrmw.
2024-11-04 09:29:38 -08:00
Simon Pilgrim
0ac2e42227
[DAG] SimplifyDemandedBits - ignore SRL node if we're just demanding known sign bits (#114805)
Check to see if we are only demanding (shifted) signbits from a SRL node that are also signbits in the source node.

We can't demand any upper zero bits that the SRL will shift in (up to max shift amount), and the lower demanded bits bound must already be all signbits.

Same fold as #114389 which added this for SimplifyMultipleUseDemandedBits
2024-11-04 17:27:45 +00:00
Sander de Smalen
3098200fcc
[ISel] Propagate disjoint flag in ShrinkDemandedOp (#114560)
When trying to evaluate an expression in a narrower type, the
DAGCombine should propagate the disjoint flag, as it's equally
valid on the narrower expression.

This helps improve better use of addressing modes for some
Arm SME instructions, for example.
2024-11-03 19:42:04 +00:00
Kazu Hirata
6927a434ba
[SelectionDAG] Remove unused includes (NFC) (#114697)
Identified with misc-include-cleaner.
2024-11-03 07:03:33 -08:00
Kazu Hirata
2fccd5c576
[AsmPrinter] Remove unused includes (NFC) (#114698)
Identified with misc-include-cleaner.
2024-11-03 07:03:02 -08:00
Antonio Frighetto
8a2113c5c5 Reapply "[SelectionDAG] Add preliminary plumbing for samesign flag"
Original commit: 19c8475871faee5ebb06281034872a85a2552675

Multiple 2-stage sanitizer buildbots were reporting failures, the issue
has been addressed separately in 29246a92aee87e86cbc2bb64ee520d7385644f34.
2024-11-02 14:19:58 +01:00
Yingwei Zheng
917b3d13b5
[SDAG] Intersect poison-generating flags after CSE (#114650)
This patch intersects poison-generating flags after CSE to fix assertion
failure reported in
https://github.com/llvm/llvm-project/pull/112354#issuecomment-2452369552.

Co-authored-by: Antonio Frighetto <me@antoniofrighetto.com>
2024-11-02 19:06:27 +08:00
Vitaly Buka
c5eb591257
Revert "[SelectionDAG] Add preliminary plumbing for samesign flag" (#114647)
Crashes on ARM builds
https://lab.llvm.org/buildbot/#/builders/85/builds/2548
```
DAGCombiner.cpp:16157: SDValue (anonymous
namespace)::DAGCombiner::visitFREEZE(SDNode *):
Assertion `DAG.isGuaranteedNotToBeUndefOrPoison(R,
false) && "Can't create node that may be
undef/poison!"' failed.
```

Issue #114648

This reverts commit 19c8475871faee5ebb06281034872a85a2552675.
2024-11-02 01:35:13 -07:00
Mirko
4d3c427f33
[CodeGen] Use first EHLabel as a stop gate for live range shrinking (#114195)
This fixes issue #114194

The issue happens during the `LiveRangeShrink` pass, which runs early,
before phi elimination. LandingPads, which are lowered to EHLabels, need
to be the first non phi instruction in an EHPad. In case of a phi node
being in front of the EHLabel and a use being after the EHLabel, we
hoist the use in front of the label.

This results in a portion of the landingpad missing due to being hoisted
in front of the label.
2024-11-01 19:13:18 -07:00
Philip Reames
69edef1ab9 [DAG] Simplify control flow in SelectionDAGBuilder::visitShuffleVector [NFC]
If we've handled ==, and < above, the only case left can be >.  We don't
need to branch on this, and can instead assert and reduce indentation,
and simplify reasoning about the fallthrough path.
2024-11-01 08:59:15 -07:00
Wang Qiang
b77e40265c
[llvm][NFC] Fix typos: replace “avaliable” with “available” across various files (#114524)
This pull request corrects multiple occurrences of the typo "avaliable"
to "available" across the LLVM and Clang codebase. These changes improve
the clarity and accuracy of comments and documentation. Specific
modifications are in the following files:

1. clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp:
Updated comments in readability checks for cognitive complexity.
2. llvm/include/llvm/ExecutionEngine/Orc/ExecutionUtils.h: Corrected
documentation for JITDylib responsibilities.
3. llvm/include/llvm/Target/TargetMacroFusion.td: Fixed descriptions for
FusionPredicate variables.
4. llvm/lib/CodeGen/SafeStack.cpp: Improved comments on DominatorTree
availability.
5. llvm/lib/Target/RISCV/RISCVSchedSiFive7.td: Enhanced resource usage
descriptions for vector units.
6. llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp: Updated invariant
description in shift-detect idiom logic.
7. llvm/test/MC/ARM/mve-fp-registers.s: Amended ARM MVE register
availability notes.
8. mlir/lib/Bytecode/Reader/BytecodeReader.cpp: Adjusted forward
reference descriptions for bytecode reader operations.

These changes have no impact on code functionality, focusing solely on
documentation clarity.

Co-authored-by: wangqiang <wangqiang1@kylinos.cn>
2024-11-01 13:25:04 +00:00
Thorsten Schütt
8e3772744d
[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114470)
There are patterns for:
* {nxv2s32, s32, s64},
* {nxv4s16, s16, s64},
* {nxv2s16, s16, s64}
2024-11-01 06:10:26 +01:00
Matt Arsenault
9cc298108a
AtomicExpand: Copy metadata from atomicrmw to cmpxchg (#109409)
When expanding an atomicrmw with a cmpxchg, preserve any metadata
attached to it. This will avoid unwanted double expansions
in a future commit.

The initial load should also probably receive the same metadata
(which for some reason is not emitted as an atomic).
2024-10-31 11:54:07 -07:00
Antonio Frighetto
19c8475871 [SelectionDAG] Add preliminary plumbing for samesign flag
Extend recently-added poison-generating IR flag to codegen as well.
2024-10-31 19:47:50 +01:00
Simon Pilgrim
9fb4bc5bf4
[DAG] SimplifyMultipleUseDemandedBits - ignore SRL node if we're just demanding known sign bits (#114389)
Check to see if we are only demanding (shifted) signbits from a SRL node that are also signbits in the source node.

We can't demand any upper zero bits that the SRL will shift in (up to max shift amount), and the lower demanded bits bound must already be all signbits.
2024-10-31 16:40:29 +00:00
Sriraman Tallam
c7ef002bc6
Fix performance bug in buildLocationList (#109343)
In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.

This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.

When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.

For example:

.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv

The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".

It is alright to skip this as we already check the section corresponding
to the debugloc entry.

Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
2024-10-31 09:00:25 -07:00
Zaara Syeda
ccddd13602
Enable aggressive constant merge in GlobalMerge for AIX (#113956)
Enable merging all constants without looking at use in GlobalMerge by
default to replace PPCMergeStringPool pass on AIX.
2024-10-31 11:22:48 -04:00
Matt Arsenault
db5bcb24c2
GlobalISel: Fix combine duplicating atomic loads (#111730)
The sext_inreg (load) combine was not deleting the old load instruction,
and it would never be deleted if volatile or atomic.
2024-10-31 07:55:12 -07:00
goldsteinn
1e072ae289
[CGP] [CodeGenPrepare] Folding urem with loop invariant value plus offset (#104724)
This extends the existing fold:

```
for(i = Start; i < End; ++i)
   Rem = (i nuw+- IncrLoopInvariant) u% RemAmtLoopInvariant;
```
 ->
```
Rem = (Start nuw+- IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++rem)
   Rem = rem == RemAmtLoopInvariant ? 0 : Rem;
```

To work with a non-zero `IncrLoopInvariant`.

This is a common usage in cases such as:

```
for(i = 0; i < N; ++i)
    if ((i + 1) % X) == 0)
        do_something_occasionally_but_not_first_iter();
```

Alive2 w/ i4/unrolled 6x (needs to be ran locally due to timeout):
https://alive2.llvm.org/ce/z/6tgyN3

Exhaust proof over all uint8_t combinations in C++:
https://godbolt.org/z/WYa561388
2024-10-31 09:14:33 -05:00
Benjamin Maxwell
89a8c71db6
[SDAG] Support expanding FSINCOS to vector library calls (#114039)
This shares most of its code with the scalar sincos expansion. It allows
expanding vector FSINCOS nodes to a library call from the specified
`-vector-library`. The upside of this is it will mean the vectorizer
only needs to handle the sincos intrinsic, which has no memory effects,
and this can handle lowering the intrinsic to a call that takes output
pointers.
2024-10-31 12:41:43 +00:00
dnsampaio
28d0718033
[DAGCombiner] Add combine avg from shifts (#113909)
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`

as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` 
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`

iff valid for the target.

Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
2024-10-31 10:57:27 +01:00
Craig Topper
00cbb68fb7 [LegalizeDAG] Use getSignedConstant. NFC 2024-10-30 21:43:16 -07:00
Thorsten Schütt
6effab990c
Revert "[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE" (#114353)
Reverts llvm/llvm-project#114310
2024-10-31 05:41:16 +01:00
Thorsten Schütt
6bf214b7c6
[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114310)
There are patterns for:
* {nxv2s32, s32, s64},
* {nxv4s16, s16, s64},
* {nxv2s16, s16, s64}
2024-10-31 04:56:41 +01:00
Craig Topper
f0bae562dc
[GISel] Return const APInt & from getIConstantFromReg. NFC (#114320)
This matches what the call to ConstantInt::getValue() returns. Let the
caller make a copy if needed.
2024-10-30 19:15:51 -07:00
Kazu Hirata
f582cd3dc7 [SelectionDAG] Fix a warning
This patch fixes:

  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:1489:17: error:
  unused variable 'Flags' [-Werror,-Wunused-variable]
2024-10-30 17:49:51 -07:00
Yingwei Zheng
cf9d1c1486
[SDAG] Simplify SDNodeFlags with bitwise logic (#114061)
This patch allows using enumeration values directly and simplifies the
implementation with bitwise logic. It addresses the comment in
https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.
2024-10-31 08:10:07 +08:00
Thorsten Schütt
b3bb6f18bb
[GlobalISel] Import samesign flag (#114267)
Credits: https://github.com/llvm/llvm-project/pull/111419

Fixes icmp-flags.mir

First attempt: https://github.com/llvm/llvm-project/pull/113090

Revert: https://github.com/llvm/llvm-project/pull/114256
2024-10-30 19:56:25 +01:00
Thorsten Schütt
4b028773b2
Revert "[GlobalISel] Import samesign flag" (#114256)
Reverts llvm/llvm-project#113090
2024-10-30 17:03:17 +01:00
Thorsten Schütt
72b115301d
[GlobalISel] Import samesign flag (#113090)
Credits: https://github.com/llvm/llvm-project/pull/111419
2024-10-30 16:34:01 +01:00
Petar Avramovic
84b7bcfcac
GlobalISel/MachineIRBuilder: Construct DstOp with VRegAttrs (#113581)
Allow construction of DstOp with VRegAttrs.
Also allow construction with register class or bank and LLT.
Intended to be used in lowering code for reg-bank-select where
new registers need to have both register bank and LLT.
Add support for new type of DstOp in CSEMIRBuilder.
2024-10-30 14:15:42 +01:00
Jay Foad
cea9dd833c
[CodeGen] Change MachineInstr::isConstantValuePHI to return Register. NFC. (#112901) 2024-10-30 11:58:59 +00:00
Simon Pilgrim
f7b5f0c805 [DAG] Fold (and X, (rot (not Y), Z)) -> (and X, (not (rot Y, Z)))
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT

Followup to #112547
2024-10-30 10:46:12 +00:00
Akshat Oke
44d0e9522a
[CodeGen][NewPM] Port TailDuplicate pass to NPM (#113293) 2024-10-30 11:48:40 +05:30