36459 Commits

Author SHA1 Message Date
Craig Topper
d5d1417659
[RISCV][GISel] Use libcalls for rint, nearbyint, trunc, round, and roundeven intrinsics. (#108779) 2024-09-18 12:07:44 -07:00
Craig Topper
292ee93a87
[CodeGen] Use Register in SwitchLoweringUtils. NFC (#109092)
Use an empty Register() instead of -1U.
2024-09-18 09:43:21 -07:00
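For the `-1U` to `Register()` change in #109092 above, a minimal sketch of the idea follows. `Reg`, `hasRegOld`, and `hasRegNew` are hypothetical stand-ins written for illustration; they are not the real `llvm::Register` class or any function touched by the commit.

```cpp
#include <cassert>

// Hypothetical stand-in for a register wrapper; not the real llvm::Register.
class Reg {
  unsigned Id = 0; // In this sketch, 0 means "no register".

public:
  constexpr Reg() = default;
  constexpr explicit Reg(unsigned Id) : Id(Id) {}
  constexpr bool isValid() const { return Id != 0; }
  constexpr unsigned id() const { return Id; }
};

// Old style: a magic unsigned value marks "no register".
inline bool hasRegOld(unsigned R) { return R != -1U; }

// New style: a default-constructed Reg marks "no register".
inline bool hasRegNew(Reg R) { return R.isValid(); }

inline void checkEmptyRegister() {
  assert(!hasRegOld(-1U));   // sentinel means "none"
  assert(hasRegOld(0));      // 0 is indistinguishable from a real id here
  assert(!hasRegNew(Reg())); // empty wrapper means "none"
  assert(hasRegNew(Reg(5)));
}
```

The wrapper keeps the "no register" state inside the type, so callers do not need to know which integer was chosen as the sentinel.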
Phoebe Wang
a10c9f994b
Revert "[X86][BF16] Add libcall for F80 -> BF16" (#109140)
Reverts llvm/llvm-project#109116
2024-09-18 21:35:38 +08:00
Phoebe Wang
76eda76f9f
[X86][BF16] Add libcall for F80 -> BF16 (#109116)
This fixes #108936, but the calling convention doesn't match with GCC. I
doubt we have such a lib function for now, so leave the calling
convention as is.
2024-09-18 21:23:10 +08:00
Craig Topper
9d3ab1c36e [SelectionDAGBuilder] Use Register in more places. NFC 2024-09-17 23:49:58 -07:00
Craig Topper
fe012bd52d [SelectionDAG] Use Register around RegisterSDNode related functions. NFC
RegisterSDNode itself already stored a Register.
2024-09-17 23:26:56 -07:00
Craig Topper
ca0613e0fc [LegalizeFloatTypes] Handle replacement for strict ops inside SoftPromoteHalfOp_FP_TO_XINT. NFC
Return SDValue() so we can notify the caller we did all replacements.
Restore the getNumValues() == 1 check in the assert in the caller now
that all handlers only return nodes with a single result.
2024-09-17 16:25:10 -07:00
Michael Maitland
e08c2178ef
[MachineVerifier] Fix bug in MachineVerifier for G_INSERT_SUBVECTOR (#109048) 2024-09-17 16:57:41 -04:00
Stephen Tozer
51a29b5f16 Revert2 "[DebugInfo][DWARF] Set is_stmt on first non-line-0 instruction in BB (#105524)"
Reverted due to large .debug_line size regressions for some
configurations; work currently in place to improve the output of this
behaviour in PR #108251.

This patch also modifies two tests that were created or modified after
the original commit landed and are affected by the revert:

  llvm/test/CodeGen/X86/pseudo_cmov_lower2.ll
  llvm/test/DebugInfo/X86/empty-line-info.ll

This reverts commit 5fef40c2c477e92187bd4e5c18091eca6b8465cc.
2024-09-17 18:29:20 +01:00
Craig Topper
da46244e49 Revert "[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific."
This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a.

Regression was reported in Halide for arm32.
2024-09-17 09:04:43 -07:00
Craig Topper
f36580fcb5
[LegalizeVectorOps] Remove calls to DAG.UnrollVectorsOps from some expansion handlers. NFC (#108930)
Instead, return SDValue() to tell the caller to do the unrolling. This
is consistent with how some other handlers work, especially the handlers
that live in TLI.

ExpandBITREVERSE was rewritten to not take the Results vector as an
argument.
2024-09-17 08:35:22 -07:00
David Green
2242cd2b6a
[DAG] Fold vecreduce.or(sext(x)) to sext(vecreduce.or(x)) (#108959)
The same is true for and / xor reductions, where the sext / zext can be
sunk down through the bitwise operation.
https://alive2.llvm.org/ce/z/TvzCd5
2024-09-17 15:24:00 +01:00
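The fold in #108959 above can be sanity-checked outside LLVM. The sketch below models two i8 lanes extended to i32 and compares both orders of operations for or, and, and xor over every pair of lane values. All names (`sext8`, `BitOp`, `foldHoldsForAllLanePairs`) are made up for this illustration; longer reductions follow by applying the two-lane case repeatedly.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend an 8-bit pattern to 32 bits without relying on narrowing casts.
inline int32_t sext8(uint8_t V) {
  return (static_cast<int32_t>(V) ^ 0x80) - 0x80;
}

enum class BitOp { Or, And, Xor };

// The bitwise operation performed at 32 bits.
inline int32_t apply32(BitOp Op, int32_t A, int32_t B) {
  switch (Op) {
  case BitOp::Or:
    return A | B;
  case BitOp::And:
    return A & B;
  case BitOp::Xor:
    return A ^ B;
  }
  return 0;
}

// The same operation performed at 8 bits.
inline uint8_t apply8(BitOp Op, uint8_t A, uint8_t B) {
  return static_cast<uint8_t>(apply32(Op, A, B));
}

// reduce(sext(a), sext(b)) == sext(reduce(a, b)) for all 8-bit lane pairs.
inline bool foldHoldsForAllLanePairs(BitOp Op) {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      const uint8_t LA = static_cast<uint8_t>(A);
      const uint8_t LB = static_cast<uint8_t>(B);
      if (apply32(Op, sext8(LA), sext8(LB)) != sext8(apply8(Op, LA, LB)))
        return false;
    }
  return true;
}

inline void checkReduceSextFold() {
  assert(foldHoldsForAllLanePairs(BitOp::Or));
  assert(foldHoldsForAllLanePairs(BitOp::And));
  assert(foldHoldsForAllLanePairs(BitOp::Xor));
}
```

The Alive2 link in the commit message remains the authoritative proof; this is only a brute-force check of the two-lane case.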
Mikhail R. Gadelha
d2125e1db6
[RISCV] Support STRICT_UINT_TO_FP and STRICT_SINT_TO_FP (#102503)
This patch adds support for the missing STRICT_UINT_TO_FP and
STRICT_SINT_TO_FP for riscv and adds a test case for rv32 which was
previously crashing.

The code is in line with how other strict_* nodes are handled
(e.g., getting op(1) instead of op(0) when it's a strict node, as op(0)
in a strict node is the entry token).
2024-09-17 11:21:52 -03:00
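The operand-index remark in #102503 above can be pictured with a toy node type. `Node` and `sourceOperand` are hypothetical and far simpler than `SDNode`; the only point illustrated is that a strict node carries an extra leading operand (the chain, which the message calls the entry token), so the value to convert sits one slot later.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified node: a strictness flag and operand names.
struct Node {
  bool IsStrict;
  std::vector<std::string> Ops;
};

// Strict nodes hold the chain at operand 0, so the source value is operand 1.
// Non-strict nodes hold the source value at operand 0.
inline const std::string &sourceOperand(const Node &N) {
  return N.Ops[N.IsStrict ? 1 : 0];
}

inline void checkSourceOperand() {
  const Node Plain{false, {"x"}};
  const Node Strict{true, {"chain", "x"}};
  assert(sourceOperand(Plain) == "x");
  assert(sourceOperand(Strict) == "x");
}
```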
Michael Maitland
ee2add0683
[GISEL] Fix bugs and clarify spec of G_EXTRACT_SUBVECTOR (#108848)
The implementation was missing the fact that `G_EXTRACT_SUBVECTOR`
destination and source vector can be different types.

Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate
the correct opcode.

Clarify the G_EXTRACT_SUBVECTOR specification.
2024-09-17 10:08:39 -04:00
Thorsten Schütt
acfa294b5e
[GlobalIsel] Canonicalize G_FCMP (#108891)
As a side-effect, we start constant folding fcmps.
2024-09-17 09:42:04 +02:00
Craig Topper
884ff9e3f9 [LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific.
Only scalarize single element vectors when vector FSUB is not
supported and scalar FNEG is supported.
2024-09-16 21:48:42 -07:00
David Green
960c975acd
[AArch64] Expand scmp/ucmp vector operations with sub (#108830)
Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select,
under Neon we can use the arithmetic expansion to generate fewer
instructions. Notably it also prevents the scalarization of vselect
during vector-legalization.
2024-09-16 18:44:52 +01:00
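As a rough illustration of the two expansions mentioned in #108830 above, the scalar sketch below compares a select-based three-way compare with an arithmetic one built from all-ones compare masks, which is the shape vector compares produce. The function names are invented here, and the exact node sequence the AArch64 backend emits is not stated in the commit message, so treat this as the underlying identity only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Select-based expansion: pick -1, 0 or 1 with conditional selects.
inline int32_t scmpSelect(int32_t A, int32_t B) {
  return A < B ? -1 : (A > B ? 1 : 0);
}

// Arithmetic expansion: compares yield all-ones (-1) or zero masks,
// and a single subtraction combines them.
inline int32_t scmpSub(int32_t A, int32_t B) {
  const int32_t Lt = A < B ? -1 : 0;
  const int32_t Gt = A > B ? -1 : 0;
  return Lt - Gt;
}

// Unsigned variant of the same identity.
inline int32_t ucmpSub(uint32_t A, uint32_t B) {
  const int32_t Lt = A < B ? -1 : 0;
  const int32_t Gt = A > B ? -1 : 0;
  return Lt - Gt;
}

inline void checkCmpExpansion() {
  const int32_t Min = std::numeric_limits<int32_t>::min();
  const int32_t Max = std::numeric_limits<int32_t>::max();
  const int32_t Vals[] = {Min, -1, 0, 1, Max};
  for (int32_t A : Vals)
    for (int32_t B : Vals)
      assert(scmpSelect(A, B) == scmpSub(A, B));
  assert(ucmpSub(0u, 0xFFFFFFFFu) == -1);
  assert(ucmpSub(0xFFFFFFFFu, 0u) == 1);
  assert(ucmpSub(7u, 7u) == 0);
}
```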
nebulark
f5ba3e1fa6
[CodeView] Flatten cmd args in frontend for LF_BUILDINFO (#106369) 2024-09-16 19:29:42 +02:00
Thorsten Schütt
5c348f692a
[GlobalIsel] Canonicalize G_ICMP (#108755)
As a side-effect, we start constant folding icmps.

Split out from https://github.com/llvm/llvm-project/pull/105991.
2024-09-16 19:25:34 +02:00
David Green
feac761f37
[GlobalISel][AArch64] Add G_FPTOSI_SAT/G_FPTOUI_SAT (#96297)
This is an implementation of the saturating fp to int conversions for
GlobalISel. On AArch64 the conversion instructions work this way,
producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is
ported from SDAG.

AArch64 has a lot of existing tests for fptosi_sat, covering a wide
range of types. I have tried to make most of them work all at once, but
a few fall back due to other missing features such as f128 handling for
min/max.
2024-09-16 10:33:59 +01:00
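For readers unfamiliar with the saturating conversions in #96297 above, here is a small sketch of the semantics for a double to i32 conversion: NaN maps to zero, out-of-range inputs clamp to the nearest bound, and everything else truncates toward zero. `fpToSI32Sat` is a hypothetical helper; it is not `LegalizerHelper::lowerFPTOINT_SAT` and does not mirror how that lowering is structured.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating double -> int32 conversion (illustrative semantics only).
inline int32_t fpToSI32Sat(double X) {
  constexpr int32_t Min = std::numeric_limits<int32_t>::min();
  constexpr int32_t Max = std::numeric_limits<int32_t>::max();
  if (std::isnan(X))
    return 0; // NaN saturates to zero.
  if (X <= static_cast<double>(Min))
    return Min; // Too small (including -inf): clamp to the minimum.
  if (X >= static_cast<double>(Max))
    return Max; // Too large (including +inf): clamp to the maximum.
  return static_cast<int32_t>(X); // In range: truncate toward zero.
}

inline void checkFpToSI32Sat() {
  constexpr int32_t Min = std::numeric_limits<int32_t>::min();
  constexpr int32_t Max = std::numeric_limits<int32_t>::max();
  assert(fpToSI32Sat(std::nan("")) == 0);
  assert(fpToSI32Sat(1e20) == Max);
  assert(fpToSI32Sat(-1e20) == Min);
  assert(fpToSI32Sat(HUGE_VAL) == Max);
  assert(fpToSI32Sat(-2.9) == -2);
  assert(fpToSI32Sat(2.9) == 2);
}
```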
ErikHogeman
e16ec9b45e
[SelectionDAG] Do not build illegal nodes with users (#108573)
When we build a node with an illegal type that has a user, it can end up
being processed by the DAG combiner before it is removed, which can
trigger an assert that expects the types to be legalized already.
2024-09-16 10:02:42 +01:00
Nikita Popov
dfa54298ff
[InitUndef] Enable the InitUndef pass on non-AMDGPU targets (#108353)
The InitUndef pass works around a register allocation issue, where undef
operands can be allocated to the same register as early-clobber result
operands. This may lead to ISA constraint violations, where certain
input and output registers are not allowed to overlap.

Originally this pass was implemented for RISCV, and then extended to ARM
in #77770. I've since removed the target-specific parts of the pass in
#106744 and #107885. This PR reduces the pass to use a single
requiresDisjointEarlyClobberAndUndef() target hook and enables it by
default. The hook is disabled for AMDGPU, because overlapping
early-clobber and undef operands are known to be safe for that target,
and we get significant codegen diffs otherwise.

The motivating case is the one in arm64-ldxr-stxr.ll, where we were
previously incorrectly allocating a stxp input and output to the same
register.
2024-09-16 09:48:25 +02:00
Craig Topper
a5b63b5cb7
[VirtRegMap] Store MCRegister in Virt2PhysMap. (#108775)
Remove NO_PHYS_REG in favor of MCRegister() and converting MCRegister to
bool.
2024-09-15 14:04:59 -07:00
Craig Topper
76b54df87a [StackSlotColoring] Use Register for isLoadFromStackSlot/isStoreToStackSlot result. NFC 2024-09-15 12:05:28 -07:00
Craig Topper
23953798f3 [VirtRegMap] Remove unnecessary calls to Register::id() accessing IndexMaps.
VirtReg2IndexFunctor already takes a Register.
2024-09-15 09:59:34 -07:00
Matt Arsenault
c49a1ae6d6 DAG: Reorder isFMAFasterThanFMulAndFAdd checks (NFC)
Basic legality checks should be first.
2024-09-15 16:33:01 +04:00
Robert Dazi
8837898b8d
[DAGCombine] Count leading ones: refine post DAG/Type Legalisation if promotion (#102877)
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.

This PR has two parts:

* `SDPatternMatch/MatchContext`: Slightly modify the code that matches
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of supporting only
`m_Opc`. Some tests were added to guard against regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detects and rewrites the
patterns using a matching context.

Remaining Tasks:

- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead?
- [ ] Style: Add a visitVP_SUB? Move `foldSubCtlzNot` to another
location for style consistency?

@topperc

---------

Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
2024-09-15 15:48:36 +04:00
Simon Pilgrim
5910e8d607 [DAG] visitUDIV - call SimplifyDemandedBits to handle hidden constant foldable cases
Fixes #108728
2024-09-15 12:29:28 +01:00
Craig Topper
367c145e5f
[IRTranslator][RISCV] Support scalable vector zeroinitializer. (#108666) 2024-09-14 15:46:18 -07:00
Craig Topper
947374c393
[IRTranslator] Simplify fixed vector ConstantAggregateZero handling. NFC (#108667)
We don't need to loop through the elements, they're all the same zero.
We can get the first element and create a splat build_vector.
2024-09-13 22:02:29 -07:00
Lawrence Benson
b74e779219
[x86] Add lowering for @llvm.experimental.vector.compress (#104904)
This is a follow-up to #92289 that adds lowering of the new
`@llvm.experimental.vector.compress` intrinsic on x86 with AVX512
instructions. This intrinsic maps directly to `vpcompress`.
2024-09-13 21:48:01 +02:00
Kazu Hirata
3a274584eb
[LiveDebugValues] Avoid repeated hash lookups (NFC) (#108484) 2024-09-13 10:41:45 -07:00
Kazu Hirata
b9d85b1263
[CodeGen] Use DenseMap::operator[] (NFC) (#108489)
Once we modernize CopyInfo with default member initializations,

  Copies.insert({Unit, ...})

becomes equivalent to:

  Copies.try_emplace(Unit)

which we can simplify further down to Copies[Unit].
2024-09-13 10:04:33 -07:00
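The equivalence described in #108489 above can be tried with `std::unordered_map` standing in for `DenseMap` (an assumption made so the sketch compiles without LLVM headers). The `CopyInfo` fields below are invented; only the use of default member initializers matters.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical value type with default member initializers.
struct CopyInfo {
  int DefCount = 0;
  bool Avail = true;
};

using CopyMap = std::unordered_map<unsigned, CopyInfo>;

inline bool sameInfo(const CopyInfo &L, const CopyInfo &R) {
  return L.DefCount == R.DefCount && L.Avail == R.Avail;
}

inline void checkThreeInsertForms() {
  CopyMap A, B, C;
  const unsigned Unit = 7;

  A.insert({Unit, CopyInfo{}}); // explicit default-constructed value
  B.try_emplace(Unit);          // value constructed in place from no args
  C[Unit];                      // value-initialized on first access

  assert(sameInfo(A.at(Unit), B.at(Unit)));
  assert(sameInfo(B.at(Unit), C.at(Unit)));

  // None of the three forms overwrites an entry that already exists.
  C[Unit].DefCount = 3;
  C.insert({Unit, CopyInfo{}});
  C.try_emplace(Unit);
  assert(C[Unit].DefCount == 3);
}
```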
Simon Pilgrim
69a21154ca
[DAG] Fold trunc(srl(extract_elt(vec,c1),c2)) -> extract_elt(bitcast(vec),c3) (#107987)
Extends existing trunc(extract_elt(vec,c1)) -> extract_elt(bitcast(vec),c3) fold.

Noticed while working on #107404
2024-09-13 15:13:58 +01:00
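A concrete instance of the fold in #107987 above, assuming a little-endian lane layout, a v2i64 source, an i32 result, and a shift amount that is a multiple of 32. Under those assumptions the new index works out to `c3 = c1 * 2 + c2 / 32`; that formula is derived here for illustration and is not quoted from the patch. The bitcast is written by hand so the check does not depend on the host byte order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2i64 = std::array<uint64_t, 2>;
using V4i32 = std::array<uint32_t, 4>;

// Little-endian bitcast v2i64 -> v4i32: low half of each lane comes first.
inline V4i32 bitcastLE(const V2i64 &V) {
  return {static_cast<uint32_t>(V[0]), static_cast<uint32_t>(V[0] >> 32),
          static_cast<uint32_t>(V[1]), static_cast<uint32_t>(V[1] >> 32)};
}

// Original form: trunc(srl(extract_elt(vec, c1), c2)).
inline uint32_t truncSrlExtract(const V2i64 &V, unsigned C1, unsigned C2) {
  assert(C1 < 2 && (C2 == 0 || C2 == 32));
  return static_cast<uint32_t>(V[C1] >> C2);
}

// Folded form: extract_elt(bitcast(vec), c3).
inline uint32_t extractOfBitcast(const V2i64 &V, unsigned C1, unsigned C2) {
  assert(C1 < 2 && (C2 == 0 || C2 == 32));
  const unsigned C3 = C1 * 2 + C2 / 32;
  return bitcastLE(V)[C3];
}

inline void checkTruncSrlExtractFold() {
  const V2i64 V = {0x1122334455667788ull, 0x99AABBCCDDEEFF00ull};
  for (unsigned C1 = 0; C1 < 2; ++C1)
    for (unsigned C2 = 0; C2 <= 32; C2 += 32)
      assert(truncSrlExtract(V, C1, C2) == extractOfBitcast(V, C1, C2));
}
```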
Juan Manuel Martinez Caamaño
09a4c23eb4
[NFC][EarlyIfConverter] Turn SSAIfConv into a local variable (#107390) 2024-09-13 10:43:33 +02:00
Matt Arsenault
9578db9c11 DAG: Handle atomic fsub in node dumper 2024-09-13 10:22:27 +04:00
Craig Topper
a30b1d5a38 [SelectionDAG] Use Register in a few places in InstrEmitter. NFC 2024-09-12 10:29:17 -07:00
Craig Topper
8c05515032
[LegalizeIntegerTypes] Simplify ExpandIntRes_FP_TO_XINT when operand needs to be SoftPromoted. (#107634)
Create an FP_EXTEND instead of handling the soft promote directly. This
FP_EXTEND will be visited and soft promoted itself.

This removes a zero extend from the generated code when the f32 type is
itself softened. Previously we softened it as an fp16_to_fp which sees
the operand as an integer type so we extend it. When we soften the
result as an fp_extend we see the source as f16 and don't extend. It
only becomes an integer inside call lowering not by type legalization.

If this extend is really necessary, then we have an issue when an
f16->f32 fp_extend exists in the source and f32 needs to be softened.

This simplifies part of #102503.
2024-09-12 08:28:06 -07:00
Joe Faulls
bf8101e4fd
[CodeGen] Clear InitUndef pass new register cache between pass runs (#90967)
Multiple invocations of the pass could interfere with each other,
preventing some undefs from being initialised.

I found it very difficult to create a unit test for this due to it being
dependent on particular allocations of a previous function. However, the
bug can be observed here: https://godbolt.org/z/7xnMo41Gv with the
creation of the illegal instruction `vnsrl.wi v9, v8, 0`
2024-09-12 15:01:55 +02:00
Nikita Popov
e2723c2a8a
[InitUndef] Only compute DeadLaneDetector if subreg liveness enabled (NFC) (#108279)
InitUndef currently always computes DeadLaneDetector, but only actually
uses it if subreg liveness is enabled for the target. Make the
calculation optional to avoid an unnecessary compile-time impact for
targets that don't enable subreg liveness.
2024-09-12 09:00:47 +02:00
Thorsten Schütt
ba4bcce5f5
[GlobalIsel] Combine trunc of binop (#107721)
trunc (binop X, C) --> binop (trunc X, trunc C)  --> binop (trunc X, C`)

Try to narrow the width of math or bitwise logic instructions by pulling
a truncate ahead of binary operators.

Vx and Nx cores consider 32-bit and 64-bit basic arithmetic equal in
costs.
2024-09-11 15:04:55 +02:00
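The rewrite in #107721 above relies on truncation commuting with operations whose low result bits depend only on the low input bits. The sketch below checks that property for a 64 to 32 bit truncation on add, sub, mul, and, or, and xor. Which opcodes the combine actually handles is not listed in the message, so the set here is an assumption, and `trunc32` and `narrowingHolds` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// trunc from 64 to 32 bits.
inline uint32_t trunc32(uint64_t V) { return static_cast<uint32_t>(V); }

// True if trunc(binop X, C) == binop(trunc X, trunc C) for each listed op.
inline bool narrowingHolds(uint64_t X, uint64_t C) {
  const uint32_t TX = trunc32(X), TC = trunc32(C);
  return trunc32(X + C) == static_cast<uint32_t>(TX + TC) &&
         trunc32(X - C) == static_cast<uint32_t>(TX - TC) &&
         trunc32(X * C) == static_cast<uint32_t>(TX * TC) &&
         trunc32(X & C) == static_cast<uint32_t>(TX & TC) &&
         trunc32(X | C) == static_cast<uint32_t>(TX | TC) &&
         trunc32(X ^ C) == static_cast<uint32_t>(TX ^ TC);
}

inline void checkTruncOfBinop() {
  assert(narrowingHolds(0xFFFFFFFFFFFFFFFFull, 1));
  assert(narrowingHolds(0x123456789ABCDEF0ull, 0x00000001FFFFFFFFull));
  assert(narrowingHolds(0, 0x8000000080000000ull));
}
```

Right shifts and divisions do not have this property, because their low result bits depend on high input bits.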
Nikita Popov
1e3a24d2e4
[InitUndef] Don't use largest super class (#107885)
The InitUndef pass currently uses the getLargestSuperClass() hook (which
is only used by that pass) to choose the register to initialize. This was done
to reduce the number of undef init pseudos needed, e.g. so that the vrnov0
regclass would use the same pseudo as v0. After #106744 we use a single
generic pseudo, so this is no longer necessary.
2024-09-11 09:36:20 +02:00
YunQiang Su
5773adb0bf
SelectionDAG: Remove unneeded getSelectCC in expandFMINIMUMNUM_FMAXIMUMNUM (#107416)
ISD::FCANONICALIZE is enough, which can process NaN or non-NaN
correctly, thus getSelectCC is not needed here.
2024-09-11 09:53:04 +08:00
Craig Topper
d2f25e5405 [LegalizeTypes] Avoid creating an unused node in ExpandIntRes_ADDSUB. NFC
The Hi result is sometimes calculated a different way and this
node goes unused. Defer creation until we know for sure it is needed.

The test changes are because the node creation order changed the names
in the debug output.
2024-09-10 16:39:19 -07:00
Kyungwoo Lee
bf68403484
Attempt to fix [CGData][MachineOutliner] Global Outlining (#90074) (#108037) 2024-09-10 08:21:25 -07:00
Kyungwoo Lee
0f52545289
[CGData][MachineOutliner] Global Outlining (#90074)
This commit introduces support for outlining functions across modules
using codegen data generated from previous codegen. The codegen data
currently manages the outlined hash tree, which records outlining
instances that occurred locally in the past.
    
The machine outliner now operates in one of three modes:

1. CGDataMode::None: This is the default outliner mode that uses the
suffix tree to identify (local) outlining candidates within a module.
This mode is also used by (full)LTO to maintain optimal behavior with
the combined module.
2. CGDataMode::Write (`-codegen-data-generate`): This mode is identical
to the default mode, but it also publishes the stable hash sequences of
instructions in the outlined functions into a local outlined hash tree.
It then encodes this into the `__llvm_outline` section, which will be
dead-stripped at link time.
3. CGDataMode::Read (`-codegen-data-use-path={.cgdata}`): This mode
reads a codegen data file (.cgdata) and initializes a global outlined
hash tree. This tree is used to generate global outlining candidates.
Note that the codegen data file has been post-processed with the raw
`__llvm_outline` sections from all native objects using the
`llvm-cgdata` tool (or a linker, `LLD`, or a new ThinLTO pipeline
later).

This depends on https://github.com/llvm/llvm-project/pull/105398. After
this PR, LLD (https://github.com/llvm/llvm-project/pull/90166) and Clang
(https://github.com/llvm/llvm-project/pull/90304) will follow for each
client side support.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-09-10 06:56:31 -07:00
Simon Pilgrim
7e07c1df67 [DAG] expandAVG - consistently use getShiftAmountConstant for constant shift amounts. NFC 2024-09-10 09:25:58 +01:00
Tobias Stadler
2d338bed00
[CodeGen] Refactor DeadMIElim isDead and GISel isTriviallyDead (#105956)
Merge GlobalISel's isTriviallyDead and DeadMachineInstructionElim's
isDead code and remove all unnecessary checks from the hot path by
looping over the operands before doing any other checks.

See #105950 for why DeadMIElim needs to remove LIFETIME markers even
though they probably shouldn't generally be considered dead.

x86 CTMark O3: -0.1%
AArch64 GlobalISel CTMark O0: -0.6%, O2: -0.2%
2024-09-09 16:30:44 +02:00
Jeremy Morse
7a930ce327
[DWARF] Emit a minimal line-table for totally empty functions (#107267)
In degenerate but legal inputs, we can have functions that have no source
locations at all -- all the DebugLocs attached to instructions are empty.
LLVM didn't produce any source location for the function; with this patch
it will at least emit the function-scope source location. Demonstrated by
empty-line-info.ll

The XCOFF test modified has similar symptoms -- with this patch, the size
of the ".dwline" section grows a bit, thus shifting some of the file
internal offsets, which I've updated.
2024-09-09 12:54:45 +01:00
Craig Topper
f2b71491d1
[MC] Make MCRegisterInfo::getLLVMRegNum return std::optional<MCRegister>. NFC (#107776) 2024-09-08 21:21:51 -07:00