36825 Commits

Author SHA1 Message Date
Petar Avramovic
87503fa51c
Revert "AMDGPU/GlobalISel: Add stub custom regbankselect pass" (#113913)
This reverts commit e9c49901a43f5b16c3df416460b7e4dbdd24ce03.
Current AMDGPURegBankSelect does nothing different than RegBankSelect.
Revert to using generic RegBankSelect in preparation for adding new
regbankselect passes. New AMDGPURegBankSelect, that will use uniformity
analysis for regbank select decisions, will not subclass RegBankSelect.
Revert regression tests to use regbankselect since amdgpu-regbankselect
will be used by new pass and behavior will be different.
2024-11-27 13:16:22 -05:00
Igor Kirillov
e874c8fc27
[SelectOpt] Refactor to prepare for support more select-like operations (#117582)
* Enables conversion of several select-like instructions within one
group
* Any number of auxiliary instructions depending on the same condition
can be in between select-like instructions
* After splitting the basic block, move select-like instructions into
the relevant basic blocks and optimise them
* Make it easier to add support for shift-based select-like instructions
and also any mixture of zext/sext/not instructions
2024-11-27 11:35:59 +00:00
Pengcheng Wang
3618c9930f
[MISched] Use right boundary when trying latency heuristics (#116592)
We may do bottom-up or bidirectional scheduling, but previously we
assumed we were doing top-down scheduling, which may cause some issues.
2024-11-27 14:46:05 +08:00
Sergei Barannikov
61a23646c9
[SjLjEHPrepare] Configure call sites correctly (#117656)
After 9fe78db4, the pass inserts `store volatile i32 -1, ptr %call_site`
before all invoke instructions except the one in the entry block, which
has the effect of bypassing landing pads on exceptions.

When configuring the call site for a potentially throwing instruction
check that it is not `InvokeInst` -- they are handled by earlier code.
2024-11-27 08:03:47 +03:00
antangelo
dd4844722d
[SelectionDAG] Add generic implementation for @llvm.expect.with.probability when optimizations are disabled (#117459)
Handle @llvm.expect.with.probability in SelectionDAGBuilder, FastISel,
and IntrinsicLowering in the same way @llvm.expect is handled, where
the value is passed through as-is. This can be reached if the intrinsic
is used without optimizations, where it would otherwise be properly
transformed out.

Fixes #115411 for SelectionDAG. A similar patch is likely needed for
GlobalISel.
2024-11-26 20:22:25 -05:00
Jonas Paulsson
175e0dd422
[MachineLateInstrsCleanup] Minor fixing (NFC). (#117816)
With cb57b7a7, MachineLateInstrsCleanup switched to using a map to keep
track of kill flags to remedy compile time regressions seen with huge
functions. It seems that the comment above clearKillsForDef() became stale with
that commit, and also that one of the arguments to it became unused,
both of which this patch fixes.
2024-11-27 01:41:42 +01:00
Craig Topper
43b6b78771
[RISCV][GISel] Use libcalls for f32/f64 G_FCMP without F/D extensions. (#117660)
LegalizerHelper only supported f128 libcalls and incorrectly assumed that
the destination register for the G_FCMP was s32.
2024-11-26 15:48:49 -08:00
Jeremy Morse
624e52b1e3
[DebugInfo] Handle trailing empty blocks when seeking prologue_end spot (#117320)
The optimiser will produce empty blocks that are unconditionally
executed according to the CFG -- while it may not be meaningful code,
and won't get a prologue_end position, we need to not crash on this
input.

The fault comes from assuming that there's always a next block with some
instructions in it, that will eventually produce some meaningful control
flow to stop at -- in the given reproducer in issue #117206 this isn't
true, because the function terminates with `unreachable`. Thus, I've
refactored the "get next instruction logic" into a helper that'll step
through all blocks and terminate if there aren't any more.

Reproducer from aeubanks
2024-11-26 14:24:25 +00:00
Nikita Popov
3e1b55cafc
[SDAG] Don't allow implicit trunc in getConstant() (#117558)
Assert that the passed value is a valid unsigned integer value for the
specified type.

For signed values getSignedConstant() / getSignedTargetConstant() should
be used instead.
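
As a rough illustration of why the two entry points differ, the sketch below shows the kind of range check involved. `fitsUnsigned` and `fitsSigned` are hypothetical helper names written for this example; they are not the SelectionDAG API and only model the idea that a value must be representable in the target width without truncation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if Val is representable as an unsigned integer
// of Bits bits (1 <= Bits <= 64).
inline bool fitsUnsigned(uint64_t Val, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "unsupported width");
  return Bits == 64 || Val < (uint64_t{1} << Bits);
}

// Hypothetical helper: true if Val is representable as a two's complement
// signed integer of Bits bits (1 <= Bits <= 64).
inline bool fitsSigned(int64_t Val, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "unsupported width");
  if (Bits == 64)
    return true;
  const int64_t Min = -(int64_t{1} << (Bits - 1));
  const int64_t Max = (int64_t{1} << (Bits - 1)) - 1;
  return Val >= Min && Val <= Max;
}
```

Under this model, -1 converted to a 64-bit unsigned value has all bits set, so it does not fit in 8 unsigned bits, while the signed check accepts it. That is the situation in which a caller would be expected to switch to the signed variant.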
2024-11-26 10:36:00 +01:00
Craig Topper
bc282605df
[SelectionDAG] Require last operand of (STRICT_)FP_ROUND to be a TargetConstant. (#117639)
Fix all the places I could find that didn't do this. We were already
mostly correct for FP_ROUND after
9a976f36615dbe15e76c12b22f711b2e597a8e51, but not STRICT_FP_ROUND.
2024-11-25 21:36:33 -08:00
Philip Reames
6657d4bd70
[TTI][RISCV] Unconditionally break critical edges to sink ADDI (#108889)
This looks like a rather weird change, so let me explain why this isn't
as unreasonable as it looks. Let's start with the problem it's solving.

```
define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) {
bb:
  %i = icmp eq i32 %arg1, 1
  br i1 %i, label %bb2, label %bb5

bb2:                                              ; preds = %bb
  %i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4
  %i4 = load i32, ptr %i3, align 4
  br label %bb5

bb5:                                              ; preds = %bb2, %bb
  %i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ]
  ret i32 %i6
}
```

Right now, we codegen this as:

```
	li	a3, 1
	li	a2, 13
	bne	a1, a3, .LBB0_2
	lw	a2, 4(a0)
.LBB0_2:
	mv	a0, a2
	ret
```

In this example, we have two values which must be assigned to a0 per the
ABI (%arg, and the return value). SelectionDAG ensures that all values
used in a successor phi are defined before exiting the predecessor block.
This creates an ADDI to materialize the immediate in the entry block.

Currently, this ADDI is not sunk into the tail block because we'd have
to split a critical edge to do so. Note that if our immediate was
anything large enough to require two instructions we *would* split this
critical edge.

Looking at other targets, we notice that they don't seem to have this
problem. They perform the sinking, and tail duplication that we don't.
Why? Well, it turns out for AArch64 that this is entirely an accident of
the existence of the gpr32all register class. The immediate is
materialized into the gpr32 class, and then copied into the gpr32all
register class. The existence of that copy puts us right back into the
two instruction case noted above.

This change essentially just bypasses this emergent aspect of the
AArch64 behavior, and implements the same "always sink immediates"
behavior for RISCV as well.
2024-11-25 18:59:31 -08:00
Craig Topper
ebcaa57715
[GISel] #undef macros when they are no longer needed. NFC (#117652)
These macros are created inside a function. They should be undefined
before the end of the function.
2024-11-25 18:00:03 -08:00
Craig Topper
c2bb056482
[SelectionDAG][RISCV][AArch64] Allow f16 STRICT_FLDEXP to be promoted. Fix integer promotion of STRICT_FLDEXP in type legalizer. (#117633)
A special case in type legalization wasn't accounting for different
operand numbering between FLDEXP and STRICT_FLDEXP.

AArch64 already asked STRICT_FLDEXP to be promoted, but had no test for
it.
2024-11-25 16:12:45 -08:00
Kyungwoo Lee
fe69a20cc1
Reland [CGData][GMF] Skip No Params (#116548)
This update follows up on change #112671 and is mostly NFC, with the following exceptions:
  - Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required.
  - Parameter count is now calculated based on the unique hash count.
  - Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size.
  - Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time.
  - Moved a sorting operation outside of the loop.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-11-25 13:55:02 -08:00
Kyungwoo Lee
fe3c23b439
Revert "[CGData][GMF] Skip No Params (#116548)"
This reverts commit fdf1f69c57ac3667d27c35e097040284edb1f574.
2024-11-25 11:09:29 -08:00
Kyungwoo Lee
fdf1f69c57
[CGData][GMF] Skip No Params (#116548)
This update follows up on change #112671 and is mostly NFC, with the following exceptions:
  - Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required.
  - Parameter count is now calculated based on the unique hash count.
  - Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size.
  - Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time.
  - Moved a sorting operation outside of the loop.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-11-25 10:57:41 -08:00
Igor Kirillov
4a7a27cb1c
Revert "[SelectOpt] Refactor to prepare for support more select-like operations (#115745)"
This reverts commit b5a11d378db4b39ceb085ebd59c941e9369d9596.
2024-11-25 14:40:35 +00:00
David Sherwood
9b76e7fc60
Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)" (#117556)
This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.
2024-11-25 13:49:21 +00:00
Igor Kirillov
b5a11d378d
[SelectOpt] Refactor to prepare for support more select-like operations (#115745)
* Enables conversion of several select-like instructions within one
group
* Any number of auxiliary instructions depending on the same condition
can be in between select-like instructions
* After splitting the basic block, move select-like instructions into
the relevant basic blocks and optimise them
* Make it easier to add support for shift-based select-like instructions
and also any mixture of zext/sext/not instructions
2024-11-25 12:59:09 +00:00
David Sherwood
22ec44f509
[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)
For IR like this:

  %icmp = icmp ult <4 x i32> %a, splat (i32 5)
  %res = extractelement <4 x i1> %icmp, i32 1

where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such as add, sub, etc. and convert
this into

  %ext = extractelement <4 x i32> %a, i32 1
  %res = icmp ult i32 %ext, 5

For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.

NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file

CodeGen/AArch64/extract-vector-cmp.ll

because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
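
The equivalence behind this transform can be sketched on plain arrays. The code below is only a model of the IR above, using a hypothetical `Vec4` alias and two hypothetical functions; it does not touch DAGCombiner and assumes unsigned less-than as the predicate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Original form: compare every lane against the splat value, then extract
// a single lane of the boolean result.
inline bool extractOfCompare(const Vec4 &A, uint32_t Splat, std::size_t Idx) {
  assert(Idx < A.size() && "lane index out of range");
  std::array<bool, 4> Cmp{};
  for (std::size_t I = 0; I < A.size(); ++I)
    Cmp[I] = A[I] < Splat; // lane-wise icmp ult
  return Cmp[Idx];
}

// Scalarised form: extract the lane first, then do one scalar compare.
inline bool compareOfExtract(const Vec4 &A, uint32_t Splat, std::size_t Idx) {
  assert(Idx < A.size() && "lane index out of range");
  return A[Idx] < Splat;
}
```

Both functions return the same value for every lane, which is why the rewrite is safe when the vector compare has no other users. The scalar form does a quarter of the comparisons in this model.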
2024-11-25 09:25:01 +00:00
Nikita Popov
3317c9ceac
[AMDGPU] Use getSignedConstant() where necessary (#117328)
Create signed constant using getSignedConstant(), to avoid future
assertion failures when we disable implicit truncation in getConstant().

This also touches some generic legalization code, which apparently only
AMDGPU tests.
2024-11-25 09:49:34 +01:00
Piyou Chen
7317a6e990
[RISCV][MachineVerifier] Use RegUnit for register liveness checking (#115980)
For the RISC-V target, V14_V15 are not subregisters of v14m4, even
though they share some registers. Currently, the MachineVerifier reports
an error when checking register liveness for segment load/store
operations.

This patch adds additional register liveness checking, using RegUnit
instead of subregisters, to prevent this error.
2024-11-25 12:43:39 +08:00
LiqinWeng
02408d6b28
[VP] Refactoring some functions in ExpandVectorPredication.NFC (#115840)
Building vp intrinsic functions using a unified interface for
expandPredicationToIntCall/expandPredicationToFPCall/expandPredicationToCastIntrinsic
functions.
2024-11-25 10:05:29 +08:00
David Green
d3ce069572
[AArch64][GlobalISel] Legalize ptr shuffle vector to s64 (#116013)
This converts all ptr element shuffle vectors to s64, so that the
existing vector legalization handling can lower them as needed. This
prevents a lot of fallbacks that currently try to generate things like
`<2 x ptr> G_EXT`.

I'm not sure if bitcast/inttoptr/ptrtoint is intended to be necessary
for vectors of pointers, but it uses buildCast for the casts, which now
generates a ptrtoint/inttoptr.
2024-11-23 17:00:51 +00:00
Rahman Lavaee
68f7b075c0
[BasicBlockSections] Allow mixing of -basic-block-sections with MFS. (#117076)
This PR allows mixing `-basic-block-sections` with
`-enable-machine-function-splitter`. The strategy is to let
`-basic-block-sections` take precedence over functions with profiles.
2024-11-22 22:23:29 -08:00
Félix-Antoine Constantin
7a56dc7245
[Clang] Attribute NoFPClass should not prevent tail call optimization. (#116741)
Fixes #111950
2024-11-22 17:28:45 -08:00
Lei Wang
cf83a7fdc2
[SHT_LLVM_BB_ADDR_MAP] Add an option to skip emitting bb entries (#114447)
Sometimes we want to use a `PgoAnalysisMap` feature that doesn't require
the BB entries info, e.g. only the `FuncEntryCount`, but the BB entries
are emitted by default, so I'm adding an option to skip them in this
case to reduce binary size (this can save ~90% of the section's size).
The implementation adds a new field (`OmitBBEntries`) to
`BBAddrMap::Features`, controlled by the switch
`--basic-block-address-map-skip-bb-entries`.

Note that this is naturally backwards compatible: the field is zero in
the old version, which matches the decoding in the new version of LLVM.
2024-11-22 11:51:34 -08:00
Vitaly Buka
14a58a1390
Revert "[RegisterCoalescer] Fix up subreg lanemasks after rematerializing. (#116191)" (#117367)
To pass tests with #117307 revert.

This reverts commit 3093b29b597b9a936a3e4d1c8bc4a7ccba8fc848.
2024-11-22 11:42:43 -08:00
Vitaly Buka
1683f84d28
Revert "[InitUndef] handleSubReg should skip artificial subregs. (#116248)" (#117365)
This may not be needed, but it avoids conflicts with #117307.
With #117307 reverted but this commit left in place, the
regenerated init-undef.mir became empty.

This reverts commit be15fd5085680cc5ed9ec4f4f2258b504cdd55db.
2024-11-22 11:24:48 -08:00
Akshat Oke
cac13606c2
[CodeGen][NewPM] Port EdgeBundles analysis to NPM (#116616)
2024-11-22 16:51:50 +05:30
Daniel Sanders
30df659495
[GlobalISel] Correct comment about type vs register class (#116083)
Type and register class aren't mutually exclusive in gMIR but there's also
no target-independent requirement (yet?) to have both on target instructions.
2024-11-21 11:18:34 -08:00
Finn Plummer
8663b8777e
[NFC][VectorUtils][TargetTransformInfo] Add isVectorIntrinsicWithOverloadTypeAtArg api (#114849)
This change allows target intrinsics to specify and overwrite overloaded types.

- Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case
- Updates `SLPVectorizer` to use available TTI
- Updates `VPTransformState` to pass down TTI
- Updates `VPlanRecipe` to use passed-down TTI

This change will let us add scalarization for `asdouble`:  #114847
2024-11-21 11:04:25 -08:00
Florian Hahn
ef102b4a63
[MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987)
The improvements in 63917e1 / #70796 do not check for memory
barriers/unmodelled side effects, which means we may incorrectly hoist
loads across memory barriers.

Fix this by checking whether any machine instruction in the loop is a
load-fold barrier.

PR: https://github.com/llvm/llvm-project/pull/116987
2024-11-21 10:25:04 +00:00
Jonathan Cohen
00d383ee9d
[DAGCombiner] Limit steps in shouldCombineToPostInc (#116030)
Currently the function will walk the entire DAG to find other candidates
to perform a post-inc store. This leads to very long compilation times
on large functions. Added a MaxSteps limit to avoid this, which is also
aligned to how hasPredecessorHelper is used elsewhere in the code.
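
The general shape of a step-limited walk can be sketched as follows. This is a generic worklist search over a hypothetical adjacency-list `Graph`, not the DAGCombiner code; `reachesWithin` and its parameters are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical graph: G[N] lists the nodes reachable from N in one step.
using Graph = std::vector<std::vector<std::size_t>>;

// Returns true or false if reachability of Target from Start was decided
// after expanding at most MaxSteps nodes, or std::nullopt if the budget ran
// out first.
inline std::optional<bool> reachesWithin(const Graph &G, std::size_t Start,
                                         std::size_t Target,
                                         unsigned MaxSteps) {
  assert(Start < G.size() && Target < G.size() && "node out of range");
  std::vector<std::size_t> Worklist{Start};
  std::unordered_set<std::size_t> Visited;
  unsigned Steps = 0;
  while (!Worklist.empty()) {
    std::size_t N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue; // already expanded
    if (N == Target)
      return true;
    if (++Steps > MaxSteps)
      return std::nullopt; // budget exhausted, result unknown
    for (std::size_t S : G[N])
      Worklist.push_back(S);
  }
  return false;
}
```

A caller that receives `std::nullopt` has to pick the conservative answer, which in an optimization context usually means not performing the transform. The cost of the search is then bounded by `MaxSteps` rather than by the size of the graph.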
2024-11-21 11:58:37 +02:00
abhishek-kaushik22
a23260087d
[SDAG] [X86] Extend SplitVecOp_VSETCC for STRICT_FSETCCS (#116768)
Closes #116767
2024-11-21 17:43:01 +08:00
abhishek-kaushik22
46f43b6d92
[DebugInfo][InstrRef][MIR][GlobalIsel][MachineLICM] NFC Use std::move to avoid copying (#116935)
2024-11-21 13:37:56 +05:30
Daniel Hoekwater
07137ce3e1
[CFIFixup] Add frame info to the first block of each section (#113626)
Now that `-fbasic-block-sections=list` is enabled for Arm, functions may
be split across multiple sections, and CFI information must be handled
independently for each section.

On x86, this is handled in `llvm/lib/CodeGen/CFIInstrInserter.cpp`.
However, this pass does not run on Arm, so we must add logic for it
to `llvm/lib/CodeGen/CFIFixup.cpp`.
2024-11-20 17:40:30 -05:00
Simon Pilgrim
e2368afbd0
Fix GCC Wparentheses warning in assert condition / message. NFC.
2024-11-20 17:02:23 +00:00
Benjamin Maxwell
0a1795f781
[SDAG] Generalize FSINCOS type legalization (NFC) (#116848)
There's nothing that specific to FSINCOS about these; they could be used
for similar nodes in the future.
2024-11-20 10:56:39 +00:00
Ramkumar Ramachandra
2b5214b9e1
IR: de-duplicate two CmpInst routines (NFC) (#116866)
De-duplicate the functions getSignedPredicate and getUnsignedPredicate,
nearly identical versions of which were present in CmpInst and ICmpInst,
to reduce confusion.
2024-11-20 09:30:35 +00:00
Ellis Hoag
e72209db35
[MachineSink] Fix stable sort comparator (#116705)
Fix the comparator in `stable_sort()` to satisfy the strict weak
ordering requirement.

In https://github.com/llvm/llvm-project/pull/115367 this comparator was
changed to use `getCycleDepth()` when `shouldOptimizeForSize()` is true.
However, I mistakenly changed the logic so that we use `LHSFreq <
RHSFreq` if **either** of them is zero. This causes us to fail the last
requirement (https://en.cppreference.com/w/cpp/named_req/Compare).

> if comp(a, b) == true and comp(b, c) == true then comp(a, c) == true
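
To make the transitivity requirement concrete, here is a small self-contained sketch. It is not the MachineSink comparator: `Block`, `mixedKeyLess`, and `lexicographicLess` are hypothetical, and `mixedKeyLess` is simply one example of a comparator that chooses its sort key per pair, a pattern that can break transitivity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a block: a profile frequency and a cycle depth.
struct Block {
  uint64_t Freq;
  unsigned Depth;
};

// Illustrative comparator that picks its key per pair: frequency when both
// frequencies are non-zero, cycle depth otherwise.
inline bool mixedKeyLess(const Block &L, const Block &R) {
  if (L.Freq != 0 && R.Freq != 0)
    return L.Freq < R.Freq;
  return L.Depth < R.Depth;
}

// Comparator that always uses the same key order for every pair.
inline bool lexicographicLess(const Block &L, const Block &R) {
  return std::tie(L.Freq, L.Depth) < std::tie(R.Freq, R.Depth);
}

// Brute-force check of the quoted requirement over a small set of elements.
template <typename Compare>
bool isTransitive(const std::vector<Block> &Blocks, Compare Comp) {
  for (const Block &A : Blocks)
    for (const Block &B : Blocks)
      for (const Block &C : Blocks)
        if (Comp(A, B) && Comp(B, C) && !Comp(A, C))
          return false;
  return true;
}

// Debug-style wrapper: verify the comparator on the input before sorting.
template <typename Compare>
void sortChecked(std::vector<Block> &Blocks, Compare Comp) {
  assert(isTransitive(Blocks, Comp) && "comparator is not transitive");
  std::stable_sort(Blocks.begin(), Blocks.end(), Comp);
}
```

With the blocks `{5, 2}`, `{7, 1}`, and `{0, 2}`, `mixedKeyLess` orders the first before the second (by frequency) and the second before the third (by depth), but not the first before the third, so `isTransitive` reports a violation. The cubic check is only practical for small inputs or debug builds.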
2024-11-19 16:15:35 -08:00
Yashas Andaluri
b28eebf926
[RDF] Fix cover check when linking refs to defs (#113888)
During RDF graph construction, linkRefUp method links a register ref to
its upward reaching defs until all RegUnits of the ref have been covered
by defs.
However, when a sub-register def covers some, but not all, of the
RegUnits of a previous super-register def, a super-register ref is not
linked to the super-register def.
This can result in certain super register defs being dead code
eliminated.

This patch fixes the cover check for a register ref. A def must be
skipped only when all RegUnits of that def have already been covered by
a previously seen def.
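
The corrected rule can be modelled with bitmasks, one bit per register unit. This is a toy model under the assumption that defs are visited from nearest to farthest; `UnitMask` and `reachingDefs` are hypothetical names and do not correspond to the RDF data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: each def is a bitmask of the register units it writes.
using UnitMask = uint32_t;

// Walks defs from nearest to farthest and returns the indices of the defs a
// ref should be linked to. A def is skipped only if every unit it writes has
// already been covered by a nearer def.
inline std::vector<std::size_t>
reachingDefs(UnitMask RefUnits, const std::vector<UnitMask> &DefsNearestFirst) {
  assert(RefUnits != 0 && "ref must read at least one register unit");
  std::vector<std::size_t> Linked;
  UnitMask Covered = 0;
  for (std::size_t I = 0; I < DefsNearestFirst.size(); ++I) {
    UnitMask Def = DefsNearestFirst[I];
    if ((Def & RefUnits) == 0)
      continue; // def does not touch the ref's units
    if ((Def & ~Covered) == 0)
      continue; // all units of this def are already covered
    Linked.push_back(I);
    Covered |= Def;
    if ((RefUnits & ~Covered) == 0)
      break; // every unit of the ref now has a def
  }
  return Linked;
}
```

In this model, a super-register ref with units `0b11` that sees a sub-register def `0b01` followed by a super-register def `0b11` links to both, because the second def still contributes an uncovered unit.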
2024-11-19 12:38:36 -06:00
Zaara Syeda
8e4423eb08
[AsmPrinter] Fix handling in emitGlobalConstantImpl for AIX (#116255)
When GlobalMerge creates a MergedGlobal of statics all initialized to
zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results
in just emitting zeros followed by labels for the aliases. We need to
handle it more like how emitGlobalConstantStruct does by emitting each
global inside the aggregate.

---------

Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
2024-11-19 09:58:25 -05:00
Sander de Smalen
3093b29b59
[RegisterCoalescer] Fix up subreg lanemasks after rematerializing. (#116191)
In a situation like the following:

```
   undef %2.subreg = INST %1         ; DefMI (rematerializable),
                                     ; DefSubIdx = subreg
   %3 = COPY %2                      ; SrcIdx = DstIdx = 0
   .... = SOMEINSTR %3, %2
```
there are no subranges for `%3` because the entire register is copied,
but after rematerialization the subrange of the rematerialized value
must be fixed up with the appropriate subranges for `.subreg`.

(To me this issue seemed a bit similar to the issue fixed by #96839, but
then related to rematerialization)
2024-11-19 08:46:55 +00:00
Shubham Sandeep Rastogi
e914d97327
Revert "[NFC] Move DroppedVariableStats to its own file and redesign it to be extensible. (#115563)"
This reverts commit 2de78815604e9027efd93cac27c517bf732587d2.

Reverted due to buildbot failure:

unittests/IR/CMakeFiles/IRTests.dir/DroppedVariableStatsIRTest.cpp.o:DroppedVariableStatsIRTest.cpp:function llvm::DroppedVariableStatsIR::runAfterPass(llvm::StringRef, llvm::Any): error: undefined reference to 'llvm::DroppedVariableStatsIR::runOnModule(llvm::Module const*, bool)'
2024-11-18 16:05:09 -08:00
Shubham Sandeep Rastogi
81924ac1fb
Revert "Add a pass to collect dropped var stats for MIR. (#115566)"
This reverts commit 6e2b77d4696d4a672635c0ba1ead4824e2158a7d.

Reverting due to buildbot failure:

unittests/IR/CMakeFiles/IRTests.dir/DroppedVariableStatsIRTest.cpp.o:DroppedVariableStatsIRTest.cpp:function llvm::DroppedVariableStatsIR::runAfterPass(llvm::StringRef, llvm::Any): error: undefined reference to 'llvm::DroppedVariableStatsIR::runOnModule(llvm::Module const*, bool)'
2024-11-18 16:05:09 -08:00
Shubham Sandeep Rastogi
6e2b77d469
Add a pass to collect dropped var stats for MIR. (#115566)
This patch uses the DroppedVariableStats class to add dropped variable
statistics for MIR passes.
2024-11-18 15:56:06 -08:00
Shubham Sandeep Rastogi
2de7881560
[NFC] Move DroppedVariableStats to its own file and redesign it to be extensible. (#115563)
Move DroppedVariableStats code to its own file and change the class to
have an extensible design so that we can use it to add dropped
statistics to MIR passes and the instruction selector.
2024-11-18 15:48:53 -08:00
Daniel Sanders
2310e3e3f2
[GlobalISel] Move DemandedElt's APInt size assert after isValid() check (#115979)
This prevents the assertion from wrongly triggering on invalid LLTs.
2024-11-18 15:39:28 -08:00
Thorsten Schütt
f8d1905a24
[GlobalISel] Combine [S,U]SUBO (#116489)
We import the llvm.ssub.with.overflow.* Intrinsics, but the Legalizer
also builds them while legalizing other opcodes, see narrowScalarAddSub.
2024-11-18 22:39:23 +01:00