34787 Commits

Author SHA1 Message Date
Christian Kissig
730df5a437
[Support] Add KnownBits::computeForSubBorrow (#67788)
- [Support] Add KnownBits::computeForSubBorrow
- [CodeGen] Implement USUBC, USUBO_CARRY, and SSUBO_CARRY with
KnownBits::computeForSubBorrow
- [CodeGen] Compute unknown bits for Carry/Borrow for ADD/SUB
- [CodeGen] Compute known bits of Carry/Borrow for UADDO, SADDO, USUBO,
and SSUBO
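
To illustrate what "known bits of a subtraction with borrow" means, here is a minimal brute-force sketch on 8-bit values. It is not the algorithm used by `KnownBits::computeForSubBorrow`; the `Known8` type and the helper names are hypothetical, and the enumeration only serves as a reference oracle: a result bit is known only if every consistent combination of inputs agrees on it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-in for a known-bits pair.
struct Known8 {
  uint8_t Zero; // bits known to be 0
  uint8_t One;  // bits known to be 1
};

// True if the concrete value V does not contradict the known bits in K.
bool isConsistent(Known8 K, uint8_t V) {
  return (V & K.Zero) == 0 && (static_cast<uint8_t>(~V) & K.One) == 0;
}

// Enumerates every (LHS, RHS, Borrow) consistent with the inputs and keeps
// only the result bits on which all outcomes of LHS - RHS - Borrow agree.
// The borrow is described by which of its two values are possible.
Known8 subBorrowBruteForce(Known8 LHS, Known8 RHS, bool BorrowMayBe0,
                           bool BorrowMayBe1) {
  uint8_t Zero = 0xFF, One = 0xFF;
  for (unsigned A = 0; A < 256; ++A) {
    if (!isConsistent(LHS, static_cast<uint8_t>(A)))
      continue;
    for (unsigned B = 0; B < 256; ++B) {
      if (!isConsistent(RHS, static_cast<uint8_t>(B)))
        continue;
      for (unsigned Borrow = 0; Borrow < 2; ++Borrow) {
        if ((Borrow == 0 && !BorrowMayBe0) || (Borrow == 1 && !BorrowMayBe1))
          continue;
        // Unsigned wraparound, truncated to 8 bits.
        uint8_t D = static_cast<uint8_t>(A - B - Borrow);
        Zero = static_cast<uint8_t>(Zero & ~D);
        One = static_cast<uint8_t>(One & D);
      }
    }
  }
  return {Zero, One};
}
```

With fully known inputs the result is fully known; once the borrow becomes unknown, bits that depend on it drop out of both masks.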

Fixes #65893

---------

Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com>
2023-10-18 13:48:47 +01:00
Paul Walker
675231eb09
[SVE ACLE] Allow default zero initialisation for svcount_t. (#69321)
This matches the behaviour of the other SVE ACLE types.
2023-10-18 10:40:07 +01:00
Noah Goldstein
112e49b381 [DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to get better constants.
If `C0` is a mask and `C1` shifts out all the masked bits (to
essentially compare two subsets of `X`), we can arbitrarily re-order
shift as `srl` or `shl`.

If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.

Otherwise, based on target preference we can arbitrarily swap `srl`
and `shl` in/out to get better constants.

On x86 we can use this re-ordering to:
    1) get better `and` constants for `C0` (zero extended moves or
       avoid imm64).
    2) convert `srl` to `shl` if `shl` will be implementable with `lea`
       or `add` (both of which can be preferable).
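
One concrete instance of the rotate case can be sketched as follows. This is an illustration only, not the DAGCombiner code: it assumes a 32-bit `X` with `C0 = 0xFFFF` and `C1 = 16`, and the helper names are hypothetical. Comparing the low half against the high half is then the same as comparing `X` with itself rotated by 16.

```cpp
#include <cassert>
#include <cstdint>

// Original pattern: (icmp eq (and X, 0xFFFF), (srl X, 16)).
bool cmpAndShift(uint32_t X) { return (X & 0xFFFFu) == (X >> 16); }

// Portable 32-bit rotate-left by 16.
uint32_t rotl16(uint32_t X) { return (X << 16) | (X >> 16); }

// Rewritten pattern: (icmp eq X, (rotl X, 16)).
bool cmpRotate(uint32_t X) { return X == rotl16(X); }

// Checks that both forms agree on a handful of sample inputs.
bool formsAgree() {
  const uint32_t Samples[] = {0u,          1u,          0x00010001u,
                              0x12341234u, 0x12345678u, 0xFFFF0000u,
                              0x8000FFFFu, 0xFFFFFFFFu};
  for (uint32_t X : Samples)
    if (cmpAndShift(X) != cmpRotate(X))
      return false;
  return true;
}
```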

Proofs: https://alive2.llvm.org/ce/z/qzGM_w

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152116
2023-10-18 01:16:55 -05:00
Pierre van Houtryve
c464fea779
[DAG] Constant fold FMAD (#69324)
This has very little effect on codegen in practice, but is nice to
have.

See #68315
2023-10-18 07:46:24 +02:00
Simon Pilgrim
2a40ec2d3e [DAG] SimplifyDemandedBits - fix isOperationLegal typo in D146121
We need to check that the simplified ISD::SRL node is legal, not the old one

Noticed while trying to isolate the regressions in D155472
2023-10-17 17:50:12 +01:00
Guozhi Wei
760e7d00d1 [X86, Peephole] Enable FoldImmediate for X86
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate.

Also enhanced peephole by deleting identical instructions after FoldImmediate.

Differential Revision: https://reviews.llvm.org/D151848
2023-10-17 16:22:42 +00:00
Simon Pilgrim
2f329d88bc [DAG] foldConstantFPMath - accept ArrayRef<SDValue> Ops instead of explicit N1/N2 ops
First step towards adding unary/ternary fp ops handling, and not just binops
2023-10-17 16:31:46 +01:00
Arthur Eubanks
5fab20bc7e
[NFC] Move StableHashing.h from CodeGen to ADT (#67704)
2023-10-16 10:42:22 -07:00
Kazu Hirata
0b570ad969
[CodeGen] Remove LiveVariables::{isPHIJoin,setPHIJoin} (#69128)
The last use of isPHIJoin was removed by:

  commit fac770b865f59cbe615241dad153ad20d5138b9e
  Author: Jakob Stoklund Olesen <stoklund@2pi.dk>
  Date:   Sat Feb 9 00:04:07 2013 +0000

so there is no reason to maintain PHIJoins.
2023-10-16 09:31:09 -07:00
Björn Pettersson
4acb96c99f
[SelectionDAG] Tidy up around endianness and isConstantSplat (#68212)
The BuildVectorSDNode::isConstantSplat function could depend on
endianness, and it takes a bool argument that can be used to indicate
if big or little endian should be considered when internally casting
from a vector to a scalar. However, that argument is default set to
false (= little endian). And in many situations, even in target
generic code such as DAGCombiner, the endianness isn't specified when
using the function.

The intent with this patch is to highlight that endianness doesn't
matter, depending on the context in which the function is used.

In DAGCombiner the code is slightly refactored. Back in the days when
the code was written it wasn't possible to request a MinSplatBits
size when calling isConstantSplat. Instead the code re-expanded the
found SplatValue to match with the EltBitWidth. Now we can just
provide EltBitWidth as MinSplatBits and remove the logic for doing
the re-expand.

While at it, tidying up around isConstantSplat, this patch also
adds an explicit check in BuildVectorSDNode::isConstantSplat to break
out of the loop if trying to split an odd VecWidth into two halves.
Haven't been able to prove that there could be miscompiles involved
if not doing so. There are lit tests that trigger that scenario,
although I think they happen to later discard the returned SplatValue
for other reasons.
2023-10-16 14:53:53 +02:00
Nikita Popov
d4300154b6 Revert "[ValueTracking] Remove by-ref computeKnownBits() overloads (NFC)"
This reverts commit b5743d4798b250506965e07ebab806a3c2d767cc.

This causes some minor compile-time impact. Revert for now, better
to do the change more gradually.
2023-10-16 14:04:09 +02:00
Nikita Popov
b5743d4798 [ValueTracking] Remove by-ref computeKnownBits() overloads (NFC)
Remove the old overloads that accept KnownBits by reference, in
favor of those that return it by value.
2023-10-16 13:00:31 +02:00
Carl Ritson
e1bb0598b2
[MachineBasicBlock] Fix use after free in SplitCriticalEdge (#68786)
Remove use after free when attempting to update SlotIndexes in
MachineBasicBlock::SplitCriticalEdge.

Use MachineFunction delegate mechanism to capture target specific
manipulations of branch instructions and update SlotIndexes.
2023-10-15 17:32:27 +09:00
Markus Böck
0ad92c0cbb
[StatepointLowering] Take return attributes of gc.result into account (#68439)
The current lowering of statepoints does not take into account return
attributes present on the `gc.result` leading to different code being
generated than if one were to not use statepoints. These return
attributes can affect the ABI which is why it is important that they are
applied in the lowering.
2023-10-14 18:38:18 +02:00
Craig Topper
3750558ee1
[RISCV][GISel] Legalize G_SMULO/G_UMULO (#67635)
Update `LegalizerHelper::widenScalarMulo` to not create a mulo if we aren't going to use the overflow flag. This prevents needing to legalize the widened operation. This generates better code when we need to make a libcall for multiply.
2023-10-13 20:34:45 -07:00
Kazu Hirata
6e8013a130 [llvm] Stop including llvm/ADT/StringMap.h (NFC)
These source files do not use StringMap.
2023-10-13 20:09:33 -07:00
Yingwei Zheng
53c81a8c16
[RISCV][SDAG] Fix constant narrowing when narrowing loads (#69015)
When narrowing logic ops (OR/XOR) with a constant RHS, `DAGCombiner` will fix up the constant RHS node.
This is incorrect when the LHS is also a constant. For example, we will incorrectly replace `xor OpaqueConstant:i64<8191>, Constant:i64<-1>` with `xor (and OpaqueConstant:i64<8191>, Constant:i64<65535>), Constant:i64<-1>`.

Fixes #68855.
2023-10-14 06:38:17 +08:00
Maurice Heumann
187e02fa2d
[CodeGenPrepare] Check types when unmerging GEPs across indirect branches (#68587)
The optimization in CodeGenPrepare, where GEPs are unmerged across
indirect branches, must respect the types of both GEPs and their sizes
when adjusting the indices.

The sample here shows the bug:

https://godbolt.org/z/8e9o5sYPP

The value `%elementValuePtr` addresses the second field of the
`%struct.Blub`. It is therefore a GEP with index 1 and type i8.
The value `%nextArrayElement` addresses the next array element. It is
therefore a GEP with index 1 and type `%struct.Blub`.

Both values point to completely different addresses, even if the indices
are the same, due to the types being different.
However, after CodeGenPrepare has run, `%nextArrayElement` is a bitcast
from `%elementValuePtr`, meaning both were treated as equal.

The cause for this is that the unmerging optimization does not take
types into consideration.
It sees both GEPs have `%currentArrayElement` as source operand and
therefore tries to rewrite `%nextArrayElement` in terms of
`%elementValuePtr`.
It changes the index to the difference of the two GEPs. As both indices
are `1`, the difference is `0`. As the indices are `0` the GEP is later
replaced with a simple bitcast in CodeGenPrepare.

Before adjusting the indices, the types of the GEPs would have to be
aligned and the indices scaled accordingly for the optimization to be
correct.
Due to the size of the struct being `16` and the `%elementValuePtr`
pointing to offset `1`, the correct index for the unmerged
`%nextArrayElement` would be 15.

I assume this bug emerged from the opaque pointer change as GEPs like
`%elementValuePtr` that access the struct field based on type i8 did not
naturally occur before.

In light of future migration to ptradd, simply not performing the
optimization if the types mismatch should be sufficient.
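
The offset arithmetic described above can be checked with a small sketch. It uses only the numbers given in this message (struct size 16, field at byte offset 1); the helper and constant names are hypothetical and this is not the CodeGenPrepare code.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of a single-index GEP: index times the size of the indexed type.
constexpr int64_t gepByteOffset(int64_t Index, int64_t TypeSize) {
  return Index * TypeSize;
}

constexpr int64_t StructSize = 16; // size of %struct.Blub, per the message
constexpr int64_t I8Size = 1;

// %elementValuePtr: GEP with index 1 and type i8.
constexpr int64_t ElementValueOffset = gepByteOffset(1, I8Size);
// %nextArrayElement: GEP with index 1 and type %struct.Blub.
constexpr int64_t NextElementOffset = gepByteOffset(1, StructSize);

// What the buggy unmerging computed: the difference of the raw indices.
constexpr int64_t BuggyIndex = 1 - 1;
// What would be correct: the difference of the byte offsets, in i8 units.
constexpr int64_t CorrectI8Index =
    (NextElementOffset - ElementValueOffset) / I8Size;
```

The raw index difference is `0`, which is why the GEP degenerated into a bitcast, while the byte-offset difference is the `15` mentioned above.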
2023-10-13 09:47:47 +02:00
Momchil Velikov
2ceabf6bdc
[MachineSink] Reduce the number of unnecessary invalidations of StoreInstrCache (NFC) (#68676)
Don't invalidate the cache when erasing instructions which cannot ever
appear in the cache.
2023-10-12 10:06:19 +01:00
Momchil Velikov
86d9faa5a9
[MachineSink] Use LLVM ADTs (NFC) (#68677)
Replace a few uses of `std::map` with `llvm::DenseMap`.
2023-10-12 10:04:41 +01:00
Rahman Lavaee
28b9126879
[BasicBlockSections] Introduce the path cloning profile format to BasicBlockSectionsProfileReader. (#67214)
Following up on prior RFC
(https://lists.llvm.org/pipermail/llvm-dev/2020-September/145357.html)
we can now improve upon our highly-optimized basic-block-sections
binary (e.g., 2% for clang) by applying path cloning. Cloning can
improve performance by reducing taken branches.

This patch prepares the profile format for applying cloning actions.

The basic block cloning profile format extends the basic block sections
profile in two ways.

1. Specifies the cloning paths with a 'p' specifier. For example, `p 1 4
5` specifies that blocks with BB ids 4 and 5 must be cloned along the
edge 1 --> 4.
2. Each cloned block appears in the cluster info as
`<bb_id>.<clone_id>` where `clone_id` is the id associated with this
clone.

For example, the following profile specifies one cloned block (2) and
determines its cluster position as well.
```
f foo
p 1 2
c 0 1 2.1 3 2 5
```

This patch keeps backward-compatibility (retains the behavior for old
profile formats). This feature is only introduced for profile version >=
1.
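
As a rough sketch of how the example profile above could be read, the following standalone parser handles the `f`, `p` and `c` specifiers for a single function. It is not the `BasicBlockSectionsProfileReader` implementation: the type and function names are hypothetical, error handling is omitted, and treating a missing clone id as `0` (the original block) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one "<bb_id>" or "<bb_id>.<clone_id>" entry.
struct BlockRef {
  unsigned BBId = 0;
  unsigned CloneId = 0; // 0 stands for the original block in this sketch
};

struct FunctionProfile {
  std::string Name;
  std::vector<std::vector<unsigned>> ClonePaths; // one entry per 'p' line
  std::vector<std::vector<BlockRef>> Clusters;   // one entry per 'c' line
};

BlockRef parseBlockRef(const std::string &Tok) {
  BlockRef R;
  std::size_t Dot = Tok.find('.');
  // substr(0, npos) yields the whole token when there is no clone id.
  R.BBId = static_cast<unsigned>(std::stoul(Tok.substr(0, Dot)));
  if (Dot != std::string::npos)
    R.CloneId = static_cast<unsigned>(std::stoul(Tok.substr(Dot + 1)));
  return R;
}

// Parses the lines of a single function's profile.
FunctionProfile parseProfile(const std::string &Text) {
  FunctionProfile P;
  std::istringstream In(Text);
  std::string Line;
  while (std::getline(In, Line)) {
    std::istringstream Fields(Line);
    std::string Spec, Tok;
    if (!(Fields >> Spec))
      continue; // skip blank lines
    if (Spec == "f") {
      Fields >> P.Name;
    } else if (Spec == "p") {
      std::vector<unsigned> Path;
      while (Fields >> Tok)
        Path.push_back(static_cast<unsigned>(std::stoul(Tok)));
      P.ClonePaths.push_back(Path);
    } else if (Spec == "c") {
      std::vector<BlockRef> Cluster;
      while (Fields >> Tok)
        Cluster.push_back(parseBlockRef(Tok));
      P.Clusters.push_back(Cluster);
    }
  }
  return P;
}
```

Parsing the three-line example gives one clone path (`1 2`) and one cluster whose third entry is clone 1 of block 2.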
2023-10-11 22:47:13 -07:00
weiguozhi
b6043f9867
[RA] Disable split around hint register if optimize for size (#68619)
Splitting a virtual register with a hint may generate COPY instructions in
multiple cold basic blocks and increase code size, so disable this
split when the function is optimized for size.
2023-10-11 14:57:15 -07:00
Jay Foad
7ddf6e915c
[SlotIndexes] Use upper/lower bound terminology for MBB searches. NFC. (#68802)
Rename advanceMBBIndex and findMBBIndex to getMBBLowerBound and add
getMBBUpperBound.

The motivations are:
- Make it clear what kind of search is being done, using names inspired
  by std::upper/lower_bound.
- Simplify getMBBFromIndex which really wants an upper bound search and
  previously had to work hard to get the result it wanted from a lower
  bound search.
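
The difference between the two searches can be sketched with the standard algorithms the names are inspired by. This is a simplified model, not the SlotIndexes code: blocks are represented only by their sorted start indexes, the function names are hypothetical, and both functions assume a non-empty list with `Index` at or after the first start.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Upper-bound search: the first block starting strictly after Index is one
// past the block that contains Index.
std::size_t blockContaining(const std::vector<unsigned> &BlockStarts,
                            unsigned Index) {
  auto It = std::upper_bound(BlockStarts.begin(), BlockStarts.end(), Index);
  return static_cast<std::size_t>(It - BlockStarts.begin()) - 1;
}

// Lower-bound search: needs an extra case, because the result is either the
// containing block (exact match on a start) or the block after it.
std::size_t blockContainingViaLowerBound(
    const std::vector<unsigned> &BlockStarts, unsigned Index) {
  auto It = std::lower_bound(BlockStarts.begin(), BlockStarts.end(), Index);
  if (It == BlockStarts.end() || *It != Index)
    --It;
  return static_cast<std::size_t>(It - BlockStarts.begin());
}
```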
2023-10-11 16:37:47 +01:00
chuongg3
d88d9834e9
[AArch64][GlobalISel] Support more types for TRUNC (#66927)
G_TRUNC will get lowered into trunc(merge(trunc(unmerge),
trunc(unmerge))) if the source is larger than 128 bits or the truncation
is more than half of the current bit size.

Now mirrors ZEXT/SEXT code more closely for vector types.
2023-10-11 16:05:25 +01:00
Jay Foad
fac4206e66 [EarlyIfConversion] Simplify condition after #65729
2023-10-11 10:53:12 +01:00
Jay Foad
05c16f40c9 [VirtRegMap] Simplify condition after #65729
2023-10-11 10:33:52 +01:00
Jay Foad
b78f3ea7df
Clean up strange uses of getAnalysisIfAvailable (#65729)
After a pass calls addRequired<X>() it is strange to call
getAnalysisIfAvailable<X>() because analysis X should always be
available. Use getAnalysis<X>() instead.
2023-10-11 09:53:00 +01:00
Fangrui Song
2d854dd3e7 Move global namespace cl::opt inside llvm:: or internalize them
2023-10-10 19:58:03 -07:00
Serge Pavlov
462d5830da [GlobalISel] Add support for *_fpmode intrinsics
The change implements support of the intrinsics `get_fpmode`,
`set_fpmode` and `reset_fpmode` in Global Instruction Selector. Now they
are lowered into library function calls.

Differential Revision: https://reviews.llvm.org/D158260
2023-10-09 21:14:07 +07:00
Hendrik Greving
2600aaab21
Revert "[MachineLICM] Relax overlay conservative PHI check (#67186)" (#68580)
This reverts commit 71a8d2e3064fcb3ff76565e6e8529613f90aa51b.
2023-10-09 05:26:58 -07:00
LiqinWeng
111c7c1d07
[VP] IR expansion for bitreverse/bswap (#68504)
2023-10-09 19:59:52 +08:00
Hendrik Greving
71a8d2e306
[MachineLICM] Relax overlay conservative PHI check (#67186)
Skip LICM if PHI belongs to the current loop, e.g. is in the
loop's header. This prevents LICM from bailing for CFGs like

L1:
  R = LoopInvariant // can be LICM'd
  BR L1
L2:
  PHI(R, ..)
  BR L2
2023-10-09 04:49:11 -07:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Jie Fu
573a083c1c [DAG] Remove unused variable 'VT' in DAGCombiner.cpp (NFC)
/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:26896:7: error: unused variable 'VT' [-Werror,-Wunused-variable]
  EVT VT = N->getValueType(0);
      ^
1 error generated.
2023-10-09 18:30:38 +08:00
Simon Pilgrim
072675f14e [DAG] foldSelectOfBinops - correctly handle select of binops where ResNo != 0
Correctly handle cases where the select(cond, binop(x, y), binop(z, y)) --> binop(select(cond, x, z), y) fold is selecting ResNo != 0 results (UADDO flags etc.)
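
A small model of why the result number matters: an overflow-producing add yields two results, and the fold is only valid if the folded node is read at the same result as the original select operands. The sketch below is illustrative only; `UAddO` and the helper names are hypothetical and do not mirror the SelectionDAG API.

```cpp
#include <cassert>
#include <cstdint>

// Two-result operation: result 0 is the sum, result 1 is the overflow flag.
struct UAddO {
  uint32_t Sum;
  bool Overflow;
};

UAddO uaddo(uint32_t A, uint32_t B) {
  uint32_t S = A + B; // wraps on overflow
  return {S, S < A};
}

// select(Cond, uaddo(X, Y).Overflow, uaddo(Z, Y).Overflow)
bool selectOfOverflow(bool Cond, uint32_t X, uint32_t Z, uint32_t Y) {
  return Cond ? uaddo(X, Y).Overflow : uaddo(Z, Y).Overflow;
}

// Folded form: uaddo(select(Cond, X, Z), Y), read at the same result (1).
// Reading result 0 (the sum) here would be the kind of mismatch being fixed.
bool foldedOverflow(bool Cond, uint32_t X, uint32_t Z, uint32_t Y) {
  return uaddo(Cond ? X : Z, Y).Overflow;
}
```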

Fixes #68539
2023-10-09 11:08:55 +01:00
Kazu Hirata
d7b18d5083 Use llvm::endianness{,::little,::native} (NFC)
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form.  This patch replaces
llvm::support::endianness with llvm::endianness.
2023-10-09 00:54:47 -07:00
LiqinWeng
32f7197765
[VP] Use the interface of 'getFunctionalIntrinsicID' to get the non-p… (#68508)
…redicated Intrinsic ID
2023-10-08 18:14:48 +08:00
Amara Emerson
7510f32f90 [MachineSink] Fix crash due to use-after-free in a MachineInstr* cache.
After the SinkAndFold optimization was enabled, we saw some crashes with
GISel due to SinkAndFold erasing an MI while a reference was being held in a
cache.
2023-10-06 15:02:39 -07:00
Kazu Hirata
e9fa18878c [SelectionDAG] Fix an unused variable warning
This patch fixes:

  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:10832:12: error:
  variable 'Changed' set but not used
  [-Werror,-Wunused-but-set-variable]
2023-10-06 09:27:35 -07:00
Ben Mudd
6d6b395b53 [DebugInfo][SelectionDAG] Add debug info salvaging for TRUNC nodes
This patch adds support for salvaging TRUNC nodes during SelectionDAG,
fixing LLVM issue #63076:
  https://github.com/llvm/llvm-project/issues/63076

Reviewed in: https://github.com/llvm/llvm-project/pull/66922
2023-10-06 16:10:33 +01:00
Petar Avramovic
2fa7d652d0 AMDGPU: Fix temporal divergence introduced by machine-sink (#67456)
Temporal divergence that was present in input or introduced in IR
transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies
by changing sgpr source instr to vgpr instr.
After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare,
machine-sinking can introduce temporal divergence by sinking
instructions outside of the cycle.
Add isSafeToSink callback in TargetInstrInfo.
2023-10-06 15:00:08 +02:00
Petar Avramovic
ccf68ab432 Revert "MachineSink: Fix sinking VGPR def out of a divergent loop"
This reverts commit 3f8ef57bede94445b1a1042c987cc914a886e7ff.
2023-10-06 15:00:08 +02:00
Matthias Braun
2e26d09106
BlockFrequencyInfo: Add PrintBlockFreq helper (#67512)
- Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions
into a `PrintBlockFreq()` function returning a `Printable` object. This
simplifies usage as it can be directly piped to a `raw_ostream` like
`dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`.
- Previously there was an interesting behavior where
`BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number
and as a `uint64_t`. Most algorithms use the `BlockFrequency`
abstraction with the integers, but the print function for basic blocks
printed the `Scaled64` number, potentially showing higher accuracy than
was used by the algorithm. This changes things to only print
`BlockFrequency` values.
- Replace some instances of `dbgs() << Freq.getFrequency()` with the new
function.
2023-10-05 18:26:50 -07:00
Matt Arsenault
5e15997291
MachineFunctionPass: Clear properties before running function (#67962)
This ensures !isSSA checks in the function work if the input MIR
happened to appear as SSA.
2023-10-05 15:11:47 -07:00
Nico Weber
f320065aeb Revert "[LLVM][DWARF] Add support for monolithic types in .debug_names (#68131)"
This reverts commit 9bbd2bf654634cd95dd0be7948ec8402c3c76e1e.

Accidental commit: https://github.com/llvm/llvm-project/pull/68131#issuecomment-1749430207
2023-10-05 14:47:04 -04:00
Matthias Braun
5181156b37
Use BlockFrequency type in more places (NFC) (#68266)
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.

- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While at it, change some `const BlockFrequency& Freq` parameters to
plain `BlockFrequency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFrequency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
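
The shape of these API changes can be sketched with a tiny stand-in class. `Freq` below is hypothetical and much smaller than the real `BlockFrequency`; it only illustrates the explicit constructor, the `max()` factory, `operator!=`, and passing the wrapper by value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for a frequency wrapper around uint64_t.
class Freq {
  uint64_t Value;

public:
  // Explicit: a raw integer no longer converts silently into a frequency.
  explicit Freq(uint64_t V) : Value(V) {}

  // Factory returning the wrapper type rather than a raw uint64_t.
  static Freq max() { return Freq(std::numeric_limits<uint64_t>::max()); }

  uint64_t getFrequency() const { return Value; }

  bool operator==(Freq RHS) const { return Value == RHS.Value; }
  bool operator!=(Freq RHS) const { return Value != RHS.Value; }
};

// The explicit constructor disables implicit conversion from uint64_t.
static_assert(!std::is_convertible<uint64_t, Freq>::value,
              "uint64_t must not convert implicitly to Freq");

// The wrapper holds a single integer, so it is passed by value.
bool isHotterThan(Freq A, Freq B) {
  return A.getFrequency() > B.getFrequency();
}
```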
2023-10-05 11:40:17 -07:00
Alexander Yermolovich
9bbd2bf654
[LLVM][DWARF] Add support for monolithic types in .debug_names (#68131)
Added support for Type Units in monolithic DWARF in .debug_names.
2023-10-05 11:14:18 -07:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInst class for better mask analysis. Mask.size() does not
always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
Kirill Stoimenov
0a776996af Revert "[DAG] Attempt shl narrowing in SimplifyDemandedBits"
This reverts commit 7a8c04ef84ecdab4390b451d4c2fe17bc45a7b63.
2023-10-04 22:15:41 +00:00