32115 Commits

Author SHA1 Message Date
Nikita Popov
924696d271 [AsmPrinter] Avoid pointer element type access
Instead of checking for a bitcast from a function type, check
whether the aliasee is a function after stripping bitcasts. This
is not strictly equivalent, but serves the same purpose.
2022-02-08 15:06:02 +01:00
Simon Pilgrim
fd2bb51f1e [ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length
In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask.

This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments.

I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these.
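
As a rough illustration of the idea, here is a minimal stand-alone sketch of a "shifted mask" check that also reports the mask offset and length. The name `isShiftedMask64Sketch` and its out-parameters are assumptions made for this example; this is not the LLVM implementation, which uses its own bit-counting helpers.

```cpp
#include <cstdint>

// Hypothetical stand-alone helper, not the LLVM implementation.
// Returns true if Value is one contiguous run of 1 bits (possibly shifted
// left) and, on success, reports where the run starts and how long it is.
inline bool isShiftedMask64Sketch(uint64_t Value, unsigned &MaskIdx,
                                  unsigned &MaskLen) {
  if (Value == 0)
    return false;

  // Count trailing zeros to find the offset of the lowest set bit.
  unsigned Idx = 0;
  while (((Value >> Idx) & 1) == 0)
    ++Idx;

  // After shifting the run down to bit 0 it must be a low mask (2^n - 1),
  // which holds exactly when Shifted & (Shifted + 1) == 0.
  uint64_t Shifted = Value >> Idx;
  if ((Shifted & (Shifted + 1)) != 0)
    return false;

  // Count the length of the run.
  unsigned Len = 0;
  while (Shifted != 0) {
    ++Len;
    Shifted >>= 1;
  }

  MaskIdx = Idx;
  MaskLen = Len;
  return true;
}
```

With this shape a caller gets the position and size from the same call that performs the check, instead of recomputing them afterwards.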

Differential Revision: https://reviews.llvm.org/D119019
2022-02-08 12:04:13 +00:00
Carl Ritson
42ac4e1a12 [MachineLICM] Add shouldHoist method to TargetInstrInfo
Add a shouldHoist method to TargetInstrInfo which is queried by
MachineLICM to override hoisting decisions for a given target.
This mirrors functionality provided by shouldSink.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D118773
2022-02-08 15:53:05 +09:00
Sheng
146c7820d9 [GlobalISel][Legalizer] Support reducing load/store width in big endian order
2022-02-07 20:06:17 -05:00
Sanjay Patel
d1ecfaa097 [SDAG] try to fold one-demanded-bit-of-multiply
This is a translation of the transform added to InstCombine with:
D118539
2022-02-07 17:24:35 -05:00
Sanjay Patel
fc6bee1c11 [SDAG] SimplifyDemandedBits - generalize fold for 2 LSB of X*X
This is translated from recent changes to the IR version of this function:
D119060
D119139
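
The arithmetic fact that a fold on the two low bits of a square can rely on is that (X * X) & 3 always equals X & 1. The sketch below only checks that identity exhaustively for 16-bit values; it is an illustration written for this note, not the code from the referenced reviews, and the function name is made up.

```cpp
#include <cstdint>

// Returns true if (X * X) & 3 == (X & 1) holds for every 16-bit X.
// Even X = 2k gives 4k^2 (low two bits 00); odd X = 2k+1 gives
// 4k^2 + 4k + 1 (low two bits 01).
inline bool lowTwoBitsOfSquareMatch() {
  for (uint32_t X = 0; X <= 0xFFFF; ++X) {
    // The product of two 16-bit values fits in 32 bits; truncate to 16.
    uint16_t Square = static_cast<uint16_t>(X * X);
    if ((Square & 3u) != (X & 1u))
      return false;
  }
  return true;
}
```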
2022-02-07 15:38:50 -05:00
Vang Thao
570471199b [AMDGPU] Fix debug values in scheduler not placed correctly when reverting
Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so the data is not cleared after initial call to placeDebugValues since we will call it again after reverting a schedule.

Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are now moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and ending before calling placeDebugValues() since both vars will be used as reference points to move debug instructions back.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D119022
2022-02-07 11:01:13 -08:00
Simon Pilgrim
74555fd367 [DAG] visitINSERT_VECTOR_ELT - break if-else chain as they both return (style). NFC.
2022-02-07 09:58:47 +00:00
Simon Pilgrim
5d3a86489f [GlobalISel] Move getOpcode() calls inside assert() to avoid (void)s. NFC.
Tidier solution to the unused variable warnings - we already do this in other places in this file.
2022-02-07 09:50:27 +00:00
Djordje Todorovic
def10a2895 [GlobalIsel] Fix another "unused variable" warning
2022-02-07 09:32:22 +01:00
Djordje Todorovic
eab395fa40 Fix the warning after D118805
A variable was used within assert() only.
2022-02-07 09:25:02 +01:00
Craig Topper
c35ccd2ac8 [DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb.
rv64izbb has RORW/ROLW instructions that operate on the lower
32 bits of a 64-bit value and sign extend bit 31 of the result.

DAGCombiner won't match rotate idioms because the i32 type isn't Legal
on riscv64.

This patch teaches DAGCombiner to allow it if the type is going to
be promoted and the target has Custom type legalization for ISD::ROTL
or ISD::ROTR. I've restricted this to scalar types. It doesn't appear
that any in-tree targets other than riscv64 have custom type legalization
for rotates.

If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR
after type legalization, but I'd like to avoid that if possible.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119062
2022-02-06 10:58:12 -08:00
Kazu Hirata
3a8c51480f [CodeGen] Use = default (NFC)
Identified with modernize-use-equals-default
2022-02-06 10:54:44 -08:00
Bjorn Pettersson
cecf11c315 [DAGCombiner] Fold SSHLSAT/USHLSAT to SHL when no saturation will occur
When the shift amount is known and a known sign bit analysis of
the shiftee indicates that no saturation will occur, then we can
replace SSHLSAT/USHLSAT by SHL.
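
To make the condition concrete, here is a small sketch for the unsigned case on 8-bit values. It assumes the "no saturation" condition is that the shift amount does not exceed the number of leading zero bits; for the signed case the analogous condition is that the shift amount is smaller than the number of sign bits. The helper names are invented for this example and are not DAGCombiner code.

```cpp
#include <cstdint>

// Reference model of an unsigned saturating shift left on 8 bits.
// Assumes Amt < 8.
inline uint8_t ushlSat8(uint8_t X, unsigned Amt) {
  uint32_t Wide = static_cast<uint32_t>(X) << Amt;
  return Wide > 0xFF ? 0xFF : static_cast<uint8_t>(Wide);
}

// Number of leading zero bits in an 8-bit value (8 for X == 0).
inline unsigned countLeadingZeros8(uint8_t X) {
  unsigned N = 0;
  for (int Bit = 7; Bit >= 0 && ((X >> Bit) & 1) == 0; --Bit)
    ++N;
  return N;
}

// Exhaustively checks that the saturating shift equals a plain shift
// whenever the shift amount fits within the leading zeros of X.
inline bool plainShiftMatchesWhenNoSaturation() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Amt = 0; Amt < 8; ++Amt) {
      if (Amt > countLeadingZeros8(static_cast<uint8_t>(X)))
        continue; // Saturation may occur; the fold would not apply.
      uint8_t Plain = static_cast<uint8_t>(X << Amt);
      if (ushlSat8(static_cast<uint8_t>(X), Amt) != Plain)
        return false;
    }
  }
  return true;
}
```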

Differential Revision: https://reviews.llvm.org/D118765
2022-02-06 18:59:06 +01:00
Rong Xu
52d981a4c1 [SampleFDO] Enable FSAFDO loading passes if --enable-fs-discriminator is enabled
The FSAFDO profile loader is currently disabled even when --enable-fs-discriminator is enabled.
The loader passes need to be turned on by separate options, which makes experiments cumbersome.

This patch enables the FSAFDO profile loader passes by default.  Since they are
guarded by EnableFSDiscriminator, they will only be turned on if
--enable-fs-discriminator is enabled. Note that --enable-fs-discriminator is
still disabled by default.

Differential Revision: https://reviews.llvm.org/D119033
2022-02-05 22:37:09 -08:00
Benjamin Kramer
a40dc4eaf8 Simplify mask creation with llvm::seq. NFCI.
2022-02-05 23:35:41 +01:00
Sander de Smalen
6452549f30 [DAGCombiner] Fold vecreduce_or/and if operand is insert_subvector.
Fold:
  vecreduce_or(insert_subvec(zeroinitializer, vec))
  -> vecreduce_or(vec)

  vecreduce_and(insert_subvec(allones, vec))
  -> vecreduce_and(vec)

  vecreduce_and/or(insert_subvec(undef, vec))
  -> vecreduce_and/or(vec)

This is useful for SVE which uses insert/extract subvector
to convert fixed-width to/from scalable vectors.
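
The folds above rest on zero being the identity for OR and all-ones being the identity for AND. A minimal sketch with plain `std::vector` standing in for vector values (all function names here are assumptions for illustration, not SelectionDAG APIs):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// OR-reduction over all elements (identity element is 0).
inline uint32_t reduceOr(const std::vector<uint32_t> &V) {
  uint32_t R = 0;
  for (uint32_t E : V)
    R |= E;
  return R;
}

// AND-reduction over all elements (identity element is all-ones).
inline uint32_t reduceAnd(const std::vector<uint32_t> &V) {
  uint32_t R = ~0u;
  for (uint32_t E : V)
    R &= E;
  return R;
}

// Returns a copy of Base with Sub written at element offset Index.
// Assumes Index + Sub.size() <= Base.size().
inline std::vector<uint32_t> insertSubvector(std::vector<uint32_t> Base,
                                             const std::vector<uint32_t> &Sub,
                                             std::size_t Index) {
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Base[Index + I] = Sub[I];
  return Base;
}
```

Reducing `insertSubvector(zeros, Sub, Index)` with OR, or `insertSubvector(allOnes, Sub, Index)` with AND, gives the same result as reducing `Sub` directly, because the padding elements do not change the reduction.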

Reviewed By: bsmith

Differential Revision: https://reviews.llvm.org/D118919
2022-02-05 14:35:53 +00:00
Hongtao Yu
dee058c670 [CSSPGO] Turn on ext-tsp by default for CSSPGO.
I'm seeing that ext-tsp helps CSSPGO on our large internal benchmarks, so I'm turning it on for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower-quality profile counts.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D119048
2022-02-04 19:46:44 -08:00
Róbert Ágoston
cd4ed08b5a [GlobalISel] Don't combine instructions which are fed by memory instructions using different size
Memory instructions like extending loads from the same address are not equal if
their size is not equal.

This fixes https://github.com/llvm/llvm-project/issues/53524.

Differential Revision: https://reviews.llvm.org/D118805
2022-02-04 15:00:47 -08:00
John Brawn
0d8092dd48 [AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs
These operations are scalarized but the result type v1i1 isn't, which
needs special handling (the same as is done for the non-strict
versions of these operations).

Differential Revision: https://reviews.llvm.org/D118258
2022-02-04 12:55:38 +00:00
serge-sans-paille
ffe8720aa0 Reduce dependencies on llvm/BinaryFormat/Dwarf.h
This header is very large (3M lines once expanded) and was included in locations
where DWARF-specific information was not needed.

More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.

This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].

As a consequence of this patch, downstream users may need to manually include some extra
files:

llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h

In some situations, code may be relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden
dependency now needs to be explicit.

$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after:   10978519
before:  11245451

Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions

Differential Revision: https://reviews.llvm.org/D118781
2022-02-04 11:44:03 +01:00
Bjorn Pettersson
3db39e7479 [DAGCombiner] Fix dependency analysis in checkMergeStoreCandidatesForDependencies
In the aftermath of D116895 a problem was found in the analysis of
dependencies between store merge candidates in
checkMergeStoreCandidatesForDependencies, which is needed to avoid
introducing cycles in the DAG.

In the past it has been enough (or assumed to be enough) to start
scanning from non-chain operands when analysing the store merge
candidates for dependencies, assuming that the analysis of chain
dependencies performed when finding the candidates would cover
up for potential dependencies that exist involving the chain operands.
It was however discovered that one could end up with scenarios such
as described in the aarch64-checkMergeStoreCandidatesForDependencies.ll
test case, when the dependency between two stores is given by a mix
of chain operand dependencies and non-chain operand dependencies.

The fix in this patch makes sure that we also account for chain operand
dependencies when doing the more elaborate analysis in
checkMergeStoreCandidatesForDependencies, no longer relying on the
earlier check involving chain operands being enough.

Differential Revision: https://reviews.llvm.org/D118943
2022-02-04 08:53:01 +01:00
Mircea Trofin
91a33ad32b [nfc][mlgo][regalloc] Cache live interval feature components
Lazily cache the feature components of a LiveInterval.

Differential Revision: https://reviews.llvm.org/D118674
2022-02-03 17:01:42 -08:00
Jessica Paquette
9a61e731ff [GlobalISel] Combine (G_*ADDO x, 0) -> x + no carry out
Similar to the G_*MULO change.

The code for checking if a constant is legal/pre-legalize is shared between
these, and is kind of hairy. So, factor it out into a new function:
`isConstantLegalOrBeforeLegalizer`.

To make the refactoring clean, further refactor `isLegalOrBeforeLegalizer` into
a wrapper for two functions:

- `isPreLegalize`
- `isLegal`

This is a bit easier to read in general.
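
The shape of that refactoring can be sketched as below. Everything here is a simplified stand-in: the struct, the string-based "legal operations" set, and the null-pointer convention for "legalizer has not run" are assumptions for illustration, not the actual combiner helper types.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified model of the combiner helper.
struct CombinerSketch {
  // Null means "the legalizer has not run yet" in this sketch.
  const std::set<std::string> *LegalOps = nullptr;

  // True while we are still before legalization.
  bool isPreLegalize() const { return LegalOps == nullptr; }

  // True if legality information exists and says Op is legal.
  bool isLegal(const std::string &Op) const {
    return LegalOps != nullptr && LegalOps->count(Op) != 0;
  }

  // The wrapper is just the disjunction of the two smaller queries.
  bool isLegalOrBeforeLegalizer(const std::string &Op) const {
    return isPreLegalize() || isLegal(Op);
  }
};
```

Splitting the query this way lets callers that only care about one half (for example, "are we before the legalizer?") ask that question directly.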

https://godbolt.org/z/KW7oszP1o

Differential Revision: https://reviews.llvm.org/D118655
2022-02-03 14:25:15 -08:00
Jessica Paquette
c636899dc1 [GlobalISel] Combine: (G_*MULO x, 0) -> 0 + no carry out
Similar to the following combine in `DAGCombiner::visitMULO`:

```
  // fold (mulo x, 0) -> 0 + no carry out
  if (isNullOrNullSplat(N1))
    return CombineTo(N, DAG.getConstant(0, DL, VT),
                     DAG.getConstant(0, DL, CarryVT));
```

This fixes some generally poor codegen for `*mulo`:

https://godbolt.org/z/eTxYsvz8f
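
The identity behind the combine is that multiplying by zero always yields zero and can never overflow. A small stand-alone sketch of a signed 32-bit multiply-with-overflow makes that easy to check; `smulo32` and `MulOverflowResult` are names invented for this example, not GlobalISel code.

```cpp
#include <cstdint>
#include <limits>

struct MulOverflowResult {
  int32_t Value;  // Low 32 bits of the product.
  bool Overflow;  // True if the exact product does not fit in int32_t.
};

// Signed multiply-with-overflow, computed exactly in 64 bits and then
// range-checked against the 32-bit limits.
inline MulOverflowResult smulo32(int32_t A, int32_t B) {
  int64_t Wide = static_cast<int64_t>(A) * static_cast<int64_t>(B);
  bool Overflow = Wide < std::numeric_limits<int32_t>::min() ||
                  Wide > std::numeric_limits<int32_t>::max();
  // Go through uint32_t so the truncation is a plain modular wrap.
  int32_t Low = static_cast<int32_t>(static_cast<uint32_t>(Wide));
  return {Low, Overflow};
}
```

For any `A`, `smulo32(A, 0)` returns `{0, false}`, which is exactly the pair of constants the combine substitutes.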

Differential Revision: https://reviews.llvm.org/D118635
2022-02-03 14:23:58 -08:00
Mircea Trofin
592f52de33 [nfc][regalloc] const LiveIntervals within the allocator
Once built, LiveIntervals are immutable. This patch captures that.

Differential Revision: https://reviews.llvm.org/D118918
2022-02-03 12:35:36 -08:00
Bjorn Pettersson
0352ee1a22 [CodeGenPrepare] Avoid out-of-bounds shift
AddressingModeMatcher::matchOperationAddr may attempt to shift a
variable by the same amount of steps as found in the IR in a SHL
instruction. This was done without considering that there could be
undefined behavior in the IR, so the shift performed when compiling
could end up having undefined behavior as well.

This patch avoids UB in CodeGenPrepare by making sure that we
limit the shift amount used, in a similar way as is already done
in CodeGenPrepare::optimizeLoadExt.
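
The general technique can be sketched as follows: clamp a shift amount read from (possibly undefined-behavior) IR before using it in a host-side shift. The function name and the choice of clamping to bit width minus one are assumptions for this example; the actual patch may bound the amount differently.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: turn an IR shift amount into a 64-bit scale factor
// (1 << Amt) without ever shifting by 64 or more, which would be undefined
// behavior in the compiler itself.
inline uint64_t scaleFromShiftAmount(uint64_t IRShiftAmt) {
  const uint64_t MaxShift = 63; // Bit width minus one.
  uint64_t Amt = std::min(IRShiftAmt, MaxShift);
  return uint64_t{1} << Amt;
}
```

An out-of-range amount in the IR means the IR result is already meaningless, so any in-range value is acceptable; the point is only that the compiler's own arithmetic stays well defined.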

Differential Revision: https://reviews.llvm.org/D118602
2022-02-03 21:03:58 +01:00
Mircea Trofin
79b98f0a07 Revert "[nfc][mlgo] De-const a parameter"
This reverts commit bc3b372161716a4c4845d47a877e4892df0d08da.

The planned change that would have needed non-const MachineFunction refs
isn't needed after all.
2022-02-03 09:20:36 -08:00
John Brawn
94843ea7d7 [AArch64] Make machine combiner patterns preserve MIFlags
This is mainly done so that we don't lose the nofpexcept flag once we
start emitting it.

Differential Revision: https://reviews.llvm.org/D118621
2022-02-03 11:58:59 +00:00
Sander de Smalen
01bfe9729a [ISEL] Canonicalize STEP_VECTOR to LHS if RHS is a splat.
This helps recognise patterns where we're trying to match STEP_VECTOR
patterns to INDEX instructions that take a GPR for the Start/Step.

The reason for canonicalising this operation to the LHS is
because it will already be canonicalised to the LHS if the RHS
is a constant splat vector.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D118459
2022-02-03 09:31:46 +00:00
Jeremy Morse
4654fa89ea Follow up to 6e03a68b776dc, squelch another leak
This patch is a sticking-plaster until D118774 solves the situation with
unique_ptrs. I'm certainly wishing I'd focused on that first X_X.
2022-02-02 21:02:11 +00:00
Jeremy Morse
6e03a68b77 [DebugInfo] Re-enable instruction referencing for x86_64
After discussion in D116821 this was turned off in 74db5c8c95e. Since then,
14aaaa12366f7 has been applied to limit the maximum memory consumption in
rare conditions, plus some performance patches.
2022-02-02 19:41:59 +00:00
Matt Arsenault
a96dbb9035 CodeGen: Use asm register names in warning message
This was using the ugly tablegenerated register enum names, which are
really hideous for register tuples on AMDGPU. Use the prettier names
which are recognized by the asm parser.
2022-02-02 14:20:12 -05:00
Jeremy Morse
206cafb680 Follow up to 9fd9d56dc6b, avoid a memory leak
Gaps in the basic block number range (from blocks being deleted or folded)
get block-value-tables allocated but never ejected, leading to a memory
leak, currently tripping up the asan buildbots. Fix this up by manually
freeing that memory.

As suggested elsewhere, if these things were owned by a unique_ptr then
cleanup would happen automagically. D118774 should eliminate the need for
this dance.
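
A minimal sketch of why unique_ptr ownership removes the need for manual cleanup: tables allocated for gap block numbers are never explicitly ejected, yet they are still destroyed when the owning container goes out of scope. `ValueTableSketch` and the helper function are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a per-block value table; tracks how many are alive.
struct ValueTableSketch {
  explicit ValueTableSketch(int &Live) : Live(Live) { ++Live; }
  ~ValueTableSketch() { --Live; }
  int &Live;
};

// Allocates one table per block number, explicitly ejects only the blocks
// that are actually present, and returns how many tables remain alive once
// the owning vector has been destroyed.
inline int liveTablesAfterProcessing(
    std::size_t NumBlocks, const std::vector<std::size_t> &PresentBlocks) {
  int Live = 0;
  {
    std::vector<std::unique_ptr<ValueTableSketch>> Tables;
    for (std::size_t I = 0; I < NumBlocks; ++I)
      Tables.push_back(std::make_unique<ValueTableSketch>(Live));

    // Explicit ejection for blocks that exist.
    for (std::size_t B : PresentBlocks)
      Tables[B].reset();

    // Tables for gap block numbers are never ejected above; unique_ptr
    // frees them when Tables is destroyed at the end of this scope.
  }
  return Live;
}
```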
2022-02-02 16:01:11 +00:00
Masoud Ataei
256d253332 [PowerPC] Scalar IBM MASS library conversion pass
This patch introduces the conversions from math function calls
to MASS library calls. To resolve calls generated with these conversions, one
needs to link the libxlopt.a library. This patch is tested on PowerPC Linux and AIX.

Differential: https://reviews.llvm.org/D101759

Reviewer: bmahjour
2022-02-02 07:54:19 -08:00
Mircea Trofin
660ff655c8 Fix buildbreak introduced in ed2deab5956fea9e8f64ef6020fe0b4e19734ecc
2022-02-02 07:34:51 -08:00
Mircea Trofin
ed2deab595 [nfc][regalloc] Make the max inference cutoff configurable
Added a flag to make configurable the number of interferences after
which we 'bail out' and treat a set of intervals as un-evictable. Also
using it on the ML side, as it turns out to be a good control for
compile-time.

With this configurable, we can do a bit of trial and error and see if
bumping it has any effect on heuristic/policy quality.

Differential Revision: https://reviews.llvm.org/D118707
2022-02-02 07:29:34 -08:00
Jeremy Morse
43de305704 [DebugInfo][InstrRef] Fix a tombstone-in-DenseMap crash from D117877
This is a follow-up to D117877: variable assignments of DBG_VALUE $noreg,
or DBG_INSTR_REFs where no value can be found, are represented by a
DbgValue object with Kind "Undef", explicitly meaning "there is no value".
In D117877 I added a special-case to make some assignment accounting faster,
without considering this scenario. It causes variables to be given the
value ValueIDNum::EmptyValue, which then ends up being a DenseMap key. The
DenseMap asserts, because EmptyValue is the tombstone key.

Fix this by handling the assign-undef scenario in the special case, to
match what happens in the general case: the variable has no value if it's
only ever assigned $noreg / undef.

Differential Revision: https://reviews.llvm.org/D118715
2022-02-02 15:08:49 +00:00
Jeremy Morse
9fd9d56dc6 [DebugInfo][InstrRef][NFC] Use depth-first scope search for variable locs
This patch aims to reduce max-rss from instruction referencing, by avoiding
keeping variable value information in memory for too long. Instead of
computing all the variable values then emitting them to DBG_VALUE
instructions, this patch tries to stream the information out through a
depth first search:
 * Make use of the fact LexicalScopes gives a depth-number to each lexical
   scope,
 * Produce a map that identifies the last lexical scope to make use of a
   block,
 * Enumerate each scope in LexicalScopes' DFS order, solving the variable
   value problem,
 * After each scope is processed, look for any blocks that won't be used by
   any other scope, and emit all the variable information to DBG_VALUE
   instructions.
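
The steps above can be sketched in a much simplified form: walk the scopes in depth-first order, record for each block the last scope that uses it, and then compute which blocks can be emitted after each scope is processed. The data structures below are invented for this illustration and do not mirror LexicalScopes or InstrRefBasedLDV.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical scope description: the blocks it covers and its nested scopes.
struct ScopeSketch {
  std::vector<int> Blocks;
  std::vector<std::size_t> Children; // Indices into the scope array.
};

// Collects scope indices in depth-first (pre-order) order.
inline void dfsOrder(const std::vector<ScopeSketch> &Scopes, std::size_t Root,
                     std::vector<std::size_t> &Order) {
  Order.push_back(Root);
  for (std::size_t Child : Scopes[Root].Children)
    dfsOrder(Scopes, Child, Order);
}

// For each scope in DFS order, returns the blocks that no later scope uses,
// i.e. the blocks whose variable information could be emitted and freed
// right after that scope has been processed.
inline std::vector<std::vector<int>>
ejectionSchedule(const std::vector<ScopeSketch> &Scopes, std::size_t Root) {
  std::vector<std::size_t> Order;
  dfsOrder(Scopes, Root, Order);

  // Map each block to the DFS position of the last scope that uses it.
  std::map<int, std::size_t> LastUse;
  for (std::size_t Pos = 0; Pos < Order.size(); ++Pos)
    for (int Block : Scopes[Order[Pos]].Blocks)
      LastUse[Block] = Pos;

  std::vector<std::vector<int>> Schedule(Order.size());
  for (const auto &Entry : LastUse)
    Schedule[Entry.second].push_back(Entry.first);
  return Schedule;
}
```

Because blocks are released as soon as their last user has been handled, the peak amount of per-block information held at once is bounded by the scopes still in flight rather than by the whole function.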

Differential Revision: https://reviews.llvm.org/D118460
2022-02-02 14:09:54 +00:00
Jeremy Morse
a80181a81e [DebugInfo][InstrRef][NFC] Free resources at an earlier stage
This patch releases some memory from InstrRefBasedLDV earlier than it would
otherwise. The underlying problem is:
 * We store a big table of "live in values for each block",
 * We translate that into DBG_VALUE instructions in each block,

And both exist in memory at the same time, which needlessly doubles that
information. Most of what this patch does is: as we progressively
translate live-in information into DBG_VALUEs, we free the variable-value /
machine-value tracking information as we go, which significantly reduces
peak memory.

While I'm here, also add a clear method to wipe variable assignments that
have been accumulated into VLocTracker objects, and turn a DenseMap into
a SmallDenseMap to avoid an initial allocation.

Differential Revision: https://reviews.llvm.org/D118453
2022-02-02 12:58:15 +00:00
Jeremy Morse
d556eb7e27 [DebugInfo][InstrRef][NFC] Cache some PHI resolutions
Install a cache of DBG_INSTR_REF -> ValueIDNum resolutions, for scenarios
where the value has to be reconstructed from several DBG_PHIs. Whenever
this happens, it's because branch folding + tail duplication has messed
with the SSA form of the program, and we have to solve a mini SSA problem
to find the variable value. This is always called twice, so it makes sense
to cache the value.
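
The caching idea is ordinary memoization: resolve once, remember the answer, and return the remembered answer on the second query. A minimal sketch, with an invented class name and a plain integer standing in for the resolved value:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical memoizing wrapper around an expensive resolution function
// that maps an instruction number to a value number.
class ResolutionCacheSketch {
public:
  explicit ResolutionCacheSketch(std::function<uint64_t(unsigned)> Resolve)
      : Resolve(std::move(Resolve)) {}

  // Returns the cached result if present, otherwise resolves and caches it.
  uint64_t lookup(unsigned InstrNum) {
    auto It = Cache.find(InstrNum);
    if (It != Cache.end())
      return It->second;
    uint64_t Value = Resolve(InstrNum);
    Cache.emplace(InstrNum, Value);
    return Value;
  }

private:
  std::function<uint64_t(unsigned)> Resolve;
  std::map<unsigned, uint64_t> Cache;
};
```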

This gives a ~0.5% geomean compile-time-performance improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D118455
2022-02-02 12:21:28 +00:00
Simon Pilgrim
5aa2acc86b [DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper
None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.
2022-02-02 12:04:49 +00:00
Jeremy Morse
14aaaa1236 Re-apply 3fab2d138e30, now with a triple added
Was reverted in 1c1b670a73a9 as it broke all non-x86 bots. Original commit
message:

[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out

In certain circumstances with things like autogenerated code and asan, you
can end up with thousands of Values live at the same time, causing a large
working set and a lot of information spilled to the stack. Unfortunately
InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory
when there are many many stack slots. See the reproducer in D116821.

It seems very unlikely that a developer would be able to reason about
hundreds of live named local variables at the same time, so a huge working
set and many stack slots is an indicator that we're likely analysing
autogenerated or instrumented code. In those cases: gracefully degrade by
setting an upper bound on the amount of stack slots to track. This limits
peak memory consumption, at the cost of dropping some variable locations,
but in a rare scenario where it's unlikely someone is actually going to
use them.

In terms of the patch, this adds a cl::opt for max number of stack slots to
track, and has the stack-slot-numbering code optionally return None. That
then filters through a number of code paths, which can then choose to not
track a spill / restore if it touches an untracked spill slot. The added
test checks that we drop variable locations that are on the stack, if we
set the limit to zero.
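
A rough sketch of the "optionally return None" pattern, using `std::optional` as a stand-in for the optional type used in the patch. The class name, the frame-index key, and the constructor parameter are assumptions for illustration; the real limit is a cl::opt whose name is not given here.

```cpp
#include <map>
#include <optional>

// Hypothetical spill-slot numbering with an upper bound on tracked slots.
class SpillSlotNumberingSketch {
public:
  explicit SpillSlotNumberingSketch(unsigned MaxSlots) : MaxSlots(MaxSlots) {}

  // Returns the tracking index for a frame slot, or std::nullopt when the
  // slot is not yet tracked and the limit has already been reached.
  std::optional<unsigned> getOrTrack(int FrameIndex) {
    auto It = Slots.find(FrameIndex);
    if (It != Slots.end())
      return It->second;
    if (Slots.size() >= MaxSlots)
      return std::nullopt; // Caller should skip this spill / restore.
    unsigned Id = static_cast<unsigned>(Slots.size());
    Slots.emplace(FrameIndex, Id);
    return Id;
  }

private:
  unsigned MaxSlots;
  std::map<int, unsigned> Slots;
};
```

Callers check the result and simply do not track a spill or restore when no index comes back, so a limit of zero drops every stack-based variable location.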

Differential Revision: https://reviews.llvm.org/D118601
2022-02-02 11:04:00 +00:00
Sam Parker
281d29b8fe [TypePromotion] Avoid some unnecessary truncs
Check for legal zext 'sinks' before inserting a trunc.

Differential Revision: https://reviews.llvm.org/D115451
2022-02-02 10:05:15 +00:00
Simon Moll
7d926b7177 [VE] LEGALAVL and staged VVP legalization
The new LEGALAVL node annotates that the AVL refers to packs of 64bit.
We use a two-stage lowering approach with LEGALAVL:

First, standard SDNodes are translated into illegal VVP layer nodes.
Regardless of source (VP or standard), all VVP nodes have a mask and AVL
parameter. The AVL parameter refers to the element position (just as in
VP intrinsics).

Second, we legalize the AVL usage in VVP layer nodes. If the element
size is < 64bit, the EVL parameter has to be adjusted to refer to packs
of 64bits.  We wrap the legalized AVL in a LEGALAVL node to track this.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D118321
2022-02-02 09:11:41 +01:00
Kevin Athey
1c1b670a73 Revert "[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out"
This reverts commit 3fab2d138e30c65249e1eaea6cc68b2b7f50955a.

Breaking PPC sanitizer build:
https://lab.llvm.org/buildbot/#/builders/105/builds/20857
2022-02-01 18:37:02 -08:00
David Blaikie
f69f23396d Revert "DebugInfo: Don't put types in type units if they reference internal linkage types"
This reverts commit ab4756338c5b2216d52d9152b2f7e65f233c4dac.

Breaks some cases, including this:

namespace {
template <typename> struct a {};
} // namespace
class c {
  c();
};
class b {
  b();
  a<c> ax;
};
b::b() {}
c::c() {}

By producing a reference to a type unit for "c" but not producing the type unit.
2022-02-01 16:13:07 -08:00
David Green
c89cfbd4dd Revert "[DAG] Extend SearchForAndLoads with any_extend handling"
This reverts commit 100763a88fe97b22cd5e3f69d203669aac3ed48f as it was
making incorrect assumptions about implicit zero_extends.
2022-02-01 20:18:40 +00:00
Jeremy Morse
8e75536e51 [DebugInfo][InstrRef][NFC] Bypass a frequently-noop loop
Bypass this loop if it would do nothing -- if there are no register masks
to be examined, there's no point looking at each location to see if the
location has been def'd. Awkwardly, this was responsible for almost an
entire half a percent of performance improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D118613
2022-02-01 19:39:09 +00:00
Jeremy Morse
3fab2d138e [DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out
In certain circumstances with things like autogenerated code and asan, you
can end up with thousands of Values live at the same time, causing a large
working set and a lot of information spilled to the stack. Unfortunately
InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory
when there are many many stack slots. See the reproducer in D116821.

It seems very unlikely that a developer would be able to reason about
hundreds of live named local variables at the same time, so a huge working
set and many stack slots is an indicator that we're likely analysing
autogenerated or instrumented code. In those cases: gracefully degrade by
setting an upper bound on the amount of stack slots to track. This limits
peak memory consumption, at the cost of dropping some variable locations,
but in a rare scenario where it's unlikely someone is actually going to
use them.

In terms of the patch, this adds a cl::opt for max number of stack slots to
track, and has the stack-slot-numbering code optionally return None. That
then filters through a number of code paths, which can then choose to not
track a spill / restore if it touches an untracked spill slot. The added
test checks that we drop variable locations that are on the stack, if we
set the limit to zero.

Differential Revision: https://reviews.llvm.org/D118601
2022-02-01 19:25:29 +00:00