Instead of checking for a bitcast from a function type, check
whether the aliasee is a function after stripping bitcasts. This
is not strictly equivalent, but serves the same purpose.
In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask.
This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments.
I've updated a number of cases that were either performing seperate size/position calculations or had created their own local wrapper versions of these.
Differential Revision: https://reviews.llvm.org/D119019
Add a shouldHoist method to TargetInstrInfo which is queried by
MachineLICM to override hoisting decisions for a given target.
This mirrors functionality provided by shouldSink.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D118773
Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so the data is not cleared after initial call to placeDebugValues since we will call it again after reverting a schedule.
Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are now moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and ending before calling placeDebugValues() since both vars will be used as reference points to move debug instructions back.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D119022
rv64izbb has a RORW/ROLW instructions that operate on the lower
32-bits of a 64-bit value and sign extend bit 31 of the result.
DAGCombiner won't match rotate idioms because the i32 type isn't Legal
on riscv64.
This patch teaches DAGCombiner to allow it if the type is going to
be promoted and the target has Custom type legalization for ISD::ROTL
or ISD::ROTR. I've restricted this to scalar types. It doesn't appear
any in tree targets other than riscv64 have custom type legalization
for rotates.
If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR
after type legalization, but I'd like to avoid that if possible.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D119062
When the shift amount is known and a known sign bit analysis of
the shiftee indicates that no saturation will occur, then we can
replace SSHLSAT/USHLSAT by SHL.
Differential Revision: https://reviews.llvm.org/D118765
FSAFDO profile loader is currently disabled even --enable-fs-discriminator is enabled.
They need to be turned on by options which makes it cumbersome for experiments.
This patch changes the FSAFDO profile loader enabled by default. Since they are
guarded by EnableFSDiscriminator, they will only be turned on if
--enable-fs-discriminator is enabled. Note that --enable-fs-discriminator is
still disabled by default.
Differential Revision: https://reviews.llvm.org/D119033
I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D119048
These operations are scalarized but the result type v1i1 isn't which
needs special handling (the same as is done for the non-strict
versions of these operations).
Differential Revision: https://reviews.llvm.org/D118258
This header is very large (3M Lines once expended) and was included in location
where dwarf-specific information were not needed.
More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.
This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].
As a consequence of that patch, downstream user may need to manually some extra
files:
llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h
In some situations, codes maybe relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden
dependency now needs to be explicit.
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after: 10978519
before: 11245451
Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions
Differential Revision: https://reviews.llvm.org/D118781
In the aftermath of D116895 a problem was found in the analysis of
dependencies between store merge candidates in
checkMergeStoreCandidatesForDependencies, that is needed to avoid
the cycles are introduced in the DAG.
In the past it has been enough (or assumed to be enough) to start
scanning from non-chain operands when analysing the store merge
candidates for dependencies, assuming that the analysis of chain
dependencies performed when finding the candidates would cover
up for potential dependencies that exist involving the chain operands.
It was however discovered that one could end up with scenarios such
as descibed in the aarch64-checkMergeStoreCandidatesForDependencies.ll
test case, when the dependency between two stores is given by a mix
of chain operand dependencies and non-chain operand dependencies.
The fix in this patch make sure that we also account for chain operand
dependencies when doing the more elaborate analysis in
checkMergeStoreCandidatesForDependencies, no longer relying on that
the earlier check involving chain operands is enough.
Differential Revision: https://reviews.llvm.org/D118943
Similar to the G_*MULO change.
The code for checking if a constant is legal/pre-legalize is shared between
these, and is kind of hairy. So, factor it out into a new function:
`isConstantLegalOrBeforeLegalizer`.
To make the refactoring clean, further refactor `isLegalOrBeforeLegalizer` into
a wrapper for two functions:
- `isPreLegalize`
- `isLegal`
This is a bit easier to read in general.
https://godbolt.org/z/KW7oszP1o
Differential Revision: https://reviews.llvm.org/D118655
Similar to the following combine in `DAGCombiner::visitMULO`:
```
// fold (mulo x, 0) -> 0 + no carry out
if (isNullOrNullSplat(N1))
return CombineTo(N, DAG.getConstant(0, DL, VT),
DAG.getConstant(0, DL, CarryVT));
```
This fixes some generally poor codegen for `*mulo`:
https://godbolt.org/z/eTxYsvz8f
Differential Revision: https://reviews.llvm.org/D118635
AddressingModeMatcher::matchOperationAddr may attempt to shift a
variable by the same amount of steps as found in the IR in a SHL
instruction. This was done without considering that there could be
undefined behavior in the IR, so the shift performed when compiling
could end up having undefined behavior as well.
This patch avoid UB in the codegenprepare by making sure that we
limit the shift amount used, in a similar way as already being done
in CodeGenPrepare::optimizeLoadExt.
Differential Revision: https://reviews.llvm.org/D118602
This reverts commit bc3b372161716a4c4845d47a877e4892df0d08da.
The planned change that would have needed non-const MachineFunction refs
isn't needed after all.
This helps recognise patterns where we're trying to match STEP_VECTOR
patterns to INDEX instructions that take a GPR for the Start/Step.
The reason for canonicalising this operation to the LHS is
because it will already be canonicalised to the LHS if the RHS
is a constant splat vector.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D118459
After discussion in D116821 this was turned off in 74db5c8c95e,
14aaaa12366f7 applied to limit the maximum memory consumption in rare
conditions, plus some performance patches.
This was using the ugly tablegenerated register enum names, which are
really hideous for register tuples on AMDGPU. Use the prettier names
which are recognized by the asm parser.
Gaps in the basic block number range (from blocks being deleted or folded)
get block-value-tables allocated but never ejected, leading to a memory
leak, currently tripping up the asan buildbots. Fix this up by manually
freeing that memory.
As suggested elsewhere, if these things were owned by a unique_ptr then
cleanup would happen automagically. D118774 should eliminate the need for
this dance.
This patch introduces the conversions from math function calls
to MASS library calls. To resolves calls generated with these conversions, one
need to link libxlopt.a library. This patch is tested on PowerPC Linux and AIX.
Differential: https://reviews.llvm.org/D101759
Reviewer: bmahjour
Added a flag to make configurable the number of interferences after
which we 'bail out' and treat a set of intervals as un-evictable. Also
using it on the ML side, as it turns out to be a good control for
compile-time.
With this configurable, we can do a bit of trial and error and see if
bumping it has any effect on heuristic/policy quality.
Differential Revision: https://reviews.llvm.org/D118707
This is a follow-up to D117877: variable assignments of DBG_VALUE $noreg,
or DBG_INSTR_REFs where no value can be found, are represented by a
DbgValue object with Kind "Undef", explicitly meaning "there is no value".
In D117877 I added a special-case to some assignment accounting faster,
without considering this scenario. It causes variables to be given the
value ValueIDNum::EmptyValue, which then ends up being a DenseMap key. The
DenseMap asserts, because EmptyValue is the tombstone key.
Fix this by handling the assign-undef scenario in the special case, to
match what happens in the general case: the variable has no value if it's
only ever assigned $noreg / undef.
Differential Revision: https://reviews.llvm.org/D118715
This patch aims to reduce max-rss from instruction referencing, by avoiding
keeping variable value information in memory for too long. Instead of
computing all the variable values then emitting them to DBG_VALUE
instructions, this patch tries to stream the information out through a
depth first search:
* Make use of the fact LexicalScopes gives a depth-number to each lexical
scope,
* Produce a map that identifies the last lexical scope to make use of a
block,
* Enumerate each scope in LexicalScopes' DFS order, solving the variable
value problem,
* After each scope is processed, look for any blocks that won't be used by
any other scope, and emit all the variable information to DBG_VALUE
instructions.
Differential Revision: https://reviews.llvm.org/D118460
This patch releases some memory from InstrRefBasedLDV earlier that it would
otherwise. The underlying problem is:
* We store a big table of "live in values for each block",
* We translate that into DBG_VALUE instructions in each block,
And both exist in memory at the same time, which needlessly doubles that
information. The most of what this patch does is: as we progressively
translate live-in information into DBG_VALUEs, we free the variable-value /
machine-value tracking information as we go, which significantly reduces
peak memory.
While I'm here, also add a clear method to wipe variable assignments that
have been accumulated into VLocTracker objects, and turn a DenseMap into
a SmallDenseMap to avoid an initial allocation.
Differential Revision: https://reviews.llvm.org/D118453
Install a cache of DBG_INSTR_REF -> ValueIDNum resolutions, for scenarios
where the value has to be reconstructed from several DBG_PHIs. Whenever
this happens, it's because branch folding + tail duplication has messed
with the SSA form of the program, and we have to solve a mini SSA problem
to find the variable value. This is always called twice, so it makes sense
to cache the value.
This gives a ~0.5% geomean compile-time-performance improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D118455
None of the external users actual touch these (they're purely used internally down the recursive call) - its trivial to add another wrapper if anything ever does want to track known elements.
Was reverted in 1c1b670a73a9 as it broke all non-x86 bots. Original commit
message:
[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out
In certain circumstances with things like autogenerated code and asan, you
can end up with thousands of Values live at the same time, causing a large
working set and a lot of information spilled to the stack. Unfortunately
InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory
when there are many many stack slots. See the reproducer in D116821.
It seems very unlikely that a developer would be able to reason about
hundreds of live named local variables at the same time, so a huge working
set and many stack slots is an indicator that we're likely analysing
autogenerated or instrumented code. In those cases: gracefully degrade by
setting an upper bound on the amount of stack slots to track. This limits
peak memory consumption, at the cost of dropping some variable locations,
but in a rare scenario where it's unlikely someone is actually going to
use them.
In terms of the patch, this adds a cl::opt for max number of stack slots to
track, and has the stack-slot-numbering code optionally return None. That
then filters through a number of code paths, which can then chose to not
track a spill / restore if it touches an untracked spill slot. The added
test checks that we drop variable locations that are on the stack, if we
set the limit to zero.
Differential Revision: https://reviews.llvm.org/D118601
The new LEGALAVL node annotates that the AVL refers to packs of 64bit.
We use a two-stage lowering approach with LEGALAVL:
First, standard SDNodes are translated into illegal VVP layer nodes.
Regardless of source (VP or standard), all VVP nodes have a mask and AVL
parameter. The AVL parameter refers to the element position (just as in
VP intrinsics).
Second, we legalize the AVL usage in VVP layer nodes. If the element
size is < 64bit, the EVL parameter has to be adjusted to refer to packs
of 64bits. We wrap the legalized AVL in a LEGALAVL node to track this.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D118321
This reverts commit ab4756338c5b2216d52d9152b2f7e65f233c4dac.
Breaks some cases, including this:
namespace {
template <typename> struct a {};
} // namespace
class c {
c();
};
class b {
b();
a<c> ax;
};
b::b() {}
c::c() {}
By producing a reference to a type unit for "c" but not producing the type unit.
Bypass this loop if it would do nothing -- if there are no register masks
to be examined, there's no point looking at each location to see if the
location has been def'd. Awkwardly, this was responsible for almost an
entire half a percent of performance improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D118613
In certain circumstances with things like autogenerated code and asan, you
can end up with thousands of Values live at the same time, causing a large
working set and a lot of information spilled to the stack. Unfortunately
InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory
when there are many many stack slots. See the reproducer in D116821.
It seems very unlikely that a developer would be able to reason about
hundreds of live named local variables at the same time, so a huge working
set and many stack slots is an indicator that we're likely analysing
autogenerated or instrumented code. In those cases: gracefully degrade by
setting an upper bound on the amount of stack slots to track. This limits
peak memory consumption, at the cost of dropping some variable locations,
but in a rare scenario where it's unlikely someone is actually going to
use them.
In terms of the patch, this adds a cl::opt for max number of stack slots to
track, and has the stack-slot-numbering code optionally return None. That
then filters through a number of code paths, which can then chose to not
track a spill / restore if it touches an untracked spill slot. The added
test checks that we drop variable locations that are on the stack, if we
set the limit to zero.
Differential Revision: https://reviews.llvm.org/D118601