This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148303
`G_FNEG` used to be legalized to `G_FSUB -0, x` causing infinite loop.
This is no longer the case after D84287.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D148187
PATCHABLE_* instructions expand to up to 36-byte
sleds. Updating the size of PATCHABLE instructions
causes them to be outlined, so we need to add a
check to prevent the outliner from considering
basic blocks that contain PATCHABLE instructions.
Differential Revision: https://reviews.llvm.org/D147982
Last user of DemandedBitsWrapperPass was the BDCE pass. Since
the legacy PM version of BDCE was removed in an earlier commit, this
patch removes the now unused DemandedBitsWrapperPass.
Differential Revision: https://reviews.llvm.org/D148336
Remove C APIs for interacting with PassRegistry and pass
initialization. These are legacy PM concepts, and are no longer
relevant for the new pass manager.
Calls to these initialization functions can simply be dropped.
Differential Revision: https://reviews.llvm.org/D145043
Currenty, setting the -mbb-profile-dump dumps a CSV file with blocks
inside an individual function identified by their MBB numbers. This
patch changes the MBBs to be identified by their ID which is set at MBB
creation and not changed afterwards, making it inherently stable
throughout the backend. This alleviates concerns with the MBB IDs
changing between the profile dump and what ends up in the final object
file. The MBBs inside the SHT_LLVM_BB_ADDR_MAP sections are also
identified using their MBB ID rather than number, so if we want to match
them up we need to identify the MBBs here by number.
Reviewed By: mtrofin, rahmanl
Differential Revision: https://reviews.llvm.org/D147366
Toggle true/false values of the JumpToFallThrough
parameter to simplify code and make it consistent
with the documentation for the `getFallThrough(..)`
method.
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D148139
If we have set of mergeable stores of shifts, but the original source value being shifted
is wider than the merged size, we should still be able to merge if we truncate first. To do this
however we need to search for stores speculatively up the block, without knowing exactly how
many stores we should see before we stop. The old algorithm has to match an exact number of
stores to fit the wide type, or it dies. The new one will try to set the wide type to however
many stores we found in the upwards block traversal and use later checks to verify if they're
a valid mergeable set.
The reason I need to move this to LoadStoreOpt is because the combiner works going top down
inside a block, which means that we end up doing partial merges because we haven't seen all
the possible stores before we mutate the MIR. In LoadStoreOpt we can go bottom up.
As a side effect of this change, we also end up doing better on an existing test case (missing_store)
since we manage to do a partial merge there.
This DAGCombine is not valid for some combinations of the known bits
of x and non-power-of-two widths of x. As shown in the bug:
- The bitwidth of x is 35 (n=5)
- The unknown bits of x is only the least significant bit
- This gives the result of the ctlz two possible values: 34 or 35, both
of which will give 1 when left-shifted 5 bits.
- So the `eor x, 1` that this optimisation would give is not correct.
A similar instcombine optimisation is only applied when the width of x is
a power-of-two. GlobalISel does not have this bug, as shown by the testcase.
Fixes#61549
Differential Revision: https://reviews.llvm.org/D147518
If we go to line 302, with one of MCE or MAB is not nullptr, then we could
leak mem here.
Use unique_ptr to maintain these 2 pointer can avoid it.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148003
As described in [0], this extends IRPGO to support //Temporal Profiling//.
When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.
Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.
In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.
Special thanks to Julian Mestre for helping with reservoir sampling.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D147287
This shows up in the wild, notably as a regression in D127115 .
Depends on D147821
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D147827
I've found that a frequent source of debug information loss in optimized
code is due to DEBUG_VALUE intrinsics in a position of the instruction
stream that is outside the scope of the variable it describes.
Tracking these is pretty difficult with the existing debug messages of
the history calculator; this patch addresses the issue by making it
obvious when this event happens.
Differential Revision: https://reviews.llvm.org/D147718
The type of a function is nowadays just an opaque pointer, which is not
helpful when analyzing FastISel misses. Instead print the actual
function type of the function.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D147716
This patch splits a restore point to allow it to only post-dominate blocks reachable by use
or def of CSRs(Callee Saved Registers)/FI(Frame Index).
Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change
for others.
Co-authored-by: junbuml
Differential Revision: https://reviews.llvm.org/D42600
We add a field `IsOutlined` to indicate whether a MachineFunction
is outlined and set it true for outlined functions in MachineOutliner.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D146191
Outlining isn't always a win when the saved instruction count is >= 1.
The overhead of representing a new function in the binary depends on
exception metadata and alignment. So parameterize this for local tuning.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D136774
AMDGPU code with enabled address sanitizer generates tons of stack objects (> 200000 in my testcase) and
takes forever to compile due to the time spent on stack slot sharing.
While LiveRange::overlaps method has logarithmic complexity on the number of segments in the involved
liveranges the problem is that when a new interval is assigned to a used color it's tested against
overlapping every other assigned interval for that color.
Instead I decided to join all assigned intervals for a color into a single interval and this allows to
have logarithmic complexity on the number of segments for the joined interval.
This patch reduced time spent on stack slot coloring pass from 628 to 3 seconds on my testcase.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D146057
This was assuming all register operands were assigned to physical registers.
This should ignore the operands which weren't assigned in this run.
Fixes#61134
For IR like:
```
%alloca = alloca ...
dbg.value(%alloca, !myvar, OP_deref(<other_ops>))
```
GlobalISel lowers it to MIR:
```
%some_reg = G_FRAME_INDEX <stack_slot>
DBG_VALUE %some_reg, !myvar, OP_deref(<other_ops>)
```
In other words, if the value of `!myvar` can be obtained by
dereferencing an alloca, in MIR we say that the _location_ of a variable
is obtained by dereferencing register %some_reg (plus some
`<other_ops>`).
We can instead remove the use of `%some_reg`: the location of `!myvar`
_is_ `<stack_slot>` (plus some `<other_ops>`). This patch implements
this transformation, which improves debug information handling in O0, as
these registers hardly ever survive register allocation.
A note about testing: similar to what was done in D76934
(f24e2e9eebde4b7a1d), this patch exposed a bug in the Builder class when
using `-debug`, where we tried to print an incomplete instruction. The
changes in `MachineIRBuilder.cpp` address that.
Differential Revision: https://reviews.llvm.org/D147536
The intrinsics have been out of experimental since 322d0afd875d
("[llvm][mlir] Promote the experimental reduction intrinsics to be
first class intrinsics.", 2020-10-07); update some places that still
referred to them as experimental.
Compiler hits infinite loop in DAGCombine. For force-streaming-compatible-sve
mode we have custom lowering for 128-bit vector splats and later in
DAGCombiner::SimplifyVCastOp() we scalarized SPLAT because we have custom
lowering for SME. Later, we restored SPLAT opertion via performMulCombine().
Such dbg.assigns will occur if you write zero-sized memcpys (see
https://reviews.llvm.org/D146987#4240016).
Handle this in AssignmentTrackingAnalysis (back end) rather than
AssignmentTrackingPass (declare-to-assign) in case it is possible to reproduce
this as a result of optimisations.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D147435
This is the first change for FS-AFDO integration with CSSPGO. There are more patches coming.
With pseudo probes, we do not assign FS discriminators to any other instructions since we will be using only probes for profile correlation.
Also call instructions are excluded since their dwarf discriminators are used for other purposes, i.e, storing probe ids. Since they are not getting a FS discriminator, they will also be excluded from MIR profile loading. The corresponding changes will be in the subsequent patches.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D147286
This broke Objective-C autorelease / retainAutoreleasedReturnValue, see
comments on the code review.
> This is a mitigation patch for
> https://bugs.chromium.org/p/llvm/issues/detail?id=30, where existing stack
> protection is skipped if a function is returned through by an unwinder rather
> than the normal call/return path. The recent patch D139254 added the ability to
> instrument a visible unwind path, at least in the IR case (I'm working on the
> SelectionDAG instrumentation too) but there are still invisible unwinds it
> can't reach.
>
> So this patch adds logic to DwarfEHPrepare that goes through a function,
> converting any call that might throw into an invoke to a simple resume cleanup,
> and adding cleanup clauses to existing landingpads that lack them. Obviously we
> don't really want to do this if it's wasted effort, so I also exposed
> requiresStackProtector from the actual StackProtector code to skip the extra
> paths if they won't be used.
>
> Changes:
> * Move test to AArch64 directory as it relies on target presence.
> * Re-add Dominator-tree maintenance. Accidentally cherry-picked wrong patch.
> * Skip adding paths on Windows EH functions.
>
> https://reviews.llvm.org/D143637
This reverts commit 2d690684f66fabc9ac6a2c70fcff3b31c9520794.
Verify the SlotIndexes analysis after a pass that claims to preserve it,
even if there are no further passes (apart from the verifier itself)
that would use the analysis.
Differential Revision: https://reviews.llvm.org/D129201
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.
Differential Revision: https://reviews.llvm.org/D147264
It is expected that shuffles that we hoist through binops only have a single
vector operand, the other being undef/poison. The checks for
isDeInterleaveMaskOfFactor check that all the elements come from inside the
first vector, but with non-canonical shuffles the second operand could still
have a value. Add a quick check to make sure it is UndefValue as expected, to
make sure we don't run into problems with BinOpShuffles not using BinOps.
Fixes#61749
Differential Revision: https://reviews.llvm.org/D147306
This patch renames the `mroptr` option to `mxcoff-roptr` to indicate in the option itself that it is xcoff specific.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D147161