It may be profitable to revert SCCP propagation of C++ static values,
if such constants are pointers, in order to avoid redundant pointer
computation, since the method returning the constant is non-removable.
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
Currently targets without LZCNT/TZCNT won't speculate with BSR/BSF instructions in case they have a zero value input, meaning we always insert a test+branch for the zero-input case.
This patch proposes we allow speculation if the target has CMOV, and perform a branchless select instead to handle the zero input case. This will predominately help x86-64 targets where we haven't set any particular cpu target. We already always perform BSR/BSF instructions if we were lowering a CTLZ/CTTZ_ZERO_UNDEF instruction.
The added test triggers the following assert in `areExtractShuffleVectors`
that is called from `shouldSinkOperands`:
Assertion `(!isScalable() || isZero()) && "Request for a fixed element count on a scalable object"' failed.
I don't think scalable types can be extract shuffles, so bail early if
this is the case.
Was missing remainder on `Start` value.
Also changed logic as as nikic suggested (getting loop from `PN`
instead of `Rem`). The prior impl increased the complexity of the code
and made debugging it more difficult.
Closes#104877
```
for(i = Start; i < End; ++i)
Rem = (i nuw+ IncrLoopInvariant) u% RemAmtLoopInvariant;
```
->
```
Rem = (Start nuw+ IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++rem)
Rem = rem == RemAmtLoopInvariant ? 0 : Rem;
```
In its current state, only if `IncrLoopInvariant` and `Start` both
being zero.
Alive2 seemed unable to prove this (see:
https://alive2.llvm.org/ce/z/ATGDp3 which is clearly wrong but still
checks out...) so wrote an exhaustive test here:
https://godbolt.org/z/WYa561388Closes#96625
DenseMap iteration order is not guaranteed to be deterministic.
Without the change,
llvm/test/Transforms/CodeGenPrepare/X86/statepoint-relocate.ll would
fail when `combineHashValue` changes (#95970).
Fixes: dba7329ebb0dbe1fabb3faaedfd31da3b8bd611d
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
The data-layout independent constant folding currently has some rather
gnarly code for canonicalizing GEP indices to reduce "notional
overindexing", and then infers inbounds based on that canonicalization.
Now that we canonicalize to i8 GEPs, this canonicalization is
essentially useless, as we'll discard it as soon as the GEP hits the
data-layout aware constant folder anyway. As such, I'd like to remove
this code entirely.
This shouldn't have any impact on optimization capabilities.
Because most of tests assume target-abi=`lp64d`, adding the
corresponding feature is reasonable.
rg -l loongarch -g '!*.s' | xargs sed -i '/mtriple=loongarch/ {/-mattr=/!{/target-abi/! s/mtriple=loongarch.. /&-mattr=+d /}}'
As demonstrated in the test change, when deciding to sink a trunc we
were losing its flags. This patch moves to cloning the original
instruction instead.
SinkCast creates a new cast with the same type and inputs, which drops
the nsw/nuw flags.
Reviewed as part of <https://github.com/llvm/llvm-project/pull/89904>
but split out so I can land the test separately.
Depending on the TLSMode many thread-local accesses on x86 can be
expressed by adding a %fs: segment register to an addressing mode. Even
if there are mutliple users of a `llvm.threadlocal.address` intrinsic it
is generally not worth sharing the value in a register but instead fold
the %fs access into multiple addressing modes.
Hence this changes CodeGenPrepare to duplicate the
`llvm.threadlocal.address` intrinsic as necessary.
Introduces a new `TargetLowering::addressingModeSupportsTLS` callback
that allows targets to indicate whether TLS accesses can be part of an
addressing mode.
This is fixing a performance problem, as this folding of TLS-accesses
into multiple addressing modes happened naturally before the
introduction of the `llvm.threadlocal.address` intrinsic, but regressed
due to `SelectionDAG` keeping things in registers when accessed across
basic blocks, so CodeGenPrepare needs to duplicate to mitigate this. We
see a ~0.5% recovery in a codebase with heavy TLS usage (HHVM).
This fixes most of #87437
If not performing gep splits can prevent important optimizations, such
as preventing the element indices / member offsets from being
(partially) folded into load/store instruction immediates.
Reverted due to failures on buildbots, where a new cl flag was placed
in the wrong file, resulting in link errors.
https://lab.llvm.org/buildbot/#/builders/198/builds/8548
This reverts commit 0b398256b3f72204ad1f7c625efe4990204e898a.
This patch adds support for printing the proposed non-instruction debug
info ("RemoveDIs") out to textual IR. This patch does not add any
bitcode support, parsing support, or documentation.
Printing of the new format is controlled by a flag added in this patch,
`--write-experimental-debuginfo`, which defaults to false. The new
format will be printed *iff* this flag is true, so whether we use the IR
format is completely independent of whether we use non-instruction debug
info during LLVM passes (which is controlled by the
`--try-experimental-debuginfo-iterators` flag).
Even with the flag disabled, some existing tests need to be updated, as this
patch causes debug intrinsic declarations to be changed in a round trip,
such that they always appear at the end of a module and have no attributes
(this has no functional change on the module).
The design of this new IR format was proposed previously on
Discourse, and any further discussion about the design can still be
contributed there:
https://discourse.llvm.org/t/rfc-debuginfo-proposed-changes-to-the-textual-ir-representation-for-debug-values/73491
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #75380
Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)"
Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #64560
Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
It turns out that CodeGenPrepare will skip over consecutive select
instructions as it knows it can optimise them all at the same time. This
is unfortunate for the RemoveDIs project to remove intrinsic-based
debug-info, because that means debug-info attached to those skipped
instructions doesn't get seen by optimizeInst and so updated. Add code
to handle debug-info on those skipped instructions manually.
This code will also have been slower when it had dbg.values stuffed in
between instructions, but with RemoveDIs it'll go faster because the
dbg.values won't break up the select sequence.
When all the large const offsets masked with the same value from bit-12 to bit-23.
Fold
add x8, x0, #2031, lsl #12
add x8, x8, #960
ldr x9, [x8, x8]
ldr x8, [x8, #2056]
into
add x8, x0, #2031, lsl #12
ldr x9, [x8, #960]
ldr x8, [x8, #3016]
CodeGenPrepare needs to support the maintenence of DPValues, the
non-instruction replacement for dbg.value intrinsics. This means there are
a few functions we need to duplicate or replicate the functionality of:
* fixupDbgValue for setting users of sunk addr GEPs,
* The remains of placeDbgValues needs a DPValue implementation for sinking
* Rollback of RAUWs needs to update DPValues
* Rollback of instruction removal needs supporting (see github #73350)
* A few places where we have to use iterators rather than instructions.
There are three places where we have to use the setHeadBit call on
iterators to indicate which portion of debug-info records we're about to
splice around. This is because CodeGenPrepare, unlike other optimisation
passes, is very much concerned with which block an operation occurs in and
where in the block instructions are because it's preparing things to be in
a format that's good for SelectionDAG.
There isn't a large amount of test coverage for debuginfo behaviours in
this pass, hence I've added some more.
Fix the issue by not reusing the zext at all. The code already handles
creation of new zexts if more than one is needed. Always use that
code-path instead of trying to reuse the old zext in some case.
(Alternatively we could also drop poison-generating flags on the old
zext, but it seems cleaner to not reuse it at all, especially if it's
not always possible anyway.)
Fixes https://github.com/llvm/llvm-project/issues/72046.
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
SVE supports scalar+vector and scalar+extw(vector) addressing modes.
However, the masked gather/scatter intrinsics take a vector of
addresses, which means address computations can be hoisted out of
loops. The is especially true for things like offsets where the
true size of offsets is lost by the time you get to code generation.
This is problematic because it forces the code generator to legalise
towards `<vscale x 2 x ty>` vectors that will not maximise bandwidth
if the main block datatypes is in fact i32 or smaller.
This patch sinks GEPs and extends for cases where one of the above
addressing modes can be used.
NOTE: There are cases where it would be better to split the extend
in two with one half hoisted out of a loop and the other within the
loop. Whilst true I think this switch of default is still better
than before because the extra extends are an improvement over being
forced to split a gather/scatter.
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.
This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
Before elimination of mostly empty block it makes sense to remove dead PHI nodes.
It open more opportunity for elimination plus eliminates dead code itself.
It appeared that change results in failing many unit tests and some of
them I've updated and for another one I disable this optimization.
The pattern I observed in the tests is that there is a infinite loop
without side effects. As a result after elimination of dead phi node all other
related instruction are also removed and tests stops to check what it is expected.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D158503
Refresh of the generic scheduling model to use A510 instead of A55.
Main benefits are to the little core, and introducing SVE scheduling information.
Changes tested on various OoO cores, no performance degradation is seen.
Differential Revision: https://reviews.llvm.org/D156799
No longer conservatively assume a load/store accesses the stack when we
can prove that we did not compute any stack-relative address up to this
point in the program.
We do this in a cheap not-quite-a-dataflow-analysis: Assume
`NoStackAddressUsed` when all predecessors of a block already guarantee
it. Process blocks in reverse post order to guarantee that except for
loop headers we have processed all predecessors of a block before
processing the block itself. For loops we accept the conservative answer
as they are unlikely to be shrink-wrappable anyway.
Differential Revision: https://reviews.llvm.org/D152213