When there is no target-specific lowering of @llvm.cond.loop, it is
lowered into a simple loop by PreISelIntrinsicLowering. Mark the branch
weights into the no-return loop as unknown given we do not have value
metadata to fix the profcheck test for this feature.
Reviewers: mtrofin, alanzhao1, snehasish, pcc
Pull Request: https://github.com/llvm/llvm-project/pull/180390
The llvm.cond.loop intrinsic is semantically equivalent to a conditional
branch conditioned on ``pred`` to a basic block consisting only of an
unconditional branch to itself. Unlike such a branch, it is guaranteed
to use specific instructions. This allows an interrupt handler or
other introspection mechanism to straightforwardly detect whether
the program is currently spinning in the infinite loop and possibly
terminate the program if so. The intent is that this intrinsic may
be used as a more efficient alternative to a conditional branch to
a call to ``llvm.trap`` in circumstances where the loop detection
is guaranteed to be present. This construct has been experimentally
determined to be executed more efficiently (when the branch is not taken)
than a conditional branch to a trap instruction on AMD and older Intel
microarchitectures, and is also more code size efficient by avoiding the
need to emit a trap instruction and possibly a long branch instruction.
On i386 and x86_64, the infinite loop is guaranteed to consist of a short
conditional branch instruction that branches to itself. Specifically,
the first byte of the instruction will be between 0x70 and 0x7F, and
the second byte will be 0xFE.
Part of this RFC:
https://discourse.llvm.org/t/rfc-optimizing-conditional-traps/89456
Reviewers: arsenm, RKSimon, fmayer, vitalybuka
Pull Request: https://github.com/llvm/llvm-project/pull/177686
This patch changes the memset lowering to match the optimized memcpy lowering.
The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred
memory access type. If that type is larger than a byte, the memset is lowered
into two loops: a main loop that stores a sufficiently wide vector splat of the
SetValue with the preferred memory access type and a residual loop that covers
the remaining bytes individually. If the memset size is statically known, the
residual loop is replaced by a sequence of stores.
This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by
around 7-20x.
I'm planning similar treatment for memset.pattern as a follow-up PR.
For SWDEV-543208.
This is mostly boilerplate to move various freestanding utility
functions into LegalizerHelper. LibcallLoweringInfo is currently
optional, mostly because threading it through assorted other
uses of LegalizerHelper is more difficult.
I had a lot of trouble getting this to work in the legacy pass
manager with setRequiresCodeGenSCCOrder, and am not happy with the
result. A sub-pass manager is introduced and this is invalidated,
so we're re-computing this unnecessarily.
The libcall lowering decisions should be program dependent,
depending on the current module's RuntimeLibcallInfo. We need
another related analysis derived from that plus the current
function's subtarget to provide concrete lowering decisions.
This takes on a somewhat unusual form. It's a Module analysis,
with a lookup keyed on the subtarget. This is a separate module
analysis from RuntimeLibraryAnalysis to avoid that depending on
codegen. It's not a function pass to avoid depending on any
particular function, to avoid repeated subtarget map lookups in
most of the use passes, and to avoid any recomputation in the
common case of one subtarget (and keeps it reusable across
repeated compilations).
This also switches ExpandFp and PreISelIntrinsicLowering as
a sample function and module pass. Note this is not yet wired
up to SelectionDAG, which is still using the LibcallLoweringInfo
constructed inside of TargetLowering.
As noted in #153256, TableGen is generating reserved names for
RuntimeLibcalls, which resulted in a build failure for Arm64EC since
`vcruntime.h` defines `__security_check_cookie` as a macro.
To avoid using reserved names, all impl names will now be prefixed with
`Impl_`.
`NumLibcallImpls` was lifted out as a `constexpr size_t` instead of
being an enum field.
While I was churning the dependent code, I also removed the TODO to move
the impl enum into its own namespace and use an `enum class`: I
experimented with using an `enum class` and adding a namespace, but we
decided it was too verbose so it was dropped.
…210)"
This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3.
Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)"
This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de.
Revert "TableGen: Emit statically generated hash table for runtime
libcalls (#150192)"
This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05.
Reverted three changes because of a CMake error while building llvm-nm
as reported in the following PR:
https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
Stop emitting these calls by name in PreISelIntrinsicLowering. This
is still kind of a hack. We should be going through the abstract
RTLIB:Libcall, and then checking if the call is really supported in
this module. Do this as a placeholder until RuntimeLibcalls is a
module analysis.
As Constants are already uniquified, we can use a map to keep track of
whether a GlobalVariable was produced for a given Constant or not.
Repeated globals with the same value was one of the codegen differences
noted in #126736. This patch removes that diff, producing cleaner
output.
Instead of reporting ___memmove as an implementation of memcpy,
make it unavailable and let the lowering logic consider memmove as
a fallback path.
This avoids a special case 1:N mapping for libcall implementations.
This adds basic support for objc_claimAutoreleasedReturnValue, which is
mostly equivalent to objc_retainAutoreleasedReturnValue, with the
difference that it doesn't require the marker nop to be emitted between
it and the call it was attached to.
To achieve that, this also teaches the AArch64 attachedcall bundle
lowering to pick whether the marker should be emitted or not based on
whether the attachedcall target is claimARV or retainARV.
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
This patch cleans up the handling of the count parameter in general,
though was initially motivated by a compiler crash upon a memset.pattern
with a narrow count causing a compiler crash due to different types for
CreateMul when converting the count to the number of bytes.
The logic used to name globals means there is some minor renaming churn
in the output to
test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
irrelevant to the newly added tests (that would crash before).
This is to enable a transition of LoopIdiomRecognize to selecting the
llvm.experimental.memset.pattern intrinsic as requested in #118632 (as
opposed to supporting selection of the libcall or the intrinsic). As
such, although it _is_ a TODO to add costing considerations on whether
to lower to the libcall (when available) or expand directly, lacking
such logic is helpful at this stage in order to minimise any unexpected
code gen changes in this transition.
Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the
test case.
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline
As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3.
Recent scheduling changes means tests need to be re-generated. Reverting
to green while I do that.
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline
As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
This is used by PreISelIntrinsicLowering. With this change,
PreISelIntrinsicLowering does not have to assume that there were changes
just because we encountered a VP intrinsic.
lowerConstantIntrinsics does an RPO traveral, which doesn't reach dead
blocks. Remove the assertion that all intrinsics are lowered, because
some intrinsics might remain.
expandVectorPredication may change code, even if the intrinsic itself
remains in the code. Report changes whenever such an intrinsic is
encountered, because code could have been changed.
Another follow-up fix for #101652 to fix expensive-checks-only failure.
Currently, the LowerConstantIntrinsics pass does an RPO traversal of
every function... only to find that many functions don't have constant
intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is
already a pre-isel intrinsic lowering pass, which iterates over
intrinsic declarations and lowers all users. Call
lowerConstantIntrinsics from this pass to avoid the extra iteration over
the entire IR and the RPO traversal.
This reverts commit ac4b6b662630cd4d3bf6929f2b39ea203c0054a1.
A test change was missing for
mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir in the initial commit.
Following on from the discussion in
https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496
and the equivalent change for llvm.memset.inline (#95397), this removes
the requirement that the length of llvm.memcpy.inline is constant.
PreISelInstrinsicLowering will expand llvm.memcpy.inline with
non-constant lengths, while the codegen path for constant lengths is
left unaltered.
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
We should query the subtarget of the calling function, not of the
intrinsic.
This probably makes no functional difference (as libcalls are
unlikely to vary across subtargets), but fixes minor compile-time
regressions from unnecessary subtarget instantiations.
Followup to D157567.
Differential Revision: https://reviews.llvm.org/D157848
This is a quick fix for an assert when the source and dest have
different address spaces. The pointer compare needs to have matching
types, but we can't generically introduce addrspacecast and we don't
know if the address spaces alias.
Expand large or unknown size memory intrinsics into loops in the
default lowering pipeline if the target doesn't have the corresponding
libfunc. Previously AMDGPU had a custom pass which existed to call the
expansion utilities.
With a default no-libcall option, we can remove the libfunc checks in
LoopIdiomRecognize for these, which never made any sense. This also
provides a path to lifting the immarg restriction on
llvm.memcpy.inline.
There seems to be a bug where TLI reports functions as available if
you use -march and not -mtriple.