Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and
`MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more
precisely describe its effect and clarify the header documentation.
Found this while helping to investigate PR48841, which pointed out an
unsound use of the flag in `CloneModule()`. For now I've just added a
FIXME there, but I'm hopeful that the new (more precise) name will
prevent other similar errors.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:
1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95982
It seems nicer to list passes given a flag rather than displaying all
passes in opt --help.
This is awkwardly structured because a PassBuilder is required, but
reusing the PassBuilder in runPassPipeline() doesn't work because we
read the input IR before getting to runPassPipeline(). So printing the
list of passes needs to happen before reading the input IR. If we remove
the legacy PM code in main() and move everything from NewPMDriver.cpp
into opt.cpp, we can create the PassBuilder before reading IR and check
if we should print the list of passes and exit. But until then this hack
seems fine.
Compared to the legacy PM, the new PM passes are lacking descriptions.
We'll need to figure out a way to add descriptions if we think this is
important.
Also, this only works for passes specified in PassRegistry.def. If we
want to print other custom registered passes, we'll need a different
mechanism.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D96101
Originally landed in ddc2f1e3fb4 and reverted in d32deaab4d because of
a Generic test objecting. That was fixed up in 013613964fd9. Original
landing commit message follows:
[DWARF] Location-less inlined variables should not have DW_TAG_variable
Discussed in this thread:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html
DwarfDebug::collectEntityInfo accidentally distinguishes between variable
locations that never have a location specified, and variable locations that
have an empty location specified. The latter leads to the creation of an
empty variable referring to the abstract origin.
Fix this by seeking a non-empty location before producing a concrete
entity, to guarantee a DW_AT_location will be produced. Other loops in
collectEntityInfo and endFunctionImpl take care of examining the
retainedNodes collection and ensuring optimised-out variables are created.
Differential Revision: https://reviews.llvm.org/D95617
glibc deprecates `mallinfo` in the latest version of 2.33. This patch replaces the usage of `mallinfo` with the new `mallinfo2` when it's available.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D96359
This was taking the calling convention from the parent function,
instead of the callee. Avoids regressions in a future patch when the
caller and callee have different type breakdowns.
For some reason AArch64's lowerFormalArguments seems to intentionally
ignore the parent isVarArg.
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.
Differential Revision: https://reviews.llvm.org/D82703
The current support only printed coredump notes, but most binaries also
contain notes. This change adds names for four FreeBSD-specific notes and
pretty-prints three of them:
NT_FREEBSD_ABI_TAG:
This note holds a 32-bit (decimal) integer containing the value of the
__FreeBSD_version macro, which is defined in crt1.o and will hold a value
such as 1300076 for a binary build on a FreeBSD 13 system.
NT_FREEBSD_ARCH_TAG:
A string containing the value of the build-time MACHINE_ARCH
NT_FREEBSD_FEATURE_CTL: A 32-bit flag that indicates to the kernel that
the binary wants certain bevahiour. Examples include setting
NT_FREEBSD_FCTL_ASLR_DISABLE which tells the kernel to disable ASLR.
After this change llvm-readobj also no longer decodes coredump-only
FreeBSD notes in non-coredump files. I've also converted the
note-freebsd.s test to use yaml2obj instead of llvm-mc.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D74393
This reverts commit 4a64d8fe392449b205e59031aad5424968cf7446.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
As of commit 284f2bffc9bc5, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.
This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092
Differential revision: https://reviews.llvm.org/D96283
References to functions are in program memory and need a `pm()` fixup. This should fix trait objects for Rust on AVR.
Differential Revision: https://reviews.llvm.org/D87631
Patch by Alex Mikhalev.
This patch implements generation of remaining codegen options and tests it by performing parse-generate-parse round trip.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D96056
This reverts commit 502a67dd7f23901834e05071ab253889f671b5d9.
This expose a failure in test-suite build on PowerPC,
revert to unblock buildbot first,
Dave will re-commit in https://reviews.llvm.org/D96287.
Thanks Dave.
On AArch64 (which seems to be the only target that supports it), this
attribute allows codegen to avoid saving/restoring the value in x0
across a call.
Gives a 0.1% geomean -Os code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D96099
Summary:
Introduce base classes that hold a textual represent of the IR
based on basic blocks and a base class for comparing this
representation. A new change printer is introduced that uses these
classes to save and compare representations of the IR before and after
each pass. It only reports when changes are made by a pass (similar to
-print-changed) except that the changes are shown in a patch-like format
with those lines that are removed shown in red prefixed with '-' and those
added shown in green with '+'. This functionality was introduced in my
tutorial at the 2020 virtual developer's meeting.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D91890
Different targets might handle branch performance differently, so this patch allows for
targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch
can be and still be duplicated to generate straight-line code instead.
This patch also specifies said override values for the AArch64 subtarget.
Differential Revision: https://reviews.llvm.org/D95631
When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to wrong invalid logic (which matches even if it shouldn't) and likely missing variable initialisation."
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D95878
These headers can be in a Clang module like the rest. This also fixes the
modules build that is currently struggling with these headers being textually
included in several other modules.
PR49043 exposed a problem when it comes to RAUW llvm.assumes. While
D96106 would fix it for GVNSink, it seems a more general concern. To
avoid future problems this patch moves away from the vector of weak
reference model used in the assumption cache. Instead, we track the
llvm.assume calls with a callback handle which will remove itself from
the cache if the call is deleted.
Fixes PR49043.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D96168
This fixes PR49043 by invalidating the handle on RAUW. This will work
fine assuming all existing RAUW users add the new assumption to the
cache. That means, if a new llvm.assume call replaces an old one, you
need to add the new one now as a RAUW is not enough anymore.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D96208
GNU ld>=2.36 supports mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an
output section, so we can set SHF_LINK_ORDER if -fbinutils-version=2.36 or above.
If -fno-function-sections or older binutils, drop unique ID for -fno-unique-section-names.
The users can just specify -fbinutils-version=2.36 or above to allow GC with both GNU ld and LLD.
(LLD does not support garbage collection of non-group non-SHF_LINK_ORDER .gcc_except_table sections.)
Context-sensitive profile effectively split a function profile into many copies each representing the CFG profile of a particular calling context. That makes the count distribution looks more flat as we now have more function profiles each with lower counts, which in turn leads to lower hot thresholds. Now we tells threshold computation to merge context profile first before calculating percentile based cutoffs to compensate for seemingly flat context profile. This can be controlled by swtich `sample-profile-contextless-threshold`.
Earlier measurement showed ~0.4% perf boost with this tuning on spec2k6 for CSSPGO (with pseudo-probe and new inliner).
Differential Revision: https://reviews.llvm.org/D95980
Currently, the SmallPtrSet type allows inserting elements but it does
not support inserting elements with a positional hint. The lack of this
signature means that you cannot use SmallPtrSet with
std::insert_iterator or std::inserter(), which makes some code
constructs more awkward. This adds an overload of insert() that can be
used in these scenarios.
The positional hint is unused by SmallPtrSet and the call is equivalent
to calling insert() without a hint.
This patch adds a pass to replace calls to vector intrinsics
(i.e., LLVM intrinsics operating on vector operands) with
calls to a vector library.
Currently, calls to LLVM intrinsics are only replaced with
calls to vector libraries when scalar calls to intrinsics are
vectorized by the Loop- or SLP-Vectorizer.
With this pass, it is now possible to replace calls to LLVM
intrinsics already operating on vector operands, e.g., if
such code was generated by MLIR. For the replacement,
information from the TargetLibraryInfo, e.g., as specified
via -vector-library is used.
Differential Revision: https://reviews.llvm.org/D95373
__builtin_isnan currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: kpn
Differential Revision: https://reviews.llvm.org/D95948
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a958b7af6130241996d9cfcecf559d4 without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
This reverts commit 3fe3946d9a958b7af6130241996d9cfcecf559d4.
The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.
Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.
Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf36af5c88c working. ARM removed the fix in
dfac521da1b90db683, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.
Differential Revision: https://reviews.llvm.org/D95291
The vrgather.vv instruction uses a vector of indices with the same
SEW as operand 0. The vrgather.vx instructions use a scalar index
operand of XLen bits.
By splitting this into 2 intrinsics we are able to use LLVMatchType
in the definition to avoid specifying the type for the index operand
when creating the IR for the intrinsic. For .vv it will match the
operand 0 type. And for .vx it will match the type of the vl operand
we already needed to specify a type for.
I'm considering splitting more intrinsics. This was a somewhat
odd one because the .vx doesn't use the element type, it always
use XLen.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D95979