This fixes support for merging profiles which broke as a consequence
of e50a38840dc3db5813f74b1cd2e10e6d984d0e67. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.
In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.
Differential Revision: https://reviews.llvm.org/D107143
This fixes support for merging profiles which broke as a consequence
of e50a38840dc3db5813f74b1cd2e10e6d984d0e67. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.
In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.
Differential Revision: https://reviews.llvm.org/D107143
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.
```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo
# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo
# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```
(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059 # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```
The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.
Differential Revision: https://reviews.llvm.org/D104556
This distributes reductions based on the relative offset of loads, if
one is found from their operands. Given chains of reductions this will
then sort them in ascending load order, which in turn can help simple
prefetches latch on to increasing strides more easily.
Differential Revision: https://reviews.llvm.org/D106569
This introduces a builder function for emitting IR performing reductions in
OpenMP. Reduction variable privatization and initialization to the
reduction-neutral value is expected to be handled separately. The caller
provides the reduction functions. Further commits can provide implementation of
reduction functions for the reduction operators defined in the OpenMP
specification.
This implementation was tested on an MLIR fork targeting OpenMP from C and
produced correct executable code.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D104928
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D107025
I moved the code that tries to combine away each unmerge def into a method in
ArtifactValueFinder class itself. This removes a logically messy lambda and
makes it easier to use the value-finder in more places in future.
Load/Store unit is used to enforce order of loads and stores if they
alias (controlled by --noalias=false option).
Fixes PR50483 - [MCA] In-order pipeline doesn't track memory
load/store dependencies.
Differential Revision: https://reviews.llvm.org/D103955
This takes two ranges and invokes a predicate on the element-wise pair in the
ranges. It returns true if all the pairs are matching the predicate and the ranges
have the same size.
It is useful with containers that aren't random iterator where we can't check the
sizes in O(1).
Differential Revision: https://reviews.llvm.org/D106605
The current implementation of function internalization creats a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of eachother. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106931
Reapply commit d675b594f4f1e1f6a195fb9a4fd02cf3de92292d that was
reverted due to buildbot failures. A simple fix has been applied to
remove an assertion.
Differential Revision: https://reviews.llvm.org/D105207
This is a second attempt to fix the EXPENSIVE_CHECKS issue that was mentioned In D91661#2875179 by @jroelofs.
(The first attempt was in D105983)
D91661 more or less completely reverted D49126 and by doing so also removed the cleanup logic of the created declarations and calls.
This patch is a replacement for D91661 (which must itself be reverted first). It replaces the custom declaration creation with the
generic version and shows the test impact. It also tracks the number of NamedValues to detect if a new prototype was added instead
of looking at the available users of a prototype.
Reviewed By: jroelofs
Differential Revision: https://reviews.llvm.org/D106147
This reverts commit 77080a1eb6061df2dcfae8ac84b85ad4d1e02031.
This change introduced issues detected with EXPENSIVE_CHECKS. Reverting to restore the
needed function cleanup. A next patch will then just improve on the name mangling.
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
Reapply commit 796b84d26f4d461fb50e7b4e84e15a10eaca88fc that was
reverted due to reports of crashes. A minor change now guards against
getVariableLocationOperand() returning a nullptr.
Differential Revision: https://reviews.llvm.org/D106659
On AIX, the linker needs to check whether a given lto_module_t contains
any constructor/destructor functions, in order to implement the behavior
of the -bcdtors:all flag. See
https://www.ibm.com/docs/en/aix/7.2?topic=l-ld-command for the flag's
documentation.
In llvm IR, constructor (destructor) functions are added to a special
global array @llvm.global_ctors (@llvm.global_dtors).
However, because these two symbols are artificial, they are not visited
during the symbol traversal (using the
lto_module_get_[num_symbols|symbol_name|symbol_attribute] API).
This patch adds a new function to the libLTO interface that checks the
presence of one or both of these two symbols.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D106887
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.
In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.
In the future we will explore specializing for multiple values of NumThreads and NumTeams.
Depends on D106390
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D106033
This patch adds a peephole optimization `SETCC(FREEZE(x),const)` => `FREEZE(SETCC(x,const))`
if the SETCC is only used by BRCOND.
Combined with `BRCOND(FREEZE(X)) => BRCOND(X)`, this leads to a nice improvement in the generated assembly when x is a masked loaded value.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D105344
This reapplies commit cbb709e25124dc38ee593882051fc88c987fe591 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.
This reverts commit aa27430a625b2fd059707a87f8ba2df8f480ff11.
AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at
the IR by default. If queried for a IR position which has a
simplification callback we should either look at the callback return, or
give up. We do the latter for now.
The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.
This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.
This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:
```
sw.bb sw.bb
/ | \ / | \
case1 case2 case3 case1 case2 case3
\ | / / | \
latch.bb latch.2 latch.3 latch.1
br sw.bb / | \
sw.bb.2 sw.bb.3 sw.bb.1
br case2 br case3 br case1
```
Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D99205
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.
Differential Revision: https://reviews.llvm.org/D106724
- This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target.
- Only the .text and the .bss sections are added for now.
- The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections.
- This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target
- Further improvements and additions will be made in future patches.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D106380
When hoisting/moving calls to locations, we strip unknown metadata. Such calls are usually marked `speculatable`, i.e. they are guaranteed to not cause undefined behaviour when run anywhere. So, we should strip attributes that can cause immediate undefined behaviour if those attributes are not valid in the context where the call is moved to.
This patch introduces such an API and uses it in relevant passes. See
updated tests.
Fix for PR50744.
Reviewed By: nikic, jdoerfert, lebedev.ri
Differential Revision: https://reviews.llvm.org/D104641
Avoid several crashes when DBG_INSTR_REF and DBG_PHI instructions are fed
to the instruction scheduler. DBG_INSTR_REFs should be treated like
DBG_LABELs, and just ignored for the purpose of scheduling [0].
DBG_PHIs however behave much more like DBG_VALUEs: they refer to register
operands, and if some register defs get shuffled around during instruction
scheduling, there's a risk that the debug instr will refer to the wrong
value. There's already a facility for updating DBG_VALUEs to reflect this;
add DBG_PHI to the list of instructions that it will update.
[0] Suboptimal, but it's what instr scheduling does right now.
Differential Revision: https://reviews.llvm.org/D106663
This reapplies commit 76f3ffb2b285998f02639db8fd42fb0de8a540d0 that was
reverted due to buildbot failures.
- Update lit tests with REQUIRES condition.
- Abandon salvage attempt if SCEVUnknown::getValue() returns nullptr.
Differential Revision: https://reviews.llvm.org/D105207
This patch extends salvaging of debuginfo in the Loop Strength Reduction
(LSR) pass by translating Scalar Evaluations (SCEV) into DIExpressions.
The method is as follows:
- Cache dbg.value intrinsics that are salvageable.
- Obtain a loop Induction Variable (IV) from ScalarExpressionExpander or
the loop header.
- Translate the IV SCEV into an expression that recovers the current
loop iteration count. Combine this with the dbg.value's location
op SCEV to create a DIExpression that salvages the value.
Review by: jmorse
Differential Revision: https://reviews.llvm.org/D105207
list for attributes that don't have the loclist class.
Summary: The overflow error occurs when we try to dump
location list for those attributes that do not have the
loclist class, like DW_AT_count and DW_AT_byte_size.
After re-reviewed the entire list, I sorted those
attributes into two parts, one for dumping location list
and one for dumping the location expression.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D105613
Wrapper function call and dispatch handler helpers are moved to
ExecutionSession, and existing EPC-based tools are re-written to take an
ExecutionSession argument instead.
Requiring an ExecutorProcessControl instance simplifies existing EPC based
utilities (which only need to take an ES now), and should encourage more
utilities to use the EPC interface. It also simplifies process termination,
since the session can automatically call ExecutorProcessControl::disconnect
(previously this had to be done manually, and carefully ordered with the
rest of JIT tear-down to work correctly).
Eliminating loads/stores in the device code is worth the extra effort,
especially for the new device runtime.
At the same time we do not compute AAExecutionDomain for non-device code
anymore, there is no point.
Differential Revision: https://reviews.llvm.org/D106845
Users, especially the Attributor, might replace multiple operands at
once. The actual implementation of simplifyWithOpReplaced is able to
handle that just fine, the interface was simply not allowing to replace
more than one operand at a time. This is exposing a more generic
interface without intended changes for existing code.
Differential Revision: https://reviews.llvm.org/D106189