This reapplies 36c64af9d7f97414d48681b74352c9684077259b in updated
form.
Emit the xdata for each function at .seh_endproc. This keeps the
exact same output header order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)
The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.
Differential Revision: https://reviews.llvm.org/D87448
This patch moves the selection of the style used to emit the numbers
(DW_OP_implicit_value vs. DW_OP_const+DW_OP_stack_value) into
DwarfExpression::addUnsignedConstant. This logic is not FP-specific, and
it will be needed for large integers too.
The refactor also makes DW_OP_implicit_value (DW_OP_stack_value worked
already) be used for floating point constants other than float and
double, so I've added a _Float16 test for it.
Split off from D90916.
Differential Revision: https://reviews.llvm.org/D91058
This is part of the discussion on D91876 about trying to reduce custom lowering of MIN/MAX ops on older SSE targets - if we can improve generic vector expansion we should be able to relax the limitations in SelectionDAGBuilder when it will let MIN/MAX ops be generated, and avoid having to flag so many ops as 'custom'.
The function was introduced on Jun 12, 2016 in commit
071d0f180794f7819c44026815614ce8fa00a3bd. Its definition was removed
on Mar 2, 2017 in commit 1393761e0ca3fe8271245762f78daf4d5208cd77.
ExpandStrictFPOp started taking two parameters instead of one on Jan
10, 2020 in commit f678fc7660b36ce0ad6ce4f05eaa28f3e9fdedb5, but the
declaration for the single-perameter version has remained since.
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.
The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer.
Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and, 2) some of the pointers are dereferenced before they are checked, which may definitely trigger a null pointer dereference if the return value is nullptr.
Reviewed By: tejohnson, MaskRay, jpienaar
Differential Revision: https://reviews.llvm.org/D91410
This change introduces a MIR target-independent pseudo instruction corresponding to the IR intrinsic llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
An `llvm.pseudoprobe` intrinsic call will be lowered into a target-independent operation named `PSEUDO_PROBE`. Given the following instrumented IR,
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
br i1 %cmp, label %bb1, label %bb2
bb1:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
br label %bb3
bb2:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
br label %bb3
bb3:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
ret void
}
```
the corresponding MIR is shown below. Note that block `bb3` is duplicated into `bb1` and `bb2` where its probe is duplicated too. This allows for an accurate execution count to be collected for `bb3`, which is basically the sum of the counts of `bb1` and `bb2`.
```
bb.0.bb0:
frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
TEST32rr killed renamable $edi, renamable $edi, implicit-def $eflags
PSEUDO_PROBE 837061429793323041, 1, 0
$edi = MOV32ri 1, debug-location !13; test.c:0
JCC_1 %bb.1, 4, implicit $eflags
bb.2.bb2:
PSEUDO_PROBE 837061429793323041, 3, 0
PSEUDO_PROBE 837061429793323041, 4, 0
$rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
RETQ
bb.1.bb1:
PSEUDO_PROBE 837061429793323041, 2, 0
PSEUDO_PROBE 837061429793323041, 4, 0
$rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
RETQ
```
The target op PSEUDO_PROBE will be converted into a piece of binary data by the object emitter with no machine instructions generated. This is done in a different patch.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86495
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:
1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.
We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.
Let's now look at an example. Given the following LLVM IR:
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
br i1 %cmp, label %bb1, label %bb2
bb1:
br label %bb3
bb2:
br label %bb3
bb3:
ret void
}
```
The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
br i1 %cmp, label %bb1, label %bb2
bb1:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
br label %bb3
bb2:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
br label %bb3
bb3:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
ret void
}
```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86490
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.
Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.
Differential Revision: https://reviews.llvm.org/D91846
This patch implements out of line atomics for LSE deployment
mechanism. Details how it works can be found in llvm/docs/Atomics.rst
Options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to clang driver. This is clang and llvm part of out-of-line atomics
interface, library part is already supported by libgcc. Compiler-rt
support is provided in separate patch.
Differential Revision: https://reviews.llvm.org/D91157
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.
When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.
In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
advantage of `dso_local_equivalent` for relative references
This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.
See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.
Differential Revision: https://reviews.llvm.org/D77248
In some cases, the values passed to `asm sideeffect` calls cannot be
mapped directly to simple MVTs. Currently, we crash in the backend if
that happens. An example can be found in the @test_vector_too_large_r_m
test case, where we pass <9 x float> vectors. In practice, this can
happen in cases like the simple C example below.
using vec = float __attribute__((ext_vector_type(9)));
void f1 (vec m) {
asm volatile("" : "+r,m"(m) : : "memory");
}
One case that use "+r,m" constraints for arbitrary data types in
practice is google-benchmark's DoNotOptimize.
This patch updates visitInlineAsm so that it use MVT::Other for
constraints with complex VTs. It looks like the rest of the backend
correctly deals with that and properly legalizes the type.
And we still report an error if there are no registers to satisfy the
constraint.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D91710
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
This patch uses the new `getMnemonic` helper from D90039
to display mnemonics instead of the internal opcodes.
The main motivation behind using the mnemonics is that they
are more user-friendly and more directly related to the assembly
the users will be presented.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D90040
This patch is added to remove the unreachable MBBs reference in the jump table.
Differential Revisien: https://reviews.llvm.org/D90498
Reviewed by: amyk, bsaleil
For example, during RAUW in IRMover, the `Function` ValueAsMetadata in "CG Profile" could become bitcast.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D88433
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.
Add
- `m_SpecificICst`, which returns true when matching a specific value..
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns when a register is negated.
Also update a few places which use idioms related to the new matchers.
Differential Revision: https://reviews.llvm.org/D91397
The test fails on Mac, see comment on the code review.
> This option was in a rather convoluted place, causing global parameters
> to be set in awkward and undesirable ways to try to account for it
> indirectly. Add tests for the -disable-debug-info option and ensure we
> don't print unintended markers from unintended places.
>
> Reviewed By: dstenb
>
> Differential Revision: https://reviews.llvm.org/D91083
This reverts commit 9606ef03f03904cec213db031b5ea6fd6052dc5d.
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90942
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
This option was in a rather convoluted place, causing global parameters
to be set in awkward and undesirable ways to try to account for it
indirectly. Add tests for the -disable-debug-info option and ensure we
don't print unintended markers from unintended places.
Reviewed By: dstenb
Differential Revision: https://reviews.llvm.org/D91083
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.
I've fixed this by ensuring that:
1. When we don't have enough registers to allocate the whole block
we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
tuple part argument and reinvoke the autogenerated calling
convention handler. Doing this prevents the code from entering
an infinite recursion and, in combination with 1), ensures we
switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
then deallocate any SVE registers we marked as allocated in 1).
We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
LowerFormalArguments functions to detect the start of a tuple,
which involves allocating a single stack object and doing the
correct numbers of legal loads and stores.
Differential Revision: https://reviews.llvm.org/D90219
This broke both Firefox and Chromium (PR47905) due to what seems like dllimport
function not being handled correctly.
> This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
> Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
>
> Reviewed By: rnk
>
> Differential Revision: https://reviews.llvm.org/D87544
This reverts commit cfd8481da1adba1952e0f6ecd00440986e49a946.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)
Changes included in this patch:
- Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
- Added the getCanonicalIndexType function to convert redundant addressing
modes (e.g. scaling is redundant when accessing bytes)
- Tests with 32 & 64-bit scaled & unscaled offsets
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90941
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).
This is the first in a series of patches which adds support for scalable masked scatters
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90939
SafeStack instrumentation should not insert anything inbetween musttail call and return instruction.
For every ReturnInst that needs to be instrumented, we adjust the insertion point to the musttail call if exists.
Differential Revision: https://reviews.llvm.org/D90702
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other
required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the
form:
%start = llvm.start.loop.iterations(%N)
loop:
%p = phi [%start], [%dec]
%dec = llvm.loop.decrement.reg(%p, 1)
%c = icmp ne %dec, 0
br %c, loop, exit
- For this a new llvm.start.loop.iterations intrinsic was added, identical
to llvm.set.loop.iterations but produces a value as seen above, gluing
the loop together more through def-use chains.
- This new instrinsic conceptually produces the same output as input,
which is taught to SCEV so that the checks in MVETailPredication are not
affected.
- Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
been left mostly as before. We should now more reliably be able to tell
that the t2DoLoopStart is correct without having to prove it, but
t2WhileLoopStart and tail-predicated loops will remain the same.
- And all the tests have been updated. There are a lot of them!
This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
We can use KnownBitsAnalysis to cover cases when mask is not trivial. It can
also help with cases when mask is not constant but can still be folded into
one. Since 'and' is comutative we should treat both operands as possible
replacements.
Differential Revision: https://reviews.llvm.org/D90674
This sequence of instructions can be simplified if they are single use and
some operands are constants. Additional combines may be applied afterwards.
Differential Revision: https://reviews.llvm.org/D90223
Sequence of same shift instructions with constant operands can be combined into
a single shift instruction.
Differential Revision: https://reviews.llvm.org/D90217