Partial progress towards replacing in-tree uses of
`Type::getPointerTo()`.
If `getPointerTo()` is used solely to support an unnecessary bitcast,
remove the bitcast.
Reviewed By: barannikov88, nikic
Differential Revision: https://reviews.llvm.org/D153307
`__xray_customevent` and `__xray_typedevent` are built-in functions in Clang.
With -fxray-instrument, they are lowered to intrinsics llvm.xray.customevent and
llvm.xray.typedevent, respectively. These intrinsics are then lowered to
TargetOpcode::{PATCHABLE_EVENT_CALL,PATCHABLE_TYPED_EVENT_CALL}. The target is
responsible for generating a code sequence that calls either
`__xray_CustomEvent` (with 2 arguments) or `__xray_TypedEvent` (with 3
arguments).
Before patching, the code sequence is prefixed by a branch instruction that
skips the rest of the code sequence. After patching
(compiler-rt/lib/xray/xray_AArch64.cpp), the branch instruction becomes a NOP
and the function call will take effects.
This patch implements the lowering process for
{PATCHABLE_EVENT_CALL,PATCHABLE_TYPED_EVENT_CALL} and implements the runtime.
```
// Lowering of PATCHABLE_EVENT_CALL
.Lxray_sled_N:
b #24
stp x0, x1, [sp, #-16]!
x0 = reg of op0
x1 = reg of op1
bl __xray_CustomEvent
ldrp x0, x1, [sp], #16
```
As a result, two updated tests in compiler-rt/test/xray/TestCases/Posix/ now
pass on AArch64.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D153320
When eliding an argument copy, we need to update the chain to ensure
the argument reads are performed before later writes. However, the
code doing this only handled this for the first part of the argument.
If the argument had multiple parts, the chains of the later parts were
dropped. Make sure we preserve all chains.
Fixes https://github.com/llvm/llvm-project/issues/63430.
This patch introduces the reduction intrinsic for floating point minimum
and maximum which has the same semantics (for NaN and signed zero) as
llvm.minimum and llvm.maximum.
Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D152370
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.
Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read floating-point environment, set it or reset to
some default state. They do the same actions as C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.
The new intrinsics specify FP environment as a value of integer type, it
is convenient of most targets where the FP state is a content of some
register. Some targets however use long representations. On X86 the size
of FP environment is 256 bits, and even half of this size is not a legal
ibteger type. To facilitate legalization in such cases, two sets of DAG
nodes is used. Nodes GET_FPENV and SET_FPENV are used when FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM consider FP environment as a region in
memory, much like `fesetenv` and `fegetenv` do. They are used when
target has long representation for floationg-point state.
Differential Revision: https://reviews.llvm.org/D71742
Sometimes an developer would like to have more control over cmov vs branch. We have unpredictable metadata in LLVM IR, but currently it is ignored by X86 backend. Propagate this metadata and avoid cmov->branch conversion in X86CmovConversion for cmov with this metadata.
Example:
```
int MaxIndex(int n, int *a) {
int t = 0;
for (int i = 1; i < n; i++) {
// cmov is converted to branch by X86CmovConversion
if (a[i] > a[t]) t = i;
}
return t;
}
int MaxIndex2(int n, int *a) {
int t = 0;
for (int i = 1; i < n; i++) {
// cmov is preserved
if (__builtin_unpredictable(a[i] > a[t])) t = i;
}
return t;
}
```
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D118118
The generic implementation is umin(TC, VF * vscale).
Lowering to vsetvli for RISC-V will come in a future patch.
This patch is a pre-requisite to be able to CodeGen vectorized code from
D99750.
Reviewed By: reames, frasercrmck
Differential Revision: https://reviews.llvm.org/D149916
Summary:
DbgValue intrinsics whose expression is an entry_value and whose address is
described an llvm::Argument must be lowered to the corresponding livein physical
register for that Argument.
Depends on D151329
Reviewers: aprantl
Subscribers:
This getZExtOrTrunc seems to have been added when getPtrExtOrTrunc
was introduced. getPtrExtOrTrunc is currently equivalent to getZExtOrTrunc,
but could be changed for some target in the future.
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D149680
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.
To achieve this, first we need to relax the current limitation for an
aggregate type to be a target of load/store/alloca when the aggregate
type contains homogeneous scalable vector types. Then to adjust the
prolog of an LLVM function during lowering to clang. Finally we
re-define the RVV segment load/store intrinsics to use the tuple types.
The pull request under the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198
---
This is the 1st patch of the patch-set. This patch is originated from
D98169.
This patch allows aggregate type (StructType) that contains homogeneous
scalable vector types to be a target of load/store/alloca. The RFC of
this patch was posted in LLVM Discourse.
https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527
The main changes in this patch are:
Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
accommodate an expression of scalable size.
Allow `StructType:isSized` to also return true for homogeneous
scalable vector types.
Let `Type::isScalableTy` return true when `Type` is `StructType`
and contains scalable vectors
Extra description is added in the LLVM Language Reference Manual on the
relaxation of this patch.
Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper, nikic
Differential Revision: https://reviews.llvm.org/D146872
MachineFunction keeps a table of variables whose addresses never change
throughout the function. Today, the only kinds of locations it can
handle are stack slots.
However, we could expand this for variables whose address is derived
from the value a register had upon function entry. One case where this
happens is with variables alive across coroutine funclets: these can
be placed in a coroutine frame object whose pointer is placed in a
register that is an argument to coroutine funclets.
```
define @foo(ptr %frame_ptr) {
dbg.declare(%frame_ptr, !some_var,
!DIExpression(EntryValue, <ptr_arithmetic>))
```
This is a patch in a series that aims to improve the debug information
generated by the CoroSplit pass in the context of `swiftasync`
arguments. Variables stored in the coroutine frame _must_ be described
the entry_value of the ABI-defined register containing a pointer to the
coroutine frame. Since these variables have a single location throughout
their lifetime, they are candidates for being stored in the
MachineFunction table.
Differential Revision: https://reviews.llvm.org/D149879
After function argument lowering, but prior to instruction selection,
dbg declares pointing to function arguments are lowered using special
logic.
Later, during instruction selection (both "fast" and regular ISel), this
logic is "repeated" in order to identify which intrinsics have already
been lowered. This is bad for two reasons:
1. The logic is not _really_ repeated, the code is different, which
could lead to duplicate lowering of the intrinsic.
2. Even if the logic were repeated properly, this is still code
duplication.
This patch addresses these issues by storing all preprocessed
dbg.declare intrinsics in a set inside FuncInfo; the set is queried upon
instruction selection.
Differential Revision: https://reviews.llvm.org/D149682
This reverts part of D149033 and rG8f966cedea594d9a91e585e88a80a42c04049e6c. The added test case
is kept to avoid future regression.
Reviewed By: vzakhari, vdonaldson
Differential Revision: https://reviews.llvm.org/D149639
https://llvm.org/docs/LangRef.html#llvm-powi-intrinsic
The max length of the integer power of `llvm.powi` intrinsic is 32, and
the value can be negative. If we use `int32_t` to store this value, `-Val`
will underflow when it is `INT32_MIN`
The issue was reported in D149033.
Issue was reported in D149033, `Val` can be negative value and
arithmetic right shift always keeps the sign bit.
BTW, the redundant code `Val = -Val` is removed by this patch.
First, in CodeGenPrepare.cpp, line 6891, the VectorCond will always be false
because if not function will return at 6888.
Second, in SelectionDAGBuilder.cpp, line 5443, getSExtValue() will return
value as int type, but now we use unsigned Val to maintain it, which make the
if condition at 5452 meaningless.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D149033
We shouldn't be able to reach this code path from source code but this provides
a better fail-safe than asserting. The result of the downgrade is a degraded
debugging experience, but it is better than nothing.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D148212
As described in [0], this extends IRPGO to support //Temporal Profiling//.
When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.
Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.
In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.
Special thanks to Julian Mestre for helping with reservoir sampling.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D147287
This reverts commit db6a979ae82410e42430e47afa488936ba8e3025.
Reland D102817 without any change. The previous revert was a mistake.
Differential Revision: https://reviews.llvm.org/D102817
Use RawLocationWrapper rather than a Value to represent the location operand(s)
so that it's possible to represent multiple location
operands. AssignmentTrackingAnalysis still converts variadic debug intrinsics
to kill locations so this patch is NFC.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D145911
This patch adds AArch64 CodeGen support such that the type can be passed
and returned to/from functions, and also adds support to use this type in
load/store operations and PHI nodes.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D136862
As part of this work, removing `SDDbgValue::clearIsEmitted` originally added for
`dbg.addr` in 045c67769d7fe577fc38cccb6fb40fd814437447 was attempted, but it
appears some tests for `DBG_INSTR_REF` now depend on that behaviour as well, so
it was kept and comments were updated instead.
Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898
Differential Revision: https://reviews.llvm.org/D144800
This is recommit of 2e416cdd52, fixed to be accepatble by GCC.
The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
The m_VScale() matcher is unusual in that it requires a DataLayout.
It is currently used to determine the size of the GEP type. However,
I believe it is sufficient to check for the canonical
<vscale x 1 x i8> form here -- I don't think there's a need to
recognize exotic variations like <vscale x 1 x i4> as a vscale
constant representation as well.
Differential Revision: https://reviews.llvm.org/D144566
The patch ensures last two operands of vp.abs/ctlz/cttz are mask and evl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144536
This reverts commit e7613c1d9b259bdf2b0b06b4169d9a10dd553406.
GCC issues an error:
In file included from /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:9:
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: explicit specialization of template<class E, class Enable> struct llvm::is_bitmask_enum outside its namespace must use a nested-name-specifier [-fpermissive]
66 | template <> struct is_bitmask_enum<Enum> : std::true_type {}; \
| ^~~~~~~~~~~~~~~~~~~~~
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:30:1: note: in expansion of macro LLVM_DECLARE_ENUM_AS_BITMASK
30 | LLVM_DECLARE_ENUM_AS_BITMASK(Flags2, V4);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is recommit of 2e416cdd52, reverted in 8555ab2fcd, because GCC
complains on extra qualification. The macro LLVM_DECLARE_ENUM_AS_BITMASK
does not specify llvm:: anymore, so the macro must occur in the namespace
llvm. Documentation updated accordingly. The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241
This patch adds 2 new intrinsics:
; Interleave two vectors into a wider vector
<vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)
; Deinterleave the odd and even lanes from a wider vector
{<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)
The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.
The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types which only have a real/imaginary component.
The format of the intrinsics matches how `shufflevector` is used in
LoopVectorize. For example:
using cf = std::complex<float>;
void foo(cf * dst, int N) {
for (int i=0; i<N; ++i)
dst[i] += cf(1.f, 2.f);
}
For this loop, LoopVectorize:
(1) Loads a wide vector (e.g. <8 x float>)
(2) Extracts odd lanes using shufflevector (leading to <4 x float>)
(3) Extracts even lanes using shufflevector (leading to <4 x float>)
(4) Performs the addition
(5) Interleaves the two <4 x float> vectors into a single <8 x float> using
shufflevector
(6) Stores the wide vector.
In this example, we can 1-1 replace shufflevector in (2) and (3) with the
deinterleave intrinsic, and replace the shufflevector in (5) with the
interleave intrinsic.
The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.
Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so to benefit from
existing code that was written to recognize/optimize shufflevector patterns.
Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D141924
Revert "[SelectionDAG] Add missing setValue calls in visitIntrinsicCall"
This reverts commit 0c64e1b68f36640ffe82fc90e6279c50617ad1cc.
This reverts commit 1142e6c7c795de7f80774325a07ed49bc95a48c9.
It spuriously added !pcsections where they shouldn't be. See added test
case in test/CodeGen/X86/pcsections.ll as an example. The reason is that
the SelectionDAG chains operations in a basic block as "operands"
pointing to preceding instructions. This resulted in setting the
metadata on _all_ instructions preceding the one that should have the
metadata.
Reverting for now because the semantics of !pcsections was completely
buggy now.
ValueTracking attempts to match compare+select patterns to FP min/max
operations, but it was created before the newer IEEE-754-2019
minimum/maximum ops were defined. Ie, matchSelectPattern() does not
account for the -0.0/+0.0 behavior that is specified in the newer
standard.
FMINIMUM/FMAXIMUM nodes were created to map to the newer standard:
/// FMINIMUM/FMAXIMUM - NaN-propagating minimum/maximum that also treat -0.0
/// as less than 0.0. While FMINNUM_IEEE/FMAXNUM_IEEE follow IEEE 754-2008
/// semantics, FMINIMUM/FMAXIMUM follow IEEE 754-2018 draft semantics.
We could adjust ValueTracking to deal with signed zero, but it seems like
a moot point given the divergent NaN behavior discussed in D143056, so just
delete this possibility to avoid bugs when converting IR to SDAG.
Differential Revision: https://reviews.llvm.org/D143106