This avoids the call overhead as well as the save/restore of
fflags and the snan handling in the libm function.
The save/restore of fflags and snan handling are needed to be
correct for -ftrapping-math. I think we can ignore them in the
default environment.
The inline sequence will generate an invalid exception for nan
and an inexact exception if fractional bits are discarded.
I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.
We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.
Note the comparison constant is slightly different than glibc uses.
They use 1<<53 for double, I'm using 1<<52. I believe either is valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<52 and 1<<53 since
they will all fit in i64. We only have a problem if the double can't fit
in i64.
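As a rough illustration of the threshold argument above, here is a minimal C++ sketch of a trunc-style float->int->float sequence. The helper name `truncViaInt64` is hypothetical, and the sketch does not model fflags or the exception behavior discussed above; it only shows why comparing against 1<<52 is sufficient.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not the code emitted by the patch.
double truncViaInt64(double x) {
  const double kTwoPow52 = 4503599627370496.0; // 1 << 52
  // Values with |x| >= 2^52 have no fractional bits. NaN also fails this
  // compare, so large values, infinities and NaN are returned unchanged.
  if (!(std::fabs(x) < kTwoPow52))
    return x;
  // |x| < 2^52 always fits in a signed 64-bit integer, so this is safe.
  double r = static_cast<double>(static_cast<std::int64_t>(x));
  // Copy the sign back so that e.g. -0.3 yields -0.0 (the fsgnj step).
  return std::copysign(r, x);
}
```

Using 1<<53 as the bound would work equally well here, since every double below that magnitude still fits in an int64_t.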
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136508
This change modifies the implementation of the format() function
so that vendor forks committed to building with compilers that
support __attribute__((format)) on non-variadic functions can
check the format() function with it.
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D132413
rdar://84571523
This is a follow-on to https://reviews.llvm.org/D134073.
The number of MIPS16 changes here is a bit surprising. Many of the
fields with mismatched names were NOT previously choosing the correct
argument positionally, but instead doing something completely wrong
(e.g. it would encode a register where an immediate was expected).
But, machine-code generation for MIPS16 has never actually functioned.
It's also fully untested; thus, the MIPS16 changes, despite changing
behavior, break (and fix) zero tests. This change does not fix
MIPS16 output, but it ought to be at least incrementally less broken.
Outside MIPS16, I believe the only functional change is to the 'ginvi'
instruction: it was previously encoding garbage into a field which was
specified to be '00'. Fortunately, it was covered by tests -- and the
tests were testing the incorrect behavior. So, fixed.
Differential Revision: https://reviews.llvm.org/D134220
This is a follow-on to https://reviews.llvm.org/D134073.
It renames a few fields to have consistent names, as well as renaming
operands to match the field names.
Behavior is unchanged by this cleanup. (The only generated code change
is for the disassembler for LDSTUB/LDSTUBA, but in both old and new
versions, it fails to add enough operands, and thus triggers a runtime
abort. I will address that bug in a future commit.)
Differential Revision: https://reviews.llvm.org/D134201
scalar-to-vector (scalar binop (extractelt V, Idx), C) --> shuffle (vector binop V, C'), {Idx, -1, -1...}
We generally try to avoid ad-hoc vectorization in SDAG,
but the motivating case from issue #39482 escapes our
normal vectorization folds in IR. It seems like it should
always be a win to transform this pattern in cases where
we have the same vector type for input and output and the
target supports the vector operation. That avoids
transfers from vector to scalar and back.
In the x86 shift examples, we create the scalar-to-vector
node during legalization. I'm not sure if there's a more
general way to create the pattern for testing. (If so, I
could add tests for other targets.)
Differential Revision: https://reviews.llvm.org/D136713
If we run LTO optimization we might end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow
up will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. We will emit a custom state machine only
if the kernel init function is defined, rather than declared. SPMD-zation
can happen eagerly though. Tests are adjusted via a weak definition. The LTO
test was added to verify this works as expected.
Differential Revision: https://reviews.llvm.org/D136740
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Then reverted again because it broke tests on macOS. They should be
fixed now.
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
Instead of using vslide1up, use vslide1down and build the other
direction. This avoids the overlap constraint early clobber of
vslide1up.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136735
This patch adds a base 2 logarithm operator that returns an integer result. I initially wanted to name it `!log2`,
but numbers are not permitted in the name. The documentation makes sure to clarify that it is
base 2 since it is not explicit in the operator name.
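For readers unfamiliar with an integer-valued base 2 logarithm, a minimal C++ sketch follows. The function name `integerLog2` is hypothetical, and rounding down for inputs that are not powers of two is an assumption of this sketch; the description above does not spell out the rounding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: floor(log2(value)) for value > 0.
int integerLog2(std::uint64_t value) {
  assert(value > 0 && "logarithm is undefined for zero");
  int result = 0;
  // Count how many times the value can be halved before reaching 1.
  while (value >>= 1)
    ++result;
  return result;
}
```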
Differential Revision: https://reviews.llvm.org/D134068
This is not quite NFC because one of the users should
now avoid the DIVREM opcodes too, but I'm not sure
how to test that.
I used the same name as an analysis function in IR
in case we want to expand this to include other
operations.
Another potential use is proposed in D136713.
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting`) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-calls x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function
void f(int x, int y)
the call like `f(1, 2)` could be matched by either
void f.1(int x /* int y == 2 */);
or
void f.2(/* int x == 1, int y == 2 */);
The `FunctionSpecialisation` pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.
This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.
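The ordering idea can be sketched in a few lines of C++. The `Spec` struct, the `preferredSpec` helper and the gain numbers are all hypothetical; the sketch only shows that sorting by decreasing gain makes the "more specialised" candidate win when both match.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one candidate specialisation.
struct Spec {
  std::string name; // e.g. "f.1" or "f.2"
  int gain;         // estimated benefit, higher is better
};

// Returns the name of the specialisation that would be tried first.
std::string preferredSpec(std::vector<Spec> specs) {
  if (specs.empty())
    return "";
  // Order candidates by decreasing gain; stable_sort keeps the original
  // order among candidates with equal gain.
  std::stable_sort(specs.begin(), specs.end(),
                   [](const Spec &a, const Spec &b) { return a.gain > b.gain; });
  return specs.front().name;
}
```

With made-up gains of 10 for `f.1` and 25 for `f.2`, the call `f(1, 2)` would be rewritten to `f.2` regardless of the order in which the candidates were discovered.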
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136180
The `FunctionSpecialization` pass has support for specialising
functions that are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant`. There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is an incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
It also changes the criteria for picking up an actual argument to
specialise on. The argument is picked if it:
* is an LLVM IR constant
* has `constant` lattice value
* has `constantrange` lattice value with a single element.
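The two sets of criteria above can be written down as a pair of predicates. This is a minimal sketch with hypothetical names (`LatticeKind`, `isParamWorthSpecialising`, `isArgSuitable`); it mirrors the lists in this message, not the actual pass code.

```cpp
#include <cassert>

// Hypothetical, simplified lattice states.
enum class LatticeKind { Unknown, Undef, Constant, ConstantRange, Overdefined };

// Formal parameter: skip it when specialising cannot add information.
bool isParamWorthSpecialising(LatticeKind kind, unsigned rangeSize) {
  switch (kind) {
  case LatticeKind::Unknown:
  case LatticeKind::Undef:
  case LatticeKind::Constant:
    return false;
  case LatticeKind::ConstantRange:
    return rangeSize != 1; // a single-element range is already a constant
  case LatticeKind::Overdefined:
    return true;
  }
  return false;
}

// Actual argument: accept it when it is known to be a single value.
bool isArgSuitable(bool isIRConstant, LatticeKind kind, unsigned rangeSize) {
  if (isIRConstant || kind == LatticeKind::Constant)
    return true;
  return kind == LatticeKind::ConstantRange && rangeSize == 1;
}
```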
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
memcpy already clamps the dst stack alignment to NaturalStackAlignment if
hasStackRealignment is false. We should also clamp the stack alignment
for memset and memmove. If we don't clamp, SelectionDAG may first
do tail call optimization, which requires no stack realignment. Then
memmove or memset in the same function may be lowered to loads/stores with
larger alignment, leading PEI to emit stack realignment code, which
is incorrect.
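The clamping rule itself is small. A minimal sketch, assuming alignments are plain byte counts and using a hypothetical helper name `clampStackAlign`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: pick the alignment to use for a stack destination.
std::uint64_t clampStackAlign(std::uint64_t requested,
                              std::uint64_t naturalStackAlign,
                              bool hasStackRealignment) {
  // If the stack can be realigned, the larger alignment is fine.
  if (hasStackRealignment)
    return requested;
  // Otherwise never ask for more than the natural stack alignment.
  return std::min(requested, naturalStackAlign);
}
```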
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D136456
Small QoL change to allow Predicates to be used in GICombineRule.
Currently only one combine in the AMDGPU backend makes use of it.
The implementation is pretty simple to get started but of course we can expand this later on and optimize predicate checking better if needed.
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D136681
The miscompile case's G_ZEXT has a G_FREEZE source. Similar to D127154, this patch removes isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes also in GISel.
Fixes #58431
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D136433
We don't need to provide a load-address for non-alloc sections. Skipping them
allows us to avoid some complications, like handling duplicate .group sections.
When collecting the possible constant arguments to
specialise a function the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
a different set of specialisations.
With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.
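In loop terms, the change amounts to skipping an unsuitable argument instead of ending the search. A minimal sketch with hypothetical types (`Candidate`, `collectSuitable`):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an actual argument seen at a call site.
struct Candidate {
  int value;
  bool suitable; // whether it can serve as a specialisation constant
};

std::vector<int> collectSuitable(const std::vector<Candidate> &args) {
  std::vector<int> constants;
  for (const Candidate &c : args) {
    // Skip unsuitable arguments but keep searching; stopping here would
    // make the result depend on the traversal order.
    if (!c.suitable)
      continue;
    constants.push_back(c.value);
  }
  return constants;
}
```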
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135867
Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder.
In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.
In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization is intended to handle are likely better handled via tail folding on RISCV.
As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.
As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.
Differential Revision: https://reviews.llvm.org/D136695
Small functions with size under a given threshold are not
considered for specialisation on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.
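A minimal sketch of the size check, with a hypothetical helper name and parameters (the real pass applies further checks that are not shown here):

```cpp
#include <cassert>

// Hypothetical helper: does the size filter let this function through?
bool passesSizeFilter(unsigned functionSize, unsigned smallThreshold,
                      bool isNoInline) {
  // A noinline function will not be inlined, so being small is no
  // reason to skip it.
  if (isNoInline)
    return true;
  // Small functions are presumed to be easy to inline instead.
  return functionSize >= smallThreshold;
}
```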
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135862
Adds a generic utility for creating anonymous aarch64 pointer blocks
(automatically adding an edge to initialize the pointer if given an
initial target).
Updates the aarch64 GOTTableManager to use the utility when building
GOT entries.
Recent Clang changes expose __bf16 types for SSE2-enabled host compilations and
that makes those types visible during GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.
Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no issues, correctness-wise.
Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward.
Differential Revision: https://reviews.llvm.org/D136311
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove DataFlowSanitizerLegacyPass.
Differential Revision: https://reviews.llvm.org/D124594
This patch adds the assembly/disassembly for the following instructions:
SDOT (4-way, multiple and single vector): Multi-vector signed integer dot-product by vector.
SDOT (4-way, multiple vectors): Multi-vector signed integer dot-product.
UDOT (4-way, multiple and single vector): Multi-vector unsigned integer dot-product by vector.
UDOT (4-way, multiple vectors): Multi-vector unsigned integer dot-product.
for groups of 2 and 4 ZA registers
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
Depends on: D135563
Differential Revision: https://reviews.llvm.org/D135760
This patch adds the assembly/disassembly for the following instructions:
INT:
SDOT (2-way, multiple and single vector): Multi-vector signed integer dot-product by vector.
(2-way, multiple vectors): Multi-vector signed integer dot-product.
UDOT (2-way, multiple and single vector): Multi-vector unsigned integer dot-product by vector.
(2-way, multiple vectors): Multi-vector unsigned integer dot-product.
SUDOT (multiple and indexed vector): Multi-vector signed by unsigned integer dot-product by indexed elements.
(multiple and single vector): Multi-vector signed by unsigned integer dot-product by vector.
USDOT (multiple and single vector): Multi-vector unsigned by signed integer dot-product by vector.
(multiple vectors): Multi-vector unsigned by signed integer dot-product.
FP:
BFDOT (multiple and single vector): Multi-vector BFloat16 floating-point dot-product by vector.
(multiple vectors): Multi-vector BFloat16 floating-point dot-product.
FDOT (multiple and single vector): Multi-vector half-precision floating-point dot-product by vector.
(multiple vectors): Multi-vector half-precision floating-point dot-product.
for groups of 2 and 4 ZA registers
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
Depends on: D135455
Differential Revision: https://reviews.llvm.org/D135683
I'm unsure what the code does without the semicolon. On the surface it
seems like the assert below it would be considered part of the if
and thus the assert would only execute if DestReg is 0. But 0 isn't
considered a virtual register so the assert should fail.
Found by PVS Studio.
Reported https://pvs-studio.com/en/blog/posts/cpp/1003/ (N7)
V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to
inform passes that the instructions read the dst operand. The VOPD
versions of these instructions lacked the dummy operand, which was a
problem for inserting s_delay_alu.
Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand
tracking logic to account for it.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D136629
This reverts commit bd7949bcd86633bd4203b2ba6f891aea00fce4d1.
Revert this patch since reviewers have different opinions regarding
the approach in post-commit review.
Will open an RFC for further discussion.
Differential Revision: https://reviews.llvm.org/D132408
Similar to the EPCEHFrameRegistrar change in c977251ef6f, this allows clients
who have sourced a dylib handle via a side-channel to search that dylib to
find the registration functions.
This patch defaults to the existing behavior in the case where the client does
not specify a handle to use.
SelfExecutorProcessControl no longer requires that handles passed to
lookupSymbols be ones that were previously returned from loadDylib. This brings
SelfExecutorProcessControl into alignment with SimpleRemoteEPC, which was
updated in 6613f4aff85.
This patch puts the individual target region information attributes into a
struct so that the nested mappings are not needed and passing the information
around is simplified.
Reviewed By: jdoerfert, mikerice
Differential Revision: https://reviews.llvm.org/D136601
getMergedLocation returns a 'line 0' DILocation if the two locations
being merged don't perfectly match, even if they are on the same line but
in a different column.
This commit adds support to keep the line number if it matches (but only
the column differs). The merged column number is the leftmost of the
two.
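The merge rule can be sketched as follows. `Loc` and `mergeLoc` are hypothetical stand-ins that only look at line and column; the real getMergedLocation also has to consider scopes and inlining, and returning column 0 for the 'line 0' case is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified source location.
struct Loc {
  unsigned line;
  unsigned column;
};

Loc mergeLoc(Loc a, Loc b) {
  // Same line: keep the line and take the leftmost (smallest) column.
  if (a.line == b.line)
    return {a.line, std::min(a.column, b.column)};
  // Different lines: fall back to a 'line 0' location.
  return {0, 0};
}
```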
Reviewed By: dblaikie, orlando
Differential Revision: https://reviews.llvm.org/D135166