Because DW_TAG_LLVM_ptrauth_type is not marked as a type, dwarfdump
currently emits a spurious error: DIE has DW_AT_type with incompatible
tag DW_TAG_LLVM_ptrauth_type. This patch fixes that and adds a test.
According to the documentation, lto_module_create_in_codegen_context
should return NULL on error but currently it is accidentally return
error_code. Since this is a bug fix and it seems to be a one-off bug
that only affects this API, there is no need to bump API version.
rdar://101505192
Reviewed By: pete
Differential Revision: https://reviews.llvm.org/D136769
This avoids the call overhead as well as the the save/restore of
fflags and the snan handling in the libm function.
The save/restore of fflags and snan handling are needed to be
correct for -ftrapping-math. I think we can ignore them in the
default environment.
The inline sequence will generate an invalid exception for nan
and an inexact exception if fractional bits are discarded.
I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.
We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.
Note the comparison constant is slightly different than glibc uses.
They use 1<<53 for double, I'm using 1<<52. I believe either are valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<53 and 1<<52 since
they will all fit in 64. We only have a problem if the double can't fit
in i64
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136508
This change modifies the implementation of the format() function
so that vendor forks committed to building with compilers that
support __attribute__((format)) on non-variadic functions can
check the format() function with it.
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D132413
rdar://84571523
This is a follow-on to https://reviews.llvm.org/D134073.
The number of MIPS16 changes here is a bit surprising. Many of the
fields with mismatched names were NOT previously choosing the correct
argument positionally, but instead doing something completely wrong
(e.g. it would encode a register where an immediate was expected).
But, machine-code generation for MIPS16 has never actually functioned.
It's also fully untested, thus, the MIPS16 changes, despite changing
behavior, breaks (and fixes) zero tests. This change does not fix
MIPS16 output, but it ought to be at least incrementally less broken.
Outside MIPS16, I believe the only functional change is to the 'ginvi'
instruction: it was previously encoding garbage into a field which was
specified to be '00'. Fortunately, it was covered by tests -- and the
tests were testing the incorrect behavior. So, fixed.
Differential Revision: https://reviews.llvm.org/D134220
This is a follow-on to https://reviews.llvm.org/D134073.
It renames a few fields to have consistent names, as well as renaming
operands to match the field names.
Behavior is unchanged by this cleanup. (The only generated code change
is for the disassembler for LDSTUB/LDSTUBA, but in both old and new
versions, it fails to add enough operands, and thus triggers a runtime
abort. I will address that bug in a future commit.)
Differential Revision: https://reviews.llvm.org/D134201
This brings it in line with recommended practice after the
introduction of sub-operand naming in a538d1f13a13, and the
deprecation of positional argument matching in 5351878ba196.
scalar-to-vector (scalar binop (extractelt V, Idx), C) --> shuffle (vector binop V, C'), {Idx, -1, -1...}
We generally try to avoid ad-hoc vectorization in SDAG,
but the motivating case from issue #39482 escapes our
normal vectorization folds in IR. It seems like it should
always be a win to transform this pattern in cases where
we have the same vector type for input and output and the
target supports the vector operation. That avoids
transfers from vector to scalar and back.
In the x86 shift examples, we create the scalar-to-vector
node during legalization. I'm not sure if there's a more
general way to create the pattern for testing. (If so, I
could add tests for other targets.)
Differential Revision: https://reviews.llvm.org/D136713
Summary:
Several loop queries look for a singleton by finding all instances and then
returning whether there is 1 instance or not. This can be improved by
stopping the search after 2 have been found. Introduce generic range
based singleton searches that stop after finding a second value
and use them for these loop queries.
There is no intended functional change other than improved compile-time
efficiency.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: Meinersbur (Michael Kruse)
Differential Revision: https://reviews.llvm.org/D136261
If we run LTO optimization we migth end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow
up will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. Only if the kernel init function is
defined, rather then declared, we will emit a custom state machine. SPMD-zation
can happen eagerly though. Tests are adjusted via a weak definition. The LTO
test was added to verify this works as expected.
Differential Revision: https://reviews.llvm.org/D136740
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Then reverted again because it broke tests on MacOS, they should be
fixed now.
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
Instead of using vslide1up, use vslide1down and build the other
direction. This avoids the overlap constraint early clobber of
vslide1up.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136735
Increase test coverage - check that functions are not specialised on
constant or unused arguments.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D136184
This patch adds base 2 logarithm that returns integer result. I initially wanted to name it `!log2`,
but numbers are not permitted in the name. The documentation makes sure to clarify that it is
base 2 since it is not explicit in the operator name.
Differential Revision: https://reviews.llvm.org/D134068
This is not quite NFC because one of the users should
now avoid the DIVREM opcodes too, but I'm not sure
how to test that.
I used the same name as an analysis function in IR
in case we want to expand this to include other
operations.
Another potential use is proposed in D136713.
Some instructions are not matched by update_mir_test_checks.py because MIFlags and
regex in the script are not synchronized.
Differential Revision: https://reviews.llvm.org/D136170
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting` ) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function
void f(int x, int y)
the call like `f(1, 2)` could be matched by either
void f.1(int x /* int y == 2 */);
or
void f.2(/* int x == 1, int y == 2 */);
The `FunctionSpecialisation` pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.
This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136180
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` . There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
Is also changes the criteria for picking up an actual argument to
specialise if the argument is:
* a LLVM IR constant
* has `constant` lattice value
has `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
memcpy has clamped dst stack alignment to NaturalStackAlignment if
hasStackRealignment is false. We should also clamp stack alignment
for memset and memmove. If we don't clamp, SelectionDAG may first
do tail call optimization which requires no stack realignment. Then
memmove, memset in same function may be lowered to load/store with
larger alignment leading to PEI emit stack realignment code which
is absolutely not correct.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D136456
Small QoL change to allow Predicates to be used in GICombineRule.
Currently only one combine in the AMDGPU backend makes use of it.
The implementation is pretty simple to get started but of course we can expand this later on and optimize predicate checking better if needed.
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D136681
In `computeInlinedContextSizeForRange`, the offset of range is only used one time, there is no need to cache the frame location stack.
Measured on one internal service binary, this can save 2GB memory usage and reduce a small run time (avoid one hash search).
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D128859
The miscompile case's G_ZEXT has a G_FREEZE source. Similar to D127154, this patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes also in GISel.
Fix#58431
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D136433
We don't need to provide a load-address for non-alloc sections. Skipping them
allows us to avoid some complications, like handling duplicate .group sections.
When collecting the possible constant arguments to
specialise a function the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
different set of specialisations.
With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135867
Epilogue loop vectorization is a feature in the vectorize intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder.
In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.
In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.
As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straight forward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.
As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.
Differential Revision: https://reviews.llvm.org/D136695