If we run LTO optimization we migth end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow
up will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. Only if the kernel init function is
defined, rather then declared, we will emit a custom state machine. SPMD-zation
can happen eagerly though. Tests are adjusted via a weak definition. The LTO
test was added to verify this works as expected.
Differential Revision: https://reviews.llvm.org/D136740
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Then reverted again because it broke tests on MacOS, they should be
fixed now.
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting` ) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function
void f(int x, int y)
the call like `f(1, 2)` could be matched by either
void f.1(int x /* int y == 2 */);
or
void f.2(/* int x == 1, int y == 2 */);
The `FunctionSpecialisation` pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.
This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136180
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` . There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
Is also changes the criteria for picking up an actual argument to
specialise if the argument is:
* a LLVM IR constant
* has `constant` lattice value
has `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
When collecting the possible constant arguments to
specialise a function the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
different set of specialisations.
With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135867
Epilogue loop vectorization is a feature in the vectorize intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder.
In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.
In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.
As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straight forward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.
As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.
Differential Revision: https://reviews.llvm.org/D136695
Small functions with size under a given threshold are not
considered for specialisaion on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135862
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove DataFlowSanitizerLegacyPass.
Differential Revision: https://reviews.llvm.org/D124594
This reverts commit bd7949bcd86633bd4203b2ba6f891aea00fce4d1.
Revert this patch since reviwers have different opinions regarding
the approach in post-commit review.
Will open RFC for further discussion.
Differential Revision: https://reviews.llvm.org/D132408
This reverts commit 04877284b4592e9286cab43467662c1b4ff81861.
Looks like this is still breaking the test
Profile-x86_64 :: instrprof-darwin-dead-strip.c
(see comment on https://reviews.llvm.org/D135340).
This addresses a bug where vector versions of ctlz are creating false positive reports.
Depends on D136369
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D136523
When -asan-max-inline-poisoning-size=0, all shadow memory access should be
outlined (through asan calls). This was not occuring when partial poisoning
was required on the right side of a variable's redzone. This diff contains
the changes necessary to implement and utilize __asan_set_shadow_01() through
__asan_set_shadow_07(). The change is necessary for the full abstraction of
the asan implementation and will enable experimentation with alternate strategies.
Differential Revision: https://reviews.llvm.org/D136197
This is a sibling transform to the fold just above it. That was changed
to allow the corresponding commuted patterns with:
307307456277
e1bd759ea567
8628e6df7000
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
Improve O(N^2) to O(N) in some cases, reduce number of allocations by
reserving memory.
Also, improve analysis of loads reduction values to avoid analysis
of not vectorizable cases.
This teaches the SCCP Solver how to constant fold more intrinsics. Constant
folding appears to be just as good as D115737 but much, much lower in code
change impact as suggested by nikic.
The constrained floating-point intrinsics all take at least one metadata
argument and were the motivation for the change.
Differential Revision: https://reviews.llvm.org/D136466
(sign|resign) + (auth|resign) can be folded by omitting the middle
sign+auth component if the key and discriminator match.
Differential Revision: https://reviews.llvm.org/D132383
Without a freeze, this transform can leak poison to the output:
https://alive2.llvm.org/ce/z/GJuF9i
This makes the transform as uniform as possible, and it can help
reduce patterns like issue #58313 (although that particular
example probably still needs another transform).
Differential Revision: https://reviews.llvm.org/D136527
This doesn't touch objc-arc-contract because that's in the codegen pipeline.
However, this does move its corresponding initialize function into initializeCodegen().
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D135041
When the common value is part of either select condition,
this is safe to reduce. Otherwise, it is not poison-safe
(with the select form of the pattern):
https://alive2.llvm.org/ce/z/FxQTzB
This is another patch motivated by issue #58313.
1) Use a static array of pointer to retain the dummy vars.
2) Associate liveness of the array with that of the runtime hook variable
__llvm_profile_runtime.
3) Perform the runtime initialization through the runtime hook variable.
4) Preserve the runtime hook variable using the -u linker flag.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D136192
This is obviously correct for real logic instructions,
and it also works for the poison-safe variants that use
selects:
https://alive2.llvm.org/ce/z/wyHiwX
This is motivated by the lack of 'xor' folding seen in issue #58313.
This more general fold should help reduce some of those patterns,
but I'm not sure if this specific case does anything for that
particular example.
This allows the regular bitwise logic opcodes in addition to the
poison-safe select variants:
https://alive2.llvm.org/ce/z/8xB9gy
Handling commuted variants safely is likely trickier, so that's
left to another patch.
Additional SCEV verification highlighted a case where the cached loop
dispositions where incorrect after simplifying a condition in IndVars
and moving the user in LoopDeletion. Fix it by invalidating ICmp and all
its users.
Fixes#58515.
This implements IR and bitcode support for the memory attribute,
as specified in https://reviews.llvm.org/D135597.
The new attribute is not used for anything yet (and as such, the
old memory attributes are unaffected).
Differential Revision: https://reviews.llvm.org/D135592
Currently, compiling a program with the `-pg` flag will result in an
undefined symbol error for `.mcount`. This revision fixes the call to
use `__mcount`, which requires a pointer argument to a pointer-sized
object (unique per inserted call) on AIX.
This is only a partial fix. This patch should fix the `-pg` flag's
behaviour on AIX to work with code you are compiling, but it will not
link against standard libraries with `mcount` instrumentation calls. The
next step is to add profiled libraries to the linker search paths in the
Clang driver for the AIX toolchain when linking with `-pg`.
Differential Review: https://reviews.llvm.org/D135384
Canonicalize GEP of GEP by swapping GEP with some suffix constant indices to the back (and GEP with all constant indices to the back of that), this allows more constant index GEP merging to happen. Exceptions are: If swapping violates use-def relations, or anti-optimizes LICM
For constant indexed GEP of GEP, if they cannot be merged directly, they will be casted to i8* and merged.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D125845
The code in buildScalarSteps already properly handles creating the
scalar induction values with VF = 1. Use it directly instead of using
extra code to handle that case.
Suggested by @Ayal in D133760.
Reapplying after the fix for volatile modelling in D135863.
-----
Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.