Previously reverted in 8b446ea2ba39e406bcf940ea35d6efb4bb9afe95
Reapplying because this commit is NOT DEPENDENT on the reverted commit
fc21f2d7bae2e0be630470cc7ca9323ed5859892, which broke the ASAN buildbot.
See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.
The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138991
Before we might have missed calling the destructor on an abstract
attribute if it was created outside the seeding or update phase.
All AAs are now in the AAMap and we can use it to delete them all.
Much of foldLoadsRecursive relies on knowing the size of loaded
data, which is not possible for scalable vector types. However,
the logic of combining two small loads into one bigger load does
not apply for vector types so rather than converting the algorithm
to use TypeSize I've simply added an early exit for vectors.
Fixes#59510
Differential Revision: https://reviews.llvm.org/D140106
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes check-clang-tools.
This reverts commit 37b8f09a4b61bf9bf9d0b9017d790c8b82be2e17,
and returns commit 1bd0b82e508d049efdb07f4f8a342f35818df341.
The miscompile was in InstCombine, and it has been addressed.
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37
Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb
Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts
Differential Revision: https://reviews.llvm.org/D139275
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes clang.
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
The class in InstCombineInternal.h inherits from InstCombiner.h.
I think this split was created when target specific InstCombines
were moved to go through TTI.
I had to update some of the code in InstCombiner.h to match changes
that had been made to InstCombineInternal.h.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D140230
Add a VP_CLASSOF_IMPL macro to define common classof implementations for
recipes. This reduces duplication and also adds missing implementations
to existing recipes.
* This is a recommit of 3c4d2a03968ccf5889bacffe02d6fa2443b0260f,
* which was reverted in 25f01d593ce296078f57e872778b77d074ae5888,
because it exposed a miscompile in PPC backend, which was resolved
in https://reviews.llvm.org/D140089 / cb3f415cd2019df7d14683842198bc4b7a492bc5.
* which was a recommit of cf624b23bc5d5a6161706d1663def49380ff816a,
* which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df,
because the cut-off on the number of vector elements was not low enough,
and it triggered both SDAG SDNode operand number assertions,
5and caused compile time explosions in some cases.
Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it later.
FIXME: should this respect TTI reg width * num vec regs?
Original commit message:
Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.
But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.
Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.
Fixes#59116.
Currently InstCombine folds using the
`return replaceInstUsesWith(V, Builder.CreateFoo())`
pattern do not preserve the original name of the instruction.
To preserve the name, you either have to use something like
`return FooInst::Create(...)` which is usually less nice, or go
out of the way to preserve the name with takeName(). We often
don't do that.
This patch instead preserves the name in replaceInstUsesWith()
when replacing a named instruction with an unnamed instruction.
To be conservative, I also added a zero-use check, which is a
proxy for the case where the instruction was just created, rather
than an existing one reused. Possibly we could drop that part.
As InstCombine tests are robust against renames this does not
cause any test diffs, so I regenerated a random test to show the
effects.
Differential Revision: https://reviews.llvm.org/D140192
Use a consistent type for the operands() methods of different SCEV
types. Also make the API consistent by only providing operands(),
rather than also providin op_begin() and op_end() for some of them.
Fix two issues for profile staleness report.
1) It should be more accurate to use the sum of all entry count(`getHeadSamplesEstimate`) for the callsite samples than the total samples, since even the top-level callsite is mismatched, it does affect the inlining but it can still be merged into base profile and used later.
2) I accidentally missed to persist the num of mismatched callsite into binary.
Also added the asm testing to test the decoding of the section.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D140063
It was annoying to write the check for this in the one case I added,
and I'm planning on adding another, so add a convenient PatternMatch
like for other special case values.
I have no idea what is going on in the DoubleAPFloat case, I reversed
this from the makeSmallestNormalized test. Also could implement this
as *this == getSmallestNormalized() for less code, but this avoids the
construction of a temporary APFloat copy and follows the style of the
other functions.
This transform was added with 4446f71ce392. However, as noted in
the post-commit feedback, the transform is not safe with an
arbitrary base vector because we may leak poison from a narrow
element into an adjacent element when bitcasting.
I made the least invasive code change in case we do figure out
a way to make this safe.
Previously reverted in 12696d302d146ffe616eecab3feceba9d29be2db
The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138991
The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138991
In scalar plans, replicate recipes will only generate a single value per
UF, independent of whether they are uniform or not. So don't consider
uniformity for plans with scalar VFs only.
This allows us to handle a few additional cases in VPlan sinking instead
of non-VPlan sinkScalarOperands.
Depends on D133762.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D134218
The `FunctionSpecialization` pass chooses specializations among the
opportunities presented by a single function and its calls,
progressively penalizing subsequent specialization attempts by
artificially increasing the cost of a specialization, depending on how
many specialization were applied before. Thus the chosen
specializations are sensitive to the order the functions appear in the
module and may be worse than others, had those others been considered
earlier.
This patch makes the `FunctionSpecialization` pass rank the
specializations globally, i.e. choose the "best" specializations
among the all possible specializations in the module, for all
functions.
Since this involved quite a bit of redesign of the pass data
structures, this patch also carries:
* removal of duplicate specializations
* optimization of call sites update, by collecting per
specialization the list of call sites that can be directly
rewritten, without prior expensive check if the call constants and
their positions match those of the specialized function.
A bit of a write-up up about the FuncSpec data structures and
operation:
Each potential function specialisation is kept in a single vector
(`AllSpecs` in `FunctionSpecializer::run`). This vector is populated
by `FunctionSpecializer::findSpecializations`.
The `findSpecializations` member function has a local `DenseMap` to
eliminate duplicates - with each call to the current function,
`findSpecializations` builds a specialisation signature (`SpecSig`)
and looks it in the duplicates map. If the signature is present, the
function records the call to rewrite into the existing specialisation
instance. If the signature is absent, it means we have a new
specialisation instance - the function calculates the gain and creates
a new entry in `AllSpecs`. Negative gain specialisation are ignored at
this point, unless forced.
The potential specialisations for a function form a contiguous range
in the `AllSpecs` [1]. This range is recorded in `SpecMap SM`, so we
can quickly find all specialisations for a function.
Once we have all the potential specialisations with their gains we
need to choose the best ones, which fit in the module specialisation
budget. This is done by using a max-heap (`std::make_heap`,
`std::push_heap`, etc) to find the best `NSpec` specialisations with a
single traversal of the `AllSpecs` vector. The heap itself is
contained with a small vector (`BestSpecs`) of indices into
`AllSpecs`, since elements of `AllSpecs` are a bit too heavy to
shuffle around.
Next the chosen specialisation are performed, that is, functions
cloned, `SCCPSolver` primed, and known call sites updated.
Then we run the `SCCPSolver` to propagate constants in the cloned
functions, after which we walk the calls of the original functions to
update them to call the specialised functions.
---
[1] This range may contain specialisation that were discarded and is
not ordered in any way. One alternative design is to keep a vector
indices of all specialisations for this function (which would
initially be, `i`, `i+1`, `i+2`, etc) and later sort them by gain,
pushing non-applied ones to the back. This has the potential to speed
`updateCallSites` up.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D139346
Change-Id: I708851eb38f07c42066637085b833ca91b195998
Without this change this function could return nullptr if no further
transformation took place making InstCombine pass incorrectly report no
IR changes preventing analyses invalidation.
Differential Revision: https://reviews.llvm.org/D140004
now with the fixed warning and updated lit tests
---
[RISC-V][HWASAN] Add support for HWASAN code instrumentation for RISC-V
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D131575
This reverts commit 492c471839a66e354ebe696bd3e15f7477c63613.
As pointed out by nloped, the transform in f2 is not correct: If
%shr is poison, then freeze may result in a negative value. The
transform is correct in the case where the freeze is pushed through
the operation in a way that guarantees the result is non-negative,
which is the case I had tested.
Even if all loads and stores are in `nosync` functions we cannot
guarantee there is no synchronization going on between them. As such, we
cannot use CFG reasoning. We could check the entire module, or, what
happens now to minimize test churn, is to check if all accesses are in
the same function that is `nosync`. A follow up will undo some of the
regressions where possible.
Similarly, reachability cannot be used to exclude an access if the
access is not known to be executed by the same thread as the given
instruction.
The OpenMP-opt test was added for the latter problem.
We had two AAs for reachability but it was very cumbersome to extend
them. We also had some fallback to use LLVM-core mechanisms and cache
the result. The new design shares the query code and interface nicely
between AAIntraFnReachability and AAInterFnReachability.
As part of the rewrite we also added the ExclusionSet to the queries.
Even if a value is for sure written we need to visit the call sites as
they might end up inside the function that reads and writes the value.
In a follow up we can introduce correct reasoning to avoid the backwards
traversal in this case and instead check if any call site between the
write and the read might reach a potential write we want to exclude.