Previously compounding was all-or-nothing. Now, the
compounding attempts will iterate and yield the most
compounds that still result in a valid packet.
There was a limitation in legality that in the original inner loop latch,
no instruction was allowed between the induction variable increment
and the branch instruction. This is because we used to split the
inner latch at the induction variable increment instruction. Since
now we have split at the inner latch branch instruction and have
properly duplicated instructions over to the split block, we remove
this limitation.
Please refer to the test case updates to see how we now interchange
loops where instructions exist between the induction variable
increment and the branch instruction.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D115238
No need to include the order of the scalars beeing used as part of the
alternate vectorization into account when trying to reorder the whole
graph. Such elements better to reorder in the following phase because
the subtree still ends up in shuffle.
Part of D116688, fixes the regression in D116690.
Differential Revision: https://reviews.llvm.org/D116740
There is a bug in the reordering analysis stage. If the element with the
given hash is not added to the map but has the same number of APOs and
instructions with same parent, but different instruction opcode, it will
be initalized with default values and then the counter is increased by
1. But the lane is not updated and default to 0 instead of the actual
`Lane` value. It leads to the fact that the analysis is useless in
many cases and default to lane 0 instead of actual lane with the
minimum amount of APO operands.
Differential Revision: https://reviews.llvm.org/D116690
This adds some AArch64 specific smul_with_overflow and umul_with_overflow
costs, overriding the default costs. The code generation for these mul
with overflow intrinsics is usually better than the default expansion on
AArch64. The costs come from https://godbolt.org/z/zEzYhMWqo with various
types, or llvm/test/CodeGen/AArch64/arm64-xaluo.ll.
Differential Revision: https://reviews.llvm.org/D116732
dyn_cast<> can return null - use cast<> instead to assert the cast is valid before dereferencing the casted pointer.
Fixes static-analyzer null dereference warning.
I am suspecting a bug around updates of loop info for unreachable exits, but don't have a test case. Running this locally on make check didn't reveal anything, we'll see if the expensive checks bots find it.
The 0 immediate can't be selected to vmsgtu.vi/vmsleu.vi by decrementing
the immediate. To prevent his we had special patterns that provided
alternate lowering for the 0 cases. This relied on tablegen prioritizing
the 0 pattern over the sim5_plus1 range.
This patch introduces simm5_plus1_nonzero that excludes 0. It also
excludes the special case for vmsltu.vi since we can just use
vmsltu.vx and let the 0 be selected to X0.
This is an alternative to some of the changes in D116584.
Reviewed By: Chenbing.Zheng, asb
Differential Revision: https://reviews.llvm.org/D116723
Include the value of `ZLIB_ROOT` in `LLVMConfig.cmake` so `FindZLIB` can pick it up. This fixes an issue where ZLIB is not found on AIX runtimes despite specifying `-DZLIB_ROOT`.
Reviewed By: daltenty
Differential Revision: https://reviews.llvm.org/D116235
Function calls and compare instructions tend to cause sext.w
instructions to be inserted. If we make good use of W instructions,
these operations can often end up being redundant. We don't always
detect these during SelectionDAG due to things like phis. There also
some cases caused by failure to turn extload into sextload in
SelectionDAG. extload selects to LW allowing later sext.ws to become
redundant.
This patch adds a pass that examines the input of sext.w instructions trying
to determine if it is already sign extended. Either by finding a
W instruction, other instructions that produce a sign extended result,
or looking through instructions that propagate sign bits. It uses
a worklist and visited set to search as far back as necessary.
Reviewed By: asb, kito-cheng
Differential Revision: https://reviews.llvm.org/D116397
The zextload hook is only used to determine whether to insert a
zero_extend or any_extend for narrow types leaving a basic block.
Returning true from this hook tends to cause any load whose output
leaves the basic block to become an LWU instead of an LW.
Since we tend to prefer sexts for i32 compares on RV64, this can
cause extra sext.w instructions to be created in other basic blocks.
If we use LW instead of LWU this gives the MIR pass from D116397
a better chance of removing them.
Another option might be to teach getPreferredExtendForValue in
FunctionLoweringInfo.cpp about our preference for sign_extend of
i32 compares. That would cause SIGN_EXTEND to be chosen for any
value used by a compare instead of using the isZExtFree heuristic.
That will require code to convert from the llvm::Type* to EVT/MVT
as well as querying the type legalization actions to get the
promoted type in order to call TargetLowering::isSExtCheaperThanZExt.
That seemed like many extra steps when no other target wants it.
Though it would avoid us needing to lean on the MIR pass in some cases.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D116567
Fix static analysis warning by using cast<> instead of dyn_cast<> as both isa<> and isGuaranteedToExecuteForEveryIteration expect a non-null Instruction pointer.
This is the autoupgrade part of D116531. If old bitcode is missing
the elementtype attribute for indirect inline asm constraints,
automatically add it. As usual, this only works when upgrading
in typed mode, we haven't figured out upgrade in opaque mode yet.
When determining whether the memory is local to the function (and
we can thus introduce spurious writes without thread-safety issues),
check for a noalias call rather than the hardcoded list of memory
allocation functions. Noalias calls are the more general way to
determine allocation functions, as long as we're only interested
in the property that the returned value is distinct from any other
accessible memory.
Differential Revision: https://reviews.llvm.org/D116728
This patch updates SCEVExpander::expandUnionPredicate to not create
redundant 'or false, x' instructions. While those are trivially
foldable, they can be easily avoided and hinder code that checks the
size/cost of the generated checks before further folds.
I am planning on look into a few other similar improvements to code
generated by SCEVExpander.
I remember a while ago @lebedev.ri working on doing some trivial folds
like that in IRBuilder itself, but there where concerns that such
changes may subtly break existing code.
Reviewed By: reames, lebedev.ri
Differential Revision: https://reviews.llvm.org/D116696
Add CMake variable LLVM_EXTERNAL_PROJECT_BUILD_TOOL_ARGS to allow
arguments to be passed to the native tool used in CMake --build
invocations for external projects.
Can be used to pass extra arguments for enhanced versions of build
tools, e.g. distributed build options.
Differential Revision: https://reviews.llvm.org/D115815
Although we moved to Github Issues. The bug report message refers to
Bugzilla still. This patch tries to update these URLs.
Reviewed By: MaskRay, Quuxplusone, jhenderson, libunwind, libc++
Differential Revision: https://reviews.llvm.org/D116351