If there is an entry P that has C bits set, it could become one of C
different possibilities. If P occurs more than C times, then there are
no valid completions.
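The counting argument above can be sketched as a small check. This is an
illustration only: it assumes each entry is represented as an integer bit
mask of its candidate values, and the function name is hypothetical rather
than taken from the backend. It tests a necessary condition, not a
sufficient one.

```python
from collections import Counter


def may_have_completion(entries):
    """Return False if some entry occurs more often than it has set bits.

    Each entry is assumed to be a non-negative integer bit mask; an entry
    with C bits set can be resolved to at most C distinct values.
    """
    counts = Counter(entries)
    return all(n <= bin(p).count("1") for p, n in counts.items())
```

For example, an entry with two bits set may appear twice, but not three
times.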
* Concatenate partial shuffles into longer ones whenever possible:
In the selection DAG, a shuffle's operands and return type must all agree.
This is not the case in LLVM IR, and non-conforming IR-level shuffles will be
rewritten to match the DAG's requirements. This rewriting can turn a shuffle
that could be matched to a single HVX instruction into shuffles that require
more complex handling. Example: anything that takes two single vectors
and returns a pair (e.g. V6_vshuffvdd).
This is avoided by concatenating such shuffles into ones that take a vector
pair and an undef pair, and produce a vector pair.
* Recognize perfect shuffles when masks contain `undef` values.
* Use funnel shifts for contracting shuffles.
* Recognize rotations as a separate step.
These changes go into a single commit, because each one on its own
introduced some regressions.
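As an illustration of the rotation item in the list above, a generic check
for whether a shuffle mask is a rotation could look like the following. This
is a sketch under stated assumptions, not the backend's code: the mask is a
list of source indices for a single vector, negative entries stand for undef,
and the function name is hypothetical.

```python
def rotation_amount(mask):
    """Return k if mask[i] == (i + k) % len(mask) for every defined entry.

    Negative entries are treated as undef and match anything. Returns None
    if the mask is not a rotation. The smallest matching k is returned.
    """
    n = len(mask)
    for k in range(n):
        if all(m < 0 or m == (i + k) % n for i, m in enumerate(mask)):
            return k
    return None
```

An identity mask is reported as a rotation by 0, so a caller would likely
want to handle that case separately.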
The issue that caused the revert has been fixed in:
44bd80751274a81c870882968ecd478b03af292a
-----
This switches Hexagon intrinsics to use the default attributes
(nosync, nofree, nocallback and willreturn). willreturn in particular
is needed to prevent optimization regressions in the future.
The only intrinsics I've excluded here are the load/store locked
intrinsics, which presumably aren't nosync.
Differential Revision: https://reviews.llvm.org/D137623
Make the stack alignment register (AP) reserved in the given function. This
will make it available everywhere in the function, and allow aligned access
to vector register spill slots.
This reverts commit 8a8983b279dd5e4dceabe1fadbb8980b6adb88f9.
The reverted commit uncovers an existing regalloc issue in the Hexagon
backend, which blocks Halide Hexagon users. Reverting to unblock; to be
recommitted when the underlying issue is resolved.
Reproducer available shortly.
The vector alignment code was grouping all aligned loads together. In some
cases the groups could become quite large, causing many spills to be
generated. This change places the loads closer to where they are used,
reducing the register pressure.
The loop-carried dependency detection logic in isLoopCarriedDep relies
on the load and store using the same definition for the base register.
This misses the case of post-increment loads and stores whose base
registers are different PHIs initialized from the same initial value.
This commit extends the logic to accept a load and a store that have
different PHI base addresses, provided that they had the same initial value
when entering the loop and are incremented by the same amount in each
iteration.
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D136463
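The condition described in the loop-carried dependency change above can be
modeled with a toy example. This is only a sketch: each PHI base is
represented as a hypothetical (initial value, per-iteration increment) pair,
which is an assumption made for illustration and not how isLoopCarriedDep
represents registers.

```python
def addresses(phi, iterations):
    """Addresses produced by a post-incremented base over some iterations."""
    initial, step = phi
    return [initial + i * step for i in range(iterations)]


def same_base_in_every_iteration(load_phi, store_phi):
    """True if both bases start equal and advance by the same amount."""
    return load_phi == store_phi
```

Two distinct PHIs that satisfy this check produce the same address sequence,
so they can be treated like a shared base register.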
These weren't running anywhere because of bad specifications.
One test has bit-rotted and had to be XFAILed, the rest are okay.
Differential Revision: https://reviews.llvm.org/D136612
In llvm/lib/Target/Hexagon/HexagonDepMapAsm2Intrin.td we use "HasV66" for
HVX v66 intrinsics. We should be using "UseHVXV66" instead, since HVX has
its own versioning.
This prepares for an upcoming change to make --print-imm-hex the default
behavior of llvm-objdump. These tests were updated in a semi-automatic
fashion.
See D136972 for details.
This will allow recognizing Q.31 multiplications on vectors that are
multiples of HVX vectors. At the moment this comes at the expense of
Q.15 multiplications, which now are handled as 32-bit multiplications
with shifts.
In the longer term this will likely be replaced by a different scheme
of "legalizing" vectors, which is necessary for idiom recognition, at
least where using direct HVX intrinsics is desired.
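For readers unfamiliar with the notation, Q.31 and Q.15 denote fixed-point
formats with 31 and 15 fractional bits. A minimal scalar sketch of such a
multiplication follows. It assumes plain truncation with no rounding or
saturation, which may differ from what the HVX instructions do; the function
name is hypothetical.

```python
def q_mul(a, b, frac_bits):
    """Multiply two fixed-point integers that have frac_bits fractional bits.

    The full product has 2 * frac_bits fractional bits, so it is shifted
    right by frac_bits. No rounding or saturation is applied here.
    """
    return (a * b) >> frac_bits


half_q31 = 1 << 30  # 0.5 in Q.31
half_q15 = 1 << 14  # 0.5 in Q.15
quarter_q31 = q_mul(half_q31, half_q31, 31)  # 0.25 in Q.31
quarter_q15 = q_mul(half_q15, half_q15, 15)  # 0.25 in Q.15
```

The Q.15 case has the same shape: a wider multiplication followed by a
shift, which matches the "32-bit multiplications with shifts" handling
mentioned above.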
Handle MULH[US] by normalizing them into newly invented nodes
HexagonISD::(S|U|US)MUL_LOHI. On HVX v60, if only the high part of
SMUL_LOHI is used, use the original MULHS expansion. In all other
cases, expand the full product.
On HVX v62, always expand the full product.
Introduce Hexagon-specific LLVM IR intrinsics for 32x32 multiplication
returning low/high parts.
HVC::calculatePointerDifference inserts temporary instructions for
simplification and calculation of known bits. These instructions were
inserted at the end of a basic block (after the terminator), which
caused BB->getTerminator() to return nullptr. This, in turn, caused
a crash when a PHI instruction was examined in computeKnownBits.
The carry bit from an intermediate addition was not properly propagated.
For example mulhs(7fffffff, 7fffffff) was evaluated as 3ffeffff, while
the correct result is 3fffffff.
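The role of the carry can be seen in a scalar sketch that builds the high
half of a signed 32x32 product from 16-bit pieces. This follows the
well-known textbook decomposition and is not the HVX expansion itself; the
function name is hypothetical.

```python
def mulhs32(u, v):
    """High 32 bits of the signed 64-bit product of two 32-bit integers.

    u and v are Python ints in the signed 32-bit range. The intermediate
    sums carry the upper bits of the partial products forward; dropping
    any of them yields a result that is too small.
    """
    u0, u1 = u & 0xFFFF, u >> 16      # low half unsigned, high half signed
    v0, v1 = v & 0xFFFF, v >> 16
    w0 = u0 * v0
    t = u1 * v0 + (w0 >> 16)          # carry from the low partial product
    w1 = u0 * v1 + (t & 0xFFFF)
    return u1 * v1 + (t >> 16) + (w1 >> 16)
```

With this decomposition, mulhs32(0x7fffffff, 0x7fffffff) evaluates to
0x3fffffff, matching the expected value quoted above.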
HVX v62+ has bidirectional shifts, which do not mask the shift amount to
the bit width. Instead, the shift amount is sign-extended from the log(BW)
bit value, and a negative value causes a shift in the other direction.
For the shift amount being -log(BW), this reversed shift will shift all
bits out, inserting 0s or sign bits depending on the type and direction.
HVX v60 only has splats that take a 32-bit word as input, while v62+
has splats that take an 8- or 16-bit value. This makes writing output
patterns that need to use a splat annoying, because the entire output
pattern needs to be replicated for various versions of HVX.
To avoid this, the patterns will always use the pseudos, and then the
pseudos will be handled using a post-ISel hook.
V6_vzb and V6_vshuffeb can use any 2 resources in a packet, while
V6_vunpackub/V6_vpackeb both need a shift resource.
Also, add patterns for shifting vectors of i8.
Resizing operations (e.g. sign extension) in DAG can go from any width
to any other width, e.g. i8 -> i32. If the input and the result differ
by a factor larger than 2, the operation cannot be legal in HVX, since
the only two legal vector sizes in HVX are a single vector and a pair
of vectors.
To simplify the legalization, such operations are expanded into steps
that only double/halve the type size, so that each such step can be fully
legalized on its own. The complication is that DAG will automatically
fold these steps back into one, e.g. sext(sext) -> sext. To prevent that,
new HexagonISD nodes are introduced: TL_EXTEND and TL_TRUNCATE. Once
legalized, these nodes are replaced with the original opcodes.
The type legalization is now common to aext/sext/zext/trunc and
Hexagon-specific ssat/usat nodes.
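The step-wise expansion of resizing operations described above can be
sketched as a sequence of element widths. This is an illustration under the
assumption that the source and destination widths differ by a power of two;
the function name is hypothetical and the sketch does not model the
TL_EXTEND/TL_TRUNCATE nodes themselves.

```python
def resize_steps(src_bits, dst_bits):
    """List the element widths visited when only doubling or halving.

    Raises ValueError if the widths are not positive or do not differ by
    a power-of-two factor.
    """
    if src_bits <= 0 or dst_bits <= 0:
        raise ValueError("widths must be positive")
    big, small = max(src_bits, dst_bits), min(src_bits, dst_bits)
    ratio, rem = divmod(big, small)
    if rem != 0 or ratio & (ratio - 1) != 0:
        raise ValueError("widths must differ by a power-of-two factor")
    steps = [src_bits]
    cur = src_bits
    while cur != dst_bits:
        cur = cur * 2 if cur < dst_bits else cur // 2
        steps.append(cur)
    return steps
```

For i8 -> i32 this gives the widths 8, 16, 32, i.e. two extension steps that
can each be legalized on their own.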