Similar to 806761a7629df268c8aed49657aeccffa6bca449
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense.
Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign as we recognize $ve-apple-darwin as ELF instead
of rejecting it outrightly.
Change lowering store iff the data operand is leagalized. In this way,
llvm can lower only operands first, then lower store instruction later.
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D158253
Correct the legality of i32 mul_lohi on AArch64.
Previously, AArch64 incorrectly reported i32 mul_lohi as Legal.
This allowed BuildUDIV/SDIV to use them. A later DAGCombiner would
replace them with MULHS/MULHU because only the high half was used.
This conversion does not check the legality of MULHS/MULHU under
the assumption that LegalizeDAG can turn it back into MUL_LOHI later.
After they are converted to MULHS/MULHU, DAGCombine ran and saw that
these operations aren't supported but an i64 MUL is. So they get
converted to that plus a shift. Without this, LegalizeDAG would
convert back MUL_LOHI and isel would fail to find a pattern.
This patch teaches BuildUDIV/SDIV to create the wide mul and shift
so that we can report the correct operation legality on AArch64. It
also enables div by constant folding for more cases on VE.
I don't know if VE wants this div by constant optimization or not. If they
don't want it, they can use the isIntDivCheap hook to disable it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D150333
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.
In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change includes both documentation, SelectionDAG and IRTranslator.
We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86
This patch fixes issue #57452.
Differential Revision: https://reviews.llvm.org/D132978
I want to default all VP operations to Expand. These 2 were blocking
because VE doesn't support them and the tests were expecting them
to fail a specific way. Using Expand caused them to fail differently.
Seemed better to emulate them using operations that are supported.
@simoll mentioned on Discord that VE has some expansion downstream. Not
sure if its done like this or in the VE target.
Reviewed By: frasercrmck, efocht
Differential Revision: https://reviews.llvm.org/D133514
Support load/store vm registers to memory location as a first step.
As a next step, support load/store vm registers to stack location.
This patch also adds several regression tests for not only load/store
vm registers but also missing load/store for vr registers.
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D128610
This reverts commit 83a798d4b0e17ac41d5430f1290d3661343eee1e.
As discussed in D120714 with @thakis, the patch added unneeded complexity
without noticeable benefits.
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to
reduce memory usage when it is not needed.
Differential Revision: https://reviews.llvm.org/D120714
This adds LLVMAnyPointerToElt to use instead of LLVMPointerToElt.
This allows us to preserve the address space as part of the type
overload for the intrinsic, but still require the vector element
type to match the pointer type.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D122042
Sending output to /dev/stdout on AIX gets an llc permission denied error, so this patch removes this from the tests.
Reviewed By: simoll, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D121799
fneg instruction isel and tests. We do this also in preparation of fused
negatate-multiple-add fp operations.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D121620
ISel for experimental.vp.strided.load|store for v256.32 types via
lowering to vvp_load|store SDNodes.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D121616
This adds support for v256.32|64 scatter|gather isel. vp.gather|scatter
and regular gather|scatter intrinsics are both lowered to the internal
VVP layer. Splitting these ops on v512.32 is the subject of future
patches.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D121288
Add `vvp_load|store` nodes. Lower to `vld`, `vst` where possible. Use
`vgt` for masked loads for now.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D120413
The broadcast patterns for all-true|false masks are available now.
Enable the true|fast fcmp predicate tests that use them.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D119936
Extend the VE binaryop vector isel patterns to use passthru when the
result of a SDNode is used in a vector select or merge.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D117495
Use the `VMRG` for all three operations for now. `vp_select` will be
used in passthru patterns.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D117206
Depends on D115940 for the `Binary_rv_vr_vv` pattern class op isel
fragment used for divisions.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D116035
This implements vp_add, vp_and for the VE target by lowering them to the
VVP_* layer. We also add helper functions for VP SDNodes (isVPSDNode,
getVPMaskIdx, getVPExplicitVectorLengthIdx).
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D93766
We do this mostly to be able to test the insert_vector_elt isel
patterns. As long as we don't, most single element insertions show up as
`BUILD_VECTOR` in the backend.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D93759
VE used to allocate VM1, VM2, VMP2 (VM4+VM5), and VM3. This patch
corrects to allocate VM1, VM2, VMP2 (VM4+VM5), and VM6. Also add
a regression test.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93570
Support VM and VMP registers in copyPhysReg() function. Also add
regression tests.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93547
Optimize eliminate FP mechanism. This time optimize a function which has
no call but fixed stack objects. LLVM eliminates FP on such functions now.
Also, optimize GOT/PLT registers save/restore instructions if a given
function doesn't uses them. In addition, remove generating mechanism of
`.cfi` instructions since those are taken from other architectures and not
inspected yet. Update regression tests, also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92251
Change the way to truncate i64 to i32 in I64 registers. VE assumed
sext values previously. Change it to zext values this time to make
it match to the LLVM behaviour.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92226
VE Vector Predicated (VVP) SDNodes form an intermediate layer between VE
vector instructions and the initial SDNodes.
We introduce 'vvp_add' with isel and tests as the first of these VVP
nodes. VVP nodes have a mask and explicit vector length operand, which
we will make proper use of later.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D91802
This defines the vec_broadcast SDNode along with lowering and isel code.
We also remove unused type mappings for the vector register classes (all vector MVTs that are not used in the ISA go).
We will implement support for short vectors later by intercepting nodes with illegal vector EVTs before LLVM has had a chance to widen them.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D91646
This defines a 'fastcc' for the VE target and implements vreg-to-vreg
copy for parameter passing. The 'fastcc' extends the standard CC for
SX-Aurora with register passing of vector-typed parameters and return
values.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D90842
`+vpu` controls whether VEISelLowering adds any vregs. This defaults to
`-vpu` to have scalar code generation out of the box. We bring up
vector isel under the `+vpu` flag. Once vector isel is stable we switch
to `+vpu` and advertise vregs and vops in TTI.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D90465