Some MIR and IR tests include checks for register class IDs, which are
unnecessary since the register class name is also checked for and that
doesn't change when new classes are added. This patch replaces the
hard-coded register class ID checks with regexes so they don't have to
be updated every time a new class is added.
Follow the 2019 rules and order -0 as less than +0 and +0 as greater
than -0. As currently defined this isn't required for the intrinsics,
but is a better QoI.
This will avoid the workaround in libc added by #83158
This patch adds smin/smax/umin/umax/sadd_sat/ssub_sat/uadd_sat/usub_sat
into canSplatOperand. It can help llvm fold vv instructions with one
splat operand to vx instructions.
sitofp - if we only demand the signbit, then we can try to use the source integer
uitofp - signbit is guaranteed to be zero
Noticed while reviewing #82290
T1 allows for an optional registers list, the register list must be {d0-d15}.
T2 defines a mandatory register list, the register list must be {d0-d31}.
The requirements for T1/T2 are as follows:
T1 T2
Require: v8-M.Main, v8.1-M.Main,
secure state secure state
16 D Regs valid valid
32 D Regs UNDEFINED valid
No D Regs NOP NOP
Insert waitcnts for loads and atomics before stores with system scope.
Scope is field in instruction encoding and corresponds to desired
coherence level in cache hierarchy.
Intrinsic stores can set scope in cache policy operand.
If volatile keyword is used on generic stores memory legalizer will set
scope to system. Generic stores, by default, get lowest scope level.
Waitcnts are not required if it is guaranteed that memory is cached.
For example vulkan shaders can guarantee this.
TODO: implement flag for frontends to give us a hint not to insert
waits.
Expecting vulkan flag to be implemented as vulkan:private MMRA.
If only one subvector extraction will be necessary (i.e. because the other is constant etc.) then extract the source operands and perform as a 128-bit comparison
Ideally DAGCombiner's narrowExtractedVectorBinOp would handle this but its tricky to confirm when a target opcode can be safely extracted and performed as a different vector type
Partially improves an outstanding regression in #82290
For __arm_new("zt0") we need to have special setup code in the prologue.
For calls that don't preserve zt0, we need to emit code preserve ZT0
around the call.
This is only emitted by SelectionDAG ISel at the moment.
This is another smaller step of #70452, changing the signature of
computeAliasing() from optional<uint64_t> to LocationSize, and follow-up
changes in DAGCombiner::mayAlias(). There are some test change due to
the previous AA->isNoAlias call incorrectly using an unknown size
(~UINT64_T(0)). This should then be improved again in #70452 when the
types are known to be scalable.
The description in #83146 is slightly inaccurate: it relaxes a tail
undisturbed vslideup to tail agnostic if we are inserting over the
entire tail of the vector **and** we didn't shrink the LMUL of the
vector being inserted into.
This handles the case where we did shrink down the LMUL via InterSubVT
by checking if we inserted over the entire tail of InterSubVT, the
actual type that we're performing the vslideup on, not VecVT.
If we know that an insert_subvector inserting a fixed subvector will
overwrite the entire tail of the vector, we use a tail agnostic
vslideup. This was added in https://reviews.llvm.org/D147347, but we can
do the same thing for scalable vectors too.
The `Policy` variable is defined in a slightly weird place but this is
to mirror the fixed length subvector code path as closely as possible. I
think we may be able to deduplicate them in future.
IGLP itself will be in SavedMutations via mutations added during
Scheduler creation, thus falling back results in reapplying IGLP.
In PostRA scheduling, if we have multiple regions with IGLP
instructions, then we may have infinite loop.
Disable the feature for now.
This reverts commit 6e6bf9f81756ba6655b4eea8dc45469a47f89b39.
It turned out the multivalue feature had active outside users and it
could cause some disruptions to them, so I'd like to investigate more
about the workarounds before doing this.
This adds `WebAssemblyRefTypeMem2Local` pass, which changes the address
spaces of reference type `alloca`s to `addrspace(1)`. This in turn
changes the address spaces of all `load` and `store` instructions that
use the `alloca`s.
`addrspace(1)` is `WASM_ADDRESS_SPACE_VAR`, and loads and stores to this
address space become `local.get`s and `local.set`s, thanks to the Wasm
local IR support added in
82f92e35c6.
In a follow-up PR, I am planning to replace the usage of mem2reg pass
with this to solve the reference type `alloca` problems described in
#81575.
This adds some tablegen patterns for generating FMOVDr from concat(X,
zeroes), as the FMOV will implicitly zero the upper bits of the
register. An extra AArch64MIPeepholeOpt is needed to make sure we can
remove the FMOV in the same way we would remove the insert code.
When in an entry thunk the x64 SP is passed in x4 but this cannot be
directly passed through since x64 varargs calls have a 32 byte shadow
store at SP followed by the in-stack parameters. ARM64EC varargs calls
on the other hand expect x4 to point to the first in-stack parameter.
There is the possibility that the bottom-up direction will lead to
performance improvements on certain targets, as this is certainly the case for
the pre-regalloc GenericScheduler. This patch will give people the
opportunity to experiment for their sub-targets. However, this patch
keeps the top-down approach as the default for the PostGenericScheduler
since that is what subtargets expect today.
foldOperands() for REG_SEQUENCE has recursion that can trigger an infinite loop
as the method can modify the operand order, which messes up the range-based
for loop. This patch fixes the issue by caching the uses for processing beforehand,
and then iterating over the cache rather using the instruction iterator.
We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform the truncation as a shuffle directly (which will usually lower as a PACK or PSHUFB).
For the cases where the widening and shuffle isn't legal we can leave it to generic legalization to scalarize for us.
Fixes#81883
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
befa925acac8fd6a9266e introduced preliminary hoisting of COPY
instructions when the user of the COPY is inside the same loop. That
optimization appeared to be too aggressive and hoisted too many COPY's
greatly increasing register pressure causing performance regressions for
AMDGPU target.
This is intended to fix the regression by hoisting COPY instruction only
if either:
- User of COPY can be hoisted (other args are invariant)
or
- Hoisting COPY doesn't bring high register pressure
i8 vectors do not have their sizes changed as I noticed regressions in
some tests when that was done.
This patch also adds support for most G_VECREDUCE_* operations to
moreElementsVector in LegalizerHelper.cpp.
The code for getting the "neutral" element is taken almost exactly as it
is in SelectionDAG, with the exception that support for
G_VECREDUCE_{FMAXIMUM,FMINIMUM} was not added.
The code for SelectionDAG is located at
SelectionDAG::getNeutralELement().
This PR adds SPIR-V extension SPV_INTEL_variable_length_array that
allows to allocate local arrays whose number of elements is unknown at
compile time:
* add a new SPIR-V internal intrinsic:int_spv_alloca_array
* legalize G_STACKSAVE and G_STACKRESTORE
* implement allocation of arrays (previously getArraySize() of
AllocaInst was not used)
* add tests
This PR is to add support for the 'freeze' instruction:
https://llvm.org/docs/LangRef.html#freeze-instruction
There is no way to implement `freeze` correctly without support on
SPIR-V standard side, but we may at least address a simple (static) case
when undef/poison value presence is obvious. The main benefit of even
incomplete `freeze` support is preventing of translation from crashing
due to lack of support on legalization and instruction selection steps.
For instruction xvpermi.q, only [1:0] and [5:4] bits of operands[3] are
used. The unused bits in operands[3] need to be set to 0 to avoid
causing undefined behavior.
This include 2 fixes:
1. Disallow 'f' for softfloat.
2. Allow 'r' for softfloat.
Currently, 'f' is accpeted by clang, then LLVM meets an internal error.
'r' is rejected by LLVM by: couldn't allocate input reg for constraint
'r'.
Fixes: #64241, #63632
---------
Co-authored-by: Fangrui Song <i@maskray.me>
MachineIR alias analysis assumes that only bytes after the pointer will
be accessed. This is incorrect if the stride is negative.
This is causing miscompiles in our downstream after SLP started making
strided loads.
Fixes#82657