If we're logical shifting an all-signbits value, then we can just mask out the shifted bits.
This helps remove some unnecessary bitcasted vXi16 shifts used for vXi8 shifts (which SimplifyDemandedBits struggles to remove through the bitcast), and allows some AVX1 shifts of 256-bit values to stay as a single YMM instruction.
Noticed in codegen from #82290
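As a scalar illustration of the fold (a minimal standalone sketch; the patch applies the same identity to vector types in the DAG):
```cpp
#include <cassert>
#include <cstdint>

// For an all-signbits value x (every lane is 0 or all-ones, as produced
// by a comparison), a logical shift only ever clears bits, so it can be
// replaced by an AND with the shifted all-ones mask.
int main() {
  for (uint8_t x : {uint8_t(0x00), uint8_t(0xFF)}) // all-signbits i8 values
    for (unsigned s = 0; s < 8; ++s) {
      assert(uint8_t(x >> s) == uint8_t(x & (0xFFu >> s))); // lshr -> and
      assert(uint8_t(x << s) == uint8_t(x & (0xFFu << s))); // shl -> and
    }
}
```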
When the MVT is not a vector type, TCK_CodeSize should return an invalid
cost. This patch adds a check at the beginning to make sure all cost
kinds return invalid costs consistently.
Before this patch, TCK_CodeSize returned a valid cost on a scalar MVT
while the other cost kinds didn't.
This fixes issue #83294, where a loop contains vector instructions
and the MVT is scalar after type legalization because the vector
extension is not enabled.
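A minimal sketch of the shape of the check (helper name illustrative; the real check sits at the top of the RISC-V cost-model hooks):
```cpp
// Given the MVT produced by type legalization, return an invalid cost
// for *every* cost kind when it is not a vector, instead of only for
// some of them.
static InstructionCost costIfVector(MVT LegalVT, InstructionCost VecCost) {
  if (!LegalVT.isVector())
    return InstructionCost::getInvalid(); // consistent across cost kinds
  return VecCost; // otherwise fall through to the per-kind logic
}
```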
hlsl_intrinsics.h - add the round API.
DXIL.td - add the LLVM intrinsic to DXIL lowering mapping.
This change reuses LLVM's existing `__builtin_elementwise_round` builtin
and `int_round` intrinsic.
This change implements: #70077
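For illustration, a sketch of the hlsl_intrinsics.h declaration pattern (overload set abbreviated; see the header for the full half/float scalar and vector forms):
```cpp
// round: elementwise round-to-nearest, aliased to the existing builtin.
_HLSL_BUILTIN_ALIAS(__builtin_elementwise_round)
float round(float);
_HLSL_BUILTIN_ALIAS(__builtin_elementwise_round)
float4 round(float4);
```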
Without PCMPGTQ, if the i64 elements are sign-extended enough to be representable as i32 then we can compare the lower i32 bits with PCMPGTD and splat the results into the upper elements.
Value tracking already gets us pretty close to this, but doing it explicitly allows us to remove a lot of unnecessary bit flipping.
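A standalone sketch of why the trick is sound (scalar form; the vector lowering compares the low i32 halves with PCMPGTD and splats the result into the upper elements to rebuild the i64 mask):
```cpp
#include <cassert>
#include <cstdint>

// If both i64 operands are sign-extended from i32 (i.e. representable
// as i32), a 64-bit signed compare agrees with a 32-bit signed compare
// of the low halves.
int main() {
  for (int64_t a : {-5LL, 0LL, 7LL, 2147483647LL, -2147483648LL})
    for (int64_t b : {-9LL, 0LL, 3LL})
      assert((a > b) == (int32_t(a) > int32_t(b)));
}
```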
This changes the type of `PredicationCode` and `VPTPredicationCode` from
`unsigned` to `ARMCC::CondCodes` and `ARMVCC::VPTCodes` respectively,
for clarity and correctness.
Use IR analysis to infer when an addrspacecast operand is nonnull, then
lower it to an intrinsic that the DAG can use to skip the null check.
I did this using an intrinsic as it's non-intrusive. An alternative
would have been to allow something like `!nonnull` on `addrspacecast`
then lower that to a custom opcode (or add an operand to the
addrspacecast MIR/DAG opcodes), but it's a lot of boilerplate for just
one target's use case IMO.
I'm hoping that when we switch to GISel we can move all this logic
to the MIR level without losing info, but currently the DAG doesn't see
enough, so we need to act in CGP.
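A hedged sketch of the CGP-time rewrite (helper and intrinsic naming are assumptions based on this description, not a definitive signature):
```cpp
// If IR analysis proves the cast's source pointer is nonnull, replace
// the plain addrspacecast with a target intrinsic that ISel can lower
// without the null check.
static bool tryMarkCastNonNull(AddrSpaceCastInst *ASC, const DataLayout &DL) {
  if (!isKnownNonZero(ASC->getPointerOperand(), DL))
    return false;
  // ... emit the "nonnull addrspacecast" intrinsic and RAUW the cast ...
  return true;
}
```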
Fixes: SWDEV-316445
The base register of OPmi_ND may be allocated to the same physical
register as the ND operand.
OPmi_ND is not compressible because it has different semantics from OPmi.
In this case, `isRedundantNewDataDest` should return false; otherwise
we would get the error:
Assertion `!IsNDLike && "Missing entry for ND-like instruction"' failed.
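A sketch of the added condition (operand indexing illustrative):
```cpp
// The ND destination must not be allocated to the memory base register;
// if it is, the new data destination is not redundant and OPmi_ND must
// not be compressed to OPmi.
static bool ndDestAliasesBase(const MachineInstr &MI) {
  Register NDDest = MI.getOperand(0).getReg();
  Register Base = MI.getOperand(1 + X86::AddrBaseReg).getReg();
  return NDDest == Base; // isRedundantNewDataDest returns false here
}
```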
According to
https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html:
> The v0 register defined by the RISC-V vector extension is special in
> that it can be used both as a general purpose vector register and also
> as a mask register. As a preference, use registers other than v0 for
> non-mask values. Otherwise data will have to be moved out of v0 when a
> mask is required in an operation. v0 may be used when all other
> registers are in use, and using v0 would avoid spilling register state
> to memory.
Also, using the V0 register may stall the masking pipeline and stop
chaining on some microarchitectures.
So we should avoid using V0 and the register groups containing it as
much as possible. We achieve this by moving V0 to the end of the RA
order.
Inspect a basic block, and if it is a single-basic-block loop with a
small number of instructions, set the loop alignment to 32 bytes. This
avoids a cache-line break in the first packet of the loop, which would
cause a stall on each execution of the loop.
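A sketch of the intent, with assumed names and threshold (the actual heuristic lives in the Hexagon backend):
```cpp
// Align small single-block loops to 32 bytes so the first packet does
// not straddle a cache line.
static void alignSmallLoop(MachineLoop &L, unsigned SmallLoopInstrLimit) {
  MachineBasicBlock *Header = L.getHeader();
  if (L.getNumBlocks() == 1 && Header->size() <= SmallLoopInstrLimit)
    Header->setAlignment(Align(32));
}
```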
Remove the LSH transform and restore the previous lowering.
Fixes a conformance issue in
[77615](https://github.com/llvm/llvm-project/pull/77615) where the
OpenCL integer_ops tests fail for integer_clz.
Co-authored-by: Leon Clark <leoclark@amd.com>
Instead of caching STI in the RISCVELFTargetStreamer, store the two
flags we need from it.
My goal is to allow RISCVAsmPrinter to override these flags using IR
module metadata for LTO. So they need to be separated from the STI used
to construct the TargetStreamer.
This patch should be NFC as long as nothing changes the contents of
the STI used to construct the TargetStreamer between construction and
the use of the flags.
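A sketch of the refactor (member names illustrative):
```cpp
// Capture the flags at construction instead of holding on to the STI,
// so RISCVAsmPrinter can later override them from IR module metadata.
class RISCVELFTargetStreamer : public RISCVTargetStreamer {
  bool HasRVC; // queried from the STI in the constructor
  bool HasTSO; // ditto
  // ...
};
```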
Correct an issue with the Arm Neoverse N2 after it was
changed to a v9a core in commit
f576cbe44eabb8a5ac0af817424a0d1e7c8fbf85:
* FEAT_FHM should be enabled for this core.
We can add implicit defs/uses of the 'VG' register to the instructions
to prevent the register allocator from rematerializing values in between
streaming-mode changes, as the def/use of VG will further nail down the
ordering that comes out of ISel. This avoids the heavy-handed approach
of preventing any kind of rematerialization.
While we could add 'VG' as a Use to all SVE instructions, we only really
need to do this for instructions that are rematerializable, as the
smstart/smstop instructions and pseudos act as scheduling barriers, which
is sufficient to prevent other instructions from being scheduled in
between the streaming-mode-changing call sequence. However, we may
revisit this in the future.
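A hedged sketch of the mechanism (call site illustrative):
```cpp
// Attach an implicit use of VG to a rematerializable SVE instruction so
// the register allocator cannot rematerialize it across a
// streaming-mode change.
static void addImplicitVGUse(MachineInstr &MI) {
  MI.addOperand(MachineOperand::CreateReg(AArch64::VG, /*isDef=*/false,
                                          /*isImplicit=*/true));
}
```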
AMDGPUInstructionSelector should no longer attempt to select S1 G_PHIs.
Remove the MIR test that attempts to inst-select a divergent vcc(S1) G_PHI.
The lane mask merging algorithm for GlobalISel is now responsible for
selecting divergent S1 G_PHIs in AMDGPUGlobalISelDivergenceLowering.
Uniform S1 G_PHIs should be lowered to S32 G_PHIs in the reg bank select
pass. In summary, S1 G_PHIs should not reach AMDGPUInstructionSelector.
* Leverage TableGen record descriptions of LLVM or DirectX intrinsics
that can be directly mapped in the DXIL Ops TableGen description. As a
result, such DXIL Ops can be described succinctly without duplication,
and the DXILEmitter backend can derive their properties accordingly.
* Ensured that corresponding lit tests pass.
Basic implementation of lane mask merging for GlobalISel.
Lane masks on GlobalISel are registers with the sgpr register class
and S1 LLT, as required by machine uniformity analysis.
Implements the equivalent of lowerPhis from SILowerI1Copies.cpp in:
patch 1: https://github.com/llvm/llvm-project/pull/75340
patch 2: https://github.com/llvm/llvm-project/pull/75349
patch 3: https://github.com/llvm/llvm-project/pull/80003
patch 4: https://github.com/llvm/llvm-project/pull/78431
patch 5 is this commit:
AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers
Previously, in PHIs that represent lane masks, incoming registers
taken as-is were not selected as lane masks. Such registers are not
merged with another lane mask and most often only have an S1 LLT.
Implement constrainAsLaneMask by constraining incoming registers
taken as-is with lane mask attributes, essentially transforming them
into lane masks. This is the final step in having PHI instructions
created in this pass be fully instruction-selected.
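A minimal sketch of the constraint step (the wave64 register class here is an assumption for illustration):
```cpp
// Force an as-is incoming register to the lane-mask register class so
// the PHI is instruction-selected as a lane mask rather than a plain
// S1 value.
static void constrainIncomingAsLaneMask(MachineRegisterInfo &MRI,
                                        Register Reg) {
  MRI.setRegClass(Reg, &AMDGPU::SReg_64RegClass); // wave64 lane mask
}
```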
Following on from #83118, this adds aliases for the "rtn" forms of these
instructions. The fact that they were missing from SP3 was an oversight
which has now been fixed.
This patch adds smin/smax/umin/umax/sadd_sat/ssub_sat/uadd_sat/usub_sat
to canSplatOperand. This helps LLVM fold .vv instructions with one
splat operand into .vx instructions.
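A sketch of the opcode set extension (shape illustrative of canSplatOperand):
```cpp
// The newly handled intrinsics can fold a splat on either operand,
// e.g. vmax.vv with one splat operand can become vmax.vx.
static bool canSplatEitherOperand(Intrinsic::ID ID) {
  switch (ID) {
  case Intrinsic::smin: case Intrinsic::smax:
  case Intrinsic::umin: case Intrinsic::umax:
  case Intrinsic::sadd_sat: case Intrinsic::ssub_sat:
  case Intrinsic::uadd_sat: case Intrinsic::usub_sat:
    return true;
  default:
    return false;
  }
}
```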