This change completes #86155
- `DXIL.td` - lowering `fabs` intrinsic to the float dxil op.
- `DXILIntrinsicExpansion.cpp` - Add intrinsic expansion for the abs
case.
Fixes#84831
When matching carry pattern with `getAsCarry`, it may produce different
type of carryout. This patch checks such case and does early exit.
I'm new to DAG, any suggestion is appreciated.
Presently the atomic optimizer supports only 32-bit operations. Plan is
to extend the atomic optimizer for 64-bit operations for compute and
graphics. This patch extends support for double type for `uniform
values` only. Going forward, will extend the support for divergent
values. Adding support for divergent values requires
extending/legalizing readfirstlane, readlane, writelane, etc ops for
64-bit operations to avoid `bitcast` noise that we have currently.
---------
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
This is needed to support non-intrinsic functions returning tuple types
which are represented as structs with scalable vector types in IR.
I suspect this may have been broken since
https://reviews.llvm.org/D158115
We can remove the restriction that the narrow type needs to be exactly
EEW / 2 for scalable ISD::{ADD,SUB,MUL} nodes. This allows us to perform
the combine even if we can't fully fold the extend into the widening op.
VP intrinsics already do this, since they are lowered to _VL nodes which
don't have this restriction.
The "exactly EEW / 2" narrow type restriction prevented us from emitting
V{S,Z}EXT_VL nodes with i1 element types which crash when we try to
select them, since no other legal type is double the size of i1, see the
test case added in this PR `i1_zext`. So to preserve this, this adds a
check for i1 narrow types instead.
This is a reimplementation of the combine added in #83035 but as a
lowering instead of a combine, so we don't regress the test case added
in e59f120e3a14ccdc55fcb7be996efaa768daabe0 by interfering with the
strided load combine
Previously the combine had to concatenate the split vectors with
insert_subvector instead of concat_vectors to prevent an infinite
combine loop. And the reasoning behind keeping it as a combine was
because if we emitted the insert_subvector during lowering then we
didn't fold away inserts of undef subvectors.
However it turns out we can avoid this if we just do this in lowering
and select a concat_vector directly, since we get the undef folding for
free with `DAG.getNode(ISD::CONCAT_VECTOR, ...)` via foldCONCAT_VECTORS.
Previously we used BuildPairF64 and SplitF64 only if Zfa was supported
since they will select register file moves that are only available with
Zfa.
We recently changed the handling of BuildPairF64/SplitF64 for Zdinx to
not go through memory so we should use that for bitcast.
That leaves the D without Zfa case that does need to go through memory.
Previously we let type legalization expand to loads and stores using a
new stack temporary created for each bitcast. After this patch we will
create the loads ands stores in the custom inserter and share the same
stack slot for all. This also allows DAGCombiner to optimize when
bitcast is mixed with BuildPairF64/SplitF64.
I believe we can use XLen alignment as long as eliminateFrameIndex
limits the maximum folded offset to 2043. This way when we split
the load/store into two 2 instructions we'll be able to add 4
without overflowing simm12.
Add ones for every high bit that will cleared.
This will allow us to evaluate variables that have their bits known to
see if they have no risk of overflow despite the shift amount being
greater than the difference between the two types.
Refactor the logic that checks if a module contains mixed
absolute/non-lowered LDS GVs.
The check now happens latter when the "worklists" are formed. This is
because in some cases (OpenMP) we can have non-lowered GVs in a lowered
module, and this is normal because those GVs are just unused and removed
from the list at some point before the end of `getUsesOfLDSByFunction`.
Doing the check later ensures that if a mixed module is spotted, then
it's a _real_ mixed module that needs rejection, not a module containing
an intentionally ignored GV.
SIMachineFunctionInfo has a scan of the function body for inline asm
which may use AGPRs, or callees in SIMachineFunctionInfo. Move this
into the attributor, so it actually works interprocedurally.
Could probably avoid most of the test churn if this bothered to avoid
adding this on subtargets without AGPRs. We should also probably
try to delete the MIR scan in usesAGPRs but it seems to be trickier
to eliminate.
This patch improves codegen for scalar (<128bits) version
of llvm.abs intrinsic by using the existing non-XOR based lowering.
This takes the generated code closer to SDAG.
codegen with GISel for > 128 bit types is not very good
with these method so not doing so.
Currently, we mistakenly mark the local labels used in RISC-V TLSDESC as
TLS symbols, when they should not be. This patch adds tests with the
current incorrect behavior, and subsequent patches will address the
issue.
Reviewers: MaskRay, topperc
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/85816
Return type of DXIL Ops may be different from valid overload type of the
parameters, if any. Such DXIL Ops are correctly represented in DXIL.td.
However, DXILEmitter assumes the return type to be the same as parameter
overload type, if one exists. This results in generation in incorrect
overload index value in DXILOperation.inc for the DXIL Op and incorrect
DXIL operation function call in DXILOpLowering pass.
This change distinguishes return types correctly from parameter overload
types in DXILEmitter backend to handle such DXIL ops.
Add specification for DXIL Op `isinf` and corresponding tests to verify
the above change.
Fixes issue #85125
Walk all the ISA strings and set the subtarget bits for any extension we
find in any string.
This allows LTO output to have a ELF attributes from the union of all of
the files used to compile it.
This PR:
* adds __spirv_ builtins for existing instructions;
* fixes parsing of "syncscope" values in atomic instructions;
* fix a special case of binary header emision.
Previously we used memory like we do to move between GPRs and FPR64 with
the D extension on RV32.
We can instead use REG_SEQUENCE/EXTRACT_SUBREG to inform register
allocation how to do the copy without memory.
Previously we expected lane constants to be in the range of signed
values for each lane size, but the included test case produced large
unsigned values that fall outside that range. Allow instruction
selection to proceed in this case rather than failing.
Fixes#63817.
The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@
and @ respectively) are, in hindsight, poor choices for multiple
reasons:
First, there exist externally visible routines from the language
environment that begin with @@. These functions are certainly not
local/private by any means and they should not share a prefix with
private globals.
Secondly, both private globals and private labels should be handled the
same way by GOFF, so it doesn't make much sense for them to have
separate prefixes. GOFF remains the only file format where these are
different and there is no reason for that to be the case
This reverts commit 353fbeb0a294d2c7cef6d88607fa0fd50ee81462. It crashes
when it encounters an UINT_TO_FP.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1618 in SDValue llvm::SelectionDAG::getConstant(const ConstantInt &, const SDLoc &, EVT, bool, bool): VT.isInteger() && "Cannot create FP integer constant!"
MIPSr6 ISA requires normal load/store instructions support
misunaligned memory access, while it is not always do so
by hardware. On some microarchitectures or some corner cases
it may need support by OS.
Don't confuse with pre-R6's lwl/lwr famlily: MIPSr6 doesn't
support them, instead, r6 requires lw instruction support
misunaligned memory access. So, if -mstrict-align is used for
pre-R6, lwl/lwr won't be disabled.
If -mstrict-align is used for r6 and the access is not well
aligned, some lb/lh instructions will be used to replace lw.
This is useful for OS kernels.
To be back-compatible with GCC, -m(no-)unaligned-access are also
added as Neg-Alias of -m(no-)strict-align.
When i1 true is used as an index, SExt extends it to i32 -1. This would
cause BitVector to overflow.
The language manual have specified that the index shall be treated as an
unsigned number, this patch fixes that.
(https://llvm.org/docs/LangRef.html#insertelement-instruction)
This patch fixes#85717
---------
Signed-off-by: Peter Rong <PeterRong96@gmail.com>
Add IRTranslator for scalable vector load instruction and include
corresponding tests with alignment argument included, which can be
smaller/equal/larger than element size or smaller/equal/larger than the
minimum total vector size.
Defines a subset of attributes and emits them to a section called
.hexagon.attributes.
The current attributes recorded are the attributes needed by
llvm-objdump to automatically determine target features and eliminate
the need to manually pass features.