Currently, we mistakenly mark the local labels used in RISC-V TLSDESC as
TLS symbols, when they should not be. This patch adds tests with the
current incorrect behavior, and subsequent patches will address the
issue.
Reviewers: MaskRay, topperc
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/85816
Return type of DXIL Ops may be different from valid overload type of the
parameters, if any. Such DXIL Ops are correctly represented in DXIL.td.
However, DXILEmitter assumes the return type to be the same as parameter
overload type, if one exists. This results in generation in incorrect
overload index value in DXILOperation.inc for the DXIL Op and incorrect
DXIL operation function call in DXILOpLowering pass.
This change distinguishes return types correctly from parameter overload
types in DXILEmitter backend to handle such DXIL ops.
Add specification for DXIL Op `isinf` and corresponding tests to verify
the above change.
Fixes issue #85125
Walk all the ISA strings and set the subtarget bits for any extension we
find in any string.
This allows LTO output to have a ELF attributes from the union of all of
the files used to compile it.
This PR:
* adds __spirv_ builtins for existing instructions;
* fixes parsing of "syncscope" values in atomic instructions;
* fix a special case of binary header emision.
Previously we used memory like we do to move between GPRs and FPR64 with
the D extension on RV32.
We can instead use REG_SEQUENCE/EXTRACT_SUBREG to inform register
allocation how to do the copy without memory.
Previously we expected lane constants to be in the range of signed
values for each lane size, but the included test case produced large
unsigned values that fall outside that range. Allow instruction
selection to proceed in this case rather than failing.
Fixes#63817.
The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@
and @ respectively) are, in hindsight, poor choices for multiple
reasons:
First, there exist externally visible routines from the language
environment that begin with @@. These functions are certainly not
local/private by any means and they should not share a prefix with
private globals.
Secondly, both private globals and private labels should be handled the
same way by GOFF, so it doesn't make much sense for them to have
separate prefixes. GOFF remains the only file format where these are
different and there is no reason for that to be the case
This reverts commit 353fbeb0a294d2c7cef6d88607fa0fd50ee81462. It crashes
when it encounters an UINT_TO_FP.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1618 in SDValue llvm::SelectionDAG::getConstant(const ConstantInt &, const SDLoc &, EVT, bool, bool): VT.isInteger() && "Cannot create FP integer constant!"
MIPSr6 ISA requires normal load/store instructions support
misunaligned memory access, while it is not always do so
by hardware. On some microarchitectures or some corner cases
it may need support by OS.
Don't confuse with pre-R6's lwl/lwr famlily: MIPSr6 doesn't
support them, instead, r6 requires lw instruction support
misunaligned memory access. So, if -mstrict-align is used for
pre-R6, lwl/lwr won't be disabled.
If -mstrict-align is used for r6 and the access is not well
aligned, some lb/lh instructions will be used to replace lw.
This is useful for OS kernels.
To be back-compatible with GCC, -m(no-)unaligned-access are also
added as Neg-Alias of -m(no-)strict-align.
When i1 true is used as an index, SExt extends it to i32 -1. This would
cause BitVector to overflow.
The language manual have specified that the index shall be treated as an
unsigned number, this patch fixes that.
(https://llvm.org/docs/LangRef.html#insertelement-instruction)
This patch fixes#85717
---------
Signed-off-by: Peter Rong <PeterRong96@gmail.com>
Add IRTranslator for scalable vector load instruction and include
corresponding tests with alignment argument included, which can be
smaller/equal/larger than element size or smaller/equal/larger than the
minimum total vector size.
Defines a subset of attributes and emits them to a section called
.hexagon.attributes.
The current attributes recorded are the attributes needed by
llvm-objdump to automatically determine target features and eliminate
the need to manually pass features.
This adds a reduced test case for the regression seen in x264 with #83035.
If the intermediate concatenating shuffles are large enough then the
splitting combine will prevent the strided load combine which is
preferable.
this implements part 1 of 2 for #83626
- `CGBuiltin.cpp` - modified to have seperate cases for signed and
unsigned integers.
- `SemaChecking.cpp` - modified to prevent the generation of a double
dot product intrinsic if the builtin were to be called directly.
- `IntrinsicsDirectX.td` creation of the signed and unsigned dot
intrinsics needed for instruction expansion.
- `DXILIntrinsicExpansion.cpp` - handle instruction expansion cases for
integer dot product.
This is re-working of #74460, which adds a soft-float ABI for AArch64.
That was reverted because it causes errors when building the linux and
fuchsia kernels.
The problem is that GCC's implementation of the ABI compatibility checks
when using the hard-float ABI on a target without FP registers does it's
checks after optimisation. The previous version of this patch reported
errors for all uses of floating-point types, which is stricter than what
GCC does in practice.
This changes two things compared to the first version:
* Only check the types of function arguments and returns, not the types
of other values. This is more relaxed than GCC, while still guaranteeing
ABI compatibility.
* Move the check from Sema to CodeGen, so that inline functions are only
checked if they are actually used. There are some cases in the linux
kernel which depend on this behaviour of GCC.
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type. However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicated borrow). This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.
Fix this by generating code to explicitly invert the flag. These
cancel out of the result of USUBO feeds into an USUBO_CARRY.
To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.
Fixes: https://github.com/llvm/llvm-project/issues/83268
This reverts commit c59129a7c79448837d665de8f2743ad4b14666f6.
This causes regressions in some x264 workloads like pixel_var_8x8 due to it
interfering with the strided load combine. Reverting so I can try to rework
it as a lowering instead.
Added common check for DPP and Iterative strategies for uniform value
case since optimization applied is same.
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Update PromoteAllocaToVector so it considers the whole function before promoting allocas.
Allocas are scored & sorted so the highest value ones are seen first. The budget is now per function instead of per alloca.
Passed internal performance testing.
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type. However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicated borrow). This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.
Fix this by generating code to explicitly invert the flag. These
cancel out of the result of USUBO feeds into an USUBO_CARRY.
To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.
Fixes: https://github.com/llvm/llvm-project/issues/83268
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.
- AtomicExpand pass no longer casts float/double atomic load/stores to integer
(FP128 is still casted).
This is an alternative to #85610, that moreElement's small G_TRUNC
vectors to widen the vectors. It needs to disable one of the existing
Unmerge(Trunc(..)) combines, and some of the code is not as optimal as
it could be. I believe with some extra optimizations it could look
better (I was thinking combining trunc(buildvector) -> buildvector and
possibly improving buildvector lowering by generating
insert_vector_element earlier).
- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.
- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
This PR:
* adds Lifetime intrinsics/instructions
* fixes how the binary header is emitted (correct version and better
approximation of Bound)
* add validation into more test cases
For each call that changes the streaming-mode ISel inserts a
COALESCER_BARRIER node for the FP and (non-scalable) vector arguments to
the callee.
When calling a non-streaming function from a streaming-compatible
function, it's not required to have +sme (in case the SME code-path is
not actually executed at runtime). The patterns to match the
COALESCER_BARRIER however were still predicated with `HasSME`, which is
incorrect. This patch tries to fix that.
This change allows us to use `--lto-partitions` in some cases (not at
all guaranteed it works perfectly), as LDS is lowered before the module
is split for parallel codegen.
We must run LowerLDS before splitting modules as it needs to see all
callers of functions with LDS to properly lower them.
rldimi is 64-bit instruction, so the corresponding builtin should not
be available in 32-bit mode. Rotate amount should be in range and
cases when mask is zero needs special handling.
This change also swaps the first and second operands of rldimi/rlwimi
to match previous behavior. For masks not ending at bit 63-SH,
rotation will be inserted before rldimi.
[GlobalISel] Implement convergence control tokens and intrinsics in GMIR
In the IR translator, convert the LLVM token type to LLT::token(), which is an
alias for the s0 type. These show up as implicit uses on convergent operations.
Differential Revision: https://reviews.llvm.org/D158147