The logic was supposed to choose between {0, 1, -1} as an
adjustment to the FP bit pattern. However, the adjustment itself was
used as the bit pattern instead, which produced garbage results.
We did something pretty naive:
- round FP64 -> BF16 by first rounding to FP32
- skip FP32 -> BF16 rounding entirely
- take the top 16 bits of an FP32, which turns some NaNs into
infinities
Let's do this in a more principled way by rounding types with more
precision than FP32 to FP32 using round-inexact-to-odd, which avoids
double rounding issues.
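The rounding scheme described above can be sketched in portable C++. This is only an illustration, not the code from the patch: the helper names (`bitsOf`, `floatOf`, `roundToOddF32`, `bf16FromF32Bits`, `bf16FromF64`, `naiveBf16FromF64`) are hypothetical, and the sketch assumes IEEE-754 `float`/`double` and the default round-to-nearest-even mode for the plain cast.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit-cast helpers (hypothetical names, not LLVM APIs).
static uint32_t bitsOf(float F) {
  uint32_t B;
  std::memcpy(&B, &F, sizeof B);
  return B;
}
static float floatOf(uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof F);
  return F;
}

// FP64 -> FP32 using round-inexact-to-odd.
float roundToOddF32(double D) {
  float F = static_cast<float>(D); // nearest-even under the default mode
  if (std::isnan(D) || static_cast<double>(F) == D)
    return F; // NaN or exact conversion: nothing to adjust
  uint32_t B = bitsOf(F);
  // If the cast rounded away from zero, step one ULP back toward zero so
  // that B holds the truncated value.
  if (std::fabs(static_cast<double>(F)) > std::fabs(D))
    B -= 1;
  // The result is inexact, so force the low mantissa bit to 1. Relative to
  // the nearest-even pattern, the net adjustment is one of {0, +1, -1}.
  return floatOf(B | 1u);
}

// FP32 bit pattern -> BF16 bit pattern, round to nearest, ties to even.
uint16_t bf16FromF32Bits(uint32_t B) {
  if ((B & 0x7FFFFFFFu) > 0x7F800000u) // NaN: set a mantissa bit so it stays NaN
    return static_cast<uint16_t>((B >> 16) | 0x0040u);
  uint32_t Bias = 0x7FFFu + ((B >> 16) & 1u);
  return static_cast<uint16_t>((B + Bias) >> 16);
}

// FP64 -> BF16 through the round-to-odd intermediate.
uint16_t bf16FromF64(double D) {
  return bf16FromF32Bits(bitsOf(roundToOddF32(D)));
}

// For comparison: nearest-even at both steps, which can double round.
uint16_t naiveBf16FromF64(double D) {
  return bf16FromF32Bits(bitsOf(static_cast<float>(D)));
}
```

For example, `1 + 2^-8 + 2^-40` lies just above the midpoint between two BF16 values. The naive path first rounds it to exactly the midpoint in FP32 and then ties to even, while the round-to-odd path keeps a sticky bit and rounds up.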
When an integer argument is promoted and *not* split (like i72 -> i128 on
a new machine with vector support), the SlotVT should be i128, which is
stored in VT - not ArgVT.
Fixes #81417
When the SVE register size is unknown, or the minimum size is not equal
to the maximum size, we could determine the actual SVE register size at
runtime and adjust the shuffle mask at runtime.
Certain stack probing sequences might clobber flags, so we can't use a
block as a prologue if the flags register is a live-in on entry to that
block.
Use the 3 or 4 active bits as a shift amount into an i32/i64 constant representing the number of set bits.
In the future, it might be worthwhile to move this into a generic location in case other targets want to make use of it.
Another expansion pulled from #79823
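The shift-into-constant idea can be illustrated with a small C++ sketch. The names (`makePopcountLUT`, `LUT3`, `LUT4`, `ctpop3`, `ctpop4`) are hypothetical and the field widths are one plausible packing (2 bits per entry for 3 active bits, 4 bits per entry for 4 active bits); it is not taken from the patch itself.

```cpp
#include <cstdint>

// Build a packed table in which field I (FieldBits wide) holds popcount(I).
constexpr uint64_t makePopcountLUT(unsigned Entries, unsigned FieldBits) {
  uint64_t LUT = 0;
  for (unsigned I = 0; I < Entries; ++I) {
    uint64_t Count = 0;
    for (unsigned V = I; V != 0; V >>= 1)
      Count += V & 1;
    LUT |= Count << (I * FieldBits);
  }
  return LUT;
}

// 8 entries x 2 bits fit in an i32; 16 entries x 4 bits fit in an i64.
constexpr uint32_t LUT3 = static_cast<uint32_t>(makePopcountLUT(8, 2));
constexpr uint64_t LUT4 = makePopcountLUT(16, 4);
static_assert(LUT3 == 0xE994u, "2-bit popcount table for values 0..7");
static_assert(LUT4 == 0x4332322132212110ull,
              "4-bit popcount table for values 0..15");

// Precondition: X has at most 3 active bits (X < 8).
unsigned ctpop3(uint32_t X) { return (LUT3 >> (X * 2)) & 0x3u; }

// Precondition: X has at most 4 active bits (X < 16).
unsigned ctpop4(uint32_t X) {
  return static_cast<unsigned>((LUT4 >> (X * 4)) & 0x7u);
}
```

The value itself selects the field, so the whole population count becomes one shift and one mask.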
This intrinsic, introduced by #81331, is a lot like
`llvm.readcyclecounter`.
For the RISCV implementation, we rename `ReadCycleWide` pseudo to
`ReadCounterWide` and make it accept two operands (the low and high
parts of the counter). As for legalization and lowering parts, we
reuse the code of `ISD::READCYCLECOUNTER` (make it able to handle
both intrinsics), and we use `time` CSR for `ISD::READSTEADYCOUNTER`.
Tests using Clang builtins were run on real hardware and work
as expected.
Reviewers: asb, MaskRay, dtcxzyw, preames, topperc, jhuber6
Reviewed By: jhuber6, asb, MaskRay, dtcxzyw
Pull Request: https://github.com/llvm/llvm-project/pull/82322
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
This pull request fixes an issue with missing vector element count
immediate in OpExtInst calls and adds a case for generating bitcasts
before GEPs for kernel arguments of non-matching pointer type. The new
LITs are based on basic/vload_local and basic/vload_global OpenCL CTS
tests. The tests after this change pass SPIR-V validation.
In WebAssembly, we have the `WASM_SYMBOL_NO_STRIP` symbol flag to mark the
referenced content as retained. However, the flag is not enough to
express retained data that is not referenced by any symbol. This patch
adds a new segment flag `WASM_SEG_FLAG_RETAIN` to support "private"
linkage data that is retained by llvm.used.
This kind of data, which is not referenced but must be retained, is usually
used with encapsulation symbols (__start/__stop). The Swift runtime uses
this technique and depends on the fact that "all metadata sections in live
objects are retained", which was not guaranteed with `--gc-sections`
before this patch.
This is a revised version of https://reviews.llvm.org/D126950 (which was
reverted), based on @MaskRay's comments.
When PSHUFB is used as a LUT (for CTPOP, BITREVERSE etc.), it's the source operand that is constant and the index operand that is variable. As long as the indices don't set the MSB (which zeros the output element), the common known bits from the source operand can be used directly, even though the shuffle mask isn't constant.
Further helps to improve CTPOP reduction codegen
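The known-bits argument above can be checked with a scalar model. This is a sketch under stated assumptions: `pshufb` models a single 128-bit lane, and `KnownBits8`, `commonKnownBits`, `knownBitsHold`, and `PopLUT` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <array>
#include <cstdint>

using Vec16 = std::array<uint8_t, 16>;

// Scalar model of one 128-bit PSHUFB lane: an index byte with its MSB set
// zeros the output element, otherwise the low 4 bits select a source byte.
Vec16 pshufb(const Vec16 &Src, const Vec16 &Idx) {
  Vec16 Out{};
  for (int I = 0; I < 16; ++I)
    Out[I] = (Idx[I] & 0x80) ? 0 : Src[Idx[I] & 0x0F];
  return Out;
}

struct KnownBits8 {
  uint8_t Zero; // bits that are 0 in every source byte
  uint8_t One;  // bits that are 1 in every source byte
};

// Intersect the known bits of all 16 constant source bytes.
KnownBits8 commonKnownBits(const Vec16 &Src) {
  uint8_t Zero = 0xFF, One = 0xFF;
  for (uint8_t B : Src) {
    Zero = static_cast<uint8_t>(Zero & ~B);
    One = static_cast<uint8_t>(One & B);
  }
  return {Zero, One};
}

// Check that every output element whose index has a clear MSB respects the
// common known bits of the source.
bool knownBitsHold(const Vec16 &Src, const Vec16 &Idx) {
  KnownBits8 K = commonKnownBits(Src);
  Vec16 Out = pshufb(Src, Idx);
  for (int I = 0; I < 16; ++I) {
    if (Idx[I] & 0x80)
      continue; // zeroed element: outside the guarantee
    if ((Out[I] & K.Zero) != 0 || (Out[I] & K.One) != K.One)
      return false;
  }
  return true;
}

// Nibble popcount table, as used by a PSHUFB-based CTPOP.
const Vec16 PopLUT = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
```

For `PopLUT` every entry is at most 4, so the upper five bits of each selected byte are known zero regardless of which index is chosen.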
If the vXi8 add(X,Y) is guaranteed not to overflow then we can push the addition through the psadbw nodes (being used for reduction) and only need a single psadbw node.
Noticed while working on CTPOP reduction codegen
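A scalar model makes the no-overflow condition concrete. This is only a sketch: `psadbwZero` models one 64-bit PSADBW group against an all-zero operand (a sum of eight unsigned bytes), and both helper names are hypothetical.

```cpp
#include <array>
#include <cstdint>

using Vec8 = std::array<uint8_t, 8>;

// Scalar model of one PSADBW group against zero: sums 8 unsigned bytes.
uint16_t psadbwZero(const Vec8 &V) {
  uint16_t Sum = 0;
  for (uint8_t B : V)
    Sum = static_cast<uint16_t>(Sum + B);
  return Sum;
}

// Wrapping per-byte add, like a vXi8 ADD.
Vec8 addBytes(const Vec8 &X, const Vec8 &Y) {
  Vec8 R{};
  for (int I = 0; I < 8; ++I)
    R[I] = static_cast<uint8_t>(X[I] + Y[I]);
  return R;
}
```

When no byte of `X + Y` wraps, `psadbwZero(X) + psadbwZero(Y)` equals `psadbwZero(addBytes(X, Y))`, so one reduction suffices. If any byte wraps (for example 200 + 100), the two results differ, which is why the overflow guarantee is required.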
The section headers for XCOFF files have a subtype flag for Dwarf
sections. This PR updates obj2yaml, yaml2obj, and llvm-readobj so that
they recognize the subtype.
This combine transforms an unmerge where only the first element is used
into a truncate. That works OK for scalars, but for vectors it needs to
insert a bitcast to integers, perform the truncate, then bitcast back to
vectors. This generates more awkward code than using an Unmerge.
It is purely based on symmetry. Registers can be scalars, vectors, and
non-constants.
X < 5.0 || X > 5.0
->
X != 5.0
X < Y && X > Y
->
FCMP_FALSE
X < Y && X < Y
->
FCMP_TRUE
see InstCombinerImpl::foldLogicOfFCmps
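One way to see why such folds are purely symmetric is the 4-bit predicate encoding used by LLVM's `CmpInst::Predicate` (bits for unordered, less, greater, equal). The sketch below mirrors that encoding in a standalone enum and assumes both compares use the same operands in the same order; the helper names `evalFCmp`, `andOfFCmps`, and `orOfFCmps` are hypothetical. In this encoding the `X != 5.0` result of the first example corresponds to the ordered not-equal predicate.

```cpp
#include <cmath>
#include <cstdint>

// Predicate codes: bit 0 = equal, bit 1 = greater, bit 2 = less,
// bit 3 = unordered (at least one operand is NaN).
enum FCmpPred : uint8_t {
  FCMP_FALSE = 0, FCMP_OEQ = 1, FCMP_OGT = 2,  FCMP_OGE = 3,
  FCMP_OLT = 4,   FCMP_OLE = 5, FCMP_ONE = 6,  FCMP_ORD = 7,
  FCMP_UNO = 8,   FCMP_UEQ = 9, FCMP_UGT = 10, FCMP_UGE = 11,
  FCMP_ULT = 12,  FCMP_ULE = 13, FCMP_UNE = 14, FCMP_TRUE = 15
};

// Evaluate a predicate: exactly one of the four relations holds for any
// pair of values, and the predicate is true if it includes that relation.
bool evalFCmp(FCmpPred P, double X, double Y) {
  unsigned Rel;
  if (std::isnan(X) || std::isnan(Y))
    Rel = 8; // unordered
  else if (X == Y)
    Rel = 1; // equal
  else if (X > Y)
    Rel = 2; // greater
  else
    Rel = 4; // less
  return (P & Rel) != 0;
}

// Logic of two compares on the same operands is bitwise logic on the codes.
FCmpPred andOfFCmps(FCmpPred A, FCmpPred B) {
  return static_cast<FCmpPred>(A & B);
}
FCmpPred orOfFCmps(FCmpPred A, FCmpPred B) {
  return static_cast<FCmpPred>(A | B);
}
```

With this layout, `orOfFCmps(FCMP_OLT, FCMP_OGT)` yields `FCMP_ONE` and `andOfFCmps(FCMP_OLT, FCMP_OGT)` yields `FCMP_FALSE`; a pair that covers all four relations, such as `FCMP_OLT` or'ed with `FCMP_UGE`, yields `FCMP_TRUE`.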
This PR is to add support for the SPIR-V extension
SPV_KHR_uniform_group_instructions that adds new instructions to SPIR-V
to support additional group operations within uniform control flow.
The SPIR-V backend generates unnecessary OpExecutionMode records that
reference ids which are not the Entry Point operand of any OpEntryPoint
(ref: https://github.com/llvm/llvm-project/issues/81753). This PR fixes
the issue.
This adds additional tests for #82199.
These tests need us to propagate the nneg flag when we zero/sign
extend an existing zext nneg node. For these tests on RV64, call
lowering will need to sign extend or zero extend the existing zext
nneg to i64. getNode will fold this into a single zext. We should
propagate the nneg flag from the original zext nneg. This will allow
us to remove the zext nneg based on known sign bits during DAG combine.
This treats the zext nneg as sext if X is known to have sufficient sign
bits to allow the zext or truncate or both to be removed. This code is
taken from the same optimization for sext.
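The reason a zext nneg can be treated as a sext is that the two extensions agree whenever the sign bit of the narrow value is clear. A minimal C++ illustration, using hypothetical helper names and i16 -> i64 as the example widths:

```cpp
#include <cstdint>

// Zero extension: the upper 48 bits are filled with 0.
uint64_t zext16to64(uint16_t X) { return X; }

// Sign extension: the upper 48 bits are copies of bit 15.
uint64_t sext16to64(uint16_t X) {
  return static_cast<uint64_t>(
      static_cast<int64_t>(static_cast<int16_t>(X)));
}

// The nneg flag asserts exactly this property of the operand.
bool isNonNeg16(uint16_t X) { return (X & 0x8000u) == 0; }
```

For any `X` with `isNonNeg16(X)`, `zext16to64(X) == sext16to64(X)`, so known-sign-bit reasoning written for sext carries over. For `X = 0x8000` the two differ, which is the case the nneg flag rules out.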
These tests have a dominating icmp that requires an i16 value to be
sign extended to do the compare. Because of this, the i16 will be
exported from the first basic block sign extended to XLen. We can
use this fact to remove the zext nneg in the second block.
This is needed by PR #77665 [1], which uses a P-register while restoring
Z-registers.
The reversed SVE register restore in the epilogue was added to
guarantee performance, but further work has since been done to improve
SVE frame restore, and the scheduler may also change the order of the
restores, undoing the reversal.
[1] https://github.com/llvm/llvm-project/pull/77665
This PR adds support for atomic instructions on floating-point numbers:
* SPV_EXT_shader_atomic_float_add
* SPV_EXT_shader_atomic_float_min_max
* SPV_EXT_shader_atomic_float16_add
and fixes asm printer output for the half floating-point type.
Inline stack probing code may need a scratch register, hence basic
blocks where such a register is not available cannot be used as prologues.
Checking for an available scratch register was incorrectly skipped when
the function uses stack probing.