Reverts llvm/llvm-project#81394
This reverts commit 3ac243bc0d7922d083af2cf025247b5698556062.
It does not handle the RSrc registers s0-s3 correctly. This leads to a
broken test that expects s0-s3 as function arguments while also using
them as the RSrc register.
The patch needs to be revisited; apparently we only want s0-s3 as
argument registers if we don't need them as RSrc registers.
The IR for a double-to-i129 conversion looks like this in one of the
blocks in compiler-rt:
%cmp5.i = icmp ult i16 %3, -129, !dbg !24
But in ExpandLargeFpConvert, it looks like:
%13 = icmp ult i129 %12, 4294967167, !dbg !19
ExpandLargeFpConvert gets this wrong: the value should have been treated
as signed before negating, but instead we get a very large unsigned
value. Another value in the same pass has the same issue.
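As a check on the arithmetic: 4294967167 is exactly 2^32 - 129, i.e. the 32-bit pattern of -129 read as an unsigned number. The C++ sketch below contrasts zero-extending and sign-extending that pattern. C++ has no 129-bit integer, so 64-bit types stand in for i129; the helper names are illustrative and this is not code from the pass.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit types stand in for i129, which C++ cannot represent.
// Zero-extending the 32-bit pattern of V matches the constant seen above.
uint64_t widenByZeroExtend(int32_t V) {
  return static_cast<uint64_t>(static_cast<uint32_t>(V));
}

// Sign-extending V preserves its signed value, as the comparison requires.
int64_t widenBySignExtend(int32_t V) { return static_cast<int64_t>(V); }
```

With V = -129, the zero-extended result is 4294967167, while the sign-extended result stays -129.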
We have the ISD nodes for representing signed and unsigned absolute
difference. For RISCV, we have vector min/max in the base vector
extension, so we can expand to the sub(max,min) lowering.
We could almost use the default expansion, but since fixed-length
min/max are custom (not legal), the default expansion doesn't cover the
fixed-vector cases. The expansion here is just a copy of the generic
code specialized to allow the custom min/max nodes to be created so they
can in turn be legalized to the _vl variants.
Existing DAG combines handle the recognition of absolute difference
idioms and conversion into the respective ISD::ABDS and ISD::ABDU nodes.
This change does have the net effect of potentially pushing a
free-floating zero/sign extend after the expansion, and we don't do a great
job of folding that into later expressions. However, since in general
narrowing can reduce required work (by reducing LMUL) this seems like
the right general tradeoff.
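The identity behind the sub(max, min) expansion can be checked on scalars. The C++ sketch below models a single 8-bit lane; `abdu8` and `abds8` are illustrative names, not LLVM APIs. For the signed case, the difference of two i8 values can be as large as 255, so the result is read as an unsigned value of the same width.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One 8-bit lane of ISD::ABDU expanded as sub(umax, umin).
uint8_t abdu8(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>(std::max(A, B) - std::min(A, B));
}

// One 8-bit lane of ISD::ABDS expanded as sub(smax, smin). The operands are
// promoted to int for the subtraction, so the full 0..255 range is exact.
uint8_t abds8(int8_t A, int8_t B) {
  return static_cast<uint8_t>(std::max(A, B) - std::min(A, B));
}
```

For example, `abds8(-128, 127)` yields 255, which only fits because the result is treated as unsigned.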
Rename the intrinsics to be closer to the instruction mnemonic names:
use global_load_tr_b64 and global_load_tr_b128 instead of
global_load_tr.
This patch also removes f16/bf16 versions of builtins/intrinsics. To
simplify the design, we should avoid enumerating all possible types in
implementing builtins. We can always use bitcast.
Completes #86187
- fix hlsl_intrinsic to cover the correct cases
- move to using `__builtin_elementwise_sqrt`
- add lowering of `Intrinsic::sqrt` to dxilop 24.
Completes #83626
- `CGBuiltin.cpp` - modify `getDotProductIntrinsic` to be able to emit
`dot2`, `dot3`, and `dot4` intrinsics based on element count
- `IntrinsicsDirectX.td` - for floating point, add `dot2`, `dot3`, and
`dot4` intrinsics.
- `DXIL.td` - add dxilop intrinsic lowering for `dot2`,
`dot3`, & `dot4`.
- `DXILOpLowering.cpp` - add vector arg flattening for dot product.
- `DXILOpBuilder.h` - modify `createDXILOpCall` to take a `SmallVector`
instead of an iterator
- `DXILOpBuilder.cpp` - modify `createDXILOpCall` by moving the small
vector up to the calling function in `DXILOpLowering.cpp`.
- Moving one function up gives us access to the `CallInst` and
`Function` which were needed to distinguish the dot product intrinsics
and get the operands without using the iterator.
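A rough C++ sketch of the two ideas in this list: choosing `dot2`/`dot3`/`dot4` from the element count, and flattening two vector operands into a scalar argument list. The helper names are hypothetical stand-ins for `getDotProductIntrinsic` and the flattening in `DXILOpLowering.cpp`, and the operand order (all elements of the first vector, then all of the second) is our assumption about the DXIL dot-product signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: choose the dot-product intrinsic from the element count.
std::string dotIntrinsicForElementCount(unsigned NumElts) {
  switch (NumElts) {
  case 2:
    return "dot2";
  case 3:
    return "dot3";
  case 4:
    return "dot4";
  default:
    return ""; // no floating-point dot intrinsic for this width
  }
}

// Hypothetical: flatten two vector operands into scalar arguments,
// assuming the order a0..aN-1, b0..bN-1.
std::vector<float> flattenDotOperands(const std::vector<float> &A,
                                      const std::vector<float> &B) {
  assert(A.size() == B.size() && "dot operands must have equal length");
  std::vector<float> Ops(A);
  Ops.insert(Ops.end(), B.begin(), B.end());
  return Ops;
}

// Reference semantics of the flattened call: sum of a[i] * b[i].
float evalFlattenedDot(const std::vector<float> &Ops) {
  std::size_t N = Ops.size() / 2;
  float Sum = 0.0f;
  for (std::size_t I = 0; I < N; ++I)
    Sum += Ops[I] * Ops[N + I];
  return Sum;
}
```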
This builds on the previously added absolute difference cases, and adds
the reduction at the end. This is mostly interesting for examining the
impact of extend placement when changing the abdu lowering.
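For context, a reduction over absolute differences typically looks like a sum of absolute differences: narrow (here 8-bit) elements, an absolute difference, an extend, and a wider accumulator. The C++ below is only an illustrative assumption of that source-level shape; the actual test IR is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Sum of absolute differences of 8-bit elements, accumulated in 32 bits.
// The widening of each |a[i] - b[i]| to 32 bits roughly corresponds to the
// extend whose placement relative to the abdu the tests help to observe.
uint32_t sumAbsDiff(const uint8_t *A, const uint8_t *B, std::size_t N) {
  uint32_t Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += static_cast<uint32_t>(std::abs(int(A[I]) - int(B[I])));
  return Sum;
}
```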
`ST.getMaxNumVGPRs(MF)` lowers to `AMDGPUBaseInfo.cpp:getTotalNumVGPRs`
which returns 512 for gfx90a. This is subsequently limited by
`AMDGPUBaseInfo.cpp:getAddressableNumVGPRs()`, which also returns 512 for
gfx90a. The ISA states that there are 512 registers in total, but a
maximum of only 256 each of AGPRs and VGPRs (gfx90a 3.6.4).
Therefore, in the unified register file case, `ST.getMaxNumVGPRs(MF)`
calculates the maximum number of combined VGPRs + AGPRs. However, it is
currently used both as the limit for accvgpr and as the limit for archvgpr.
This patch uses it as the combined limit, and accounts for the maximum addressable arch/acc VGPRs when calculating the per RegClass limits.
Other clients of getTotalNumVGPRs may well be using it in the wrong way
too.
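The limit computation described above can be modelled in a few lines. This is a hypothetical sketch (the struct and function names are ours, not the AMDGPU backend's): it treats the value of `ST.getMaxNumVGPRs(MF)` as the combined budget and clamps each register class to its maximum addressable count.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the unified-register-file limits.
struct VGPRLimits {
  unsigned Combined; // archvgpr + accvgpr together
  unsigned ArchVGPR; // per-class limit for VGPRs
  unsigned AccVGPR;  // per-class limit for AGPRs
};

VGPRLimits computeUnifiedLimits(unsigned MaxNumVGPRs,
                                unsigned MaxAddressablePerClass) {
  assert(MaxAddressablePerClass > 0 && "need a non-zero per-class maximum");
  // Each class is limited by both the combined budget and its own
  // addressable range.
  unsigned PerClass = std::min(MaxNumVGPRs, MaxAddressablePerClass);
  return {MaxNumVGPRs, PerClass, PerClass};
}
```

For gfx90a, `computeUnifiedLimits(512, 256)` keeps 512 as the combined limit but caps each class at 256.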
Integer RISCVISD::SELECT_CC doesn't create poison. If none of the
operands are poison, the result is not poison.
This allows ISD::FREEZE to be hoisted above RISCVISD::SELECT_CC.
SelectionDAG has SELECT and VSELECT.
SELECT restricts the condition operand to an i1, while the true and false operands
can be vectors. The result of a SELECT has the same type as the true and
false operands.
VSELECT has a vector condition operand, and the true and false operands
must be vectors. A VSELECT produces a vector result.
GlobalISel has G_SELECT, whose condition operand is an i1 if the
true and false operands are scalar, and a vector type with i1 elements if
the true and false operands are vectors.
A G_SELECT acts like an ISD::SELECT when the operands are all scalar, and
like an ISD::VSELECT when the operands are vectors. Because of the type
system, a G_SELECT cannot act like an ISD::SELECT with an i1 condition and
vector operands.
In this patch, we would like to take advantage of the patterns written
for SELECT and VSELECT, so we mark G_SELECT as equivalent to both SELECT
and VSELECT to reuse the patterns. Since we cannot write a `G_SELECT (s1),
(vector-ty), (vector-ty)`, we don't have to worry about accidentally
matching the SDAG patterns of that nature.
We will probably need a way to represent an i1 condition with vector
true and false operands in the future. That can be the topic of another
patch.
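The typing rules above can be captured in a small model. The `Ty` struct below is a simplified stand-in, not LLVM's `LLT`, and `classifyGSelect` is a hypothetical helper showing which SelectionDAG node a given G_SELECT shape corresponds to, following the description above: a scalar i1 condition with vector operands is rejected.

```cpp
#include <cassert>

// Simplified stand-in for a low-level type; this is not LLVM's LLT.
struct Ty {
  unsigned NumElts; // 0 means scalar
  unsigned EltBits; // bit width of the scalar or of each element
};

enum class SDAGEquivalent { Invalid, Select, VSelect };

// Hypothetical helper: which SelectionDAG node a G_SELECT with these
// condition and value types behaves like.
SDAGEquivalent classifyGSelect(Ty Cond, Ty Val) {
  if (Cond.EltBits != 1)
    return SDAGEquivalent::Invalid; // condition must be i1 or a vector of i1
  if (Val.NumElts == 0) // scalar operands need a scalar i1 condition
    return Cond.NumElts == 0 ? SDAGEquivalent::Select
                             : SDAGEquivalent::Invalid;
  // Vector operands need a vector condition with the same element count;
  // an i1 condition with vector operands is not representable.
  return Cond.NumElts == Val.NumElts ? SDAGEquivalent::VSelect
                                     : SDAGEquivalent::Invalid;
}
```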
This extends the concat load patch from
https://reviews.llvm.org/D121400, which was later moved to a combine, to
handle v2i8 and v2i16 concat loads too.
This PR is stacked on #76186.
This PR keeps the default strategy as top-down since that is what
existing targets expect. It can be enabled using
`-misched-postra-direction=bidirectional`.
It is up to targets to decide whether they would like to enable this
option for themselves.
Add support to generate valid SPIR-V for the WaveGetLaneIndex() HLSL
builtin.
To implement this, I had to fix a few small issues in the backend, such
as the i8* pointer type being emitted even when the type information
is available elsewhere.
Signed-off-by: Nathan Gauër <brioche@google.com>
This PR fixes an illegal use of OpConstantComposite with non-constant
constituents. The test attached to the PR now passes the
`spirv-val` check. Before the fix, the SPIR-V Backend produced a pattern
like the following for the attached test case:
```
%a = OpVariable %_ptr_CrossWorkgroup_uint CrossWorkgroup %uint_123
%11 = OpConstantComposite %_struct_6 %a %a
```
so that `spirv-val` complained with
```
error: line 25: OpConstantComposite Constituent <id> '10[%a]' is not a constant or undef.
%11 = OpConstantComposite %_struct_6 %a %a
```
This PR improves type inference in the SPIR-V Backend for opaque pointers,
accounting for the case where a chain of function calls allows formal
parameter types to be deduced from actual arguments. The attached
test demonstrates the case.
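Deducing parameter types along a call chain can be sketched as a fixed-point propagation. Everything below is a hypothetical model (string type names, `ActualArg`, `CallSite`, `propagateParamTypes`), not the SPIR-V Backend's data structures: an actual argument either has a known type or forwards one of the caller's own parameters, and deduced types flow from callers to callees until nothing changes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// An actual argument either has a directly known type, or is one of the
// caller's formal parameters (identified by index).
struct ActualArg {
  std::string KnownType; // non-empty if the type is known directly
  int CallerParam = -1;  // >= 0 if the argument forwards a caller parameter
};

struct CallSite {
  std::string Caller;
  std::string Callee;
  std::vector<ActualArg> Args;
};

// FormalTypes[F][I] is the deduced type of parameter I of F ("" = unknown).
using TypeMap = std::map<std::string, std::vector<std::string>>;

void propagateParamTypes(TypeMap &FormalTypes,
                         const std::vector<CallSite> &Calls) {
  bool Changed = true;
  while (Changed) { // terminates: each change fills a previously empty slot
    Changed = false;
    for (const CallSite &CS : Calls) {
      std::vector<std::string> &Formals = FormalTypes[CS.Callee];
      for (std::size_t I = 0; I < CS.Args.size() && I < Formals.size(); ++I) {
        if (!Formals[I].empty())
          continue; // already deduced
        const ActualArg &A = CS.Args[I];
        std::string Type = A.KnownType;
        if (Type.empty() && A.CallerParam >= 0) {
          const std::vector<std::string> &CallerFormals =
              FormalTypes[CS.Caller];
          if (static_cast<std::size_t>(A.CallerParam) < CallerFormals.size())
            Type = CallerFormals[A.CallerParam];
        }
        if (!Type.empty()) {
          Formals[I] = Type;
          Changed = true;
        }
      }
    }
  }
}
```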
SPIRV-LLVM-Translator project
(https://github.com/KhronosGroup/SPIRV-LLVM-Translator) from Khronos
Group is a tool and a library for bi-directional translation between
SPIR-V and LLVM IR. In its backward translation from SPIR-V to LLVM IR,
SPIRV-LLVM-Translator does not necessarily cover the same set of SPIR-V
patterns/instructions that the SPIRV Backend produces, even if we target
the same SPIR-V version in both the SPIRV-LLVM-Translator and SPIRV Backend
projects.
To improve interoperability, and to make SPIRV Backend output usable in
more products, this PR introduces an output mode that restricts SPIR-V
to the subset supported by SPIRV-LLVM-Translator. This includes a new
command line option, which does not change the default behavior of the
SPIRV Backend, and one test case that demonstrates how the option can be
used for a practical benefit: of two possible and similar outputs, it
produces the one that SPIRV-LLVM-Translator can understand.
Like the image instructions, buffer_load instructions that use TFE need
to zero-initialize their return values. Add support for this: all
results are zero-initialized in the standard way, and only the TFE flag
is zero-initialized when the enable-prt-strict-null subtarget feature is disabled.
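The zero-initialization policy can be summarized as a mask over the returned dwords. This hypothetical sketch assumes the load returns its data dwords followed by one TFE status dword; the function name and layout are illustrative, not the backend's code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: which result dwords of a TFE buffer_load are
// zero-initialized. Elements [0, NumDataDwords) are data dwords; the final
// element is assumed to be the TFE status dword.
std::vector<bool> zeroInitMask(unsigned NumDataDwords, bool PrtStrictNull) {
  // With enable-prt-strict-null, every result dword is zero-initialized;
  // without it, only the TFE flag is.
  std::vector<bool> Mask(NumDataDwords + 1, PrtStrictNull);
  Mask.back() = true; // the TFE flag is zero-initialized in both cases
  return Mask;
}
```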
This generalizes the combine added in #82455 to other binary ops,
beginning with adds in this patch.
Because the two zext operands are always +ve when treated as signed, and
we don't get any overflow since the add is carried out in at least N * 2
bits of the narrow type, the result of the add will always be +ve. So we
can use a zext for the outer extend, unlike sub which may produce a -ve
result from two +ve operands.
We could still use sext for add, but I plan to add support for other
binary ops such as mul in a later patch, and mul requires zext to be
correct (because the maximum value takes up the full N * 2 bits). So
I've opted to use zext here too for consistency.
Alive2 proof: https://alive2.llvm.org/ce/z/PRNsUM
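The range argument can also be checked exhaustively for N = 8. This C++ sketch (helper names are ours) confirms that the sum of two zero-extended 8-bit values always fits in 16 bits with the sign bit clear, whereas the largest product sets the sign bit, which is why mul needs zext.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustive check for N = 8: adding two zero-extended 8-bit values in a
// 16-bit (N * 2) type never wraps and never sets the sign bit, so the
// result is non-negative when read as signed.
bool addOfZextsIsNonNegative() {
  for (unsigned A = 0; A <= 0xFF; ++A) {
    for (unsigned B = 0; B <= 0xFF; ++B) {
      uint16_t Sum = static_cast<uint16_t>(A + B);
      if (static_cast<unsigned>(Sum) != A + B)
        return false; // wrapped around in 16 bits
      if (Sum & 0x8000u)
        return false; // sign bit of the 16-bit result is set
    }
  }
  return true;
}

// For mul, the largest product 0xFF * 0xFF = 0xFE01 occupies all 16 bits
// and sets the sign bit, so treating it as signed would be wrong.
bool mulOfZextsCanSetSignBit() {
  uint16_t Prod = static_cast<uint16_t>(0xFFu * 0xFFu);
  return (Prod & 0x8000u) != 0;
}
```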
We have a hidden option to disable it, but I'd like to make it a
tune feature.
On some implementations, instructions with a W suffix may be cheaper
because they operate only on 32-bit data, though we may lose some
opportunities for compression.
Remove the getSizeOrUnknown call when a MachineMemOperand is created. For a scalable
TypeSize, the MemoryType created becomes a scalable_vector.
Two MMOs that have scalable memory accesses can then use the updated BasicAA, which
understands scalable LocationSize.
Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind it is that generic intrinsics shouldn't be
used with the G_INTRINSIC opcode.
These new instructions map directly onto the existing trap ISD nodes.
This allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. AMDGPU, however, is an exception: it
selects traps during legalization regardless of whether SelectionDAG or
GlobalISel is used.
Since there are not many places where traps are used, this change
attempts to clean up all uses of G_INTRINSIC with trap intrinsics, so
there is no stage at which both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
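The intrinsic-to-opcode correspondence can be pictured as a simple lookup. This sketch is illustrative only: besides G_TRAP (named above), it assumes the other two new opcodes are G_DEBUGTRAP and G_UBSANTRAP, for `llvm.debugtrap` and `llvm.ubsantrap`, mirroring the existing trap ISD nodes. The function itself is hypothetical and does not reflect how the IRTranslator is written.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lookup from a trap intrinsic to the generic opcode it is
// translated to; the names other than G_TRAP are assumptions.
std::string trapOpcodeFor(const std::string &Intrinsic) {
  static const std::map<std::string, std::string> Table = {
      {"llvm.trap", "G_TRAP"},
      {"llvm.debugtrap", "G_DEBUGTRAP"},
      {"llvm.ubsantrap", "G_UBSANTRAP"},
  };
  auto It = Table.find(Intrinsic);
  return It == Table.end() ? std::string() : It->second;
}
```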