With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
[Reverts d57892a2a153ab71a796f07e39d939eae6910c21]
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
---------
Co-authored-by: Paul Walker <paul.walker@arm.com>
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
This defines some new target features. These are subsets of existing
features that reflect implementation concerns:
- "call-indirect-overlong" - implied by "reference-types"; just the
overlong encoding for the `call_indirect` immediate, and not the actual
reference types.
- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.
This is split out from https://github.com/llvm/llvm-project/pull/112035.
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
WebAssembly's `memory.fill` and `memory.copy` instructions trap if the
pointers are out of bounds, even if the length is zero. This is
different from LLVM, which expects that it can call `memcpy` on
arbitrary invalid pointers if the length is zero. To avoid spurious
traps, branch around `memory.fill` and `memory.copy` when the length is
zero.
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
This commit implements the [wide-arithmetic] proposal which has recently
reached phase 2 in the WebAssembly proposals process. The goal here is
to implement support in LLVM for emitting these instructions which are
gated behind a new feature flag by default. A new `wide-arithmetic`
feature flag is introduced which gates these four new instructions from
being emitted.
Emission of each instruction itself is relatively simple given LLVM's
preexisting lowering rules and infrastructure. The main gotcha is that
due to the multi-result nature of all of these instructions it needed
the lowerings to be implemented in C++ rather than in TableGen.
[wide-arithmetic]: https://github.com/WebAssembly/wide-arithmetic
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
Getting this to work required a few additional changes:
- Add builtins for any instructions that can't be done with plain C
currently.
- Add support for the saturating version of fp_to_<s,i>_I16x8. Other
vector sizes supported this already.
- Support bitcast of f16x8 to v128. Needed to return a __f16x8 as
v128_t.
Currently, `getStackAlignment` asserts if the stack alignment wasn't
specified. This makes it inconvenient to use and complicates testing.
This change also makes `exceedsNaturalStackAlignment` method redundant.
Instead of splatting a single lane, to initialise a build_vector, lower
to scalar_to_vector which can be selected to load_zero.
Also add load_zero and load_lane patterns for f32x4 and f64x2.
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up
miscompiling.
This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.
rdar://131419786
The current way of lowering `llvm.clear_cache` is a bit unusual. As
suggested by Matt Arsenault we are better off using an ISD node.
This change introduces a new `ISD::CLEAR_CACHE`, registers a new libcall
by default named `__clear_cache` and the default legalisation is a
libcall.
This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE`
needed by RISC-V on some platforms.
This reuses most of the code that was created for f32x4 and f64x2 binary
instructions and tries to follow how they were implemented.
add/sub/mul/div - use regular LL instructions
min/max - use the minimum/maximum intrinsic, and also have builtins
pmin/pmax - use the wasm.pmax/pmin intrinsics and also have builtins
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Adds a builtin and intrinsic for the f16x8.splat instruction.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f16x8.splat as opcode 0x123, but this is
incorrect and will be changed to 0x120 soon.
Adds a builtin and intrinsic for the f32.store_f16 instruction.
The instruction stores an f32 value as an f16 memory. Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f32.store_f16 as opcode 0xFD0121, but this is
incorrect and will be changed to 0xFC31 soon.
Adds a builtin and intrinsic for the f32.load_f16 instruction.
The instruction loads an f16 value from memory and puts it in an f32.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f32.load_f16 as opcode 0xFD0120, but this is
incorrect and will be changed to 0xFC30 soon.
Multivalue feature of WebAssembly has been standardized for several
years now. I think it makes sense to be able to enable it in the feature
section by default for our clang/llvm-produced binaries so that the
multivalue feature can be used as necessary when necessary within our
toolchain and also when running other optimizers (e.g. wasm-opt) after
the LLVM code generation.
But some WebAssembly toolchains, such as Emscripten, do not provide both
mulvalue-returning and not-multivalue-returning versions of libraries.
Also allowing the uses of multivalue in the features section does not
necessarily mean we generate them whenever we can to the fullest, which
is a different code generation / optimization option.
So this makes the lowering of multivalue returns conditional on the use
of 'experimental-mv' target ABI. This ABI is turned off by default and
turned on by passing `-Xclang -target-abi -Xclang experimental-mv` to
`clang`, or `-target-abi experimental-mv` to `clang -cc1` or `llc`.
But the purpose of this PR is not tying the multivalue lowering to this
specific 'experimental-mv'. 'experimental-mv' is just one multivalue ABI
we currently have, and it is still experimental, meaning it is not very
well optimized or tuned for performance. (e.g. it does not have the
limitation of the max number of multivalue-lowered values, which can be
detrimental to performance.) We may change the name of this ABI, or
improve it, or add a new multivalue ABI in the future. Also I heard that
WASI is planning to add their multivalue ABI soon. So the plan is,
whenever any one of multivalue ABIs is enabled, we enable the lowering
of multivalue returns in the backend. We currently have only
'experimental-mv' in the repo so we only check for that in this PR.
Related past discussions:
#82714https://github.com/WebAssembly/tool-conventions/pull/223#issuecomment-2008298652
This reverts commit 6e6bf9f81756ba6655b4eea8dc45469a47f89b39.
It turned out the multivalue feature had active outside users and it
could cause some disruptions to them, so I'd like to investigate more
about the workarounds before doing this.
We plan to enable multivalue in the features section soon (#80923) for
other reasons, such as the feature having been standardized for many
years and other features being developed (e.g. EH) depending on it. This
is separate from enabling Clang experimental multivalue ABI (`-Xclang
-target-abi -Xclang experimental-mv`), but it turned out we generate
some multivalue code in the backend as well if it is enabled in the
features section.
Given that our backend multivalue generation still has not been much
used nor tested, and enabling the feature in the features section can be
a separate decision from how much multialue (including none) we decide
to generate for now, I'd like to temporarily disable the actual
generation of multivalue in our backend. To do that, this adds an
internal flag `-wasm-emit-multivalue` that defaults to false. All our
existing multivalue tests can use this to test multivalue code. This
flag can be removed later when we are confident the multivalue
generation is well tested.
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZextVal();`
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
The BuildVectorSDNode::isConstantSplat function could depend on
endianness, and it takes a bool argument that can be used to indicate
if big or little endian should be considered when internally casting
from a vector to a scalar. However, that argument is default set to
false (= little endian). And in many situations, even in target
generic code such as DAGCombiner, the endianness isn't specified when
using the function.
The intent with this patch is to highlight that endianness doesn't
matter, depending on the context in which the function is used.
In DAGCombiner the code is slightly refactored. Back in the days when
the code was written it wasn't possible to request a MinSplatBits
size when calling isConstantSplat. Instead the code re-expanded the
found SplatValue to match with the EltBitWidth. Now we can just
provide EltBitWidth as MinSplatBits and remove the logic for doing
the re-expand.
While being at it, tidying up around isConstantSplat, this patch also
adds an explicit check in BuildVectorSDNode::isConstantSplat to break
out from the loop if trying to split an on VecWidth into two halves.
Haven't been able to prove that there could be miscompiles involved
if not doing so. There are lit tests that trigger that scenario,
although I think they happen to later discard the returned SplatValue
for other reasons.
The vector shift operation in WebAssembly uses an i32 shift amount type, while
the LLVM IR requires binary operator uses the same type of operands. When the
shift amount operand is splated from a different block, the splat source will
not be exported and the vector shift will be unrolled to scalar shifts. This
patch enables the vector shift to identify the splat source value from the other
block, and generate expected WebAssembly bytecode when lowering.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D158399
Move WebAssemblyUtilities from Utils to the CodeGen library. It
primarily deals in MIR layer types, so it really lives in the CodeGen
library.
Move a variety of other things around to try create better separation.
See issue #64166 for more info on layering.
Move llvm/include/CodeGen/WasmAddressSpaces.h back to
llvm/lib/Target/WebAssembly/Utils.
Differential Revision: https://reviews.llvm.org/D156472
The codegen routine introduced in 18077e9fd688 did not account for vectors with
more than 16 lanes. Remove the incorrect assertion and bail out of the
optimization when encountering this case. Add test cases that previously
triggered the assertion. Unfortunately, these test cases now have terrible
codegen, but that is at least better than crashing.
Fixes#63500.
Differential Revision: https://reviews.llvm.org/D154124
WebAssembly doesn't expose inexact exceptions, so frint can be mapped to
fnearbyint. Likewise, WebAssembly always rounds ties-to-even, so
froundeven can be mapped to fnearbyint.
Differential Revision: https://reviews.llvm.org/D153451
This commit implements support for WebAssembly table types and
respective builtins. Table tables are WebAssembly objects to store
reference types. They have a large amount of semantic restrictions
including, but not limited to, only being allowed to be declared
at the top-level as static arrays of zero-length. Not being arguments
or result of functions, not being stored ot memory, etc.
This commit introduces the __attribute__((wasm_table)) to attach to
arrays of WebAssembly reference types. And the following builtins to
manage tables:
* ref __builtin_wasm_table_get(table, idx)
* void __builtin_wasm_table_set(table, idx, ref)
* uint __builtin_wasm_table_size(table)
* uint __builtin_wasm_table_grow(table, ref, uint)
* void __builtin_wasm_table_fill(table, idx, ref, uint)
* void __builtin_wasm_table_copy(table, table, uint, uint, uint)
This commit also enables reference-types feature at bleeding-edge.
This is joint work with Alex Bradbury (@asb).
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D139010
Correctly handle single-element vectors to fix an assertion failure. Add tests
that were missing from the original commit.
Differential Revision: D151782
This reverts commit 8392bf6000ad039bd0e55383d40a05ddf7b4af13.
The commit missed some edge cases that led to crashes. Reverting to resolve
downstream breakage while a fix is pending.
This intrinsic is meant to lower directly to the i8x16.shuffle instruction,
which takes its lane index arguments as immmediates. The ISel for the intrinsic
assumed that the lane index arguments were constants, so bitcode that
"incorrectly" used this intrinsic with non-immediate arguments caused an
assertion failure in the backend.
Avoid the crash by defining the lane index arguments to be immediates, matching
the underlying instruction. Update ISel accordingly. This change means that the
bitcode that previously caused a crash will now fail to validate.
Fixes#55559.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D149898
WebAssembly tries to cast an `undef` to `CosntantSDNode` during `LowerAccessVectorElement`.
These operations will trigger an assertion error in cast.
To avoid this issue, we prevent casting, and abort the lowering operation.
A unit test is also included.
This patch fixes [pr61828](https://github.com/llvm/llvm-project/issues/61828)
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D147198
Constants in BUILD_VECTOR may be down cast into a smaller value that fits LaneBits, i.e., the bit width of elements in the vector.
This cast didn't consider 2^N where it would be cast into -2^N, which still doesn't fit into LaneBits after casting.
This will cause an assertion in later legalization.
2^N should be cast into 0, and this patch reflects such behavior.
This patch also includes a test to reflect the fix.
This patch fixes [issue 61780](https://github.com/llvm/llvm-project/issues/61780)
Related patch: https://reviews.llvm.org/D108669
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D147208