In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it bacame clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
Fixes https://github.com/llvm/llvm-project/issues/149230
Previously, even with simd enabled via `-mattr=+simd128`, the compiler
cannot utilize v128 to optimize loads and setcc of i128, instead
legalizing it to consecutive i64s.
This PR then adds support for setcc of i128 by converting them to
v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16
bytes or more (when simd128 is present).
The check for enabling this optimization is if the comparison operand is
either a load or an integer in i128, with the comparison code being
either `EQ | NE`, without `NoImplicitFloat` function flag.
Inspiration taken from RISCV's isel lowering.
Many backends are missing either all tests for lrint, or specifically
those for f16, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:
* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls
There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).
I added tests for structs and unions which already work. With the
multivalue abi they crash in the backend, so I added a sema check that
rejects structs and unions for that abi.
It will also crash in the backend if passed an int128 or float128 type.
During target DAG combine, use two i16x8.extmul_low_i8x16 and a shuffle
for v16i8 mul.
On my AArch64 machine, using V8, I observe a 3.14% geomean improvement
across 65 benchmarks, including: 9.2% for spec2017.x264, 6% for libyuv
and 1.8% for ncnn.
After #149310 lifetime intrinsics require an alloca argument, an
invariant that this pass can break.
I've fixed this in two ways:
* First, move static allocas into the entry block. Currently, the way
the pass splits the entry block makes all allocas dynamic, which I
assume was not actually intended. This will avoid unnecessary SSA
reconstruction for allocas as well, and thus avoid the problem.
* If this fails (for dynamic allocas) drop all lifetime intrinsics if
any one of them would require a rewrite during SSA reconstruction.
Fixes https://github.com/llvm/llvm-project/issues/150498.
Tests if the runtime type of the function pointer matches the static
type. If this returns false, calling the function pointer will trap.
Uses `@llvm.wasm.ref.test.func` added in #147486.
Also adds a "gc" wasm feature to gate the use of the ref.test
instruction.
Fixes https://github.com/llvm/llvm-project/issues/117200.
The default behavior in TargetLoweringBase is only scalar floats on fexp
are supported by default, not the vectorized version. This PR adds
`ISD::FEXP10` to the supported list.
PR #147486 broke the sanitizer and expensive-checks buildbot.
These captures were needed when toWasmValType emitted a diagnostic but
are no longer needed since we changed it to an assertion failure. This
removes the unneeded captures and should fix the sanitizer-buildbot.
I also fixed the codegen in the wasm64 target: table.get requires an i32
but in wasm64 the function pointer is an i64. We need an additional
`i32.wrap_i64` to convert it. I also added `-verify-machineinstrs` to
the tests so that the test suite validates this fix.
Finally, I noticed that #150201 uses a feature of the intrinsic that is
not covered by the tests, namely `ptr` arguments. So I added one
additional test case to ensure that it works properly.
cc @dschuff
There are cases we end up removing some intructions that use stackified
registers after RegStackify. For example,
```wasm
bb.0:
%0 = ... ;; %0 is stackified
br_if %bb.1, %0
bb.1:
```
In this code, br_if will be removed in CFGSort, so we should unstackify
%0 so that it can be correctly dropped in ExplicitLocals.
Rather than handling this in case-by-case basis, this PR just
unstackifies all stackifies register with no uses in the beginning of
ExplicitLocals, so that they can be correctly dropped.
Fixes#149097.
This adds an llvm intrinsic for WebAssembly to test the type of a
function. It is intended for adding a future clang builtin
` __builtin_wasm_test_function_pointer_signature` so we can test whether
calling a function pointer will fail with function signature mismatch.
Since the type of a function pointer is just `ptr` we can't figure out
the expected type from that.
The way I figured out to encode the type was by passing 0's of the
appropriate type to the intrinsic.
The first argument gives the expected type of the return type and the
later values give the expected
type of the arguments. So
```llvm
@llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0)
```
tests if `%func` is of type `(double, i32) -> (i32)`. It will lower to:
```wat
local.get $func
table.get $__indirect_function_table
ref.test (double, i32) -> (i32)
```
To indicate the function should be void, I somewhat arbitrarily picked
`token poison`, so the following tests for `(i32) -> ()`:
```llvm
@llvm.wasm.ref.test.func(ptr %func, token poison, i32 0)
```
To lower this intrinsic, we need some place to put the type information.
With `encodeFunctionSignature()` we encode the signature information
into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in
`WebAssemblyMCInstLower.cpp`.
convert_iKxN_s is canonicalized into convert_iKxN_u when the argument is
known to have sign bit 0. This results in emitting Wasm opcodes that, on
some targets (like x86_64), are dramatically slower than signed versions
on major engines.
Similarly to X86, we now fix this up in isel when the instruction has
nonneg flag from canonicalization or if we know the source has zero sign
bit.
Fixes#149457.
[WebAssembly] Fold fadd contract (fmul contract) to relaxed madd w/
-mattr=+simd128,+relaxed-simd
Fixes#121311
- Precommit test for #121311
- Fold fadd contract (fmul contract) to relaxed madd w/
-mattr=+simd128,+relaxed-simd
- Move PatFrag of fadd_contract in ARM.td and WebAssembly.td to
TargetSelectionDAG.td for reuse of pattern
Change isBuildVectorAll* -> isConstantSplatVectorAll* in VSelect in case
the fold happens after BuildVector has been canonically transformed to
Splat or if the Splat is initially in vselect already
- Fixes#73454
- Update related test cases, add extra tests in wasm
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Previously we only folded TargetGlobalAddresses into the memarg if they
were on their own, so this patch supports folding TargetGlobalAddresses
that are added to some other offset.
Previously we weren't able to do this because we didn't have nuw on the
add, but we can now that getelementptr has nuw and is plumbed through to
the add in 0564d0665b302d1c7861e03d2995612f46613a0f.
Fixes#61930
This commit fixes using inline assembly with v128 results. Previously
this failed with an internal assertion about a failure to legalize a
`CopyFromReg` where the source register was typed `v8f16`. It looks like
the type used for the destination register was whatever was listed first
in the `def V128 : WebAssemblyRegClass` listing, so the types were
shuffled around to have a default-supported type.
A small test was added as well which failed to generate previously and
should now pass in generation. This test passed on LLVM 18 additionally
and regressed by accident in #93228 which was first included in LLVM 19.
Fixes https://github.com/llvm/llvm-project/issues/50142, a miss of
further vectorization, where we can only achieve zext (xor (any_true),
-1).
Now in test case simd-setcc-reductions, it's converted to all_true.
Also fixes https://github.com/llvm/llvm-project/issues/145177, which is
all_true (setcc x, 0, eq) -> not any_true
any_true (setcc x, 0, ne) -> any_true
all_true (setcc x, 0, ne) -> all_true
---------
Co-authored-by: badumbatish <--show-origin>
[WebAssembly] [Backend] Wasm optimize illegal bitmask for #131980.
Currently, the case for illegal bitmask (v32i8 or v64i8) is that at the
SelectionDag level, two (four) vectors of v128 will be concatenated
together, then they'll all be SETCC by the same pseudo illegal
instruction, which requires expansion later on.
I opt for SETCC-ing them seperately, bitcast and zext them and then add
them up together in the end.
---------
Co-authored-by: badumbatish <--show-origin>
This change is a step towards fixing one long-standing problem with
LLVM's debug WASM codegen: excessive use of locals. One local for each
temporary value in IR (roughly speaking).
This has a lot of problems:
1) It makes it easy to hit engine limitations of 50K locals with certain
code patterns and large functions.
2) It makes for larger binaries that are slower to load and slower to
compile to native code.
3) It makes certain compilation strategies (spill all WASM locals to
stack, for example) for debug code excessively expensive and makes debug
WASM code either run very slow, or be less debuggable.
4) It slows down LLVM itself.
This change addresses these partially by running a limited version of
the stackification pass for unoptimized code, one that gets rid of the
most 'obviously' unnecessary locals.
Care needs to be taken to not impact LLVM's ability to produce high
quality debug variable locations with this pass. To that end:
1) We only allow stackification when it doesn't require moving any
instructions.
2) We disable stackification of any locals that are used in
DEBUG_VALUEs, or as a frame base.
I have verified on a moderately large example that the baseline and the
diff produce the same kinds (local/global/stack) of locations, and the
only differences are due to the shifting of instruction offsets, with
many local.[get|set]s not being present anymore.
Even with this quite conservative approach, the results are pretty good:
1) 30% reduction in raw code size, up to 10x reduction in the number of
locals for select large methods (~1000 => ~100).
2) ~10% reduction in instructions retired for an "llc -O0" run on a
moderately sized input.
Introduce MCAsmInfo::UsesSetToEquateSymbol to control the preferred
syntax for symbol equating. We now favor the more readable and common
`symbol = expression` syntax over `.set`. This aligns with pre- https://reviews.llvm.org/D44256 behavior.
On Apple platforms, this resolves a clang -S vs -c behavior difference (resolves#104623).
For targets whose = support is unconfirmed, UsesSetToEquateSymbol is set to false.
This also minimizes test updates.
Pull Request: https://github.com/llvm/llvm-project/pull/142289
Previous logic did not handle the case where the result bit size was
between 32 and 64 bits inclusive. I updated the if-statements for more
precise handling.
An alternative solution would have been to abort FastISel in case the
result type is not legal for FastISel.
Resolves: #64222.
This PR began as an investigation into the root cause of
https://github.com/ziglang/zig/issues/20966.
Godbolt link showing incorrect codegen on 20.1.0:
https://godbolt.org/z/cEr4vY7d4.
This adds a call to SimplifyDemandedBits from bitcasts with scalar input
types in SimplifyDemandedVectorElts, which can help simplify the input
scalar.
Currently iterators over EquivalenceClasses will iterate over std::set,
which guarantees the order specified by the comperator. Unfortunately in
many cases, EquivalenceClasses are used with pointers, so iterating over
std::set of pointers will not be deterministic across runs.
There are multiple places that explicitly try to sort the equivalence
classes before using them to try to get a deterministic order
(LowerTypeTests, SplitModule), but there are others that do not at the
moment and this can result at least in non-determinstic value naming in
Float2Int.
This patch updates EquivalenceClasses to keep track of all members via a
extra SmallVector and removes code from LowerTypeTests and SplitModule
to sort the classes before processing.
Overall it looks like compile-time slightly decreases in most cases, but
close to noise:
https://llvm-compile-time-tracker.com/compare.php?from=7d441d9892295a6eb8aaf481e1715f039f6f224f&to=b0c2ac67a88d3ef86987e2f82115ea0170675a17&stat=instructions
PR: https://github.com/llvm/llvm-project/pull/134075
This commit is the result of investigation and discussion on
WebAssembly/wide-arithmetic#6 where alternatives to the `i64.add128`
instruction were discussed but ultimately deferred to a future proposal.
In spite of this though I wanted to apply a few changes to the LLVM
backend here with `wide-arithmetic` enabled for a few minor changes:
* A lowering for the `ISD::UADDO` node is added which uses `add128`
where the upper bits of the two operands are constant zeros and the
result of the 128-bit addition is the result of the overflowing
addition.
* The high bits of a `I64_ADD128` node are now flagged as "known zero"
if the upper bits of the inputs are also zero, assisting this `UADDO`
lowering to ensure the backend knows that the carry result is a 1-bit
result.
A few tests were then added to showcase various lowerings for various
operations that can be done with wide-arithmetic. They don't all
optimize super well at this time but I wanted to add them as a reference
here regardless to have them on-hand for future evaluations if
necessary.
In [1], Nikita Popov suggested that during lowering 'unreachable' insn
should not generate extra code for naked functions, and this applies to
all architectures. Note that for naked functions, 'unreachable' insn is
necessary in IR since the basic block needs a terminator to end.
This patch checked whether a function is naked function or not. If it is
a naked function, 'unreachable' insn will not generate ISD::TRAP.
[1] https://github.com/llvm/llvm-project/pull/131731
Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
Unlike in Itanium EH IR, WinEH IR's unwinding instructions (e.g.
`invoke`s) can have multiple possible unwind destinations.
For example:
```ll
entry:
invoke void @foo()
to label %cont unwind label %catch.dispatch
catch.dispatch: ; preds = %entry
%0 = catchswitch within none [label %catch.start] unwind label %terminate
catch.start: ; preds = %catch.dispatch
%1 = catchpad within %0 [ptr null]
...
terminate: ; preds = %catch.dispatch
%2 = catchpad within none []
...
...
```
In this case, if an exception is not caught by `catch.dispatch` (and
thus `catch.start`), it should next unwind to `terminate`.
`findUnwindDestination` in ISel gathers the list of this unwind
destinations traversing the unwind edges:
ae42f07103/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L2089-L2150)
But we don't use that, and instead use our custom
`findWasmUnwindDestinations` that only adds the first unwind
destination, `catch.start`, to the successor list of `entry`, and not
`terminate`:
ae42f07103/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L2037-L2087)
The reason behind it was, as described in the comment block in the code,
it was assumed that there always would be an `invoke` that connects
`catch.start` and `terminate`. In case of `catch (type)`, there will be
`call void @llvm.wasm.rethrow()` in `catch.start`'s predecessor that
unwinds to the next destination. For example:
0db702ac8e/llvm/test/CodeGen/WebAssembly/exception.ll (L429-L430)
In case of `catch (...)`, `__cxa_end_catch` can throw, so it becomes an
`invoke` that unwinds to the next destination. For example:
0db702ac8e/llvm/test/CodeGen/WebAssembly/exception.ll (L537-L538)
So the unwind ordering relationship between `catch.start` and
`terminate` here would be preserved.
But turns out this assumption does not always hold. For example:
```ll
entry:
invoke void @foo()
to label %cont unwind label %catch.dispatch
catch.dispatch: ; preds = %entry
%0 = catchswitch within none [label %catch.start] unwind label %terminate
catch.start: ; preds = %catch.dispatch
%1 = catchpad within %0 [ptr null]
...
call void @_ZSt9terminatev()
unreachable
terminate: ; preds = %catch.dispatch
%2 = catchpad within none []
call void @_ZSt9terminatev()
unreachable
...
```
In this case there is no `invoke` that connects `catch.start` to
`terminate`. So after `catch.dispatch` BB is removed in ISel,
`terminate` is considered unreachable and incorrectly removed in DCE.
This makes Wasm just use the general `findUnwindDestination`. In that
case `entry`'s successor is going to be [`catch.start`, `terminate`]. We
can get the first unwind destination by just traversing the list from
the front.
---
This required another change in WinEHPrepare. WinEHPrepare demotes all
PHIs in EH pads because they are funclets in Windows and funclets can't
have PHIs. When used in Wasm they are not funclets so we don't need to
do that wholesale but we still need to demote PHIs in `catchswitch` BBs
because they are deleted during ISel. (So we created
[`-demote-catchswitch-only`](a5588b6d20/llvm/lib/CodeGen/WinEHPrepare.cpp (L57-L59))
option for that)
But turns out we need to remove PHIs that have a `catchswitch` BB as an
incoming block too:
```ll
...
catch.dispatch:
%0 = catchswitch within none [label %catch.start] unwind label %terminate
catch.start:
...
somebb:
...
ehcleanup ; preds = %catch.dispatch, %somebb
%1 = phi i32 [ 10, %catch.dispatch ], [ 20, %somebb ]
...
```
In this case the `phi` in `ehcleanup` BB should be demoted too because
`catch.dispatch` BB will be removed in ISel so one if its incoming block
will be gone. This pattern didn't manifest before presumably due to how
`findWasmUnwindDestinations` worked. (In this example, in our
`findWasmUnwindDestinations`, `catch.dispatch` would have had only one
successor, `catch.start`. But now `catch.dispatch` has both
`catch.start` and `ehcleanup` as successors, revealing this bug. This
case is
[represented](ab87206c4b/llvm/test/CodeGen/WebAssembly/exception.ll (L445))
by `rethrow_terminator` function in `exception.ll` (or
`exception-legacy.ll`) and without the WinEHPrepare fix it will crash.
---
Discovered by the reproducer provided in #126916, even though the bug
reported there was not this one.
This change effectively reverts 296ccef
(https://reviews.llvm.org/D77192)
Most of these symbols are just normal C symbols that get imported from
wither libcompiler-rt or from emscripten's JS library code. In most
cases it should not be necessary to give them explicit import names.
The advantage of doing this is that we can wasm-ld can/will fail with a
useful error message when these symbols are missing. As opposed to today
where it will simply import them and defer errors until later (when they
are less specific).