FastISel emits separate load and AND instructions for bitmasking.
(before) %1:i32 = LOAD_I32 %addr; %2:i32 = AND_I32 %1, 255
Fold AND masks into ZExt loads by verifying operands with
maskTrailingOnes. A getFoldedLoadOpcode wrapper is implemented
to manage dispatching logic for better extensibility.
(after) %1:i32 = LOAD8_U_I32 %addr
Fixed: https://github.com/llvm/llvm-project/issues/180783
When folding loads with extensions, the extension operand can be a freeze node
in addition to a load. We can look through it to do the desirability check.
Fixes#184676
Existing ISel patterns match setne/seteq following SIMD boolean reductions
any_true and all_true, and drop the ones that are redundant (because the
reductions always return 1 or 0). This adds patterns to also produce eqz
instructions instead of a comparison with a const.
Current tooling for the WebAssembly component model uses import modules
and names such as `$root` and `[thread-index]`. Importing these from
assembly files requires support for non-valid identifiers in
`.import_name` and `.import_module` directives. This PR adds support for
specifying those as strings, e.g.:
```asm
.import_module __wasm_component_model_builtin_thread_index, "$root"
.import_name __wasm_component_model_builtin_thread_index, "[thread-index]"
```
For wasm, forming minnum/maxnum style ISD nodes is non-profitable,
because (in cases where any float min/max support exists at all), it has
pmin/pmax instructions that correspond to the fcmp+select semantics, or
relaxed_fmin/relaxed_fmax (for the nnan+nsz case) with even loser
semantics.
As such, return false from isProfitableToCombineMinNumMaxNum(), and also
respect that hook in the SDAGBuilder.
Initial support for acquire-release atomics, specified as part of
https://github.com/WebAssembly/shared-everything-threads
This adds an ordering operand to atomic loads, stores, RMWs,
wait/notify,
and fences. It currently defaults to 0 and ISel is not updated yet, so
atomics produced by the compiler will still always be seqcst.
Asm parsing and printing, binary emission and disassembly are all
updated. Binary emission will always use the old encoding because the
encoding is smaller, and to get backwards compatibility for free.
The CombineSetCC helpers and performAnyAllCombine generate MVT::i1
results.
However MVT::i1 is an illegal type in WebAssembly, and this combiner can
run either before or after legalization. Directly creating the intrinsic
and negating its result using XOR instead of i1 and a NOT operation
avoids this problem.
Fixes#183842
The `tryToFoldLoadIntoMI` function omitted materializing base registers
for addresses before folding sign-extend instructions into loads. This
left `$noreg` as the base register, crashing subsequent passes.
WebAssembly memory instructions structurally require a valid base
register. Calling the existing `materializeLoadStoreOperands` function
ensures that a `CONST 0` virtual register is generated when addressing
global variables directly without a pre-existing base register.
(before) %1:i32 = LOAD8_S_I32_A32 0, @ch, $noreg ... -> CRASH (after)
%3:i32 = CONST_I32 0
%1:i32 = LOAD8_S_I32_A32 0, @ch, %3:i32 ... -> Folded safely
FastISel currently defaults to unsigned loads for i8/i16/i32 types,
leaving any sign-extension to be handled by a separate instruction. This
patch optimizes this by folding the SExtInst into the LoadInst, directly
emitting a signed load (e.g., i32.load8_s).
When a load has a single SExtInst use, selectLoad emits a signed load
and safely removes the redundantly emitted SExtInst.
Fixed: #180783
cc https://github.com/llvm/llvm-project/issues/179143
This adds a second pattern: we already recognize "shuffle + extend +
add" as `addext`, this adds another pattern for "extend + shuffle +
add", which can come up when programs are optimized.
SELECT_CC nodes with externref or funcref return types were not being
expanded, causing "Cannot select" errors during instruction selection.
This adds SELECT_CC to the list of operations that should be expanded
for reference types, similar to how it's already handled for scalar
types (i32, i64, f32, f64). This allows the SELECT_CC to be lowered to a
SELECT node, which already has instruction patterns defined in
WebAssemblyInstrRef.td.
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.
The underlying issue causing the revert has been fixed independently
as 301fa24671256734df6b7ee65f23ad885400108e.
Original message:
Move narrowInterleaveGroups to to general VPlan optimization stage.
To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.
If such a VF is found, the original VPlan is split into 2:
a) a new clone which contains all VFs of Plan, except VFToOptimize, and
b) the original Plan with VFToOptimize as single VF.
The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.
Together with https://github.com/llvm/llvm-project/pull/149702, this
allows to take the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.
One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz
PR: https://github.com/llvm/llvm-project/pull/149706
This checks every user function of `setjmp` or `longjmp` and if any of
them does not have `+exception-handling` target feature, errors out.
Hopefully this gives a clearer error message to the users in case they
do not provide consistent SjLj flags at compile time vs. link time.
Closes#178135 and closes
https://github.com/emscripten-core/emscripten/issues/26165.
This lowers the splitting as:
```
any_active(hi_mask)
? (find_last_active(hi_mask) + lo_mask.getVectorElementCount())
: find_last_active(lo_mask)
```
And trivially lowers `<1 x i1>` scalarization to returning zero. Which
is a natural result of the splitting (and the lack of a sentinel
"none-active" result value).
The lowerings likely can be improved. This patch is for completeness.
Should fix:
https://github.com/llvm/llvm-project/pull/178862#issuecomment-3862310334Fixes#180212
We don't currently support common symbols for Wasm, and we currently
emit a generic error with a backtrace. Instead, don't crash, and report
the names of the offending symbols.
Before, Wasm FastISel treated all indirect calls the same, causing
miscompilations at O0 when trying to call a funcref (`call ptr
addrspace(20)`), as it would treat the funcref as a normal `ptr`
This adds a check so it falls back to ISelDAG when encountering calls
outside addrspace 0 (which covers direct calls and indirect calls
through normal function pointers).
Related: #140933
Fixes a crash during type legalization by allowing
ISD::ANY_EXTEND_VECTOR_INREG to fall back to default expansion instead
of hitting llvm_unreachable.
Fixed: #177209
Custom lower FMINNUM, FMINIMUMNUM, FMAXNUM and FMAXIMUMNUM to generate
relaxed_min and relaxed_max when the inputs cannot be NaN or signed
zero.
Tablegen patterns have also been modified to check the above conditions
when trying to match relaxed min/max using the pmin/pmax pattern.
This restriction was originally added in
https://reviews.llvm.org/D143256, with the given justification:
> Currently, in TargetLowering, if the target does not support fminnum,
we lower to fminimum if neither operand could be a NaN. But this isn't
quite correct because fminnum and fminimum treat +/-0 differently; so,
we need to prove that one of the operands isn't a zero.
As far as I can tell, this was never correct. Before
https://github.com/llvm/llvm-project/pull/172012, `minnum` and `maxnum`
were nondeterministic with regards to signed zero, so it's always been
perfectly legal to lower them to operations that order signed zeroes.
These are long overdue for removal. These were originally a hack
to support loading half values before there was any / decent support
for the half type through the backend. There's no reason to continue
supporting these, they're equivalent to fpext/fptrunc with a bitcast.
SelectionDAG stopped translating these directly, and used the
bitcast + fp cast since f7a02c17628e825, so there's been no reason
to use these since 2014.
We've already optimised these, so update the cost model to reflect it.
And skip the isBeforeLegalize check when lowering i8 muls, because it
then misses the cases where, say v32i8, has been type legalised into 2x
v16i8.
Also explicitly disable memory interleaving for any factor other than
two or four.
Commit
6ad41bcc49
changed how frem is expanded during legalization and it
broke WebAssembly but we were missing test coverage. We want to maintain
our previous behavior of unrolling vectors and using a libcall to
implement scalar frem. I'm not sure why this now has to be different
(in ISelLowering) from other libcalls like fsin which work the same way
in the end, but this code does accurately describe what we want.
Fixes: https://github.com/emscripten-core/emscripten/issues/25991
The default `half` legalization, which Wasm currently uses, does not
respect IEEE conventions: for example, casting to bits may invoke a lossy
libcall, meaning soft float operations cannot be correctly implemented.
Change to the soft promotion legalization which passes `f16` as an `i16`
and treats each `half` operation as an individual
f16->f32->libcall->f32->f16 sequence.
Of note in the test updates are that `from_bits` and `to_bits` are now
libcall-free, and that chained operations now round back to `f16` after
each step.
Fixes the wasm portion of
https://github.com/llvm/llvm-project/issues/97981
Fixes the wasm portion of
https://github.com/llvm/llvm-project/issues/97975
Fixes: https://github.com/llvm/llvm-project/issues/96437
Fixes: https://github.com/llvm/llvm-project/issues/96438
The keep-registers mode isn't super useful without disabling
explicit-locals,
as the local gets/sets are irrelevant noise in most cases.
Switching this test makes the output much more concise and will make
upcoming
changes easier to review.
This patch extends the `load_zero` pattern matching to
support floating-point vector types (`v4f32` and `v2f64`).
Previously, the optimization to generate `v128.load32_zero` and
`v128.load64_zero` was only enabled for integer types
(`v4i32` and `v2i64`). This change adds the necessary TableGen
patterns to correctly match scalar floating-point loads inserted
into zero-initialized vectors.
Fixes https://github.com/llvm/llvm-project/issues/172647
Currently, MC assumes that all `target-feature` flag attributes are well
formed and will crash otherwise. This change handles those cases more
gracefully.
Adds lowering of `addrspacecast [0 -> 20]` to allow easy conversion of
function pointers to Wasm `funcref`
When given a constant function pointer, it lowers to a direct
`ref.func`. Otherwise it lowers to a `table.get` from
`__indirect_function_table` using the provided pointer as the index.
This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc.
This caused a couple test failures, likely due to a mid-air collision.
Reverting for now to get the tree back to green and allow the original
author to run UTC/friends and verify the output.
This maybe a bug which is introduced by commit
6749ae36b4a33769e7a77cf812d7cd0a908ae3b9, and has been present ever
since.
In this case, `OtherReg` always overlaps with `DstReg` cause they from
the `Copy` all.
Treat it in the same manner of zero_extend_vector_inreg and generate an
extend_low_u if possible. This is to try an prevent expensive shuffles
from being generated instead. computeKnownBitsForTargetNode has also
been updated to specify known zeros on extend_low_u.
Fixes https://github.com/llvm/llvm-project/issues/165438
With `simd128` enabled, we may meet vector type truncation in FastISel.
To respect #138479, this patch merely bails out on non-integer IR types,
though I prefer bailing out for all non-simple types as most targets
(X86, AArch64) do.
Fill out more information for sign and zero extend and add some truncate
information; however, the primary change is to int/fp conversions. In
particular, fp to (narrow) int appears to be relatively expensive.