FastISel can already fold some sign- and zero-extending loads, but a
number of i64 extension patterns still leave redundant instructions
behind.
This patch series extends load folding to recognize several such cases,
including:
- promoted sign-extension chains
- copy + i64.extend_i32_{u,s} chains
- AND-based zero-extension chains
- shift-based sign-extension chains
When these patterns originate from narrow integer loads, fold them
directly into widened i64 loads instead of materializing intermediate
i32 loads followed by redundant i64 extends.
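As a rough illustration of what the fold produces: once a chain has been matched, the remaining choice is which widened i64 load to emit, given the narrow width and whether the chain sign- or zero-extends. The sketch below assumes both have already been recovered from the matched pattern. `Ext` and `foldedI64Load` are hypothetical names, not FastISel code, though the returned mnemonics are the standard WebAssembly i64 extending loads.

```cpp
#include <string_view>

// Kind of extension implied by the matched chain (hypothetical).
enum class Ext { Zero, Sign };

// Returns the widened i64 load that replaces "narrow load + i64 extend",
// or an empty view when the narrow width has no matching extending load.
constexpr std::string_view foldedI64Load(unsigned narrowBits, Ext ext) {
  const bool isSigned = ext == Ext::Sign;
  switch (narrowBits) {
  case 8:  return isSigned ? "i64.load8_s"  : "i64.load8_u";
  case 16: return isSigned ? "i64.load16_s" : "i64.load16_u";
  case 32: return isSigned ? "i64.load32_s" : "i64.load32_u";
  default: return {};
  }
}
```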
Fixes #179672
WebAssembly FastISel currently fails to fold sign-extension patterns
composed of zero-extending loads followed by shift operations. This
results in redundant shift and constant instructions in the output.
Before:
i32.load8_u $push3=, 0($0)
i32.const $push0=, 24
i32.shl $push1=, $pop3, $pop0
i32.const $push4=, 24
i32.shr_s $push2=, $pop1, $pop4
The matched shift instruction sequence is folded into a single
sign-extending load, and the now-dead instructions are erased via the
MachineBasicBlock iterator.
After:
i32.load8_s $push0=, 0($0)
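Why the shift pair can be dropped: after a zero-extending 8-bit load, `i32.shl` by 24 moves the loaded byte's top bit into bit 31, and `i32.shr_s` by 24 copies it back down, which is exactly sign-extension of the byte. A minimal sketch of that reasoning, assuming the shift amounts are already known constants. `shiftPairIsSExt` and `shlThenShrS` are hypothetical helpers, and the code relies on two's-complement conversion and arithmetic right shift (guaranteed in C++20, and what GCC and Clang do in C++17).

```cpp
#include <cstdint>

// True when "load<loadBits>_u; shl K; shr_s K" on i32 equals a single
// sign-extending load, i.e. when both shifts use K == 32 - loadBits.
constexpr bool shiftPairIsSExt(unsigned loadBits, unsigned shlAmt,
                               unsigned sraAmt) {
  return (loadBits == 8 || loadBits == 16) && shlAmt == sraAmt &&
         shlAmt == 32 - loadBits;
}

// Reference semantics of the shift pair on a zero-extended value: shift
// left in unsigned arithmetic, reinterpret as signed, then shift right
// arithmetically so the former top bit of the narrow value is replicated.
constexpr int32_t shlThenShrS(uint32_t zext, unsigned k) {
  return static_cast<int32_t>(zext << k) >> k;
}
```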
Fixed: #184302
Largely a straightforward replacement with occasional simplifications.
For AMDGPU, I assumed that unconditional branches are always uniform and
therefore "simplified"/changed AMDGPUAnnotateUniformValues to only
annotate conditional branches.
Target-specific FastISel only selects conditional branches;
unconditional branches are already handled by the non-target-specific
code.
BranchInst currently represents both unconditional and conditional
branches. However, these are quite different operations that are often
handled separately. Therefore, split them into separate opcodes and
classes to allow distinguishing these operations in the type system.
Additionally, this also slightly improves compile-time performance.
FastISel emits separate load and AND instructions for bitmasking.
(before) %1:i32 = LOAD_I32 %addr; %2:i32 = AND_I32 %1, 255
Fold AND masks into ZExt loads by verifying operands with
maskTrailingOnes. A getFoldedLoadOpcode wrapper is implemented
to manage dispatching logic for better extensibility.
(after) %1:i32 = LOAD8_U_I32 %addr
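A sketch of the mask check described above, assuming a plain 32-bit AND immediate. `maskTrailingOnes32` is a local stand-in for `llvm::maskTrailingOnes<uint32_t>`, and `zextLoadBitsForMask` is a hypothetical helper, not the `getFoldedLoadOpcode` wrapper itself. Only masks of exactly 8 or 16 trailing ones correspond to an existing zero-extending load; any other mask (for example 0xFE) still needs the AND.

```cpp
#include <cstdint>

// Local stand-in for llvm::maskTrailingOnes<uint32_t>(n): the low n bits set.
constexpr uint32_t maskTrailingOnes32(unsigned n) {
  return n == 0 ? 0u : (n >= 32 ? ~0u : ((1u << n) - 1u));
}

// Width of the zero-extending load that a "load + AND mask" pair could be
// folded into, or 0 if the mask does not match a supported width.
constexpr unsigned zextLoadBitsForMask(uint32_t mask) {
  if (mask == maskTrailingOnes32(8))
    return 8;  // would select an 8-bit unsigned load
  if (mask == maskTrailingOnes32(16))
    return 16; // would select a 16-bit unsigned load
  return 0;    // keep the separate AND
}
```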
Fixed: https://github.com/llvm/llvm-project/issues/180783
The `tryToFoldLoadIntoMI` function omitted materializing base registers
for addresses before folding sign-extend instructions into loads. This
left `$noreg` as the base register, crashing subsequent passes.
WebAssembly memory instructions structurally require a valid base
register. Calling the existing `materializeLoadStoreOperands` function
ensures that a `CONST 0` virtual register is generated when addressing
global variables directly without a pre-existing base register.
(before) %1:i32 = LOAD8_S_I32_A32 0, @ch, $noreg ... -> CRASH
(after)  %3:i32 = CONST_I32 0
         %1:i32 = LOAD8_S_I32_A32 0, @ch, %3:i32 ... -> Folded safely
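A simplified model of the fix, for illustration only: a folded memory operand is treated as an offset, an optional global, and a base register where 0 plays the role of `$noreg`. `MemOperand` and `withMaterializedBase` are hypothetical; in the patch the work is done by `materializeLoadStoreOperands`, which ensures a `CONST 0` virtual register exists to serve as the base.

```cpp
// Hypothetical model of a folded load's address operands.
struct MemOperand {
  unsigned offset;  // immediate offset
  bool hasGlobal;   // addressing a global symbol such as @ch
  unsigned baseReg; // 0 stands for $noreg
};

// Guarantees a valid base register. `freshVReg` stands for a new virtual
// register that would be defined by "CONST_I32 0"; an existing base
// register is left untouched.
constexpr MemOperand withMaterializedBase(MemOperand m, unsigned freshVReg) {
  if (m.baseReg == 0)
    m.baseReg = freshVReg;
  return m;
}
```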
FastISel currently defaults to unsigned loads for i8/i16/i32 types,
leaving any sign-extension to be handled by a separate instruction. This
patch optimizes this by folding the SExtInst into the LoadInst, directly
emitting a signed load (e.g., i32.load8_s).
When a load has a single SExtInst use, selectLoad emits a signed load
and safely removes the redundantly emitted SExtInst.
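A sketch of the decision, restricted to i8/i16 loads producing i32 and assuming the use information is already available. `LoadUses` and `pickI32Load` are hypothetical names. The single-use condition matters: if another user still expects the zero-extended value, switching the load to a signed one would change what that user sees.

```cpp
#include <string_view>

// Hypothetical summary of a load's users.
struct LoadUses {
  unsigned numUses;
  bool onlyUseIsSExtToI32;
};

// Chooses the i32 load mnemonic: signed only when the sole user is a sext,
// otherwise the default unsigned load (the sext is then kept separately).
constexpr std::string_view pickI32Load(unsigned bits, LoadUses u) {
  const bool fold = u.numUses == 1 && u.onlyUseIsSExtToI32;
  if (bits == 8)
    return fold ? "i32.load8_s" : "i32.load8_u";
  if (bits == 16)
    return fold ? "i32.load16_s" : "i32.load16_u";
  return "i32.load";
}
```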
Fixed: #180783
Before, Wasm FastISel treated all indirect calls the same, causing
miscompilations at O0 when trying to call a funcref (`call ptr
addrspace(20)`), as it would treat the funcref as a normal `ptr`.
This adds a check so it falls back to ISelDAG when encountering calls
outside addrspace 0 (which covers direct calls and indirect calls
through normal function pointers).
Related: #140933
Fixes https://github.com/llvm/llvm-project/issues/165438
With `simd128` enabled, FastISel may encounter vector type truncation.
To respect #138479, this patch merely bails out on non-integer IR types,
though I prefer bailing out for all non-simple types as most targets
(X86, AArch64) do.
This patch replaces:
using Foo = enum { A, B, C };
with the more conventional:
enum Foo { A, B, C };
These two enum declaration styles are not identical, but their
difference does not matter in these .cpp files. With the "using Foo"
style, the enum is unnamed and cannot be forward-declared, whereas the
conventional style creates a named enum that can be. Since these
changes are confined to .cpp files, this distinction has no practical
impact here.
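The difference can be seen in a short C++17 snippet. Note that forward-declaring an unscoped enum such as `Named` additionally requires a fixed underlying type (an opaque-enum-declaration); the names below are illustrative only.

```cpp
#include <type_traits>

// "using" style: the enumeration itself is unnamed; Unnamed is only an
// alias, so there is no enum name that could be forward-declared.
using Unnamed = enum { UA, UB, UC };

// Conventional style: Named is the enumeration's own name. With a fixed
// underlying type it can be forward-declared and defined later.
enum Named : int;                // forward declaration
enum Named : int { NA, NB, NC }; // definition

static_assert(std::is_enum_v<Unnamed> && std::is_enum_v<Named>);
// enum Unnamed : int;  // ill-formed: Unnamed names an alias, not an enum
```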
Recently my change to avoid duplicate `dontcall` attribute errors
(#152810) caused the Clang `Frontend/backend-attribute-error-warning.c`
test to fail on Arm32:
<https://lab.llvm.org/buildbot/#/builders/154/builds/20134>
The root cause is that, if the default `FastISel` path bails, then
targets are given the opportunity to lower instructions via
`fastSelectInstruction`. That's the path taken by Arm32, and since its
implementation of `selectCall` didn't call `diagnoseDontCall`, no error
was emitted.
I've checked the other implementations of `fastSelectInstruction`, and
the only other one that lowers call instructions is in WebAssembly, so
I've fixed that too.
Previous logic did not handle the case where the result bit size was
between 32 and 64 bits inclusive. I updated the if-statements for more
precise handling.
An alternative solution would have been to abort FastISel in case the
result type is not legal for FastISel.
Resolves: #64222.
This PR began as an investigation into the root cause of
https://github.com/ziglang/zig/issues/20966.
Godbolt link showing incorrect codegen on 20.1.0:
https://godbolt.org/z/cEr4vY7d4.
This defines some new target features. These are subsets of existing
features that reflect implementation concerns:
- "call-indirect-overlong" - implied by "reference-types"; just the
overlong encoding for the `call_indirect` immediate, and not the actual
reference types.
- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.
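The implication relationship can be pictured as a one-way closure over a feature bitmask: enabling the full feature turns on its subset, but not the other way round. This is a hypothetical sketch; the bit assignments and `withImpliedFeatures` are illustrative and not how LLVM's subtarget feature machinery is implemented.

```cpp
#include <cstdint>

// Hypothetical bit assignments for the four features involved.
enum Feature : uint32_t {
  ReferenceTypes = 1u << 0,
  CallIndirectOverlong = 1u << 1,
  BulkMemory = 1u << 2,
  BulkMemoryOpt = 1u << 3,
};

// Adds the subset features implied by the features already enabled.
constexpr uint32_t withImpliedFeatures(uint32_t fs) {
  if (fs & ReferenceTypes)
    fs |= CallIndirectOverlong;
  if (fs & BulkMemory)
    fs |= BulkMemoryOpt;
  return fs;
}
```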
This is split out from https://github.com/llvm/llvm-project/pull/112035.
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
MachineFunctions probably should not include a backreference to
the owning MachineModuleInfo. Most of these references were used
just to query the MCContext, which MachineFunction already directly
stores. Other contexts are using it to query the LLVMContext, which
can already be accessed through the IR function reference.
When a value is promoted, copying it from one register to another
register of the same type is meaningless.
This PR adds an additional check for this case to reduce code size.
Fixes: #80053.
This patch fixes WebAssembly's FastISel pass to correctly consider
signext/zeroext parameter flags on function declarations.
Previously, only the flags at call sites were considered during code
generation, which led to the bug reported in #63388.
This is problematic especially because in WebAssembly's ABI, either
signext or zeroext can be tagged to a function argument, and it must be
correctly reflected in the generated code. The unit test
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/WebAssembly/signext-zeroext.ll
shows that the generated code differs for `i8 zeroext %t` and `i8 signext %t`.
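The difference between the two flags is easy to see on a single byte: widened to i32, 0xFF becomes 255 under zeroext and -1 under signext, so whichever flag applies must be honored consistently. A minimal C++17 sketch; the helper names are hypothetical, and the signed conversion relies on two's-complement behavior (guaranteed in C++20, and what GCC and Clang do earlier).

```cpp
#include <cstdint>

// zeroext: the upper 24 bits of the i32 are filled with zeros.
constexpr int32_t zeroExtendI8(uint8_t v) { return static_cast<int32_t>(v); }

// signext: the upper 24 bits replicate bit 7 of the byte.
constexpr int32_t signExtendI8(uint8_t v) { return static_cast<int8_t>(v); }
```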
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.
This PR fixes many places that incorrectly always use
`DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these by usages of `GTI.getSequentialElementStride(DL)`,
which is a new helper function added in this PR.
This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size are different. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
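A sketch of the stride rule, assuming the element's bit size and alloc size are already known from the DataLayout. `elementStride` is a hypothetical helper that returns 0 for elements that are not byte-addressable; it only illustrates the distinction, while callers use `GTI.getSequentialElementStride(DL)`. For an overaligned i16 (bit size 16, alloc size 4), index 3 lands at byte 3 * 4 = 12 in an array but at byte 3 * 2 = 6 in a vector.

```cpp
#include <cstdint>

// Per-element stride in bytes for a sequential GEP index.
// Arrays step by the alloc size (which includes alignment padding);
// vectors are bit-packed and step by the element's size in bytes.
constexpr uint64_t elementStride(bool isVector, uint64_t bitSize,
                                 uint64_t allocSize) {
  if (!isVector)
    return allocSize;
  return bitSize % 8 == 0 ? bitSize / 8 : 0; // 0: not byte-addressable
}

// Byte offset of element `index` under the same rule.
constexpr uint64_t elementOffset(bool isVector, uint64_t bitSize,
                                 uint64_t allocSize, uint64_t index) {
  return index * elementStride(isVector, bitSize, allocSize);
}
```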
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Move WebAssemblyUtilities from Utils to the CodeGen library. It
primarily deals in MIR-layer types, so it really belongs in the CodeGen
library.
Move a variety of other things around to try to create better separation.
See issue #64166 for more info on layering.
Move llvm/include/CodeGen/WasmAddressSpaces.h back to
llvm/lib/Target/WebAssembly/Utils.
Differential Revision: https://reviews.llvm.org/D156472
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D130884
This refactors some code dealing with setting Wasm symbol types.
Some of the code dealing with types was moved from
`WebAssemblyUtilities` to `WebAssemblyTypeUtilities`.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D118121
If the icmp is in a different block, then the register for the icmp
operand may not be initialized, as it nominally does not have
cross-block uses. Add a check that the icmp is in the same block
as the branch, which should be the common case.
This matches what X86 FastISel does:
5b6b090cf2/llvm/lib/Target/X86/X86FastISel.cpp (L1648)
The "not" transform that could have a similar issue is dropped
entirely, because it is currently dead: The incoming value is
a branch or select condition of type i1, but this code requires
an i32 to trigger.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51651.
Differential Revision: https://reviews.llvm.org/D108840
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().
Add hasRetAttr() since it was missing from AttributeList.
Reland of 31859f896.
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for loads and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D104797
Reland of 31859f896.
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for loads and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Differential Revision: https://reviews.llvm.org/D104797
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for loads and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D95425
This patch adds support for WebAssembly globals in LLVM IR, representing
them as pointers to global values, in a non-default, non-integral
address space. Instruction selection legalizes loads and stores to
these pointers to new WebAssemblyISD nodes GLOBAL_GET and GLOBAL_SET.
Once the lowering creates the new nodes, tablegen pattern matches those
and converts them to Wasm global.get/set of the appropriate type.
Based on work by Paulo Matos in https://reviews.llvm.org/D95425.
Reviewed By: pmatos
Differential Revision: https://reviews.llvm.org/D101608
This CL
1. Creates Utils/ directory under lib/Target/WebAssembly
2. Moves existing WebAssemblyUtilities.cpp|h into the Utils/ directory
3. Creates Utils/WebAssemblyTypeUtilities.cpp|h and puts type
declarations and type conversion functions scattered in various
places into this single place.
It has been suggested several times that it is not easy to share utility
functions between subdirectories (AsmParser, Disassembler, MCTargetDesc,
...). Sometimes we ended up duplicating the same function because of
this (see https://reviews.llvm.org/D92840#2478863).
There are already other targets doing this: AArch64, AMDGPU, and ARM
have Utils/ subdirectory under their target directory.
This extracts the utility functions into a single directory Utils/ and
makes them shareable among all passes in WebAssembly/ and its
subdirectories. Also I believe gathering all type-related conversion
functionalities into a single place makes it more usable. (Actually I
was working on another CL that uses various type conversion functions
scattered in multiple places, which became the motivation for this CL.)
Reviewed By: dschuff, aardappel
Differential Revision: https://reviews.llvm.org/D100995
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.
As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.
Differential Revision: https://reviews.llvm.org/D98294
Now that the WebAssembly SIMD specification is finalized and engines are
generally up-to-date, there is no need for a separate target feature for gating
SIMD instructions that engines have not implemented. With this change,
v128.const is now enabled by default with the simd128 target feature.
Differential Revision: https://reviews.llvm.org/D98457
The WebAssembly text and binary formats have different operand orders
for the "type" and "table" fields of call_indirect (and
return_call_indirect). In LLVM we use the binary order for the MCInst,
but when we produce or consume the text format we should use the text
order. For compilation units targeting WebAssembly 1.0 (without the
reference types feature), we omit the table operand entirely.
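The operand-order difference can be captured in a small table. The sketch assumes nothing about LLVM's printer or parser; `Operand`, `Order`, `kBinaryOrder`, and `textOrder` are hypothetical. In the binary encoding `call_indirect` carries the type index before the table index, while the text format writes the table first; without reference types the table is omitted.

```cpp
// Which immediate occupies a given position (None: position unused).
enum class Operand { None, Type, Table };

struct Order {
  Operand first;
  Operand second;
};

// Binary-format order, which LLVM keeps in the MCInst: type, then table.
constexpr Order kBinaryOrder{Operand::Type, Operand::Table};

// Order used when printing or parsing the text format. With reference
// types the table comes first; for WebAssembly 1.0 it is omitted.
constexpr Order textOrder(bool referenceTypes) {
  return referenceTypes ? Order{Operand::Table, Operand::Type}
                        : Order{Operand::Type, Operand::None};
}
```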
Differential Revision: https://reviews.llvm.org/D97761
If the reference-types feature is enabled, call_indirect will explicitly
reference its corresponding function table via TABLE_NUMBER
relocations against a table symbol.
As before, address-taken functions can also cause the function
table to be created; with reference-types, they additionally cause a
symbol table entry to be emitted.
Differential Revision: https://reviews.llvm.org/D90948