The extending loads combine tries to prefer sign-extends folding into loads vs
zexts, and in cases where a G_ZEXTLOAD is first used by a G_ZEXT, and then used
by a G_SEXT, it would select the G_SEXT even though the load is already
zero-extending.
Fixes issue #59630
This patch introduces a new type __externref_t that denotes a WebAssembly opaque
reference type. It also implements builtin __builtin_wasm_ref_null_extern(),
that returns a null value of __externref_t. This lays the ground work
for further builtins and reference types.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D122215
For in-order cores MachineCombiner makes better decisions when the critical path
is calculated only for the current basic block and does not take into account
other blocks from the trace.
This patch adds a virtual method to TargetInstrInfo to allow each target decide
which strategy to use.
Depends on D140541
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140542
Some of TargetLowering functions needed opcodes are often used in DAGCombiner.
The patch make those MatchContextClass classes have TargetLowering members and
pass specific opcodes for those TargetLowering functions.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D144075
Now that we've inserted a call to an intrinsic, we need to update
certain previous uses of CallBrInst values to use the value of this
intrinsic instead.
There are 3 cases to handle:
1. The @llvm.callbr.landingpad.<type>() intrinsic call is in the same
BasicBlock as the use of the callbr we're replacing.
2. The use is dominated by the direct destination.
3. The use is not dominated by the direct destination, and may or may
not be dominated by the indirect destination.
Part 2c of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8.
Reviewed By: efriedma, void, jyknight
Differential Revision: https://reviews.llvm.org/D139970
We are about to allow different trace strategies for MachineCombiner. Make
the name of the ensemble strategy-neutral.
Depends on D140540
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140541
This strategy makes each trace local to the basic block. For in-order cores some
heuristics work better when we do local decisions. For example, MachineCombiner
may expect that instructions outside the current basic block do not lengthen
the critical path when we execute instructions in order or the core has a
small re-order buffer.
This patch only introduce the strategy, real use-case is added in the further
pathes.
Depends on D140539
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140540
Recommit bbdf24357932b064f2aa18ea1356b474e0220dde.
Original commit message:
If a chain of two selects share a true/false value and are controlled
by two setcc nodes, that are never both true, we can fold away one of
the selects. So, the following:
(select (setcc X, const0, eq), Y,
(select (setcc X, const1, eq), Z, Y))
Can be combined to:
select (setcc X, const1, eq) Z, Y
Differential Revision: https://reviews.llvm.org/D142535
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instruction that are not bitpreserving it's incorrect to lower
fneg/fabs to use it.
Make forward declaration possible to reduce amount of dependencies and reduce
re-compilation burden caused by further patches.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140539
Requiring a bitcast to exist was unhelpful. The most basic cases
are always going to be a CopyFromReg or load, so they would need
a new cast inserted. Don't require a bitcast if it's a free
operation. I don't think this logic makes particularly much sense
(it seems to be imparting special interpretation of bitcast), but
this needs to be in sync with foldSignChangeInBitcast.
We should also get rid of this hasBitPreservingFPLogic hook. fabs/fneg
are bitpreserving or incorrectly implemented, so this should just be a
regular legality check.
While working on D143731 I hit a case where a build_vector with 2 undef operands could be generated (with one undef hidden behind a bitcast).
That made `reduceBuildVecTruncToBitCast` crash because it seems to assume there is at least one good operand.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D143886
Remove a dead masked store if another one has the same base pointer and mask or
the following store has all true constant mask and size if equal or bigger to
the first store.
Differential Revision: https://reviews.llvm.org/D143069
With the NPM, we're now defaulting to preserving LCSSA, so a couple
of tests have changed slightly.
Differential Revision: https://reviews.llvm.org/D140982
This change moves "DefaultVLIWScheduler" class declaration from
DFAPacketizer.cpp to DFAPacketizer.h.
This is needed because there is a protected class member of
type "DefaultVLIWScheduler*" in "VLIWPacketizerList" class.
The derived classes cannot use this memeber unless declaration
is available to it. More specifically :
// Without this change
```
class HexagonPacketizerList : public VLIWPacketizerList {
public :
HexagonPacketizerList() {
// Below line will cause incomplete class error since
// declaration was not available through header.
VLIWScheduler->schedule();
}
}
```
Reviewed By: kparzysz
Differential Revision: https://reviews.llvm.org/D139767
This fixes a few places where the addrx3 and strx3 forms were missed.
Previously this meant if one of these forms appeared somewhere various
errors could occur. This now also adds an extra test case for the addrx3
form (which previously failed).
Differential Revision: https://reviews.llvm.org/D143488
Without this patch `getDerefOffsetInBytes` incorrectly always returns
`std::nullopt` for expressions with fragments due to an off-by-one error with
fragment element indices.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D143567
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which
left us with a lot of unmerge/merge pairs that could be simplified.
These often got in the way of pattern matching and made codegen
worse.
This patch:
- Makes the necessary changes to the merge/unmerge combines so they can run post RegBankSelect
- Adds relevant unmerge combines to the list of RegBankSelect combines for AMDGPU
- Updates some tablegen patterns that were missing explicit cross-regbank copies (V_BFI patterns were causing constant bus violations with this change).
This seems to be mostly beneficial for code quality.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D142192
**Summary**
After this patch, `DW_TAG_imported_declaration`s will be emitted into
the DWARF accelerator tables (under `.apple_namespaces`)
**Motivation**
Currently LLDB expression evaluation doesn't see through namespace
aliases. This is because LLDB only considers namespaces that are
part of `.apple_namespaces` when building a nested namespace
identifier for C++, which currently doesn't include import
declarations. The alternative to putting imports into accelerator
tables is to do a linear scan of a `DW_TAG_namespace` and look
for import declarations that look like they would satisfy the lookup
request, which is prohibitively expensive.
**Testing**
* Added unit-test
Differential Revision: https://reviews.llvm.org/D143397
The motivation behind this patch is to unify some of the outliner logic across architectures. This looks nicer in general and makes fixing [issues like this](https://reviews.llvm.org/D124707#3483805) easier.
There are some notable changes here:
1. `isMetaInstruction()` is used directly instead of checking for specific meta-instructions like `IMPLICIT_DEF` or `KILL`. This was already done in the RISC-V implementation, but other architectures still did hardcoded checks.
- As an exception to this, CFI instructions are explicitly delegated to the target because RISC-V has different handling for those.
2. `isTargetIndex()` checks are replaced with an assert; none of the architectures supported actually use `MO_TargetIndex` at this point in time.
3. `isCFIIndex()` and `isFI()` checks are also replaced with asserts, since these operands should not exist in [any context](https://reviews.llvm.org/D122635#3447214) at this stage in the pipeline.
Reviewed by: paquette
Differential Revision: https://reviews.llvm.org/D125072
This reverts commit cce239c45d6ef3865a017b5b3f935964e0348734.
HHVM calling conventions are unused. Remove them by partially reverting the commit.
Reviewed By: MaskRay, MatzeB
Differential Revision: https://reviews.llvm.org/D124330
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.
The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.
Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).
Differential Revision: https://reviews.llvm.org/D135462
This function was added for ARM targets, but aligning global/stack pointer
arguments passed to memcpy/memmove/memset can improve code size and
performance for all targets that don't have fast unaligned accesses.
This adds a generic implementation that adjusts the alignment to pointer
size if unaligned accesses are slow.
Review D134168 suggests that this significantly improves performance on
synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134282
This patch adds a basic block profile dump option within the AsmPrinter
and dumps basic block profile information so that cost models can use
the data for downstream analysis.
Differential Revision: https://reviews.llvm.org/D143311
One of the cleanups necessary for D136529 - another being how we're going to handle moving freeze through multiple result nodes (like uaddo and subcarry)
Emit all constant integers produced by SanitizerBinaryMetadata as
ULEB128 to further reduce binary space used. Increasing the version is
not necessary given this change depends on (and will land) along with
the bump to v2.
To support this, the !pcsections metadata format is extended to allow
for per-section options, encoded in the first MD operator which must
always be a string and contain the section: "<section>!<options>".
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D143484
Optimize the encoding of "covered" metadata by:
1. Reducing feature mask from 4 bytes to 1 byte (needs increase once we
reach more than 8 features).
2. Only emitting UAR stack args size if it is non-zero, saving 4 bytes
in the common case.
One caveat is that the emitted metadata for function PC (offset), size,
and UAR size (if enabled) are no longer aligned to 4 bytes.
SanitizerBinaryMetadata version base is increased to 2, since the change
is backwards incompatible.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D143482
So long as the operation is reassociative, we can reassociate the double
vecreduce from for example fadd(vecreduce(a), vecreduce(b)) to
vecreduce(fadd(a,b)). This will in general save a few instructions, but some
architectures (MVE) require the opposite fold, so a shouldExpandReduction is
added to account for it. Only targets that use shouldExpandReduction will be
affected.
Differential Revision: https://reviews.llvm.org/D141870