LLVM provides target hooks to recognise stack spill and restore
instructions, such as isLoadFromStackSlot, and it also provides post frame
elimination versions such as isLoadFromStackSlotPostFE. These are supposed
to return the store-source and load-destination registers; unfortunately on
X86, the PostFE recognisers just return "1", apparently to signify "yes
it's a spill/load". This patch alters the hooks to correctly return the
store-source and load-destination registers:
This is really useful for debug-info as we it helps follow variable values
as they move on/off the stack. There should be no codegen changes: the only
other users of these PostFE target hooks are MachineInstr::getRestoreSize
and MachineInstr::getSpillSize, which don't attempt to interpret the
returned register location.
While we're here, delete the (InstrRef) LiveDebugValues heuristic that
tries to find the spill source register by looking for a killed reg -- we
should be able to rely on the target hooks for that. This involves
temporarily turning off a n InstrRef LivedDebugValues test on aarch64
(patch to re-enable it is in D104521).
Differential Revision: https://reviews.llvm.org/D105428
This reverts commit 84ed3a794b4ffe7bd673f1e5a17d507aa3113d12.
A number of clang tests are also affected by this change. Revert
until I can update them.
This reverts commit 52aeacfbf5ce5f949efe0eae029e56db171ea1f7.
There isn't full agreement on a path forward yet, but there is agreement that
this shouldn't land as-is. See discussion on https://reviews.llvm.org/D105338
Also reverts unreviewed "[clang] Improve `-Wnull-dereference` diag to be more in-line with reality"
This reverts commit f4877c78c0fc98be47b926439bbfe33d5e1d1b6d.
And all the related changes to tests:
This reverts commit 9a0152799f8e4a59e0483728c9f11c8a7805616f.
This reverts commit 3f7c9cc27422f7302cf5a683eeb3978e6cb84270.
This reverts commit 329f8197ef59f9bd23328b52d623ba768b51dbb2.
This reverts commit aa9f58cc2c48ca6cfc853a2467cd775dc7622746.
This reverts commit 2df37d5ddd38091aafbb7d338660e58836f4ac80.
This reverts commit a72a44181264fd83e05be958c2712cbd4560aba7.
Summary:
The bit order of the has_vec and longtbtable bits in the traceback table generated by the XL compiler flipped at some point after v12.1. This is different from the definition is the AIX header debug.h. The change in the XL compiler that caused the deviation from the OS header definition was unintentional. Since both orderings are extant and the XL compiler runtime also expects the ordering defined by the OS, we will correct the output from LLVM to match the defined ordering given by the OS (which is also consistent with the Assembler Language Reference). Mitigation for traceback tables encoded with the wrong ordering is required for either ordering.
Reviewers: XingXue, HubertTong
Differential Revision: https://reviews.llvm.org/D105487
This reverts commit 4e413e16216d0c94ada2171f3c59e0a85f4fa4b6,
which landed almost 10 months ago under premise that the original behavior
didn't match reality and was breaking users, even though it was correct as per
the LangRef. But the LangRef change still hasn't appeared, which might suggest
that the affected parties aren't really worried about this problem.
Please refer to discussion in:
* https://reviews.llvm.org/D87399 (`Revert "[InstCombine] erase instructions leading up to unreachable"`)
* https://reviews.llvm.org/D53184 (`[LangRef] Clarify semantics of volatile operations.`)
* https://reviews.llvm.org/D87149 (`[InstCombine] erase instructions leading up to unreachable`)
clang has `-Wnull-dereference` which will diagnose the obvious cases
of null dereference, it was adjusted in f4877c78c0fc98be47b926439bbfe33d5e1d1b6d,
but it will only catch the cases where the pointer is a null literal,
it will not catch the cases where an arbitrary store is expected to trap.
Differential Revision: https://reviews.llvm.org/D105338
Its proving tricky to move this to the generic legalizer code, so manually insert the v2i32 subvector into v4i32, insert the AssertSext/AssertZext node, then extract the subvector again.
This avoids masks in the truncation/pack code, which means we avoid a PSHUFB in the fp_to_sint/uint code for sub-128 bit types (specific targets can still combine the packs to a pshufb if they have fast variable per-lane shuffles).
This was noticed when I was trying to improve fp_to_sint/uint costs with D103695 (and some targets had very high fp_to_sint costs due to the PSHUFB), so we can then update the fp_to_uint codegen from D89697.
When the instruction has imm form and fed by LI, we can remove the redundat LI instruction.
Below is an example:
```
renamable $x5 = LI8 2
renamable $x4 = exact SRD killed renamable $x4, killed renamable $r5, implicit $x5
```
will be converted to:
```
renamable $x5 = LI8 2
renamable $x4 = exact RLDICL killed renamable $x4, 62, 2, implicit killed $x5
```
But when we do this optimization, we forget to remove implicit killed $x5
This bug has caused a lnt case error. This patch is to fix above bug.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D85288
The rest of the SOP instructions implicitly set SCC and not
suitable for the rematerialization.
Differential Revision: https://reviews.llvm.org/D105670
Override the `shouldScalarizeBinop` target lowering hook using the same
implementation used in the x86 backend. This causes `extract_vector_elt`s of
vector binary ops to be scalarized if the scalarized version would be supported.
Differential Revision: https://reviews.llvm.org/D105646
Revived D101297 in its original form + added some changes in X86
legalization cehcking for masked gathers.
This solution is the most stable and the most correct one. We have to
check the legality before trying to build the masked gather in SLP.
Without this check we have incorrect cost (for SLP) in case if the masked gather
is not legal/slower than the gather. And we're missing some
vectorization opportunities.
This can be fixed in the cost model, but in this case we need to add
special checks for the cost of GEPs for ScatterVectorize node, add
special check for small trees, etc., i.e. there are a lot of corner
cases here and there, which insrease code base and make it harder to
maintain the code.
> Can't we rely on cost model to deal with this? This can be profitable for futher vectorization, when we can start from such gather loads as seed.
The question from D101297. Actually, no, it can't. Actually, simple
gather may give us better result, especially after we started
vectorization of insertelements. Plus, like I said before, the cost for
non-legal masked gathers leads to missed vectorization opportunities.
Differential Revision: https://reviews.llvm.org/D105042
Additionally, lower the floating point compare SVE intrinsics to
SETCC_MERGE_ZERO ISD nodes to avoid duplicating ISel patterns.
Differential Revision: https://reviews.llvm.org/D105486
WebAssembly's shift instructions implicitly masks the shift count, so optimize
out redundant explicit masks of the shift count. For vector shifts, this
currently only works if the mask is applied before splatting the shift count,
but this should be addressed in a future commit. Resolves PR49655.
Differential Revision: https://reviews.llvm.org/D105600
Lowering for scalar to vector would skip if current subtarget is big
endian and the scalar is larger or equal than 64 bits. However there's
some issue in implementation that SToVRHS may refer to SToVLHS's scalar
size if SToVLHS is present, which leads to some crash.o
Reviewed By: nemanjai, shchenz
Differential Revision: https://reviews.llvm.org/D105094
This patch prevents GlobalISel from optimizing out redundant branch
instructions when compiling without optimizations.
The motivating example is code like the following common pattern in
Swift, where users expect to be able to set a breakpoint on the early
exit:
public func f(b: Bool) {
guard b else {
return // I would like to set a breakpoint here.
}
...
}
The patch modifies two places in GlobalISEL: The first one is in
IRTranslator.cpp where the removal of redundant branches is made
conditional on the optimization level. The second one is in
AArch64InstructionSelector.cpp where an -O0 *only* optimization is
being removed.
Disabling these optimizations increases code size at -O0 by
~8%. However, doing so improves debuggability, and debug builds are
the primary reason why developers compile without optimizations. We
thus concluded that this is the right trade-off.
rdar://79515454
Differential Revision: https://reviews.llvm.org/D105238
There are some patterns involving the permuted scalar to vector node
for which we don't have patterns without direct moves on little endian
subtargets. This causes selection errors. While we can of course add
the missing patterns, any additional effort to make this work is not
useful since there is no support for any CPU that can run in
little endian mode and does not support direct moves.
Adding usage of VSSRC and VSFRC when adding the live in registers on AIX.
This matches the behaviour of the rest of PPC Subtargets.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D104396
We already have reassociation code for Adds and Ors separately in DAG
combiner, this adds it for the combination of the two where Ors act like
Adds. It reassociates (add (or (x, c), y) -> (add (add (x, y), c)) where
we know that the Ors operands have no common bits set, and the Or has
one use.
Differential Revision: https://reviews.llvm.org/D104765
This will use the python that LLVM was configured to use rather than
python from PATH.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D105224
The combine was disabled in 4e22c7265d86 as it caused failures in
the ppc64be-multistage (bootstrap) bot.
It turns out that the combine did not correctly update the MMO for
the high load which caused aliased stores to be reported as unaliased.
This patch fixes that problem and re-enables the combine.
There are cases where infer address spaces pass cannot yet
infer an address space in the opt pipeline and then in the
llc pipeline it runs too late for atomic expand pass to
benefit from a specific address space.
Move atomic expand pass past the infer address spaces.
Fixes: SWDEV-293410
Differential Revision: https://reviews.llvm.org/D105511
This applies to memory accesses to (compile-time) constant addresses
(such as memory-mapped registers). Currently when a misaligned access
to such an address is detected, a fatal error is reported. This change
will emit a remark, and the compilation will continue with a trap,
and "undef" (for loads) emitted.
This fixes https://llvm.org/PR50838.
Differential Revision: https://reviews.llvm.org/D50524
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.
There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.
The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.
As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base. This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.
This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes). The resulting changes
seem okay to me, but suggestions welcome. As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.
I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.
Recommitting with fix to MemoryDepChecker::isDependent.
Differential Revision: https://reviews.llvm.org/D104806
These are fp->int conversions using either RMM or dynamic rounding modes.
The lround and lrint opcodes have a return type of either i32 or
i64 depending on sizeof(long) in the frontend which should follow
xlen. llround/llrint should always return i64 so we'll need a libcall
for those on rv32.
The frontend will only emit the intrinsics if -fno-math-errno is in
effect otherwise a libcall will be emitted which will not use
these ISD opcodes.
gcc also does this optimization.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D105206
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.
There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.
The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.
As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base. This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.
This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes). The resulting changes
seem okay to me, but suggestions welcome. As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.
I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.
Differential Revision: https://reviews.llvm.org/D104806
The odd register of a (128 bit) register pair is accessed with the 'N' code
with an inline assembly operand.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D105502
This adds a DAG combine to detect sext/zext inputs and emit a
new ISD opcode. The extends will either be removed or replaced
with narrower extends.
Isel patterns are used to match add and widening mul to vwmacc
similar to the recently added vmacc patterns.
There's still some work to be to match vmulsu.
We should also rewrite splats that were extended as scalars and
then splatted.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D104802
Benchmarking has shown that it is worthwhile to implement a variable length
memset of 0 with XC (exclusive or) like gcc does, instead of using a libcall.
This requires the use of the EXecute Relative Long (EXRL) instruction which
can now be done in a framework that can also be used with other target
instructions (not just XC).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D103865
This avoids the use of the vector unit for copying from scalar to
vector. There is an extra ptrue instruction, but a predicate register
with the ptrue pattern populated is likely to be free in the context of
real code.
Tests were generated from a template to cover the axes mentioned at the
top of the test file.
Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com>
Differential Revision: https://reviews.llvm.org/D103170
Set informational fields in the .shader_functions table.
Also correct the documentation, .scratch_memory_size and .lds_size are
integers.
Differential Revision: https://reviews.llvm.org/D105116
This patch implaments the load and reserve and store conditional
builtins for the PowerPC target, in order to have feature parody with
xlC on AIX.
Differential revision: https://reviews.llvm.org/D105236