On AArch64, it is safe to let the linker handle relaxation of
unconditional branches; in most cases, the destination is within range,
and the linker doesn't need to do anything. If the linker does insert
fixup code, it clobbers the x16 inter-procedural register, so x16 must
be available across the branch before linking. If x16 isn't available,
but some other register is, we can relax the branch either by spilling
x16 OR using the free register for a manually-inserted indirect branch.
This patch builds on D145211. While that patch is for correctness, this
one is for performance of the common case. As noted in
https://reviews.llvm.org/D145211#4537173, we can trust the linker to
relax cross-section unconditional branches across which x16 is
available.
Programs that use machine function splitting care most about the
performance of hot code at the expense of the performance of cold code,
so we prioritize minimizing hot code size.
Here's a breakdown of the cases:
Hot -> Cold [x16 is free across the branch]
Do nothing; let the linker relax the branch.
Cold -> Hot [x16 is free across the branch]
Do nothing; let the linker relax the branch.
Hot -> Cold [x16 used across the branch, but there is a free register]
Spill x16; let the linker relax the branch.
Spilling requires fewer instructions than manually inserting an
indirect branch.
Cold -> Hot [x16 used across the branch, but there is a free register]
Manually insert an indirect branch.
Spilling would require adding a restore block in the hot section.
Hot -> Cold [No free regs]
Spill x16; let the linker relax the branch.
Cold -> Hot [No free regs]
Spill x16 and put the restore block at the end of the hot function; let the linker relax the branch.
Ex:
[Hot section]
func.hot:
... hot code...
func.restore:
... restore x16 ...
B func.hot
[Cold section]
func.cold:
... spill x16 ...
B func.restore
Putting the restore block at the end of the function instead of
just before the destination increases the cost of executing the
store, but it avoids putting cold code in the middle of hot code.
Since the restore is very rarely taken, this is a worthwhile
tradeoff.
Differential Revision: https://reviews.llvm.org/D156767
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.
This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.
Alive2: https://alive2.llvm.org/ce/z/2UpSbJ
Differential Revision: https://reviews.llvm.org/D159198
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.
This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.
Alive2: https://alive2.llvm.org/ce/z/2UpSbJ
Differential Revision: https://reviews.llvm.org/D159198
On PPC there are instructions to store element from vector(e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail
constant in memset and constant splat array initialization.
This patch tries to explore these opportunities.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138883
In some cases where the same mask is used for multiple
extending masked loads it can be more efficient to combine
the zero- or sign-extend into the load even if it's not a
legal or custom operation. This leads to splitting up the
extending load into smaller parts, which also requires
splitting the mask. For SVE at least this improves the
performance of the SPEC benchmark x264 slightly on
neoverse-v1 (~0.3%), and at least one other benchmark
improves by around 30%. The uplift for SVE seems due to
removing the dependencies (vector unpacks) introduced
between the loads and the vector operations, since this
should increase the level of parallelism.
See tests:
CodeGen/AArch64/sve-masked-ldst-sext.ll
CodeGen/AArch64/sve-masked-ldst-zext.ll
https://reviews.llvm.org/D159191
Remove CodeGen leftovers from the old combiner backend and adapt the API to fit the new backend better.
It's now quite a bit closer to how InstructionSelector works.
- `CombinerInfo` is now a simple "options" struct.
- `Combiner` is now the base class of all TableGen'd combiner implementation.
- Many fields have been moved from derived classes into that class.
- It has been refactored to create & own the Observer and Builder.
- `tryCombineAll` TableGen'd method can now be renamed, which allows targets to implement the actual `tryCombineAll` call manually and do whatever they want to do before/after it.
Note: `CombinerHelper` needs to be mutable because none of its methods are const. This can be revisited later.
Depends on D158710
Reviewed By: aemerson, dsanders
Differential Revision: https://reviews.llvm.org/D158713
uses when looking for load/store users. This was a simple logic bug during translation
of the equivalent function in SelectionDAG:
```
for (SDNode *Node : N->uses()) {
if (auto *LoadStore = dyn_cast<MemSDNode>(Node)) {
```
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.
https://reviews.llvm.org/D157871
If llvm-reduce is going to unconditionally verify functions with
LiveIntervals, it needs to be tolerant of generic vregs.
https://reviews.llvm.org/D133813
Extension to the signbit case, if the signbits extend down through all the demanded bits then SMIN/SMAX/UMIN/UMAX nodes can be simplified to a OR/AND/AND/OR.
Alive2: https://alive2.llvm.org/ce/z/mFVFAn (general case)
Differential Revision: https://reviews.llvm.org/D158364
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.
There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.
I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.
https://reviews.llvm.org/D123143
With the large code model, the label difference may not fit into 32 bits.
Even if we assume that any individual function is no larger than 2^32
and use a difference from the function entry to the target destination,
things like BOLT can rearrange blocks (even if BOLT doesn't necessarily
work with the large code model right now).
set directives avoid static relocations in some 32-bit entry cases, but
don't worry about set directives for 64-bit jump table entries (we can
do that later if somebody really cares about it).
check-llvm in a bootstrapped clang with the large code model passes.
Fixes#62894
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D159297
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
Should add some minor type safety to the use of this information, since
there's quite a bit of metadata being laundered through an `unsigned`.
I'm looking to potentially add more bitfields to that `unsigned`, but I
find InlineAsm's big ol' bag of enum values and usage of `unsigned`
confusing, type-unsafe, and un-ergonomic. These can probably be better
abstracted.
I think the lack of static_cast outside of InlineAsm indicates the prior
code smell fixed here.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D159242
When replacing ComplexDeinterleavingPass::ReductionOperation, we can do it
either from the Real or Imaginary part. The correct way is to take whichever
is later in the BasicBlock, but before the patch, we just always took the
Real part.
Fixes https://github.com/llvm/llvm-project/issues/65044
Differential Revision: https://reviews.llvm.org/D159209
This adds some more extensive test coverage for fdiv through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases and moving it into the position of the existing code which is no longer
needed.
From the discussion in https://reviews.llvm.org/D158853, moving the truncate
into the splat helps more splatted scalar operands get selected on RISC-V, and
also avoids the need for splat_vector_parts on RV32.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D159147
That matches how such a SPLAT_VECTOR would have been type legalized
so assume it is ok to use for creating constants after type legalization.
Still need some improvements to SPLAT_VECTOR lowering.
This overlaps with some of what D158742 was trying to fix.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D158870
Before elimination of mostly empty block it makes sense to remove dead PHI nodes.
It open more opportunity for elimination plus eliminates dead code itself.
It appeared that change results in failing many unit tests and some of
them I've updated and for another one I disable this optimization.
The pattern I observed in the tests is that there is a infinite loop
without side effects. As a result after elimination of dead phi node all other
related instruction are also removed and tests stops to check what it is expected.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D158503
Jump tables on AArch64 are label-relative rather than table-relative, so
having jump table destinations that are in different sections causes
problems with relocation. Jump table lookups have a max range of 1MB, so
all destinations must be in the same section as the lookup code. Both of
these restrictions can be mitigated with some careful and complex logic,
but doing so doesn't gain a huge performance benefit.
Efficiently ensuring jump tables are correct and can be compressed on
AArch64 is a TODO item. In the meantime, don't split blocks that can
cause problems.
Differential Revision: https://reviews.llvm.org/D157124
We can work out the known bits for a given lane by concatenating the known bits of each scalar operand.
In the description of ISD::SPLAT_VECTOR_PARTS in ISDOpcodes.h it says that the
total size of the scalar operands must cover the output element size, but I've
added a stricter assertion here that the total width of the scalar operands
must be exactly equal to the element size. It doesn't seem to trigger, and I'm
not sure if there any targets that use SPLAT_VECTOR_PARTS for anything other
than v4i32 -> v2i64 splats.
We also need to include it in isTargetCanonicalConstantNode, otherwise
returning the known bits introduces an infinite combine loop.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D158852
This improves some cases where a splat_vector uses a build_pair that can be
simplified, e.g:
(rotl x:i64, splat_vector (build_pair x1:i32, x2:i32))
rotl only demands the bottom 6 bits, so this patch allows it to simplify it to:
(rotl x:i64, splat_vector (build_pair x1:i32, undef:i32))
Which in turn improves some cases where a splat_vector_parts is lowered on
RV32.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D158839
This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3.
Causes crashes, see comments in https://reviews.llvm.org/D149367.
Some follow-up fixes are also reverted:
This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085.
This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6.
This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.
The option was added in github.com/llvm/llvm-project/commit/90ab85a but it doesn't seem to be used. The triple check has been removed so this shouldn't be required going forward.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D158885
This patch fixes:
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:3468:14: error:
variable 'foundJTI' set but not used
[-Werror,-Wunused-but-set-variable]