I introduced new tests in
commit 5cc1016a57b3 ("[llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic")
https://reviews.llvm.org/D140160
that fails expensive checks. Disable -verify-machineinstrs in those
tests for now. Enable it in other tests for now, since MachineVerifier
isn't on by default for assertion builds.
Link: https://github.com/llvm/llvm-project/issues/60827
If the value from constant-pool is a splat value of vector type, do not
need swap after load from constant-pool.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D139491
We currently don't handle tail calls in fast-isel but
we continue with the lowering when -mlongcall is
specified and lower the calls normally. We should
defer to SDISel for this so that it is lowered correctly.
Differential revision: https://reviews.llvm.org/D123997
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instruction that are not bitpreserving it's incorrect to lower
fneg/fabs to use it.
Two of the float materialization patterns use the VSSRC regsiter class. This
register class is not available before Power 8. The patterns will stay the same
for Power 8 and up but must use the class F4RC for Power 7 and earlier.
This patch fixes those patterns.
Reviewed By: nemanjai, amyk, #powerpc
Differential Revision: https://reviews.llvm.org/D142120
With the NPM, we're now defaulting to preserving LCSSA, so a couple
of tests have changed slightly.
Differential Revision: https://reviews.llvm.org/D140982
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.
The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.
Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).
Differential Revision: https://reviews.llvm.org/D135462
YAML specification does not allow keys duplication an a mapping. However, YAML
parser in LLVM does not have any check on that and uses only the last key entry.
In this change duplicated keys are merged to satisfy the spec.
Differential Revision: https://reviews.llvm.org/D141848
Vector store on P8 little endian will have swap instruction added before
the store in PPCISelLowring. If the vector is generated by splat, the
swap instruction can be eliminated.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D139691
For big endian targets that need a node such as this:
v2i8 = bitcast i16:tN
legalized by:
1. Promoting the i16 input type
2. Widening the v2i32 result type
The result will be incorrect because the legalizer will promote
the input type and then produce a scalar_to_vector from that
wider type to a vector of N elements of that type. That puts
the desired bits into the low order bytes of element zero and
they need to be in the high order bytes on big endian systems.
This patch changes the legalization to widen to a vector with
elements of the original scalar size.
Differential revision: https://reviews.llvm.org/D140365
When lowering calls on target like PPC, some stack loads
will be generated for by value parameters. Node CALLSEQ_START
prevents such loads from being combined.
Suggested by @RolandF, this patch removes the unnecessary
loads for the byval parameter by extending ForwardStoreValueToDirectLoad
Reviewed By: nemanjai, RolandF
Differential Revision: https://reviews.llvm.org/D138899
The transform that converts this checks the alignment of the global
object being accessed. However, there was no check for the offset
within the global object which caused the compiler to produce a
DS relocation for an unaligned address.
Summary: Some 64 bit constants can be materialized with fewer instructions than we currently use. We consider a 64 bit immediate value divided into four parts, Hi16OfHi32 (bits 48...63), Lo16OfHi32 (bits 32...47), Hi16OfLo32 (bits 16...31), Lo16OfLo32 (bits 0...15). When any three parts are equal, the immediate can be treated as "almost" a splat of a 32 bit value in a 64 bit register. For such case, we can use 3 instructions to generate the splat and use 1 instruction to modify the different part:
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D139813
* Add -fast-isel=false to func-alias.ll. The test was added as a
SelectionDAG test. Without this option, FastISel successfully selects
the call that had a ConstantExpr argument.
* fast-isel-branch.ll couldn't be handled by FastISel. Now it can,
hence the change in the stack offsets.
These are essentially add/sub 1 with a clamping value.
AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
Follow-up for 4ece50737d5385fb80cfa23f5297d1111f8eed39 (D142027).
Assignment Tracking Analysis now always runs and is skipped internally if
assignment tracking is disabled. Update these tests to expect to see the
pass run.
Buildbot failure: https://lab.llvm.org/buildbot/#/builders/57/builds/24094
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, its easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.
This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.
The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimzation-record.
Fixes: #58168
Reviewed By: nickdesaulniers, thegameg
Differential Revision: https://reviews.llvm.org/D135488
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.
The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.
Differential Revision: https://reviews.llvm.org/D141912
As noted in https://reviews.llvm.org/D141778#inline-1369900,
we fail to produce splat shuffles from certain sequences
of shuffles, that may have non-shuffles in the middle of seq.
There is a big pitfail to avoid here: just because `isSplatValue()`
says that all demanded elements are splat, we can't pick any random
one of them, because some of them could be undef! We must ignore those!
https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas
that have variably-indexed loads. That does bring up questions of cost model,
since that requires creating wide shifts.
Indeed, our legalization for them is not optimal.
We either split it into parts, or lower it into a libcall.
But if the shift amount is by a multiple of CHAR_BIT,
we can also legalize it throught stack.
The basic idea is very simple:
1. Get a stack slot 2x the width of the shift type
2. store the value we are shifting into one half of the slot
3. pad the other half of the slot. for logical shifts, with zero, for arithmetic shift with signbit
4. index into the slot (starting from the base half into which we spilled, either upwards or downwards)
5. load
6. split loaded integer
This works for both little-endian and big-endian machines:
https://alive2.llvm.org/ce/z/YNVwd5
And better yet, if the original shift amount was not a multiple of CHAR_BIT,
we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K
I think, if we are going perform shift->shift-by-parts expansion more than once,
we should instead go through stack, which is what this patch does.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140638
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, its easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.
This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.
The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimzation-record.
Fixes: #58168
Reviewed By: nickdesaulniers, thegameg
Differential Revision: https://reviews.llvm.org/D135488
Summary: The toc-data feature has been supported for assembly file generation.
This patch handles the toc-data for object file generation.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D139516
This is part of selecting `G_ATOMIC*` instructions. Select `isync`, `sync` and `lwsync` in GISel.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D141360
Summary: This patch handles relocation field overflows in an XCOFF32 file. (XCOFF64 files may not have overflow section headers.) If a section has more than 65,534 relocation entries or line number entries, both of these fields are set to a value of 65535. In this case, an overflow section header with the s_flags field equal to STYP_OVRFLO is used to contain the relocation and line-number count information. Since line number is not supported, this patch only handles the relocation overflow.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D137819
These new debug values get inserted after the place where the spill
happens, which means they won't be reached by the reverse traversal of
basic block instructions. This would crash or fail assertions if they
contained any virtual registers to be replaced. We can manually handle
the new debug values right away to resolve this.
Fixes https://github.com/llvm/llvm-project/issues/59172
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D139590
Dynamic tls access model will be lowered to MI which clobbers CTR in
the loop in ISEL(ADDItlsgdLADDR) and post-isel CTR loop pass will revert
the loop to a normal compare + branch form.
So no need to add this clobber check in hardware loop insertion pass now.
Reviewed By: nemanjai
Differential revision: https://reviews.llvm.org/D140367
Passes before hardware loop insertion change the loop to a form which
is not a hardware loop candidate (return early before checking the ctr clobbers).
And the PHI in the loop exit block is also optimized away. This breaks the
previous test point when the case was committed. Fixing this by running this
case just before hardware loop insertion pass.
Reviewed By: nemanjai
Differential revision: https://reviews.llvm.org/D140366
This patch also includes:
1: CRRegBank support
2: Some workarounds in PPC table gen for anyext/setcc patterns
selection.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D140878
Previous to this patch we only materialized 0.0 and all other floating point
values would be loaded from the TOC. This patch adds materialization for the
floating point values that can be represented as integers in [-16.0, 15.0].
For example we will now materialize 3.0 and -5.0 but not 4.7.
Reviewed By: nemanjai, lei, #powerpc
Differential Revision: https://reviews.llvm.org/D138844
If the dividend has leading zeros, we can use them to reduce the
size of the multiplier and avoid the fixup cases.
This patch is for scalars only, but we might be able to do this
for vectors in a follow up.
Differential Revision: https://reviews.llvm.org/D140750