3554 Commits

Author SHA1 Message Date
Nick Desaulniers
39811e2e53 [llvm][test] enable/disable -verify-machineinstrs where possible for callbr
I introduced new tests in
commit 5cc1016a57b3 ("[llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic")
https://reviews.llvm.org/D140160
that fail expensive checks. Disable -verify-machineinstrs in those
tests for now. Enable it in other tests for now, since MachineVerifier
isn't on by default for assertion builds.

Link: https://github.com/llvm/llvm-project/issues/60827
2023-02-16 20:28:18 -08:00
Nick Desaulniers
a3a84c9e25 [llvm] add CallBrPrepare pass to pipelines
Capstone of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Clang changes are still necessary to enable the use of outputs along
indirect edges of asm goto statements.

Link: https://github.com/llvm/llvm-project/issues/53562

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D140180
2023-02-16 17:58:34 -08:00
Nick Desaulniers
5cc1016a57 [llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic
Given a CallBrInst, retain its first virtual register in SelectionDAGBuilder's
FunctionLoweringInfo if there's a corresponding landingpad. Walk the list
of COPY MachineInstrs to find the original virtual and physical registers
defined by the INLINEASM_BR MachineInstr.

Test cases from https://reviews.llvm.org/D139565.
Link: https://github.com/llvm/llvm-project/issues/59538

Part 3 from
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Follow up patches still need to wire up CallBrPrepare into the pass
pipelines.

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D140160
2023-02-16 17:58:34 -08:00
Ting Wang
52a774fd4c [PowerPC] remove XXSWAPD after load from CP which is a splat value
If the value loaded from the constant pool is a splat value of vector type,
no swap is needed after the load.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D139491
2023-02-16 19:21:35 -05:00
Nemanja Ivanovic
56e41fcf50 [PowerPC] Bail out of FISel when lowering long calls
We currently don't handle tail calls in fast-isel but
we continue with the lowering when -mlongcall is
specified and lower the calls normally. We should
defer to SDISel for this so that it is lowered correctly.

Differential revision: https://reviews.llvm.org/D123997
2023-02-16 16:15:32 -05:00
Matt Arsenault
09dd4d870e DAG: Remove hasBitPreservingFPLogic
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instructions that are not bit preserving, it's incorrect to lower
fneg/fabs to use them.
2023-02-14 10:25:24 -04:00
Arthur Eubanks
7c6b46e87e Revert "[DAGCombiner] handle more store value forwarding"
This reverts commit f35a09daebd0a90daa536432e62a2476f708150d.

Causes miscompiles, see D138899
2023-02-13 19:07:28 -08:00
Chen Zheng
6ee2f770ef [PowerPC][GISel] add support for fpconstant
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133340
2023-02-14 02:39:22 +00:00
Stefan Pintilie
2e47aafb02 [PowerPC] Fix float materialization patterns.
Two of the float materialization patterns use the VSSRC register class. This
register class is not available before Power 8. The patterns will stay the same
for Power 8 and up but must use the class F4RC for Power 7 and earlier.

This patch fixes those patterns.

Reviewed By: nemanjai, amyk, #powerpc

Differential Revision: https://reviews.llvm.org/D142120
2023-02-13 10:18:53 -05:00
Samuel Parker
2a58be4239 [HardwareLoops] NewPM support.
With the NPM, we're now defaulting to preserving LCSSA, so a couple
of tests have changed slightly.

Differential Revision: https://reviews.llvm.org/D140982
2023-02-13 09:46:31 +00:00
Andrew Savonichev
c65b4d64d4 [SelectionDAG] Do not second-guess alignment for alloca
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.

The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.

Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).

Differential Revision: https://reviews.llvm.org/D135462
2023-02-09 18:45:20 +03:00
Anton Sidorenko
6820cb2dd5 [Test] Fix YAML mapping keys duplication. NFC.
The YAML specification does not allow duplicate keys in a mapping. However, the YAML
parser in LLVM does not check for that and uses only the last key entry.
In this change duplicated keys are merged to satisfy the spec.

Differential Revision: https://reviews.llvm.org/D141848
2023-02-09 12:59:50 +03:00
Kai Luo
96aaebd12e [MachineCopyPropagation] Eliminate spillage copies that might be caused by eviction chain
Remove spill-reload like copy chains. For example
```
r0 = COPY r1
r1 = COPY r2
r2 = COPY r3
r3 = COPY r4
<def-use r4>
r4 = COPY r3
r3 = COPY r2
r2 = COPY r1
r1 = COPY r0
```
will be folded into
```
r0 = COPY r1
r1 = COPY r4
<def-use r4>
r4 = COPY r1
r1 = COPY r0
```
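
The example above can be sanity-checked with a tiny register-file
interpreter. This is only a sketch for checking the example, not the pass
itself: `run`, `ORIGINAL`, `FOLDED` and the way `<def-use r4>` is modelled
(r4 is read and then redefined) are all illustrative assumptions.

```python
def run(program, regs):
    """Interpret a list of COPYs given as (dst, src) pairs.

    The string "def-use r4" stands in for <def-use r4>: it records the value
    it observed in r4 and then redefines r4.
    """
    regs = dict(regs)
    observed = []
    for op in program:
        if op == "def-use r4":
            observed.append(regs["r4"])
            regs["r4"] = ("redefined", regs["r4"])
        else:
            dst, src = op
            regs[dst] = regs[src]
    return regs, observed


ORIGINAL = [
    ("r0", "r1"), ("r1", "r2"), ("r2", "r3"), ("r3", "r4"),
    "def-use r4",
    ("r4", "r3"), ("r3", "r2"), ("r2", "r1"), ("r1", "r0"),
]

FOLDED = [
    ("r0", "r1"), ("r1", "r4"),
    "def-use r4",
    ("r4", "r1"), ("r1", "r0"),
]

# Distinct symbolic values so any mix-up between registers is visible.
INITIAL = {f"r{i}": f"v{i}" for i in range(5)}

original_state = run(ORIGINAL, INITIAL)
folded_state = run(FOLDED, INITIAL)
```

Both sequences leave r0..r4 in the same final state and the def-use sees
the same value in r4. The sketch does not model any use of r2 or r3 between
the two halves of the chain.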

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D122118
2023-02-08 03:34:25 +00:00
Simon Pilgrim
9ffe58dc27 [PowerPC] aix32-cc-abi-vaarg.ll - improve DAG checks
More closely match the actual output and should make the merge with D127115 easier.
2023-02-04 11:17:36 +00:00
Ting Wang
1d8f13ae45 [PowerPC] add a peephole to remove redundant swap instructions after vector splats on P8
A vector store on P8 little endian has a swap instruction added before
the store in PPCISelLowering. If the vector is generated by a splat, the
swap instruction can be eliminated.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D139691
2023-02-02 20:52:52 -05:00
Nemanja Ivanovic
a5b662a834 [SelectionDAG] Correctly widen bitcast of scalar to vector for big endian
For big endian targets that need a node such as this:
v2i8 = bitcast i16:tN

legalized by:

1. Promoting the i16 input type
2. Widening the v2i32 result type

The result will be incorrect because the legalizer will promote
the input type and then produce a scalar_to_vector from that
wider type to a vector of N elements of that type. That puts
the desired bits into the low order bytes of element zero and
they need to be in the high order bytes on big endian systems.
This patch changes the legalization to widen to a vector with
elements of the original scalar size.
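
A byte-level illustration of the problem, as a sketch only: it models
memory layout with `struct` rather than the legalizer itself, and the
helper names `via_promoted_i32` and `via_original_i16` are hypothetical.

```python
import struct


def via_promoted_i32(value16, big_endian=True):
    """First two byte lanes after promoting i16 to i32 and placing it in element 0."""
    fmt = ">I" if big_endian else "<I"
    return list(struct.pack(fmt, value16 & 0xFFFF))[:2]


def via_original_i16(value16, big_endian=True):
    """First two byte lanes when element 0 keeps the original i16 width."""
    fmt = ">H" if big_endian else "<H"
    return list(struct.pack(fmt, value16 & 0xFFFF))[:2]


# On big endian the promoted form puts the payload in the low-order bytes,
# so the first two lanes only see the zero extension.
wrong_be = via_promoted_i32(0xAABB, big_endian=True)
right_be = via_original_i16(0xAABB, big_endian=True)

# On little endian both forms agree, which is why the bug is big-endian only.
promoted_le = via_promoted_i32(0xAABB, big_endian=False)
original_le = via_original_i16(0xAABB, big_endian=False)
```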

Differential revision: https://reviews.llvm.org/D140365
2023-02-02 12:01:14 -06:00
Chen Zheng
f35a09daeb [DAGCombiner] handle more store value forwarding
When lowering calls on targets like PPC, some stack loads
will be generated for by-value parameters. Node CALLSEQ_START
prevents such loads from being combined.

Suggested by @RolandF, this patch removes the unnecessary
loads for the byval parameter by extending ForwardStoreValueToDirectLoad

Reviewed By: nemanjai, RolandF

Differential Revision: https://reviews.llvm.org/D138899
2023-02-01 21:06:17 -05:00
Chen Zheng
0a32e693e3 [DAGCombiner][NFC] add testcases for D138899
2023-02-01 21:06:09 -05:00
Nemanja Ivanovic
19311e0a2e [PowerPC] Do not convert lwz to lwa if the offset is not a multiple of 4
The transform that converts this checks the alignment of the global
object being accessed. However, there was no check for the offset
within the global object which caused the compiler to produce a
DS relocation for an unaligned address.
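
The missing condition can be written as a predicate. This is a sketch, not
the code in the backend: the function name and parameters are hypothetical,
and it assumes only that a DS-form displacement must be a multiple of 4.

```python
def can_use_ds_form(global_alignment, offset):
    """Return True if both the object and the offset into it are 4-byte aligned.

    Checking only global_alignment is the pre-fix behaviour described above;
    the offset check is the part that was missing.
    """
    return global_alignment % 4 == 0 and offset % 4 == 0
```

For example, `can_use_ds_form(8, 2)` is false even though the global itself
is sufficiently aligned.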
2023-01-31 09:54:29 -06:00
esmeyi
2224b53f06 [PowerPC] Improve materialization for immediates which is almost a 32 bit splat.
Summary: Some 64 bit constants can be materialized with fewer instructions than we currently use. We consider a 64 bit immediate value divided into four parts, Hi16OfHi32 (bits 48...63), Lo16OfHi32 (bits 32...47), Hi16OfLo32 (bits 16...31), Lo16OfLo32 (bits 0...15). When any three parts are equal, the immediate can be treated as "almost" a splat of a 32 bit value in a 64 bit register. In such cases, we can use 3 instructions to generate the splat and 1 instruction to modify the part that differs:
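
The classification described above can be sketched as follows. This only
illustrates the "three of four parts equal" test; it does not reproduce the
instruction sequence, and `split16` and `odd_part_index` are hypothetical
helper names.

```python
def split16(imm):
    """Return [Hi16OfHi32, Lo16OfHi32, Hi16OfLo32, Lo16OfLo32] of a 64 bit value."""
    imm &= 0xFFFF_FFFF_FFFF_FFFF
    return [(imm >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def odd_part_index(imm):
    """Return the index of the one part that differs from the other three.

    Returns None when the immediate is not an "almost splat", including the
    case where all four parts are equal (a plain splat).
    """
    parts = split16(imm)
    for i, part in enumerate(parts):
        others = parts[:i] + parts[i + 1:]
        if len(set(others)) == 1 and part != others[0]:
            return i
    return None
```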

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D139813
2023-01-31 06:02:17 -05:00
Nemanja Ivanovic
7087f053f6 [PowerPC] Pre-commit test for fix to peephole opt
This just adds a test case with current code gen. The patch
with the fix will correct the code gen.
2023-01-30 21:01:21 -06:00
Nemanja Ivanovic
f68fc8d9d2 [PowerPC] Fix incorrect shift amount for build_vector
The pattern for a build_vector node was incorrect for big endian
subtargets.
2023-01-30 16:36:08 -06:00
Sergei Barannikov
de66efdb25 [PowerPC] Convert more tests to opaque pointers (NFC)
2023-01-30 05:19:24 +03:00
Sergei Barannikov
f4fbcd62af [PowerPC] Convert more tests to opaque pointers (NFC)
* Add -fast-isel=false to func-alias.ll. The test was added as a
SelectionDAG test. Without this option, FastISel successfully selects
the call that had a ConstantExpr argument.
* fast-isel-branch.ll couldn't be handled by FastISel. Now it can,
hence the change in the stack offsets.
2023-01-30 04:27:10 +03:00
Sergei Barannikov
fd9f42fad2 [PowerPC] Convert some tests to opaque pointers (NFC)
2023-01-30 00:40:12 +03:00
Simon Pilgrim
846ec90924 [PowerPC] ppc64-P9-vabsd.ll - add some basic ISD::ABDS test coverage
Test coverage to ensure D142313 lowers ISD::ABDU -> VABSD but not ISD::ABDS (although I think v4i32 would be compatible with the XVNEGSP trick)
2023-01-27 11:12:16 +00:00
Matt Arsenault
778cf5431c IR: Add atomicrmw uinc_wrap and udec_wrap
These are essentially add/sub 1 with a clamping value.

AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do not carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
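
A non-atomic sketch of the value computation, following the semantics given
for these operations in the LLVM Language Reference. The Python function
names and the `limit` parameter name are illustrative; all comparisons are
on unsigned values.

```python
def uinc_wrap(old, limit):
    """New memory value for atomicrmw uinc_wrap: increment, wrapping to 0 at limit."""
    return 0 if old >= limit else old + 1


def udec_wrap(old, limit):
    """New memory value for atomicrmw udec_wrap: decrement, wrapping to limit at 0.

    A stored value above limit is also reset to limit.
    """
    return limit if old == 0 or old > limit else old - 1
```

Starting from 0 with a limit of 3, repeated `uinc_wrap` cycles through
0, 1, 2, 3 and back to 0.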
2023-01-24 17:55:11 -04:00
Simon Pilgrim
2e8aa2dcbc [PowerPC] Regenerate vec_absd.ll test checks
2023-01-22 17:19:48 +00:00
OCHyams
99c12afeb4 [Assignment Tracking] Fix tests for buildbot failure (2)
Follow-up for 4ece50737d5385fb80cfa23f5297d1111f8eed39 (D142027).

Assignment Tracking Analysis now always runs and is skipped internally if
assignment tracking is disabled. Update these tests to expect to see the
pass run.

Buildbot failure: https://lab.llvm.org/buildbot/#/builders/57/builds/24094
2023-01-20 15:58:35 +00:00
Paul Kirth
557a5bc336 [codegen] Add StackFrameLayoutAnalysisPass
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, it's easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.

This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.

The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimization-record.

Fixes: #58168

Reviewed By: nickdesaulniers, thegameg

Differential Revision: https://reviews.llvm.org/D135488
2023-01-19 01:51:14 +00:00
Nikita Popov
9ed2f14c87 [AsmParser] Remove typed pointer auto-detection
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.

The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.

Differential Revision: https://reviews.llvm.org/D141912
2023-01-18 09:58:32 +01:00
Lei Huang
ee559b21b9 [P10] Fix the implementation for BRH
Fixes the patterns for the brh instruction to include a clrldi when emitted.

Reviewed By: amyk

Differential Revision: https://reviews.llvm.org/D141697
2023-01-16 13:53:43 -06:00
Roman Lebedev
f8d9097168 [DAGCombiner] combineShuffleOfSplatVal(): try to canonicalize to a splat shuffle
As noted in https://reviews.llvm.org/D141778#inline-1369900,
we fail to produce splat shuffles from certain sequences
of shuffles, that may have non-shuffles in the middle of seq.

There is a big pitfall to avoid here: just because `isSplatValue()`
says that all demanded elements are splat, we can't pick any random
one of them, because some of them could be undef! We must ignore those!
2023-01-15 21:11:33 +03:00
Roman Lebedev
cc39c3b17f [Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts: shift through stack
https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas
that have variably-indexed loads. That does bring up questions of cost model,
since that requires creating wide shifts.

Indeed, our legalization for them is not optimal.
We either split it into parts, or lower it into a libcall.
But if the shift amount is a multiple of CHAR_BIT,
we can also legalize it through the stack.

The basic idea is very simple:
1. Get a stack slot 2x the width of the shift type
2. store the value we are shifting into one half of the slot
3. pad the other half of the slot. for logical shifts, with zero, for arithmetic shift with signbit
4. index into the slot (starting from the base half into which we spilled, either upwards or downwards)
5. load
6. split loaded integer

This works for both little-endian and big-endian machines:
https://alive2.llvm.org/ce/z/YNVwd5

And better yet, if the original shift amount was not a multiple of CHAR_BIT,
we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K
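
The steps above can be emulated on plain byte strings. This is a sketch
under stated assumptions, not the legalizer code: it models a little-endian
slot only, skips step 6 (splitting the loaded integer), and the function
name and `kind` strings are made up for illustration.

```python
def shift_through_stack(value, amount, width=128, kind="lshr"):
    """Emulate a wide shift via a 2x-wide little-endian stack slot.

    kind is one of "shl", "lshr", "ashr". The byte part of the shift amount
    becomes an offset into the slot; the sub-byte remainder is applied as an
    ordinary shift on the loaded value.
    """
    assert kind in ("shl", "lshr", "ashr")
    assert width % 8 == 0 and 0 <= amount < width
    nbytes = width // 8
    mask = (1 << width) - 1
    value &= mask

    # Step 3: padding is zero for logical shifts, the sign bit for ashr.
    sign = value >> (width - 1)
    pad_byte = 0xFF if (kind == "ashr" and sign) else 0x00
    pad = bytes([pad_byte]) * nbytes
    data = value.to_bytes(nbytes, "little")

    byte_off, bit_rem = divmod(amount, 8)
    if kind == "shl":
        slot = pad + data            # value in the upper half, zeros below
        start = nbytes - byte_off    # step 4: index downwards
    else:
        slot = data + pad            # value in the lower half, padding above
        start = byte_off             # step 4: index upwards

    # Step 5: load `width` bits from the computed offset.
    loaded = int.from_bytes(slot[start:start + nbytes], "little")

    # Shift by the remaining bits, if any.
    if kind == "shl":
        return (loaded << bit_rem) & mask
    if kind == "ashr":
        signed = loaded - (1 << width) if loaded >> (width - 1) else loaded
        return (signed >> bit_rem) & mask
    return loaded >> bit_rem
```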

I think, if we are going to perform shift->shift-by-parts expansion more than once,
we should instead go through stack, which is what this patch does.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D140638
2023-01-14 19:12:18 +03:00
Paul Kirth
fdc0bf6adc Revert "[codegen] Add StackFrameLayoutAnalysisPass"
This breaks on some AArch64 bots

This reverts commit 0a652c540556a118bbd9386ed3ab7fd9e60a9754.
2023-01-13 22:59:36 +00:00
Paul Kirth
0a652c5405 [codegen] Add StackFrameLayoutAnalysisPass
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, it's easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.

This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.

The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimization-record.

Fixes: #58168

Reviewed By: nickdesaulniers, thegameg

Differential Revision: https://reviews.llvm.org/D135488
2023-01-13 20:52:48 +00:00
esmeyi
5ce0a26bd1 [XCOFF] handle the toc-data for object file generation.
Summary: The toc-data feature has been supported for assembly file generation.
         This patch handles the toc-data for object file generation.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D139516
2023-01-11 23:27:47 -05:00
Kai Luo
d9630c34f4 [PowerPC][GISel] Select sync instructions required by atomic operations
This is part of selecting `G_ATOMIC*` instructions. Select `isync`, `sync` and `lwsync` in GISel.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141360
2023-01-11 16:25:46 +08:00
esmeyi
2aa4b69bd6 [XCOFF][NFC] Update the test aix-xcoff-huge-relocs.ll
2023-01-10 05:18:53 -05:00
esmeyi
ea6dec1b3a [XCOFF] support the overflow section (only relocation overflow is handled).
Summary: This patch handles relocation field overflows in an XCOFF32 file. (XCOFF64 files may not have overflow section headers.) If a section has more than 65,534 relocation entries or line number entries, both of these fields are set to a value of 65535. In this case, an overflow section header with the s_flags field equal to STYP_OVRFLO is used to contain the relocation and line-number count information. Since line number is not supported, this patch only handles the relocation overflow.
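
The rule stated above can be sketched as a small helper. It is based only
on the description in this summary; `section_header_counts` and the
constant names are hypothetical and do not correspond to identifiers in the
XCOFF writer.

```python
XCOFF32_MAX_DIRECT_COUNT = 65534   # largest count stored directly in the header
XCOFF32_OVERFLOW_MARKER = 65535    # value written to both fields on overflow


def section_header_counts(num_relocs, num_line_numbers=0):
    """Return (reloc_field, lineno_field, needs_overflow_header) for XCOFF32.

    If either count exceeds 65,534, both 16-bit fields are set to 65535 and
    the real counts have to live in a separate STYP_OVRFLO section header.
    """
    if (num_relocs > XCOFF32_MAX_DIRECT_COUNT
            or num_line_numbers > XCOFF32_MAX_DIRECT_COUNT):
        return XCOFF32_OVERFLOW_MARKER, XCOFF32_OVERFLOW_MARKER, True
    return num_relocs, num_line_numbers, False
```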

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D137819
2023-01-10 02:39:02 -05:00
Josh Stone
87f57f459e [RegAllocFast] Handle new debug values for spills
These new debug values get inserted after the place where the spill
happens, which means they won't be reached by the reverse traversal of
basic block instructions. This would crash or fail assertions if they
contained any virtual registers to be replaced. We can manually handle
the new debug values right away to resolve this.

Fixes https://github.com/llvm/llvm-project/issues/59172

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D139590
2023-01-05 20:41:11 -08:00
Chen Zheng
85edf1fc70 [PowerPC] remove the ctr clobbers check related to TLS access
Dynamic tls access model will be lowered to MI which clobbers CTR in
the loop in ISEL(ADDItlsgdLADDR) and post-isel CTR loop pass will revert
the loop to a normal compare + branch form.

So no need to add this clobber check in hardware loop insertion pass now.

Reviewed By: nemanjai

Differential revision: https://reviews.llvm.org/D140367
2023-01-05 21:23:29 -05:00
Chen Zheng
dd0edc876c [PowerPC][NFC] add an option to keep the test point
Passes before hardware loop insertion change the loop to a form which
is not a hardware loop candidate (return early before checking the ctr clobbers).
The PHI in the loop exit block is also optimized away. This breaks the
test point that the case had when it was committed. Fix this by running this
case just before the hardware loop insertion pass.

Reviewed By: nemanjai

Differential revision: https://reviews.llvm.org/D140366
2023-01-05 21:18:53 -05:00
Luke Drummond
108766fc7e Fix typos
I found one typo of "implemnt", then some more.
s/implemnt/implement/g
2023-01-05 18:49:23 +00:00
Nikita Popov
60442f0d44 [CodeGen] Convert some tests to opaque pointers (NFC)
These are mostly MIR tests, which I did not handle during previous
conversions.
2023-01-05 13:21:20 +01:00
Chen Zheng
6a930e8891 1: use class instead of MVT
2: minor fix for the comments
2023-01-05 07:53:59 +00:00
Chen Zheng
ac93a4e77d [PowerPC][GISel]fcmp support
This patch also includes:
1: CRRegBank support
2: Some workarounds in PPC table gen for anyext/setcc patterns
   selection.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D140878
2023-01-05 07:45:29 +00:00
Stefan Pintilie
c1d0118459 [PowerPC] Materialize floats in the range [-16.0, 15.0].
Previous to this patch we only materialized 0.0 and all other floating point
values would be loaded from the TOC. This patch adds materialization for the
floating point values that can be represented as integers in [-16.0, 15.0].

For example we will now materialize 3.0 and -5.0 but not 4.7.
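
Read literally, the condition above is the following predicate. It is a
sketch taken from the commit text alone: the function name is hypothetical
and special cases such as negative zero are not modelled.

```python
def is_materializable_float(x):
    """True for floats that are whole numbers in the range [-16.0, 15.0]."""
    x = float(x)
    return x.is_integer() and -16.0 <= x <= 15.0
```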

Reviewed By: nemanjai, lei, #powerpc

Differential Revision: https://reviews.llvm.org/D138844
2023-01-04 12:52:30 -06:00
Matt Arsenault
bf4596bf58 CodeGen: Clean up some tests with broken "strictfp" attribute
2023-01-03 20:26:57 -05:00
Craig Topper
8abd70081f [TargetLowering] Teach BuildUDIV to take advantage of leading zeros in the dividend.
If the dividend has leading zeros, we can use them to reduce the
size of the multiplier and avoid the fixup cases.

This patch is for scalars only, but we might be able to do this
for vectors in a follow up.
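
To see why leading zeros help, here is a sketch using the classic
round-up multiplier construction (multiply by ceil(2^s / d), then shift
right by s). This is an assumption chosen for illustration and is not the
code in BuildUDIV; `magic_for` is a hypothetical helper.

```python
def magic_for(divisor, dividend_bits):
    """Return (multiplier, shift) with x // divisor == (x * multiplier) >> shift
    for every 0 <= x < 2**dividend_bits."""
    assert divisor >= 1 and dividend_bits >= 1
    log2_ceil = (divisor - 1).bit_length()            # ceil(log2(divisor))
    shift = dividend_bits + log2_ceil
    multiplier = -(-(1 << shift) // divisor)          # ceil(2**shift / divisor)
    return multiplier, shift


# Full 32 bit dividend: the multiplier for 7 needs 33 bits, so a 32 bit
# multiply-high needs a fixup sequence.
m32, s32 = magic_for(7, 32)

# Dividend known to have 16 leading zeros: the multiplier fits easily.
m16, s16 = magic_for(7, 16)
```

With the narrower dividend range the multiplier for 7 drops from 33 bits to
17 bits, so no fixup is needed in a 32 bit multiply.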

Differential Revision: https://reviews.llvm.org/D140750
2022-12-29 13:58:46 -08:00