3492 Commits

Author SHA1 Message Date
Roman Lebedev
428f36401b
Reland "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block"
This reverts commit 37b8f09a4b61bf9bf9d0b9017d790c8b82be2e17,
and returns commit 1bd0b82e508d049efdb07f4f8a342f35818df341.
The miscompile was in InstCombine, and it has been addressed.

This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37

Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb

Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts

Differential Revision: https://reviews.llvm.org/D139275
2022-12-17 05:18:54 +03:00
Alexander Kornienko
37b8f09a4b Revert "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block"
This reverts commit 1bd0b82e508d049efdb07f4f8a342f35818df341, since it leads to
miscompiles. See https://reviews.llvm.org/D139275#3993229 and
https://reviews.llvm.org/D139275#4001580.
2022-12-16 17:23:35 +01:00
Nemanja Ivanovic
cb3f415cd2 [PowerPC] Fix up memory ordering after combining BV to a load
The combiner for BUILD_VECTOR that merges consecutive
loads into a wide load had two issues:

- It didn't check that the input loads all have the
  same input chain
- It didn't update nodes that are chained to the original
  loads to be chained to the new load

This caused issues with bootstrap when
3c4d2a03968ccf5889bacffe02d6fa2443b0260f was committed.
This patch fixes the issue so it can unblock this commit.

Differential revision: https://reviews.llvm.org/D140046
2022-12-16 08:57:36 -06:00
Kai Nacke
110340c687 [PowerPC][GIsel] Materialize i64 constants.
Adds support for i64 constant. It uses the same pattern-based
approach as in SDAG (see PPCISelDAGToDAG::selectI64ImmDirect(),
PPCISelDAGToDAG::selectI64Imm()). It does not support the
prefixed instructions.

Reviewed By: arsenm, tschuett

Differential Revision: https://reviews.llvm.org/D140119
2022-12-15 21:22:58 +00:00
Ron Lieberman
38f1abef86 Revert "[SelectionDAG] Do not second-guess alignment for alloca"
Breaks amdgpu buildbot https://lab.llvm.org/buildbot/#/builders/193
 23491

This reverts commit ffedf47d8b793e07317f82f9c2a5f5425ebb71ad.
2022-12-15 10:55:18 -06:00
Andrew Savonichev
ffedf47d8b [SelectionDAG] Do not second-guess alignment for alloca
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.

The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.

Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).

Differential Revision: https://reviews.llvm.org/D135462
2022-12-15 18:18:12 +03:00
esmeyi
2e8c7f6527 [XCOFF] adjust the Fixedvalue for R_RBR relocations.
Summary: Currently we get a wrong fixed value for R_RBR relocations when -ffunction-sections enabled. This patch fixes this.

Reviewed By: DiggerLin, shchenz

Differential Revision: https://reviews.llvm.org/D138982
2022-12-15 01:56:53 -05:00
esmeyi
d4fd275896 [NFC][PowerPC] Add tests for 64-bit constants that require 5 instructions to materialize.
Differential Revision: https://reviews.llvm.org/D139914
2022-12-13 02:44:49 -05:00
Ting Wang
e6d925bc4b [PowerPC][NFC] Add test case for memset tail store
Add test case to show something can be improved.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138881
2022-12-12 20:07:23 -05:00
Roman Lebedev
1bd0b82e50
[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37

Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb

Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts

Differential Revision: https://reviews.llvm.org/D139275
2022-12-12 18:20:03 +03:00
Chen Zheng
d7ee19d163 [PowerPC][GISel] add the missing verify option - NFC 2022-12-12 12:59:27 +00:00
Chen Zheng
b41d22db18 [PowerPC][GISel] support 32 bit load/store
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135535
2022-12-12 12:52:44 +00:00
Chen Zheng
503a935d89 [PowerPC][GISel] support 64 bit load/store
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134792
2022-12-12 12:20:54 +00:00
Roman Lebedev
d70f271726
[NFC] Port codegen PowerPC tests that invoke opt to -passes= syntax 2022-12-09 01:04:47 +03:00
Roman Lebedev
b1a9584818
[opt] Disincentivize new tests from using old pass syntax
Over the past day or so, i've took a large swing at our tests,
and reduced the number of tests that were still using the old syntax
from ~1800 to just 200.

Left to handle: (as it is seen in this patch)
* Transforms/LSR
* Transforms/CGP
* Transforms/TypePromotion
* Transforms/HardwareLoops
* Analysis/*
* some misc.

I think this is the right point to start actively refusing
to honor the old syntax, except for the old tests,
to prevent the old syntax from creeping back in.

Thus, let's add temporary default-off flag,
and if it is not passed refuse to accept old syntax.
The tests that still need porting are annotated with this flag.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D139647
2022-12-08 23:54:03 +03:00
Ting Wang
140a83e32f [PowerPC][NFC] Test case update on ppc64-acc-regalloc-bugfix.ll
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D139492
2022-12-07 20:16:12 -05:00
Anton Sidorenko
f8ed709345 [MachineCombiner] Extend reassociation logic to handle inverse instructions
Machine combiner supports generic reassociation only of associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A), where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.

Differential Revision: https://reviews.llvm.org/D136754
2022-12-07 13:50:28 +03:00
Qiu Chaofan
62f20f51ce [PowerPC] Support test data class intrinsic of 128-bit float
We've exploited test data class instructions introduced in ISA 3.0.
This change unifies the scalar intrinsics into ppc_test_data_class
and add support for 128-bit precision float values using xststdcqp.

Vector versions of the intrinsic can't be unified because they return
vector int instead of int.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138105
2022-12-07 16:44:12 +08:00
Roman Lebedev
58b0485118
[NFC][PPC] Autogenerate checklines in ppc-ctr-dead-code.ll to simplify update 2022-12-06 03:47:46 +03:00
Jonas Paulsson
5ecd363295 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
This reverts commit 122efef8ee9be57055d204d52c38700fe933c033.

- Patch fixed to not reuse definitions from predecessors in EH landing pads.
- Late review suggestions (by MaskRay) have been addressed.
- M68k/pipeline.ll test updated.
- Init captures added in processBlock() to avoid capturing structured bindings.
- RISCV has this disabled for now.

Original commit message:

A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-05 12:53:50 -06:00
Roman Lebedev
2a05bd212e
[NFC] Fix test/CodeGen/PowerPC/O0-pipeline.ll 2022-12-05 17:21:39 +03:00
Dmitry Vyukov
dbe8c2c316 Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-12-05 14:40:31 +01:00
Chen Zheng
0a9b1c59f0 [PowerPC][GISel]support for float point and integer convertion
Add support for fptosi,fptoui,sitofp,uitofp

For now only handle 64 bit integer to make it does not depend on
any other patches. 32 bit integer needs handling for G_SEXT/G_ZEXT.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139174
2022-12-04 22:21:57 -05:00
Chen Zheng
b5e1fc19da [PowerPC] don't check CTR clobber in hardware loop insertion pass
We added a new post-isel CTRLoop pass in D122125. That pass will expand
the hardware loop related intrinsic to CTR loop or normal loop based
on the loop context. So we don't need to conservatively check the CTR
clobber now on the IR level.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D135847
2022-12-04 20:53:49 -05:00
Jonas Paulsson
122efef8ee Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions.""
This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3.

Some more bots got broken - need to investigate.
2022-12-05 00:52:00 +01:00
Jonas Paulsson
17db0de330 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
Init captures added in processBlock() to avoid capturing structured bindings,
which caused the build problems (with clang).

RISCV has this disabled for now until problems relating to post RA pseudo
expansions are resolved.
2022-12-03 14:15:15 -06:00
Chen Zheng
b61ff0ca76 [PowerPC] move ctrloop pass before tail duplication
Tail duplication may modify the loop to a "non-canonical" form
that CTR Loop pass can not recognize. We fixed one issue in D135846.
And we found in some other case, the loop is changed to irreducible form.
It is hard to fix this case in CTR loop pass, instead we reorder the
CTR loop pass before tail duplication pass and just after finalize-isel
pass to avoid any unexpected change to the loop form.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D138265
2022-12-02 00:31:00 -05:00
Chen Zheng
dff8227189 Revert "[PowerPC] handle more than two predecessors loop header in ctrloop pass"
This reverts commit df9d60af1f9fa44f411b656bbc691d950c6fc087.

The CTRLoops pass is reordered to front of tail duplication pass in D138265.
2022-12-02 00:30:56 -05:00
Chen Zheng
4dfa12addd [PowerPC] [NFC] add test for O0 pipeline
This is to address comments https://reviews.llvm.org/D138265#3950197

This should be helpful for detecting optimization passes added to O0
pipeline by mistake.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D138973
2022-12-01 22:16:54 -05:00
Jonas Paulsson
8ef4632681 Revert "[CodeGen] Add new pass for late cleanup of redundant definitions."
Temporarily revert and fix buildbot failure.

This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.
2022-12-01 13:29:24 -05:00
Jonas Paulsson
6d12599fd4 [CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-01 13:21:35 -05:00
Roman Lebedev
7850ab2112
[NFC] Port an assortment of tests that invoke SROA to new pass manager 2022-12-01 21:17:18 +03:00
Freddy Ye
89f36dd8f3 [X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp convertions with more
than 128 bits. The expanded nodes are referring from the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
and etc.

Corner cases:
1. For fp16: as there is no related builtins added in compliler-rt. So I
mainly utilized the fp32 <-> fp16 lib calls to implement.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can
help in this problem. I recommend users to deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as clang FE currently doesn't support bf16 algorithm operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc are
ignoring "returns 0 for negative input" spec. This pass follows this old way
to ignore unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241
2022-12-01 13:47:43 +08:00
Marco Elver
b95646fe70 Revert "Use-after-return sanitizer binary metadata"
This reverts commit d3c851d3fc8b69dda70bf5f999c5b39dc314dd73.

Some bots broke:

- https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview
- https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio
2022-11-30 23:35:50 +01:00
Dmitry Vyukov
d3c851d3fc Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-30 14:50:22 +01:00
Maryam Moghadas
7614ba0a5d [PowerPC] Fix vperm codegen
Commit rG934d5fa2b8672695c335deed0e19d0e777c98403 changed the vperm codegen
for cases that vperm is not replaced by xxperm, this patch is to revert that.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D138736
2022-11-29 15:47:32 -06:00
Qiu Chaofan
b9be5a6823 Pre-commit PowerPC case for zero/inf fpclassify 2022-11-25 17:37:41 +08:00
Maryam Moghadas
934d5fa2b8 [PowerPC] Exploit xxperm, check for dead vectors and substitute vperm with xxperm
vperm instruction requires the data to be in the Altivec registers, if one of
the vector operands is not used after this vperm instruction then it can be
substituted by xxperm which doubles the number of available registers.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D133700
2022-11-23 13:28:12 -06:00
Stefan Pintilie
1ac6956b52 [PowerPC] Add handling for WACC register spilling.
This patch adds spilling for the new WACC registers.

In order to get the spilling test to work the MMA instructions from Power 10 are
now supported for Future CPU except that they are all using the new WACC
registers instead of the ACC registers from Power 10.

Reviewed By: amyk, saghir

Differential Revision: https://reviews.llvm.org/D136728
2022-11-22 09:37:52 -06:00
esmeyi
c7c7ef8bda [XCOFF] set fragment for XMC_PR csects.
Summary: -xcoff-traceback-table is a default option on AIX regardless of optimization and debug levels. An error of relocation for paired relocatable term is not yet supported in XCOFFObjectWriter::recordRelocation occurred when both of the -xcoff-traceback-table and -function-sections are enabled.
The root cause is that we missed to calculate the symbols difference as absolute value before adding fixups when symbol_A without the fragment set is the csect itself and symbol_B is in it.
This patch only sets the fragment for XMC_PR csects because we don't have other cases that hit this problem yet.

Reviewed By: DiggerLin, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D137230
2022-11-22 07:17:44 -05:00
Chen Zheng
d9143ce3fd [PowerPC][GISel]add support for float point arithmetic operations
Add global isel support for G_FADD, G_FSUB, G_FMUL, G_FDIV.

Reviewed By: Kai, nemanjai, arsenm, amyk

Differential Revision: https://reviews.llvm.org/D132942
2022-11-22 03:00:27 -05:00
Chen Zheng
375323fb85 [PowerPC] store the LR before stack update for big offsets.
For case that LROffset + FrameSize can not be encoded to the LR
store instruction, we have to store the LR before the stack update.
2022-11-22 07:25:28 +00:00
Chen Zheng
2aa8a1a3bd [PowerPC][NFC] add test case for mflr store fix 2022-11-22 07:25:23 +00:00
Kai Nacke
2b1e895afb [PowerPC] Add support for G_ADD and G_SUB.
Extends the global isel implementation to support G_ADD and G_SUB.

Reviewed By: arsenm, amyk

Differential Revision: https://reviews.llvm.org/D128106
2022-11-21 23:35:17 +00:00
Kai Nacke
be4a1dfbf9 [PowerPC] Extend GlobalISel implementation to emit and/or/xor.
Adds some more code to GlobalISel to enable instruction selection for and/or/xor.

- Makes G_IMPLICIT_DEF, G_CONSTANT, G_AND, G_OR, G_XOR legal for 64bit register size.
- Implement lowerReturn in CallLowering
- Provides mapping of the operands to register banks.
- Adds register info to G_COPY operands.

The utility functions are all only implemented so far to support this use case.
Especially the functions in PPCGenRegisterBankInfo.def are too simple for
general use.

Reviewed By: nemanjai, shchenz, amyk

Differential Revision: https://reviews.llvm.org/D127530
2022-11-21 20:08:20 +00:00
Paul Scoropan
2234098291 [PowerPC] XCOFF exception section support on the integrated assembler path
Continuation of https://reviews.llvm.org/D132146 (direct assembly path support, needs to merge first). Adds support to the integrated assembler path for emitting XCOFF exception sections. Both features need https://reviews.llvm.org/D133030 to merge first

Reviewed By: shchenz, DiggerLin

Differential Revision: https://reviews.llvm.org/D134195
2022-11-21 01:16:31 -05:00
Chen Zheng
f034c98af0 [PowerPC] mark dead def for ctr be clobber.
TLS pseudo ADDIStlsgdHA will have such def. This dead def should
also prevent PPC from generating CTR loops.
2022-11-18 06:55:42 +00:00
Qiu Chaofan
5d19fea81f [PowerPC] Fix strict load-conversion recognition
Direct-move instructions are usually more efficient than load then store
for conversion. But direct moves are not needed when the source register
was just loaded from some address.

The pattern has already been recognized, but the source value of strict
nodes are not the first (that's the chain), but the second.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138011
2022-11-16 10:02:10 +08:00
Qiu Chaofan
a853c42a6a Pre-commit load/store cases for PowerPC direct-move 2022-11-15 17:35:49 +08:00
Chen Zheng
eb7d16ea25 [PowerPC] make expensive mflr be away from its user in the function prologue
mflr is kind of expensive on Power version smaller than 10, so we should
schedule the store for the mflr's def away from mflr.

In epilogue, the expensive mtlr has no user for its def, so it doesn't
matter that the load and the mtlr are back-to-back.

Reviewed By: RolandF

Differential Revision: https://reviews.llvm.org/D137423
2022-11-14 21:14:20 -05:00