3772 Commits

Author SHA1 Message Date
Douglas Yung
cc2c8ab21f Require asserts for llvm/test/CodeGen/PowerPC/sms-regpress.mir. 2024-01-22 13:51:03 -08:00
Ryotaro KASUGA
7556626dcf
[CodeGen][MachinePipeliner] Limit register pressure when scheduling (#74807)
In software pipelining, when searching for the Initiation Interval (II),
`MachinePipeliner` tries to reduce register pressure, but doesn't check
how many variables can actually be alive at the same time. As a result,
a lot of register spills/fills can be generated after register
allocation, which might cause performance degradation. To prevent such
cases, this patch adds a check phase that calculates the maximum
register pressure of the scheduled loop and rejects it if the pressure is
too high. This can be enabled by specifying
`pipeliner-register-pressure`. Additionally, the II search range is
currently fixed at 10, which is too small to find a schedule when the
above algorithm is applied. Therefore this patch also adds a new option,
`pipeliner-ii-search-range`, to specify the length of the range to
search. There is one more new option,
`pipeliner-register-pressure-margin`, which can be used to set a
register pressure limit lower than the actual one for a more conservative analysis.
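As a rough illustration of the check described above, the following Python sketch computes the maximum register pressure (MaxLive) of a modulo schedule and tries II values within a configurable search range, rejecting any II whose pressure exceeds the limit minus a margin. This is a simplified model, not `MachinePipeliner`'s implementation: the lifetime representation, the helper names `max_live` and `find_ii`, and the per-II scheduler callback are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical representation: a value is live in cycles [start, end)
# of the flat (single-iteration) schedule.
Lifetime = tuple[int, int]


def max_live(lifetimes: list[Lifetime], ii: int) -> int:
    """Maximum number of values simultaneously live in the steady state.

    In a modulo schedule, cycle t of one iteration overlaps kernel slot
    t mod II, so each live cycle adds one live copy to that slot.
    """
    pressure = [0] * ii
    for start, end in lifetimes:
        for t in range(start, end):
            pressure[t % ii] += 1
    return max(pressure, default=0)


def find_ii(
    mii: int,
    search_range: int,
    schedule_for: Callable[[int], Optional[list[Lifetime]]],
    reg_limit: int,
    margin: int = 0,
) -> Optional[int]:
    """Try II = mii .. mii + search_range - 1 and return the first II whose
    schedule keeps the pressure within reg_limit - margin."""
    for ii in range(mii, mii + search_range):
        lifetimes = schedule_for(ii)  # hypothetical per-II scheduler
        if lifetimes is None:
            continue  # no schedule found at this II
        if max_live(lifetimes, ii) <= reg_limit - margin:
            return ii
    return None  # give up: no acceptable II in the search range
```

For example, with three overlapping 4-cycle lifetimes `[(0, 4), (1, 5), (2, 6)]` and a limit of 3 registers, II values 1 to 3 give pressures of 12, 6, and 4, so a search starting at an MII of 1 only succeeds at II = 4. With a search range of 3 it fails, which mirrors the motivation for making the range configurable.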

Discourse thread:
https://discourse.llvm.org/t/considering-register-pressure-when-deciding-initiation-interval-in-machinepipeliner/74725
2024-01-22 17:06:37 +09:00
Hans Wennborg
677ced8af2 Require asserts for llvm/test/CodeGen/PowerPC/fence.ll 2024-01-15 17:25:49 +01:00
Nikita Popov
87bc91d425
[PowerPC] Fix shuffle combine with undef elements (#77787)
This custom DAG combine works on a shuffle where one source vector is a
zero splat, which means we can adjust the shuffle indices to refer to
any element of the splat -- as long as we stay in the same vector.

In the case where an undef (-1) index into the non-splat vector was
used, we ended up adjusting the splat index to -1+NumElements, which
points into the wrong vector.

Fix this by using the first element from the splat if the other one is undef.

There are four cases this theoretically affects, but in practice I only
managed to demonstrate a miscompile with one of them. I think two of
these are effectively dead due to the operand canonicalization at the
start of the transform.
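The index arithmetic behind the bug can be sketched in Python. Assume, hypothetically, that the zero splat is the second shuffle operand, so its elements are addressed by mask indices `NumElements` to `2*NumElements - 1`, and that the combine picks the splat element paired with the other mask entry by adding `NumElements` to that entry. The helper names are illustrative only and do not correspond to the actual PowerPC lowering code.

```python
UNDEF = -1  # shuffle mask entry meaning "don't care"


def splat_index_buggy(other_idx: int, num_elts: int) -> int:
    # Reuse the other entry's position inside the splat operand. For an
    # undef entry this yields -1 + num_elts, i.e. the last element of the
    # *first* (non-splat) operand.
    return other_idx + num_elts


def splat_index_fixed(other_idx: int, num_elts: int) -> int:
    # If the other entry is undef, fall back to the first splat element.
    if other_idx == UNDEF:
        return num_elts
    return other_idx + num_elts


def in_splat_operand(idx: int, num_elts: int) -> bool:
    # Under the stated assumption, indices [num_elts, 2 * num_elts)
    # select from the second (splat) operand.
    return num_elts <= idx < 2 * num_elts
```

With 4-element vectors, the buggy helper maps an undef entry to index 3, which lies outside the splat operand, while the fixed helper maps it to index 4, the first splat element.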

Fixes https://github.com/llvm/llvm-project/issues/77748.
2024-01-15 10:12:33 +01:00
Qiu Chaofan
ce1f9465b0 [NFC] Pre-commit case of ppcf128 extractelt soften 2024-01-15 15:27:36 +08:00
Qiu Chaofan
85071a3c74
[PowerPC] Implement fence builtin (#76495) 2024-01-15 11:19:16 +08:00
Philip Reames
e4d01bb227
[SCEV] Special case sext in isKnownNonZero (#77834)
The existing logic in isKnownNonZero relies on unsigned ranges, which
can be problematic when our range calculation is imprecise. Consider the
following:
  %offset.nonzero = or i32 %offset, 1
  -->  %offset.nonzero U: [1,0) S: [1,0)
  %offset.i64 = sext i32 %offset.nonzero to i64
  -->  (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648)
                                         S: [-2147483648,2147483648)

Note that the unsigned range for the sext does contain zero in this case
despite the fact that it can never actually be zero.

Instead, we can push the query down one level, relying on the fact that
the sext is an invertible operation and that the result can only be zero
if the input is. We could likely generalize this reasoning for other
invertible operations, but special casing sext seems worthwhile.
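The reasoning can be checked exhaustively at a smaller bit width. The Python sketch below uses i8 to i16 instead of i32 to i64 (an assumption made only to keep the search small). Because sign extension is injective, its result is zero exactly when its input is zero, so `sext(x | 1)` is never zero, regardless of what an unsigned range computation reports. The `sext` helper is written for this illustration.

```python
def sext(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide unsigned bit pattern to to_bits bits."""
    sign_bit = 1 << (from_bits - 1)
    signed = (value ^ sign_bit) - sign_bit  # reinterpret as two's complement
    return signed & ((1 << to_bits) - 1)    # back to a to_bits-wide pattern


FROM, TO = 8, 16  # stand-ins for i32 -> i64
inputs = range(1 << FROM)

# sext is injective: distinct inputs give distinct outputs ...
assert len({sext(x, FROM, TO) for x in inputs}) == len(inputs)
# ... so the result is zero if and only if the input is zero.
assert all((sext(x, FROM, TO) == 0) == (x == 0) for x in inputs)

# x | 1 is never zero, hence sext(x | 1) is never zero either.
assert all(sext(x | 1, FROM, TO) != 0 for x in inputs)
```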
2024-01-12 07:45:28 -08:00
Nikita Popov
13b5882ee6 [PowerPC] Add test for #77748 (NFC) 2024-01-11 15:45:52 +01:00
Kai Luo
6615581526
[PowerPC] Make verifier happy when lowering llvm.trap (#77266)
`llvm.trap` is lowered to `PPC::TRAP`, and `PPC::TRAP` is marked as a
terminator. The verifier complains that a terminator should not lie in the
middle of an MBB. See #77095.

Fix it by removing `isTerminator` and `isBarrier` and setting `isTrap`,
which was introduced by https://reviews.llvm.org/D48836# and is being
used by X86 and AArch64.

`PPC::TRAP` is not a hardware memory barrier and `llvm.trap` doesn't
indicate a memory barrier either.
2024-01-10 09:23:30 +08:00
Fangrui Song
f972e4d343 [MC,ELF] .section: unconditionally print section flag 'G' after 'o'
* Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize
  argument goes before the section group name, if a reader doesn't know
  that the order of extra arguments is not affected by the order of flags.
* 'a', 'w', and 'x' indicate basic permission-related flags. Separating
  them with 'G' is kinda ugly.

Simplify code and move 'G' after 'o'. The new output is more similar to
GCC.
2024-01-09 10:48:23 -08:00
Kai Luo
225e2704af [PowerPC] Precommit test for lowering llvm.trap on ppc64le. NFC. 2024-01-08 10:20:01 +08:00
Chen Zheng
d6aef863d8
[PowerPC] make LR/LR8 CTR/CTR8 aliased (#76926)
fixes https://github.com/llvm/llvm-project/issues/47156 
fixes https://github.com/llvm/llvm-project/issues/47155
2024-01-08 09:37:40 +08:00
Chen Zheng
dd4dc2111e nfc add cases for pr47156 and pr47155 2024-01-04 03:56:40 -05:00
Arthur Eubanks
ece1359857 Revert "[PowerPC] Add test after #75271 on PPC. NFC. (#75616)"
This reverts commit 5cfc7b3342ce4de0bbe182b38baa8a71fc83f8f8.

This depends on 0e46b49de43349f8cbb2a7d4c6badef6d16e31ae which is being reverted.
2024-01-03 17:09:45 +00:00
Kai Luo
8ae73fea3a [PowerPC] Precommit test for #72845. NFC. 2024-01-03 03:03:48 +00:00
Qiu Chaofan
c97a7675ee
[PowerPC] Expand FSINCOS of fp128 (#76494) 2023-12-29 11:27:06 +08:00
Kai Luo
5cfc7b3342
[PowerPC] Add test after #75271 on PPC. NFC. (#75616)
Demonstrate `IMPLICIT_DEF implicit-def ...` can be generated after
coalescing on PPC.

The case is reduced from a failure in #75570. The failure is triggered
after #75271.
2023-12-26 00:21:56 +08:00
stephenpeckham
7026086073
[XCOFF] Use RLDs to print branches even without -r (#74342)
Without this change, the disassembler presents misleading and confusing output. If you have a function
defined at the beginning of an XCOFF object file, and you have a
function call to an external function, the function call disassembles as
a branch to the local function. That is,

`void f() { f(); g();}`

disassembles as

    00000000 <.f>:
           0: 7c 08 02 a6   mflr 0
           4: 94 21 ff c0   stwu 1, -64(1)
           8: 90 01 00 48   stw 0, 72(1)
           c: 4b ff ff f5   bl 0x0 <.f>
          10: 4b ff ff f1   bl 0x0 <.f>

With this PR, the second call will display:

`10: 4b ff ff f1   bl 0x0 <.g>`

Using -r can help, but you still get the confusing output:

          10: 4b ff ff f1   bl 0x0 <.f>
                    00000010:  R_RBR        .g
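Why both calls print as `bl 0x0 <.f>` can be checked by decoding the branch words by hand. The Python sketch below follows the Power ISA I-form layout (primary opcode 18, a 24-bit `LI` field that is shifted left by 2 and sign-extended, then the `AA` and `LK` bits). Before the `R_RBR` relocation is resolved, both displacements point back to address 0, which happens to be where `.f` lives. The function name is hypothetical, and this is not llvm-objdump's code.

```python
def decode_i_form_branch(word: int, address: int) -> tuple[int, bool]:
    """Decode a PowerPC I-form branch (b/bl/ba/bla).

    Returns (target_address, is_link) for a 32-bit instruction word
    located at the given address.
    """
    opcode = word >> 26
    if opcode != 18:
        raise ValueError(f"not an I-form branch: opcode {opcode}")
    li = word & 0x03FFFFFC  # LI || 0b00: a 26-bit byte displacement
    if li & 0x02000000:     # sign-extend from 26 bits
        li -= 1 << 26
    aa = (word >> 1) & 1    # absolute-address bit
    lk = word & 1           # link bit: set for bl
    target = li if aa else address + li
    return target & 0xFFFFFFFF, bool(lk)


# The two calls from the listing above: both resolve to 0x0 (<.f>).
for addr, word in [(0xC, 0x4BFFFFF5), (0x10, 0x4BFFFFF1)]:
    target, is_link = decode_i_form_branch(word, addr)
    print(f"{addr:8x}: {'bl' if is_link else 'b'} {target:#x}")
```

Printing the symbol from the relocation instead of from the computed target is what lets the second call show `<.g>`.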
2023-12-21 08:17:32 -06:00
Kai Luo
56414220df
[PowerPC] Use 'sync; ld; cmp; bc; isync' for atomic load seq-cst on 32-bit platform (#75905)
In theory, `cmp; bc; isync` performs better than `lwsync`.

64-bit platforms already use this sequence; this patch implements it for
32-bit platforms.
2023-12-20 10:01:02 +08:00
Paul Kirth
9a578a9f60
Revert "[StackColoring] Delete dead stack slots (#75351)" (#75655)
This reverts commit 08b306dc8e7c0b2498f4f194a3c51686d56dbd20.

it causes the following assertion failure:
llvm/include/llvm/CodeGen/MachineFrameInfo.h:530: int64_t
llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead
object?"' failed.
2023-12-15 13:32:39 -08:00
mohammed-nurulhoque
08b306dc8e
[StackColoring] Delete dead stack slots (#75351)
Deletes slots that have lifetime markers and the lifetime ranges are empty.
2023-12-15 09:58:19 +00:00
Nikita Popov
9c093cbb5e Revert "[StackColoring] Delete dead stack slots (#72633)"
This reverts commit a29457844bf0c4b2eb5c0f3877b6e8ef30cdef52.

Causes an assertion failure in llvm/test/DebugInfo/COFF/lexicalblock.ll.
2023-12-13 14:31:09 +01:00
mohammed-nurulhoque
a29457844b
[StackColoring] Delete dead stack slots (#72633)
Deletes slots that have lifetime markers and the lifetime ranges are
empty.
2023-12-13 13:01:21 +01:00
paperchalice
60eca674b1
[CodeGen] Port ExpandMemCmp to new pass manager (#74050) 2023-12-13 16:18:24 +08:00
bcahoon
a19c7c403f
[MachinePipeliner] Fix store-store dependences (#72575)
The pipeliner needs to mark store-store order dependences as
loop carried dependences. Otherwise, the stores may be scheduled
further apart than the MII. The order dependence implies that
the first instance of the dependent store is scheduled before the
second instance of the source store instruction.
2023-12-11 21:10:34 -06:00
Maryam Moghadas
8f6f5ec776
[PowerPC] Move __ehinfo TOC entries to the end of the TOC section (#73586)
On AIX, the __ehinfo TOC entries are never referenced directly by
instructions, so we can allocate them with the TE storage mapping
class to move them to the end of the TOC.
2023-12-08 15:03:11 -05:00
Stefan Pintilie
ea8b95d0d5
[PowerPC] Add a set of extended mnemonics that are missing from Power 10. (#73003)
This patch adds the majority of the missing extended mnemonics that were
introduced in Power 10.

The only extended mnemonics that were not added are related to the plq
and pstq instructions. These will be added in a separate patch as the
instructions themselves would also have to be added.
2023-12-07 13:40:00 -05:00
Chen Zheng
4b932d84f4
[PowerPC] redesign the target flags (#69695)
12 bits are not enough for PPC's target-specific flags. With 8 bits for the
bitmask flags and 4 bits for the direct mask, PPC can have at most 16 direct
flags and 8 bitmask flags in total, which is not enough for PPC; see this issue in
https://github.com/llvm/llvm-project/pull/66316

Redesign how the PPC target sets its target-specific flags. With this patch,
all PPC target flags are direct flags. There are no bitmask flags in PPC anymore.

This patch aligns PPC with other targets, such as X86, that also have many
target-specific flags.

The patch also fixes a bug related to the flags `MO_TLSGDM_FLAG` and `MO_LO`.
They have the same value, and the test case changes in this PR show the
bug.
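The capacity arithmetic in the first paragraph can be made concrete. The Python sketch below assumes, hypothetically, a 12-bit operand flag word whose low 4 bits hold one direct (enumerated) flag and whose high 8 bits are independent bitmask flags. The constant and function names are illustrative and are not the actual PPC `MO_*` definitions.

```python
FLAG_BITS = 12                          # total width of the flag word
DIRECT_BITS = 4                         # low bits: one enumerated direct flag
BITMASK_BITS = FLAG_BITS - DIRECT_BITS  # high bits: independent bit flags

DIRECT_MASK = (1 << DIRECT_BITS) - 1


def split_flags(flags: int) -> tuple[int, list[int]]:
    """Split a flag word into its direct value and the set bitmask bits."""
    if not 0 <= flags < (1 << FLAG_BITS):
        raise ValueError("flag word does not fit in FLAG_BITS")
    direct = flags & DIRECT_MASK
    bits = [b for b in range(DIRECT_BITS, FLAG_BITS) if flags & (1 << b)]
    return direct, bits


# Capacity of this layout: 16 distinct direct values and 8 bitmask flags.
print(1 << DIRECT_BITS, BITMASK_BITS)
```

Under this layout, every combination that needs a new direct value beyond 16 simply cannot be encoded, which is the limitation that motivates making all PPC flags direct.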
2023-12-07 12:47:25 +08:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
stephenpeckham
4b1254e7d4
[AIX] In assembly file, create a dummy text renamed to an empty string (#73052)
This works around an AIX assembler and linker bug. If the
-fno-integrated-as and -frecord-command-line options are used but
there's no actual code in the source file, the assembler creates an
object file with only an .info section. The AIX linker rejects such an
object file.
2023-12-04 17:35:47 -06:00
Ramkumar Ramachandra
d48d1edcf3
PowerPC/aix-cc-abi: regenerate test using UTC (NFC) (#73963)
Split out the parts of aix-cc-abi.ll that need to be regenerated by
utils/update_mir_test_checks.py into aix-cc-abi-mir.ll, and regenerate
it using the script. Regenerate aix-cc-abi.ll using
utils/update_llc_test_checks.py.
2023-12-01 08:22:18 +00:00
Kai Luo
afd9582b36 [PowerPC] Enhance test for PR #73609. NFC. 2023-11-30 05:06:29 +00:00
Kai Luo
00f9946680 [PowerPC] Precommit test of building vector via load and zeros. NFC. 2023-11-28 03:32:57 +00:00
Bjorn Pettersson
30afb21547 Revert "[MCP] Enhance MCP copy Instruction removal for special case (#70778)"
This reverts commit cae46f6210293ba4d3568eb21b935d438934290d.

Reverted due to miscompiles.
See https://github.com/llvm/llvm-project/issues/73512
2023-11-27 19:39:40 +01:00
Chen Zheng
abc405858d
[XCOFF] make related SD symbols as isFunction (#69553)
This will help tools like llvm-symbolizer recognize more functions.
2023-11-26 11:59:09 +08:00
Stefan Pintilie
d896b1f5a6
[PowerPC] Do not string pool globals that are part of llvm used. (#66848)
The string pooling pass was incorrectly pooling global variables that
were part of llvm.used or llvm.compiler.used. This patch fixes the pass
to prevent that by checking each candidate to make sure that it is not
in either of those lists.
2023-11-24 12:21:28 -05:00
LWenH
32903b0b6d [MCP] fix PowerPC redundant copy instructions removal fail test cases, NFC 2023-11-23 01:54:53 +08:00
Kai Luo
bfd3734610 [PowerPC] Use MIR test so that it's not affected by instruction selection. NFC. 2023-11-20 09:51:12 +00:00
Kai Luo
592386400d [PowerPC] Precommit test to show codegen while isel is unavailable. NFC. 2023-11-20 07:28:21 +00:00
Kai Luo
eb7698254a
[PowerPC][EarlyIfConversion] Do not insert isel if subtarget doesn't support isel (#72211)
Some subtargets of PPC don't support the `isel` instruction, so early-ifcvt
should not insert this instruction.
2023-11-20 09:17:04 +08:00
Qiu Chaofan
426ad99bb2
[PowerPC] Forbid f128 SELECT_CC optimized into fsel (#71497) 2023-11-15 12:20:06 +08:00
Qiongsi Wu
c8b11091e8
[SelectionDAG] Handling Oversized Alloca Types under 32 bit Mode to Avoid Code Generator Crash (#71472)
Situations may arise that lead to a negative `NumElements` argument of an
`alloca` instruction. In this case, `NumElements` is treated as a
large unsigned value. Such large arrays may cause the size constant to
overflow during code generation in 32-bit mode, leading to a crash.
This PR limits the constant's bit width to the width of the pointer on
the target. With this fix,
```
alloca i32, i32 -1
```
and
```
alloca [4294967295 x i32], i32 1
```
generate the exact same PowerPC assembly code in 32-bit mode.
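The overflow can be reproduced with plain integer arithmetic. The Python sketch below computes the allocation size in bytes for both forms, assuming 4-byte `i32` elements and a 32-bit pointer width, and then keeps only the low 32 bits to model limiting the size constant to pointer width. It models only the size computation, not SelectionDAG itself.

```python
PTR_BITS = 32  # pointer width in 32-bit mode
I32_SIZE = 4   # size of i32 in bytes


def as_unsigned(value: int, bits: int) -> int:
    """Reinterpret a (possibly negative) integer as a bits-wide unsigned value."""
    return value & ((1 << bits) - 1)


def alloca_size(elem_size: int, num_elements: int) -> tuple[int, int]:
    """Return (exact byte size, byte size truncated to pointer width)."""
    exact = elem_size * num_elements
    return exact, as_unsigned(exact, PTR_BITS)


# alloca i32, i32 -1: the count -1 is read as an unsigned i32.
count = as_unsigned(-1, 32)  # 4294967295
exact_a, trunc_a = alloca_size(I32_SIZE, count)

# alloca [4294967295 x i32], i32 1: the same byte count via the array type.
exact_b, trunc_b = alloca_size(I32_SIZE * 4294967295, 1)

print(exact_a, exact_a >= 1 << PTR_BITS)  # exact size needs more than 32 bits
print(trunc_a == trunc_b, hex(trunc_a))   # both truncate to 0xfffffffc
```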
2023-11-14 10:52:51 -05:00
Kai Luo
acdf7c8f27 [PowerPC] Precommit test to show impact of early-ifcvt on target without isel. NFC. 2023-11-14 06:10:05 +00:00
stephenpeckham
1d1fede493
[XCOFF] Ensure .file is emitted before any .info pseudo-ops (#71577)
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.
2023-11-09 16:03:45 -06:00
Juergen Ributzka
6d1d7be133
Obsolete WebKit Calling Convention (#71567)
The WebKit Calling Convention was created specifically for the WebKit
FTL. FTL doesn't use LLVM anymore and therefore this calling convention
is obsolete.

This commit removes the WebKit CC, its associated tests, and
documentation.
2023-11-09 09:08:41 -08:00
Qiu Chaofan
5f295552f1
[PowerPC] Fix incorrect symbol name of frexp libcall (#71626)
frexpl is for ppc_fp128. The correct symbol name for f128 is frexpf128.
2023-11-08 14:41:19 +08:00
Qiu Chaofan
d199fd76f7 [NFC] Add f128 frexp intrinsics for PowerPC 2023-11-08 11:27:40 +08:00
Nikita Popov
e4a4122eb6
[IR] Remove zext and sext constant expressions (#71040)
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.

There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-03 10:46:07 +01:00
Nikita Popov
060de415af Reapply [InstCombine] Simplify and/or of icmp eq with op replacement (#70335)
Relative to the first attempt, this contains two changes:

First, we only handle the case where one side simplifies to true or
false, instead of calling simplification recursively. The previous
approach would return poison if one operand simplified to poison
(under the equality assumption), which is incorrect.

Second, we do not fold llvm.is.constant in simplifyWithOpReplaced().
We may be assuming that a value is constant, if the equality holds,
but it may not actually be constant. This is nominally just a QoI
issue, but the std::list implementation in libstdc++ relies on the
precise behavior in a way that causes miscompiles.

-----

and/or in logical (select) form benefit from generic simplifications via
simplifyWithOpReplaced(). However, the corresponding fold for plain
and/or currently does not exist.

Similar to selects, there are two general cases for this fold
(illustrated with `and`, but there are `or` conjugates).

The basic case is something like `(a == b) & c`, where the replacement
of a with b or b with a inside c allows it to fold to true or false.
Then the whole operation will fold to either false or `a == b`.

The second case is something like `(a != b) & c`, where the replacement
inside c allows it to fold to false. In that case, the operand can be
replaced with c, because in the case where a == b (and thus the icmp is
false), c itself will already be false.
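Both cases can be sanity-checked by brute force. The Python sketch below picks two example instances: `c = (a <= b)` for the basic case, where replacing `a` with `b` gives `b <= b`, which is true, and `c = (a < b)` for the second case, where the replacement gives `b < b`, which is false. It verifies over all pairs of 8-bit unsigned values that `(a == b) & (a <= b)` equals `a == b` and that `(a != b) & (a < b)` equals `a < b`. This only checks the algebra of two examples; it is not how `simplifyWithOpReplaced()` is implemented.

```python
from itertools import product

BITS = 8
values = range(1 << BITS)


def check(original, folded) -> bool:
    """True if the two predicates agree for every pair of BITS-bit values."""
    return all(original(a, b) == folded(a, b) for a, b in product(values, repeat=2))


# Basic case: with a replaced by b, (a <= b) becomes (b <= b), i.e. true,
# so (a == b) & (a <= b) folds to (a == b).
basic_ok = check(lambda a, b: (a == b) & (a <= b), lambda a, b: a == b)

# Second case: with a replaced by b, (a < b) becomes (b < b), i.e. false,
# so (a != b) & (a < b) folds to (a < b).
second_ok = check(lambda a, b: (a != b) & (a < b), lambda a, b: a < b)

print(basic_ok, second_ok)
```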

As the test diffs show, this catches quite a lot of patterns in existing
test coverage. This also obsoletes quite a few existing special-case
and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst),
but I haven't removed anything as part of this patch in the interest of
risk mitigation.

Fixes #69050.
Fixes #69091.
2023-11-03 10:16:15 +01:00
Kai Luo
7b5505b0d5
[PowerPC] Change registers used in test due to ABI breakage. NFC. (#70758)
Using `r30` and `r31` breaks the current traceback table convention
on AIX. Avoid using CSRs in the livein list.
2023-11-03 07:08:33 +08:00