In software pipelining, when searching for the Initiation Interval (II),
`MachinePipeliner` tries to reduce register pressure, but doesn't check
how many variables can actually be alive at the same time. As a result,
a lot of register spills/fills can be generated after register
allocation, which might cause performance degradation. To prevent such
cases, this patch adds a check phase that calculates the maximum
register pressure of the scheduled loop and reject it if the pressure is
too high. This can be enabled this by specifying
`pipeliner-register-pressure`. Additionally, an II search range is
currently fixed at 10, which is too small to find a schedule when the
above algorithm is applied. Therefore this patch also adds a new option
`pipeliner-ii-search-range` to specify the length of the range to
search. There is one more new option
`pipeliner-register-pressure-margin`, which can be used to estimate a
register pressure limit less than actual for conservative analysis.
Discourse thread:
https://discourse.llvm.org/t/considering-register-pressure-when-deciding-initiation-interval-in-machinepipeliner/74725
This custom DAG combine works on a shuffle where one source vector is a
zero splat, which means we can adjust the shuffle indices to refer to
any element of the splat -- as long as we stay in the same vector.
In the case where an undef (-1) index into the non-splat vector was
used, we ended up adjusting the splat index to -1+NumElements, which
points into the wrong vector.
Fix this by using the first element from the splat if the other one is undef.
There are four cases this theoretically affects, but in practice I only
managed to demonstrate a miscompile with one of them. I think two of
theses are effectively dead due to the operand canonicalization at the
start of the transform.
Fixes https://github.com/llvm/llvm-project/issues/77748.
The existing logic in isKnownNonZero relies on unsigned ranges, which
can be problematic when our range calculation is imprecise. Consider the
following:
%offset.nonzero = or i32 %offset, 1
--> %offset.nonzero U: [1,0) S: [1,0)
%offset.i64 = sext i32 %offset.nonzero to i64
--> (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648)
S: [-2147483648,2147483648)
Note that the unsigned range for the sext does contain zero in this case
despite the fact that it can never actually be zero.
Instead, we can push the query down one level - relying on the fact that
the sext is an invertible operation and that the result can only be zero
if the input is. We could likely generalize this reasoning for other
invertible operations, but special casing sext seems worthwhile.
`llvm.trap` is lowered to `PPC::TRAP` and `PPC::TRAP` is set as
terminator. Verifier complains about terminator should not lie in the
middle of an MBB. See #77095.
Fix it by removing `isTerminator` and `isBarrier` and then set `isTrap`
which was introduced by https://reviews.llvm.org/D48836# and is being
used by X86 and AArch64.
`PPC::TRAP` is not a hardware memory barrier and `llvm.trap` doesn't
indicate a memory barrier either.
* Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize
argument goes before the section group name, if a reader doesn't know
that the order of extra arguments is not affected by the order of flags.
* 'a', 'w', and 'x' indicate basic permission-related flags. Separating
them with 'G' is kinda ugly.
Simplify code and move 'G' after 'o'. The new output is more similar to
GCC.
Demonstrate `IMPLICIT_DEF implicit-def ...` can be generated after
coalescing on PPC.
The case is reduced from failure in #75570. The failure is triggered
after #75271 .
This presents misleading and confusing output. If you have a function
defined at the beginning of an XCOFF object file, and you have a
function call to an external function, the function call disassembles as
a branch to the local function. That is,
`void f() { f(); g();}`
disassembles as
>00000000 <.f>:
0: 7c 08 02 a6 mflr 0
4: 94 21 ff c0 stwu 1, -64(1)
8: 90 01 00 48 stw 0, 72(1)
c: 4b ff ff f5 bl 0x0 <.f>
10: 4b ff ff f1 bl 0x0 <.f>
With this PR, the second call will display:
`10: 4b ff ff f1 bl 0x0 <.g> `
Using -r can help, but you still get the confusing output:
>10: 4b ff ff f1 bl 0x0 <.f>
00000010: R_RBR .g
This reverts commit 08b306dc8e7c0b2498f4f194a3c51686d56dbd20.
it causes the following assertion failure:
llvm/include/llvm/CodeGen/MachineFrameInfo.h:530: int64_t
llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead
object?"' failed.
The pipeliner needs to mark store-store order dependences as
loop carried dependences. Otherwise, the stores may be scheduled
further apart than the MII. The order dependences implies that
the first instance of the dependent store is scheduled before the
second instance of the source store instruction.
On AIX, the __ehinfo toc-entry is never referenced directly using
instructions, therefore we can allocate them with the TE storage mapping
class to move them to the end of TOC.
This patch adds the majority of the missing extended mnemonics that were
introduced in Power 10.
The only extended mnemonics that were not added are related to the plq
and pstq instructions. These will be added in a separate patch as the
instructions themselves would also have to be added.
12 bit is not enough for PPC's target specific flags. If 8 bit for the
bitmask flags, 4 bit for the direct mask, PPC can total have 16 direct
mask and 8 bitmask. Not enough for PPC, see this issue in
https://github.com/llvm/llvm-project/pull/66316
Redesign how PPC target set the target specific flags. With this patch,
all ppc target flags are direct flags. No bitmask flag in PPC anymore.
This patch aligns with some targets like X86 which also has many target
specific flags.
The patch also fixes a bug related to flag `MO_TLSGDM_FLAG` and `MO_LO`.
They are the same value and the test case changes in this PR shows the
bug.
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
This works around an AIX assembler and linker bug. If the
-fno-integrated-as and -frecord-command-line options are used but
there's no actual code in the source file, the assembler creates an
object file with only an .info section. The AIX linker rejects such an
object file.
Split out the parts of aix-cc-abi.ll that requires to be regenerated by
utils/update_mir_test_checks.py into aix-cc-abi-mir.ll, and regenerate
it using the script. Regenerate aix-cc-abi.ll using
utils/update_llc_test_checks.py.
The string pooling pass was incorrectly pooling global varables that
were part of llvm.used or llvm.compiler.used. This patch fixes the pass
to prevent that by checking each candidate to make sure that it is not
in either of those lists.
Situations may arise leading to negative `NumElements` argument of an
`alloca` instruction. In this case the `NumElements` is treated as a
large unsigned value. Such large arrays may cause the size constant to
overflow during code generation under 32 bit mode, leading to a crash.
This PR limits the constant's bit width to the width of the pointer on
the target. With this fix,
```
alloca i32, i32 -1
```
and
```
alloca [4294967295 x i32], i32 1
```
generates the exact same PowerPC assembly code under 32 bit mode.
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.
The WebKit Calling Convention was created specifically for the WebKit
FTL. FTL
doesn't use LLVM anymore and therefore this calling convention is
obsolete.
This commit removes the WebKit CC, its associated tests, and
documentation.
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.
There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.
This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
Relative to the first attempt, this contains two changes:
First, we only handle the case where one side simplifies to true or
false, instead of calling simplification recursively. The previous
approach would return poison if one operand simplified to poison
(under the equality assumption), which is incorrect.
Second, we do not fold llvm.is.constant in simplifyWithOpReplaced().
We may be assuming that a value is constant, if the equality holds,
but it may not actually be constant. This is nominally just a QoI
issue, but the std::list implementation in libstdc++ relies on the
precise behavior in a way that causes miscompiles.
-----
and/or in logical (select) form benefit from generic simplifications via
simplifyWithOpReplaced(). However, the corresponding fold for plain
and/or currently does not exist.
Similar to selects, there are two general cases for this fold
(illustrated with `and`, but there are `or` conjugates).
The basic case is something like `(a == b) & c`, where the replacement
of a with b or b with a inside c allows it to fold to true or false.
Then the whole operation will fold to either false or `a == b`.
The second case is something like `(a != b) & c`, where the replacement
inside c allows it to fold to false. In that case, the operand can be
replaced with c, because in the case where a == b (and thus the icmp is
false), c itself will already be false.
As the test diffs show, this catches quite a lot of patterns in existing
test coverage. This also obsoletes quite a few existing special-case
and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst),
but I haven't removed anything as part of this patch in the interest of
risk mitigation.
Fixes#69050.
Fixes#69091.