This presents misleading and confusing output. If you have a function
defined at the beginning of an XCOFF object file, and you have a
function call to an external function, the function call disassembles as
a branch to the local function. That is,
`void f() { f(); g();}`
disassembles as
>00000000 <.f>:
0: 7c 08 02 a6 mflr 0
4: 94 21 ff c0 stwu 1, -64(1)
8: 90 01 00 48 stw 0, 72(1)
c: 4b ff ff f5 bl 0x0 <.f>
10: 4b ff ff f1 bl 0x0 <.f>
With this PR, the second call will display:
`10: 4b ff ff f1 bl 0x0 <.g> `
Using -r can help, but you still get the confusing output:
>10: 4b ff ff f1 bl 0x0 <.f>
00000010: R_RBR .g
This reverts commit 08b306dc8e7c0b2498f4f194a3c51686d56dbd20.
it causes the following assertion failure:
llvm/include/llvm/CodeGen/MachineFrameInfo.h:530: int64_t
llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead
object?"' failed.
The pipeliner needs to mark store-store order dependences as
loop carried dependences. Otherwise, the stores may be scheduled
further apart than the MII. The order dependences implies that
the first instance of the dependent store is scheduled before the
second instance of the source store instruction.
On AIX, the __ehinfo toc-entry is never referenced directly using
instructions, therefore we can allocate them with the TE storage mapping
class to move them to the end of TOC.
This patch adds the majority of the missing extended mnemonics that were
introduced in Power 10.
The only extended mnemonics that were not added are related to the plq
and pstq instructions. These will be added in a separate patch as the
instructions themselves would also have to be added.
12 bit is not enough for PPC's target specific flags. If 8 bit for the
bitmask flags, 4 bit for the direct mask, PPC can total have 16 direct
mask and 8 bitmask. Not enough for PPC, see this issue in
https://github.com/llvm/llvm-project/pull/66316
Redesign how PPC target set the target specific flags. With this patch,
all ppc target flags are direct flags. No bitmask flag in PPC anymore.
This patch aligns with some targets like X86 which also has many target
specific flags.
The patch also fixes a bug related to flag `MO_TLSGDM_FLAG` and `MO_LO`.
They are the same value and the test case changes in this PR shows the
bug.
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
This works around an AIX assembler and linker bug. If the
-fno-integrated-as and -frecord-command-line options are used but
there's no actual code in the source file, the assembler creates an
object file with only an .info section. The AIX linker rejects such an
object file.
Split out the parts of aix-cc-abi.ll that requires to be regenerated by
utils/update_mir_test_checks.py into aix-cc-abi-mir.ll, and regenerate
it using the script. Regenerate aix-cc-abi.ll using
utils/update_llc_test_checks.py.
The string pooling pass was incorrectly pooling global varables that
were part of llvm.used or llvm.compiler.used. This patch fixes the pass
to prevent that by checking each candidate to make sure that it is not
in either of those lists.
Situations may arise leading to negative `NumElements` argument of an
`alloca` instruction. In this case the `NumElements` is treated as a
large unsigned value. Such large arrays may cause the size constant to
overflow during code generation under 32 bit mode, leading to a crash.
This PR limits the constant's bit width to the width of the pointer on
the target. With this fix,
```
alloca i32, i32 -1
```
and
```
alloca [4294967295 x i32], i32 1
```
generates the exact same PowerPC assembly code under 32 bit mode.
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.
The WebKit Calling Convention was created specifically for the WebKit
FTL. FTL
doesn't use LLVM anymore and therefore this calling convention is
obsolete.
This commit removes the WebKit CC, its associated tests, and
documentation.
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.
There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.
This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
Relative to the first attempt, this contains two changes:
First, we only handle the case where one side simplifies to true or
false, instead of calling simplification recursively. The previous
approach would return poison if one operand simplified to poison
(under the equality assumption), which is incorrect.
Second, we do not fold llvm.is.constant in simplifyWithOpReplaced().
We may be assuming that a value is constant, if the equality holds,
but it may not actually be constant. This is nominally just a QoI
issue, but the std::list implementation in libstdc++ relies on the
precise behavior in a way that causes miscompiles.
-----
and/or in logical (select) form benefit from generic simplifications via
simplifyWithOpReplaced(). However, the corresponding fold for plain
and/or currently does not exist.
Similar to selects, there are two general cases for this fold
(illustrated with `and`, but there are `or` conjugates).
The basic case is something like `(a == b) & c`, where the replacement
of a with b or b with a inside c allows it to fold to true or false.
Then the whole operation will fold to either false or `a == b`.
The second case is something like `(a != b) & c`, where the replacement
inside c allows it to fold to false. In that case, the operand can be
replaced with c, because in the case where a == b (and thus the icmp is
false), c itself will already be false.
As the test diffs show, this catches quite a lot of patterns in existing
test coverage. This also obsoletes quite a few existing special-case
and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst),
but I haven't removed anything as part of this patch in the interest of
risk mitigation.
Fixes#69050.
Fixes#69091.
and/or in logical (select) form benefit from generic simplifications via
simplifyWithOpReplaced(). However, the corresponding fold for plain
and/or currently does not exist.
Similar to selects, there are two general cases for this fold
(illustrated with `and`, but there are `or` conjugates).
The basic case is something like `(a == b) & c`, where the replacement
of a with b or b with a inside c allows it to fold to true or false.
Then the whole operation will fold to either false or `a == b`.
The second case is something like `(a != b) & c`, where the replacement
inside c allows it to fold to false. In that case, the operand can be
replaced with c, because in the case where a == b (and thus the icmp is
false), c itself will already be false.
As the test diffs show, this catches quite a lot of patterns in existing
test coverage. This also obsoletes quite a few existing special-case
and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst),
but I haven't removed anything as part of this patch in the interest of
risk mitigation.
Fixes#69050.
Fixes#69091.
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:
* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
The issue #55208 noticed that std::rint is vectorized by the
SLPVectorizer, but a very similar function, std::lrint, is not.
std::lrint corresponds to ISD::LRINT in the SelectionDAG, and
std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now,
neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant,
and the LangRef makes this clear in the documentation of llvm.lrint.*
and llvm.llrint.*.
This patch extends the LangRef to include vector variants of
llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of
scalarizing it for all targets. However, this patch would be devoid of
motivation unless we show the utility of these new vector variants.
Hence, the RISCV target has been chosen to implement a custom lowering
to the vfcvt.x.f.v instruction. The patch also includes a CostModel for
RISCV, and a trivial follow-up can potentially enable the SLPVectorizer
to vectorize std::lrint and std::llrint, fixing #55208.
The patch includes tests, obviously for the RISCV target, but also for
the X86, AArch64, and PowerPC targets to justify the addition of the
vector variants to the LangRef.
This custom combine currently converts `and(anyext(x),c)` into
`anyext(and(x,c))`. This is not correct, because the original expression
guaranteed that the high bits are zero, while the new one sets them to
undef.
Emit `zext(and(x,c))` instead.
Fixes https://github.com/llvm/llvm-project/issues/68783.
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
PowerPC backend generate calls to libc function calls
for soft-float, regardless of the -nostdlib /-ffreestanding flag.
fma is not a function provided by compiler-rt builtins and
thus should not be generated here.
PR : [[ https://github.com/llvm/llvm-project/issues/55230 | #55230 ]]
Below is patch given by @nemanjai
Reviewed By: jhibbits
Differential Revision: https://reviews.llvm.org/D156344
The patch failed in test-suite due to a liveness error after rebasing on https://reviews.llvm.org/D133103, and now it's fixed.
```
[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel.
Summary: rldicl/rldicr can be eliminated if it's used to clear thehigh-order or low-order n bits and all bits cleared will be ANDed with 0 byandi/andis. Or they can be folded to `andi 0` if all bits to AND are alreadyzero in the input.
Reviewed By: qiucf, shchenz
Differential Revision: https://reviews.llvm.org/D159073
```