When the code is built with -mbackchain, it is possible to retrieve the
caller's frame and return addresses. GCC already can do this, add this
support to Clang as well. Use RISCVTargetLowering and GCC's
s390_return_addr_rtx() as inspiration. Add tests based on what GCC is
emitting.
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
On z/OS, many library functions have a non-standard name. This change
initializes the table of runtime function which results from lowering
intrinsics to library calls.
Calling isPlainlyKilled instead of directly checking for a kill flag
should make processTiedPairs behave the same with LiveIntervals
(i.e. when compiling with -early-live-intervals) as it does with
LiveVariables.
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.
This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
The function emitFunctionEntryLabel does not look at whether or not a function is a leaf when setting the entry flags, and instead blindly marks all functions as non-leaf routines.
Differential Revision: https://reviews.llvm.org/D157701
Reviewed By: uweigand
The function emitFunctionEntryLabel does not look at whether or not a function is a leaf when setting the entry flags,
and instead blindly marks all functions as non-leaf routines. Change it to check if a function is a leaf function and
mark it accordingly.
This PR causes the PPA1 to emit the function's name if it exists. This field is not emitted for unnamed functions.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D157494
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:
- The dependency tracking is more accurate and creates fewer useless
edges in the dependency graph. An AMDGPU example, edited for clarity:
SU(0): $vgpr1 = V_MOV_B32 $sgpr0
SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0
There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
SU(1) to SU(2). But the old dependency tracking code also added a
useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.
- On targets like AMDGPU that make heavy use of subregisters, each
register can have a huge number of aliases - it can be quadratic in
the size of the largest defined register tuple. There is a much lower
bound on the number of regunits per register, so iterating over
regunits is faster than iterating over aliases.
The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.
Recommit after fixing AggressiveAntiDepBreaker in D156880.
Differential Revision: https://reviews.llvm.org/D156552
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:
- The dependency tracking is more accurate and creates fewer useless
edges in the dependency graph. An AMDGPU example, edited for clarity:
SU(0): $vgpr1 = V_MOV_B32 $sgpr0
SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0
There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
SU(1) to SU(2). But the old dependency tracking code also added a
useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.
- On targets like AMDGPU that make heavy use of subregisters, each
register can have a huge number of aliases - it can be quadratic in
the size of the largest defined register tuple. There is a much lower
bound on the number of regunits per register, so iterating over
regunits is faster than iterating over aliases.
The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.
Differential Revision: https://reviews.llvm.org/D156552
This patch adds support for the ADA (associated data area), doing the following:
-Creates the ADA table to handle displacements
-Emits the ADA section in the SystemZAsmPrinter
-Lowers the ADA_ENTRY node into the appropriate load instruction
Differential Revision: https://reviews.llvm.org/D153788
- Creates the ADA table to handle displacements
- Emits the ADA section in the SystemZAsmPrinter
- Lowers the ADA_ENTRY node into the appropriate load instruction
Differential Revision: https://reviews.llvm.org/D153788
The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function.
Differential Revision: https://reviews.llvm.org/D152920
Reviewed By: uweigand
The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function.
Differential revision: https://reviews.llvm.org/D119049
Reviewed By: uweigand
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D127115
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D127115
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
An instruction should be sunk (if otherwise legal and profitable) regardless
of if it has a dead def of a physreg or not. Physreg defs are checked in other
places and sinking is only done with dead defs of regs that are not live into
the target MBB.
Differential Revision: https://reviews.llvm.org/D150447
Reviewed By: sebastian-ne, arsenm
When the stack frame extension routine is used, the contents of r3 is overwritten.
However, if r3 is live in the prologue (ie. one of the function's parameters
resides in r3), it needs to be saved. We save r3 in r0 if r0 is available
(ie. r0 is not used as temporary storage for r4), and in the corresponding
stack slot for the third parameter otherwise.
Differential Revision: https://reviews.llvm.org/D150332
Reviewed By: uweigand
When the stack frame extension routine is used, the contents of r3 is
overwritten. However, if r3 is live in the prologue (ie. one of the
function's parameters resides in r3), it needs to be saved. We save
r3 in r0 if r0 is available (ie. r0 is not used as temporary storage
for r4), and in the corresponding stack slot for the third parameter otherwise.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D150332
In D79537, `nomerge` was made to only apply to non-tail calls. This fixes it by also applying it to tail calls.
For ARM, I only made the new MI to inherit the flag under `TCRETURNdi` and `TCRETURNri`, because that's the place tail calls got replaced. Not sure if there's any other place needed.
Fixes#61545.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D146749
The new test case showed that the NoPHIs flag needs to be cleared.
Original commit message:
[SystemZ] Bugfix in expansion of memmem operations.
Since NC, OC, and XC clobber CC, the EXRL_Pseudo targeting these must also be
marked to do so.
Original patch by uweigand.
Reviewed by: uweigand
Differential Revision: https://reviews.llvm.org/D150251
Fixes: https://github.com/llvm/llvm-project/issues/62572
Sorry - mir test fails with expensive checks on build bot.
Seems to relate to the fact that there are no PHIs in the .mir input, but after
they are created the verifyer reports "Found PHI instruction with NoPHIs property
set".
This reverts commit 00454a17f361d677d5423905c888daca1a80661a.
Support bitcasting between int/fp/vector values and 'r'/'f'/'v' inline
assembly operands. This is intended to match GCCs beahvior.
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D146059
A rare case where coalescing resulted in a hh32 (high32 of high64 of vector
register) subreg usage caused getSubReg() to fail as the vector reg does not
have that subreg in its subregs list, but rather h32 which was expected to
also act as hh32. See link below for the discussion when solving this.
Patch By: critson
Reviewed By: uweigand
Fixes: https://github.com/llvm/llvm-project/issues/61390
The SystemZ backend will try to reuse an existing subtraction of two values
whenever they are to be compared for equality. This depends on the SystemZ
subtraction instruction setting the condition code, which can also signal
overflow.
A later pass will remove the compare and reuse the CC from the subtraction
directly. However, if that subtraction has the NSW flag set it will not
include the overflow bit in the updated CC user. That was a bug which can
lead to wrong results, as shown by a csmith program.
Fixes: https://github.com/llvm/llvm-project/issues/61268
Reviewed By: nikic, uweigand
Differential Revision: https://reviews.llvm.org/D145811
ValueTracking attempts to match compare+select patterns to FP min/max
operations, but it was created before the newer IEEE-754-2019
minimum/maximum ops were defined. Ie, matchSelectPattern() does not
account for the -0.0/+0.0 behavior that is specified in the newer
standard.
FMINIMUM/FMAXIMUM nodes were created to map to the newer standard:
/// FMINIMUM/FMAXIMUM - NaN-propagating minimum/maximum that also treat -0.0
/// as less than 0.0. While FMINNUM_IEEE/FMAXNUM_IEEE follow IEEE 754-2008
/// semantics, FMINIMUM/FMAXIMUM follow IEEE 754-2018 draft semantics.
We could adjust ValueTracking to deal with signed zero, but it seems like
a moot point given the divergent NaN behavior discussed in D143056, so just
delete this possibility to avoid bugs when converting IR to SDAG.
Differential Revision: https://reviews.llvm.org/D143106
Returning true from this method for PCREL_WRAPPER and PCREL_OFFSET avoids
problems when a PCREL_OFFSET node ends up with a freeze operand, which is not
handled or expected by the backend.
Fixes#60107
Reviewed By: uweigand, RKSimon
Differential Revision: https://reviews.llvm.org/D142971
Only allow replacements of nodes that have a single user. This is better as
simple instructions (e.g. XGRK) are one cycle faster, and it helps in cases
where both inputs share a common node.
Review: Ulrich Weigand
Add support for _FLT_ROUNDS_ in SystemZ.
Patch by Tulio Magno Quites Machado Filho.
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D140988