For cpu=future, the acc registers no longer overlap VSRs and are prefixed
with `dm`. The original xxmfacc/xxmtacc instructions are now extended
mnemonics for their dm* equivalents.
This change removes the uint64_t constructor on LocationSize, preventing
implicit conversion, and fixes up the APIs that used it to adapt to the
change. Note that I'm adding a couple of explicit conversion points on
routines where passing in a fixed offset as an integer seems likely to
have well-understood semantics.
We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize, which works
just fine for fixed values but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain, since that seemed to be the easier one.
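A minimal C++ sketch of the failure mode; `recordAccess` is a hypothetical
consumer of LocationSize:

  #include "llvm/Analysis/MemoryLocation.h"
  #include "llvm/Support/TypeSize.h"
  using namespace llvm;

  void recordAccess(LocationSize Size); // hypothetical consumer

  void example() {
    TypeSize TS = TypeSize::getScalable(16); // vscale x 16 bytes
    // Previously this compiled via TypeSize -> uint64_t -> LocationSize,
    // silently dropping the scalable bit and asserting at runtime:
    //   recordAccess(TS);
    // With the uint64_t constructor removed, the conversion must be explicit:
    recordAccess(LocationSize::precise(TS));
  }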
ISD::ADDC, ISD::ADDE, ISD::SUBC, and ISD::SUBE are being deprecated in
favor of ISD::UADDO_CARRY and ISD::USUBO_CARRY. This patch lowers UADDO,
UADDO_CARRY, USUBO, and USUBO_CARRY.
NVPTX, SPIRV, and WebAssembly pass virtual registers to this function
since they don't perform register allocation. We need to use Register to
avoid a virtual register being converted to MCRegister by the caller.
Use a non-static member instead. This requires explicit conversions, but
many will go away as we continue converting unsigned to Register.
In a few places where it was simple, I changed unsigned to Register.
In some corner cases, the cloned MI still retains an old slot index,
which leads to a compiler crash. This patch updates the slot index map
before deleting the recycled MI.
https://github.com/llvm/llvm-project/issues/123165
This patch is in preparation for setting the MachineInstr::MIFlag
flags, i.e. FrameSetup/FrameDestroy, on callee saved register
spill/reload instructions in the prologue/epilogue. This eventually helps
in setting the prologue_end and epilogue_begin markers more accurately.
The DWARF Spec in "6.4 Call Frame Information" says:
  The code that allocates space on the call frame stack and performs the
  save operation is called the subroutine’s prologue, and the code that
  performs the restore operation and deallocates the frame is called its
  epilogue.
which means the callee saved register spills and reloads are part of the
prologue (a.k.a. frame setup) and epilogue (a.k.a. frame destruction),
respectively. And, IIUC, the LLVM backend uses the FrameSetup/FrameDestroy
flags to identify instructions that are part of call frame setup and
destruction.
In the trunk, while most targets consistently set
FrameSetup/FrameDestroy on save/restore call frame information (CFI)
instructions of callee saved registers, they do not consistently set
those flags on the actual callee saved register spill/reload
instructions.
I believe this patch provides a clean mechanism to set the
FrameSetup/FrameDestroy flags on the actual callee saved register
spill/reload instructions as needed. And, by giving the Flags parameter a
default argument of MachineInstr::NoFlags, this patch is an NFC.
With this patch, targets just need to pass the FrameSetup/FrameDestroy
flag to the storeRegToStackSlot/loadRegFromStackSlot calls in their
spillCalleeSavedRegisters and restoreCalleeSavedRegisters overrides to set
those flags on callee saved register spill/reload instructions.
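As a hedged sketch (assuming the trunk hook signatures; `FooFrameLowering`
is a placeholder target class), such an override could look like:

  bool FooFrameLowering::spillCalleeSavedRegisters(
      MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
      ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
    const TargetInstrInfo &TII =
        *MBB.getParent()->getSubtarget().getInstrInfo();
    for (const CalleeSavedInfo &CS : CSI) {
      const TargetRegisterClass *RC =
          TRI->getMinimalPhysRegClass(CS.getReg());
      // The trailing Flags argument is the new, defaulted parameter;
      // passing FrameSetup marks the spill as part of the prologue.
      TII.storeRegToStackSlot(MBB, MI, CS.getReg(), /*isKill=*/true,
                              CS.getFrameIdx(), RC, TRI, Register(),
                              MachineInstr::FrameSetup);
    }
    return true;
  }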
Also, this patch makes it very easy to set the source line information
on callee saved register spill/reload instructions, which is needed by
the DwarfDebug.cpp implementation to set the prologue_end and
epilogue_begin markers more accurately.
As per the DwarfDebug.cpp implementation:

  prologue_end is the first known non-DBG_VALUE and non-FrameSetup location
  that marks the beginning of the function body

  epilogue_begin is the first FrameDestroy location that has been seen in
  the epilogue basic block
With this patch, targets just need to do the following to set the
source line information on callee saved register spill/reload
instructions, without hampering LLVM's efforts to avoid attaching
source line information to compiler-generated artificial code.
<Foo>InstrInfo::storeRegToStackSlot() {
  ...
  DebugLoc DL =
      Flags & MachineInstr::FrameSetup ? DebugLoc() : MBB.findDebugLoc(I);
  ...
}

<Foo>InstrInfo::loadRegFromStackSlot() {
  ...
  DebugLoc DL =
      Flags & MachineInstr::FrameDestroy ? MBB.findDebugLoc(I) : DebugLoc();
  ...
}
While I understand this patch would break out-of-tree backend builds, I
think it is in the right direction.
One immediate benefit of this patch is that fixing #120553 becomes
simpler.
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper around
`TargetRegisterInfo::getRegPressureSetLimit` with some logic to
adjust the limit by removing reserved registers.
It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just as the comment "This limit must be adjusted
dynamically for reserved registers" says.
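A minimal sketch of the distinction; `RCI`, `TRI`, `MF`, and `Idx` are
stand-ins for a RegisterClassInfo, TargetRegisterInfo, MachineFunction,
and pressure-set index:

  // Adjusted limit: reserved registers in the set are subtracted out.
  unsigned Adjusted = RCI.getRegPressureSetLimit(Idx);

  // Raw, static limit from tablegen; it ignores reserved registers, so
  // using it directly overestimates the available pressure budget.
  unsigned Raw = TRI->getRegPressureSetLimit(MF, Idx);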
Separate from https://github.com/llvm/llvm-project/pull/118787
Fixes: https://github.com/llvm/llvm-project/issues/71030
The bug only happens in 64-bit mode and involves spills. Since we don't
know when a spill will happen, all instructions in the chain used to
deduce sign extension for eliminating 'extsw' need to be promoted to
64-bit pseudo instructions.
The following instructions will be promoted in PPCMIPeepholes: EXTSH,
LHA, and ISEL become EXTSH8, LHA8, and ISEL8.
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.
This mirrors changes from #108942 but targets PowerPC instead of
RISC-V. Because both of these PRs modify the same driver functions, this
series is stacked on top of the RISC-V one.
---------
Signed-off-by: Keith Packard <keithp@keithp.com>
The renamable flag is useful during MachineCopyPropagation, but it is
dropped after lowerCopy in some cases.
This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
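As a sketch, the extended hook might look like this (parameter names
assumed; the defaults keep existing call sites compiling):

  virtual void copyPhysReg(MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator MI, const DebugLoc &DL,
                           MCRegister DestReg, MCRegister SrcReg, bool KillSrc,
                           bool RenamableDest = false,
                           bool RenamableSrc = false) const;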
This patch adds support for toc-data for 64-bit large code-model on AIX.
The sequence ADDIStocHA8/ADDItocL8 is used to access the data directly
from the TOC.
When emitting the instruction ADDIStocHA8, we check if the symbol has the
toc-data attribute before creating a TOC entry for it. When emitting the
instruction ADDItocL8, we use the LA8 instruction to load the address.
Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
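For illustration, the call-site change this implies (variable names are
generic):

  // Before: TRI was a defaulted parameter and easy to omit by accident:
  //   MI.readsRegister(Reg);
  // After: the caller must decide explicitly:
  bool ReadsPhys = MI.readsRegister(Reg, TRI);              // physical registers
  bool ReadsVirt = MI.readsRegister(VReg, /*TRI=*/nullptr); // virtual regs only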
We split the target-dependent MachineCombiner patterns into their
respective target folders.
This makes MachineCombiner much more target-independent.
Reviewers:
davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc
Reviewed By: topperc, mshockwave
Pull Request: https://github.com/llvm/llvm-project/pull/87991
In preparation for adding a similar instruction for the large code model
on AIX for 32-bit, rename the existing 64-bit ADDItocL instruction to
ADDItocL8 to match the naming convention of other instructions with
32-bit and 64-bit variants.
This is another part of #70452, which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on its own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and, in the future, scalable).
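For reference, the resulting hook signature as I understand it (only the
Width parameter changes type):

  virtual bool getMemOperandsWithOffsetWidth(
      const MachineInstr &MI,
      SmallVectorImpl<const MachineOperand *> &BaseOps, int64_t &Offset,
      bool &OffsetIsScalable,
      LocationSize &Width, // was: unsigned &Width
      const TargetRegisterInfo *TRI) const;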
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for their specific lowering/code-gen. Also provide code-gen
for PowerPC.
I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.
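A hedged usage example; the feature name "vsx" is illustrative, since the
set of valid names is target-defined:

  #include <cstdio>

  int main() {
    __builtin_cpu_init(); // required before the queries on some targets
    if (__builtin_cpu_supports("vsx"))
      std::puts("VSX is available");
    return 0;
  }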
---------
Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
12 bits are not enough for PPC's target-specific flags. With 8 bits for
bitmask flags and 4 bits for direct flags, PPC can have at most 16 direct
flags and 8 bitmask flags, which is not enough; see the issue in
https://github.com/llvm/llvm-project/pull/66316
Redesign how the PPC target sets its target-specific flags. With this
patch, all PPC target flags are direct flags; there are no bitmask flags
in PPC anymore. This aligns PPC with targets like X86 that also have many
target-specific flags.
The patch also fixes a bug involving the flags `MO_TLSGDM_FLAG` and
`MO_LO`: they had the same value, and the test case changes in this PR
show the bug.
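A purely illustrative sketch of how a direct value and a bitmask bit
could collide under the old split encoding (these are not the actual PPC
values):

  enum {
    MO_LO          = 1,      // intended as a direct (enumerated) value
    MO_TLSGDM_FLAG = 1 << 0, // intended as an OR-able bit -- also 1
  };
  // Both read back as 0x1 from MachineOperand::getTargetFlags(), which is
  // exactly the ambiguity the all-direct-flags redesign removes.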
These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).
This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.
As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
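For reference, a sketch of the hook with the added parameters (parameter
names assumed):

  virtual bool shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
                                   int64_t Offset1, bool OffsetIsScalable1,
                                   ArrayRef<const MachineOperand *> BaseOps2,
                                   int64_t Offset2, bool OffsetIsScalable2,
                                   unsigned ClusterSize,
                                   unsigned NumBytes) const;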
getOperandLatency has the following behavior: it returns -1 as a special
value, negative numbers other than -1 on some target-specific overrides,
or a valid non-negative latency. This behavior can be surprising, as
some callers do arithmetic on these negative values. Change the
interface of getOperandLatency to return a std::optional<unsigned> to
prevent surprises in callers. While at it, change the interface of
getInstrLatency to return unsigned instead of int.
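A sketch of the caller-side pattern this enables; `TII`, `ItinData`, the
operand indices, and `DefaultLatency` are stand-ins:

  // Before, callers had to remember that -1 (and sometimes other negative
  // values) meant "unknown" and avoid doing arithmetic on it.
  std::optional<unsigned> Latency =
      TII->getOperandLatency(ItinData, DefMI, DefIdx, UseMI, UseIdx);
  unsigned Lat = Latency.value_or(DefaultLatency); // explicit fallback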
This change was inspired by a refactoring in
TargetSchedModel::computeOperandLatency.
Don't blindly copy the original flags from the pre-reassociated
instructions.
This copied the integer poison flags, which are not safe to preserve
after reassociation.
For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.
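A minimal sketch of such an override; `FooInstrInfo` is a placeholder,
and the exact set of flags to clear is an assumption:

  void FooInstrInfo::setSpecialOperandAttr(MachineInstr &OldMI1,
                                           MachineInstr &OldMI2,
                                           MachineInstr &NewMI1,
                                           MachineInstr &NewMI2) const {
    // Keep only the flags both original instructions agreed on.
    uint32_t IntersectedFlags = OldMI1.getFlags() & OldMI2.getFlags();
    for (MachineInstr *MI : {&NewMI1, &NewMI2}) {
      MI->setFlags(IntersectedFlags);
      // Integer poison flags are unsafe after reassociation even when both
      // sources carried them, so drop them outright.
      MI->clearFlag(MachineInstr::MIFlag::NoSWrap);
      MI->clearFlag(MachineInstr::MIFlag::NoUWrap);
      MI->clearFlag(MachineInstr::MIFlag::IsExact);
    }
  }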
Fixes #72777.
The MI Peephole pass has grown to include a large number of transformations over the years. Many of the transformations require re-computation of kill flags but don't do a good job of re-computing them. This causes us to have very common failures when the compiler is built with expensive checks. Over time, we added and augmented a function that is supposed to go and fix up kill flags after each transformation but we keep missing cases.
This patch does the following:
- Removes the function to re-compute kill flags
- Adds LiveVariables to compute and maintain kill flags while transforming code
- Adds re-computation of kill flags for the post-RA peepholes for each block that contains a transformed instruction (see the sketch below)
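A minimal sketch of that last step, assuming the recomputeLivenessFlags
helper from llvm/CodeGen/LivePhysRegs.h and a hypothetical set of touched
blocks:

  #include "llvm/CodeGen/LivePhysRegs.h"

  // After the post-RA peepholes run, refresh kill/dead flags only on the
  // blocks that were actually rewritten, instead of patching flags up
  // transform by transform.
  for (MachineBasicBlock *MBB : BlocksWithTransformedInstrs)
    recomputeLivenessFlags(*MBB);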
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D133103
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
See issue #64166 for more information about the layering issue.
The PPCMCTargetDesc library was including CodeGen headers such as
PPCInstrInfo.h and calling inline functions in them. This doesn't work
in the Bazel build, and is error-prone. If the inline function moves to
a cpp file, it will result in linker errors.
To address the issue, I moved several inline functions to
PPCMCTargetDesc.cpp, and declared them in the PPC namespace in
PPCMCTargetDesc.h, which seemed like the most straightforward fix.
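As a sketch of the shape of the fix (the helper here is only an example
of the kind of small inline function that moved):

  // PPCMCTargetDesc.h -- declaration only, safe for MC-layer users:
  namespace llvm {
  namespace PPC {
  bool isVFRegister(unsigned Reg); // hypothetical example of a moved helper
  } // namespace PPC
  } // namespace llvm

  // PPCMCTargetDesc.cpp -- the out-of-line definition:
  bool llvm::PPC::isVFRegister(unsigned Reg) {
    return Reg >= PPC::VF0 && Reg <= PPC::VF31; // illustrative body
  }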
Differential Revision: https://reviews.llvm.org/D156488
getOperandConstraint does not return a bool; it returns an int. It
returns -1 if there is no TIED_TO.
Additionally, TIED_TO is only set on use operands, not defs, and it
points to the def that the use is tied to. So calling it on operand 0
is guaranteed to return -1.
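For reference, a small sketch of the intended use of the API (indices
illustrative):

  // getOperandConstraint returns an int: the index of the operand this
  // use is tied to, or -1 if there is no TIED_TO constraint.
  int TiedDefIdx = MI.getDesc().getOperandConstraint(UseOpIdx, MCOI::TIED_TO);
  if (TiedDefIdx != -1) {
    // Use operand UseOpIdx is tied to def operand TiedDefIdx.
  }
  // Asking about a def operand (e.g. operand 0) always yields -1, since
  // TIED_TO is only set on use operands.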
As far as I can tell, this code must have been copied from the
generic implementation prior to 6aa2744bed0b8.
Unfortunately, this code is not executed in lit tests. I just happened
to notice it while looking for other uses of TIED_TO for something
I was working on.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D152754
Currently `isTriviallyReMaterializable` calls
`isReallyTriviallyReMaterializable` and
`isReallyTriviallyReMaterializableGeneric`. The two interfaces
are confusing, but there are also some real issues with this.
The documentation of this function (see below) suggests that
`isReallyTriviallyReMaterializable` allows the target to override the
default behaviour.
/// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
/// set, this hook lets the target specify whether the instruction is actually
/// trivially rematerializable, taking into consideration its operands.
However, it implements something different. The default behaviour
is the analysis done in `isReallyTriviallyReMaterializableGeneric`,
which tests whether it is safe to rematerialize the MachineInstr.
The result of `isReallyTriviallyReMaterializable` is only considered if
`isReallyTriviallyReMaterializableGeneric` returns `false`. That means
there is no way to override the default behaviour if
`isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to
rematerialize, but we'd rather not).
By making this a single interface, targets can override it to do either.
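A sketch of what the merged hook permits; `FooInstrInfo` and
`TooExpensiveToRemat` are hypothetical:

  bool FooInstrInfo::isReallyTriviallyReMaterializable(
      const MachineInstr &MI) const {
    // Refuse rematerialization even when the generic analysis says it is
    // safe -- the case the old split interface could not express.
    if (TooExpensiveToRemat(MI)) // hypothetical target-specific predicate
      return false;
    // Otherwise defer to the generic safety analysis in the base class.
    return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
  }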
Reviewed By: craig.topper, nemanjai
Differential Revision: https://reviews.llvm.org/D156520