Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.
The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.
Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
Added .setMIFlag(MachineInstr::FrameSetup) to all BuildMI calls in
HexagonFrameLowering::insertAllocframe. This change ensures that the
test sugared-constants.ll passes upstream by correctly marking
instructions as part of the frame setup.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.
This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
This patch does 3 things:
1. Add support for optimizing the address mode of HVX load/store
instructions
2. Reduce the value of Add instruction immediates by replacing with the
difference from other Addi instructions that share common base:
For Example, If we have the below sequence of instructions: r1 =
add(r2,# 1024) ... r3 = add(r2,# 1152) ... r4 = add(r2,# 1280)
Where the register r2 has the same reaching definition, They get
modified to the below sequence:
r1 = add(r2,# 1024)
...
r3 = add(r1,# 128)
...
r4 = add(r1,# 256)
3. Fixes a bug pass where the addi instructions were modified based on a
predicated register definition, leading to incorrect output.
Eg:
INST-1: if (p0) r2 = add(r13,# 128)
INST-2: r1 = add(r2,# 1024)
INST-3: r3 = add(r2,# 1152)
INST-4: r5 = add(r2,# 1280)
In the above case, since r2's definition is predicated, we do not want
to modify the uses of r2 in INST-3/INST-4 with add(r1,#128/256)
4.Fixes a corner case
It looks like we never check whether the offset register is actually
live (not clobbered) at optimization site. Add the check whether it is
live at MBB entrance. The rest should have already been verified.
5. Fixes a bad codegen
For whatever reason we do transformation without checking if the value
in register actually reaches the user. This is second identical fix for
this pass.
Co-authored-by: Anirudh Sundar <quic_sanirudh@quicinc.com>
Co-authored-by: Sergei Larin <slarin@quicinc.com>
The renamable flag is useful during MachineCopyPropagation but renamable
flag will be dropped after lowerCopy in some case.
This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
This pass utilizes the new Hexagon Mask Instruction.
Authored by : Harsha Jagasia, Krzysztof Parzyszek
Co-authored-by: Harsha Jagasia <harsha.jagasia@gmail.com>
Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
When concatenation of vector instructions is formed, as a part of it
vector rotation is performed. The direction of the shift was not
correctly calculated. This fixes the rotation factor.
When the constant extender optimization pass encounters an instruction
that uses an extended address pointing to another function's block,
avoid adding the instruction to the extender list for the current
machine function.
Fixes https://github.com/llvm/llvm-project/issues/99714
This reverts commit be5a845e4c29aadb513ae6e5e2879dccf37efdbb.
This causes large code size regressions, which were not present
in the initial version of this change.
This builds on top of commit 9d0754ada5dbbc0c009bcc2f7824488419cc5530
("[MC] Relax fragments eagerly") and relaxes fragments eagerly to
eliminate MCSection::HasLayout and `getFragmentOffset` overhead. The
approach is slightly different from
1a47f3f3db66589c11f8ddacfeaecc03fb80c510 and has less performance
benefit.
The new layout algorithm also addresses the following problems:
* Size change of MCFillFragment/MCOrgFragment did not influence the
fixed-point iteration, which could be problematic for contrived cases.
* The `invalid number of bytes` error was reported too early. Since
`.zero A-B` might have temporary negative values in the first few
iterations.
* X86AsmBackend::finishLayout performed only one iteration, which might
not converge. In addition, the removed `#ifndef NDEBUG` code (disabled
by default) in X86AsmBackend::finishLayout was problematic, as !NDEBUG
and NDEBUG builds evaluated fragment offsets at different times before
this patch.
* The computed layout for relax-recompute-align.s is optimal now.
Builds with many text sections (e.g. full LTO) shall observe a decrease
in compile time while the new algorithm could be slightly slower for
some -O0 -g projects.
Aligned bundling from the deprecated PNaCl placed constraints how we can
perform iteration.
This reverts commit 1a47f3f3db66589c11f8ddacfeaecc03fb80c510.
Fix#100283
This commit is actually a trigger of other preexisting problems:
* Size change of fill fragments does not influence the fixed-point iteration.
* The `invalid number of bytes` error is reported too early. Since
`.zero A-B` might have temporary negative values in the first few
iterations.
However, the problems appeared at least "benign" (did not affect the
Linux kernel builds) before this commit.
It is now translated to `<1 x i64>`, which allows the removal of a bunch
of special casing.
This _incompatibly_ changes the ABI of any LLVM IR function with
`x86_mmx` arguments or returns: instead of passing in mmx registers,
they will now be passed via integer registers. However, the real-world
incompatibility caused by this is expected to be minimal, because Clang
never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>`
or `double`, depending on ABI.
This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type.
That type simply no longer corresponds to an IR type, and is used only
by MMX intrinsics and inline-asm operands.
Because SelectionDAGBuilder only knows how to generate the
operands/results of intrinsics based on the IR type, it thus now
generates the intrinsics with the type MVT::v1i64, instead of
MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus
have the X86 backend fix them up in DAGCombine. (This may be a
short-lived hack, if all the MMX intrinsics can be removed in upcoming
changes.)
Works towards issue #98272.
Rebase of #84114. I've only included the core changes to frame layout
calculation & CFI generation which sidesteps the regressions found after
merging #84114. Since these changes are a necessary precursor to the
overall fix and are themselves slightly beneficial as CFI is now
generated correctly, I think it is reasonable to merge this first step.
---
For very large stack frames, the offset from the stack pointer to a
local can be more than 2^31 which overflows various `int` offsets in the
frame lowering code.
This patch updates the frame lowering code to calculate the offsets as
64-bit values and fixes CFI to use the corrected sizes.
After this patch, additional work is needed to fix offset truncations in
each target's codegen.
This builds on top of commit 9d0754ada5dbbc0c009bcc2f7824488419cc5530
("[MC] Relax fragments eagerly") and relaxes fragments eagerly to
eliminate MCSection::HasLayout and `getFragmentOffset` overhead.
Note: The removed `#ifndef NDEBUG` code (disabled by default) in
X86AsmBackend::finishLayout was problematic, as (a) !NDEBUG and NDEBUG
builds evaluated fragment offsets at different times before this patch
(b) one iteration might not be sufficient to converge. There might be
some edge cases that it did not handle. Anyhow, this patch probably
makes it work for more cases.
The parameter is confusing as it duplicates MCStreamer::isVeboseAsm
(initialized from MCTargetOptions::AsmVerbose). After
233cca169237b91d16092c82bd55ee6a283afe98, no in-tree target uses the
parameter.
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.
I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
MachineFunction's probably should not include a backreference to
the owning MachineModuleInfo. Most of these references were used
just to query the MCContext, which MachineFunction already directly
stores. Other contexts are using it to query the LLVMContext, which
can already be accessed through the IR function reference.
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up
miscompiling.
This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.
rdar://131419786