The stack slot coloring pass is concerned with optimizing spill
slots. If any change is a pass is made over the function to remove
stack stores that use the same register and stack slot as an
immediately preceding load.
The register check is too simple for constant registers like AArch64
and RISC-V's zero register. This register can be used as the result
of a load if we want to discard the result, but still have the memory
access performed. Like for a volatile or atomic load.
If the code sees a load from the zero register followed by a store
of the zero register at the same stack slot, the pass mistakenly
believes the store isn't needed.
Since the main stack coloring optimization is only concerned with
spill slots, it seems reasonable that RemoveDeadStores should
only be concerned with spills. Since we never generate a reload of
x0, this avoids the issue seen by RISC-V.
Test case concept is adapted from pr30821.mir from X86. That test
had to be updated to mark the stack slot as a spill slot.
Fixes#80052.
RegisterBank Selection for scalable vector G_ADD and G_SUB by creating
new mappings for different types of vector register banks.
Then implement Instruction Selection for the same operations by choosing
the correct RISC-V vector register class.
This patch adds support for common and local symbols in the TOC for AIX.
Note that we need to update isVirtualSection so as a common symbol in
TOC will have the symbol type XTY_CM and will be initialized when placed
in the TOC so sections with this type are no longer virtual.
---------
Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>
https://github.com/llvm/llvm-project/pull/78171 added support for
non-consecutive local value numbers. This extends the support for global
value numbers (for globals and functions).
This means that it is now possible to delete an unnamed global
definition/declaration without breaking the IR.
This is a lot less common than unnamed local values, but it seems like
something we should support for consistency. (Unnamed globals are used a
lot in Rust though.)
Extra space causes the checks generated by update_mir_test_checks to be
unavailable.
```
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
# RUN: llc -mtriple=x86_64-- -o - %s -run-pass=none -verify-machineinstrs -simplify-mir | FileCheck %s
---
name: foo
body: |
; CHECK-LABEL: name: foo
; CHECK: bb.0:
; CHECK-NEXT: successors:
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: RET 0, $eax
bb.0:
successors:
bb.1:
RET 0, $eax
...
```
The failure log is as follows:
```
llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir:9:16: error: CHECK-NEXT: is on the same line as previous match
; CHECK-NEXT: {{ $}}
^
<stdin>:21:13: note: 'next' match was here
successors:
^
<stdin>:21:13: note: previous match ended here
successors:
```
This reverts commit f8525030004f907cd108e7c18df255a6d3b23124.
It was supposed to speed things up but llvm-compile-time-tracker.com
showed a slight slow down.
Previously we called ignoreCSRForAllocationOrder on every alias of every
CSR which was expensive on targets like AMDGPU which define a very large
number of overlapping register tuples.
On such targets it is simpler and faster to call
ignoreCSRForAllocationOrder once for every physical register.
Differential Revision: https://reviews.llvm.org/D146735
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498
> Currently, the way that recomputeLiveIns works is that it will
recompute the livein registers for that MachineBasicBlock but it matters
what order you call recomputeLiveIn which can result in incorrect
register allocations down the line.
Now we do not recompute the entire CFG but we do ensure that the newly
added MBB do reach convergence.
32-bit ARMv6 with thumb doesn't support MULHS/MUL_LOHI as legal/custom
nodes during expansion which will cause fixed point multiplication of
_Accum types to fail with fixed point arithmetic. Prior to this, we just
happen to use fixed point multiplication on platforms that happen to
support these MULHS/MUL_LOHI.
This patch attempts to check if the multiplication can be done via
libcalls, which are provided by the arm runtime. These libcall attempts
are made elsewhere, so this patch refactors that libcall logic into its
own functions and the fixed point expansion calls and reuses that logic.
Change the implementation of getLastCalleeSavedAlias to use RegUnits
instead of register aliases. This is much faster on targets like AMDGPU
which define a very large number of overlapping register tuples.
No functional change intended. If PhysReg overlaps multiple CSRs then
getLastCalleeSavedAlias(PhysReg) could conceivably return a different
arbitrary one, but currently it is only used for some debug printing
anyway.
Differential Revision: https://reviews.llvm.org/D146734
Legalize G_ABS for larger/smaller width vectors with legal element sizes
Fallsback for the smaller width vector tests because it is unable to
legalize for G_ANYEXT smaller width vectors
This fills out the fcmp handling to be more like the other instructions,
adding better support for fp16 and some larger vectors.
Select of f16 values is still not handled optimally in places as the
select is only legal for s32 values, not s16. This would be correct for
integer but not necessarily for fp. It is as if we need to do
legalization -> regbankselect -> extra legaliation -> selection.
The use of SmallDenseMap saves 0.48% of heap allocations during the
compilation of a large preprocessed file, namely X86ISelLowering.cpp,
for the X86 target. During the experiment, the maximum size of
WorklistMap was 24 or less 74% of the time. (Note that DenseMap has
the maximum occupancy rate of 3/4.)
Currently, the way that recomputeLiveIns works is that it will recompute
the livein registers for that MachineBasicBlock but it matters what
order you call recomputeLiveIn which can result in incorrect register
allocations down the line.
This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
This pass should be the last machine function pass in pipeline, also
ignore `PI.runAfterPass(*P, MF, PassPA);` to avoid accessing a dangling
reference.
b1ae461a5358932851de42b66ffde8748da51a83 renamed Cycle ->
ReleaseAtCycle.
7e09239e24b339f45f63a670e2e831150826bf70 was committed without rebasing
but used the old Cycle syntax.
This caused a build failure when
7e09239e24b339f45f63a670e2e831150826bf70 was squash-and-merged. This
patch fixes this problem.
TargetSchedule.td explicitly allows the usage of a ProcResource for zero
cycles, in order to represent that the ProcResource must be available
but is not consumed by the instruction. On the other hand,
ResourceSegments explicitly does not allow for a zero sized interval. In
order to remedy this, this patch handles the special case of when there
is an empty interval usage of a resource by not adding an empty
interval.
We ran into this issue downstream, but it makes sense to have
this upstream since it is explicitly allowed by TargetSchedule.td.
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.
TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.
patch 3 from: https://github.com/llvm/llvm-project/pull/73337
`CodeGenPassBuilder` is not very tightly coupled to CodeGen, it may need
to reference some method in pass builder in future, so move
`CodeGenPassBuilder.h` to Passes.
Add new pass manager support to `llc`. Users can use
`--passes=pass1,pass2...` to run mir passes, and use `--enable-new-pm`
to run default codegen pipeline.
This patch is taken from [D83612](https://reviews.llvm.org/D83612), the
original author is @yuanfang-chen.
---------
Co-authored-by: Yuanfang Chen <455423+yuanfang-chen@users.noreply.github.com>
Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF
sections has landed, there is no longer a need to keep around the
mbb-profile-dump flag.
This patch adds basic TLSDESC support in the RISC-V backend.
Specifically, we add new relocation types for TLSDESC, as prescribed in
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a
new pseudo instruction to simplify code generation.
This patch does not try to optimize the local dynamic case, which can be
improved in separate patches.
Linker side changes will also be handled separately.
The current implementation is only enabled when passing the new
`-enable-tlsdesc` codegen flag.
This patch makes a SmallVector slightly larger. We encounter quite a
few instructions with 3 or 4 defs but very few beyond that on X86.
This saves 0.39% of heap allocations during the compilation of a large
preprocessed file, namely X86ISelLowering.cpp, for the X86 target.
Fixes#78897 - although the test case still has a number of poor codegen issues (in particular for i686 triples) that will need addressing (combining the nodes in topological order should help).
The RemoveDIs project is aiming to eliminate debug intrinsics like
dbg.value and dbg.declare from LLVM, and replace them with DPValue objects
attached to instructions. ISel is one of the "terminals" where that
information needs to be converted into MIR format: this patch implements
support for that in GlobalISel. We aim for the output of LLVM to be
identical with/without RemoveDIs debug-info.
This patch should be NFC, as we're handling the same data about variables
stored in a different format -- it now appears in a DPValue object rather
than as an intrinsic. To that end, I've refactored the handling of
dbg.values into a dedicated function, and call it whenever a dbg.value or a
DPValue is encountered. dbg.declare is handled in a similar way.
Testing: adding the --try-experimental-debuginfo-iterators switch to llc
causes it to try and convert to the "new" debug-info format if it's built
in (LLVM_EXPERIMENTAL_DEBUGINFO_ITERATORS=On), and it'll be covered by our
buildbot. One test has a few extra wildcard-regexes added: this is because
there's some extra data printed about attached debug-info, which is safe to
ignore.
This patch adds support for DPVAssigns across all of
AssignmentTrackingAnalysis except for AssignmentTrackingLowering, which
is implemented in a separate patch. This patch includes handling
DPValues in MemLocFragFill, the removal of redundant DPValues as part of
AssignmentTrackingAnalysis (which is different to the version in
`BasicBlockUtils.cpp`), and preventing the DPVAssigns from being
directly emitted in SelectionDAG (just as we don't emit llvm.dbg.assigns
directly, but receive a set of locations from
AssignmentTrackingAnalysis' output).
Make Candidate's front() and back() functions return references to
MachineInstr and introduce begin() and end() returning iterators, the
same way it is usually done in other container-like classes.
This makes possible to iterate over the instructions contained in
Candidate the same way one can iterate over MachineBasicBlock (note that
begin() and end() return bundled iterators, just like MachineBasicBlock
does, but no instr_begin() and instr_end() are defined yet).
Following on from the previous patch 6aeb7a7,
this patch adds the necessary code to process the DPV equivalents of
llvm.dbg.assign intrinsics. Most of the content of this patch is simply
duplicating existing functionality, using generic code for simple
functions and PointerUnions where storage is required. The most complex
changes are in the places that iterate over instructions, as iterating
over DPValues between instructions is different to iterating over
instructions that may or may not be debug intrinsics; this is most
complex in `AssignmentTrackingLowering::process`, where I've added some
comments to explain the state of the program at each key point depending
on whether we are operating on intrinsics or DPValues.
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.
Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of the
AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for
other targets, and generate most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.
The other changes are supporting changes, to handle the new constructs
generated by that pass.
There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.
There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.
I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.
Thanks to @dpaoliello for extensive testing and suggestions.
(Originally posted as https://reviews.llvm.org/D157547 .)
Although there are predicated versions of minnum/maxnum, the ones for
minimum/maximum are currently missing. This patch introduces these
intrinsics and implements their lowering to RISC-V.
Some operations behave like selects. For example `or(zext(c), y)` is the
same as select(c, y|1, y)` and instcombine can canonicalize the select
to the or form. These operations can still be worthwhile converting to
branch as opposed to keeping as a select or or instruction.
This patch attempts to add some basic handling for them, creating a
SelectLike abstraction in the select optimization pass. The backend can
opt into handling `or(zext(c),x)` as a select if it could be profitable,
and the select optimization pass attempts to handle them in much the
same way as a `select(c, x|1, x)`. The Or(x, 1) may need to be added as
a new instruction, generated as the or is converted to branches.
This helps fix a regression from selects being converted to or's
recently.