The allocation order of 16 bit registers is vgpr0lo16, vgpr0hi16,
vgpr1lo16, vgpr1hi16, vgpr2lo16.... We prefer (essentially require) that
allocation order, because it uses the minimum number of registers. But
when you have 16 bit data passing between 16 and 32 bit instructions you
get lots of COPY.
This patch teach the compiler that a COPY of a 16-bit value from a 32
bit register to a lo-half 16 bit register is free, to a hi-half 16 bit
register is not.
This might get improved to coalescing with additional cases, and perhaps
as an alternative to the RA hints. For now upstreaming this solution
first.
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to
optimize AGPR to AGPR copy but it missed checking for the SGPR
clobbering for the S_MOV_B64_IMM_PSEUDO generation.
Differential Revision: https://reviews.llvm.org/D113005
On targets that do not support AGPR to AGPR copying directly, try to find the
defining accvgpr_write and propagate its source vgpr register to the copies
before register allocation so the source vgpr register does not get clobbered.
The postrapseudos pass also attempt to propagate the defining accvgpr_write but
if the register to propagate is clobbered, it will give up and create new
temporary vgpr registers instead.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D108830
This is to allow 64 bit constant rematerialization. If a constant
is split into two separate moves initializing sub0 and sub1 like
now RA cannot rematerizalize a 64 bit register.
This gives 10-20% uplift in a set of huge apps heavily using double
precession math.
Fixes: SWDEV-292645
Differential Revision: https://reviews.llvm.org/D104874