This implements the remaining overflow-generating instructions in the AArch64
GlobalISel selector. Wide add/sub operations no longer fall back to
SelectionDAG. We use PostSelectOptimize to clean up the resulting flag-setting
operations when the carry-out is unused. Because we no longer fall back when
selecting add/sub atomics at O0, some test changes were required there.
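As a rough illustration of why wide add/sub needs carry-producing and
carry-consuming operations, here is a minimal Python model of a 128-bit add
split into two 64-bit limbs. The helper names (uaddo, uadde, add128) are
hypothetical and only model the arithmetic; they are not the selector's code.
The final carry-out is returned but is typically ignored by a plain wide add,
which is the situation where the flag-setting operation becomes a cleanup
candidate.

```python
MASK64 = (1 << 64) - 1


def uaddo(a, b):
    """Unsigned 64-bit add that also produces a carry-out (hypothetical helper)."""
    total = a + b
    return total & MASK64, total >> 64


def uadde(a, b, carry_in):
    """Unsigned 64-bit add that consumes a carry-in and produces a carry-out."""
    total = a + b + carry_in
    return total & MASK64, total >> 64


def add128(a, b):
    """Add two 128-bit values limb by limb, chaining the carry between limbs."""
    lo, carry = uaddo(a & MASK64, b & MASK64)
    hi, carry_out = uadde(a >> 64, b >> 64, carry)
    return (hi << 64) | lo, carry_out


# The low limb overflows and the carry propagates into the high limb.
result, carry_out = add128(MASK64, 1)
print(hex(result), carry_out)
```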
Fixes: https://github.com/llvm/llvm-project/issues/59407
Differential Revision: https://reviews.llvm.org/D153164
This does some trivial cross-regclass folding: we either add extra constraints
to eliminate the copy, or modify the uses to use a smaller regclass.
There are minor code size improvements on average.
Program                             size.__text
                                    before      after       diff
tramp3d-v4/tramp3d-v4               366000.00   366012.00    0.0%
mafft/pairlocalalign                248196.00   248188.00   -0.0%
7zip/7zip-benchmark                 568612.00   568592.00   -0.0%
kimwitu++/kc                        434704.00   434676.00   -0.0%
Bullet/bullet                       456128.00   456096.00   -0.0%
sqlite3/sqlite3                     284136.00   284100.00   -0.0%
ClamAV/clamscan                     381492.00   381396.00   -0.0%
SPASS/SPASS                         412052.00   411944.00   -0.0%
lencod/lencod                       428060.00   427912.00   -0.0%
consumer-typeset/consumer-typeset   413148.00   411116.00   -0.5%
Geomean difference                                          -0.1%
Differential Revision: https://reviews.llvm.org/D136793
The non-flag-setting variants of instructions may have different regclass
requirements. If so, we need to constrain them.
Differential Revision: https://reviews.llvm.org/D97343
There are two optimizations here:
1. Consider the following code:
   FCMPSrr %0, %1, implicit-def $nzcv
   %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
   %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
   FCMPSrr %0, %1, implicit-def $nzcv
   %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
This kind of code, where two FCMPs each feed a CSEL, can arise when a single
IR fcmp is used by two selects. During selection, to ensure that nzcv cannot
be clobbered between the fcmp and the csel, we have to generate an fcmp
immediately before each csel is selected.
However, we can often CSE these together later in MachineCSE. This doesn't
work if there are unrelated flag-setting instructions between the two FCMPs.
In this case, the SUBS defines NZCV, but that def has no users because it is
overwritten by the second FCMP.
Our solution here is to try to convert flag-setting operations that lie
between two identical FCMPs into non-flag-setting variants, so that CSE is
able to eliminate one of the FCMPs.
2. SelectionDAG imported patterns for arithmetic ops currently select the
flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand
to those instructions. However, if those implicit-def operands are not marked
as dead, the peephole optimizations cannot turn them into non-flag-setting
variants. The optimization here is to find these dead implicit-defs and mark
them as such.
This pass is only enabled when optimizations are enabled.
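The two optimizations above can be sketched on a toy instruction list. This is
a simplified Python model, not the pass itself: the Inst class, the
NON_FLAG_SETTING table and the three helper functions are all hypothetical, the
model assumes NZCV is not live out of the block, and it converts every dead
flag-setter rather than only those between identical FCMPs. The toy CSE also
assumes SSA-form operands, so two FCMPs with equal operands compare equal
values.

```python
from dataclasses import dataclass

# Hypothetical opcode mapping, for illustration only.
NON_FLAG_SETTING = {"SUBSWrr": "SUBWrr", "ADDSWrr": "ADDWrr"}


@dataclass
class Inst:
    opcode: str
    operands: tuple = ()
    defs_nzcv: bool = False
    uses_nzcv: bool = False
    nzcv_dead: bool = False


def mark_dead_nzcv_defs(block, live_out=False):
    """Backward scan: an NZCV def is dead if nothing reads it before the next def."""
    live = live_out
    for inst in reversed(block):
        if inst.defs_nzcv:
            inst.nzcv_dead = not live
            live = False
        if inst.uses_nzcv:
            live = True
    return block


def drop_dead_flag_setters(block):
    """Rewrite ops whose NZCV def is dead into their non-flag-setting variant."""
    for inst in block:
        if inst.defs_nzcv and inst.nzcv_dead and inst.opcode in NON_FLAG_SETTING:
            inst.opcode = NON_FLAG_SETTING[inst.opcode]
            inst.defs_nzcv = False
            inst.nzcv_dead = False
    return block


def cse_fcmps(block):
    """Toy CSE: drop an FCMP if the flags still hold the same comparison."""
    out, available = [], None
    for inst in block:
        key = (inst.opcode, inst.operands)
        if inst.opcode.startswith("FCMP") and key == available:
            continue
        if inst.defs_nzcv:
            available = key if inst.opcode.startswith("FCMP") else None
        out.append(inst)
    return out


def example_block():
    """The five-instruction sequence from item 1, in the toy representation."""
    return [
        Inst("FCMPSrr", ("%0", "%1"), defs_nzcv=True),
        Inst("CSELWr", uses_nzcv=True),
        Inst("SUBSWrr", defs_nzcv=True),
        Inst("FCMPSrr", ("%0", "%1"), defs_nzcv=True),
        Inst("CSELWr", uses_nzcv=True),
    ]


block = drop_dead_flag_setters(mark_dead_nzcv_defs(example_block()))
print([inst.opcode for inst in cse_fcmps(block)])
# prints ['FCMPSrr', 'CSELWr', 'SUBWrr', 'CSELWr']
```

Without the conversion step, the SUBSWrr in the middle invalidates the
available comparison in this model, and both FCMPs survive.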
Differential Revision: https://reviews.llvm.org/D89415