6 Commits

Author SHA1 Message Date
Tobias Stadler
84a6a057e6 [AArch64][GlobalISel] Select G_UADDE/G_SADDE/G_USUBE/G_SSUBE
This implements the remaining overflow-generating instructions in the AArch64
GlobalISel selector. Wide add/sub operations no longer fall back to
SelectionDAG. We use PostSelectOptimize to clean up the resulting flag-setting
operations when the carry-out is unused. Since we no longer fall back when
selecting add/sub atomics at O0, some test changes were required there.
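
As a rough illustration of why the carry-in/carry-out opcodes matter for wide arithmetic, here is a minimal C++ sketch of a 128-bit add split into two 64-bit halves. The names `AddResult`, `uadde`, `U128`, and `add128` are hypothetical and exist only for this example; they model the semantics of a G_UADDE-style operation and are not code from the selector.

```cpp
#include <cstdint>

// Value plus unsigned carry-out, in the spirit of G_UADDO / G_UADDE.
// Hypothetical type for illustration only.
struct AddResult {
  std::uint64_t value;
  bool carryOut;
};

// Models an add-with-carry: lhs + rhs + carryIn, reporting the carry-out.
AddResult uadde(std::uint64_t lhs, std::uint64_t rhs, bool carryIn) {
  std::uint64_t partial = lhs + rhs;            // wraps modulo 2^64
  bool carry1 = partial < lhs;                  // wrapped on the first add
  std::uint64_t sum = partial + (carryIn ? 1u : 0u);
  bool carry2 = sum < partial;                  // wrapped when adding carry-in
  return {sum, carry1 || carry2};
}

// A 128-bit unsigned integer as two 64-bit halves (hypothetical type).
struct U128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Wide add built from a carry chain: the low half produces a carry that
// the high half consumes.
U128 add128(U128 a, U128 b) {
  AddResult lo = uadde(a.lo, b.lo, false);
  AddResult hi = uadde(a.hi, b.hi, lo.carryOut);
  // hi.carryOut is unused here. This is the situation the message refers
  // to: a flag-setting operation whose carry-out has no users.
  return {lo.value, hi.value};
}
```

For example, `add128({~0ULL, 0}, {1, 0})` carries out of the low half and yields `lo == 0`, `hi == 1`.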

Fixes: https://github.com/llvm/llvm-project/issues/59407

Differential Revision: https://reviews.llvm.org/D153164
2023-06-25 14:32:00 -07:00
Amara Emerson
974cf71649 [AArch64][GlobalISel] Add a simple cross-regclass copy optimization post-selection.
This does some trivial cross-regclass folding, where we can either do some extra
constraining to eliminate the copy or modify uses to use a smaller regclass.

There are minor code size improvements on average.

Program                                       size.__text
                                              before         after           diff
tramp3d-v4/tramp3d-v4                         366000.00      366012.00       0.0%
mafft/pairlocalalign                          248196.00      248188.00      -0.0%
7zip/7zip-benchmark                           568612.00      568592.00      -0.0%
kimwitu++/kc                                  434704.00      434676.00      -0.0%
Bullet/bullet                                 456128.00      456096.00      -0.0%
sqlite3/sqlite3                               284136.00      284100.00      -0.0%
ClamAV/clamscan                               381492.00      381396.00      -0.0%
SPASS/SPASS                                   412052.00      411944.00      -0.0%
lencod/lencod                                 428060.00      427912.00      -0.0%
consumer-typeset/consumer-typeset             413148.00      411116.00      -0.5%
                           Geomean difference                               -0.1%

Differential Revision: https://reviews.llvm.org/D136793
2022-11-01 16:09:21 -07:00
Shengchen Kan
37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
Michael Benfield
00d19c6704 [various] Remove or use variables which are unused but set.
This is in preparation for the -Wunused-but-set-variable warning.

Differential Revision: https://reviews.llvm.org/D102942
2021-06-01 15:38:48 -07:00
Amara Emerson
eb55203e00 [AArch64][GlobalISel][PostSelectOpt] Constrain reg operands after mutating instructions.
The non-flag-setting variants of instructions may have different regclass
requirements. If so, we need to constrain their register operands.

Differential Revision: https://reviews.llvm.org/D97343
2021-02-23 19:32:18 -08:00
Amara Emerson
0f0fd383b4 [AArch64][GlobalISel] Introduce a new post-isel optimization pass.
There are two optimizations here:

1. Consider the following code:
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
 %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
This kind of code where we have 2 FCMPs each feeding a CSEL can happen
when we have a single IR fcmp being used by two selects. During selection,
to ensure that there can be no clobbering of nzcv between the fcmp and the
csel, we have to generate an fcmp immediately before each csel is
selected.

However, often we can essentially CSE these together later in MachineCSE.
This doesn't work though if there are unrelated flag-setting instructions
in between the two FCMPs. In this case, the SUBS defines NZCV
but it doesn't have any users, being overwritten by the second FCMP.

Our solution here is to try to convert flag-setting operations within an
interval of identical FCMPs into their non-flag-setting variants, so that CSE
will be able to eliminate one of the FCMPs.

2. SelectionDAG imported patterns for arithmetic ops currently select the
flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand
to those instructions. However, if those imp-def operands are not marked as
dead, the peephole optimizations are not able to optimize them into
non-flag-setting variants. The optimization here is to find these dead imp-defs
and mark them as such.
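
The two optimizations above can be sketched on a toy instruction list. This is a simplified model, not the pass itself: `ToyInst`, `markDeadNZCVDefs`, `nonFlagSettingVariant`, and `dropDeadFlagDefs` are hypothetical names, the opcode mapping covers only two cases, and the sketch assumes NZCV is not live out of the block. It also collapses both steps into one by rewriting every dead flag definition it finds, rather than only those between identical FCMPs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record; not LLVM's MachineInstr.
struct ToyInst {
  std::string opcode;
  bool definesNZCV;  // has an implicit-def of $nzcv
  bool readsNZCV;    // has an implicit use of $nzcv
  bool nzcvDefDead;  // the $nzcv def is never read before being overwritten
};

// Backward liveness scan over one block.
// Assumption: NZCV is not live out of the block.
void markDeadNZCVDefs(std::vector<ToyInst> &block) {
  bool nzcvLive = false;
  for (std::size_t i = block.size(); i-- > 0;) {
    ToyInst &inst = block[i];
    if (inst.definesNZCV) {
      inst.nzcvDefDead = !nzcvLive;  // nobody reads this def
      nzcvLive = false;              // the def kills liveness above it
    }
    if (inst.readsNZCV)
      nzcvLive = true;               // a use makes NZCV live above it
  }
}

// Toy opcode mapping for illustration; returns the input if no
// non-flag-setting variant is known to this sketch.
std::string nonFlagSettingVariant(const std::string &opcode) {
  if (opcode == "SUBSWrr") return "SUBWrr";
  if (opcode == "ADDSWrr") return "ADDWrr";
  return opcode;
}

// Rewrites instructions whose NZCV def is dead so they no longer clobber
// the flags, which leaves identical compares adjacent in flag terms.
void dropDeadFlagDefs(std::vector<ToyInst> &block) {
  markDeadNZCVDefs(block);
  for (ToyInst &inst : block) {
    if (!inst.nzcvDefDead)
      continue;
    std::string plain = nonFlagSettingVariant(inst.opcode);
    if (plain != inst.opcode) {
      inst.opcode = plain;
      inst.definesNZCV = false;
      inst.nzcvDefDead = false;
    }
  }
}
```

Applied to the five-instruction example from item 1, the sketch rewrites the SUBSWrr to SUBWrr and leaves both FCMPs untouched, so no flag-clobbering instruction remains between them.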

This pass is only enabled when optimizations are enabled.

Differential Revision: https://reviews.llvm.org/D89415
2020-10-23 10:18:36 -07:00