207 Commits

Author SHA1 Message Date
Jack Andersen
f108c7f59d [GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues.
Expanding on D109750.

Since `DBG_VALUE` instructions have final register validity determined in
`LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune
unused register operands as their defs are erased. Consequently, this renders
`MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a
substantial performance improvement.

The only necessary changes involve making relevant passes consider invalid
DBG_VALUE vregs uses as valid.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D112852
2021-12-05 15:55:59 -05:00
Abinav Puthan Purayil
bc5dbb0bae [GlobalISel] Add matchers for constant splat.
This change exposes isBuildVectorConstantSplat() to the llvm namespace
and uses it to implement the constant splat versions of
m_SpecificICst().

CombinerHelper::matchOrShiftToFunnelShift() can now work with vector
types and CombinerHelper::matchMulOBy2()'s match for a constant splat is
simplified.

Differential Revision: https://reviews.llvm.org/D114625
2021-11-30 15:18:50 +05:30
Mirko Brkusanin
0dd570ff56 [AMDGPU][GlobalISel] Transform (fsub (fpext (fneg (fmul x, y))), z) -> (fneg (fma (fpext x), (fpext y), z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98050
2021-11-29 16:27:22 +01:00
Mirko Brkusanin
37c2a2201d [AMDGPU][GlobalISel] Transform (fsub (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), (fneg z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98049
2021-11-29 16:27:22 +01:00
Mirko Brkusanin
5fe7fcd28e [AMDGPU][GlobalISel] Transform (fsub (fneg (fmul, x, y)), z) -> (fma (fneg x), y, (fneg z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98048
2021-11-29 16:27:22 +01:00
Mirko Brkusanin
a782169270 [AMDGPU][GlobalISel] Transform (fsub (fmul x, y), z) -> (fma x, y, -z)
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D96614
2021-11-29 16:27:22 +01:00
Mirko Brkusanin
e5e49a08f1 [AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fpext (fmul u, v))), z) -> (fma x, y, (fma (fpext u), (fpext v), z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98047
2021-11-29 16:27:21 +01:00
Mirko Brkusanin
f732292536 [AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fmul u, v)), z) -> (fma x, y, (fma u, v, z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D97938
2021-11-29 16:27:21 +01:00
Mirko Brkusanin
8951136216 [AMDGPU][GlobalISel] Transform (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D97937
2021-11-29 16:27:21 +01:00
Mirko Brkusanin
881840fc26 [AMDGPU][GlobalISel] Transform (fadd (fmul x, y), z) -> (fma x, y, z)
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D93305
2021-11-29 16:27:21 +01:00
Abinav Puthan Purayil
4af45f10cc [GlobalISel] Fold or of shifts to funnel shift.
This change folds a basic funnel shift idiom:
- (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt)
- (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt)

This also helps in folding to rotate shift if x and y are equal since we
already have a funnel shift to rotate combine.

Differential Revision: https://reviews.llvm.org/D114499
2021-11-26 17:05:29 +05:30
Kazu Hirata
259cd6f893 [llvm] Use range-based for loops (NFC) 2021-11-25 22:17:10 -08:00
Kazu Hirata
d45cb1d7ea [llvm] Use range-based for loops (NFC) 2021-11-23 08:54:48 -08:00
Mirko Brkusanin
db6bc2ab51 [AMDGPU][GlobalISel] Fold G_FNEG above when users cannot fold mods
If possible fold fneg into instruction above if users cannot fold mods and we
know it will decrease instruction count.
Follows same logic as SDAG combiner in choosing opportunities to combine.

Differential Revision: https://reviews.llvm.org/D112827
2021-11-17 14:25:13 +01:00
Jon Roelofs
b046eb19b8 [AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111856
2021-10-20 12:11:52 -07:00
Jon Roelofs
1300677f97 [AArch64][GlobalISel] combine and + [la]sr => ubfx
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111839
2021-10-18 10:33:01 -07:00
Amara Emerson
53ebfa7c5d [AArch64][GlobalISel] Fix combiner assertion in matchConstantOp().
We shouldn't call APInt::getSExtValue() on a >64b value.
2021-10-11 15:55:13 -07:00
Roman Lebedev
684cbae89a
[KnownBits] Introduce countMaxActiveBits() and use it in a few places 2021-10-11 23:36:06 +03:00
Amara Emerson
f95d9c95bb [GlobalISel] Fix the stores of truncates -> wide store combine for non-evenly dividing type sizes.
If the wide store we'd generate is not a multiple of the memory type of the
narrow stores (e.g. s48 and s32), we'd assert. Fix that.
2021-10-09 21:18:20 -07:00
Amara Emerson
17b89f9daa [GlobalISel] Improve G_UMHULH -> LSHR combine to accept non-uniform constant vectors. 2021-10-08 11:25:26 -07:00
Mirko Brkusanin
d20840c937 [GlobalISel] Combine for eliminating redundant operand negations
Differential Revision: https://reviews.llvm.org/D111319
2021-10-08 14:29:22 +02:00
Amara Emerson
08b3c0d995 [GlobalISel] Combine G_UMULH x, (1 << c)) -> x >> (bitwidth - c)
In order to not generate an unnecessary G_CTLZ, I extended the constant folder
in the CSEMIRBuilder to handle G_CTLZ. I also added some extra handing of
vector constants too. It seems we don't have any support for doing constant
folding of vector constants, so the tests show some other useless G_SUB
instructions too.

Differential Revision: https://reviews.llvm.org/D111036
2021-10-07 23:51:37 -07:00
Amara Emerson
8bfc0e06dc [GlobalISel] Port the udiv -> mul by constant combine.
This is a straight port from the equivalent DAG combine.

Differential Revision: https://reviews.llvm.org/D110890
2021-10-07 11:37:17 -07:00
Mirko Brkusanin
40e00063bc [GlobalISel] Combine fabs(fneg(x)) to fabs(x)
Differential Revision: https://reviews.llvm.org/D110943
2021-10-05 13:43:39 +02:00
Jay Foad
a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Amara Emerson
ca8316b704 [GlobalISel] Extend CombinerHelper::matchConstantOp() to match constant splat vectors.
This allows the "x op 0 -> x" fold to optimize vector constant RHSs.

Differential Revision: https://reviews.llvm.org/D110802
2021-09-30 14:31:25 -07:00
Amara Emerson
80f4bb5c61 [GlobalISel] Extend G_SELECT of known condition combine to vectors.
Adds a new utility function: isConstantOrConstantSplatVector().

Differential Revision: https://reviews.llvm.org/D110786
2021-09-30 12:16:44 -07:00
Jessica Paquette
15a24e1fdb [GlobalISel] Combine mulo x, 2 -> addo x, x
Similar to what SDAG does when it sees a smulo/umulo against 2
(see: `DAGCombiner::visitMULO`)

This pattern is fairly common in Swift code AFAICT.

Here's an example extracted from a Swift testcase:

https://godbolt.org/z/6cT8Mesx7

Differential Revision: https://reviews.llvm.org/D110662
2021-09-28 16:59:43 -07:00
Petar Avramovic
d477a7c2e7 GlobalISel/Utils: Refactor integer/float constant match functions
Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT

Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.

Relevant matchers now return both DefVReg and APInt/APFloat.

Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant

In other places use integer only constant match.

Differential Revision: https://reviews.llvm.org/D104409
2021-09-17 11:22:13 +02:00
Konstantin Schwarz
d2e66d7fa4 [GlobalISel] Add a combine for and(load , mask) -> zextload
This only handles simple masks, not shifted masks, for now.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D109357
2021-09-16 10:42:46 +02:00
Ahmed Bougacha
94a2f9cdb6 [GlobalISel] Fix CombinerHelper::isPredecessor for same def/use MI.
The doc comment for isPredecessor says:
  Returns true if \p DefMI precedes \p UseMI or they are the same
  instruction.
And dominates relies on that behavior for its own:
  Returns true if \p DefMI dominates \p UseMI. By definition an
  instruction dominates itself.

Make both statements correct by fixing isPredecessor.
Found by inspection.
2021-09-15 16:45:27 -07:00
Amara Emerson
5ec1845cad [AArch64][GlobalISel] Add a new reassociation for G_PTR_ADDs.
G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD(X, Y), C)

Improves CTMark -Os on AArch64:

Program            before after  diff
           sqlite3 286932 287024  0.0%
                kc 432512 432508 -0.0%
             SPASS 412788 412764 -0.0%
    pairlocalalign 249460 249416 -0.0%
            bullet 475740 475512 -0.0%
    7zip-benchmark 568864 568356 -0.1%
  consumer-typeset 419088 418648 -0.1%
        tramp3d-v4 367628 367224 -0.1%
          clamscan 383184 382732 -0.1%
            lencod 430028 429284 -0.2%
Geomean difference               -0.1%

Differential Revision: https://reviews.llvm.org/D109528
2021-09-14 23:57:41 -07:00
Chris Lattner
735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Amara Emerson
eae44c8a86 [GlobalISel] Implement merging of stores of truncates.
This is a port of a combine which matches a pattern where a wide type scalar
value is stored by several narrow stores. It folds it into a single store or
a BSWAP and a store if the targets supports it.

Assuming little endian target:
 i8 *p = ...
 i32 val = ...
 p[0] = (val >> 0) & 0xFF;
 p[1] = (val >> 8) & 0xFF;
 p[2] = (val >> 16) & 0xFF;
 p[3] = (val >> 24) & 0xFF;
=>
 *((i32)p) = val;

On CTMark AArch64 -Os this results in a good amount of savings:

Program            before        after       diff
             SPASS 412792       412788       -0.0%
                kc 432528       432512       -0.0%
            lencod 430112       430096       -0.0%
  consumer-typeset 419156       419128       -0.0%
            bullet 475840       475752       -0.0%
        tramp3d-v4 367760       367628       -0.0%
          clamscan 383388       383204       -0.0%
    pairlocalalign 249764       249476       -0.1%
    7zip-benchmark 570100       568860       -0.2%
           sqlite3 287628       286920       -0.2%
Geomean difference                           -0.1%

Differential Revision: https://reviews.llvm.org/D109419
2021-09-08 17:06:33 -07:00
Mirko Brkusanin
36527cbe02 [AMDGPU][GlobalISel] Legalize memcpy family of intrinsics
Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE.

Corresponding intrinsics are replaced by a loop that uses loads/stores in
AMDGPULowerIntrinsics pass unless their length is a constant lower then
MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that
reaches legalizer should have a const length argument and should be expanded
into appropriate number of loads + stores.

Differential Revision: https://reviews.llvm.org/D108357
2021-09-07 12:24:07 +02:00
Konstantin Schwarz
90d5298759 [GlobalISel] Add convenience constructors to MemDesc
This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D109161
2021-09-03 12:52:18 +02:00
Jessica Paquette
844d8e0337 [GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1
This adds the following combines:

```
x = ... 0 or 1
c = icmp eq x, 1

->

c = x
```

and

```
x = ... 0 or 1
c = icmp ne x, 0

->

c = x
```

When the target's true value for the relevant types is 1.

This showed up in the following situation:

https://godbolt.org/z/M5jKexWTW

SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.

This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)

Differential Revision: https://reviews.llvm.org/D109130
2021-09-02 15:05:31 -07:00
Konstantin Schwarz
4b4bc1ea16 [GlobalISel] Do not generate illegal G_SEXTLOADs after legalization
The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check,
leading to the generation of potentially illegal G_SEXTLOADs when run after legalization.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108626
2021-08-25 10:13:39 +02:00
Sebastian Neubauer
fbae34635d [GlobalISel] Add combine for PTR_ADD with regbanks
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.

Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.

Differential Revision: https://reviews.llvm.org/D103326
2021-08-17 13:58:16 +02:00
Jessica Paquette
50efbf9cbe [GlobalISel] Narrow binops feeding into G_AND with a mask
This is a fairly common pattern:

```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```

We have combines to eliminate G_AND with a mask that does nothing.

If we combined the above to this:

```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```

We'd be able to take advantage of those combines using the trunc + zext.

For this to work (or be beneficial in the best case)

- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
  value than performing it at the original width *after masking.*

Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj

At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.

Differential Revision: https://reviews.llvm.org/D107929
2021-08-13 18:31:13 -07:00
Amara Emerson
7ec4ce157b [AArch64][GlobalISel] Relax oneuse restriction for PTR_ADD chain combining to check addressing legality.
With contributions by Sebastian Neubauer

Differential Revision: https://reviews.llvm.org/D105676
2021-08-10 16:41:18 -07:00
Amara Emerson
4c2e01232c [GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs.
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().

We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.
2021-08-07 01:41:02 -07:00
Petar Avramovic
66de26b1f9 GlobalISel: Fix matchEqualDefs for instructions with multiple defs
Instructions that produceSameValue produce same values for operands with
same index. matchEqualDefs used to return true for any two values from
different instructions that produce same values. Fix this by checking if
values are defined by operands with the same index.

Differential Revision: https://reviews.llvm.org/D107362
2021-08-05 15:05:45 +02:00
Dominik Montada
cc947e29ea [GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107330
2021-08-05 13:52:10 +02:00
Amara Emerson
c54d5c9756 [GlobalISel] Use GMergeLikeOp to simplify a combine. NFC. 2021-07-29 13:53:16 -07:00
Amara Emerson
532c458fa8 [GlobalISel] Add GPtrAdd and use it in some combines. 2021-07-29 12:04:02 -07:00
Amara Emerson
c658b472f3 [GlobalISel] Add a constant folding combine.
Use it AArch64 post-legal combiner. These don't always get folded because when
the instructions are created the constants are obscured by artifacts.

Differential Revision: https://reviews.llvm.org/D106776
2021-07-26 14:53:33 -07:00
Amara Emerson
dec34104bf [GlobalISel] Add combine for merge(unmerge) and use AArch64 postlegal-combiner.
Differential Revision: https://reviews.llvm.org/D106761
2021-07-26 10:37:31 -07:00
Amara Emerson
03cdb5221d [GlobalISel] Fix load-or combine moving loads across potential aliasing stores.
Although this combine checks that there's no load folding barriers between
the loads that it's trying to merge, it was inserting the load at the
MIRBuilder's default insertion point, which is the G_OR use inst.

This was causing a miscompile in the test suite's
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-bswap-2

Differential Revision: https://reviews.llvm.org/D106251
2021-07-19 10:23:23 -07:00
Matt Arsenault
5a0d940f2a GlobalISel: Preserve memory type for memset expansion 2021-07-16 11:41:32 -04:00