50250 Commits

Author SHA1 Message Date
Matt Arsenault
bc7d88faf1 CodeGen: Disable isCopyInstrImpl if there are implicit operands
This is a conservative workaround for broken liveness tracking of
SUBREG_TO_REG to speculatively fix all targets. The current reported
failures are on X86 only, but this issue should appear for all targets
that use SUBREG_TO_REG. The next minimally correct refinement would be
to disallow only implicit defs.

The coalescer now introduces implicit-defs of the super register to
track the dependency on other subregisters. If we see such an implicit
operand, we cannot simply treat the subregister def as the result
operand in case downstream users depend on the implicitly defined
parts. Really, target implementations should be considering the
implicit defs and trying to interpret them appropriately (maybe with
some generic helpers). The full implicit def could possibly be
reported as the move result rather than the subregister def, but that
requires additional work.

Hopefully fixes #64060 as well.

This needs to be applied to the release branch.

https://reviews.llvm.org/D156346
2023-10-02 15:16:40 +03:00
Simon Pilgrim
2984e3529b [X86] matchIndexRecursively - fold zext(addlike(shl_nuw(x,c1),c2)) patterns into LEA
Pulled out of D155472 - handle zeroextended scaled address indices
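
To illustrate why such an index can be expressed as an LEA scale plus displacement, here is a minimal Python sketch. It is not the matchIndexRecursively code. It models a 32-bit value zero-extended to 64 bits and assumes that `shl_nuw` means no set bits are shifted out, and that the `or` is add-like because c2 only occupies the low bits cleared by the shift. The helper names are made up for illustration.

```python
MASK32 = (1 << 32) - 1

def narrow_index(x, c1, c2):
    """zext(or(shl_nuw(x, c1), c2)): computed in 32 bits, then zero-extended."""
    shifted = (x << c1) & MASK32
    # Assumption: nuw holds, i.e. the shift does not drop any set bits.
    assert shifted >> c1 == x, "shl would wrap, so nuw does not hold"
    # Assumption: c2 fits in the bits the shift cleared, so `or` acts like `add`.
    assert c2 < (1 << c1), "c2 overlaps the shifted value"
    return shifted | c2  # zero-extending a non-negative int changes nothing

def lea_form(x, c1, c2):
    """zext(x) * scale + displacement, the shape an LEA address can encode."""
    return x * (1 << c1) + c2

for x in (0, 1, 0x1234, 0x0FFFFFFF):
    for c1, c2 in ((1, 1), (2, 3), (3, 5)):
        assert narrow_index(x, c1, c2) == lea_form(x, c1, c2)
```

Under those two assumptions the narrow computation and the wide scale-plus-displacement form agree, which is what makes the fold safe in this toy model.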
2023-10-02 12:38:25 +01:00
Simon Pilgrim
2908142089 [X86] Add test coverage for zext(or(shl_nuw(x,c1),c2)) pointer math
Additional test coverage for D155472
2023-10-02 12:38:25 +01:00
JP Lehr
e816c89c84 Revert "InlineSpiller: Consider if all subranges are the same when avoiding redundant spills"
This reverts commit d8127b2ba8a87a610851b9a462f2fc2526c36e37.
2023-10-02 06:26:33 -05:00
Matt Arsenault
414ff812d6 RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG
Currently coalescing with SUBREG_TO_REG introduces an invisible
load-bearing undef: there is liveness for the super register that is
not represented in the MIR.

This is part 1 of a fix for regressions that appeared after
b7836d856206ec39509d42529f958c920368166b. The allocator started
recognizing undef-def subregister MOVs as copies. Since there was no
representation for the dependency on the high bits, different undef
segments of the super register ended up disconnected and downstream
users ended up observing different undefs than they did previously.

This does not yet fix the regression. The isCopyInstr handling needs
to start handling implicit-defs on any instruction.

I wanted to include an end to end IR test since the actual failure
only appeared with an interaction between the coalescer and the
allocator. It's a bit bigger than I'd like but I'm having a bit of
trouble reducing it to something which definitely shows a diff that's
meaningful.

The same problem likely exists everywhere trying to do anything with
SUBREG_TO_REG. I don't understand how this managed to be broken for so
long.

This needs to be applied to the release branch.

https://reviews.llvm.org/D156345
2023-10-02 13:57:09 +03:00
Matt Arsenault
e28708d4f0 RegisterCoalescer: Avoid redundant implicit-def on rematerialize
If this was coalescing a def of a subregister with a def of the super
register, it was introducing a redundant super-register def and
marking the subregister def as dead.

This resulted in something like:

  dead $eax = MOVr0, implicit-def $rax, implicit-def $rax

Avoid this by checking if the new instruction already has the super
def, so we end up with this instead:

  dead $eax = MOVr0, implicit-def $rax
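
The check can be pictured with a toy Python model, which is only a sketch and not the coalescer code. Operands are modeled as hypothetical `(kind, register)` tuples, and the helper name is invented for illustration.

```python
def add_implicit_def(operands, super_reg):
    """Append an implicit-def of super_reg unless one is already present."""
    wanted = ("implicit-def", super_reg)
    if wanted not in operands:
        operands.append(wanted)
    return operands

# The rematerialized instruction already carries the super-register def.
ops = [("dead-def", "$eax"), ("implicit-def", "$rax")]
add_implicit_def(ops, "$rax")

# An instruction without it gets exactly one added.
fresh = [("dead-def", "$eax")]
add_implicit_def(fresh, "$rax")
```

In both cases the operand list ends up with a single implicit-def of the super register.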

The dead flag looks suspicious to me; it seems easy to misinterpret a
dead def of a subregister alongside a non-dead def of an aliasing
register. It seems to be intentional, though.

https://reviews.llvm.org/D156343
2023-10-02 13:33:52 +03:00
Matt Arsenault
b1295dd5c9 RegisterCoalescer: Handle implicit-def of a super register when rematerializing
Permit an implicit-def of a virtual register when rematerializing if
it defines a super register of a subregister def. The
rematerialization pre-legality check should really have been checking
the implicit operands, but that should be fixed separately.

https://reviews.llvm.org/D156331
2023-10-02 13:11:22 +03:00
Matt Arsenault
274ba2c910 RegisterCoalescer: Add new rematerializing with subregister tests
None of the existing MIR tests seem to be directly targeting this
situation.
2023-10-02 12:38:46 +03:00
David Green
aacefaf1cc [AArch64] Move fcopysign to fcopysign-noneon. NFC 2023-10-02 08:03:34 +01:00
Philip Reames
f0505c3dbe
[RISCV] Form vredsum from explode_vector + scalar (left) reduce (#67821)
This change adds two related DAG combines which together will take a
left-reduce scalar add tree of an explode_vector, and will incrementally
form a vector reduction of the vector prefix. If the entire vector is
reduced, the result will be a reduction over the entire vector.
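
As a rough illustration of the incremental idea, the following Python sketch compares a scalar left-reduce with a reduction that grows over a vector prefix one element at a time. It is a toy model of the arithmetic only, assuming integer adds, and the function names are invented. It does not model the DAG combines themselves.

```python
from functools import reduce

def scalar_left_reduce(elems):
    """((e0 + e1) + e2) + ... exactly as the scalar add tree is written."""
    return reduce(lambda acc, e: acc + e, elems)

def incremental_prefix_reduce(elems):
    """Grow a prefix reduction one element at a time."""
    prefix_len = 2
    acc = sum(elems[:prefix_len])  # reduction over the first two elements
    while prefix_len < len(elems):
        # add(reduce(first k elements), element k) -> reduce(first k+1 elements)
        prefix_len += 1
        acc = sum(elems[:prefix_len])
    return acc

v = [3, 1, 4, 1, 5, 9, 2, 6]
assert scalar_left_reduce(v) == incremental_prefix_reduce(v) == sum(v)
```

Each loop step corresponds to absorbing one more extracted element into the prefix reduction; once the prefix covers the whole list, the result is a reduction over the entire vector.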

Profitability-wise, this relies on vredsum being cheaper than a pair of
extracts and scalar add. Given vredsum is linear in LMUL, and the
vslidedown required for the extract is *also* linear in LMUL, this is
clearly true at higher index values. At N=2, it's a bit questionable,
but I think the vredsum form is probably a better canonical form
anyways.

Note that this only matches left reduces. This happens to be the
motivating example I have (from spec2017 x264). This approach could be
generalized to handle right reduces without much effort, and could be
generalized to handle any reduce whose tree starts with adjacent
elements if desired. The approach fails for a reduce such as (A+C)+(B+D)
because we can't find a root to start the reduce with without scanning
the entire associative add expression. We could maybe explore using
masked reduces for the root node, but that seems of questionable
profitability. (As in, worth questioning - I haven't explored in any
detail.)

This is covering up a deficiency in SLP. If SLP encounters the scalar
form of reduce_or(a) + reduce_sum(a) where a is some common
vectorizable tree, SLP will sometimes fail to revisit one of the
reductions after vectorizing the other. Fixing this in SLP is hard, and
there's no good reason not to handle the easy cases in the backend.

Another option here would be to do this in VectorCombine or generic DAG.
I chose not to as the profitability of the non-legal typed prefix cases
is very target dependent. I think this makes sense as a starting point,
even if we move it elsewhere later.

This is currently restricted only to add reduces, but obviously makes
sense for any associative reduction operator. Once this is approved, I
plan to extend it in this manner. I'm simply staging work in case we
decide to go in another direction.
2023-10-01 17:42:07 -07:00
Simon Pilgrim
632022e61c [AArch64] aarch64-saturating-arithmetic.ll - refresh test missed in #67890 2023-10-01 15:39:24 +01:00
elhewaty
9103b1d68d
[DAG] Extend the computeOverflowForSignedSub/computeOverflowForUnsignedSub implementations with ConstantRange (#67890)
- Add tests for computeOverflowFor*Sub functions
- extend the computeOverflowForSignedSub/computeOverflowForUnsignedSub
implementations with ConstantRange (#37109)
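
A simplified picture of range-based overflow reasoning for unsigned subtraction, written as a Python sketch. It assumes plain inclusive, non-wrapping `(min, max)` intervals, which is a simplification made for this example; the function name and the string results are likewise invented and are not the LLVM API.

```python
def unsigned_sub_overflow(lhs_range, rhs_range):
    """Classify lhs - rhs given inclusive (min, max) unsigned ranges."""
    lhs_min, lhs_max = lhs_range
    rhs_min, rhs_max = rhs_range
    if lhs_min >= rhs_max:
        return "never"   # even the smallest lhs is at least the largest rhs
    if lhs_max < rhs_min:
        return "always"  # even the largest lhs is below the smallest rhs
    return "may"         # the ranges overlap, so nothing can be concluded
```

For instance, `unsigned_sub_overflow((10, 20), (0, 10))` yields `"never"`, while overlapping ranges such as `(0, 10)` and `(5, 6)` yield `"may"`.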
2023-10-01 14:57:34 +01:00
Simon Pilgrim
04b403d8cc [X86] combineConcatVectorOps - only concatenate single-use subops
We could maybe extend this by allowing the lowest subop to have multiple uses and extract the lowest subvector result of the concatenated op, but let's just get the fix in first.

Fixes #67333
2023-10-01 14:27:55 +01:00
Matt Arsenault
d8127b2ba8 InlineSpiller: Consider if all subranges are the same when avoiding redundant spills
This avoids some redundant spills of subranges, and avoids a compile failure.
This greatly reduces the number of spills in a loop.

The main range is not informative when multiple instructions are needed to fully define
a register. A common scenario is a lowered reg_sequence where every subregister
is sequentially defined, but each def changes the main range's value number. If
we look at specific lanes at the use index, we can see the value is actually the
same.

In this testcase, there are a large number of materialized 64-bit constant defs
which are hoisted outside of the loop by MachineLICM. These are feeding REG_SEQUENCEs,
which are not considered rematerializable inside the loop. After coalescing, the split
constant defs produce main ranges with an apparent phi def. There's no phi def if you look
at each individual subrange, and only half of the register is really redefined to a constant.

Fixes: SWDEV-380865

https://reviews.llvm.org/D147079
2023-10-01 11:37:53 +03:00
Matt Arsenault
7252787dd9 RegAllocGreedy: Fix detection of lanes read by a bundle
SplitKit creates questionably formed bundles of copies
when it needs to copy a subset of live lanes and can't do
it with a single subregister index. These are merely marked
as part of a bundle, and don't start with a BUNDLE instruction.
Queries for the slot index would give the first copy in the
bundle, and we need to inspect the operands of all the other
bundled copies.

Also fix and simplify detection of read lane subsets. This causes
some RISCV test regressions, but these look like accidentally beneficial
splits. I don't see a subrange based reason to perform these splits.

Avoids some really ugly regressions in a future patch.

https://reviews.llvm.org/D146859
2023-10-01 11:37:48 +03:00
Christian Sigg
5b7a7ec5a2
[NVPTX] Fix code generation for trap-unreachable. (#67478)
https://reviews.llvm.org/D152789 added an `exit` op before each
`unreachable`. This means we never get to the `trap` instruction.

This change limits the insertion of `exit` instructions to the cases
where `unreachable` is not lowered to `trap`. Trap itself is changed to
be emitted as `trap; exit;` to convey to `ptxas` that it exits the CFG.
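
The resulting behavior can be summarized with a small Python sketch. This is a toy decision table based only on the description above, not the NVPTX backend code; the function and parameter names are hypothetical.

```python
def lower_unreachable(lowered_to_trap):
    """Return the PTX statements emitted for an `unreachable` (toy model)."""
    if lowered_to_trap:
        # trap is followed by exit so ptxas sees control leaving the CFG
        return ["trap;", "exit;"]
    # unreachable is not lowered to trap: only the exit is inserted
    return ["exit;"]
```

Either way the emitted sequence ends in `exit;`, and the `trap` is no longer skipped.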
2023-10-01 07:59:24 +02:00
Craig Topper
e39727d41f
[RISCV][GISel] Legalize G_SADDO/G_SSUBO/G_UADDO/G_USUBO. (#67615) 2023-09-30 11:15:05 -07:00
David Green
f71ad19c04 [AArch64] Add a target feature for AArch64StorePairSuppress
The AArch64StorePairSuppress pass prevents the creation of STP under some
heuristics. Unfortunately it often prevents the creation of STP in cases where
it is obviously beneficial, and it doesn't match my understanding of
scheduling/cpu pipelining to prevent the creation of STP. From some
benchmarking, even on an in-order cpu where the scheduling is most important I
don't see it giving better results. In general the lower instruction count for
STP would be expected to give a slightly better cycle count.

As the pass specifically mentions the cyclone cpu, this patch adds a target
feature for FeatureStorePairSuppress, enabled for all the non-Arm cpus. This
has the effect of disabling it for all Arm cpus.

Differential Revision: https://reviews.llvm.org/D134646
2023-09-30 11:40:26 +01:00
Mircea Trofin
b6e568da66 [mlgo] fix test post #67826 2023-09-29 18:24:35 -07:00
Mircea Trofin
f179486204
[AsmPrint] Correctly factor function entry count when dumping MBB frequencies (#67826)
The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.

We also increase the precision of `MachineBlockFrequencyInfo::getBlockFreqRelativeToEntryBlock` to double. Existing code uses it as float so should be unaffected.
2023-09-29 18:06:53 -07:00
Arthur Eubanks
b915f60678
[CodeGen] Don't treat thread local globals as large data (#67764)
Otherwise they may mistakenly get the large section flag.
2023-09-29 12:56:53 -07:00
Visoiu Mistrih Francis
cc9ba5600e
[test] -march -> -mtriple (#67741)
Similar to 806761a
2023-09-29 10:43:23 -07:00
Fangrui Song
d20190e684 [test] Change llc -march=aarch64|arm64 to -mtriple=aarch64|arm64
Similar to commit 806761a7629df268c8aed49657aeccffa6bca449 to avoid issues due
to object file format differences. These tests are currently benign.
2023-09-29 10:13:06 -07:00
David Green
1610311a95 [AArch64] Fixes for BigEndian 128bit volatile, atomic and non-temporal loads/stores
This fixes up the generation of 128bit atomic, volatile and non-temporal
loads/stores, under the assumption that they should usually be the same as
standard versions.
https://godbolt.org/z/xxc89eMKE

Fixes #64580
Closes #67413
2023-09-29 17:21:19 +01:00
Jay Foad
6e3d2a4b38
[ISel] Fix another crash in new FMA DAG combine (#67818)
Following on from D135150, this patch fixes another crash caused by this
DAG combine:

fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)

The combine calls ReplaceAllUsesOfValueWith to replace (fmul C, D) with
(fma C, D, E). This can cause nodes to get CSEd. In D135150 the problem
was that the (fma C, D, E) node got CSEd away. In this new case, the
problem is that the outer fadd node gets CSEd away. To fix it we have
to return SDValue(N, 0) from the combine and be careful not to add a
deleted node to the worklist.
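
For readers unfamiliar with the combine itself, this Python sketch checks the algebraic identity behind it. It uses exact rationals, because in real floating point the two forms can round differently. It says nothing about the CSE issue being fixed, and the helper names are illustrative.

```python
from fractions import Fraction

def fma(a, b, c):
    """Exact multiply-add: a * b + c with no intermediate rounding."""
    return a * b + c

def before_combine(A, B, C, D, E):
    # fadd (fma A, B, (fmul C, D)), E
    return fma(A, B, C * D) + E

def after_combine(A, B, C, D, E):
    # fma A, B, (fma C, D, E)
    return fma(A, B, fma(C, D, E))

args = [Fraction(1, 3), Fraction(5, 7), Fraction(-2, 9), Fraction(4), Fraction(11, 13)]
assert before_combine(*args) == after_combine(*args)
```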
2023-09-29 17:18:23 +01:00
Matthew Devereau
6f5b372d59
[AArch64][SME2][SVE2p1] Add PNR_3b regclass (#67785)
This patch adds the PNR_3b regclass for predicate-as-counter registers
0-7 and allows the Upl ASM constraint to use this register class.
2023-09-29 16:17:31 +01:00
Philip Reames
cd03d97043 [RISCV] Add test coverage for sum reduction recognition in DAG
And adjust an existing test to not be a simple reduction to preserve test intent.
2023-09-29 07:54:55 -07:00
Nikita Popov
4251aa7a6f [IRBuilder] Migrate most casts to folding API
Migrate creation of most casts to use the FoldXYZ rather than
CreateXYZ style APIs. This means that InstSimplifyFolder now
works for these, which is what accounts for the AMDGPU test changes.
2023-09-29 12:40:38 +02:00
Mirko Brkušanin
2cd2445c21
[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on supported subtargets (#67461)
In order to avoid duplicating every dpp pseudo opcode that has src1, we
allow it for all opcodes and add manual checks on subtargets that do not
support it.
2023-09-29 11:54:49 +02:00
Matthew Devereau
0d328e3875
[AArch64][SME] Use PNR Reg classes for predicate constraint (#67606)
This patch fixes an error where ASM with constraints cannot select SME
instructions which use the top eight predicate-as-counter registers.
2023-09-29 10:33:25 +01:00
Simon Pilgrim
5d7672b98e [X86] combine-subo.ll - add common CHECK prefix 2023-09-29 10:31:38 +01:00
Simon Pilgrim
956ae7cf8d [X86] combine-addo.ll - add common CHECK prefix 2023-09-29 10:31:38 +01:00
Momchil Velikov
b454b04d68
[AArch64] Fix a compiler crash in MachineSink (#67705)
There were a couple of issues with maintaining register def/uses held
in `MachineRegisterInfo`:

* when an operand is changed from one register to another, the
corresponding instruction must already be inserted into the function,
or MRI won't be updated

* when traversing the set of all uses of a register, that set must not
change
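
The second point is the familiar hazard of mutating a collection while iterating over it. A generic Python sketch of the safe pattern follows; it is an illustration only, using a hypothetical dictionary of use lists rather than `MachineRegisterInfo`, and it is not necessarily how the patch resolves the issue.

```python
uses = {"%1": ["inst_a", "inst_b", "inst_c"], "%2": []}

def replace_all_uses(uses, old_reg, new_reg):
    """Move every use of old_reg over to new_reg."""
    # Iterate over a snapshot: moving a use edits uses[old_reg] itself,
    # so walking the live list would skip elements.
    for inst in list(uses[old_reg]):
        uses[old_reg].remove(inst)
        uses[new_reg].append(inst)

replace_all_uses(uses, "%1", "%2")
```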
2023-09-29 09:29:20 +01:00
David Green
7cc83c5a18 [AArch64] Don't expand RSHRN intrinsics to add+srl+trunc.
We expand aarch64_neon_rshrn intrinsics to trunc(srl(add)), having tablegen
patterns to combine the results back into rshrn. See D140297.  Unfortunately,
but perhaps not surprisingly, other combines can happen that prevent us
converting back.  For example sext(rshrn) becomes sext(trunc(srl(add))) which
will turn into sext_inreg(srl(add)).
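
As a sanity check of the trunc(srl(add)) form, this Python sketch compares a rounding shift right narrow computed with an unbounded add against the expanded form with a wrapping 16-bit add. It assumes a 16-bit lane narrowed to 8 bits and shift amounts 1 to 8; the function names are invented for this example.

```python
def rshrn_reference(x, shift):
    """Rounding shift right narrow, 16-bit lane to 8-bit lane, unbounded add."""
    return ((x + (1 << (shift - 1))) >> shift) & 0xFF

def rshrn_expanded(x, shift):
    """trunc(srl(add(x, 1 << (shift - 1)), shift)) with a wrapping 16-bit add."""
    added = (x + (1 << (shift - 1))) & 0xFFFF
    return (added >> shift) & 0xFF

for shift in range(1, 9):
    # Sample the 16-bit input space and include the largest value explicitly.
    for x in list(range(0, 1 << 16, 251)) + [0xFFFF]:
        assert rshrn_reference(x, shift) == rshrn_expanded(x, shift)
```

The carry out of the 16-bit add lands at bit 16 minus the shift amount, which is at or above bit 8 and is therefore discarded by the truncation, so the two forms agree in this model.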

This patch just prevents the expansion of rshrn intrinsics, reinstating the old
tablegen patterns for selecting them. This should allow us to still recognize
the rshrn instructions from trunc+shift+add, without performing any negative
optimizations for the intrinsics.

Closes #67451
2023-09-29 08:26:32 +01:00
Jakub Chlanda
3f8d4a8ef2
Reland [NVPTX] Add support for maxclusterrank in launch_bounds (#66496) (#67667)
This reverts commit 0afbcb20fd908f8bf9073697423da097be7db592.
2023-09-29 08:39:31 +02:00
Yashwant Singh
7ac532efc8
[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091)
Use this flag to give more context to implicit def comments in assembly.

Reviewed on Phabricator: https://reviews.llvm.org/D153754
2023-09-29 11:15:01 +05:30
Tobias Stadler
305fbc1b32 Revert "[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND"
This reverts commit 3686a0b611c65f0d7190345b8e3e73cdca9fa657.
This seems to have broken some sanitizer tests:
https://lab.llvm.org/buildbot/#/builders/184/builds/7721
2023-09-29 03:35:40 +02:00
Tobias Stadler
3686a0b611 [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND
The legalizer currently generates lots of G_AND artifacts.
For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary.
Currently these artifacts have to be removed using post-legalize combines.
Omitting these artifacts at their source in the artifact combiner has a few advantages:
- We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it.
- The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register).
- This cleans up a lot of legalizer output and even improves compilation-times.
AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count).
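
The redundancy test can be sketched in Python as a known-bits check. This is a minimal model, assuming a 32-bit value and a bitmask of bits known to be zero; the helper name is hypothetical and this is not the artifact combiner's code.

```python
def and_is_redundant(known_zero, mask, width=32):
    """True if every bit the mask would clear is already known to be zero."""
    all_bits = (1 << width) - 1
    cleared_by_mask = ~mask & all_bits
    not_known_zero = ~known_zero & all_bits
    return (cleared_by_mask & not_known_zero) == 0

# A boolean whose upper 31 bits are known zero: masking with 1 changes nothing.
bool_like = and_is_redundant(known_zero=0xFFFFFFFE, mask=1)

# Nothing known about the value: the mask is still needed.
unknown = and_is_redundant(known_zero=0, mask=1)
```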

Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth.
This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D159140
2023-09-29 02:11:57 +02:00
Jay Foad
c3939eb827
[AMDGPU] Fix typo in scheduler option name (#67661)
Fix: -amdgpu-disable-unclustred-high-rp-reschedule
Now: -amdgpu-disable-unclustered-high-rp-reschedule
2023-09-28 20:54:57 +01:00
Noah Goldstein
de7881ebf5 [DAGCombiner] Combine (select c, (and X, 1), 0) -> (and (zext c), X)
The middle end canonicalizes:
`(and (zext c), X)`
    -> `(select c, (and X, 1), 0)`

But the `and` + `zext` form gets better codegen.
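
The equivalence of the two forms can be checked exhaustively for small values with a Python sketch, where `c` stands for an i1 condition and `x` for an 8-bit value. The function names are illustrative only.

```python
def select_form(c, x):
    """(select c, (and X, 1), 0)"""
    return (x & 1) if c else 0

def and_zext_form(c, x):
    """(and (zext c), X), where zext of an i1 is 0 or 1"""
    return int(c) & x

for c in (False, True):
    for x in range(256):
        assert select_form(c, x) == and_zext_form(c, x)
```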
2023-09-28 13:46:46 -05:00
Noah Goldstein
e3e9c94006 [X86][AArch64][RISCV] Add tests for combining (select c, (and X, 1), 0) -> (and (zext c), X); NFC 2023-09-28 13:46:46 -05:00
Hiroshi Yamauchi
0ecd8846ae
[AArch64][Win] Emit SEH instructions for the swift async context-related instructions in the prologue and the epilogue. (#66967)
This fixes an error from checkARM64Instructions() in MCWin64EH.cpp.
2023-09-28 09:43:39 -07:00
Jay Foad
fb32baf0ec [ARM] Make some test checks more robust
This makes some tests robust against minor codegen differences
that will be caused by PR #67038.
2023-09-28 14:26:13 +01:00
Tuan Chuong Goh
c381cea873 [AArch64] Fixup test for G_VECREDUCE_ADD
Fix the test to account for changes made since the review was created
2023-09-28 12:52:17 +00:00
Jay Foad
01aa0c776d [SPARC] Add a missing SPARC64-LABEL check 2023-09-28 13:15:09 +01:00
Jay Foad
a0a06b1804 [AMDGPU] Make a check slightly more robust
Previously this was relying on [[RESULT]] having been defined in an earlier function.
2023-09-28 13:09:51 +01:00
chuongg3
140a094f5f
[AArch64][GlobalISel] More type support for G_VECREDUCE_ADD (#67433)
G_VECREDUCE_ADD is now able to have v4i16 and v8i8 vector types as
source registers
2023-09-28 11:47:26 +01:00
Luke Lau
b14f6eebc9
[RISCV] Fix crash when lowering fixed length insert_subvector into undef at 0 (#67535)
This fixes a crash seen in https://github.com/openxla/iree/issues/15038 and
elsewhere. We were reducing the LMUL for inserts into undef at 0 without
inserting it back into the original LMUL at the end. But we don't actually
perform the slidedown in this path, so we can just skip reducing LMUL here.
2023-09-28 10:22:16 +01:00
Kishan Parmar
696ea67f19 Disable call to fma for soft-float
The PowerPC backend generates calls to libc functions
for soft-float, regardless of the -nostdlib/-ffreestanding flags.
fma is not a function provided by compiler-rt builtins and
thus should not be generated here.
PR: https://github.com/llvm/llvm-project/issues/55230 (#55230)

Below is the patch given by @nemanjai

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D156344
2023-09-28 14:06:54 +05:30
Qiu Chaofan
cc627828f5 Pre-commit some PowerPC test cases 2023-09-28 15:51:14 +08:00