34332 Commits

Author SHA1 Message Date
Simon Pilgrim
4f95821f58 [DAG] SelectionDAG::getNode() - consistently use N1 for first operand. NFCI.
This has been annoying me for years - rename Operand to N1 so it matches all the other getNode() calls, and simplifies my debug watch windows!
2023-07-17 17:17:40 +01:00
Simon Pilgrim
e9caa37e9c [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits
Inspired by some of the cases from D145468

Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.

A future patch will propose the equivalent shl narrowing combine.

Differential Revision: https://reviews.llvm.org/D146121
2023-07-17 15:50:09 +01:00
Jay Foad
a1a9c53ae7 [GlobalISel] Fix infinite loop in reassociation combine
Don't reassociate (C1+C2)+Y -> C1+(C2+Y).

Fixes https://github.com/llvm/llvm-project/issues/63849

Differential Revision: https://reviews.llvm.org/D155284
2023-07-16 14:15:24 +01:00
Matt Arsenault
c4ccd6e3d2 MachineSink: Remove unnecessary empty block check 2023-07-14 18:46:18 -04:00
Matt Arsenault
6d3027e3d1 MachineSink: Move helper function and use more const 2023-07-14 18:46:18 -04:00
Weining Lu
ef33d6cbfc [XRay] Add initial support for loongarch64
Only support patching FunctionEntry/FunctionExit/FunctionTailExit for now.

Reviewed By: MaskRay, xen0n
Co-Authored-By: zhanglimin <zhanglimin@loongson.cn>

Differential Revision: https://reviews.llvm.org/D140727
2023-07-14 09:27:13 +08:00
Amara Emerson
432338a673 Don't assert on a non-pointer value being used for a "p" inline asm constraint.
GCC and existing codebases allow the use of integral values to be used
with this constraint. A recent change D133914 in this area started causing asserts.
Removing the assert is enough as the rest of the code works fine.

rdar://109675485

Differential Revision: https://reviews.llvm.org/D155023
2023-07-13 10:45:56 -07:00
Oliver Stannard
aea8db8eb9 Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI."
This reverts commit 58d1eaa3b6ce4f7285c51f83faff7a3ac374c746.
2023-07-13 14:25:39 +01:00
Jon Roelofs
56e60bc5bb
TargetLowering: fix an infinite DAG combine in SimplifySETCC
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.

rdar://111847838

Divverential revision: https://reviews.llvm.org/D155095

This reverts commit cdc633e4bc93d4bf241ecd4c29691ae065749313.
2023-07-12 16:13:27 -07:00
Noah Goldstein
a4c461c063 [SelectionDAG] Fill in some more cases in isKnownNeverZero
This mostly copies cases that already exist in ValueTracking, although
it skips the more complex ones. Those can be filled in as needed.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149199
2023-07-12 17:17:53 -05:00
Noah Goldstein
74f0ec5e24 [DAGCombiner] Make it so that udiv can be folded with (select c, NonZero, 1)
This is done by allowing speculation of `udiv` if we can prove the
denominator is non-zero.

https://alive2.llvm.org/ce/z/VNCt_q

Differential Revision: https://reviews.llvm.org/D149198
2023-07-12 17:17:53 -05:00
Fangrui Song
86a542b005 [CodeGen] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D154281 2023-07-12 14:37:40 -07:00
Jon Roelofs
cdc633e4bc
Revert "TargetLowering: fix an infinite DAG combine in SimplifySETCC"
This reverts commit b76c85b355578d9076c22a86faf4ea8de1745bdf.

It broke the RISCV-enabled bots. Oops.
2023-07-12 12:22:03 -07:00
Jon Roelofs
b76c85b355
TargetLowering: fix an infinite DAG combine in SimplifySETCC
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.

rdar://111847838

Differential revision: https://reviews.llvm.org/D155095
2023-07-12 11:44:15 -07:00
Qiongsi Wu
41447f6fdf [libLTO][AIX] Respect -f[no]-integrated-as on AIX
`libLTO` currently ignores the `-f[no-]integrated-as` flags. This patch teaches `libLTO` to respect them on AIX.

The implementation consists of two parts:

  # Migrate `llc`'s `-no-integrated-as` option to a codegen option so that the option is available to `libLTO`/`lld`/`gold`.
  # Teach `clang` to pass `-no-integrated-as` accordingly to `libLTO` depending on the `-f[no-]integrated-as` flags.

On platforms other than AIX, the `-f[no-]integrated-as` flags are ignored.

Reviewed By: MaskRay, steven_wu

Differential Revision: https://reviews.llvm.org/D152924
2023-07-12 13:22:02 -04:00
Jingu Kang
33e60484d7 [MachineLICM] Handle Subloops
MachineLICM pass handles inner loops only when outmost loop does not have unique
predecessor. If the loop has preheader and there is loop invariant code, the
invariant code can be hoisted to the preheader in general. This patch makes the
pass handle inner loops in general.

Differential Revision: https://reviews.llvm.org/D154205
2023-07-12 16:32:14 +01:00
Craig Topper
45b172c838 [LegalizeDAG] Prevent LegalizeLoadOps from creating extloads that mix int and fp types.
For RISC-V, getRegisterType for fp16 returns i16. i16->fp64 extload
is considered legal because the LoadExtActions defaults to Legal
for all entries. Only fp/fp and int/int entries are changed to
Expand fore RISC-V.

This patch detects the FP-ness has changed and won't try to call
isLoadExtLegal.

Alternatively, we could add Expand for int/fp and fp/int, but that
seemed a little silly.

Fixes #63816

Reviewed By: asb, wangpc

Differential Revision: https://reviews.llvm.org/D155040
2023-07-12 08:03:35 -07:00
Ivan Kosarev
e705b2b1f4 Fix warnings about unused varibles on builds without asserts. 2023-07-12 14:45:29 +01:00
Jay Foad
58d1eaa3b6 [CodeGen] Store SP adjustment in MachineBasicBlock. NFCI.
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.

This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.

Differential Revision: https://reviews.llvm.org/D154281
2023-07-12 14:29:26 +01:00
Marco Elver
de79233b2e [X86] Complete preservation of !pcsections in X86ISelLowering
https://reviews.llvm.org/D130883 introduced MIMetadata to simplify
metadata propagation (DebugLoc and PCSections).

However, we're currently still permitting implicit conversion of
DebugLoc to MIMetadata, to allow for a gradual transition and let the
old code work as-is.

This manifests in lost !pcsections metadata for X86-specific lowerings.
For example, 128-bit atomics.

Fix the situation for X86ISelLowering by converting all BuildMI() calls
to use an explicitly constructed MIMetadata.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D154986
2023-07-12 15:09:31 +02:00
Ivan Kosarev
15e7749e19 [Codegen] Generate fast fp64-to-fp16 conversions in unsafe mode.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154528
2023-07-12 11:55:19 +01:00
Jay Foad
f7684d8510 [DAG] Use legal shift amount type in DAGTypeLegalizer::JoinIntegers
Documentation for TargetLowering::getShiftAmountTy says that LegalTypes
should generally be true during type legalization, so this patch does
that.

On AMDGPU the effect is that we use i32 (a sane type) instead of i64
(pointer sized type) for more shift amounts, which in turn allows more
formation of rotates and funnel shifts pre-legalization.

Differential Revision: https://reviews.llvm.org/D154960
2023-07-12 08:12:09 +01:00
Han Shen
65ef4d4357 [CodeGen] Part II of "Fine tune MachineFunctionSplitPass (MFS) for FSAFDO".
This CL adds a new discriminator pass. Also adds a new sample profile
loading pass when MFS is enabled.

Differential Revision: https://reviews.llvm.org/D152577
2023-07-11 22:40:25 -07:00
Matt Arsenault
3701ebe76b AtomicExpand: Fix expanding atomics into unconstrained FP in strictfp functions
Ideally the normal fadd/fmin/fmax this was creating would fail the verifier.
It's probably also necessary to force off FP exception handlers in the cmpxchg
loop but we don't have a generic way to do that now.

Note strictfp builder is broken in the minnum/maxnum case

https://reviews.llvm.org/D154993
2023-07-11 18:51:15 -04:00
Matt Arsenault
b59022b42e DAG: Handle lowering of unordered fcZero|fcSubnormal to fcmp 2023-07-11 18:30:15 -04:00
pvanhout
8444038d16 [AMDGPU] Use GlobalISel MatchTable Combiner Backend
Use the new matchtable-based combiner backend for all AMDGPU combiners.
This drop-in from the user's perspective; there are no test changes, the new combiner behaves exactly like the old one.

Depends on D153757

NOTE: This would land iff D153757 (RFC) lands too.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153758
2023-07-11 11:27:13 +02:00
pvanhout
1fe7d9c799 [GlobalISel] Generalize InstructionSelector Match Tables
Makes `InstructionSelector.h`/`InstructionSelectorImpl.h` generic so the match tables can also be used for the combiner.

Some notes:
 - Coverage was made an optional parameter of `executeMatchTable`, combines won't use it for now.
 - `GIPFP_` -> `GICXXPred_` so it's more generic. Those are just C++ predicates and aren't PatFrag-specific.
 - Pass the MatcherState directly to testMIPredicate_MI, the combiner will need it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153755
2023-07-11 09:42:30 +02:00
Matt Arsenault
1d92b68ead DAG: Correct chain management for frexp libcalls
We need to replace the other uses of the call chain with the new load
chain.

Fixes not preserving the return def with unused x86_fp80
results. Regression reported here:
https://reviews.llvm.org/rGb15bf305ca3e9ce63aaef7247d32fb3a75174531#1224999
2023-07-10 21:39:15 -04:00
Han Shen
8df75969ae [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.
The original MFS work D85368 shows good performance improvement with
Instrumented FDO. However, AutoFDO or Flow-Sensitive AutoFDO (FSAFDO)
does not show performance gain. This is mainly caused by a less
accurate profile compared to the iFDO profile.

For the past few months, we have been working to improve FSAFDO
quality, like in D145171. Taking advantage of this improvement, MFS
now shows performance improvements over FSAFDO profiles.

That being said, 2 minor changes need to be made, 1) An FS-AutoFDO
profile generation pass needs to be added right before MFS pass and an
FSAFDO profile load pass is needed when FS-AutoFDO is enabled and the
MFS flag is present. 2) MFS only applies to hot functions, because we
believe (and experiment also shows) FS-AutoFDO is more accurate about
functions that have plenty of samples than those with no or very few
samples.

With this improvement, we see a 1.2% performance improvement in clang
benchmark, 0.9% QPS improvement in our internal search benchmark, and
3%-5% improvement in internal storage benchmark.

This is #1 of the two patches that enables the improvement.

Reviewed By: wenlei, snehasish, xur

Differential Revision: https://reviews.llvm.org/D152399
2023-07-10 16:00:30 -07:00
Fangrui Song
0b69cc8bcb [AArch64] Improve sanitize_memtag test
The ELFObjectWriter::shouldRelocateWithSymbol change in D128958 is untested. Add
the testing.

Also, change a diagnostic to follow the convention (no capitalization or
trailing period). Test it.
2023-07-10 13:25:09 -07:00
Jake Egan
bbd0d123d3 Implement -frecord-command-line for XCOFF
This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D153600
2023-07-10 12:47:07 -04:00
Igor Kirillov
0aecf7ff0d [CodeGen] Fix incorrectly detected reduction bug in ComplexDeinterleaving pass
Using ACLE intrinsics, it is possible to create a loop that the
deinterleaving pass incorrectly classified as a reduction loop.
For example, for fixed-width vectors the loop was like below:

vector.body:
  %a = phi <4 x float> [ %init.a, %entry ], [ %updated.a, %vector.body ]
  %b = phi <4 x float> [ %init.b, %entry ], [ %updated.b, %vector.body ]
  ...
; Does not depend on %a or %b:
  %updated.a = ...
  %updated.b = ...

Differential Revision: https://reviews.llvm.org/D154598
2023-07-10 12:54:38 +00:00
Amara Emerson
3a80bdb316 [GlobalISel] Remove an erroneous oneuse check in the G_ADD reassociation combine.
This check was unnecessary/incorrect, it was already being done by the target
hook default implementation, and the one in the matcher was checking for a
completely different thing. This change:
 1) Removes the check and updates affected tests which now do some more reassociations.
 2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse
    check. Not sure why I didn't do this the first time.
2023-07-10 01:03:12 -07:00
David Green
758c4640c9 [CGP] Enable CodeGenPrepares phi type convertion.
This is a recommit of 67121d7, enabling the CodeGenPrepare OptimizePhiTypes
option that can help with the type of phi instructions into ISel.
2023-07-09 10:32:11 +01:00
Matt Arsenault
310f839612 DAG: Lower is.fpclass fcInf to fcmp of fabs
InstCombine should have taken care of this, but I think
this is more useful in the future when the expansion
tries to handle multiple cases at a time with fcmp.

x87 looks worse to me but the only thing I know about it is that
I aggressively do not care about it.

https://reviews.llvm.org/D143198
2023-07-07 17:00:10 -04:00
Jay Foad
fa78983bcb [PEI][Mips] Switch to backwards frame index elimination
This adds support for running PEI::replaceFrameIndicesBackward with no
RegisterScavenger, and basic support for eliminating call frame pseudo
instructions.

Differential Revision: https://reviews.llvm.org/D154347
2023-07-07 18:30:08 +01:00
Jay Foad
4fd186d804 [PEI] Simplify iterator handling in replaceFrameIndicesBackward. NFCI.
Differential Revision: https://reviews.llvm.org/D154346
2023-07-07 18:30:08 +01:00
Yashwant Singh
b7836d8562 [CodeGen]Allow targets to use target specific COPY instructions for live range splitting
Replacing D143754. Right now the LiveRangeSplitting during register allocation uses
TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a
problem as we have both vector and scalar copies. Vector copies perform a copy over
a vector register but only on the lanes(threads) that are active. This is mostly sufficient
however we do run into cases when we have to copy the entire vector register and
not just active lane data. One major place where we need that is live range splitting.

Allowing targets to use their own copy instructions(if defined) will provide a lot of
flexibility and ease to lower these pseudo instructions to correct MIR.

- Introduce getTargetCopyOpcode() virtual function and use if to generate copy in Live range
 splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline.

Reviewed By: arsenm, cdevadas, kparzysz

Differential Revision: https://reviews.llvm.org/D150388
2023-07-07 22:29:50 +05:30
Matt Arsenault
64df9573a7 DAG: Handle inversion of fcSubnormal | fcZero
There are a number of more test combinations here that
can be done together and reduce the number of instructions.

https://reviews.llvm.org/D143191
2023-07-06 21:19:44 -04:00
Matt Arsenault
61820f8b5d CodeGen: Optimize lowering of is.fpclass fcZero|fcSubnormal
Combine the two checks into a check if the exponent bits are 0. The
inverted case isn't reachable until a future change, and GlobalISel
currently doesn't attempt the inversion optimization.

https://reviews.llvm.org/D143182
2023-07-06 13:03:57 -04:00
Matt Arsenault
1588e18b2d DAG: Check isCondCodeLegal in is_fpclass expansion to fcmp eq 0
Results in some x86 codegen diffs. Some look better, some look worse.

https://reviews.llvm.org/D152094
2023-07-06 13:00:52 -04:00
Matt Arsenault
e8ed6e35bd DAG: Implement soften float for ffrexp
Fixes #63661

https://reviews.llvm.org/D154555
2023-07-05 21:42:27 -04:00
Matt Arsenault
20964c901a DAG: Fix dropping flags when widening unary vector ops 2023-07-05 17:25:24 -04:00
Oskar Wirga
198df5f682 Weaken MFI Max Call Frame Size Assertion
A year ago when I was not invested at all into compilers, I found an assertion error when building an AArch64 debug build with LTO + CFI, among other combinations.

It was posted as a github issue here: https://github.com/llvm/llvm-project/issues/54088

I took it upon myself to revisit the issue now that I have spent some more time working on LLVM.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D151276
2023-07-05 14:02:51 -07:00
Igor Kirillov
7f20407cee [CodeGen] Add support for Splats in ComplexDeinterleaving pass
This commit allows generating of complex number intrinsics for expressions
with constants or loops invariants, which are represented as splats.
For instance, after vectorizing loops in the following code snippets,
the ComplexDeinterleaving pass will be able to generate complex number
intrinsics:

```
complex<> x = ...;
for (int i = 0; i < N; ++i)
    c[i] = a[i] * b[i] * x;
```

or

```
for (int i = 0; i < N; ++i)
    c[i] = a[i] * b[i] * (11.0 + 3.0i);
```

Differential Revision: https://reviews.llvm.org/D153355
2023-07-05 17:02:52 +00:00
Amaury Séchet
ee2d10cd16 [NFC] Reorder functions in DAGCombiner so all UADDO_CARRY related functions are next to each others. 2023-07-04 14:55:11 +00:00
Yashwant Singh
7aebe4eaaa [CodeGen] Move lowerCopy from expandPostRA to TII
This will allow targets to lower their 'copy' instructions easily.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152261
2023-07-04 09:04:49 +05:30
Christudasan Devadasan
aa82b562b7 [CodeGen] MRI call back in TargetMachine
It is needed for target specific initializatons.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D143758
2023-07-03 21:29:37 +05:30
Igor Kirillov
b4f9c3a933 [CodeGen] Refactor ComplexDeinterleaving to run identification on Values instead of Instructions
This change will make it easier to add identification of complex constants
in future patches.

Differential Revision: https://reviews.llvm.org/D153446
2023-07-03 10:35:14 +00:00
David Green
f55d96b9a2 [DAG][AArch64] Handle vector types when expanding sdiv/udiv into mulh
The aarch64 backend will benefit from expanding 64vector sdiv/udiv into mulh
using shift(mul(ext, ext)), as the larger type size is legal and the mul(ext,
ext) can efficiently use smull/umull instructions. This extends the existing
code in GetMULHS to handle vector types for it.

Differential Revision: https://reviews.llvm.org/D154049
2023-07-02 15:02:52 +01:00