Inspired by some of the cases from D145468
Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.
A future patch will propose the equivalent shl narrowing combine.
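As a rough illustration of the narrowing, the sketch below compares a 64-bit logical shift right with its half-width form (trunc, 32-bit shift, zext). The function names are hypothetical, and the preconditions shown (upper half of the input is zero, shift amount below 32) are one simple case where the two forms agree; the exact conditions used by SimplifyDemandedBits may differ.

```cpp
#include <cassert>
#include <cstdint>

// Reference: the full-width 64-bit logical shift right.
std::uint64_t lshr_wide(std::uint64_t x, unsigned amt) {
  assert(amt < 64 && "shift amount out of range");
  return x >> amt;
}

// Narrowed form: truncate to 32 bits, shift, then zero-extend back.
// Assumed preconditions (illustrative only): the upper half of x is
// zero and the shift amount fits in the half-width type.
std::uint64_t lshr_narrow(std::uint64_t x, unsigned amt) {
  assert(amt < 32 && "shift must fit in the half-width type");
  assert((x >> 32) == 0 && "upper half must be zero");
  std::uint32_t lo = static_cast<std::uint32_t>(x);  // trunc
  std::uint32_t shifted = lo >> amt;                 // half-width lshr
  return static_cast<std::uint64_t>(shifted);        // zext
}
```

Whether the narrowed form is actually cheaper depends on the target, which is why the combine is also gated on profitability and on free zext/trunc.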
Differential Revision: https://reviews.llvm.org/D146121
GCC and existing codebases allow integral values to be used
with this constraint. A recent change in this area, D133914, started causing asserts.
Removing the assert is enough as the rest of the code works fine.
rdar://109675485
Differential Revision: https://reviews.llvm.org/D155023
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.
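The guard can be sketched with a toy model, shown below. The types and the function name are hypothetical and are not the SelectionDAG API; the point is only that the swap must be skipped when the RHS is already a constant, otherwise two constant operands would be swapped back and forth forever.

```cpp
#include <cassert>
#include <utility>

// Toy stand-ins for a SETCC node; not the real SelectionDAG types.
enum class Cond { LT, GT, EQ };

struct Operand {
  bool isConstant;
  int value;
};

struct SetCC {
  Operand lhs;
  Operand rhs;
  Cond cc;
};

// Swapping the operands of a comparison requires mirroring the condition.
Cond swapCond(Cond cc) {
  switch (cc) {
  case Cond::LT: return Cond::GT;
  case Cond::GT: return Cond::LT;
  case Cond::EQ: return Cond::EQ;
  }
  return cc;
}

// Moves a constant LHS to the RHS. Returns true if a swap happened.
// The "!n.rhs.isConstant" test is the guard: without it, a node with
// two constant operands would be rewritten on every visit.
bool canonicalizeConstantToRHS(SetCC &n) {
  if (n.lhs.isConstant && !n.rhs.isConstant) {
    std::swap(n.lhs, n.rhs);
    n.cc = swapCond(n.cc);
    return true;
  }
  return false;
}
```

Calling the function a second time on the same node returns false, which is the fixed-point property a combiner loop relies on.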
rdar://111847838
Differential revision: https://reviews.llvm.org/D155095
This reverts commit cdc633e4bc93d4bf241ecd4c29691ae065749313.
This mostly copies cases that already exist in ValueTracking, although
it skips the more complex ones. Those can be filled in as needed.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149199
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.
rdar://111847838
Differential revision: https://reviews.llvm.org/D155095
`libLTO` currently ignores the `-f[no-]integrated-as` flags. This patch teaches `libLTO` to respect them on AIX.
The implementation consists of two parts:
# Migrate `llc`'s `-no-integrated-as` option to a codegen option so that the option is available to `libLTO`/`lld`/`gold`.
# Teach `clang` to pass `-no-integrated-as` accordingly to `libLTO` depending on the `-f[no-]integrated-as` flags.
On platforms other than AIX, the `-f[no-]integrated-as` flags are ignored.
Reviewed By: MaskRay, steven_wu
Differential Revision: https://reviews.llvm.org/D152924
The MachineLICM pass handles inner loops only when the outermost loop does not have a unique
predecessor. If a loop has a preheader and contains loop-invariant code, the
invariant code can generally be hoisted to the preheader. This patch makes the
pass handle inner loops in general.
Differential Revision: https://reviews.llvm.org/D154205
For RISC-V, getRegisterType for fp16 returns i16. i16->fp64 extload
is considered legal because the LoadExtActions defaults to Legal
for all entries. Only fp/fp and int/int entries are changed to
Expand for RISC-V.
This patch detects that the FP-ness has changed and does not try to call
isLoadExtLegal.
Alternatively, we could add Expand for int/fp and fp/int, but that
seemed a little silly.
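A minimal model of the check is sketched below. All names here are hypothetical stand-ins (the real code works on EVT/MVT and TargetLowering), and the legality table is reduced to a function that always answers "legal", mimicking the default described above.

```cpp
#include <cassert>

// Toy value type: just a class (integer vs. floating point) and a width.
enum class TypeClass { Integer, FloatingPoint };

struct SimpleVT {
  TypeClass cls;
  unsigned bits;
};

// Stand-in for a legality table whose int/fp and fp/int entries were
// never changed from the default, so they still report "legal".
bool isLoadExtLegalModel(SimpleVT /*resultTy*/, SimpleVT /*memTy*/) {
  return true;
}

// Only consult the table when both types are in the same class. If the
// FP-ness differs (e.g. an i16 register type standing in for fp16), the
// table answer is meaningless, so the extending load is rejected up front.
bool canFormExtLoad(SimpleVT resultTy, SimpleVT memTy) {
  assert(resultTy.bits >= memTy.bits && "an extending load must not narrow");
  if (resultTy.cls != memTy.cls)
    return false;
  return isLoadExtLegalModel(resultTy, memTy);
}
```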
Fixes #63816
Reviewed By: asb, wangpc
Differential Revision: https://reviews.llvm.org/D155040
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.
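The benefit of recording the entry value can be sketched with a toy block model. The struct and function below are hypothetical and are not the MachineBasicBlock API; they only show that, once the entry adjustment is stored on the block, the adjustment at any instruction can be computed from that block alone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy basic block: the SP adjustment recorded on entry, plus the change
// each instruction makes to it (nonzero only for call frame setup/destroy).
struct Block {
  int entrySPAdj = 0;
  std::vector<int> spDeltas;
};

// SP adjustment in effect just before instruction `idx` of block `b`.
// No other block is consulted, so blocks can be visited in any order.
int spAdjBefore(const Block &b, std::size_t idx) {
  assert(idx <= b.spDeltas.size() && "instruction index out of range");
  int adj = b.entrySPAdj;
  for (std::size_t i = 0; i < idx; ++i)
    adj += b.spDeltas[i];
  return adj;
}
```

In this model, a block that starts in the middle of a call sequence simply has a nonzero `entrySPAdj`.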
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D154281
https://reviews.llvm.org/D130883 introduced MIMetadata to simplify
metadata propagation (DebugLoc and PCSections).
However, we're currently still permitting implicit conversion of
DebugLoc to MIMetadata, to allow for a gradual transition and let the
old code work as-is.
This manifests in lost !pcsections metadata for X86-specific lowerings.
For example, 128-bit atomics.
Fix the situation for X86ISelLowering by converting all BuildMI() calls
to use an explicitly constructed MIMetadata.
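The failure mode can be reproduced with a small model. The classes below are simplified, hypothetical stand-ins for DebugLoc, the PCSections metadata, MIMetadata and BuildMI(); they are not the LLVM definitions. The sketch shows how an implicit conversion from a debug location silently drops the PC-sections pointer, while an explicitly constructed metadata object keeps it.

```cpp
#include <cassert>
#include <string>

struct DebugLocModel {
  unsigned line = 0;
};

struct PCSectionsModel {
  std::string name;
};

class MIMetadataModel {
public:
  // Deliberately non-explicit: models the implicit DebugLoc conversion
  // that keeps old call sites compiling.
  MIMetadataModel(DebugLocModel dl, const PCSectionsModel *pcs = nullptr)
      : dl_(dl), pcs_(pcs) {}

  DebugLocModel debugLoc() const { return dl_; }
  const PCSectionsModel *pcSections() const { return pcs_; }

private:
  DebugLocModel dl_;
  const PCSectionsModel *pcs_;
};

struct InstrModel {
  DebugLocModel dl;
  const PCSectionsModel *pcs = nullptr;
};

// Stand-in for BuildMI(): the new instruction inherits whatever the
// metadata object carries.
InstrModel buildInstr(const MIMetadataModel &md) {
  InstrModel out;
  out.dl = md.debugLoc();
  out.pcs = md.pcSections();
  return out;
}

// Old-style lowering: passes only the debug location, so the implicit
// conversion produces metadata with no PC sections.
InstrModel lowerViaDebugLoc(const InstrModel &orig) {
  return buildInstr(orig.dl);
}

// Fixed lowering: constructs the metadata explicitly from the original.
InstrModel lowerViaMetadata(const InstrModel &orig) {
  return buildInstr(MIMetadataModel(orig.dl, orig.pcs));
}
```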
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D154986
Documentation for TargetLowering::getShiftAmountTy says that LegalTypes
should generally be true during type legalization, so this patch does
that.
On AMDGPU the effect is that we use i32 (a sane type) instead of i64
(pointer sized type) for more shift amounts, which in turn allows more
formation of rotates and funnel shifts pre-legalization.
Differential Revision: https://reviews.llvm.org/D154960
This CL adds a new discriminator pass. Also adds a new sample profile
loading pass when MFS is enabled.
Differential Revision: https://reviews.llvm.org/D152577
Ideally the normal fadd/fmin/fmax this was creating would fail the verifier.
It's probably also necessary to force off FP exception handlers in the cmpxchg
loop but we don't have a generic way to do that now.
Note strictfp builder is broken in the minnum/maxnum case
https://reviews.llvm.org/D154993
Use the new matchtable-based combiner backend for all AMDGPU combiners.
This is a drop-in replacement from the user's perspective; there are no test changes, and the new combiner behaves exactly like the old one.
Depends on D153757
NOTE: This would land iff D153757 (RFC) lands too.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153758
Makes `InstructionSelector.h`/`InstructionSelectorImpl.h` generic so the match tables can also be used for the combiner.
Some notes:
- Coverage was made an optional parameter of `executeMatchTable`; combines won't use it for now.
- `GIPFP_` -> `GICXXPred_` so it's more generic. Those are just C++ predicates and aren't PatFrag-specific.
- Pass the MatcherState directly to testMIPredicate_MI, the combiner will need it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153755
The original MFS work D85368 shows good performance improvement with
Instrumented FDO. However, AutoFDO or Flow-Sensitive AutoFDO (FSAFDO)
does not show performance gain. This is mainly caused by a less
accurate profile compared to the iFDO profile.
For the past few months, we have been working to improve FSAFDO
quality, like in D145171. Taking advantage of this improvement, MFS
now shows performance improvements over FSAFDO profiles.
That being said, two minor changes need to be made: 1) an FS-AutoFDO
profile generation pass needs to be added right before the MFS pass, and an
FSAFDO profile load pass is needed when FS-AutoFDO is enabled and the
MFS flag is present; 2) MFS only applies to hot functions, because we
believe (and experiments also show) that FS-AutoFDO is more accurate about
functions that have plenty of samples than those with no or very few
samples.
With this improvement, we see a 1.2% performance improvement in the clang
benchmark, a 0.9% QPS improvement in our internal search benchmark, and
a 3%-5% improvement in an internal storage benchmark.
This is #1 of the two patches that enables the improvement.
Reviewed By: wenlei, snehasish, xur
Differential Revision: https://reviews.llvm.org/D152399
The ELFObjectWriter::shouldRelocateWithSymbol change in D128958 is untested. Add
the testing.
Also, change a diagnostic to follow the convention (no capitalization or
trailing period). Test it.
This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D153600
Using ACLE intrinsics, it is possible to create a loop that the
deinterleaving pass incorrectly classified as a reduction loop.
For example, for fixed-width vectors the loop was like below:
  vector.body:
    %a = phi <4 x float> [ %init.a, %entry ], [ %updated.a, %vector.body ]
    %b = phi <4 x float> [ %init.b, %entry ], [ %updated.b, %vector.body ]
    ...
    ; Does not depend on %a or %b:
    %updated.a = ...
    %updated.b = ...
Differential Revision: https://reviews.llvm.org/D154598
This check was unnecessary and incorrect: it was already being done by the target
hook's default implementation, and the one in the matcher was checking for a
completely different thing. This change:
1) Removes the check and updates affected tests which now do some more reassociations.
2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse
check. Not sure why I didn't do this the first time.
InstCombine should have taken care of this, but I think
this is more useful in the future when the expansion
tries to handle multiple cases at a time with fcmp.
x87 looks worse to me but the only thing I know about it is that
I aggressively do not care about it.
https://reviews.llvm.org/D143198
This adds support for running PEI::replaceFrameIndicesBackward with no
RegisterScavenger, and basic support for eliminating call frame pseudo
instructions.
Differential Revision: https://reviews.llvm.org/D154347
Replacing D143754. Right now, LiveRangeSplitting during register allocation uses the
TargetOpcode::COPY instruction for splitting. For the AMDGPU target that creates a
problem, as we have both vector and scalar copies. Vector copies perform a copy over
a vector register but only on the lanes (threads) that are active. This is mostly sufficient;
however, we do run into cases where we have to copy the entire vector register and
not just the active lane data. One major place where we need that is live range splitting.
Allowing targets to use their own copy instructions (if defined) will provide a lot of
flexibility and make it easier to lower these pseudo instructions to correct MIR.
- Introduce the getTargetCopyOpcode() virtual function and use it to generate copies in live range
splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline.
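The shape of the hook can be sketched as follows. This is a toy model: the class names, the opcode values and the whole-register copy opcode are hypothetical, and the real getTargetCopyOpcode() signature in TargetInstrInfo may differ.

```cpp
#include <cassert>

// Toy opcode space. HYPOTHETICAL_WHOLE_REG_COPY stands for a
// target-defined copy that ignores the active-lane mask.
enum Opcode : unsigned {
  COPY = 1,
  HYPOTHETICAL_WHOLE_REG_COPY = 100
};

// Generic target description: the default hook returns the generic COPY.
struct TargetInstrInfoModel {
  virtual ~TargetInstrInfoModel() = default;
  virtual unsigned getTargetCopyOpcode() const { return COPY; }
};

// A target with lane-masked vector copies overrides the hook so that
// live range splitting copies the entire register.
struct VectorTargetModel : TargetInstrInfoModel {
  unsigned getTargetCopyOpcode() const override {
    return HYPOTHETICAL_WHOLE_REG_COPY;
  }
};

// Stand-in for the splitter: it asks the target which opcode to emit
// instead of hard-coding COPY.
unsigned opcodeForLiveRangeSplit(const TargetInstrInfoModel &tii) {
  return tii.getTargetCopyOpcode();
}
```

Targets that do not override the hook keep the existing behaviour, which is why the second bullet matters: later passes have to recognise the target-specific opcode as a copy too.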
Reviewed By: arsenm, cdevadas, kparzysz
Differential Revision: https://reviews.llvm.org/D150388
Combine the two checks into a check if the exponent bits are 0. The
inverted case isn't reachable until a future change, and GlobalISel
currently doesn't attempt the inversion optimization.
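For reference, the single check can be written out for a 32-bit IEEE-754 float. The sketch assumes the two checks being merged are "is zero" and "is subnormal", which the message does not state explicitly; under that assumption both classes are exactly the values whose exponent field is all zeros. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// True for +/-0 and for subnormals: both have an all-zero exponent field.
// The sign bit and the mantissa are ignored.
bool isZeroOrSubnormal(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "expects a 32-bit float");
  static_assert(std::numeric_limits<float>::is_iec559,
                "expects IEEE-754 floats");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);         // bit cast without UB
  const std::uint32_t expMask = 0x7F800000u;   // 8 exponent bits
  return (bits & expMask) == 0;
}
```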
https://reviews.llvm.org/D143182
A year ago when I was not invested at all into compilers, I found an assertion error when building an AArch64 debug build with LTO + CFI, among other combinations.
It was posted as a github issue here: https://github.com/llvm/llvm-project/issues/54088
I took it upon myself to revisit the issue now that I have spent some more time working on LLVM.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D151276
This commit allows generating complex number intrinsics for expressions
with constants or loop invariants, which are represented as splats.
For instance, after vectorizing loops in the following code snippets,
the ComplexDeinterleaving pass will be able to generate complex number
intrinsics:
```
complex<> x = ...;
for (int i = 0; i < N; ++i)
  c[i] = a[i] * b[i] * x;
```
or
```
for (int i = 0; i < N; ++i)
  c[i] = a[i] * b[i] * (11.0 + 3.0i);
```
Differential Revision: https://reviews.llvm.org/D153355
The AArch64 backend will benefit from expanding 64-bit vector sdiv/udiv into mulh
using shift(mul(ext, ext)), as the larger type size is legal and the mul(ext,
ext) can efficiently use smull/umull instructions. This extends the existing
code in GetMULHS to handle vector types for it.
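The shift(mul(ext, ext)) form is shown below for a single 32-bit unsigned lane. This is a scalar sketch with hypothetical function names, not the vector code the patch emits; the divide-by-3 helper uses the well-known magic constant 0xAAAAAAAB as one example of a division that reduces to a high multiply.

```cpp
#include <cassert>
#include <cstdint>

// High 32 bits of a 32x32-bit unsigned multiply, computed as
// shift(mul(ext, ext)): extend both operands, multiply in the wider
// type, then shift the upper half down.
std::uint32_t mulhu32(std::uint32_t a, std::uint32_t b) {
  std::uint64_t wide =
      static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b);
  return static_cast<std::uint32_t>(wide >> 32);
}

// Unsigned division by the constant 3 via a high multiply:
// x / 3 == mulhu(x, 0xAAAAAAAB) >> 1 for every 32-bit x.
std::uint32_t udivBy3(std::uint32_t x) {
  return mulhu32(x, 0xAAAAAAABu) >> 1;
}
```

On a vector of such lanes, the widening multiply is the part that maps onto a single umull (or smull for the signed case), which is what makes the expansion attractive.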
Differential Revision: https://reviews.llvm.org/D154049