- Remove some unused typedefs.
- Rename BinOpChain struct to MulCandidate.
- Remove the size method of MulCandidate.
- Store only the first input of the ValueList provided to
MulCandidate, as it's the only value we care about. This means we
don't have to perform any ugly (and unnecessary) iterations of the
list later on.
llvm-svn: 367208
Summary:
If isel is presented with <2 x half> vectors then it will correctly select
v_pk_fma style instructions.
If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for
other instruction types (such as fadd, fmul etc.)
Added extra support to enable this. Updated one of the tests to include a test
for this (as well as extending the test to GFX9)
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65325
Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee
llvm-svn: 367206
The pmaddwd inserts a truncate, if that truncate would end up
creating additional instructions instead of making a zext
narrower, then we shouldn't do it.
I've restricted this to only sse4.1 targets since on prior
targets the zext will be done in stages. So the truncate will
probably not create additional instructions. Might need some
more investigation of mul shrinking and the other pmaddwd
transform to be sure this is the right decision.
There might be a slight regression on AVX1 targets due to add
splitting. Hard to say for sure. Maybe we need to look into
using the vector reduction flag to use 2 narrow loads and a
blend instead of extracting and inserting.
llvm-svn: 367198
This reverts r340478 and r340631 and replaces them with a simpler
method of just letting DAG combine revisit the nodes to handle
the other operand.
llvm-svn: 367195
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
llvm-svn: 367194
This adds the patterns required to transform xor P0, -1 to a VPNOT. The
instruction operands have to change a little for this, adding an in and an out
VCCR reg and using a custom DecodeMVEVPNOT for the decode.
Differential Revision: https://reviews.llvm.org/D65133
llvm-svn: 367192
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.
Differential Revision: https://reviews.llvm.org/D65103
llvm-svn: 367191
This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back.
This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value.
Fixes PR42777
llvm-svn: 367174
The test case from:
https://bugs.llvm.org/show_bug.cgi?id=42771
...shows a ~30x slowdown caused by the awkward loop iteration (rL207302) that is
seemingly done just to avoid invalidating the instruction iterator. We can instead
delay instruction deletion until we reach the end of the block (or we could delay
until we reach the end of all blocks).
There's a test diff here for a degenerate case with llvm.assume that is not
meaningful in itself, but serves to verify this change in logic.
This change probably doesn't result in much overall compile-time improvement
because we call '-instsimplify' as a standalone pass only once in the standard
-O2 opt pipeline currently.
Differential Revision: https://reviews.llvm.org/D65336
llvm-svn: 367173
Recommit rL367100 which was reverted at rL367141. Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts but it does still remove a GetDemandedBits user and gives us the equivalent combines.
llvm-svn: 367172
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.
This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it.
This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches.
llvm-svn: 367171
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit.
llvm-svn: 367169
Add partial instruction selection for intrinsics like this:
```
declare i32 @llvm.aarch64.stlxr(i64, i32*)
```
(This only handles the case where a G_ZEXT is feeding the intrinsic.)
Also make sure that the added store instruction actually has the memory op from
the original G_STORE.
Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll.
Differential Revision: https://reviews.llvm.org/D65355
llvm-svn: 367163
This reverts r366998 (git commit 5354c83ece00690b4dbfa47925f8f5a8f33f1d9e)
This breaks a linux kernel build and we have reproducer to investigate.
llvm-svn: 367160
This adds support to the yaml remark parser to be able to parse remarks
directly from the metadata.
This supports parsing separate metadata and following the external file
with the associated metadata, and also a standalone file containing
metadata + remarks all together.
Original llvm-svn: 367148
Revert llvm-svn: 367151
This has a fix for gcc builds.
llvm-svn: 367155
unreachable loop.
updatePredecessorProfileMetadata in jumpthreading tries to find the
first dominating predecessor block for a PHI value by searching upwards
the predecessor block chain.
But jumpthreading may see some temporary IR state which contains
unreachable bb not being cleaned up. If an unreachable loop happens to
be on the predecessor block chain, keeping chasing the predecessor
block will run into an infinite loop.
The patch fixes it.
Differential Revision: https://reviews.llvm.org/D65310
llvm-svn: 367154
This adds support to the yaml remark parser to be able to parse remarks
directly from the metadata.
This supports parsing separate metadata and following the external file
with the associated metadata, and also a standalone file containing
metadata + remarks all together.
llvm-svn: 367148
Adds machine operand lowering for MCSymbolSDNodes to the PowerPC
backend. This is needed to produce call instructions in assembly for AIX
because the callee operand is a MCSymbolSDNode. The test is XFAIL'ed for
asserts due to a (valid) assertion in PEI that the AIX ABI isn't supported yet.
Differential Revision: https://reviews.llvm.org/D63738
llvm-svn: 367133
Summary:
The bitperm feature flag is now prefixed with SVE2, as it is for all other SVE2
extensions
Patch by Maciej Gabka.
Reviewers: sdesmalen, rovka, chill, SjoerdMeijer, rengolin
Reviewed By: SjoerdMeijer, rengolin
Differential Revision: https://reviews.llvm.org/D65327
llvm-svn: 367124
In preperation for AIX support in FrameLowering: replace a number of literal
'8' that represent the stack offset of the condition register save area with
a member in PPCFrameLowering.
Patch by Chris Bowler.
llvm-svn: 367111
Void return used to have unsigned with value 0 for virtual register
but with addition of Register class and changes to arguments to lowerCall
this is no longer valid.
Check for void return by inspecting the Ty field in OrigRet.
Differential Revision: https://reviews.llvm.org/D65321
llvm-svn: 367107
(Y * (1.0 - Z)) + (X * Z) -->
Y - (Y * Z) + (X * Z) -->
Y + Z * (X - Y)
This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=42716
Factoring eliminates an instruction, so that should be a good canonicalization.
The potential conversion to FMA would be handled by the backend based on target
capabilities.
Differential Revision: https://reviews.llvm.org/D65305
llvm-svn: 367101
Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm
only if there is other WQM computation in the shader.
Reviewers: nhaehnle, tpr
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64935
llvm-svn: 367097
Embedded Trace Extension and Trace Buffer Extension are optional
future architecture extensions.
(cf. https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools)
Their system registers are documented here:
https://developer.arm.com/docs/ddi0601/a
ETE shares register names with ETM. One exception is the ETE
TRCEXTINSELR0 register, which has the same encoding as the ETM
TRCEXTINSELR register (but different semantics). This patch treats
them as aliases: the assembler will accept both names, emitting
identical encoding, and the disassembler will keep disassembling
to TRCEXRINSELR.
Differential Revision: https://reviews.llvm.org/D63707
llvm-svn: 367093
Eventually all of these will be moved over, but we create nodes in GetDemandedBits recursion at the moment which causes regressions when we try to remove them all.
llvm-svn: 367092
Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair,
so add an implicit def to these pseudo instructions in case that WLS
and LE aren't generated.
Differential Revision: https://reviews.llvm.org/D65275
llvm-svn: 367089
Summary:
This is an alternate approach to D57970.
Currently funclets reuse the same stack slots that are used in the
parent function for saving callee-saved xmm registers. If the parent
function modifies a callee-saved xmm register before an excpetion is
thrown, the catch handler will overwrite the original saved value.
This patch allocates space in funclets stack for saving callee-saved xmm
registers and uses RSP instead RBP to access memory.
Reviewers: andrew.w.kaylor, LuoYuanke, annita.zhang, craig.topper,
RKSimon
Subscribers: rnk, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63396
Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 367088
To avoid duplicates in loop metadata, if the string to add is
already there, just update the value.
Reviewers: reames, Ashutosh
Reviewed By: reames
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D65265
llvm-svn: 367087
Just move the utility function to LoopUtils.cpp to re-use it in loop peeling.
Reviewers: reames, Ashutosh
Reviewed By: reames
Subscribers: hiraditya, asbirlea, llvm-commits
Differential Revision: https://reviews.llvm.org/D65264
llvm-svn: 367085
COPYFILE_CLONE is only defined on newer macOS versions, using it without
check breaks build on systems running legacy OS and toolchain.
Differential Revision: https://reviews.llvm.org/D65317
llvm-svn: 367084