324 Commits

Author SHA1 Message Date
Nikita Popov
0b39d2d752 Revert "[NFC] Separate Peeling Properties into its own struct"
This reverts commit 0369dc98f958a1ca2ec05f1897f091129bb16e8a.

Many failing tests.
2020-07-08 21:43:32 +02:00
Sidharth Baveja
0369dc98f9 [NFC] Separate Peeling Properties into its own struct
Summary:
This patch makes the peeling properties of the loop accessible by other loop transformations.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-08 18:59:59 +00:00
Anh Tuyen Tran
6965af43e6 Revert "[NFC] Separate Peeling Properties into its own struct"
This reverts commit fead250b439bbd4ec0f21e6a52d0c174e5fcdf5a.
2020-07-08 18:58:05 +00:00
Anh Tuyen Tran
fead250b43 [NFC] Separate Peeling Properties into its own struct
Summary:
This patch makes the peeling properties of the loop accessible by other loop transformations.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-08 18:56:03 +00:00
David Green
146dad0077 [ARM] MVE FP16 cost adjustments
This adjusts the MVE fp16 cost model, similar to how we already do for
integer casts. It uses the base cost of 1 per cvt for most fp extend /
truncates, but adjusts it for loads and stores where we know that a
extending load has been used to get the load into the correct lane, and
only an MVE VCVTB is then needed.

Differential Revision: https://reviews.llvm.org/D81813
2020-07-06 15:57:51 +01:00
David Green
afdb2ef2ed [ARM] Adjust default fp extend and trunc costs
This adds some default costs for fp extends and truncates, generally
costing them as 1 per lane. If the type is not legal then the cost will
include a call to an __aeabi_ function.

Some NEON code is also adjusted to make sure it applies to the expected
types, now that fp16 is a more common thing.

Differential Revision: https://reviews.llvm.org/D82458
2020-07-06 14:23:17 +01:00
David Green
60b8b2beea [ARM] Add extra extend and trunc costs for cast instructions
This expands the existing extend costs with a few extras for larger
types than legal, which will usually be split under MVE. It also adds
trunk support for the same thing. These should not have a large effect
on many things, but makes the costs explicit and keeps a certain balance
between the trunks and extends.

Differential Revision: https://reviews.llvm.org/D82457
2020-07-06 11:33:05 +01:00
David Green
55227f85d0 [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost
This alters getMemoryOpCost to use the Base TargetTransformInfo version
that includes some additional checks for whether extending loads are
legal. This will generally have the effect of making <2 x ..> and some
<4 x ..> loads/stores more expensive, which in turn should help favour
larger vector factors.

Notably it alters the cost of a <4 x half>, which with the current
codegen will be expensive if it is not extended.

Differential Revision: https://reviews.llvm.org/D82456
2020-07-06 10:58:40 +01:00
Guillaume Chatelet
b66e33a689 [Alignment][NFC] Migrate TTI::getGatherScatterOpCost to Align
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82577
2020-06-26 11:08:27 +00:00
Guillaume Chatelet
fdc7c7fb87 [Alignment][NFC] Migrate TTI::getInterleavedMemoryOpCost to Align
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82573
2020-06-26 11:00:53 +00:00
Guillaume Chatelet
324cda2073 [Alignment][NFC] Conform X86, ARM and AArch64 TargetTransformInfo backends to the public API
The main interface has been migrated to Align already but a few backends where broadening the type from Align to MaybeAlign.
This patch makes sure all implementations conform to the public API.

Differential Revision: https://reviews.llvm.org/D82465
2020-06-25 13:23:13 +00:00
dfukalov
7ddee0922f [NFCI][CostModel] Add const to Value*.
Summary:
Get back `const` partially lost in one of recent changes.
Additionally specify explicit qualifiers in few places.

Reviewers: samparker

Reviewed By: samparker

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82383
2020-06-24 23:16:08 +03:00
Eric Christopher
cf23852587 [Target] As part of using inclusive language within the llvm project,
migrate away from the use of blacklist and whitelist.

This change affects an internal llvm command line option.
2020-06-20 00:06:39 -07:00
Sjoerd Meijer
e345d547a0 Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops"
Fixed ARM regression test.

Please see the original commit message rG47650451738c for details.
2020-06-17 13:12:15 +01:00
Sjoerd Meijer
d4e183f686 Revert "[LV] Emit @llvm.get.active.mask for tail-folded loops"
This reverts commit 47650451738c821993c763356854b560a0f9f550
while I investigate the build bot failures.
2020-06-17 10:09:54 +01:00
Sjoerd Meijer
4765045173 [LV] Emit @llvm.get.active.mask for tail-folded loops
This emits new IR intrinsic @llvm.get.active.mask for tail-folded vectorised
loops if the intrinsic is supported by the backend, which is checked by
querying TargetTransform hook emitGetActiveLaneMask.

This intrinsic creates a mask representing active and inactive vector lanes,
which is used by the masked load/store instructions that are created for
tail-folded loops. The semantics of @llvm.get.active.mask are described here in
LangRef:

https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics

This intrinsic is also used to provide a hint to the backend. That is, the
second argument of the intrinsic represents the back-edge taken count of the
loop. For MVE, for example, we use that to set up tail-predication, which is a
new form of predication in MVE for vector loops that implicitely predicates the
last vector loop iteration by implicitely setting active/inactive lanes, i.e.
the tail loop is predicated. In order to set up a tail-predicated vector loop,
we need to know the number of data elements processed by the vector loop, which
corresponds the the tripcount of the scalar loop, which we can now reconstruct
using @llvm.get.active.mask.

Differential Revision: https://reviews.llvm.org/D79100
2020-06-17 09:53:58 +01:00
Sjoerd Meijer
20835cff27 [TTI] Refactor emitGetActiveLaneMask
Refactor TTI hook emitGetActiveLaneMask and remove the unused arguments
as suggested in D79100.
2020-06-17 09:53:58 +01:00
David Green
46529978bf [ARM] Always use reductions intrinsics under MVE
Similar to a recent change to the X86 backend, this changes things so
that we always produce a reduction intrinsics for all reduction types,
not just the legal ones. This gives a better chance in the backend to
custom lower them to something more suitable for MVE. Especially for
something like fadd the in-order reduction produced during DAG lowering
is already better than the shuffles produced in the midend, and we can
do even better with a bit of custom lowering.

Differential Revision: https://reviews.llvm.org/D81398
2020-06-12 19:21:17 +01:00
Sam Parker
fa8bff0cd1 [CostModel] Unify getArithmeticInstrCost
Add the remaining arithmetic opcodes into the generic implementation
of getUserCost and then call this from getInstructionThroughput. Most
of the backends have been modified to return the base implementation
for cost kinds other RecipThroughput. The outlier here is AMDGPU
which already uses getArithmeticInstrCost for all the cost kinds.
This change means that most of the opcodes can be removed from that
backends implementation of getUserCost.

Differential Revision: https://reviews.llvm.org/D80992
2020-06-10 09:08:45 +01:00
Sam Parker
37289615c0 [NFCI][CostModel] Unify getCmpSelInstrCost
Add cases for icmp, fcmp and select into the switch statement of the
generic getUserCost implementation with getInstructionThroughput then
calling into it. The BasicTTI and backend implementations have be set
to return a default value (1) when a cost other than throughput is
being queried.

Differential Revision: https://reviews.llvm.org/D80550
2020-06-09 07:41:22 +01:00
Sam Parker
5b5e78ad2b [CostModel] Follow-up to buildbot fix
Adding type checks into the other backends that call
getTypeLegalizationCost.

Differential Revision: https://reviews.llvm.org/D80984
2020-06-08 15:26:25 +01:00
Sam Parker
9303546b42 [CostModel] Unify getMemoryOpCost
Use getMemoryOpCost from the generic implementation of getUserCost
and have getInstructionThroughput return the result of that for loads
and stores.

This also means that the X86 implementation of getUserCost can be
removed with the functionality folded into its getMemoryOpCost.

Differential Revision: https://reviews.llvm.org/D80984
2020-06-05 10:13:38 +01:00
Sjoerd Meijer
7480ccbfc9 [TTI] New target hook emitGetActiveLaneMask
This is split off from D79100 and adds a new target hook emitGetActiveLaneMask
that can be queried to check if the intrinsic @llvm.get.active.lane.mask() is
supported by the backend and if it should be emitted for a given loop.

See also commit rG7fb8a40e5220 and its commit message for more details/context
on this new intrinsic.

Differential Revision: https://reviews.llvm.org/D80597
2020-05-29 09:10:58 +01:00
Sam Parker
8aaabadece [CostModel] Unify getCastInstrCost
Add the remaining cast instruction opcodes to the base implementation
of getUserCost and directly return the result. This allows
getInstructionThroughput to return getUserCost for the casts. This
has required changes to PPC and SystemZ because they implement
getUserCost and/or getCastInstrCost with adjustments for vector
operations. Adjusts have also been made in the remaining backends
that implement the method so that they still produce a cost of zero
or one for cost kinds other than throughput.

Differential Revision: https://reviews.llvm.org/D79848
2020-05-26 11:29:57 +01:00
Craig Topper
7392820f98 [Align] Remove operations on MaybeAlign that asserted that it had a defined value.
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.

I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.

I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.

Differential Revision: https://reviews.llvm.org/D80455
2020-05-22 21:54:28 -07:00
Christopher Tetreault
245679b62e [SVE] Remove usages of VectorType::getNumElements() from ARM
Reviewers: efriedma, fpetrogalli, kmclaughlin, grosbach, dmgreen

Reviewed By: dmgreen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79816
2020-05-15 12:55:27 -07:00
Pierre-vh
2668775f66 [LSR][ARM] Add new TTI hook to mark some LSR chains as profitable
This patch adds a new TTI hook to allow targets to tell LSR that
a chain including some instruction is already profitable and
should not be optimized. This patch also adds an implementation
of this TTI hook for ARM so LSR doesn't optimize chains that include
the VCTP intrinsic.

Differential Revision: https://reviews.llvm.org/D79418
2020-05-13 14:18:28 +01:00
Sam Parker
b4a8091a11 [ARM][CostModel] Improve getCastInstrCost
- Specifically check for sext/zext users which have 'long' form NEON
  instructions.
- Add more entries to the table for sext/zexts so that we can report
  more accurately the number of vmovls required for NEON.
- Pass the instruction to the pass implementation.

Differential Revision: https://reviews.llvm.org/D79561
2020-05-12 10:32:20 +01:00
Simon Pilgrim
4e3c005554 [TTI] getScalarizationOverhead - use explicit VectorType operand
getScalarizationOverhead is only ever called with vectors (and we already had a load of cast<VectorType> calls immediately inside the functions).

Followup to D78357

Reviewed By: @samparker

Differential Revision: https://reviews.llvm.org/D79341
2020-05-05 16:59:23 +01:00
Sam Parker
40574fefe9 [NFC][CostModel] Add TargetCostKind to relevant APIs
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.

RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html

Differential Revision: https://reviews.llvm.org/D79002
2020-05-05 10:35:54 +01:00
David Green
de904f5325 [ARM] isHardwareLoopProfitable debug messages. NFC 2020-05-04 19:20:34 +01:00
Sam Parker
e9c9329aa4 [TTI] Add TargetCostKind argument to getUserCost
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.

The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.

getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.

This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.

Differential Revision: https://reviews.llvm.org/D78635
2020-04-28 08:57:45 +01:00
Craig Topper
d22989c34e [CallSite removal][Target] Replace CallSite with CallBase. NFC
In some cases just delete an unneeded include.
2020-04-21 23:29:36 -07:00
Sam Parker
e3056ae9a0 [NFC][TTI] Explicit use of VectorType
The API for shuffles and reductions uses generic Type parameters,
instead of VectorType, and so assertions and casts are used a lot.
This patch makes those types explicit, which means that the clients
can't be lazy, but results in less ambiguity, and that can only be a
good thing.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562

Differential Revision: https://reviews.llvm.org/D78357
2020-04-20 09:16:52 +01:00
Christopher Tetreault
e1e131ea5e Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: grosbach, efriedma, sdesmalen

Reviewed By: efriedma

Subscribers: hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77271
2020-04-09 12:52:44 -07:00
Anna Welker
a6d3bec83f [TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.

Differential Revision: https://reviews.llvm.org/D75525
2020-03-11 10:23:41 +00:00
Guillaume Chatelet
fc19465965 [Alignment][NFC] Use Align for code creating MemOp
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73874
2020-02-03 14:10:30 +01:00
Guillaume Chatelet
3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
David Green
b535aa405a [ARM] Use reduction intrinsics for larger than legal reductions
The codegen for splitting a llvm.vector.reduction intrinsic into parts
will be better than the codegen for the generic reductions. This will
only directly effect when vectorization factors are specified by the
user.

Also added tests to make sure the codegen for larger reductions is OK.

Differential Revision: https://reviews.llvm.org/D72257
2020-01-24 17:07:24 +00:00
David Green
e9c198278e [ARM] Basic gather scatter cost model
This is a very basic MVE gather/scatter cost model, based roughly on the
code that we will currently produce. It does not handle truncating
scatters or extending gathers correctly yet, as it is difficult to tell
that they are going to be correctly extended/truncated from the limited
information in the cost function.

This can be improved as we extend support for these in the future.

Based on code originally written by David Sherwood.

Differential Revision: https://reviews.llvm.org/D73021
2020-01-22 14:41:38 +00:00
David Green
5e51f75542 [ARM] Favour post inc for MVE loops
We were previously not necessarily favouring postinc for the MVE loads
and stores, leading to extra code prior to the loop to set up the
preinc. MVE in general can benefit from postinc (as we don't have
unrolled loops), and certain instructions like the VLD2's only post-inc
versions are available.

Differential Revision: https://reviews.llvm.org/D70790
2020-01-20 06:57:07 +00:00
Sam Parker
15c7fa4d11 [ARM][MVE] Don't unroll intrinsic loops.
We don't unroll vector loops for MVE targets, but we miss the case
when loops only contain intrinsic calls. So just move the logic a
bit to catch this case.

Differential Revision: https://reviews.llvm.org/D72440
2020-01-09 11:57:34 +00:00
Anna Welker
346f6b54bd [ARM][MVE] Enable masked gathers from vector of pointers
Adds a pass to the ARM backend that takes a v4i32
gather and transforms it into a call to MVE's
masked gather intrinsics.

Differential Revision: https://reviews.llvm.org/D71743
2020-01-08 13:43:12 +00:00
Reid Kleckner
85ba5f637a Rename TTI::getIntImmCost for instructions and intrinsics
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.

Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.

Split off from D71320

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D71381
2019-12-11 18:00:20 -08:00
David Green
b1aba0378e [ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968
2019-12-09 11:37:34 +00:00
David Green
be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %a = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
David Green
f008b5b8ce [ARM] Additional tests and minor formatting. NFC
This adds some extra cost model tests for shifts, and does some minor
adjustments to some Neon code to make it clear as to what it applies to.
Both NFC.
2019-12-09 10:24:33 +00:00
David Green
882f23caea [ARM] MVE interleaving load and stores.
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.

We may need some other future adjustments, such as VLD4 take up half the
available registers so should maybe cost more. This patch should get the
basics in though.

Differential Revision: https://reviews.llvm.org/D69392
2019-11-19 18:37:30 +00:00
Sjoerd Meijer
71327707b0 [ARM][MVE] tail-predication
This is a follow up of d90804d, to also flag fmcp instructions as instructions
that we do not support in tail-predicated vector loops.

Differential Revision: https://reviews.llvm.org/D70295
2019-11-15 11:01:13 +00:00
Sjoerd Meijer
d90804d26b [ARM][MVE] canTailPredicateLoop
This implements TTI hook 'preferPredicateOverEpilogue' for MVE.  This is a
first version and it operates on single block loops only. With this change, the
vectoriser will now determine if tail-folding scalar remainder loops is
possible/desired, which is the first step to generate MVE tail-predicated
vector loops.

This is disabled by default for now. I.e,, this is depends on option
-disable-mve-tail-predication, which is off by default.

I will follow up on this soon with a patch for the vectoriser to respect loop
hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled,
we don't want to tail-fold and we shouldn't query this TTI hook, which is
done in D70125.

Differential Revision: https://reviews.llvm.org/D69845
2019-11-13 13:24:33 +00:00