59 Commits

Author SHA1 Message Date
Sam Parker
3572b21dec
[WebAssembly] Lower extend v16i8 to v16i32 (#188936)
Split the input vector with an extend_low and high and then split the
results again with extend_low and high for a total of 6 instructions.
This is removes 3 shuffles and a couple of extends.
2026-03-30 08:32:41 +01:00
Jasmine Tang
d69c670934
[WebAssembly] Add initial shuffle cost capabilities (#187596)
Fixes #178940

Fixes the case of i16x8, i8x16 manual splat not recognized but the case of i32x4 still remains.
2026-03-23 09:28:37 -07:00
Benjamin Maxwell
88c0a1db85
Revert "[WebAssembly] Mark extract.last.active as having invalid cost." (#181545)
The failures should have been resolved with
https://github.com/llvm/llvm-project/pull/180290 (which also added
WebAssembly tests).

This reverts commit
811fb223af.

---

This is the same as #180942, but with a `lit.local.cfg` added to the
CostModel test folder.
2026-02-16 08:55:25 +00:00
Benjamin Maxwell
0ad941a98b
Revert "Revert "[WebAssembly] Mark extract.last.active as having invalid cost."" (#181342)
Reverts llvm/llvm-project#180942

Looks like something changed the cost model. Will investigate later.
2026-02-13 09:50:30 +00:00
Benjamin Maxwell
0f8325c9a9
Revert "[WebAssembly] Mark extract.last.active as having invalid cost." (#180942)
The failures should have been resolved with #180290 (which also added
WebAssembly tests).

This reverts commit 811fb223af2b3e2d68c99b346f4b75dcf3de3417.
2026-02-13 09:21:46 +00:00
Damian Heaton
762ba885f9
[LV] Add support for llvm.vector.partial.reduce.fadd (#163975)
Allows the Loop Vectorizer to generate `llvm.vector.partial.reduce.fadd`
intrinsics when sequences which match its requirements are found.
2026-01-28 15:05:34 +00:00
Florian Hahn
b794baf8e7
[TTI] Add VectorInstrContext for context-aware insert/extract costs. (#175982)
This commit introduces the VectorInstrContext (VIC) infrastructure to
improve cost estimates for insert/extracts based on the context
instruction in which the insert/extract is used.

This is similar to CastContextHint, and allows providing context on how
the insert/extract is going to be used before creating IR. This is
useful in the LoopVectorizer, where costs need to estimated before
creating IR.

The new hint currently only replaces an existing check in AArch64,
but new uses will be introduced in follow-ups, including
https://github.com/llvm/llvm-project/pull/177201.

PR: https://github.com/llvm/llvm-project/pull/175982
2026-01-27 16:30:29 +00:00
Florian Hahn
811fb223af
[WebAssembly] Mark extract.last.active as having invalid cost.
Currently the WebAssembly backend crashes when trying to lower some
extract.last.active intrinsic calls. Mark their cost as invalid
temporarily, to avoid them being introduced by the loop
vectorizer after 2abd6d6d7ac (#158088).
2026-01-17 21:21:34 +00:00
Sam Parker
e5b6833e49
[WebAssembly] vi8 mul cost modelling. (#175177)
We've already optimised these, so update the cost model to reflect it.
And skip the isBeforeLegalize check when lowering i8 muls, because it
then misses the cases where, say v32i8, has been type legalised into 2x
v16i8.

Also explicitly disable memory interleaving for any factor other than
two or four.
2026-01-12 09:25:54 +00:00
valadaptive
8da7c05933
[WebAssembly] Fold constant i8x16.swizzle and i8x16.relaxed.swizzle to shufflevector (#169110)
Resolves #169058.

This adds ~~an InstCombine pass~~ a TTI hook to the WebAssembly backend
that folds `i8x16.swizzle` and `i8x16.relaxed.swizzle` operations to
`shufflevector` operations if their mask operands are constant.

This is mainly useful for abstractions over the raw intrinsics--for
instance, in architecture-generic SIMD code that may not be able to
expose the constant shuffles due to type system limitations.

I took most of this from the x86 backend (in particular,
`simplifyX86vpermilvar` in `X86InstCombineIntrinsic`), and adapted it
for the WebAssembly backend. There wasn't any previous
`instCombineIntrinsic` method on the WebAssembly `TargetTransformInfo`,
so I added it. Right now, this swizzle optimization is the only one it
performs.

As I noted in the transform itself, the "relaxed" swizzle actually has
stricter preconditions than the non-relaxed one. If a non-negative but
still out-of-bounds index is provided, the "relaxed" swizzle can choose
between returning 0 and the lane at the index modulo 16. However, it
must make the same choice every time, and we don't know which choice the
runtime will make, so we can't constant-fold it.

The regression tests were mostly generated by Claude and adapted a bit
by me (I tried to follow the [InstCombine contributor
guide](https://llvm.org/docs/InstCombineContributorGuide.html#tests)).
There was previously no WebAssembly subdirectory within the InstCombine
tests, so I created that too; as of now, the swizzle fold test is the
only file in it. Everything else was written by myself (well, partly
copy-pasted from the x86 backend).

I'm not sure how to write an Alive2 test for this; I can't find any
examples where the input is an arbitrary constant.
2026-01-07 17:39:36 -08:00
Sam Parker
d10a85167a
[WebAssembly] Implement more of getCastInstrCost (#164612)
Fill out more information for sign and zero extend and add some truncate
information; however, the primary change is to int/fp conversions. In
particular, fp to (narrow) int appears to be relatively expensive.
2025-11-10 08:07:16 +00:00
Sam Parker
586c0ad918
[WebAssembly] Support partial-reduce accumulator (#158060)
We currently only support partial.reduce.add in the case where we are
performing a multiply-accumulate. Now add support for any partial
reduction where the input is being extended, where we can take advantage
of extadd_pairwise.
2025-09-12 07:03:49 +01:00
Sam Parker
7b3e77f8d9
[WebAssembly] Implement getInterleavedMemoryOpCost (#146864)
First pass where we calculate the cost of the memory operation, as well
as the shuffles required. Interleaving by a factor of two should be
relatively cheap, as many ISAs have dedicated instructions to perform
the (de)interleaving. Several of these permutations can be combined for
an interleave stride of 4 and this is the highest stride we allow.

I've costed larger vectors, and more lanes, as more expensive because
not only is more work is needed but the risk of codegen going 'wrong'
rises dramatically. I also filled in a bit of cost modelling for vector
stores.

It appears the main vector plan to avoid is an interleave factor of 4
with v16i8. I've used libyuv and ncnn for benchmarking, using V8 on
AArch64, and observe geomean improvement of ~3% with some kernels
improving 40-60%.

I know there is still significant performance being left on the table,
so this will need more development along with the rest of the cost
model.
2025-08-27 12:43:52 +01:00
Jasmine Tang
d7a29e5d56
[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC (#153703)
This PR reapplies https://github.com/llvm/llvm-project/pull/149461

In the original `combineVectorSizedSetCCEquality`, the result of setcc
is being negated by returning setcc with the same cond code, leading to
wrong logic.

For example, with
```llvm
 %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
  %res = icmp eq i32 %cmp_16, 0
```

the original PR producese all_true and then also compares the result
equal to 0 (using the same SETEQ in the returning setcc), meaning that
semantically, it effectively is calling icmp ne.

Instead, the PR should have use SETNE in the returning setcc, this way,
all true return 1, then it is compared again ne 0, which is equivalent
to icmp eq.
2025-08-15 12:06:47 -07:00
Jasmine Tang
d32793ca6e
Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360)
Reverts llvm/llvm-project#149461

The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the
Emscripten test suite has failed. This PR applies a revert so I can take
a closer look at it

Test case link:
https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp

Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128
-o something.js`

Original comment report:
https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746
2025-08-13 07:41:44 +00:00
Jasmine Tang
348f01f89c
[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 (#149461)
Fixes https://github.com/llvm/llvm-project/issues/149230

Previously, even with simd enabled via `-mattr=+simd128`, the compiler
cannot utilize v128 to optimize loads and setcc of i128, instead
legalizing it to consecutive i64s.

This PR then adds support for setcc of i128 by converting them to
v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16
bytes or more (when simd128 is present).

The check for enabling this optimization is if the comparison operand is
either a load or an integer in i128, with the comparison code being
either `EQ | NE`, without `NoImplicitFloat` function flag.

Inspiration taken from RISCV's isel lowering.
2025-08-12 11:04:37 -07:00
Jasmine Tang
343f7475be
[WebAssembly] Add support for memcmp expansion (#148298)
Fixes https://github.com/llvm/llvm-project/issues/61400

Added test case in llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
2025-07-20 10:27:42 -07:00
Philip Reames
b96370131d
[TTI] Plumb CostKind through getPartialReductionCost (#144953)
Purely for the sake of being idiomatic with other TTI costing routines,
no direct motivation beyond that.
2025-06-19 15:29:56 -07:00
Max Graey
8aaac80ddd
[NFC] Use more isa and isa_and_nonnull instead dyn_cast for predicates (#137393)
Also fix some typos in comments

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2025-05-13 22:34:42 +08:00
David Green
abd2c07e39
[CostModel] Make Op0 and Op1 const in getVectorInstrCost. NFC (#137631)
This does not alter much at the moment, but allows const pointers to be
passed as Op0 and Op1, simplifying later patches
2025-05-01 15:55:08 +01:00
Sergei Barannikov
3334c3597d
[TTI] Fix discrepancies in prototypes between interface and implementations (NFCI) (#136655)
These are not diagnosed because implementations hide the methods of the base class rather than overriding them.
This works as long as a hiding function is callable with the same arguments as the same function from the base class.

Pull Request: https://github.com/llvm/llvm-project/pull/136655
2025-04-22 11:40:12 +03:00
Sergei Barannikov
e0c1e23b99
[TTI] Constify BasicTTIImplBase::thisT() (NFCI) (#136575)
The main change is making `thisT` method `const`, the rest of the
changes is fixing compilation errors (*).

(*) There are two tricky methods, `getVectorInstrCost()` and
`getIntImmCost()`.
They have several overloads; some of these overloads are typically
pulled in to derived classes using the `using` directive, and then
hidden by methods in the derived class.
The compiler does not complain if the hiding methods are not marked as
`const`, which means that clients will use the methods from the base
class. If after this change your target fails cost model tests, this
must be the reason. To resolve the issue you need  to make all hiding
overloads `const`. See the second commit in this PR.

Pull Request: https://github.com/llvm/llvm-project/pull/136575
2025-04-21 21:42:40 +03:00
Sam Parker
df2de13695
[WebAssembly] Autovec support for dot (#123207)
Enable the use of partial.reduce.add that we can lower to dot or a tree
of (add (extmul_low_u, extmul_high_u)) for the unsigned case. We support
both v8i16 and v16i8 inputs.
2025-02-03 08:58:43 +00:00
Sam Parker
28d7880618
[WebAssembly] getMemoryOpCost and getCastInstrCost (#122896)
Add inital implementations of these TTI methods for SIMD types. For
casts, The costing covers the free extensions provided by extmul_low as
well as extend_low. For memory operations we consider the use of
load32_zero and load64_zero, as well as full width v128 loads.
2025-01-31 10:33:31 +00:00
hev
e26af0938c
[llvm] Add BasicTTIImpl::areInlineCompatible for target feature subset checks (#117493)
This patch moves the `areInlineCompatible` implementation from multiple
subclasses (`AArch64TTIImpl`, `RISCVTTIImpl`, `WebAssemblyTTIImpl`) to
the base class `BasicTTIImpl`. The new implementation checks whether the
callee's target features are a subset of the caller's, enabling
consistent behavior across targets. Subclasses now simply delegate to
the base implementation, reducing code duplication and improving
maintainability.
2024-11-25 11:22:49 +08:00
Kazu Hirata
43570a2841
[WebAssembly] Remove unused includes (NFC) (#116318)
Identified with misc-include-cleaner.
2024-11-15 07:26:37 -08:00
Jeffrey Byrnes
853c43d04a
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
2024-10-09 14:30:09 -07:00
Sam Parker
d28ed29d6b
[TTI][WebAssembly] Pairwise reduction expansion (#93948)
WebAssembly doesn't support horizontal operations nor does it have a way
of expressing fast-math or reassoc flags, so runtimes are currently
unable to use pairwise operations when generating code from the existing
shuffle patterns.

This patch allows the backend to select which, arbitary, shuffle pattern
to be used per reduction intrinsic. The default behaviour is the same as
the existing, which is by splitting the vector into a top and bottom
half. The other pattern introduced is for a pairwise shuffle.

WebAssembly enables pairwise reductions for int/fp add/sub.
2024-07-17 09:21:52 +01:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Fangrui Song
8e247b8f47 Replace TypeSize::{getFixed,getScalable} with canonical TypeSize::{Fixed,Scalable}. NFC 2023-10-27 00:30:41 -07:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
And so does getShuffleCost() to getBroadcastShuffleOverhead(),
getPermuteShuffleOverhead(), getExtractSubvectorOverhead(),
and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Philip Reames
104fa367ee [TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC]
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.

This is the change which motivated the whole sequence which preceeded it.  In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as result, we silently passed additional information through a callsite which previously dropped the power-of-two fact.  This might be harmless in most cases, but at least a couple clearly dependend for correctness on not passing that property through.

I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance.  For instance, every parameter which changes type in this change also changes name.  This was intentional to make sure that every call site possible effected must show up in the diff.  This let me audit each one closely.
2022-08-22 15:16:39 -07:00
Philip Reames
478cf94378 [X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc]
This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties.  This change adds some accessor methods and uses them to simplify backend code.  The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.
2022-08-22 13:37:59 -07:00
Kazu Hirata
a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Chuanqi Xu
0b5ead6590 [WebAssembly] Don't set musttail for coroutines when tail-call is not
enabled

The C++20 Coroutines couldn't be compiled to WebAssembly due to an
optimization named symmetric transfer requires the support for musttail
calls but WebAssembly doesn't support it yet.

This patch tries to fix the problem by adding a supportsTailCalls
method to TargetTransformImpl to skip the symmetric transfer when
tail-call feature is not supported.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D128794
2022-06-30 11:15:40 +08:00
Roman Lebedev
6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Sander de Smalen
4f42d873c2 [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen
1af35e77f4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen
55d18b3cc2 [TTI] Return a TypeSize from getRegisterBitWidth.
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98874
2021-03-24 14:45:13 +00:00
Sam Parker
9d81ccc02f [WebAssembly] Enable loop unrolling
Enable partial and runtime unrolling with a threshold of 30, which
was derived from a large number of kernels running on node and
wasmtime for amd64 and aarch64.

Unrolling is enabled by default at -O2 and -O3 and is disabled at
-Oz and -Os. Compiling with -Os is recommended if the wasm binary
size is the most important factor.

Differential Revision: https://reviews.llvm.org/D95125
2021-02-10 08:25:46 +00:00
Thomas Lively
d53d952810 [WebAssembly] Allow inlining functions with different features
Allow inlining only when the Callee has a subset of the Caller's
features. In principle, we should be able to inline regardless of any
features because WebAssembly supports features at module granularity,
not function granularity, but without this restriction it would be
possible for a module to "forget" about features if all the functions
that used them were inlined.

Requested in PR46812.

Differential Revision: https://reviews.llvm.org/D85494
2020-08-13 13:57:43 -07:00
Christopher Tetreault
5e2c736395 [SVE] Remove calls to VectorType::getNumElements from WebASM
Summary:
The getNumElements in base VectorType is being deprecated.

See: http://lists.llvm.org/pipermail/llvm-dev/2020-March/139811.html

Reviewers: efriedma, tlively, fpetrogalli, c-rhodes, dschuff

Reviewed By: tlively, dschuff

Subscribers: dschuff, sbc100, tschuett, jgravelle-google, hiraditya, aheejin, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82217
2020-06-22 12:25:08 -07:00
Sam Parker
40574fefe9 [NFC][CostModel] Add TargetCostKind to relevant APIs
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.

RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html

Differential Revision: https://reviews.llvm.org/D79002
2020-05-05 10:35:54 +01:00
David Green
be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %a = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
Zi Xuan Wu
9802268ad3 recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374634
2019-10-12 02:53:04 +00:00
Jinsong Ji
9912232b46 Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a.
This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04.

The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.

llvm-svn: 374091
2019-10-08 17:32:56 +00:00
Zi Xuan Wu
9f41deccc0 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374017
2019-10-08 03:28:33 +00:00
Heejin Ahn
18c56a0762 [WebAssembly] clang-tidy (NFC)
Summary:
This patch fixes clang-tidy warnings on wasm-only files.
The list of checks used is:
`-*,clang-diagnostic-*,llvm-*,misc-*,-misc-unused-parameters,readability-identifier-naming,modernize-*`
(LLVM's default .clang-tidy list is the same except it does not have
`modernize-*`. But I've seen in multiple CLs in LLVM the modernize style
was recommended and code was fixed based on the style, so I added it as
well.)

The common fixes are:
- Variable names start with an uppercase letter
- Function names start with a lowercase letter
- Use `auto` when you use casts so the type is evident
- Use inline initialization for class member variables
- Use `= default` for empty constructors / destructors
- Use `using` in place of `typedef`

Reviewers: sbc100, tlively, aardappel

Subscribers: dschuff, sunfish, jgravelle-google, yurydelendik, kripken, MatzeB, mgorny, rupprecht, llvm-commits

Differential Revision: https://reviews.llvm.org/D57500

llvm-svn: 353075
2019-02-04 19:13:39 +00:00
Chandler Carruth
2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00