223056 Commits

Author SHA1 Message Date
Dawid Jurczak
9ba5bb4309 [NFC][LoopIdiom] Make for loops more readable
Patch simplifies for loops in LIR following LLVM guidelines: https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible.

Differential Revision: https://reviews.llvm.org/D112077
2021-10-21 12:17:44 +02:00
Frederic Cambus
9635b2951d [docs] Fix broken link rendering in the LLVM Coding Standards. 2021-10-21 11:12:33 +02:00
Evgeniy Brevnov
1a8ec24efb [NARY-REASSOCIATE][NFC] Simplify min/max handling
In order to explore different variants of reassociation current implementation uses "swap in a loop" approach. Unfortunately, the implementation is more complicated than it could be. This is an attempt to streamline the code. New approach is to extract core functionality into a helper function and call it explicitly as many times as required.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112128
2021-10-21 15:45:53 +07:00
David Sherwood
9448cdc900 [SVE][Analysis] Tune the cost model according to the tune-cpu attribute
This patch introduces a new function:

  AArch64Subtarget::getVScaleForTuning

that returns a value for vscale that can be used for tuning the cost
model when using scalable vectors. The VScaleForTuning option in
AArch64Subtarget is initialised according to the following rules:

1. If the user has specified the CPU to tune for we use that, else
2. If the target CPU was specified we use that, else
3. The tuning is set to "generic".

For CPUs of type "generic" I have assumed that vscale=2.

New tests added here:

  Analysis/CostModel/AArch64/sve-gather.ll
  Analysis/CostModel/AArch64/sve-scatter.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D110259
2021-10-21 09:33:50 +01:00
eopXD
76db6d8080 [NFC][LoopIdiom] Add more test case to runtime-determined memset size
This patch supplements missing test case for D107353.
- Fix wrong descriptions in 64-bit mode test case
- Added testcase under 32-bit mode

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D108507
2021-10-21 00:05:18 -07:00
Yi Kong
1123e03a9d [opt-viewer] Use safe yaml load_all
Differential Revision: https://reviews.llvm.org/D112075
2021-10-21 14:00:03 +08:00
Vitaly Buka
66b650f3da [NFC][msan] Add NormalArgAfterNoUndef testcase 2021-10-20 21:08:12 -07:00
Vitaly Buka
60a8db6dc5 [NFC][msan] Rerun update_test_checks.py for a test 2021-10-20 21:08:12 -07:00
Vitaly Buka
6742c8a2d8 [NFC][msan] Break the loop when done
We have nothing to do after the Argument
is found.
2021-10-20 21:08:12 -07:00
Shengchen Kan
edff0070a1 [Codegen] Set ARITH_FENCE as meta-instruction
ARITH_FENCE, which was added by https://reviews.llvm.org/D99675,
should be a meta-instruction b/c it only emits comments "ARITH_FENCE".

Reviewed By: pengfei, LuoYuanke

Differential Revision: https://reviews.llvm.org/D112127
2021-10-21 10:19:22 +08:00
Craig Topper
b75f3dd88e [ARM] Use correct name of floating point ceil intrinsic in test.
The intrinsic is called llvm.ceil not llvm.fceil. The checks weren't
strong enough to notice that a call to llvm.fceil was emitted in
the final assembly.
2021-10-20 17:30:26 -07:00
Arthur Eubanks
6ea7437ca5 [SelectionDAG] Bail out of mergeTruncStores when not optimizing
With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores.

Fixes PR51827.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111596
2021-10-20 16:58:22 -07:00
Nikita Popov
8e4ae603d6 [Tests] Add tests for non-speculatable ephemeral values
The loads in these examples are currently not considered ephemeral
because they are not speculatable.
2021-10-20 23:33:36 +02:00
Sanjay Patel
40163f1df8 [x86] add special-case lowering for usubsat for AVX512
This is a small extension of D112095 to avoid another regression
seen with D112085.
In this case, we allow the same conversion from usubsat to ALU
ops if the target supports vpternlog.

That pattern will get converted later in X86DAGToDAGISel::tryVPTERNLOG().
This seems better than putting a magic immediate constant directly in
this code to create the exact vpternlog that we need. It's possible that
there are other special-cases along these lines, so we should try to
keep all of the vpternlog magic in one place.

Differential Revision: https://reviews.llvm.org/D112138
2021-10-20 16:41:13 -04:00
Stanislav Mekhanoshin
b92412fb28 [InstCombine] Fold (a & ~b) & ~c to a & ~(b | c)
%not1 = xor i32 %b, -1
  %not2 = xor i32 %c, -1
  %and1 = and i32 %a, %not1
  %and2 = and i32 %and1, %not2
=>
  %i1 = or i32 %b, %c
  %i2 = xor i32 %1, -1
  %and2 = and i32 %i2, %a

Differential Revision: https://reviews.llvm.org/D112108
2021-10-20 13:05:46 -07:00
Stanislav Mekhanoshin
3c59cdee5c Precommit updated InstCombine/and-xor-or.ll test. NFC. 2021-10-20 12:50:23 -07:00
Florian Hahn
8977bd5806
[IndVars] Invalidate SCEV when IR is changed in rewriteLoopExitValue.
At the moment, rewriteLoopExitValue forgets the current phi node in the
loop that collects phis to rewrite. A few lines after the value is
forgotten, SCEV is used again to analyze incoming values and
potentially expand SCEV expression. This means that another SCEV is
created for PN, before the IR is actually updated in the next loop.

This leads to accessing invalid cached expression in combination with
D71539.

PN should only be changed once the actual incoming exit value is set in
the next loop. Moving invalidation there should ensure that PN is
invalidated in all relevant cases.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D111495
2021-10-20 20:48:33 +01:00
Jon Roelofs
b046eb19b8 [AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111856
2021-10-20 12:11:52 -07:00
Stanislav Mekhanoshin
c80d8a8cea [AMDGPU] MachineLICM cannot hoist VALU
MachineLoop::isLoopInvariant() returns false for all VALU
because of the exec use. Check TII::isIgnorableUse() to
allow hoisting.

That unfortunately results in higher register consumption
since MachineLICM does not adequately estimate pressure.
Therefor I think it shall only be enabled after D107677 even
though it does not depend on it.

Differential Revision: https://reviews.llvm.org/D107859
2021-10-20 11:47:24 -07:00
Stanislav Mekhanoshin
6185835656 [AMDGPU] Allow rematerialization of SOP with virtual registers
D106408 was doing this for all targets although it was
reverted due to couple performance regressions on some targets.
The difference for AMDGPU is the ability to rematerialize SOP
instructions with virtual register uses like we already do for VOP.

Differential Revision: https://reviews.llvm.org/D110743
2021-10-20 11:46:50 -07:00
Leonard Grey
5d57578a4e [MC] Recursively calculate symbol offset
This is speculative since I'm not sure if there's some implicit contract that a
variable symbol must not have another variable symbol in its evaluation tree.

Downstream bug: https://bugs.chromium.org/p/chromium/issues/detail?id=471146#c23.

Test is based on alias.s (removed checks since we just need to know it didn't
crash).

Differential Revision: https://reviews.llvm.org/D109109
2021-10-20 14:29:43 -04:00
Sanjay Patel
80ab06c599 [InstCombine] fold fake vector insert to bit-logic
bitcast (inselt (bitcast X), Y, 0) --> or (and X, MaskC), (zext Y)

https://alive2.llvm.org/ce/z/Ux-662

Similar to D111082 / db231ebdb07f :
We want to avoid relatively opaque vector ops on types that are
likely supported by the backend as scalar integers. The bitwise
logic ops are more likely to allow further combining.

We probably want to generalize this to allow a shift too, but
that would oppose instcombine's general rule of not creating
extra instructions, so that's left as a potential follow-up.
Alternatively, we could do that transform in VectorCombine
with the help of the TTI cost model.

This is part of solving:
https://llvm.org/PR52057
2021-10-20 14:21:40 -04:00
Stanislav Mekhanoshin
503d061dc7 Precommit InstCombine/and-xor-or.ll test. NFC. 2021-10-20 11:13:12 -07:00
Arthur Eubanks
00500d5bad [NFC] De-template LazyCallGraph::visitReferences() and move into .cpp file
This makes changing it and recompiling it much faster.
2021-10-20 10:50:00 -07:00
Itay Bookstein
08ed216000 [IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
2021-10-20 10:29:47 -07:00
Zhi An Ng
e1fb13401e [WebAssembly] Add prototype relaxed float min max instructions
Add relaxed. f32x4.min, f32x4.max, f64x2.min, f64x2.max. These are only
exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112146
2021-10-20 09:41:51 -07:00
Sanjay Patel
ea9a0556b4 [InstCombine] add tests for casted insertelement; NFC 2021-10-20 12:17:58 -04:00
Fraser Cormack
eabf11f9ea [CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz
This patch fixes a crash when despeculating ctlz/cttz intrinsics with
scalable-vector types. It is not safe to speculatively get the size of
the vector type in bits in case the vector type is not a fixed-length type. As
it happens this isn't required as vector types are skipped anyway.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112141
2021-10-20 16:45:55 +01:00
Bjorn Pettersson
3d152bc49d [NewPM][test] Strickly use -passes in some more lit tests
Removed/replaced RUN lines using legacy PM syntax in favor of using
-passes in lit tests for Float2Int, MetaRenamer, StripDeadPrototypes
and StripSymbols.
2021-10-20 17:06:47 +02:00
Bjorn Pettersson
a3ca7dd0ab [NewPM][test] Use -passes syntax in Mem2Reg lit tests
The legacy PM is deprecated, so use the new PM syntax in lit tests
verifying the mem2reg pass.
2021-10-20 17:06:47 +02:00
Craig Topper
fe1f0de003 [RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors.
Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP
isn't legal or custom for a vector type we would scalarize the
CTLZ/CTTZ. This is different than CTPOP itself which would use a
vector expansion.

This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP
expansion instead of scalarizing. To do this I had to add additional
checks to make sure the operations used by CTPOP expansions are all
supported. Some of the operations were already needed for the CTLZ/CTTZ
expansion.

This is a huge improvement to the RISCV which doesn't have a scalar
ctlz or cttz in the base ISA.

For WebAssembly, I've added Custom lowering to keep the scalarizing
behavior. I've also extended the scalarizing to CTPOP.

Differential Revision: https://reviews.llvm.org/D111919
2021-10-20 07:46:41 -07:00
Sanjay Patel
3efd2a0bec [x86] make helper for useVPTERNLOG; NFC
See D112085 for another use case.
2021-10-20 10:26:53 -04:00
Jeremy Morse
89950ade21 [DebugInfo][InstrRef] Track a single variable at a time
Here's another performance patch for InstrRefBasedLDV: rather than
processing all variable values in a scope at a time, instead, process one
variable at a time. The benefits are twofold:
 * It's easier to reason about one variable at a time in your mind,
 * It improves performance, apparently from increased locality.

The downside is that the value-propagation code gets indented one level
further, plus there's some churn in the unit tests.

Differential Revision: https://reviews.llvm.org/D111799
2021-10-20 15:03:52 +01:00
Bjorn Pettersson
e9320b1a95 [NewPM][test] Only use -passes syntax in Scalarizer lit tests
With legacy PM being deprecated it should be enough to verify the
scalarizer pass using the new-PM syntax when invoking opt.
2021-10-20 15:16:18 +02:00
Bjorn Pettersson
5e4dbd7a2f [NewPM][test] Use -passes syntax in VectorCombine lit tests
The legacy PM is deprecated, so use the new PM syntax in lit tests
running the vector-combine pass.
2021-10-20 15:16:17 +02:00
Bjorn Pettersson
15f1fb5a30 [NewPM][test] Use -passes syntax in BoundsChecking lit tests
The legacy PM is deprecated, so use the new PM syntax in lit tests
running the bounds-checking pass.
2021-10-20 15:16:17 +02:00
Bjorn Pettersson
57bd67abfc [NewPM][test] Use -passes syntax in SpeculativeExecution lit tests
The legacy PM is deprecated, so use the new PM syntax in lit tests
running the speculative-execution pass.
2021-10-20 15:16:17 +02:00
Bjorn Pettersson
a413663d8f [NewPM][test] Avoid using -enable-new-pm=1 since -passes implies new PM 2021-10-20 15:16:17 +02:00
Sander de Smalen
be6c8dc765 [SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors.
When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch, that didn't yet happen, so it would generate the
wrong offset, thus storing a subvector to the incorrect address
and overwriting the wrong lanes.

For some insert:
  nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)

The offset was not scaled by vscale:
  orr     x8, x8, #0x4
  st1h    { z0.h }, p0, [sp]
  st1h    { z1.d }, p1, [x8]
  ld1h    { z0.h }, p0/z, [sp]

And is changed to:
  mov x8, sp
  st1h { z0.h }, p0, [sp]
  st1h { z1.d }, p1, [x8, #1, mul vl]
  ld1h { z0.h }, p0/z, [sp]

Differential Revision: https://reviews.llvm.org/D111633
2021-10-20 13:55:24 +01:00
Simon Pilgrim
a3c05982ac [SLP][X86] Improve SLP tests for division/multiplication by +/- pow2
Add PR51436 test as well as some basic multiply tests, and include SSE2 division coverage
2021-10-20 13:30:27 +01:00
Simon Pilgrim
5b395bd633 [CostModel][X86] Add costs for multiply-by-pow2 constants
These are folded to left shifts in the backend.

We should be able to extend this for multiply-by-negpow2 after D111968 has landed to resolve PR51436
2021-10-20 13:11:21 +01:00
Simon Pilgrim
9fc523d114 [X86] Remove X86ProcFamilyEnum::IntelSLM
Replace X86ProcFamilyEnum::IntelSLM enum with a TuningUseSLMArithCosts flag instead, matching what we already do for Goldmont.

This just leaves X86ProcFamilyEnum::IntelAtom to replace with general Tuning/Feature flags and we can finally get rid of the old X86ProcFamilyEnum enum.

Differential Revision: https://reviews.llvm.org/D112079
2021-10-20 11:58:39 +01:00
Daniel Kiss
f903c85055 [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions
therefore let's emit .cfi_negate_ra_state for these too.
In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication
in one step here we can't emit the . cfi_negate_ra_state because that would be point after
the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
2021-10-20 11:03:52 +02:00
Joerg Sonnenberger
ec428f7b78 [SPARC] Recognize the prefetch instruction
Reviewed By: LemonBoy

Differential Revision: https://reviews.llvm.org/D96311
2021-10-20 10:59:01 +02:00
David Green
862e8d7e55 [AArch64] Improve div and rem costmodel tests. NFC
Copied from the X86 tests, these give a better test coveraged than the
existing tests.
2021-10-20 09:58:35 +01:00
Paulo Matos
6d0c7bc17d [WebAssembly] Implementation of table.get/set for reftypes in LLVM IR
This change implements new DAG nodes TABLE_GET/TABLE_SET, and lowering
methods for load and stores of reference types from IR arrays. These
global LLVM IR arrays represent tables at the Wasm level.

Differential Revision: https://reviews.llvm.org/D111154
2021-10-20 10:31:31 +02:00
Zi Xuan Wu
de10a02fc0 [CSKY] Complete to add basic integer instruction set
Complete the basic integer instruction set and add related predictor in CSKY.td.
And it includes the instruction definition and asm parser support.

Differential Revision: https://reviews.llvm.org/D111701
2021-10-20 15:50:44 +08:00
Evgeniy Brevnov
269f563a2b [NARY-REASSOCIATE] Fix infinite recursion optimizing min\max
To guarantee convergence of the algorithm each optimization step should decrease number of instructions when IR is modified. This property is not held in this test case. The problem is that SCEV Expander may do "unexpected" reassociation what results in creation of new min/max chains and introduction of extra instructions. As a result on each step we indefinitely optimize back and forth.

The solution is to restrict SCEV Expander to perform uncontrolled reassociations by means of "Unknown" expressions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112060
2021-10-20 14:23:03 +07:00
Wenlei He
e8c245dcd3 [llvm-profgen] Skip duplication factor outside of body sample computation
We incorrectly use duplication factor for total samples even though we already accumulate samples instead of taking MAX. It causes profile to have bloated total samples for functions with loop unrolled or vectorized. The change fix the issue for total sample, head sample and call target samples.

Differential Revision: https://reviews.llvm.org/D112042
2021-10-19 23:10:45 -07:00
Shao-Ce SUN
9378ca52ca [NFC] Fix typos 2021-10-20 11:47:26 +08:00