llvm-project

Sanjay Patel e2321bb448 [SLP] avoid reduction transform on patterns that the backend can load-combine

I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognize patterns that could be combined later, but not do the optimization itself
(it's not a vector combine anyway, so it's probably out-of-scope for SLP).

Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.

For the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret
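
For reference, the source-level shape of the pattern discussed above can be sketched in C++ as below. This is only an illustration: the names loadLE64 and loadBE64 are hypothetical and do not come from the patch or the bug reports, and whether a backend folds either function into a single mov or movbe depends on the target and its features.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 64-bit value from 8 consecutive bytes, lowest address in the
// lowest bits. This is the zext/shl/or chain that load combining can turn
// into one plain 64-bit load on a little-endian target.
uint64_t loadLE64(const unsigned char *p) {
  assert(p != nullptr);
  uint64_t result = 0;
  for (int i = 0; i < 8; ++i)
    result |= static_cast<uint64_t>(p[i]) << (8 * i);
  return result;
}

// Same bytes, but lowest address in the highest bits. On a little-endian
// target this corresponds to a byte-swapped load.
uint64_t loadBE64(const unsigned char *p) {
  assert(p != nullptr);
  uint64_t result = 0;
  for (int i = 0; i < 8; ++i)
    result |= static_cast<uint64_t>(p[i]) << (8 * (7 - i));
  return result;
}
```

With the bytes 01 02 03 04 05 06 07 08 in memory, loadLE64 returns 0x0807060504030201 and loadBE64 returns 0x0102030405060708, independent of the host's endianness.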

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 373833

2019-10-05 18:03:58 +00:00

AliasAnalysis.cpp

Change TargetLibraryInfo analysis passes to always require Function

2019-09-07 03:09:36 +00:00

AliasAnalysisEvaluator.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

AliasAnalysisSummary.cpp

Move CFLGraph and the AA summary code over to the new CallBase

2019-02-11 09:25:41 +00:00

AliasAnalysisSummary.h

Move CFLGraph and the AA summary code over to the new CallBase

2019-02-11 09:25:41 +00:00

AliasSetTracker.cpp

[LICM/AST] Check if the AliasAny set is removed from the tracker.

2019-09-12 18:09:47 +00:00

Analysis.cpp

[MustExec] Add a generic "must-be-executed-context" explorer

2019-08-23 15:17:27 +00:00

AssumptionCache.cpp

Fix: Actually erase remove the elements from AssumeHandles

2019-10-02 17:35:06 +00:00

BasicAliasAnalysis.cpp

Change TargetLibraryInfo analysis passes to always require Function

2019-09-07 03:09:36 +00:00

BlockFrequencyInfo.cpp

Add optional arg to profile count getters to filter

2019-04-24 19:51:16 +00:00

BlockFrequencyInfoImpl.cpp

Add optional arg to profile count getters to filter

2019-04-24 19:51:16 +00:00

BranchProbabilityInfo.cpp

[BPI] Adjust the probability for floating point unordered comparison

2019-09-10 17:25:11 +00:00

CallGraph.cpp

Revert "[CallGraph] Refine call graph for indirect calls with !callees metadata"

2019-08-16 10:59:18 +00:00

CallGraphSCCPass.cpp

[CallSite removal] Move the legacy PM, call graph, and some inliner

2019-04-19 05:59:42 +00:00

CallPrinter.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

CaptureTracking.cpp

[CaptureTracker] Let subclasses provide dereferenceability information

2019-08-19 21:56:38 +00:00

CFG.cpp

Recommit "[GVN] Preserve loop related analysis/canonical forms."

2019-07-31 09:27:54 +00:00

CFGPrinter.cpp

Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC

2019-08-05 05:43:48 +00:00

CFLAndersAliasAnalysis.cpp

Change TargetLibraryInfo analysis passes to always require Function

2019-09-07 03:09:36 +00:00

CFLGraph.h

[CFLGraph] Add support for unary fneg instruction.

2019-06-06 19:21:23 +00:00

CFLSteensAliasAnalysis.cpp

Change TargetLibraryInfo analysis passes to always require Function

2019-09-07 03:09:36 +00:00

CGSCCPassManager.cpp

Revert "[CallGraph] Refine call graph for indirect calls with !callees metadata"

2019-08-16 10:59:18 +00:00

CMakeLists.txt

[SVFS] Vector Function ABI demangling.

2019-09-19 17:47:32 +00:00

CmpInstAnalysis.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

CodeMetrics.cpp

Remove CallSite from the CodeMetrics analysis, moving it to the new

2019-02-11 09:03:32 +00:00

ConstantFolding.cpp

[ConstantFolding] Fold constant calls to log2()

2019-09-30 20:53:23 +00:00

CostModel.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

DDG.cpp

[DDG] Data Dependence Graph - Root Node

2019-10-01 19:32:42 +00:00

Delinearization.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

DemandedBits.cpp

[DemandedBits] Remove some redundancy in the work list

2019-03-03 14:50:01 +00:00

DependenceAnalysis.cpp

[llvm] Migrate llvm::make_unique to std::make_unique

2019-08-15 15:54:37 +00:00

DependenceGraphBuilder.cpp

[DDG] Data Dependence Graph - Root Node

2019-10-01 19:32:42 +00:00

DivergenceAnalysis.cpp

[DivergenceAnalysis] Add methods for querying divergence at use

2019-07-29 10:22:09 +00:00

DominanceFrontier.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

DomPrinter.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

DomTreeUpdater.cpp

[DTU] Refine the interface and logic of applyUpdates

2019-02-22 13:48:38 +00:00

EHPersonalities.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

GlobalsModRef.cpp

Change TargetLibraryInfo analysis passes to always require Function

2019-09-07 03:09:36 +00:00

GuardUtils.cpp

[WideableCond] Fix a nasty bug in detection of "explicit guards"

2019-04-02 16:51:43 +00:00

IndirectCallPromotionAnalysis.cpp

[llvm] Migrate llvm::make_unique to std::make_unique

2019-08-15 15:54:37 +00:00

InlineCost.cpp

[Inliner] Remove incorrect early exit during switch cost computation

2019-09-20 23:29:17 +00:00

InstCount.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

InstructionPrecedenceTracking.cpp

Make widenable condition transparent for MemoryWriteTracking

2019-02-14 11:10:29 +00:00

InstructionSimplify.cpp

[InstCombine] Simplify fma multiplication to nan for undef or nan operands.

2019-10-02 12:32:52 +00:00

Interval.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

IntervalPartition.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

IVDescriptors.cpp

[IR] allow fast-math-flags on phi of FP values (2nd try)

2019-09-25 14:35:02 +00:00

IVUsers.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

LazyBlockFrequencyInfo.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

LazyBranchProbabilityInfo.cpp

Change TargetLibraryInfo analysis passes to always require Function

2019-09-07 03:09:36 +00:00

LazyCallGraph.cpp

Change TargetLibraryInfo analysis passes to always require Function

2019-09-07 03:09:36 +00:00

LazyValueInfo.cpp

[LVI] Look through extractvalue of insertvalue

2019-09-07 12:03:59 +00:00

LegacyDivergenceAnalysis.cpp

Remove an unnecessary cast. NFC.

2019-10-02 08:56:33 +00:00

Lint.cpp

Change TargetLibraryInfo analysis passes to always require Function

2019-09-07 03:09:36 +00:00

LLVMBuild.txt

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

Loads.cpp

[LV] Support invariant addresses in speculation logic

2019-09-12 16:49:10 +00:00

LoopAccessAnalysis.cpp

LoopAccessAnalysis isConsecutiveAccess() - silence static analyzer dyn_cast<SCEVConstant> null dereference warning. NFCI.

2019-10-02 13:08:56 +00:00

LoopAnalysisManager.cpp

[LoopPassManager + MemorySSA] Only enable use of MemorySSA for LPMs known to preserve it.

2019-08-21 17:00:57 +00:00

LoopCacheAnalysis.cpp

[llvm] Migrate llvm::make_unique to std::make_unique

2019-08-15 15:54:37 +00:00

LoopInfo.cpp

Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicated

2019-09-27 05:43:30 +00:00

LoopPass.cpp

ftime-trace: Trace loop passes

2019-05-31 10:14:04 +00:00

LoopUnrollAnalyzer.cpp

[InstSimplify] Rename SimplifyFPUnOp and SimplifyFPBinOp

2019-07-24 12:50:10 +00:00

MemDepPrinter.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

MemDerefPrinter.cpp

OpaquePtr: add Type parameter to Loads analysis API.

2019-07-09 11:35:35 +00:00

MemoryBuiltins.cpp

[Alignment][NFC] Remove unneeded llvm:: scoping on Align types

2019-09-27 12:54:21 +00:00

MemoryDependenceAnalysis.cpp

Change TargetLibraryInfo analysis passes to always require Function

2019-09-07 03:09:36 +00:00

MemoryLocation.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

MemorySSA.cpp

MemorySSA tryOptimizePhi - assert that we've found a DefChainEnd. NFCI.

2019-10-02 13:09:04 +00:00

MemorySSAUpdater.cpp

[MemorySSA] Update Phi creation when inserting a Def.

2019-10-02 18:42:33 +00:00

ModuleDebugInfoPrinter.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

ModuleSummaryAnalysis.cpp

IR. Change strip* family of functions to not look through aliases.

2019-08-22 19:56:14 +00:00

MustExecute.cpp

[MustExec] Add a generic "must-be-executed-context" explorer

2019-08-23 15:17:27 +00:00

ObjCARCAliasAnalysis.cpp

[AliasAnalysis] Second prototype to cache BasicAA / anyAA state.

2019-03-22 17:22:19 +00:00

ObjCARCAnalysisUtils.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

ObjCARCInstKind.cpp

[ObjC][ARC] Delete ObjC runtime calls on global variables annotated

2019-06-14 22:06:32 +00:00

OptimizationRemarkEmitter.cpp

[llvm] Migrate llvm::make_unique to std::make_unique

2019-08-15 15:54:37 +00:00

OrderedBasicBlock.cpp

Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."

2019-03-29 14:10:24 +00:00

OrderedInstructions.cpp

[llvm] Migrate llvm::make_unique to std::make_unique

2019-08-15 15:54:37 +00:00

PHITransAddr.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

PhiValues.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

PostDominators.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

ProfileSummaryInfo.cpp

[PGO][PGSO] ProfileSummary changes.

2019-09-24 22:17:51 +00:00

PtrUseVisitor.cpp

SROA: Allow eliminating addrspacecasted allocas

2019-06-14 21:38:31 +00:00

README.txt

…

RegionInfo.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

RegionPass.cpp

[IR] Refactor attribute methods in Function class (NFC)

2019-04-04 22:40:06 +00:00

RegionPrinter.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

ScalarEvolution.cpp

Revert "[SCEV] add no wrap flag for SCEVAddExpr."

2019-09-30 07:46:52 +00:00

ScalarEvolutionAliasAnalysis.cpp

[AliasAnalysis] Second prototype to cache BasicAA / anyAA state.

2019-03-22 17:22:19 +00:00

ScalarEvolutionExpander.cpp

[PatternMatch] Make m_Br more flexible, add matchers for BB values.

2019-09-25 15:05:08 +00:00

ScalarEvolutionNormalization.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

ScopedNoAliasAA.cpp

[AliasAnalysis] Second prototype to cache BasicAA / anyAA state.

2019-03-22 17:22:19 +00:00

StackSafetyAnalysis.cpp

IR. Change strip* family of functions to not look through aliases.

2019-08-22 19:56:14 +00:00

StratifiedSets.h

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

SyncDependenceAnalysis.cpp

[SDA] Don't stop divergence propagation at the IPD.

2019-09-18 13:40:22 +00:00

SyntheticCountsUtils.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

TargetLibraryInfo.cpp

[TLI][AMDGPU] AMDPAL does not have library functions

2019-09-11 07:26:39 +00:00

TargetTransformInfo.cpp

[SLP] avoid reduction transform on patterns that the backend can load-combine

2019-10-05 18:03:58 +00:00

Trace.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

TypeBasedAliasAnalysis.cpp

[AliasAnalysis] Second prototype to cache BasicAA / anyAA state.

2019-03-22 17:22:19 +00:00

TypeMetadataUtils.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

ValueLattice.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

ValueLatticeUtils.cpp

Update the file headers across all of the LLVM projects in the monorepo

2019-01-19 08:50:56 +00:00

ValueTracking.cpp

Remove local shadow constant. NFCI.

2019-09-26 11:30:35 +00:00

VectorUtils.cpp

InterleavedAccessInfo - Don't dereference a dyn_cast result. NFCI.

2019-09-17 13:25:56 +00:00

VFABIDemangling.cpp

[SVFS] Vector Function ABI demangling.

2019-09-19 17:47:32 +00:00

README.txt

Analysis Opportunities:

//===---------------------------------------------------------------------===//

In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:

  {1,+,3,+,2}<loop>

Outside the loop, this could be evaluated simply as (%n * %n); however,
ScalarEvolution currently evaluates it as

  (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))

In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.
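
As a quick sanity check of the claim above, the following C++ sketch steps the recurrence {1,+,3,+,2} and compares it with both the quoted expression and (%n * %n). It assumes the exit value is taken after %n - 1 backedges (an assumption read off the quoted expression, not something stated in the test), and it uses plain 64-bit arithmetic rather than i65, so it is only meaningful for small %n. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Step the chain of recurrences {1,+,3,+,2} across `backedges` iterations:
// the value starts at 1, its step starts at 3, and the step grows by 2.
uint64_t stepAddRec(uint64_t backedges) {
  uint64_t value = 1;
  uint64_t step = 3;
  for (uint64_t i = 0; i < backedges; ++i) {
    value += step;
    step += 2;
  }
  return value;
}

// The expression ScalarEvolution currently produces, written with 64-bit
// arithmetic. (n - 2) * (n - 1) is a product of consecutive integers, so
// the division by 2 is exact.
uint64_t currentExitValue(uint64_t n) {
  assert(n >= 2 && n < (1ull << 31) && "sketch only covers small n");
  return 3 * n + 2 * (((n - 2) * (n - 1)) / 2) - 2;
}

// The simpler form suggested above.
uint64_t simpleExitValue(uint64_t n) { return n * n; }
```

For %n = 10, stepAddRec(9), currentExitValue(10), and simpleExitValue(10) all give 100.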

//===---------------------------------------------------------------------===//

In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll,
ScalarEvolution is forming this expression:

  ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))

This could be folded to

  (-1 * (trunc i64 undef to i32))
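
The fold is valid because truncation commutes with wrapping negation and addition, so the first two terms cancel modulo 2^32. A minimal C++ sketch of that arithmetic follows. It models undef as an ordinary argument, which is a simplification of LLVM's undef semantics, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The expression as ScalarEvolution currently forms it.
uint32_t unfolded(uint64_t arg5, uint64_t undefStandIn) {
  uint32_t negTrunc = static_cast<uint32_t>(uint64_t{0} - arg5); // trunc(-1 * %arg5)
  uint32_t posTrunc = static_cast<uint32_t>(arg5);               // trunc(%arg5)
  uint32_t negUndef = 0u - static_cast<uint32_t>(undefStandIn);  // -1 * trunc(undef)
  // The two truncated %arg5 terms cancel modulo 2^32.
  assert(static_cast<uint32_t>(negTrunc + posTrunc) == 0u);
  return negTrunc + posTrunc + negUndef;
}

// The folded form proposed above.
uint32_t folded(uint64_t undefStandIn) {
  return 0u - static_cast<uint32_t>(undefStandIn);
}

// True when both forms agree for the given inputs.
bool foldAgrees(uint64_t arg5, uint64_t undefStandIn) {
  return unfolded(arg5, undefStandIn) == folded(undefStandIn);
}
```

foldAgrees holds for any pair of inputs, including values of arg5 that do not fit in 32 bits.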

//===---------------------------------------------------------------------===//