12812 Commits

Author SHA1 Message Date
Florian Hahn
fbcf8a8cbb
[ConstraintElim] Add (UGE, var, 0) to unsigned system for new vars. (#76262)
The constraint system used for ConstraintElimination assumes all
varibles to be signed. This can cause missed optimization in the
unsigned system, due to missing the information that all variables are
unsigned (non-negative).

Variables can be marked as non-negative by adding Var >= 0 for all
variables. This is done for arguments on ConstraintInfo construction and
after adding new variables. This handles cases like the ones outlined in
https://discourse.llvm.org/t/why-does-llvm-not-perform-range-analysis-on-integer-values/74341

The original example shared above is now handled without this change,
but adding another variable means that instcombine won't be able to
simplify examples like https://godbolt.org/z/hTnra7zdY

Adding the extra variables comes with a slight compile-time increase
https://llvm-compile-time-tracker.com/compare.php?from=7568b36a2bc1a1e496ec29246966ffdfc3a8b87f&to=641a47f0acce7755e340447386013a2e086f03d9&stat=instructions:u

stage1-O3    stage1-ReleaseThinLTO    stage1-ReleaseLTO-g  stage1-O0-g
 +0.04%           +0.07%                   +0.05%           +0.02%
stage2-O3    stage2-O0-g    stage2-clang
  +0.05%         +0.05%        +0.05%

https://github.com/llvm/llvm-project/pull/76262
2023-12-23 15:53:48 +01:00
Kazu Hirata
03dc806b12 [Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC) 2023-12-22 14:51:22 -08:00
Nikita Popov
54067c5fbe [SROA] Use memcpy if type size does not match store size
The original memcpy also copies the padding, so make sure that
this is still the case after splitting.

Fixes https://github.com/llvm/llvm-project/issues/64081.
2023-12-22 10:19:22 +01:00
Shan Huang
06a9c6738a
[CVP] Fix #76058: missing debug location in processSDiv function (#76118)
This PR fixes #76058.
2023-12-22 09:26:32 +01:00
boxu.zhang
d3ef867082
[LoopUnroll] Make UnrollMaxUpperBound to be overridable by target (#76029)
The UnrollMaxUpperBound should be target dependent, since different
chips provide different register set which brings different ability of
storing more temporary values of a program. So I add a MaxUpperBound
value in UnrollingPreference which can be override by targets. All uses
of UnrollMaxUpperBound are replaced with UP.MaxUpperBound.
 
The default value is still 8 and the command line argument
'--unroll-max-upperbound' takes final effect if provided.
2023-12-21 09:47:46 +01:00
Florian Hahn
18170d0f28
[ConstraintElim] Extend AND implication logic to support OR as well. (#76044)
Extend the logic check if an operand of an AND is implied by the other
to also support OR. This is done by checking if !op1 implies op2 or vice
versa.
2023-12-20 18:13:41 +01:00
Florian Hahn
7cf499c63b
[ConstraintElim] Check if second op implies first for And. (#75750)
Generalize checkAndSecondOpImpliedByFirst to also check if the second
operand implies the first.
2023-12-20 11:58:35 +01:00
Paul Walker
dea16ebd26
[LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217)
The specialisation will not be valid when ConstantInt gains native
support for vector types.

This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.

Co-authored-by: Nikita Popov <github@npopov.com>
2023-12-18 11:58:42 +00:00
Paul Walker
930b5b52ff
[ConstantHoisting] Add a TTI hook to prevent hoisting. (#69004)
Code generation can sometimes simplify expensive operations when
an operand is constant.  An example of this is divides on AArch64
where they can be rewritten using a cheaper sequence of multiplies
and subtracts.  Doing this is often better than hoisting expensive
constants which are likely to be hoisted by MachineLICM anyway.
2023-12-13 17:20:36 +00:00
Kazu Hirata
f0ac6f92a7 [Transforms] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Scalar/ConstraintElimination.cpp:1112:13: error:
  unused function 'dumpUnpackedICmp' [-Werror,-Wunused-function]
2023-12-13 08:31:46 -08:00
Yingwei Zheng
26fbdff458
[ConstraintElim] Refactor checkCondition. NFC. (#75319)
This patch refactors `checkCondition` to handle min/max intrinsic calls
in #75306.
2023-12-13 23:20:01 +08:00
Jeremy Morse
4b64138ba4
[DebugInfo][RemoveDIs] Switch some insertion routines to use iterators (#75330)
As part of RemoveDIs, we need instruction insertion to be done with
iterators rather than instruction pointers, so that we can communicate
some debug-info facts about the position. This patch is an entirely
mechanical replacement of Instruction * with BasicBlock::iterator, plus
using insertBefore to insert some instructions because we don't have
iterator-taking constructors yet.

Sadly it's not NFC because it causes dbg.value intrinsics / their
DPValue equivalents to shift location.
2023-12-13 14:04:35 +00:00
Bruno De Fraine
3c9236c0bf [LoopVersioningLICM] add comment regarding dubious check (NFC) 2023-12-13 12:20:08 +01:00
Kazu Hirata
4b4dcb4988 [Transforms] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Scalar/SROA.cpp:4855:9: error: unused variable
  'NewAssign' [-Werror,-Wunused-variable]
2023-12-12 08:50:49 -08:00
Orlando Cazalet-Hyams
3d42557872
[RemoveDI] Handle DPValues in SROA (#74089)
Handle dbg.declares in SROA using DPValues.

In order to reduce duplication, the migrate-debug-info loop has been changed
to a generic lambda with some helper function overloads, which is called
for dbg.declares, dbg.assigns, and DPValues alike.

The tests will become "live" once #74090 lands (see for more info).
2023-12-12 15:49:24 +00:00
Nikita Popov
6ab663be8d [LVI] Require UndefAllowed argument to getConstantRangeAtUse() (NFC)
For the remaining uses set it to true, matching the current
behavior.
2023-12-12 12:45:49 +01:00
Nabeel Omer
1f71db78ce
[NFC][DSE] Fix typo comment in eliminateDeadStores (#75166)
> We are re-using tryToMergePartialOverlappingStores, which requires
DeadSI to dominate DeadSI.

Should be "DeadSI to dominate KillingSI" because that's what the check
is for.
2023-12-12 11:13:40 +00:00
Nikita Popov
967e84eee3 [CVP] Don't use undef range for LHS of div/rem transforms
Using it for RHS is fine, as undef is UB in that case.
2023-12-12 12:06:19 +01:00
Nikita Popov
84df226c4a [CVP] Don't use undef ranges in willNotOverflow() 2023-12-12 11:54:33 +01:00
Nikita Popov
4949fb7954 [CVP] Don't allow undef range when inferring nowrap flags 2023-12-12 11:26:00 +01:00
Orlando Cazalet-Hyams
2d9d9a1a55
[NFC] Change FindDbgDeclareUsers interface to match findDbgUsers/values (#73498)
This simplifies an upcoming patch to support the RemoveDIs project (tracking
variable locations without using intrinsics).

Next in this series is #73500.
2023-12-12 09:43:58 +00:00
Thomas Symalla
b010747a4f
[NFC] Fix typo in ConstraintElimination assertion. (#75151)
unsinged => unsigned

Co-authored-by: Thomas Symalla <tsymalla@amd.com>
2023-12-12 10:24:11 +01:00
Wang Pengcheng
6aa6ef73ec
[MemCpyOpt] Don't perform call slot opt if alloc type is scalable (#75027)
This fixes #75010.
2023-12-11 19:45:13 +08:00
Kazu Hirata
a16429365c [Transforms] Remove unnecessary includes (NFC) 2023-12-09 18:23:06 -08:00
Yingwei Zheng
312cb34da6
[Reassociate] Preserve NUW flags after expr tree rewriting (#72360)
Alive2: https://alive2.llvm.org/ce/z/38KiC_
2023-12-09 16:45:48 +08:00
XiangZhang
1d6a678591
[LoopUnroll] Make use of MaxTripCount for loops with "#pragma unroll" (#74703)
Fix loop unroll fail caused by branches folding.

For example:
SimplifyCFG foldloop branches then cause loop unroll failed for "#program unroll" loop.
```
#program unroll
for (int I = 0; I < ConstNum; ++I) { // folding "I < ConstNum" and "Cond2"
  if (Cond2) {
  break;
  }
  xxx loop body;
}
```

The pragma unroll metadata only takes effect if there is an exact trip
count, but not if there is an upper bound trip count. This patch make it
work with an upper bound trip count as well in shouldPragmaUnroll().

Loop unroll is important in stack nervous devices (e.g. GPU, and that is
why a lot of GPU code mark loop with "#program unroll").
It usually much simplify the address (offset) calculations in old
iterations, then we can do a lot of others optimizations, e.g, SROA, for
these simplifed address (escape alloca the whole aggregates).
2023-12-08 19:43:10 +08:00
alex-t
d8cd7fc1f4
AlignmentFromAssumptions should only track pointer operand users (#73370)
AlignmentFromAssumptions uses SCEV to update the load/store alignment.
It tracks down the use-def chains for the pointer which it takes from
the assumption cache until it reaches the load or store instruction. It
mistakenly adds to the worklist the users of the load result
irrespective of the fact that the load result has no connection with the
original pointer, moreover, it is not a pointer at all in most cases.
Thus the def-use chain contains irrelevant load users. When it is a
store instruction the algorithm attempts to adjust its alignment to the
alignment of the original pointer. The problem appears when the load and
store memory operand pointers belong to different address spaces and
possibly have different sizes.
The 4bf015c035e4e5b63c7222dfb15ff274a5ed905c was an attempt to address a
similar problem. The truncation or zero extension was added to make
pointers the same size. That looks strange to me because the zero
extension of the pointer is not legal. The test in the
4bf015c035e4e5b63c7222dfb15ff274a5ed905c does not work any longer as for
the explicit address spaces conversion the addrspacecast is generated.
Summarize:
1. For the alloca to global address spaces conversion addrspacecasts are
used, so the code added by the 4bf015c035e4e5b63c7222dfb15ff274a5ed905c
is no longer functional.
2. The AlignmentFromAssumptions algorithm should not add the load users
to the worklist as they have nothing to do with the original pointer.
3. Instead we only track users that are: GetelementPtrIns, PHINodes.
2023-12-07 17:35:35 +01:00
Craig Topper
533a0856bf Recommit "[Reassociate] Use disjoint flag to convert Or to Add. (#72772)"
Original message:
We still have to keep the noCommonBitsSet call to handle multiple reassociations in one pass. We'll lose the flag on the first reassociation.
2023-12-06 14:16:56 -08:00
Craig Topper
92fccea2e5 Revert "[Reassociate] Use disjoint flag to convert Or to Add. (#72772)"
This reverts commit 78964457cf1bafe57a54629fafbd081452a9e528.

Looks like I didn't rebase this correctly before commit
2023-12-06 13:50:21 -08:00
Craig Topper
78964457cf
[Reassociate] Use disjoint flag to convert Or to Add. (#72772)
We still have to keep the noCommonBitsSet call to handle multiple reassociations in one pass. We'll lose the flag on the first reassociation.
2023-12-06 13:48:15 -08:00
Jeremy Morse
384f916ea8 Reapply 34cdc913214fd (#74455), call-site-splitting for RemoveDIs
Original commit message below; asan complained about this commit because it
transpires that the final comparison with CurrentI is in fact a comparison
of a pointer that has been freed. This seems to work fine most of the time,
but using the iterator for such an instruction causes the freed instruction
memory to be accessed, causing a use-after-free. The fix is to perform the
comparison as an instruction, not an iterator.

[NFC][DebugInfo][RemoveDIs] Use iterators to insert in callsite-splitting (#74455)

This patch gets call site splitting to use iterators for insertion
rather than instruction pointers. When we switch on non-instr debug-info
this becomes significant, as the iterators are going to signal whether
or not a position is before or after debug-info.

NFC as this isn't going to affect the output of any existing test.
2023-12-06 16:52:10 +00:00
Nikita Popov
ea4ce16da2
[ConstraintElim] Use disjoint flag for decomposition (#74478)
Use the or disjoint flag for decomposing or into add, which will handle
cases that haveNoCommonBitsSet() may not be able to reinfer (e.g.
because they require context-sensitive facts, which the call here does
not use.)
2023-12-06 10:36:55 +01:00
Jeremy Morse
989e8f9d51 Revert "[NFC][DebugInfo][RemoveDIs] Use iterators to insert in callsite-splitting (#74455)"
This reverts commit 34cdc913214fd9561b6ec8d535bd3d0313772cb5.

Two buildbots say this is bad:

  https://lab.llvm.org/buildbot/#/builders/265/builds/861
  https://lab.llvm.org/buildbot/#/builders/168/builds/17272
2023-12-05 17:32:23 +00:00
Jeremy Morse
34cdc91321
[NFC][DebugInfo][RemoveDIs] Use iterators to insert in callsite-splitting (#74455)
This patch gets call site splitting to use iterators for insertion
rather than instruction pointers. When we switch on non-instr debug-info
this becomes significant, as the iterators are going to signal whether
or not a position is before or after debug-info.

NFC as this isn't going to affect the output of any existing test.
2023-12-05 16:24:26 +00:00
Joshua Cao
72ffaa9156
[IR][TRE] Support associative intrinsics (#74226)
There is support for intrinsics in Instruction::isCommunative, but there
is no equivalent implementation for isAssociative. This patch builds
support for associative intrinsics with TRE as an application. TRE can
now have associative intrinsics as an accumulator. For example:
```
struct Node {
  Node *next;
  unsigned val;
}

unsigned maxval(struct Node *n) {
  if (!n) return 0;
  return std::max(n->val, maxval(n->next));
}
```
Can be transformed into:
```
unsigned maxval(struct Node *n) {
  struct Node *head = n;
  unsigned max = 0; // Identity of unsigned std::max
  while (true) {
    if (!head) return max;
    max = std::max(max, head->val);
    head = head->next;
  }
  return max;
}
```
This example results in about 5x speedup in local runs.

We conservatively only consider min/max and as associative for this
patch to limit testing scope. There are probably other intrinsics that
could be considered associative. There are a few consumers of
isAssociative() that could be impacted. Testing has only required to
Reassociate pass be updated.
2023-12-04 22:35:59 -08:00
Nikita Popov
4275da2278 [ValueTracking] Add isGuaranteedNotToBeUndef() variant (NFC)
We have a bunch of places where we have to guard against undef
to avoid multi-use issues, but would be fine with poison. Use a
different function for these to make it clear, and to indicate that
this check can be removed once we no longer support undef. I've
replaced some of the obvious cases, but there's probably more.

For now, the implementation is the same as UndefOrPoison, it just
has a more precise name.
2023-12-04 12:04:41 +01:00
Kazu Hirata
3406a2bc5f [llvm] Stop including tuple (NFC)
Identified with clangd.
2023-12-03 23:01:26 -08:00
Kazu Hirata
92c2529ccd [llvm] Stop including vector (NFC)
Identified with clangd.
2023-12-03 22:32:21 -08:00
Joshua Cao
bd382032f6
[BBUtils][NFC] Delete SplitLandingPadPredecessors with DT (#73406)
Function is marked for deprecation. There is only one consumer which is
converted to use DomTreeUpdater.
2023-12-02 11:33:43 -08:00
Jeremy Morse
4424903156
[DebugInfo][RemoveDIs] Handle DPValues at remaining dbg.value using sites (#73788)
This patch updates the last few places in LLVM using findDbgValues that
don't also collect and handle DPValue objects. This largely involves
instcombine and mem2reg changes, and are largely mechanical, calling
existing utilities on collections of DPValues instead of just
DbgValuesInsts.

A variety of tests have had RemoveDIs RUN lines added to them to cover
these behaviours. We have some technical debt of the instcombine sinking
code for DPValues not being implemented yet, so I've left FIXME stubs
indicating that we intend to cover tests with RemoveDIs but haven't yet.
2023-11-30 16:30:32 +00:00
Jeremy Morse
5ba5211a47
[DebugInfo][RemoveDIs] Have LICM insert at iterator positions (#73671)
Because we're storing some extra debug-info information in the iterator
class, we need to insert new LICM-created stores using such iterators.
Switch LICM to storing iterators instead of pointers when it promotes
variables in loops, add a test for the desired behaviour, and enable
RemoveDIs instrumentation on a variety of other LICM tests for good
measure.

(This would appear to be the only pass in LLVM that needs to store
iterators on the heap).
2023-11-30 13:00:26 +00:00
Jeremy Morse
2425e2940e
[DebugInfo][RemoveDIs] Have getInsertionPtAfterDef return an iterator (#73149)
Part of the "RemoveDIs" project to remove debug intrinsics requires
passing block-positions around in iterators rather than as instruction
pointers, allowing some debug-info to reside in BasicBlock::iterator.
This means getInsertionPointAfterDef has to return an iterator, and as
it can return no-instruction that means returning an optional iterator.

This patch changes the signature for getInsertionPtAfterDef and then
patches up the various places that use it to handle the different type.
This would overall be an NFC patch, however in
InstCombinerImpl::freezeOtherUses I've started skipping any debug
intrinsics at the returned insert-position. This should not have any
_meaningful_ effect on the compiler output: at worst it means variable
assignments that are skipped will now cover the freeze instruction and
anything inserted before it, which should be inconsequential.

Sadly: this makes the function signature ugly. This is probably the
ugliest piece of fallout for the "RemoveDIs" work, but it serves the
overall purpose of improving compile times and not allowing `-g` to
affect compiler output, so should be worthwhile in the end.
2023-11-30 12:19:57 +00:00
Simon Pilgrim
3246a32d3f Fix MSVC "not all control paths return a value" warning. NFC. 2023-11-30 10:07:01 +00:00
Nikita Popov
6d2dfd37bd [LPM] Set gen_crash_diag=false for non-MSSA pass in MSSA pipeline
When a loop pass that does not preserve MSSA is run as part of a
loop-mssa pipeline, this is user error and we should not ask for
a bug report.

Fixes https://github.com/llvm/llvm-project/issues/73554.
2023-11-30 10:21:35 +01:00
Philip Reames
e947f95337 [LSR][TTI][RISCV] Enable terminator folding for RISC-V
If looking for a miscompile revert candidate, look here!

The transform being enabled prefers comparing to a loop invariant
exit value for a secondary IV over using an otherwise dead primary
IV.  This increases register pressure (by requiring the exit value
to be live through the loop), but reduces the number of instructions
within the loop by one.

On RISC-V which has a large number of scalar registers, this is
generally a profitable transform.  We loose the ability to use a beqz
on what is typically a count down IV, and pay the cost of computing
the exit value on the secondary IV in the loop preheader, but save
an add or sub in the loop body.  For anything except an extremely
short running loop, or one with extreme register pressure, this is
profitable.  On spec2017, we see a 0.42% geomean improvement in
dynamic icount, with no individual workload regressing by more than
0.25%.

Code size wise, we trade a (possibly compressible) beqz and a (possibly
compressible) addi for a uncompressible beq.  We also add instructions
in the preheader.  Net result is a slight regression overall, but
neutral or better inside the loop.

Previous versions of this transform had numerous cornercase correctness
bugs.  All of them ones I can spot by inspection have been fixed, and I
have run this through all of spec2017, but there may be further issues
lurking.  Adding uses to an IV is a fraught thing to do given poison
semantics, so this transform is somewhat inherently risky.

This patch is a reworked version of D134893 by @eop.  That patch has
been abandoned since May, so I picked it up, reworked it a bit, and
am landing it.
2023-11-29 12:04:06 -08:00
Nikita Popov
4b3ea337ad [ValueTracking] Convert isKnownNonNegative() to use SimplifyQuery (NFC) 2023-11-29 10:52:52 +01:00
Florian Hahn
d045e23c2d
[ConstraintElim] Refactor GEP offset collection.
Move GEP offset collection to separate helper function and collect
variable and constant offsets in OffsetResult. For now, this only
supports 1 VariableOffset, but the new code structure can be more easily
extended to handle more offsets in the future.

The refactoring drops the check that the VariableOffset >= -1 * constant
offset. This is not needed to check whether the constraint is
monotonically increasing. The constant factors can be ignored, the
constraint will be monotonically increasing if all variables are
positive.

See https://alive2.llvm.org/ce/z/ah2uSQ,
    https://alive2.llvm.org/ce/z/NCADNZ
2023-11-27 09:05:58 +00:00
Aiden Grossman
5eb85c052e
[JumpThreading] Remove LVI printer flag (#73426)
This patch removes the -print-lvi-after-jump-threading flag now that we
can print everything in the LVI cache using the print<lazy-value-info>
pass.
2023-11-27 00:19:23 -08:00
Nikita Popov
2b646b5989
[CVP] Don't try to fold load/store operands to constant (#73338)
CVP currently tries to fold load/store pointer operands to constants
using LVI. If there is a dominating condition of the form `icmp eq ptr
%p, @g`, then `%p` will be replaced with `@g`.

LVI is geared towards range-based optimizations, and is *very*
inefficient at handling simple pointer equality conditions. We have
other passes that can handle this optimization in a more efficient way,
such as IPSCCP and GVN.

Removing this optimization gives a geomean 0.4-1.2% compile-time
improvement depending on configuration. At the same time, there
is no impact on codegen.
2023-11-27 09:17:03 +01:00
Youngsuk Kim
c419f3e10e [llvm][SROA] Replace calls to Type::getPointerTo (NFC)
NFC cleanup towards removing method Type::getPointerTo.

* Remove unnecessary call to Type::getPointerTo
* Replace call to Type::getPointerTo with IRB.getPtrTy
2023-11-26 20:17:06 -06:00