35023 Commits

Author SHA1 Message Date
Noah Goldstein
9ef829097b [InstCombine] Fix buggy transform in foldNestedSelects; PR 71330
The bug is that `IsAndVariant` is used to decide which arm of the
select the output `SelInner` should be placed in, but the inner
select condition is matched with `m_c_LogicalOp`. With fully simplified
ops this works fine, but if the select condition is not simplified it
can match both `LogicalAnd` and `LogicalOr`, e.g. `select true, true,
false`.

In PR71330 for example, the issue occurs in the following IR:
```
define i32 @bad() {
  %..i.i = select i1 false, i32 0, i32 3
  %brmerge = select i1 true, i1 true, i1 false
  %not.cmp.i.i.not = xor i1 true, true
  %.mux = zext i1 %not.cmp.i.i.not to i32
  %retval.0.i.i = select i1 %brmerge, i32 %.mux, i32 %..i.i
  ret i32 %retval.0.i.i
}
```

When simplifying:
```
%retval.0.i.i = select i1 %brmerge, i32 %.mux, i32 %..i.i
```

We end up matching `%brmerge` as `LogicalAnd` for `IsAndVariant`, but
match the condition of the inner select (`%..i.i`), which is `false`,
as `LogicalOr`.
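
The ambiguity can be sketched outside of LLVM. The snippet below is a minimal illustration, assuming a toy `Select` struct and `Operand` enum that are hypothetical and not LLVM's types. It relies only on the fact that a logical and is spelled `select a, b, false` and a logical or is spelled `select a, true, b`:

```cpp
#include <cassert>

// Hypothetical stand-ins for i1 select operands; not LLVM's classes.
enum class Operand { True, False, Other };

struct Select {
  Operand Cond, TrueVal, FalseVal;
};

// Logical and has the shape `select a, b, false`.
bool matchesLogicalAnd(const Select &S) {
  return S.FalseVal == Operand::False;
}

// Logical or has the shape `select a, true, b`.
bool matchesLogicalOr(const Select &S) {
  return S.TrueVal == Operand::True;
}

// True when a matcher that accepts either shape cannot tell them apart.
bool isAmbiguous(const Select &S) {
  return matchesLogicalAnd(S) && matchesLogicalOr(S);
}

// Usage: the unsimplified `%brmerge` from the IR above fits both shapes.
void demoBrmerge() {
  Select Brmerge{Operand::True, Operand::True, Operand::False};
  assert(isAmbiguous(Brmerge));
}
```

Any code that infers the variant from one match and then re-matches with a pattern accepting both shapes is exposed to this kind of mismatch on unsimplified input.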

Closes #71489
2023-11-09 16:36:49 -06:00
Nikita Popov
ed86e740ef Revert "[SROA] Limit the number of allowed slices when trying to split allocas"
This reverts commit e13e808283f7fd9e873ae922dd1ef61aeaa0eb4a.

This causes performance regressions on GPU targets, see
https://github.com/llvm/llvm-project/issues/69785. Revert the
change for now.
2023-11-09 16:38:52 +01:00
Nikita Popov
369c9b791b
[MemCpyOpt] Require writable object during call slot optimization (#71542)
Call slot optimization may introduce writes to the destination object
that occur earlier than in the original function. We currently already
check that the destination is dereferenceable and aligned, but we
do not make sure that it is writable. As such, we might introduce a
write to read-only memory, or introduce a data race.

Fix this by checking that the object is writable. For arguments, this is
indicated by the new writable attribute. Tests using
sret/dereferenceable are updated to use it.
2023-11-09 15:55:44 +01:00
Nikita Popov
1b1c81772f [InstCombine] Drop poison flags in simplifyAssocCastAssoc()
The nneg flag on zext may no longer hold after the reassociation.
2023-11-09 11:58:02 +01:00
Chuanqi Xu
b7b5907b56
[Coroutines] Introduce [[clang::coro_only_destroy_when_complete]] (#71014)
Close https://github.com/llvm/llvm-project/issues/56980.

This patch introduces a lightweight optimization attribute for
coroutines that are guaranteed to be destroyed only after they have
reached the final suspend point.

The rationale behind the patch is simple. See the example:

```C++
A foo() {
  dtor d;
  co_await something();
  dtor d1;
  co_await something();
  dtor d2;
  co_return 43;
}
```

Generally the generated .destroy function may be:

```C++
void foo.destroy(foo.Frame *frame) {
  switch(frame->suspend_index()) {
    case 1:
      frame->d.~dtor();
      break;
    case 2:
      frame->d.~dtor();
      frame->d1.~dtor();
      break;
    case 3:
      frame->d.~dtor();
      frame->d1.~dtor();
      frame->d2.~dtor();
      break;
    default: // coroutine completed or hasn't started
      break;
  }

  frame->promise.~promise_type();
  delete frame;
}
```

The compiler needs to be ready for every case in which the coroutine
may be destroyed in a valid state.

However, from the user's perspective, we may know that certain
coroutine types are only ever destroyed after they have reached the
final suspend point, and we need a way to tell the compiler about this.
That is what this patch provides. Once the compiler knows that the
coroutine can only be destroyed after completion, it can optimize the
above example to:

```C++
void foo.destroy(foo.Frame *frame) {
  frame->promise.~promise_type();
  delete frame;
}
```

I spent a lot of time experimenting with this downstream, and the
numbers are really good. In a real-world coroutine-heavy workload, the
size of the build dir (including .o files) is reduced by 14%, and the
size of the final libraries (excluding the .o files) is reduced by 8% in
Debug mode and 1% in Release mode.
2023-11-09 14:42:07 +08:00
Allen
7ec86f4d68
[SimplifyCFG] Fix the compile crash for invalid upper bound value (#71351)
Fix the crash caused by the recently landed PR70542.

Note:
For '%add = add nuw i32 %x, 1', we can only infer that the LowerBound is 1,
while the UpperBound wraps to 0 in computeConstantRange,
so we can't assume the UpperBound is a valid bound when its value is 0.
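
The wrapped bound can be reproduced with a small model. This is only a sketch, assuming a hypothetical half-open `Range32` type (its names are illustrative and not LLVM's `ConstantRange` API):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open range [Lower, Upper) over 32-bit unsigned values.
// Lower > Upper means the range runs past UINT32_MAX and wraps around.
struct Range32 {
  uint32_t Lower;
  uint32_t Upper;

  bool upperIsWrapped() const { return Lower > Upper; }

  bool contains(uint32_t V) const {
    if (upperIsWrapped())
      return V >= Lower || V < Upper;
    return V >= Lower && V < Upper;
  }
};

// `add nuw i32 %x, 1` cannot overflow, so the result lies in
// [1, UINT32_MAX]. The exclusive upper bound UINT32_MAX + 1 wraps to 0.
Range32 rangeOfAddNuwOne() {
  uint32_t Exclusive = UINT32_MAX;
  ++Exclusive; // well-defined unsigned wraparound to 0
  assert(Exclusive == 0);
  return Range32{1, Exclusive};
}
```

In this model an upper bound of 0 does not mean "no values below 0"; the range still contains every value from 1 to `UINT32_MAX`, which is why the bound cannot be used as an ordinary limit.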

Fix https://github.com/llvm/llvm-project/issues/71329.
Reviewed By: zmodem, nikic
2023-11-09 12:33:24 +08:00
Anna Thomas
29f03bf48d [GuardWidening] Require analyses only if necessary
We need to request analyses needed for guard widening only if there are
guards/widenable conditions.
2023-11-08 11:54:10 -05:00
Jeremy Morse
f1b0a54451 Reapply 7d77bbef4ad92, adding new debug-info classes
This reverts commit 957efa4ce4f0391147cec62746e997226ee2b836.

Original commit message below. In this follow-up, I've turned
unnecessary inclusions of DebugProgramInstruction.h into forward
declarations (which I hope fixes the clang compile time), and fixed a
memory leak in the DebugInfoTest.cpp IR unittests.

I also tracked down a compile-time regression in D154080 (more explanation
there); as a result, some of the changes are now hidden behind the
EXPERIMENTAL_DEBUGINFO_ITERATORS compile-time flag. This is tested by the
"new-debug-iterators" buildbot.

[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info

This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].

The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point it isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues are plumbed into the metadata hierarchy, and a
field is added to class Instruction, all of which are exercised in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.

This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
2023-11-08 16:42:35 +00:00
Nikita Popov
2c61f9cab5 [CVP] Fix use after scope
Store the result of ConstantRange::sdiv() in a variable, as
getSingleElement() will return a pointer to the APInt it contains.
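
The lifetime issue follows a common C++ pattern. The sketch below assumes a hypothetical `Range` type and `makeRange()` helper (neither is LLVM code) whose accessor returns a pointer into the object, as the message describes:

```cpp
#include <cassert>
#include <optional>

// Hypothetical inclusive integer range; stands in for a range class whose
// getSingleElement() returns a pointer into the object itself.
struct Range {
  int Lo;
  int Hi;
  const int *getSingleElement() const { return Lo == Hi ? &Lo : nullptr; }
};

// Returns a range by value, so each call produces a temporary.
Range makeRange(int Lo, int Hi) {
  assert(Lo <= Hi && "bounds must be ordered");
  return Range{Lo, Hi};
}

std::optional<int> foldToConstant(int Lo, int Hi) {
  // Unsafe: `const int *P = makeRange(Lo, Hi).getSingleElement();` would
  // leave P pointing into a temporary destroyed at the end of that statement.
  Range R = makeRange(Lo, Hi); // keep the range alive in a named variable
  if (const int *P = R.getSingleElement())
    return *P;
  return std::nullopt;
}
```

Binding the result to a named variable extends its lifetime to the enclosing scope, so the pointer stays valid for as long as it is used.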
2023-11-08 16:53:47 +01:00
Florian Hahn
26ab444e88
[ConstraintElim] Make sure add-rec is for the current loop.
Update addInfoForInductions to also check if the add-rec is for the
current loop. Otherwise we might add incorrect facts or crash.

Fixes a miscompile & crash introduced by 00396e6a1a0b.
2023-11-08 14:07:28 +00:00
Nikita Popov
d687057de8 [CVP] Try to fold sdiv to constant
If we know that the sdiv result is a single constant, directly
use that instead of performing narrowing.

Fixes https://github.com/llvm/llvm-project/issues/71659.
2023-11-08 14:49:24 +01:00
Markos Horro
9d2903c8e5
[IndVars] Add check of loop invariant for trunc instructions (#71072)
The same idea as in 34d380e1f63a7e2cdb9ab1e6498f727fcd710a14, but considering
truncation instructions.
Improvement for #59633.
2023-11-08 11:16:23 +00:00
Nikita Popov
567c02a80e [InstCombine] Remove inttoptr/ptrtoint handling from indexed compare fold
Looking through inttoptr / ptrtoint intermixed with GEPs is very
questionable from a provenance perspective. We also don't seem to
have any test coverage that shows this is useful (apart from one
test I added to guard against a crash).
2023-11-08 11:13:57 +01:00
Nikita Popov
5918f62301
[InstCombine] Infer zext nneg flag (#71534)
Use KnownBits to infer the nneg flag on zext instructions.

Currently we only set nneg when converting sext -> zext, but don't set
it when we have a zext in the first place. If we want to use it in
optimizations, we should make sure the flag inference is consistent.
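
As a rough illustration of the inference, the sketch below models known bits for 8-bit values. `KnownBits8` and the helper names are hypothetical and much simpler than LLVM's `KnownBits`; the point is only that a known-zero sign bit makes zero and sign extension agree:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit known-bits record: a set bit in Zero means that bit of
// the value is known to be 0; a set bit in One means it is known to be 1.
struct KnownBits8 {
  uint8_t Zero;
  uint8_t One;
  bool isNonNegative() const { return (Zero & 0x80) != 0; }
};

// Known bits of `V & Mask` when nothing is known about V: every bit that is
// cleared in Mask is known zero in the result.
KnownBits8 knownBitsOfAndMask(uint8_t Mask) {
  return KnownBits8{static_cast<uint8_t>(~Mask), 0};
}

uint16_t zext8To16(uint8_t V) { return V; }

uint16_t sext8To16(uint8_t V) {
  return static_cast<uint16_t>(static_cast<int16_t>(static_cast<int8_t>(V)));
}

// The flag is sound when the sign bit of the source is known zero.
bool canInferNNeg(const KnownBits8 &Known) { return Known.isNonNegative(); }

// Checks that both extensions agree on every value of the form `V & Mask`.
bool extensionsAgreeUnderMask(uint8_t Mask) {
  for (unsigned V = 0; V < 256; ++V) {
    uint8_t Masked = static_cast<uint8_t>(V & Mask);
    if (zext8To16(Masked) != sext8To16(Masked))
      return false;
  }
  return true;
}

// Usage: `x & 0x7F` clears the sign bit, so nneg can be inferred.
void demoMaskedValue() {
  assert(canInferNNeg(knownBitsOfAndMask(0x7F)));
  assert(extensionsAgreeUnderMask(0x7F));
}
```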
2023-11-08 09:34:40 +01:00
Vladislav Dzhidzhoev
6beddd668a Revert "[DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)"
This caused assert:
llvm/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp:110:
void llvm::DwarfFile::addScopeVariable(LexicalScope *, DbgVariable *):
Assertion `Ret.second' failed.

See comments https://reviews.llvm.org/D144006#4656350.

This reverts commit 3b449bd46a11a55a40cbc0016a99b202fa05248e.
2023-11-08 00:29:24 +01:00
Antonio Frighetto
7d39838948 [InstCombine] Favour CreateZExtOrTrunc in narrowFunnelShift (NFC)
Use `CreateZExtOrTrunc`, reduce test and regenerate checks.
2023-11-07 22:48:14 +01:00
Paulo Matos
7b9d73c2f9
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00
Philip Reames
551c280cfd
[indvars] Always fallback to truncation if AddRec widening fails (#70967)
The current code structure results in cases where, if a) we can't clone
the IV user (because it's not in our whitelist) or b) we can't prove the
SCEV expressions are identical, we'd sometimes leave both the original
unwidened IV and the partially widened IV in the code. Instead, just
truncate the wide IV to the use, the same as what we'd do if we couldn't
find an addrec to start with.

Noticed this while playing with changing how we produce addrecs. The
current structure results in a very tight interlock between SCEV's
internal capabilities and the indvars code.
2023-11-07 07:49:39 -08:00
Antonio Frighetto
caa124b58d [InstCombine] Zero-extend shift amounts in narrow funnel shift ops
An issue arose when handling shift amounts while performing
narrowed funnel shifts simplification. Specifically, shift
amounts were incorrectly truncated when their type was
narrower than the target bit width. This has been addressed
by zero-extending `ShAmt` in such cases.

Fixes: https://github.com/llvm/llvm-project/issues/71463.

Proof: https://alive2.llvm.org/ce/z/5draKz.
2023-11-07 14:15:32 +01:00
Nikita Popov
6e56c35d19 [SpeculativeExecution] Add only-if-divergent-target pass option
The optimization pipeline enables this option, but it was not
preserved in -print-pipeline-passes output.
2023-11-07 11:49:37 +01:00
Hans Wennborg
05ed92127c Revert "Reland [SimplifyCFG] Delete the unnecessary range check for small mask operation (#70542)"
This caused https://github.com/llvm/llvm-project/issues/71329

> Fix the compile crash when the default result has no result  for
> https://github.com/llvm/llvm-project/pull/65835
>
> Fixes https://github.com/llvm/llvm-project/issues/65120
> Reviewed By: zmodem, nikic

This reverts commit 7c4180a36a905b7ed46c09df77af1b65e356f92a.
2023-11-07 10:53:22 +01:00
Nikita Popov
e360a16fee
[GlobalOpt] Cache whether CC is changeable (#71381)
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic
complexity if there are many cold calls to a function: We're going to
visit each call of the function, and then for each of them iterate all
the users of the function.

We've recently encountered a case where GlobalOpt spends more than an
hour in these hasAddressTaken() checks when full LTO is used.

Avoid this by moving the hasAddressTaken() check into hasChangeableCC()
and caching its result, so it is only computed once per function.
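
The caching idea can be sketched as plain memoization. The class below is a hypothetical model, not the GlobalOpt code: function names stand in for functions, and `scanAllUsers()` stands in for the expensive address-taken check:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical model of a per-function cache for an expensive query.
class ChangeableCCCache {
public:
  // Number of times the expensive scan actually ran.
  int ScanCount = 0;

  bool hasChangeableCC(const std::string &Fn) {
    auto It = Cache.find(Fn);
    if (It != Cache.end())
      return It->second; // reuse the answer computed earlier
    bool Result = scanAllUsers(Fn);
    Cache.emplace(Fn, Result);
    return Result;
  }

private:
  std::unordered_map<std::string, bool> Cache;

  // Placeholder for the costly check; the rule used here is arbitrary.
  bool scanAllUsers(const std::string &Fn) {
    ++ScanCount;
    return Fn != "address_taken_fn";
  }
};

// Usage: many cold calls to one function trigger a single scan.
void demoManyColdCalls() {
  ChangeableCCCache C;
  bool AllChangeable = true;
  for (int Call = 0; Call < 1000; ++Call)
    AllChangeable = C.hasChangeableCC("cold_fn") && AllChangeable;
  assert(AllChangeable);
  assert(C.ScanCount == 1);
}
```

With the cache, the cost per function is one scan plus a lookup per call site, instead of one scan per call site.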
2023-11-07 10:36:45 +01:00
Allen
a0cd6265bc
[InstCombine] Split the FMul with reassoc into a helper function, NFC (#71493)
The reassoc check is really hard to find because the handling branch is
too large, so split it into a helper function.
2023-11-07 15:30:56 +08:00
Philip Reames
23099ac239
Add known and demanded bits support for zext nneg (#70858)
zext nneg was recently added to the IR in #67982.   This patch teaches
demanded bits and known bits about the semantics of the instruction, and
adds a couple of test cases to illustrate basic functionality.
2023-11-06 18:47:56 -08:00
LiqinWeng
5d3d08463d
[InstCombinePHI] Remove dead PHI on UnaryOperator (#71386)
This patch mainly solves the problem of dead PHI on UnaryOperator
2023-11-07 09:45:33 +08:00
Tom Stellard
2400c54c37
[Vectorize] Remove Transforms/Vectorize.h (#71294)
The only thing in this file is a declaration for
createLoadStoreVectorizerPass(), and this function is already declared
in LoadStoreVectorizer.h.
2023-11-06 14:04:22 -08:00
Simon Pilgrim
3ca4fe80d4 [Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-06 16:50:18 +00:00
Florian Hahn
a002271972
[VPlan] Add VPValue::replaceUsesWithIf (NFCI).
Add replaceUsesWithIf helper and use it in a few places.
2023-11-06 16:08:22 +00:00
Nikita Popov
c4c0ac10f1 [IPO] Remove unnecessary bitcasts (NFC)
2023-11-06 16:49:45 +01:00
Alexey Bataev
ac254fc055 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently the tryToGatherExtractElements function analyzes the whole vector,
regardless of the number of actual registers used in this vector. This may
prevent some optimizations, because per-register analysis may allow
simplifying the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-06 07:29:27 -08:00
Nikita Popov
be3cef0b2a [LibCallsShrinkWrap] Avoid use of ConstantExpr::getFPExtend() (NFC)
Use the constant folding API instead.
2023-11-06 15:38:42 +01:00
Nikita Popov
16a595e398 [Attributor] Avoid use of ConstantExpr::getFPTrunc() (NFC)
Use the constant folding API instead. For simplicity I'm using
the DL-independent API here.
2023-11-06 15:27:01 +01:00
Nikita Popov
25af06fd7a [InstCombine] Avoid use of FP cast constant expressions (NFC)
Use the constant folding API instead. As we're working on plain
ConstantFP, this should always succeed.
2023-11-06 15:22:33 +01:00
Nikita Popov
abc27bd31f [InstCombine] Avoid some FP cast constant expressions (NFCI)
Instead of doing fptoxi and xitofp casts to check for round-trip,
directly check the IsExact flag on the convertToInteger() API.
2023-11-06 14:42:42 +01:00
Hans Wennborg
046c57e705 Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This causes asserts:

  llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10082:
  Value *llvm::slpvectorizer::BoUpSLP::ShuffleInstructionBuilder::adjustExtracts(
    const TreeEntry *, MutableArrayRef<int>, unsigned int, bool &):
  Assertion `Part == 0 && "Expected firs part."' failed.

See comment on the code review.

> Currently the tryToGatherExtractElements function analyzes the whole vector,
> regardless of the number of actual registers used in this vector. This may
> prevent some optimizations, because per-register analysis may allow
> simplifying the final code by reusing more already emitted vectors and
> better shuffles.
>
> Differential Revision: https://reviews.llvm.org/D148855

This reverts commit 9dfdbd788707edc8c39eb2bff16004aba1f3586b.
2023-11-06 13:56:42 +01:00
Dominik Adamski
2cce0f6c57
[OpenMP][OMPIRBuilder] Add support to omp target parallel (#67000)
Added support for LLVM IR code generation which is used for handling omp
target parallel code. The call to __kmpc_parallel_51 is generated and
the parallel region is outlined to a separate function.

The proper setup of kmpc_target_init mode is not included in the commit.
It is assumed that the SPMD mode for target initialization is properly
set by other codegen functions.
2023-11-06 11:44:00 +01:00
Noah Goldstein
ad9147399f [InstCombine] Improve eq/ne by parts to handle ult/ugt equality pattern.
`(icmp eq/ne (lshr x, C), (lshr y, C))` gets optimized to `(icmp
ult/uge (xor x, y), (1 << C))`. This can cause the current equal-by-parts
detection to miss the high bits, as their comparison may already have been
optimized to the new pattern.
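
The equivalence of the two forms can be checked exhaustively on small integers. The following is a minimal sketch using 8-bit values rather than arbitrary IR types; the function names are illustrative only:

```cpp
#include <cassert>
#include <cstdint>

// (icmp eq (lshr x, C), (lshr y, C)) on 8-bit values.
bool eqOfShifts(uint8_t X, uint8_t Y, unsigned C) {
  return (X >> C) == (Y >> C);
}

// (icmp ult (xor x, y), (1 << C)) on 8-bit values.
bool ultOfXor(uint8_t X, uint8_t Y, unsigned C) {
  return static_cast<unsigned>(X ^ Y) < (1u << C);
}

// Exhaustively compares both forms for every 8-bit x, y and shift 0..7.
bool formsAgreeForAll8BitInputs() {
  for (unsigned C = 0; C < 8; ++C)
    for (unsigned X = 0; X < 256; ++X)
      for (unsigned Y = 0; Y < 256; ++Y) {
        uint8_t X8 = static_cast<uint8_t>(X);
        uint8_t Y8 = static_cast<uint8_t>(Y);
        if (eqOfShifts(X8, Y8, C) != ultOfXor(X8, Y8, C))
          return false;
      }
  return true;
}

// Usage: the high nibbles of 0xA5 and 0xA3 match, so both forms are true.
void demoHighNibble() {
  assert(eqOfShifts(0xA5, 0xA3, 4));
  assert(ultOfXor(0xA5, 0xA3, 4));
}
```

The `ne` case is the negation of each side, which corresponds to the `uge` comparison.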

This commit adds support for detecting / combining the ult/ugt
pattern.

Closes #69884
2023-11-04 19:00:28 -05:00
Teresa Johnson
87f5e22987
[MemProf] Tolerate missing leaf debug frames (#71233)
Loosen up the matching so that a missing leaf debug frame in the profile
does not prevent matching an allocation context if we can match further
up the inlined call context. This relies on the pre-inliner, which was
already the default when performing normal PGO feedback along with the
MemProf feedback, but to ensure matching is not affected by the presence
of PGO, enable the pre-inliner for MemProf feedback as well.
2023-11-03 21:01:07 -07:00
Nikita Popov
a682a9cfd0 Revert "Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)"
This reverts commit 19b5495b653a00da7a250f48b4f739fcf2bbe82f.

PR landed without approval, with severe quality issues.
2023-11-03 21:15:46 +01:00
XChy
c880fdc0f0
[DFAJumpThreading] Remove incoming StartBlock from all phis when unfolding select (#71082)
Fixes #65222.
When unfolding select into diamond-like control flow, we need to remove
the StartBlock from all phis in EndBlock.
2023-11-04 03:32:20 +08:00
Philip Reames
5adf6ab7ff Revert "[IndVars] Generate zext nneg when locally obvious"
This reverts commit a6c8e27b3a052913a15a13ee0d4ac466c5ab3f92.  It appears likely to have caused https://lab.llvm.org/buildbot/#/builders/57/builds/30988.
2023-11-03 11:19:14 -07:00
Manman Ren
19b5495b65
Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)
See RFC for details:
https://discourse.llvm.org/t/rfc-for-moving-swift-s-merge-function-pass-to-llvm/73778

We will need to refactor the extension to FunctionComparator/FunctionHash to
StructuralHash. This patch adds a new pass which is ported from Swift,
and we will need to discuss how to migrate Swift’s pass over after we
land this in llvm.

This PR was created to get some early review on the patch.

---------

Co-authored-by: Manman Ren <mren@meta.com>
2023-11-03 11:13:58 -07:00
Philip Reames
7c93452e17 [indvars] Restructure getExtendedOperandRecurrence [nfc]
As suggested during review of https://github.com/llvm/llvm-project/pull/70990.
2023-11-03 10:50:57 -07:00
Alexey Bataev
9dfdbd7887 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently the tryToGatherExtractElements function analyzes the whole vector,
regardless of the number of actual registers used in this vector. This may
prevent some optimizations, because per-register analysis may allow
simplifying the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-03 10:43:58 -07:00
Johannes Doerfert
d3e7a48cbd [OpenMP][NFC] Remove a no-op function
2023-11-03 10:28:36 -07:00
Philip Reames
1ffea97ffd
[indvars] Support known positive extends in getExtendedOperandRecurrence (#70990)
IndVars has the existing notion of a narrow definition which is known to
be positive, and thus both sign and zero extension kinds are actually the
same operation. There's existing logic for forming a SCEV based on the
extension kind and the no-wrap flags. This change extends that logic to
form the opposite extension kind for a positive def if doing so is
allowed by the flags. Note that we already do something analogous for
the getWideRecurrence case as well.
2023-11-03 10:21:30 -07:00
Ellis Hoag
890335bb28
[InstrProf] Do not block functions from PGOUse (#71106)
The `skipPGO()` function was added in https://reviews.llvm.org/D137184.
Unfortunately, it also blocked functions from being annotated (PGOUse),
which I believe will cause confusion to users if a function has a
profile but it is not PGO'd.

The docs for `noprofile` and `skipprofile` only claim to block
instrumentation, not PGO optimization:
https://llvm.org/docs/LangRef.html
2023-11-03 09:41:26 -07:00
Philip Reames
a6c8e27b3a [IndVars] Generate zext nneg when locally obvious
zext nneg was recently added to the IR in #67982.  This patch teaches
SimplifyIndVars to prefer zext nneg over *both* sext and plain zext,
when a local SCEV query indicates the source is non-negative.

The choice to prefer zext nneg over sext looks slightly aggressive
here, but probably isn't so much in practice.  For cases where we'd
"remember" the range fact, instcombine would convert the sext into
a zext nneg anyways.  The only cases where this produces a different
result overall are when SCEV knows a non-local fact, and it doesn't
get materialized into the IR.  Those are exactly the cases where
using zext nneg is most useful.  We do run the risk of e.g. a
missing combine - since we haven't updated most of them yet - but
that seems like a manageable risk.

Note that there are much deeper algorithmic changes we could make
to this code to exploit zext nneg, but this seemed like a reasonable
and low risk starting point.
2023-11-03 09:20:59 -07:00
Nikita Popov
5c3beb7b1e [MemCpyOpt] Handle memcpy marked as memory(none)
Fixes #71183.
2023-11-03 15:20:21 +01:00
Nikita Popov
03110ddeb2 [IR] Remove ZExtOperator (NFC)
Now that zext constant expressions are no longer supported,
ZExtInst should be used instead.
2023-11-03 14:52:59 +01:00