18 Commits

Author SHA1 Message Date
Nick Anderson
f1ec0d12bb
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #75380

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-09 13:32:59 +07:00
Simon Pilgrim
7648371c25 Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054)"
Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)"

Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104
2024-01-05 12:28:10 +00:00
Nick Anderson
e0c554ad87
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #64560

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-05 13:47:56 +07:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Matt Arsenault
d9e51e7552 CodeGenPrepare: Convert most tests to opaque pointers
NVPTX/dont-introduce-addrspacecast.ll required manually removing a check for
a bitcast.

AArch64/combine-address-mode.ll required rerunning update_test_checks

Mips required some manual updates due to a CHECK-NEXT coming after a
deleted bitcast.

ARM/sink-addrmode.ll needed one small manual fix.

Excludes one X86 function which needs more attention.
2022-11-28 09:21:59 -05:00
Tim Besard
a323dfc015 Don't sink ptrtoint/inttoptr sequences into non-noop addrspacecasts.
In https://reviews.llvm.org/D30114, support for mismatching address
spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using
addrspacecast as it was argued that only no-op addrspacecasts would be
considered when constructing the address mode. However, by doing
inttoptr/ptrtoint, it's possible to get CGP to emit an addrspace
that's not actually no-op, introducing a miscompilation:

define void @kernel(i8* %julia_ptr) {
  %intptr = ptrtoint i8* %julia_ptr to i64
  %ptr = inttoptr i64 %intptr to i32 addrspace(3)*

  br label %end
end:

  store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4
  ret void
}

Gets compiled to:

define void @kernel(i8* %julia_ptr) {
end:
  %0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)*
  store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4
  ret void
}

In the case of NVPTX, this introduces a cvta.to.shared, whereas
leaving out the %end block and branch doesn't trigger this
optimization. This results in illegal memory accesses as seen in
https://github.com/JuliaGPU/CUDA.jl/issues/558

In this change, I introduced a check before doing the pointer cast
that verifies address spaces are the same. If not, it emits a
ptrtoint/inttoptr combination to get a no-op cast between address
spaces. I decided against disallowing ptrtoint/inttoptr with
non-default AS in matchOperationAddr, because now its still possible
to look through multiple sequences of them that ultimately do not
result in a address space mismatch (i.e. the second lit test).
2022-07-16 10:56:42 -04:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Sanjoy Das
aa92cae14e [BypassSlowDivision] Improve our handling of divisions by constants
(This reapplies r314253.  r314253 was reverted on r314482 because of a
correctness regression on P100, but that regression was identified to be
something else.)

Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow .  This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.

Reviewers: jlebar

Subscribers: jholewinski, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38265

llvm-svn: 319677
2017-12-04 19:21:58 +00:00
Sanjoy Das
0ac5ba5ade Revert "[BypassSlowDivision] Improve our handling of divisions by constants"
This reverts commit r314253.  It causes a miscompile on P100 in an internal
benchmark.  Reverting while I investigate.

llvm-svn: 314482
2017-09-29 00:54:16 +00:00
Sanjoy Das
eda7a86d42 [BypassSlowDivision] Improve our handling of divisions by constants
Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow .  This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.

Reviewers: jlebar

Subscribers: jholewinski, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38265

llvm-svn: 314253
2017-09-26 21:54:27 +00:00
Nikolai Bozhenov
fca527af5c [BypassSlowDivision] Do not bypass division of hash-like values
Disable bypassing if one of the operands looks like a hash value. Slow
division often occurs in hashtable implementations and fast division is
never taken there because a hash value is extremely unlikely to have
enough upper bits set to zero.

A value is considered to be hash-like if it is produced by

1) XOR operation
2) Multiplication by a constant wider than the shorter type
3) PHI node with all incoming values being hash-like

Differential Revision: https://reviews.llvm.org/D28200

llvm-svn: 299329
2017-04-02 13:14:30 +00:00
Nikolai Bozhenov
4a04fb9e90 [BypassSlowDivision] Use ValueTracking to simplify run-time checks
ValueTracking is used for more thorough analysis of operands. Based on the
analysis, either run-time checks can be simplified (e.g. check only one operand
instead of two) or the transformation can be avoided. For example, it is quite
often the case that a divisor is promoted from a shorter type and run-time
checks for it are redundant.

With additional compile-time analysis of values, two special cases naturally
arise and are addressed by the patch:

 1) Both operands are known to be short enough. Then, the long division can be
    simply replaced with a short one without CFG modification.

 2) If a division is unsigned and the dividend is known to be short then the
    long division is not needed at all. Because if the divisor is too big for
    short division then the quotient is obviously zero (and the remainder is
    equal to the dividend). Actually, the division is not needed when
    (divisor > dividend).

Differential Revision: https://reviews.llvm.org/D29897

llvm-svn: 296832
2017-03-02 22:12:15 +00:00
Justin Lebar
3e50a5be8f [CodeGenPrepare] Don't sink non-cheap addrspacecasts.
Summary:
Previously, CGP would unconditionally sink addrspacecast instructions,
even going so far as to sink them into a loop.

Now we check that the cast is "cheap", as defined by TLI.

We introduce a new "is-cheap" function to TLI rather than using
isNopAddrSpaceCast because some GPU platforms want the ability to ask
for non-nop casts to be sunk.

Reviewers: arsenm, tra

Subscribers: jholewinski, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26923

llvm-svn: 287591
2016-11-21 22:49:15 +00:00
Justin Lebar
2860573529 [BypassSlowDivision] Handle division by constant numerators better.
Summary:
We don't do BypassSlowDivision when the denominator is a constant, but
we do do it when the numerator is a constant.

This patch makes two related changes to BypassSlowDivision when the
numerator is a constant:

 * If the numerator is too large to fit into the bypass width, don't
   bypass slow division (because we'll never run the smaller-width
   code).

 * If we bypass slow division where the numerator is a constant, don't
   OR together the numerator and denominator when determining whether
   both operands fit within the bypass width.  We need to check only the
   denominator.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26699

llvm-svn: 287062
2016-11-16 00:44:47 +00:00
Justin Lebar
1535a5e9df Add missing lit.local.cfg to llvm/test/Transforms/CodeGenPrepare/NVPTX.
llvm-svn: 285464
2016-10-28 21:56:07 +00:00
Justin Lebar
0ede5fb1bb Don't leave unused divs/rems sitting around in BypassSlowDivision.
Summary:
This "pass" eagerly creates div and rem instructions even when only one
is needed -- it relies on a later pass (machine DCE?) to clean them up.

This is problematic not just from a cleanliness perspective (this pass
is running during CodeGenPrepare, so should leave the IR in a better
state), but it also creates a problem for instruction selection.  If we
always have a div+rem, isel will always select a divrem instruction (if
possible), even when a single div or rem would do.

Specifically, in NVPTX, we want to compute rem from the output of div,
if available.  But if a div is not available, we want to leave the rem
alone.  This transformation is overeager if div is always available.

Because this code runs as part of CodeGenPrepare, it's nontrivial to
write a test for this change.  But this will effectively be tested by
a later patch which adds the aforementioned change to NVPTX isel.

Reviewers: tra

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26088

llvm-svn: 285460
2016-10-28 21:43:54 +00:00
Justin Lebar
468bf73209 Don't claim the udiv created in BypassSlowDivision is exact.
Summary:
In BypassSlowDivision's short-dividend path, we would create e.g.

  udiv exact i32 %a, %b

"exact" here means that we are asserting that %a is a multiple of %b.
But we have no reason to believe this must be true -- this is just a
bug, as far as I can tell.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D26097

llvm-svn: 285459
2016-10-28 21:43:51 +00:00