33 Commits

Author SHA1 Message Date
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Jeremy Morse
d2d9dc8eb4
[DebugInfo][RemoveDIs] Make debugify pass convert to/from RemoveDIs mode (#73251)
Debugify is extremely useful as a testing and debugging tool, and a good
number of LLVM-IR transform tests use it. We need it to support "new"
non-instruction debug-info to get test coverage, but it's not important
enough to completely convert right now (and it'd be a large
undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and
exit of the pass, which gives us the functionality without any further
work. The cost is compile-time, but again this is only happening during
tests.

Tested by: the large set of debugify tests enabled here. Note the
InstCombine test (cast-mul-select.ll) that hasn't been fully enabled:
this is because there's a debug-info sinking piece of code there that
hasn't been instrumented.
2023-11-29 13:19:50 +00:00
Nikita Popov
4d5525e0cb [LoopUnroll] Add tests for excessive znver3 unrolls (NFC) 2023-09-28 12:10:38 +02:00
Ganesh Gopalasubramanian
536e805e4d [X86] AMD Genoa (znver4) Change LoopMicroOpBufferSize to handle minimal unrolling of loops 2023-07-26 16:01:56 +05:30
Nikita Popov
b9808e5660 [LoopUnroll] Fold add chains during unrolling
Loop unrolling tends to produce chains of
`%x1 = add %x0, 1; %x2 = add %x1, 1; ...` with one add per unrolled
iteration. This patch simplifies these adds to `%xN = add %x0, N`
directly during unrolling, rather than waiting for InstCombine to do so.

The motivation for this is that having a single add (rather than
an add chain) on the induction variable makes it a simple recurrence,
which we specially recognize in a number of places. This allows
InstCombine to directly perform folds with that knowledge, instead
of first folding the add chains, and then doing other folds in another
InstCombine iteration.

Due to the reduced number of InstCombine iterations, this also
results in a small compile-time improvement.

Differential Revision: https://reviews.llvm.org/D153540
2023-07-05 09:54:28 +02:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Joshua Cao
898a9ca5e9 [SCEV] Strengthen huge constant trip multiples.
SCEV determines that loops with trip count >=2^32 have a trip multiple
of 1 to guard against huge multiples. This patch stregthens this to
instead find the greatest power of 2 divisor that is less than the
threshold.

Differential Revision: https://reviews.llvm.org/D147868
2023-04-10 20:00:46 -07:00
Joshua Cao
60470e163a [LoopUnroll] Add script test checks for LoopUnroll/X86/mmx.ll 2023-04-10 19:59:01 -07:00
Nikita Popov
ef992b6079 [LoopUnroll] Convert some tests to opaque pointers (NFC) 2022-12-23 16:35:26 +01:00
Roman Lebedev
da80639ee2
[NFC][IndVar] Autogenerate checklines in one test 2022-12-14 17:39:10 +03:00
Roman Lebedev
a64b2e9e3e
[NFC][SCEV][LoopUnroll] Add tests where treating or as add raises expansion cost
From https://reviews.llvm.org/rG46db90cc71d1#1154128
2022-12-12 20:41:56 +03:00
Roman Lebedev
5103ef64fe
[NFC] Port all (but one) LoopUnroll tests to -passes= syntax 2022-12-07 20:15:43 +03:00
Roman Lebedev
6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Juneyoung Lee
ae6e89327b Precommit tests that have poison as shufflevector's placeholder
This commit copies existing tests at llvm/Transforms containing
'shufflevector X, undef' and replaces them with 'shufflevector X, poison'.
The new copied tests have *-inseltpoison.ll suffix at its file name
(as db7a2f347f132b3920415013d62d1adfb18d8d58 did)
See https://reviews.llvm.org/D93793

Test files listed using

grep -R -E "^[^;]*shufflevector <.*> .*, <.*> undef" | cut -d":" -f1 | uniq

Test files copied & updated using

file_org=llvm/test/Transforms/$1
if [[ "$file_org" = *-inseltpoison.ll ]]; then
  file=$file_org
else
  file=${file_org%.ll}-inseltpoison.ll
  if [ ! -f $file ]; then
    cp $file_org $file
  fi
fi
sed -i -E 's/^([^;]*)shufflevector <(.*)> (.*), <(.*)> undef/\1shufflevector <\2> \3, <\4> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
  echo "$file : should be manually updated"
  # The test is manually updated
  exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
2020-12-29 17:09:31 +09:00
Arthur Eubanks
a820261bf3 [test] Fix store_cost.ll under NPM
The NPM processes loops in forward program order, whereas the legacy PM
processes them in reverse program order. No reason to test both PMs
here, so just stick to the NPM.
2020-12-07 21:19:05 -08:00
Sam Parker
0724153bbe [CostModel] Fix cast crash
Don't presume instruction operands while matching reductions.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46430

Differential Revision: https://reviews.llvm.org/D82453
2020-07-03 07:53:45 +01:00
Fangrui Song
ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Craig Topper
b7b353be60 [X86] Make Feature64Bit useful
We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've remove the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or user passing -mattr=+64bit.

I've changed the assert to a report_fatal_error so it's not lost in Release builds.

The test updates are to fix things that tripped the new error.

Differential Revision: https://reviews.llvm.org/D51231

llvm-svn: 341022
2018-08-30 06:01:05 +00:00
Elena Demikhovsky
f58f838495 Changed basic cost of store operation on X86
Store operation takes 2 UOps on X86 processors. The exact cost calculation affects several optimization passes including loop unroling.
This change compensates performance degradation caused by https://reviews.llvm.org/D34458 and shows improvements on some benchmarks.

Differential Revision: https://reviews.llvm.org/D35888

llvm-svn: 311285
2017-08-20 12:34:29 +00:00
David L Kreitzer
188de5ae69 Adds the ability to use an epilog remainder loop during loop unrolling and makes
this the default behavior.

Patch by Evgeny Stupachenko (evstupac@gmail.com).

Differential Revision: http://reviews.llvm.org/D18158

llvm-svn: 265388
2016-04-05 12:19:35 +00:00
Jingyue Wu
bfefff555e Roll forward r243250
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.

llvm-svn: 243253
2015-07-26 19:10:03 +00:00
Jingyue Wu
84879b71a9 Revert r243250
breaks tests

llvm-svn: 243251
2015-07-26 18:30:13 +00:00
Jingyue Wu
bf485f059c [TTI/CostModel] improve TTI::getGEPCost and use it in CostModel::getInstructionCost
Summary:
This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider
addressing modes. It now returns TCC_Free when the GEP can be completely folded
to an addresing mode.

I started this patch as I refactored SLSR. Function isGEPFoldable looks common
and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost.

Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best
testing bed seems CostModel, but its getInstructionCost method invokes
getAddressComputationCost for GEPs which provides very coarse estimation. So
this patch also makes getInstructionCost call the updated getGEPCost for GEPs.
This change inevitably breaks some tests because the cost model changes, but
nothing looks seriously wrong -- if we believe the new cost model is the right
way to go, these tests should be updated.

This patch is not perfect yet -- the comments in some tests need to be updated.
I want to know whether this is a right approach before fixing those details.

Reviewers: chandlerc, hfinkel

Subscribers: aschwaighofer, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D9819

llvm-svn: 243250
2015-07-26 17:28:13 +00:00
David Majnemer
453f7a1480 [LoopUnroll] Use undef for phis with no value live
We would create a phi node with a zero initialized operand instead of
undef in the case where no value was originally available.  This was
problematic for x86_mmx which has no null value.

llvm-svn: 241143
2015-07-01 05:38:07 +00:00
David Blaikie
a79ac14fa6 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie
79e6c74981 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Alp Toker
d3d017cf00 Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

llvm-svn: 210496
2014-06-09 22:42:55 +00:00
Hal Finkel
6532c20faa Move late partial-unrolling thresholds into the processor definitions
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).

This does represent a small functionality change:
 - For generic x86-64 (which uses the SB model and, thus, will get some
   unrolling).
 - For AMD cores (because they still currently use the SB scheduling model)
 - For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
   the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.

llvm-svn: 208289
2014-05-08 09:14:44 +00:00
Benjamin Kramer
9130cb8547 LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling.
Otherwise we use the same threshold as for complete unrolling, which is
way too high. This made us unroll any loop smaller than 150 instructions
by 8 times, but only if someone specified -march=core2 or better,
which happens to be the default on darwin.

llvm-svn: 207940
2014-05-04 19:12:38 +00:00
Hal Finkel
2eed29f3c8 Implement X86TTI::getUnrollingPreferences
This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).

These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.

The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
  ControlLoops-dbl - 13% speedup
  ControlLoops-flt - 15% speedup
  Reductions-dbl - 7.5% speedup

llvm-svn: 205348
2014-04-01 18:50:34 +00:00