211023 Commits

Author SHA1 Message Date
Arthur Eubanks
8334cdde2e [NFC] Don't pass redundant arguments
Some parameters were already part of the Config passed in.
2021-02-10 22:06:49 -08:00
Max Kazantsev
3d15b7e7df [Codegenprepare][X86] Use usub with overflow opt for IV increment
Function `replaceMathCmpWithIntrinsic` artificially limits the scope
of the optimization, setting a requirement of two instructions be in
the same block, due to two reasons:
- usage of DT for more general check is costly in terms of compile time;
- risk of creating a new value that lives through multiple blocks.

Because of this, two semantically equivalent tests may be or not be the
subject of this opt depending on where the binary operation is located.
See `test/CodeGen/X86/usub_inc_iv.ll` for motivation

There is one important particular case where this limitation is  too strict:
it is when the binary operation is the increment of the induction variable.
As result, the application of this opt becomes fragile and highly reliant on
where other passes decide to place IV increment. In most cases, they place
it in the end of the latch block, killing the opt opportunity (when in fact it
does not matter where to insert the actual instruction).

This patch handles this particular case separately.
- The detector does not use dom tree and has constant cost;
- The value of IV or IV.next lives through all loop in any case, so this should not
  create a new unexpected long-living value.

As result, the transform becomes more robust. It also seems to lead to
better code generation in some cases (see `test/CodeGen/X86/lsr-loop-exit-cond.ll`).

Differential Revision: https://reviews.llvm.org/D96119
Reviewed By: spatel, reames
2021-02-11 11:59:45 +07:00
Max Kazantsev
6efcc2fd3f [Test] Add negative tests where usub optimization should not apply 2021-02-11 11:59:44 +07:00
Carl Ritson
e5b0b434f6 [AMDGPU] Refactor MIMG tables to better handle hardware variants
Add mimgopc object to represent the opcode allowing different
opcodes for different hardware variants.
This enables image_atomic_fcmpswap, image_atomic_fmin, and
image_atomic_fmax on GFX10

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D96309
2021-02-11 13:22:41 +09:00
Kazu Hirata
c5e90a8857 [AsmPrinter] Use range-based for loops (NFC) 2021-02-10 20:01:22 -08:00
Kazu Hirata
b16c6b2a83 [TableGen] Use ListSeparator (NFC) 2021-02-10 20:01:20 -08:00
Kazu Hirata
d12a0f4fc0 [GCOV] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-02-10 20:01:18 -08:00
Craig Topper
5189c5b940 [X86] Simplify patterns for avx512 vpcmp. NFC
This removes the commuted PatFrags that only existed to carry
an SDNodeXForm in its OperandTransform field. We know all the places
that need to use the commuted SDNodeXForm and there is one transform
shared by signed and unsigned compares. So just hardcode the
the SDNodeXForm where it is needed and use the non commuted PatFrag
in the pattern.

I think when I wrote this I thought the SDNodeXForm name had to
match what is in the PatFrag that is being used. But that's not
true. The OperandTransform is only used when the PatFrag is used
in an instruction pattern and not a separate Pat pattern. All
the commuted cases are Pat patterns.
2021-02-10 19:24:27 -08:00
Jessica Clarke
ca606dc988 [RISCV] More whitespace and comment typo fixes in RISCVInstrInfoC.td 2021-02-11 02:32:36 +00:00
Jessica Clarke
0973ce8596 [RISCV] Fix whitespace in RISCVInstrInfoC.td 2021-02-11 02:23:09 +00:00
Craig Topper
350ab4e617 [RISCV] Use OperandTransform field of ImmLeaf to slightly simplify a couple bitmanip patterns. NFC
This binds the SDNodeXForm to the ImmLeaf so we only need to mention
the ImmLeaf in both the input and output pattern.
2021-02-10 17:52:07 -08:00
xgupta
4fc6ff07b4 [Draft] [examples] Move llvm/examples/OCaml-Kaleidoscope/ to llvm-archive 2021-02-11 06:52:24 +05:30
Duncan P. N. Exon Smith
fa35c1f80f ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC
Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and
`MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more
precisely describe its effect and clarify the header documentation.

Found this while helping to investigate PR48841, which pointed out an
unsound use of the flag in `CloneModule()`. For now I've just added a
FIXME there, but I'm hopeful that the new (more precise) name will
prevent other similar errors.
2021-02-10 16:53:21 -08:00
Jessica Paquette
1514f3b2c8 [AArch64][GlobalISel] Don't perform the mul const combine with G_PTR_ADD
A G_MUL + G_PTR_ADD can also be folded into a madd. So, conservatively, we
shouldn't combine when the G_MUL is used by a G_PTR_ADD either.

Differential Revision: https://reviews.llvm.org/D96457
2021-02-10 15:30:45 -08:00
Arthur Eubanks
1cd1573f11 [docs] Make clearer in WritingAnLLVMPass that the legacy PM isn't the default
Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D96452
2021-02-10 15:26:25 -08:00
Jessica Paquette
5f7a4d8d05 [AArch64][GlobalISel] Perform load/store extended reg folding with optsize
GlobalISel was only doing this with minsize. SDAG does this with optsize.

(See: `SelectionDAG::shouldOptForSize()`)

This is a 0.3% code size improvement for CTMark at -Os.

(Best: 1.1% improvements on lencod + pairlocalalign)

Differential Revision: https://reviews.llvm.org/D96451
2021-02-10 14:42:25 -08:00
Hongtao Yu
3a5f8a3ea3 [CSSPGO] Restrict pseudo probe tests to x86_64 only. 2021-02-10 14:41:10 -08:00
Benjamin Kramer
8fb4a4f7bb [SampleFDO] Silence -Wnon-virtual-dtor warning
There's no polymorphic deletion happening here.
2021-02-10 23:37:15 +01:00
Arthur Eubanks
cee9869c4e [opt] Add helpful alternatives for -analyze under new PM
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D96449
2021-02-10 14:09:17 -08:00
Jacques Pienaar
d650365935 Revert "Make gCrashRecoveryEnabled thread local"
This reverts commit 5e77ea04f214c7a18bd5c782c8b8a7b7c828ad7a.

Causes a breakage on Windows buildbot.
2021-02-10 13:36:56 -08:00
Rong Xu
db0d7d0ba9 [SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen
Break SampleProfileLoader into to a base and a derived class.
Base class (SampleProfileLoaderBaseImpl) includes the common
code for IR and MachineIR (CodeGen) sample loader.
It will be templatelized in the later patch.

Inline and Probe related code will remain in the derived class of
SampleProfileLoader and stays in SampleProfile.cpp.

We need to refactor some functions:
(1) getInstWeight() to enable the code sharing -- put the core into
getInstWeightImpl().
(2) emitAnnotation() and propagateWeights() to carve out the code
specific to SampleProfileLoader.
(3) make getInstWeight() and findFunctionSamples() virtual and override
in SampleProfileLoader as they need to access the fields in the derived
class.

Differential Revision: https://reviews.llvm.org/D95832
2021-02-10 13:29:15 -08:00
Jessica Paquette
9283058abb [AArch64][GlobalISel] Fold G_ADD into the cset for G_ICMP
When we have a G_ADD which is fed by a G_ICMP on one side, we can fold it into
the cset for the G_ICMP.

e.g. Given

```
%cmp = G_ICMP ... %x, %y
%add = G_ADD %cmp, %z
```

We would normally emit a cmp, cset, and add.

However, `%add` is either `%z` or `%z + 1`. So, we can just use `%z` as the
source of the cset rather than wzr, saving an instruction.

This would probably be cleaner in AArch64PostLegalizerLowering, but we'd need
to change the way we represent G_ICMP to do that, I think. For now, it's
easiest to implement in selection.

This is a 0.1% code size improvement on CTMark/pairlocalalign at -Os.

Example: https://godbolt.org/z/7KdrP8

Differential Revision: https://reviews.llvm.org/D96388
2021-02-10 13:28:01 -08:00
Jacques Pienaar
5e77ea04f2 Make gCrashRecoveryEnabled thread local
If context is enabled/disabled and queried concurrently then this
results in a data race/TSAN failure with RunSafely (where boolean
variable was not locked).

There doesn't seem to be a reasonable way to enable threads that enable
and disable recovery in parallel (without also keeping
gCrashRecoveryEnabled's lock held during Fn execution which seems
undesirable). This makes enable checking if enabled thread local and
consistent with other thread local usage of crash context here.

Differential Revision: https://reviews.llvm.org/D93907
2021-02-10 12:44:18 -08:00
Hongtao Yu
1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Adrian Prantl
19fc8eede4 Add missing nullptr check.
salvageDebugInfoImpl() may fail and return a nullptr.
2021-02-10 12:15:24 -08:00
Philip Reames
9bf3cfa77b [SCEV] Add a missing AssumptionCache parameter
The AssumptionCache mechanism is used to feed assumes into known bits computations.  Most places in SCEV passed it in, but one place appears to have been missed.

Spotted via inspection, don't have a test case which actually exercises this, but it seemed like an obvious fixit.
2021-02-10 12:08:55 -08:00
Sanjay Patel
6e2053983e [InstCombine] fold lshr(mul X, SplatC), C2
This is a special-case multiply that replicates bits of
the source operand. We need this fold to avoid regression
if we make canonicalization to `mul` more aggressive for
shl+or patterns.

I did not see a way to make Alive generalize the bit width
condition for even-number-of-bits only, but an example of
the proof is:
  Name: i32
  Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2))
  %m = mul nuw i32 %x, C1
  %t = lshr i32 %m, C2
  =>
  %t = and i32 %x, C1 - 2

  Name: i14
  %m = mul nuw i14 %x, 129
  %t = lshr i14 %m, 7
  =>
  %t = and i14 %x, 127

https://rise4fun.com/Alive/e52
2021-02-10 15:02:31 -05:00
Sanjay Patel
6bcc1fd461 [InstCombine] add tests for lshr with mul; NFC 2021-02-10 15:02:31 -05:00
Jameson Nash
a7db680183 Renovate CMake files in the llvm-exegesis tool.
This attempts to move all tools over to using `add_llvm_library` for
better consistency. After doing this, I noticed it ended up as nearly a
reimplementation of https://reviews.llvm.org/rL342148, which later got
reverted in r342336 (b09a8c9bd9b819741b38071a7ccd95042ef2643a).

With ccache and ninja on a large core machine (40), I haven't run into
build errors, so I'm hopeful it's better now, though it doesn't seem to
be any different / new.

Reviewed By: stephenneuendorffer

Differential Revision: https://reviews.llvm.org/D90970
2021-02-10 14:22:55 -05:00
Arthur Eubanks
5d960cba34 [opt][NewPM] Add a --print-passes flag to print all available passes
It seems nicer to list passes given a flag rather than displaying all
passes in opt --help.

This is awkwardly structured because a PassBuilder is required, but
reusing the PassBuilder in runPassPipeline() doesn't work because we
read the input IR before getting to runPassPipeline(). So printing the
list of passes needs to happen before reading the input IR. If we remove
the legacy PM code in main() and move everything from NewPMDriver.cpp
into opt.cpp, we can create the PassBuilder before reading IR and check
if we should print the list of passes and exit. But until then this hack
seems fine.

Compared to the legacy PM, the new PM passes are lacking descriptions.
We'll need to figure out a way to add descriptions if we think this is
important.

Also, this only works for passes specified in PassRegistry.def. If we
want to print other custom registered passes, we'll need a different
mechanism.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D96101
2021-02-10 11:22:12 -08:00
Craig Topper
fc4d780eaf [RISCV] Remove superfluous semicolon. NFC 2021-02-10 11:20:29 -08:00
Nick Desaulniers
68945a8686 [Thumb2] support movs pc, lr alias for subs pc, lr, #0/eret
This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL.

Because different operands are not permitted to `movs`, the diagnostics now provide multiple suggestions along the lines of using a non-pc destination operand or lr source operand.

Forked from D95586.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D96304
2021-02-10 11:00:42 -08:00
Arthur Eubanks
c2c977ce50 Specify that some flags are legacy PM-specific
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D96100
2021-02-10 10:53:04 -08:00
Craig Topper
cb161b3a88 [RISCV] Add support for matching .vf forms of fadd/fsub/fmul/fdiv/fma for fixed vectors.
fma+neg will come in a different patch since I haven't done it for .vv
yet either.

Differential Revision: https://reviews.llvm.org/D96375
2021-02-10 10:16:27 -08:00
Tom Stellard
997f6b6f8e [CMake] Remove some dead code in llvm_install_library_symlink()
Reviewed By: smeenai

Differential Revision: https://reviews.llvm.org/D95666
2021-02-10 10:13:04 -08:00
Craig Topper
0c254b4a69 [RISCV] Add support for selecting vrgather.vx/vi for fixed vector splat shuffles.
The test cases extract a fixed element from a vector and splat it
into a vector. This gets DAG combined into a splat shuffle.

I've used some very wide vectors in the test to make sure we have
at least a couple tests where the element doesn't fit into the
uimm5 immediate of vrgather.vi so we fall back to vrgather.vx.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96186
2021-02-10 10:01:56 -08:00
Fangrui Song
04a2e12612 DebugInfo/Symbolize: Retrieve filename from the preceding STT_FILE for .symtab symbolization
The ELF spec says:

> STT_FILE: Conventionally, the symbol's name gives the name of the source file associated with the object file. A file symbol has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if it is present.

For a local symbol, the preceding STT_FILE symbol is almost always in the same
file[1]. GNU addr2line uses this heuristic to retrieve the filename associated
with a local symbol (e.g. internal linkage functions in C/C++).

GNU addr2line can assign STT_FILE filename to a non-local symbol, too, but the trick
only works if no regular symbol precede STT_FILE. This patch does not implement this corner case
(not useful for most executables which have more than one files).

In case of filename mismatch between .debug_line & .symtab, arbitrarily make .debug_line win.

[1]: LLD does not synthesize STT_FILE symbols
(https://bugs.llvm.org/show_bug.cgi?id=48023 see also
https://sourceware.org/bugzilla/show_bug.cgi?id=26822).  An assembly file
without `.file` directives can cause mis-attribution. This is an edge case.

Differential Revision: https://reviews.llvm.org/D95927
2021-02-10 09:47:10 -08:00
Fangrui Song
4f30a3d3d2 [llvm-cfi-verify] Set UseSymbolTable to false
parseSectionContents expects to skip regions not described by DWARF.  With my
pending DebugInfo/Symbolize change, the filename can be recovered and there
will be more IndirectInstructions entries.
2021-02-10 09:44:13 -08:00
Jeremy Morse
1d68e0a075 Reland [DWARF] Location-less inlined variables should not have DW_TAG_variable
Originally landed in ddc2f1e3fb4 and reverted in d32deaab4d because of
a Generic test objecting. That was fixed up in 013613964fd9. Original
landing commit message follows:

[DWARF] Location-less inlined variables should not have DW_TAG_variable

Discussed in this thread:

  https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html

DwarfDebug::collectEntityInfo accidentally distinguishes between variable
locations that never have a location specified, and variable locations that
have an empty location specified. The latter leads to the creation of an
empty variable referring to the abstract origin.

Fix this by seeking a non-empty location before producing a concrete
entity, to guarantee a DW_AT_location will be produced. Other loops in
collectEntityInfo and endFunctionImpl take care of examining the
retainedNodes collection and ensuring optimised-out variables are created.

Differential Revision: https://reviews.llvm.org/D95617
2021-02-10 15:40:47 +00:00
Jay Foad
b5f3383152 [AMDGPU] Add another test case for combining DS reads 2021-02-10 14:59:49 +00:00
Jay Foad
2114b458b0 [AMDGPU] Fix comments in SILoadStoreOptimizer::offsetsCanBeCombined 2021-02-10 14:49:33 +00:00
Luís Marques
acac29ca42 [DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts
Avoid doing the following combine for vector types:

```
copysign(x, fp_extend(y)) -> copysign(x, y)
copysign(x, fp_round(y)) -> copysign(x, y)
```

That combine seemed to impede the selection of vector instruction and cause
a mess in some circumstances.

Differential Revision: https://reviews.llvm.org/D96037
2021-02-10 14:25:24 +00:00
Nico Weber
ec4fb5bcd3 [gn build] (manually) port e89fcbfad6a3 2021-02-10 08:59:07 -05:00
Daniel Cederman
ad3b023c88 [Sparc] Support relocatable expressions in the assembler
Allow assembler expressions to start with an identifier. This allows for expressions such as
```
b symbol + 4
```
and
```
mov symEnd - symStart, %g1
```

The patch builds upon https://reviews.llvm.org/D47136.

Reviewed By: joerg

Differential Revision: https://reviews.llvm.org/D47458
2021-02-10 14:52:44 +01:00
Fraser Cormack
a3c74d6d53 [RISCV] Add support for selecting vid.v from build_vector
This patch optimizes a build_vector "index sequence" and lowers it to
the existing custom RISCVISD::VID node. This pattern is common in
autovectorized code.

The custom node was updated to allow it to be used by both scalable and
fixed-length vectors, thus avoiding pattern duplication.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96332
2021-02-10 10:58:40 +00:00
Jeremy Morse
013613964f Reapply [DebugInfo] Re-engineer a test to be stricter, add XFails
Was e05c10380ce, reverted in d7d0b17de77, see D95617 for details. I've
added "arm64" to the XFail list (as well as aarch64), will follow up on
the mailing list about whether there's anything else to be done.
2021-02-10 10:46:58 +00:00
Simon Pilgrim
eb31c3c5cb Revert rGe1172959226689a "[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - merge VPERMILPD ops with different low/high masks."
Revert this while I investigate a downstream breakage report.
2021-02-10 10:26:44 +00:00
Sander de Smalen
9db6e97a86 [LoopVectorize] NFC: Change computeFeasibleMaxVF to operate on ElementCount.
This patch is NFC and changes occurrences of `unsigned MaxVectorSize`
to work on type ElementCount.

This patch is a preparatory patch with the ultimate goal of making
`computeMaxVF()` return both a max fixed VF and a max scalable VF,
so that `selectVectorizationFactor()` can pick the most cost-effective
vectorization factor.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D96018
2021-02-10 08:52:10 +00:00
Sander de Smalen
750a78cd5d [ValueTypes] Add MVT for nxv1bf16.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96249
2021-02-10 08:50:41 +00:00
Sam Parker
9d81ccc02f [WebAssembly] Enable loop unrolling
Enable partial and runtime unrolling with a threshold of 30, which
was derived from a large number of kernels running on node and
wasmtime for amd64 and aarch64.

Unrolling is enabled by default at -O2 and -O3 and is disabled at
-Oz and -Os. Compiling with -Os is recommended if the wasm binary
size is the most important factor.

Differential Revision: https://reviews.llvm.org/D95125
2021-02-10 08:25:46 +00:00