25241 Commits

Author SHA1 Message Date
Craig Topper
d9fa8147c4 [X86] Autogenerate complete checks and fix a failure introduced in r337875.
llvm-svn: 337889
2018-07-25 05:22:13 +00:00
Chandler Carruth
7024921c0a [x86/SLH] Teach the x86 speculative load hardening pass to harden
against v1.2 BCBS attacks directly.

Attacks using spectre v1.2 (a subset of BCBS) are described in the paper
here:
https://people.csail.mit.edu/vlk/spectre11.pdf

The core idea is to speculatively store over the address in a vtable,
jumptable, or other target of indirect control flow that will be
subsequently loaded. Speculative execution after such a store can
forward the stored value to subsequent loads, and if called or jumped
to, the speculative execution will be steered to this potentially
attacker controlled address.

Up until now, this could be mitigated by enableing retpolines. However,
that is a relatively expensive technique to mitigate this particular
flavor. Especially because in most cases SLH will have already mitigated
this. To fully mitigate this with SLH, we need to do two core things:
1) Unfold loads from calls and jumps, allowing the loads to be post-load
   hardened.
2) Force hardening of incoming registers even if we didn't end up
   needing to harden the load itself.

The reason we need to do these two things is because hardening calls and
jumps from this particular variant is importantly different from
hardening against leak of secret data. Because the "bad" data here isn't
a secret, but in fact speculatively stored by the attacker, it may be
loaded from any address, regardless of whether it is read-only memory,
mapped memory, or a "hardened" address. The only 100% effective way to
harden these instructions is to harden the their operand itself. But to
the extent possible, we'd like to take advantage of all the other
hardening going on, we just need a fallback in case none of that
happened to cover the particular input to the control transfer
instruction.

For users of SLH, currently they are paing 2% to 6% performance overhead
for retpolines, but this mechanism is expected to be substantially
cheaper. However, it is worth reminding folks that this does not
mitigate all of the things retpolines do -- most notably, variant #2 is
not in *any way* mitigated by this technique. So users of SLH may still
want to enable retpolines, and the implementation is carefuly designed to
gracefully leverage retpolines to avoid the need for further hardening
here when they are enabled.

Differential Revision: https://reviews.llvm.org/D49663

llvm-svn: 337878
2018-07-25 01:51:29 +00:00
Craig Topper
fc501a9223 [X86] Use a shift plus an lea for multiplying by a constant that is a power of 2 plus 2/4/8.
The LEA allows us to combine an add and the multiply by 2/4/8 together so we just need a shift for the larger power of 2.

llvm-svn: 337875
2018-07-25 01:15:38 +00:00
Craig Topper
5be253d988 [X86] Expand mul by pow2 + 2 using a shift and two adds similar to what we do for pow2 - 2.
llvm-svn: 337874
2018-07-25 01:15:35 +00:00
Craig Topper
56c104f104 [X86] Use a two lea sequence for multiply by 37, 41, and 73.
These fit a pattern used by 11, 21, and 19.

llvm-svn: 337871
2018-07-24 23:44:17 +00:00
Craig Topper
b5342b592e [X86] Add test cases for multiply by 37, 41, and 73.
These can all be handled with 2 LEAs similar to what we do for 11, 19, 21.

llvm-svn: 337870
2018-07-24 23:44:15 +00:00
Craig Topper
f8fcee70a3 [X86] Change multiply by 26 to use two multiplies by 5 and an add instead of multiply by 3 and 9 and a subtract.
Same number of operations, but ending in an add is friendlier due to it being commutable.

llvm-svn: 337869
2018-07-24 23:44:12 +00:00
Craig Topper
5ddc0a2b14 [X86] When expanding a multiply by a negative of one less than a power of 2, like 31, don't generate a negate of a subtract that we'll never optimize.
We generated a subtract for the power of 2 minus one then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway.

This patch makes use explicitly emit the swapped subtract.

llvm-svn: 337858
2018-07-24 21:31:21 +00:00
Craig Topper
6d29891bef [X86] Generalize the multiply by 30 lowering to generic multipy by power 2 minus 2.
Use a left shift and 2 subtracts like we do for 30. Move this out from behind the slow lea check since it doesn't even use an LEA.

Use this for multiply by 14 as well.

llvm-svn: 337856
2018-07-24 21:15:41 +00:00
Heejin Ahn
8daef0751d [WebAssembly] Add tests for weaker memory consistency orderings
Summary:
Currently all wasm atomic memory access instructions are sequentially
consistent, so even if LLVM IR specifies weaker orderings than that, we
should upgrade them to sequential ordering and treat them in the same
way.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49194

llvm-svn: 337854
2018-07-24 21:06:44 +00:00
Craig Topper
86d6320b94 [X86] Change multiply by 19 to use (9 * X) * 2 + X instead of (5 * X) * 4 - 1.
The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub.

llvm-svn: 337851
2018-07-24 20:31:48 +00:00
Craig Topper
1296c622df [X86] Add test case to show failure to combine away negates that may be created by mul by constant expansion.
Mul by constant can expand to a sequence that ends with a negate. If the next instruction is an add or sub we might be able to fold the negate away.

We currently fail to do this because we explicitly don't add anything to the DAG combine worklist when we expand multiplies. This is primarily to keep the multipy from being reformed, but we should consider adding the users to worklist.

llvm-svn: 337843
2018-07-24 18:36:46 +00:00
Simon Atanasyan
28ded4ee19 [mips] Fix local dynamic TLS with Sym64
For the final DTPREL addition, rather than a lui/daddiu/daddu triple,
LLVM was erronously emitting a daddiu/daddiu pair, treating the %dtprel_hi
as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64.
Instead, use a new TlsHi node and, although unnecessary due to the exact
structure of the nodes emitted, use TlsHi for local exec too to prevent
future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes,
and TprelHi since its functionality is provided by the new common TlsHi node.

Patch by James Clarke.

Differential revision: https://reviews.llvm.org/D49259

llvm-svn: 337827
2018-07-24 13:47:52 +00:00
Sam Parker
8b93e82c3d [ARM] Disable ARMCodeGenPrepare by default
ARM Stage 2 builders have been suspiciously broken since the pass was
committed. Disabling to hopefully fix the bots and give me time to
debug.

llvm-svn: 337821
2018-07-24 12:04:23 +00:00
Chandler Carruth
a25aca21af [x86] Clean up and convert test to use generated CHECK lines.
This test was already checking microscopic behavior of tail call under
specific conditions. This just makes the CHECK lines much more
consistent, clear, and easily updated when intentional changes are made.

I've also switched the test to consistently name the entry block and to
order the helper declarations and comments for specific tests in the
more usual locations.

llvm-svn: 337806
2018-07-24 03:18:08 +00:00
Chandler Carruth
d41dca2ddc [x86] Update the CHECK lines of this test to use the latest patterns
from the script. This minimizes the diff in subsequent changes.

llvm-svn: 337805
2018-07-24 03:07:07 +00:00
Tom Stellard
b7f19e6d1e AMDGPU/GlobalISel: Legalize G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49601

llvm-svn: 337798
2018-07-24 02:19:20 +00:00
Thomas Anderson
8e8a652c2f Fix typo in test/CodeGen/Mips/dins.ll
Differential Revision: https://reviews.llvm.org/D49704

llvm-svn: 337771
2018-07-23 23:19:53 +00:00
Martin Storsjo
100fc97051 [COFF] Fix assembly output of comdat sections without an attached symbol
Since SVN r335286, the .xdata sections are produced without an attached
symbol, which requires using a different syntax when printing assembly
output.

Instead of the usual syntax of '.section <name>,"dr",discard,<symbol>',
use '.section <name>,"dr"' + '.linkonce discard' (which is what GCC
uses for all assembly output).

This fixes PR38254.

Differential Revision: https://reviews.llvm.org/D49651

llvm-svn: 337756
2018-07-23 22:15:19 +00:00
Martin Storsjo
c2b701408e [AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes
This matches the structure used on X86 and ARM. This requires
a little bit of duplication of the parts that are equal in both
AArch64 COFF variants though.

Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF
didn't, but now they do.

This makes AArch64 match X86 in how comdat is used for float constants
for MinGW.

Differential Revision: https://reviews.llvm.org/D49637

llvm-svn: 337755
2018-07-23 22:15:14 +00:00
Reid Kleckner
980c4df037 Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
Don't try to generate large PIC code for non-ELF targets. Neither COFF
nor MachO have relocations for large position independent code, and
users have been using "large PIC" code models to JIT 64-bit code for a
while now. With this change, if they are generating ELF code, their
JITed code will truly be PIC, but if they target MachO or COFF, it will
contain 64-bit immediates that directly reference external symbols. For
a JIT, that's perfectly fine.

llvm-svn: 337740
2018-07-23 21:14:35 +00:00
Nirav Dave
5af81d5bfa Add inline asm aliasing test.
llvm-svn: 337734
2018-07-23 20:19:10 +00:00
Krzysztof Parzyszek
9500a24fce [Hexagon] Handle unnamed globals in HexagonConstExpr
Instead of comparing names, compare positions in the parent module.

llvm-svn: 337723
2018-07-23 18:30:17 +00:00
Simon Atanasyan
307e5b31ce [mips] Add more checks to the tls.ll test case. NFC
llvm-svn: 337705
2018-07-23 16:05:44 +00:00
Cameron McInally
2c9bcffc92 [FPEnv] Legalize double width StrictFP vector operations
Differential Revision: https://reviews.llvm.org/D48809

llvm-svn: 337698
2018-07-23 14:40:17 +00:00
Sam Parker
3828c6ff94 [ARM] ARMCodeGenPrepare backend pass
Arm specific codegen prepare is implemented to perform type promotion
on icmp operands, which can enable the removal of uxtb and uxth
(unsigned extend) instructions. This is possible because performing
type promotion before ISel alleviates this duty from the DAG builder
which has to perform legalisation, but has a limited view on data
ranges.
    
The pass visits any instruction operand of an icmp and creates a
worklist to traverse the use-def tree to determine whether the values
can simply be promoted. Our concern is values in the registers
overflowing the narrow (i8, i16) data range, so instructions marked
with nuw can be promoted easily. For add and sub instructions, we are
able to use the parallel dsp instructions to operate on scalar data
types and avoid overflowing bits. Underflowing adds and subs are also
permitted when the result is only used by an unsigned icmp.

Differential Revision: https://reviews.llvm.org/D48832

llvm-svn: 337687
2018-07-23 12:27:47 +00:00
Chandler Carruth
1d926fb9f4 [x86/SLH] Fix a bug where we would harden tail calls twice -- once as
a call, and then again as a return.

Also added a comment to try and explain better why we would be doing
what we're doing when hardening the (non-call) returns.

llvm-svn: 337673
2018-07-23 07:56:15 +00:00
Chandler Carruth
b66f2d8df8 [x86/SLH] Add a test covering indirect forms of control flow. NFC.
This specifically covers different ways of making indirect calls and
jumps. There are some bugs in SLH that I will be fixing in subsequent
patches where the diff in the generated instructions makes the bug fix
much more clear, so just checking in a baseline of this test to start.

I'm also going to be adding direct mitigation for variant 1.2 which this
file very specifically tests in the various forms it can arise on x86.
Again, the diff to the generated instructions should make the change for
that much more clear, so having the test as a baseline seems useful.

llvm-svn: 337672
2018-07-23 07:51:51 +00:00
Craig Topper
b2a626b52e [X86] Remove the max vector width restriction from combineLoopMAddPattern and rely splitOpsAndApply to handle splitting.
This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bits paddds with zeros in the upper half. Same number of instructions, but breaks a dependency.

llvm-svn: 337656
2018-07-22 19:44:35 +00:00
Craig Topper
d8f80e90ce [X86] Add more MADD recurrence test cases with larger and narrower vector widths.
llvm-svn: 337650
2018-07-22 05:16:47 +00:00
Simon Atanasyan
ecd1e0afdd [mips] Move out the WrapperPat declaration from the NotInMicroMips predicate
This is a follow-up to the rL335185. Those commit adds some WrapperPat
patterns for microMIPS target. But declaration of the WrapperPat class
is under the NotInMicroMips predicate and microMIPS patterns cannot be
selected because predicate (Subtarget->inMicroMipsMode()) &&
(!Subtarget->inMicroMipsMode()) is always false.

This change move out the WrapperPat class declaration from the
NotInMicroMips predicate and enables microMIPS WrapperPat patterns.

Differential revision: https://reviews.llvm.org/D49533

llvm-svn: 337646
2018-07-21 16:16:03 +00:00
Krzysztof Parzyszek
05337bdb50 [Hexagon] Disable packets in test to avoid ordering issues in checks
llvm-svn: 337624
2018-07-20 21:55:55 +00:00
Roman Tereshin
31d52847ef Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reapplies commit r337489 reverted by r337541
Additionally, this commit contains a speculative fix to the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace)

llvm-svn: 337606
2018-07-20 20:10:04 +00:00
Craig Topper
28ac623f6f [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types.
Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.

Unfortunately, it seems our shuffle combining is currently relying on this a little remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining.

Differential Revision: https://reviews.llvm.org/D49280

llvm-svn: 337590
2018-07-20 17:57:53 +00:00
Simon Pilgrim
70fcd0f481 [X86][XOP] Fix SUB constant folding for VPSHA/VPSHL shift lowering
We can safely use getConstant here as we're still lowering, which allows constant folding to kick in and simplify the vector shift codegen.

Noticed while working on D49562.

llvm-svn: 337578
2018-07-20 16:55:18 +00:00
Evandro Menezes
fffa9b5897 [ARM] Add new feature to enable optimizing the VFP registers
Enable the optimization of operations on DPR and SPR via a feature instead
of checking the target.

Differential revision: https://reviews.llvm.org/D49463

llvm-svn: 337575
2018-07-20 16:49:28 +00:00
Simon Pilgrim
c7132031a2 [X86][SSE] Use SplitOpsAndApply to improve HADD/HSUB lowering
Improve AVX1 256-bit vector HADD/HSUB matching by using SplitOpsAndApply to split into 128-bit instructions.

llvm-svn: 337568
2018-07-20 16:20:45 +00:00
Simon Pilgrim
a85b86a982 [X86][AVX] Add support for i16 256-bit vector horizontal op redundant shuffle removal
llvm-svn: 337566
2018-07-20 15:51:01 +00:00
Simon Pilgrim
a2bc2d488c [X86][AVX] Add v16i16 horizontal op redundant shuffle tests
llvm-svn: 337565
2018-07-20 15:41:15 +00:00
Nirav Dave
25802ac9fd [DAG] Avoid Node Update assertion due to AND simplification
Check for construction-time folding for incomplete AND nodes in
BackwardsPropagateMask.

Fixes PR38185.

Reviewers: RKSimon, samparker

Reviewed By: samparker

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D49444

llvm-svn: 337563
2018-07-20 15:27:24 +00:00
Simon Pilgrim
7c56bce996 [X86][AVX] Add support for 32/64 bits 256-bit vector horizontal op redundant shuffle removal
llvm-svn: 337561
2018-07-20 15:24:12 +00:00
Nirav Dave
5a4e11ad9c [DAG] Fix Memory ordering check in ReduceLoadOpStore.
When merging through a TokenFactor we need to check that the
load may be ordered such that no other aliasing memory operations may
happen. It is not sufficient to just check that the load is a member
of the chain token factor as it there may be a indirect chain. Require
the load's chain has only one use.

This fixes PR37826.

Reviewers: spatel, davide, efriedma, craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49388

llvm-svn: 337560
2018-07-20 15:20:50 +00:00
Simon Pilgrim
8342126c4e [X86][AVX] Add 256-bit vector horizontal op redundant shuffle tests
llvm-svn: 337558
2018-07-20 15:07:53 +00:00
Simon Pilgrim
f907e19b5e Regenerate partial vector fold test. NFCI.
llvm-svn: 337551
2018-07-20 13:58:57 +00:00
Simon Pilgrim
cbf5af12b0 Regenerate remainder test.
llvm-svn: 337546
2018-07-20 13:14:29 +00:00
Ulrich Weigand
9dd23b8433 [SystemZ] Test case formatting fixes
Fix systematically wrong whitespace from a prior automated change.

NFC.

llvm-svn: 337542
2018-07-20 12:12:10 +00:00
Sam McCall
57743883f1 Revert "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reverts commit r337489.
It causes asserts to fire in some TensorFlow tests, e.g.
tensorflow/compiler/tests/gather_test.py on GPU.

Example stack trace:
Start test case: GatherTest.testHigherRank
assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits"
    @     0x5559446ebe10  __assert_fail
    @     0x55593ef32f5e  llvm::APInt::trunc()
    @     0x55593d78f86e  (anonymous namespace)::Vectorizer::lookThroughComplexAddresses()
    @     0x55593d78f2bc  (anonymous namespace)::Vectorizer::areConsecutivePointers()
    @     0x55593d78d128  (anonymous namespace)::Vectorizer::isConsecutiveAccess()
    @     0x55593d78c926  (anonymous namespace)::Vectorizer::vectorizeInstructions()
    @     0x55593d78c221  (anonymous namespace)::Vectorizer::vectorizeChains()
    @     0x55593d78b948  (anonymous namespace)::Vectorizer::run()
    @     0x55593d78b725  (anonymous namespace)::LoadStoreVectorizer::runOnFunction()
    @     0x55593edf4b17  llvm::FPPassManager::runOnFunction()
    @     0x55593edf4e55  llvm::FPPassManager::runOnModule()
    @     0x55593edf563c  (anonymous namespace)::MPPassManager::runOnModule()
    @     0x55593edf5137  llvm::legacy::PassManagerImpl::run()
    @     0x55593edf5b71  llvm::legacy::PassManager::run()
    @     0x55593ced250d  xla::gpu::IrDumpingPassManager::run()
    @     0x55593ced5033  xla::gpu::(anonymous namespace)::EmitModuleToPTX()
    @     0x55593ced40ba  xla::gpu::(anonymous namespace)::CompileModuleToPtx()
    @     0x55593ced33d0  xla::gpu::CompileToPtx()
    @     0x55593b26b2a2  xla::gpu::NVPTXCompiler::RunBackend()
    @     0x55593b21f973  xla::Service::BuildExecutable()
    @     0x555938f44e64  xla::LocalService::CompileExecutable()
    @     0x555938f30a85  xla::LocalClient::Compile()
    @     0x555938de3c29  tensorflow::XlaCompilationCache::BuildExecutable()
    @     0x555938de4e9e  tensorflow::XlaCompilationCache::CompileImpl()
    @     0x555938de3da5  tensorflow::XlaCompilationCache::Compile()
    @     0x555938c5d962  tensorflow::XlaLocalLaunchBase::Compute()
    @     0x555938c68151  tensorflow::XlaDevice::Compute()
    @     0x55593f389e1f  tensorflow::(anonymous namespace)::ExecutorState::Process()
    @     0x55593f38a625  tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()()
*** SIGABRT received by PID 7798 (TID 7837) from PID 7798; ***

llvm-svn: 337541
2018-07-20 12:03:00 +00:00
Jonas Paulsson
c88d3f6a99 [SystemZ] Reimplent SchedModel IssueWidth and WriteRes/ReadAdvance mappings.
As a consequence of recent discussions
(http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html), this patch
changes the SystemZ SchedModels so that the IssueWidth is 6, which is the
decoder capacity, and NumMicroOps become the number of decoder slots needed
per instruction.

In addition, the SchedWrite latencies now match the MachineInstructions
def-operand indexes, and ReadAdvances have been added on instructions with
one register operand and one memory operand.

Review: Ulrich Weigand
https://reviews.llvm.org/D47008

llvm-svn: 337538
2018-07-20 09:40:43 +00:00
Matt Arsenault
4bec7d4261 Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
Reverts r337079 with fix for msan error.

llvm-svn: 337535
2018-07-20 09:05:08 +00:00
Stephen Canon
34f5867310 Add x86_64-unkown triple to llc for x86 test.
llvm-svn: 337523
2018-07-20 03:50:55 +00:00