36225 Commits

Author SHA1 Message Date
Carl Ritson
057934a6d7 [AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc
SIPreAllocateWWMRegs was being inserted after RegisterCoalescer
but this pass does not exist during FastAlloc so pre-allocation
pass was never being run.
Insert pre-allocation after TwoAddressInstructionPass instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90236
2020-10-28 12:15:15 +09:00
Nemanja Ivanovic
5459d08795 [PowerPC] Fix single-use check and update chain users for ld-splat
When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load
as of 1461fb6e783cb946b061f66689b419f74f7fad63, we inaccurately check
for a single user of the load and neglect to update the users of the
output chain of the original load. As a result, we can emit a new
load when the original load is kept and the new load can be reordered
after a dependent store. This patch fixes those two issues.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47891
2020-10-27 16:49:38 -05:00
Victor Huang
2e1a737f46 [PowerPC][PCRelative] Turn on TLS support for PCRel by default
Turn on TLS support for PCRel by default and update the test cases.

Differential Revision: https://reviews.llvm.org/D88738
Reviewed by: stefanp, kamaub
2020-10-27 13:58:44 -05:00
Michael Liao
46c3d5cb05 [amdgpu] Add the late codegen preparation pass.
Summary:
- Teach that pass to widen naturally aligned but not DWORD aligned
  sub-DWORD loads.

Reviewers: rampitec, arsenm

Subscribers:

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80364
2020-10-27 14:07:59 -04:00
Simon Pilgrim
64d3ed304f [X86] Regenerate scalar fptosi/fptoui tests. NFCI.
Merge prefixes where possible, use 'X86' instead of 'X32' (which we try to only use for gnux32 triple tests).
2020-10-27 17:44:30 +00:00
Simon Pilgrim
86bec79b15 [X86] Regenerate xor tests. NFCI.
Merge prefixes where possible, use 'X86' instead of 'X32' (which we try to only use for gnux32 triple tests).
2020-10-27 16:45:47 +00:00
Simon Pilgrim
4036551ae4 [X86] Regenerate tbm intrinsics tests. NFCI.
Merge prefixes where possible, use 'X86' instead of 'X32' (which we try to only use for gnux32 triple tests).
2020-10-27 16:45:47 +00:00
Simon Pilgrim
e0c06e310c [X86] Regenerate popcnt tests. NFCI.
Merge prefixes where possible, use 'X86' instead of 'X32' (which we try to only use for gnux32 triple tests).
2020-10-27 16:45:46 +00:00
Simon Pilgrim
fae1ffceae [X86] Regenerate xop tests with common prefixes. 2020-10-27 16:45:46 +00:00
Jay Foad
d028d2b376 [AMDGPU] Add llvm.amdgcn.div.scale with fneg tests 2020-10-27 16:05:51 +00:00
Florian Hahn
548772fe69 [AArch64] Add additional tests for vector inserts with common element. 2020-10-27 14:58:56 +00:00
Michael Liao
0d092303b4 [amdgpu] Enable use of AA during codegen.
- Add an internal option `-amdgpu-use-aa-in-codegen` to enable or
  disable this feature. By Default, it's enabled.

Differential Revision: https://reviews.llvm.org/D89320
2020-10-27 09:46:23 -04:00
Benjamin Kramer
35f7cbf9df [X86] Don't crash on CVTPS2PH with wide vector inputs. 2020-10-27 14:42:02 +01:00
Simon Pilgrim
3bc5824181 [X86] Regenerate all-ones vector tests with common prefixes. 2020-10-27 13:41:27 +00:00
Simon Pilgrim
5a855551cb [X86] Regenerate vector shift tests. NFCI.
Merge prefixes where possible, use 'X86' instead of 'X32' (which we try to only use for gnux32 triple tests).
2020-10-27 13:14:54 +00:00
Sven van Haastregt
5d03080092 [TargetLowering] Add i1 condition for bit comparison fold
For i1 types, boolean false is represented identically regardless of
the boolean content, so we can allow optimizations that otherwise
would not be correct for booleans with false represented as a negative
one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D90145
2020-10-27 12:22:20 +00:00
David Green
ad299364a7 [ARM][AArch64] Add VLDN shuffled interleaving tests. NFC 2020-10-27 09:27:32 +00:00
Craig Topper
f385823e04 [X86] Alternate implementation of D88194.
This uses PreprocessISelDAG to replace the constant before
instruction selection instead of matching opcodes after.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D89178
2020-10-27 00:20:03 -07:00
Wei Wang
d602e79a81 [X86] Encode global address in small code model
In small code model, program and its symbols are linked in the lower 2 GB of
the address space. Try encoding global address even when the range is unknown
in such case.

Differential Revision: https://reviews.llvm.org/D89341
2020-10-26 23:14:06 -07:00
Carl Ritson
7a880ab388 [AMDGPU] Move WQM Pass after MI Scheduler
Exec mask manipulation inserted by SIWholeQuadMode barriers to
instruction scheduling.  Move the entire pass after the machine
instruction scheduler and make changes so pass is correct for
non-SSA operation.  These changes should leave the pass still
usable pre-scheduler, although tests have be updated to reflect
post-scheduler results.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D88081
2020-10-27 10:25:53 +09:00
Amy Kwan
803cc3aff2 [PowerPC] Implement Set Boolean Condition Instructions
This patch implements the set boolean condition instructions introduced in
POWER10.

The set boolean condition instructions (set[n]bc[r]) are used during
the following situations:
- sign/zero/any extending i1 to an i32 or i64,
- reg+reg, reg+imm or floating point comparisons being sign/zero extended to i32 or i64,
- spilling CR bits (using the setnbc instruction)

Differential Revision: https://reviews.llvm.org/D87705
2020-10-26 18:42:51 -05:00
Rahman Lavaee
0b2f4cdf2b Explicitly check for entry basic block, rather than relying on MachineBasicBlock::pred_empty.
Sometimes in unoptimized code, we have dangling unreachable basic blocks with no predecessors. Basic block sections should be emitted for those as well. Without this patch, the included test fails with a fatal error in `AsmPrinter::emitBasicBlockEnd`.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D89423
2020-10-26 16:15:56 -07:00
Stanislav Mekhanoshin
038d884a50 [AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least missing:

- GlobalISel;
- Some optimizations in frame elimination in between vector
  and scalar ALU;
- It shall finally allow to always materialize frame index
  as an SGPR, but that is not implemented and frame elimination
  cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
  is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
  expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170
2020-10-26 14:40:42 -07:00
Florian Hahn
89485efc26 [AArch64] Extend tests for insertelement improvements.
Extends the tests added in a562dc82a8d9488d35ff535302716141bc6feaa3 to
cover more vector variants.
2020-10-26 17:57:12 +00:00
Peter Waller
5b742a0c10 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in sve-redundant-store.ll that the warning is not triggered.

Differential Revision: https://reviews.llvm.org/D89701
2020-10-26 16:37:48 +00:00
Peter Waller
6536d6040f Revert "[SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination"
This reverts commit 4604441386dc5fcd3165f4b39f5fa2e2c600f1bc.

Reverting because it was not the intended version of the patch, which
follows this patch.
2020-10-26 16:37:00 +00:00
Peter Waller
4604441386 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in Redundantstores.ll that the warning is not triggered.
2020-10-26 16:23:42 +00:00
Florian Hahn
a562dc82a8 [AArch64] Add 2 cases where insertelement lowering could be improved. 2020-10-26 15:37:17 +00:00
Simon Pilgrim
2030db328a [X86] Use mtriple instead of march in MIR tests 2020-10-26 15:31:33 +00:00
Kazushi (Jam) Marukawa
cfefef50c1 [VE] Support atomic store
Support atomic store instructions and add a regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90137
2020-10-27 00:28:11 +09:00
Fraser Cormack
ffa6d2afa4 [DAGCombine] Add test case showing incorrect DAGCombine optimization
This optmization produces incorrect results when the vector element type
is not byte-sized. Related to D78568.
2020-10-26 12:37:31 +00:00
Tyker
4afa077899 Try to fix buildbots after d3205bbca3e0002d76282878986993e7e7994779 2020-10-26 11:49:21 +01:00
Tyker
d3205bbca3 [Annotation] Allows annotation to carry some additional constant arguments.
This allows using annotation in a much more contexts than it currently has.
especially when annotation with template or constexpr.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D88645
2020-10-26 10:50:05 +01:00
Florian Hahn
b2bec7cece [AsmPrinter] Add per BB instruction mix remark.
This patch adds a remarks that provides counts for each opcode per basic block.

An snippet of the generated information can be seen below.

The current implementation uses the target specific opcode for the counts. For example, on AArch64 this means we currently get 2 entries for `add` instructions if the block contains 32 and 64 bit adds. Similarly, immediate version are treated differently.

Unfortunately there seems to be no convenient way to get only the mnemonic part of the instruction as a string AFAIK. This could be improved in the future.

```
--- !Analysis
Pass:            asm-printer
Name:            InstructionMix
DebugLoc:        { File: arm64-instruction-mix-remarks.ll, Line: 30, Column: 30 }
Function:        foo
Args:
  - String:          'BasicBlock: '
  - BasicBlock:      else
  - String:          "\n"
  - String:          INST_MADDWrrr
  - String:          ': '
  - INST_MADDWrrr:   '2'
  - String:          "\n"
  - String:          INST_MOVZWi
  - String:          ': '
  - INST_MOVZWi:     '1'
```

Reviewed By: anemet, thegameg, paquette

Differential Revision: https://reviews.llvm.org/D89892
2020-10-26 09:25:45 +00:00
Sebastian Neubauer
a094b4fa4b [AMDGPU] Emit new pal metadata by default
If no pal metadata is given, default to the msgpack format instead of
the legacy metadata. This makes tests better readable.

Differential Revision: https://reviews.llvm.org/D90035
2020-10-26 10:16:17 +01:00
Kai Luo
82150dae86 [PowerPC] Add test case for pr47830. NFC. 2020-10-26 09:11:33 +00:00
Kazushi (Jam) Marukawa
f32992ad24 [VE] Support atomic load
Support atomic load instruction and add a regression test.
VE uses release consitency, so need to insert fence around
atomic instructions.  This patch enable AtomicExpandPass
and use emitLeadingFence and emitTrailingFence mechanism
for such purpose.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90135
2020-10-26 18:02:45 +09:00
Kazushi (Jam) Marukawa
52f03fe115 [VE] Support atomic fence
Support atomic fence instruction and add a regression test.
Add MEMBARRIER pseudo insturction also to use it as a barrier
against to the compiler optimizations.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90112
2020-10-26 17:03:09 +09:00
Christudasan Devadasan
5a061041ec [AMDGPU] Avoid offset register in MUBUF for direct stack object accesses
We use an absolute address for stack objects and
it would be necessary to have a constant 0 for soffset field.

Fixes: SWDEV-228562

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89234
2020-10-26 11:08:37 +05:30
Benjamin Kramer
39a0d6889d [X86] Add a stub for Intel's alderlake.
No scheduling, no autodetection.
2020-10-24 19:01:22 +02:00
Benjamin Kramer
bd2cf96c09 [X86] Add a stub for znver3 based on the little public information there is in AMD's manuals
No scheduling, no autodetection. Just enough so -march=znver3 works.
2020-10-24 19:01:22 +02:00
Simon Pilgrim
62b17a7697 [LegalizeTypes] Legalize vector rotate operations
Lower vector rotate operations as long as the legalization occurs outside of LegalizeVectorOps.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47320

Patch By: @rsanthir.quic (Ryan Santhirarajan)

Differential Revision: https://reviews.llvm.org/D89497
2020-10-24 11:30:32 +01:00
Jonas Paulsson
7c026a83ee [SystemZ] Define MaxInstLength to have the value of 6.
This value had the default value of 4 which caused branch relaxation to fail.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D90065
2020-10-24 09:19:34 +02:00
Krzysztof Parzyszek
1b5baa42bc [Hexagon] Handle selection between HVX vector predicates
Make sure that (select i1 q0 q1) is handled properly.
2020-10-23 18:22:03 -05:00
Cameron McInally
a1cc274cb3 [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
2020-10-23 16:24:02 -05:00
Nick Desaulniers
b7926ce6d7 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining.  Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
Baptiste Saleil
edb27912a3 [PowerPC] Add intrinsics for MMA
This patch adds support for MMA intrinsics.

Authored by: Baptiste Saleil

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D89345
2020-10-23 13:16:02 -05:00
Amara Emerson
0f0fd383b4 [AArch64][GlobalISel] Introduce a new post-isel optimization pass.
There are two optimizations here:

1. Consider the following code:
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
 %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
This kind of code where we have 2 FCMPs each feeding a CSEL can happen
when we have a single IR fcmp being used by two selects. During selection,
to ensure that there can be no clobbering of nzcv between the fcmp and the
csel, we have to generate an fcmp immediately before each csel is
selected.

However, often we can essentially CSE these together later in MachineCSE.
This doesn't work though if there are unrelated flag-setting instructions
in between the two FCMPs. In this case, the SUBS defines NZCV
but it doesn't have any users, being overwritten by the second FCMP.

Our solution here is to try to convert flag setting operations between
a interval of identical FCMPs, so that CSE will be able to eliminate one.

2. SelectionDAG imported patterns for arithmetic ops currently select the
flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand
to those instructions. However if those impdef operands are not marked as
dead, the peephole optimizations are not able to optimize them into non-flag
setting variants. The optimization here is to find these dead imp-defs and
mark them as such.

This pass is only enabled when optimizations are enabled.

Differential Revision: https://reviews.llvm.org/D89415
2020-10-23 10:18:36 -07:00
Huihui Zhang
1e113c078a [AArch64][SVE] Fix umin/umax lowering to handle out of range imm.
Immediate must be in an integer range [0,255] for umin/umax instruction.
Extend pattern matching helper SelectSVEArithImm() to take in value type
bitwidth when checking immediate value is in range or not.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89831
2020-10-23 09:42:56 -07:00
Victor Huang
7a74bb899a [PowerPC] Fix the Predicates for enabling pcrelative-memops and PLXVP/PSTXVP definitions
In this patch, Predicates fix added for the following:
* disable prefix-instrs will disable pcrelative-memops
* set two predicates PairedVectorMemops and PrefixInstrs for PLXVP/PSTXVP definitions

Differential Revision: https://reviews.llvm.org/D89727
Reviewed by: amyk, steven.zhang
2020-10-23 11:33:20 -05:00