32115 Commits

Author SHA1 Message Date
Sjoerd Meijer
8af859d514 [MachineLoop] New helper isLoopInvariant()
This factors out code from MachineLICM that determines whether an instruction
is loop-invariant, which is a generally useful function. Thus this allows to
use that helper elsewhere too.

Differential Revision: https://reviews.llvm.org/D94082
2021-01-08 09:04:56 +00:00
Christudasan Devadasan
ae25a397e9 AMDGPU/GlobalISel: Enable sret demotion 2021-01-08 10:56:35 +05:30
Kazu Hirata
8febb2e0f5 [CodeGen] Remove unused function isCallerPreservedOrConstPhysReg (NFC)
The last use of the function was removed on Oct 20, 2018 in commit
8d6ff4c0af843e1a61b76d89812aed91e358de34.
2021-01-07 20:29:32 -08:00
Matt Arsenault
2cbbc6e87c GlobalISel: Fail legalization on narrowing extload below memory size 2021-01-07 17:40:34 -05:00
Matt Arsenault
1f9b6ef91f GlobalISel: Add combine for G_UREM by power of 2
Really I want this in the legalizer, but this is a start.
2021-01-07 16:36:35 -05:00
Wouter van Oortmerssen
5c38ae36c5 [WebAssembly] Fixed byval args missing DWARF DW_AT_LOCATION
A struct in C passed by value did not get debug information. Such values are currently
lowered to a Wasm local even in -O0 (not to an alloca like on other archs), which becomes
a Target Index operand (TI_LOCAL). The DWARF writing code was not emitting locations
in for TI's specifically if the location is a single range (not a list).

In addition, the ExplicitLocals pass which removes the ARGUMENT pseudo instructions did
not update the associated DBG_VALUEs, and couldn't even find these values since the code
assumed such instructions are adjacent, which is not the case here.

Also fixed asm printing of TIs needed by a test.

Differential Revision: https://reviews.llvm.org/D94140
2021-01-07 10:31:38 -08:00
Matt Arsenault
c9122ddef5 CodeGen: Refactor regallocator command line and target selection
Make the sequence of passes to select and rewrite instructions to
physical registers be a target callback. This is to prepare to allow
targets to split register allocation into multiple phases.
2021-01-07 13:13:25 -05:00
Simon Pilgrim
350ab7aa1c [DAG] Simplify OR(X,SHL(Y,BW/2)) eq/ne 0/-1 'all/any-of' style patterns
Attempt to simplify all/any-of style patterns that concatenate 2 smaller integers together into an and(x,y)/or(x,y) + icmp 0/-1 instead.

This is mainly to help some bool predicate reduction patterns where we end up concatenating bool vectors that have been bitcasted to integers.

Differential Revision: https://reviews.llvm.org/D93599
2021-01-07 12:03:19 +00:00
Sanjoy Das
a855c9403f [NFC] Don't copy MachineFrameInfo on each invocation of HasAlias
Also fix a typo in a comment.  This fixes a compile time issue in XLA
(https://www.tensorflow.org/xla).

Differential Revision: https://reviews.llvm.org/D94182
2021-01-06 18:59:20 -08:00
Kazu Hirata
cfeecdf7b6 [llvm] Use llvm::all_of (NFC) 2021-01-06 18:27:36 -08:00
Simon Pilgrim
1307e3f6c4 [TargetLowering] Add icmp ne/eq (srl (ctlz x), log2(bw)) vector support. 2021-01-06 16:13:51 +00:00
Sander de Smalen
aa280c99f7 [AArch64][SVE] Emit DWARF location expr for SVE (dbg.declare)
When using dbg.declare, the debug-info is generated from a list of
locals rather than through DBG_VALUE instructions in the MIR.
This patch is different from D90020 because it emits the DWARF
location expressions from that list of locals directly.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D90044
2021-01-06 11:45:05 +00:00
Sander de Smalen
84a1120943 [LiveDebugValues] Handle spill locations with a fixed and scalable component.
This patch fixes the two LiveDebugValues implementations
(InstrRef/VarLoc)Based to handle cases where the StackOffset contains
both a fixed and scalable component.

This depends on the `TargetRegisterInfo::prependOffsetExpression` being
added in D90020. Feel free to leave comments on that patch if you have them.

Reviewed By: djtodoro, jmorse

Differential Revision: https://reviews.llvm.org/D90046
2021-01-06 11:30:13 +00:00
Sander de Smalen
a7e3339f3b [AArch64][SVE] Emit DWARF location expression for SVE stack objects.
Extend PEI to emit a DWARF expression for StackOffsets that have
a fixed and scalable component. This means the expression that needs
to be added is either:
  <base> + offset
or:
  <base> + offset + scalable_offset * scalereg

where for SVE, the scale reg is the Vector Granule Dwarf register, which
encodes the number of 64bit 'granules' in an SVE vector and which
the debugger can evaluate at runtime.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D90020
2021-01-06 09:40:53 +00:00
Kazu Hirata
cea1c63756 [MachineSink] Construct SmallVector with iterator ranges (NFC) 2021-01-05 21:15:57 -08:00
Christudasan Devadasan
d68458bd56 [GlobalISel] Base implementation for sret demotion.
If the return values can't be lowered to registers
SelectionDAG performs the sret demotion. This patch
contains the basic implementation for the same in
the GlobalISel pipeline.

Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92953
2021-01-06 10:30:50 +05:30
David Blaikie
ad18b075fd DebugInfo: Add support for always using ranges (rather than low/high pc) in DWARFv5
Given the ability provided by DWARFv5 rnglists to reuse addresses in the
address pool, it can be advantageous to object file size to use range
encodings even when the range could be described by a direct low/high
pc.

Add a flag to allow enabling this in DWARFv5 for the purpose of
experimentation/data gathering.

It might be that it makes sense to enable this functionality by default
for DWARFv5 + Split DWARF at least, where the tradeoff/desire to
optimize for .o file size is more explicit and .o bytes are higher
priority than .dwo bytes.
2021-01-05 16:36:22 -08:00
Craig Topper
4ef91f5871 [DAGCombiner] Don't speculatively create an all ones constant in visitREM that might not be used.
This looks to have been done to save some duplicated code under
two different if statements, but it ends up being harmful to D94073.
This speculative constant can be called on a scalable vector type
with i64 element size when i64 scalars aren't legal. The code tries
and fails to find a vector type with i32 elements that it can use.

So only create the node when we know it will be used.
2021-01-05 12:45:57 -08:00
Matt Arsenault
a427f15d60 GlobalISel: Add isKnownToBeAPowerOfTwo helper function 2021-01-05 12:59:08 -05:00
Jinsong Ji
f26bc0ddd5 [RegisterClassInfo] Return non-zero for RC without allocatable reg
In some case, the RC may have 0 allocatable reg.
eg: VRSAVERC in PowerPC, which has only 1 reg, but it is also reserved.

The curreent implementation will keep calling the computePSetLimit because
getRegPressureSetLimit assume computePSetLimit will return a non-zero value.

The fix simply early return the value from TableGen for such special case.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92907
2021-01-05 16:18:34 +00:00
Fraser Cormack
9a1ac97d3a [CodeGen] Format SelectionDAG::getConstant methods (NFC) 2021-01-05 12:59:46 +00:00
QingShan Zhang
2962f1149c [NFC] Add the getSizeInBytes() interface for MachineConstantPoolValue
Current implementation assumes that, each MachineConstantPoolValue takes
up sizeof(MachineConstantPoolValue::Ty) bytes. For PowerPC, we want to
lump all the constants with the same type as one MachineConstantPoolValue
to save the cost that calculate the TOC entry for each const. So, we need
to extend the MachineConstantPoolValue that break this assumption.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89108
2021-01-05 03:22:45 +00:00
Roman Lebedev
b4f519bddd
[NFCI] DwarfEHPrepare: update DomTree in non-permissive mode, when present
Being stricter will catch issues that would be just papered over
in permissive mode, and is likely faster.
2021-01-05 01:26:36 +03:00
Cameron McInally
92be640bd7 [FPEnv][AMDGPU] Disable FSUB(-0,X)->FNEG(X) DAGCombine when subnormals are flushed
This patch disables the FSUB(-0,X)->FNEG(X) DAG combine when we're flushing subnormals. It requires updating the existing AMDGPU tests to use the fneg IR instruction, in place of the old fsub(-0,X) canonical form, since AMDGPU is the only backend currently checking the DenormalMode flags.

Note that this will require follow-up optimizations to make sure the FSUB(-0,X) form is handled appropriately

Differential Revision: https://reviews.llvm.org/D93243
2021-01-04 14:44:10 -06:00
Kazu Hirata
eb198f4c3c [llvm] Use llvm::any_of (NFC) 2021-01-04 11:42:47 -08:00
Florian Hahn
ed936aad78
[InterleavedAccess] Return correct 'modified' status.
Both tryReplaceExtracts and replaceBinOpShuffles may modify the IR, even
if no interleaved loads are generated, but currently the pass pretends
no changes were made.

This patch updates the pass to return true if either of the functions
made any changes. In case of tryReplaceExtracts, changes are made if
there are any Extracts and true is returned.

`replaceBinOpShuffles` always makes changes if BinOpShuffles is not empty.
It also always returned true, so I went ahead and change it to just
`replaceBinOpShuffles`.

Fixes PR48208.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D93997
2021-01-04 15:49:47 +00:00
Kazu Hirata
ba82c0b315 [llvm] Call *(Set|Map)::erase directly (NFC)
We can erase an item in a set or map without checking its membership
first.
2021-01-03 09:57:47 -08:00
Brandon Bergren
8f004471c2 [PowerPC] Add the LLVM triple for powerpcle [1/5]
Add a triple for powerpcle-*-*.

This is a little-endian encoding of the 32-bit PowerPC ABI, useful in certain niche situations:

1) A loader such as the FreeBSD loader which will be loading a little endian kernel. This is required for PowerPC64LE to load properly in pseries VMs.
Such a loader is implemented as a freestanding ELF32 LSB binary.

2) Userspace emulation of a 32-bit LE architecture such as x86 on 64-bit hosts such as PowerPC64LE with tools like box86 requires having a 32-bit LE toolchain and library set, as they operate by translating only the main binary and switching to native code when making library calls.

3) The Void Linux for PowerPC project is experimenting with running an entire powerpcle userland.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D93918
2021-01-02 12:17:22 -06:00
Kazu Hirata
171c5fd43e [llvm] Use llvm::erase_value and llvm::erase_if (NFC) 2021-01-02 09:24:15 -08:00
Roman Lebedev
f4ea21947d
[NFCI][CodeGen] DwarfEHPrepare: don't actually pass DTU into simplifyCFG by default
also, don't verify DomTree unless we intend to maintain it.
This is a very dumb think-o, i guess i was even warned about it
by subconsciousness in 4b80647367950ba3da6a08260487fd0dbc50a9c5's
commit message..

Fixes a compile-time regression reported by Martin Storsjö
in post-commit review of 2461cdb41724298591133c811df82b0064adfa6b.
2021-01-02 14:38:52 +03:00
Yang Fan
471dec3801
[CodeGen][NFC] Fix a build warning due to an extra semicolon 2021-01-02 10:42:58 +08:00
Roman Lebedev
2461cdb417
[CodeGen][SimplifyCFG] Teach DwarfEHPrepare to preserve DomTree
Once the default for SimplifyCFG flips, we can no longer pass nullptr
instead of DomTree to SimplifyCFG, so we need to propagate it here.

We don't strictly need to actually preserve DomTree in DwarfEHPrepare,
but we might as well do it, since it's trivial.
2021-01-02 01:01:19 +03:00
Roman Lebedev
e6b1a27fb9
[NFC][CodeGen] Split DwarfEHPrepare pass into an actual transform and an legacy-PM wrapper
This is consistent with the layout of other passes,
and simplifies further refinements regarding DomTree handling.

This is indended to be a NFC commit.
2021-01-02 01:01:19 +03:00
Roman Lebedev
c38739ad8f
[NFC] clang-format the entire DwarfEHPrepare.cpp 2021-01-02 01:01:19 +03:00
Sanjay Patel
c74e8539ff [Analysis] flatten enums for recurrence types
This is almost all mechanical search-and-replace and
no-functional-change-intended (NFC). Having a single
enum makes it easier to match/reason about the
reduction cases.

The goal is to remove `Opcode` from reduction matching
code in the vectorizers because that makes it harder to
adapt the code to handle intrinsics.

The code in RecurrenceDescriptor::AddReductionVar() is
the only place that required closer inspection. It uses
a RecurrenceDescriptor and a second InstDesc to sometimes
overwrite part of the struct. It seem like we should be
able to simplify that logic, but it's not clear exactly
which cmp+sel patterns that we are trying to handle/avoid.
2021-01-01 12:20:16 -05:00
Fangrui Song
d1fd72343c Refactor how -fno-semantic-interposition sets dso_local on default visibility external linkage definitions
The idea is that the CC1 default for ELF should set dso_local on default
visibility external linkage definitions in the default -mrelocation-model pic
mode (-fpic/-fPIC) to match COFF/Mach-O and make output IR similar.

The refactoring is made available by 2820a2ca3a0e69c3f301845420e0067ffff2251b.

Currently only x86 supports local aliases. We move the decision to the driver.
There are three CC1 states:

* -fsemantic-interposition: make some linkages interposable and make default visibility external linkage definitions dso_preemptable.
* (default): selected if the target supports .Lfoo$local: make default visibility external linkage definitions dso_local
* -fhalf-no-semantic-interposition: if neither option is set or the target does not support .Lfoo$local: like -fno-semantic-interposition but local aliases are not used. So references can be interposed if not optimized out.

Add -fhalf-no-semantic-interposition to a few tests using the half-based semantic interposition behavior.
2020-12-31 13:59:45 -08:00
Juneyoung Lee
5cdf6ed744 [CodeGen] recognize select form of and/ors when splitting branch conditions
Recently a few patches are made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a || b" in C/C++.
"a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a.
This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353).
In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison.
The discussion at D93065 has more context about this.

This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize
select form of and/or as well using m_LogicalAnd/Or.
Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93853
2021-01-01 04:46:10 +09:00
Kazu Hirata
7bc76fd0ec [CodeGen] Construct SmallVector with iterator ranges (NFC) 2020-12-31 09:39:11 -08:00
Fangrui Song
f731839584 [LowerEmuTls] Copy dso_local from <var> to __emutls_v.<var>
This effect is not testable until we drop the implied dso_local for ELF
static/PIE defined symbols from TargetMachine::shouldAssumeDSOLocal.
2020-12-30 16:11:32 -08:00
Juneyoung Lee
9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Luo, Yuanke
981a0bd858 [X86] Add x86_amx type for intel AMX.
The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instruction. So amx intrinsics only operate on type x86_amx.
It can help to separate amx intrinsics from llvm IR instructions (+-*/).
Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981.

Differential Revision: https://reviews.llvm.org/D91927
2020-12-30 13:52:13 +08:00
Yuanfang Chen
480936e741 Reland "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline" (again)
This reverts commit 16c8f6e91344ec9840d6aa9ec6b8d0c87a104ca3 with fix.

-Wswitch catched an unhandled enum value due to recent commits in
TargetPassConfig.cpp.
2020-12-29 16:39:55 -08:00
Yuanfang Chen
16c8f6e913 Revert "Reland "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline""
This reverts commit 21314940c4856e0cb81b664fd2d2117d1b7dc3e3.

Build failure in some bots.
2020-12-29 16:29:07 -08:00
Yuanfang Chen
21314940c4 Reland "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline"
This reverts commit 94427af60c66ffea655a3084825c6c3a9deec1ad (relands
4646de5d75cfce3da4ddeffb6eb8e66e38238800 with fix).

Use "return std::move(AsmStreamer);" instead of "return AsmStreamer;" in
LVMTargetMachine::createMCStreamer. Unlike Clang, GCC seems having trouble
inserting a implicit lvalue->rvalue conversion.
2020-12-29 15:17:23 -08:00
Kazu Hirata
1e3ed09165 [CodeGen] Use llvm::append_range (NFC) 2020-12-28 19:55:16 -08:00
Yuanfang Chen
94427af60c Revert "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline"
This reverts commit 4646de5d75cfce3da4ddeffb6eb8e66e38238800.

Some bots have build failure.
2020-12-28 17:44:22 -08:00
Yuanfang Chen
4646de5d75 [NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline
Following up on D67687.
Please refer to the RFC here http://lists.llvm.org/pipermail/llvm-dev/2020-July/143309.html

`CodeGenPassBuilder` is the NPM counterpart of `TargetPassConfig` with below differences.
- Debugging features (MIR print/verify, disable pass, start/stop-before/after, etc.) living in `TargetPassConfig` are moved to use PassInstrument as much as possible. (Implementation also lives in `TargetPassConfig.cpp`)
- `TargetPassConfig` is a polymorphic base (virtual inheritance) to build the target-dependent pipeline whereas `CodeGenPassBuilder` is the CRTP base/helper to implement the target-dependent pipeline. The motivation is flexibility for targets to customize the pipeline, inlining opportunity, and fits the overall NPM value semantics design.
- `TargetPassConfig` is a legacy immutable pass to declare hooks for targets to customize some target-independent codegen layer behavior. This is partially ported to TargetMachine::options. The rest, such as `createMachineScheduler/createPostMachineScheduler`, are left out for now. They should be implemented in LLVMTargetMachine in the future.

Reviewed By: arsenm, aeubanks

Differential Revision: https://reviews.llvm.org/D83608
2020-12-28 17:36:36 -08:00
Gabriel Hjort Åkerlund
b9a7c89d43 [MIRPrinter] Fix incorrect output of unnamed stack names
The MIRParser expects unnamed stack entries to have empty names ('').
In case of unnamed alloca instructions, the MIRPrinter would output
'<unnamed alloca>', which caused the MIRParser to reject the generated
code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93685
2020-12-28 18:01:40 +01:00
Chen Zheng
31c2b93d83 [MachineSink] add threshold in machinesink pass to reduce compiling time. 2020-12-27 23:23:07 -05:00
Kazu Hirata
789d250613 [CodeGen, Transforms] Use *Map::lookup (NFC) 2020-12-27 09:57:27 -08:00