32807 Commits

Author SHA1 Message Date
Paul Walker
96c8d615d6 [SVE] Extend findMoreOptimalIndexType so BUILD_VECTORs do not force 64bit indices.
Extends findMoreOptimalIndexType to allow ISD::BUILD_VECTOR based
indices to be truncated when such truncation is lossless. This can
enable the use of 32bit gather/scatter indices thus making it less
likely to have to split a gather/scatter in two.

Depends on D125194

Differential Revision: https://reviews.llvm.org/D130533
2022-08-18 18:00:53 +01:00
Simon Pilgrim
fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
wanglian
989ebc1783 [DAGCombiner][NFC] Tidy up unnecessary brackets in visitADD.
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D132107
2022-08-18 15:48:22 +08:00
wanglian
230e277dfe [DAGCombiner][NFC] Merge two if statement into one.
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131941
2022-08-18 10:12:35 +08:00
Daniil Fukalov
7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.

E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
Sanjay Patel
7f72a0f5bb [SDAG] avoid generating libcall to function with same name
This is a potentially better alternative to D131452 that also
should avoid the infinite loop bug from:
issue #56403

This is again a minimal fix to reduce merging pain for the
release. But if this makes sense, then we might want to guard
all of the RTLIB generation (and other libcalls?) with a
similar name check.

Differential Revision: https://reviews.llvm.org/D131521
2022-08-17 16:19:34 -04:00
Matthias Braun
19ce5e515f RAGreedyStats: Ignore identity COPYs; count COPYs from/to physregs
Improve copy statistics:

- Count copies from or to physical registers: They are used to model function parameters and calling conventions and the register allocator optimizes for them.
- Check physical registers assigned to virtual registers and stop counting "identity" `COPY`s where source and destination is the same physical registers; they will be removed in the `virtregmap` pass anyway.

Differential Revision: https://reviews.llvm.org/D131932
2022-08-17 12:53:29 -07:00
Archit Saxena
e170d955fe Split EH code by default
The current machine function splitter is reliant on profile data to do profile summary analysis to split blocks into cold section. This may sometimes limit the usage of machine function splitter especially in cases where we could do some form of static analysis to split out cold blocks if profile data is absent or profile data which may be faulty (Consider Sample PGO).

Of all code that could statically be marked cold Exception handling blocks are one of them (In fact BFI framework also tends to mark them as cold), and the most in size contribution. In my experiments I found out Exception handling pads and all code reachable from there account for up to 6-8% of the .text section on modern production binaries. This patch introduces a flag to split out all Exception handling blocks and blocks only reachable from Exceptional Handling pad to cold section. This flag has shown to give a performance win of up to 0.1% in terms of average cycles and instructions executed on internal facebook search service.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D131824
2022-08-17 12:40:31 -07:00
Nick Desaulniers
6b0e2fa6f0 [SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses
As part of re-architecting callbr to no longer use blockaddresses
(https://reviews.llvm.org/D129288), we don't really need them in MIR.
They make comparing MachineBasicBlocks of indirect targets during
MachineVerifier a PITA.

Suggested by @efriedma from the discussion:
https://reviews.llvm.org/D130290#3669531

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D130316
2022-08-17 09:34:31 -07:00
David Penry
1c9f0408bc Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules"
This reverts commit 8c4aea438c310816bb4e4f9a32d783381ef3182e.

Needed because buildbot failures (warnings) gave a clue that there was
a functional bug in the ARM rejection logic.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132037
2022-08-17 09:32:43 -07:00
David Penry
8c4aea438c [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D128941
2022-08-17 08:13:26 -07:00
Andre Vieira
49223e0a2d [TypePromotion] Don't promote PHI + ZExt if wider than RegisterBitWidth
Differential Revision: https://reviews.llvm.org/D131966
2022-08-17 09:54:15 +01:00
Eli Friedman
cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00
Nicolas Miller
ccfabfbb1f Fix subrange liveness checking at rematerialization
This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead.

The code to check whether an instruction can be rematerialized at a given point or not was already checking for subranges to ensure that subregisters are live, but only when the instruction being moved was using a subregister, this patch changes that so the subranges are checked even when the moved instruction uses the full register.

This patch also adds a case to the original test for the subrange checking that trigger the issue described above.

The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278

And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209

Essentially the greedy register allocator attempts to move the following instruction:

```
%3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec
```

From `@3440` into the body of a loop `@16312`, but `%3078` has the following live ranges:

```
%3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r  L0000000000000003 [2224r,3440r:0) 0@2224r  L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r
```

So `@16312e` `%3078.sub1` is alive but `%3078.sub0` is dead, so this instruction being moved there leads to invalid memory accesses as `3078.sub0` ends up being trashed and the result of this instruction is used as part of an address calculation for a load.

On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908, this turned out to be because on gfx908 instead of moving the shift instruction into the loop, its value is spilled into an ACC register, gfx906 doesn't have ACC registers and for gfx90a ACC registers are used like regular vector registers and so aren't used for spilling.

With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of moving the instruction in the loop.

Original Author: npmiller

Reviewed by: rampitec

Submitted by: rampitec

Differential Revision: https://reviews.llvm.org/D131884
2022-08-16 10:50:09 -07:00
Arthur Eubanks
9181ce623f [Windows] Put init_seg(compiler/lib) in llvm.global_ctors
Currently we treat initializers with init_seg(compiler/lib) as similar
to any other init_seg, they simply have a global variable in the proper
section (".CRT$XCC" for compiler/".CRT$XCL" for lib) and are added to
llvm.used. However, this doesn't match with how LLVM sees normal (or
init_seg(user)) initializers via llvm.global_ctors. This
causes issues like incorrect init_seg(compiler) vs init_seg(user)
ordering due to GlobalOpt evaluating constructors, and the
ability to remove init_seg(compiler/lib) initializers at all.

Currently we use 'A' for priorities less than 200. Use 200 for
init_seg(compiler) (".CRT$XCC") and 400 for init_seg(lib) (".CRT$XCL"),
which do not append the priority to the section name. Priorities
between 200 and 400 use ".CRT$XCC${Priority}". This allows for
some wiggle room for people/future extensions that want to add
initializers between compiler and lib.

Fixes #56922

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D131910
2022-08-16 08:16:18 -07:00
Steve Merritt
ec60fca752 [CodeView] Use non-qualified names for static local variables
Static variables declared within a routine or lexical block should
be emitted with a non-qualified name.  This allows the variables to
be visible to the Visual Studio watch window.

Differential Revision: https://reviews.llvm.org/D131400
2022-08-16 10:33:43 -04:00
Andre Vieira
c6b5a13b7a [TypePromotion] Only search for PHI + ZExt promotion of Integers
Differential Revision: https://reviews.llvm.org/D131948
2022-08-16 10:15:32 +01:00
wanglian
fbc4c26e9a [SelectionDAG][NFC] Fix return type when used isConstantIntBuildVectorOrConstantInt
and isConstantFPBuildVectorOrConstantFP

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131870
2022-08-16 10:07:24 +08:00
Rahman Lavaee
df2213f345 [EHStreamer] Omit @LPStart when function has no landing pads
When no landing pads exist for a function, `@LPStart` is undefined and must be omitted.

EH table is generally not emitted for functions without landing pads, except when the personality function is uknown (`!isNoOpWithoutInvoke(classifyEHPersonality(Per))`). In that case, we must omit `@LPStart` even when machine function splitting is enabled.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131626
2022-08-15 17:09:46 -07:00
Aiden Grossman
24cdf97d63 [mlgo] Add ability to create feature-gated development features in regalloc advisor
Currently there is no way to add in development features to the ML
regalloc evict advisor which is useful to have when working on feature
engineering/improving the current model. This patch adds in the ability
to add in development features to the ML regalloc evict advisor which
are gated by a runtime flag and not added in at all if not compiled in
LLVM development mode. This sets the stage for future work where we are
planning on upstreaming some of the newer features that we are currently
experimenting with.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D131209
2022-08-15 16:01:37 -07:00
David Green
dfc95bab07 [DAG] Ensure more Legal BUILD_VECTOR elements types in shuffle->And combine
This is a followup to D131350, which caused another problem for i64
types being split into i32 on i32 targets. This patch tries to make sure
that either Illegal types are OK, or that the element types of a
buildvector are legal and bigger than or equal to the size of the
original elements.

Differential Revision: https://reviews.llvm.org/D131883
2022-08-15 14:41:45 +01:00
Luo, Yuanke
853bb192c4 Revert "(Reland) [fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 30f9e6ebd30b79d13f99eaca4d829e0da07186b3.
2022-08-15 20:33:15 +08:00
Ayke van Laethem
de48717fcf
[AVR] Support unaligned store
This patch really just extends D39946 towards stores as well as loads.
While the patch is in SelectionDAGBuilder, it only applies to AVR (the
only target that supports unaligned atomic operations).

Differential Revision: https://reviews.llvm.org/D128483
2022-08-15 14:29:37 +02:00
Simon Pilgrim
3a73133217 [DAG] canCreateUndefOrPoison - add freeze(sign_extend_inreg(x,vt)) -> sign_extend_inreg(freeze(x),vt) support
Guaranteed not to create undef/poison
2022-08-15 12:18:59 +01:00
Peter Waller
6e85db7293 [DAGCombine] Combine signext_inreg of extract-extend
The outer signext_inreg is redundant in the following:

  Fold (signext_inreg (extract_subvector (zext|anyext|sext iN_value to _) _) from iN)
       -> (extract_subvector (signext iN_value to iM))

Tests are precommitted and clone those by analogy from the AND case in
the same file. Add a negative test to check extension width is handled
correctly.

This patch supersedes D130700.

Differential Revision: https://reviews.llvm.org/D131503
2022-08-15 10:58:07 +00:00
Simon Pilgrim
7e294e676e [DAG] canCreateUndefOrPoison - add freeze(assertsext/zext(x,bt)) -> assertsext/zext(freeze(x),vt) support
These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison
2022-08-15 11:13:43 +01:00
Kazu Hirata
f5a68feab3 Use llvm::none_of (NFC) 2022-08-14 16:25:39 -07:00
Simon Pilgrim
e2d13fd096 [DAG] canCreateUndefOrPoison - add freeze(shl(x,y)) -> shl(freeze(x),y) support
These are guaranteed not to create undef/poison if the shift amount is known to be in range
2022-08-14 14:38:10 +01:00
Simon Pilgrim
a621d38bcb [DAG] canCreateUndefOrPoison - add freeze(and/or/xor(x,y)) -> and/or/xor(freeze(x),y) support
These are guaranteed not to create undef/poison
2022-08-14 13:14:53 +01:00
Simon Pilgrim
60534b8879 [DAG] canCreateUndefOrPoison - add freeze(add/sub/mul(x,y)) -> add/sub/mul(freeze(x),y,z) support
These are guaranteed not to create undef/poison as long as there are no poison generating flags
2022-08-13 20:58:00 +01:00
Luo, Yuanke
30f9e6ebd3 (Reland) [fastalloc] Support allocating specific register class in fastalloc
Reland commit 719658d078c4

The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA derived from
base RA, they all allow allocating specific register class. Fast RA
doesn't support allocating register for specific register class. This
patch is to enable ShouldAllocateClass in fast RA, so that it can
support allocating register for specific register class.

Differential Revision: https://reviews.llvm.org/D131825
2022-08-13 13:57:34 +08:00
Joe Loser
b12aa497cd
[DAGCombine] Replace std::monostate equivalent in DAGCombiner.cpp
Remove the `UnitT` type and operators in favor of using `std::monostate`
directly.

Differential Revision: https://reviews.llvm.org/D131778
2022-08-12 21:42:09 -06:00
Simon Pilgrim
4de35f4bbf [DAG] Add TODO to remove creation of INSERT_SUBVECTOR nodes from SimplifyMultipleUseDemandedBits
SimplifyMultipleUseDemandedBits shouldn't be creating general nodes like this - although we allow bitcasts, even general constant folding is avoided.

Removing it causes a number of regressions that need addressing first, but I've added a TODO for now.
2022-08-12 10:45:30 +01:00
Filipp Zhinkin
1626ee6a95 [DAGCombine] Hoist shifts out of a logic operations tree.
Hoist and combine shift operations from logic operations tree:
logic (logic (SH x0, s), y), (logic (SH x1, s), z)  --> logic (SH (logic x0, x1), s), (logic y, z)

The transformation improves code generated for some cases related to the issue https://github.com/llvm/llvm-project/issues/49541.

Correctness:
https://alive2.llvm.org/ce/z/pVqVgY
https://alive2.llvm.org/ce/z/YVvT-q
https://alive2.llvm.org/ce/z/W5zTBq
https://alive2.llvm.org/ce/z/YfJsvJ
https://alive2.llvm.org/ce/z/3YSyDM
https://alive2.llvm.org/ce/z/Bs2kzk
https://alive2.llvm.org/ce/z/EoQpzU
https://alive2.llvm.org/ce/z/Jnc_5H
https://alive2.llvm.org/ce/z/_LP6k_
https://alive2.llvm.org/ce/z/KvZNC9

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D131189
2022-08-12 12:42:16 +03:00
wanglian
061f7ec9fa [LegalizeTypes][NFC] Use getConstantOperandVal instead of cast constant getvalue
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131642
2022-08-12 14:35:10 +08:00
wanglian
1303057888 [LegalizeTypes][NFC] Use dyn_cast instead of isa and cast
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131544
2022-08-12 14:18:49 +08:00
Chen Zheng
8d19cfb72e [PowerPC] omit location attribute for TLS variable on AIX
TLS debug on AIX is not ready for now.
The location generated in no-integrated-as mode is wrong and
in integrated-as mode causes AIX linker error.

Reviewed By: Esme

Differential Revision: https://reviews.llvm.org/D130245
2022-08-12 00:54:48 -04:00
wanglian
3b71f1d5ab [LegalizeTypes][NFC] Use getConstantOperandAPInt instead of cast constant getAPInt
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131653
2022-08-12 10:21:54 +08:00
Peter Waller
898699831b [DAGCombine] Check zext legality in zext-extract-extend combine
Discussed in D131503.

Fix to D130782.
2022-08-11 14:30:42 +00:00
Andre Vieira
1640679187 [TypePromotion] Search from ZExt + PHI
Expand TypePromotion pass to try to promote PHI-nodes in loops that are the
operand of a ZExt, using the ZExt's result type to determine the Promote Width.

Differential Revision: https://reviews.llvm.org/D111237
2022-08-11 09:50:10 +01:00
Andre Vieira
05fc5037cd [TypePromotion] Hoist out Promote Width calculation
Hoist out promote width calculation to simplify runOnFunction.

Differential Revision: https://reviews.llvm.org/D131489
2022-08-11 09:50:10 +01:00
Andre Vieira
e524d61f35 [TypePromotion] Don't delete Insns when iterating
Differential Revision: https://reviews.llvm.org/D131488
2022-08-11 09:50:10 +01:00
Andre Vieira
57de4e059d [TypePromotion] Don't insert Truncate for a no-op ZExt
Differential Revision: https://reviews.llvm.org/D131487
2022-08-11 09:50:10 +01:00
aqjune
02e56e2533 [CodeGen] Generate efficient assembly for freeze(poison) version of mm*_cast* intel intrinsics
This patch makes the variants of `mm*_cast*` intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly.
(These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874)

To do so, this patch

1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize `FREEZE(UNDEF)` operand of `CONCAT_VECTOR` in addition to `UNDEF`
2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` of `FREEZE(UNDEF)` vector as its first operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130339
2022-08-11 13:36:21 +09:00
Simon Pilgrim
8623da5f74 [DAG] visitFREEZE - generalize freeze(op()) -> op(freeze()) to any number of operands
canCreateUndefOrPoison currently only handles unary ops, but we intend to change that soon - this more closely matches the pushFreezeToPreventPoisonFromPropagating behaviour where the freeze is pushed up to a single operand value, as long as all others are guaranteed not to be poison/undef.

However, pushFreezeToPreventPoisonFromPropagating would freeze all uses of the value - whilst this variant requires the frozen value to be only used in the op - we can look at generalize multiple uses later if the need arises.
2022-08-10 13:12:46 +01:00
Simon Pilgrim
bbc27d0148 [DAG] canCreateUndefOrPoison - add freeze(truncate(x)) -> truncate(freeze(x)) support 2022-08-10 11:27:22 +01:00
David Truby
b1b9c39629 [AArch64][SVE] Use SVE for VLS fcopysign for wide vectors
Currently fcopysign for VLS vectors lowers through NEON even when the
vector width is wider than a NEON vector, causing bad codegen as the
vectors are split. This patch causes SVE to be used for these vectors
instead, giving much better codegen on wide VLS vectors.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D128642
2022-08-10 10:17:19 +00:00
Simon Pilgrim
df3ea7365e [DAG] Use DAG.getFreeze() to create freeze node. NFC. 2022-08-10 10:26:26 +01:00
Adrian Prantl
68f97d2f78 LiveDebugValues: Fix another crash related to unreachable blocks
This is a follow-up patch to D130999. In the test, the MIR contains an
unreachable MBB but the code attempts to look it up in MLocs. This
patch fixes this issue by checking for the default-constructed value.

rdar://97226240

Differential Revision: https://reviews.llvm.org/D131453
2022-08-09 10:34:57 -07:00
Simon Pilgrim
ed162d455a [DAG] Avoid hasOneUse() calls if the cheaper !AssumeSingleUse test has already failed. NFC.
Very minor optimization, but every little helps..
2022-08-09 16:42:19 +01:00