44598 Commits

Author SHA1 Message Date
Paul Walker
96c8d615d6 [SVE] Extend findMoreOptimalIndexType so BUILD_VECTORs do not force 64bit indices.
Extends findMoreOptimalIndexType to allow ISD::BUILD_VECTOR based
indices to be truncated when such truncation is lossless. This can
enable the use of 32bit gather/scatter indices thus making it less
likely to have to split a gather/scatter in two.

Depends on D125194

Differential Revision: https://reviews.llvm.org/D130533
2022-08-18 18:00:53 +01:00
wanglian
eeac894418 Precommit tests for D132115
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D132116
2022-08-18 17:58:15 +08:00
gonglingqin
36038b5cb6 [LoongArch] Supports brcond with 21 bit offsets
Differential Revision: https://reviews.llvm.org/D132006
2022-08-18 15:55:50 +08:00
WANG Xuerui
929d201b7a [LoongArch] Add support for llvm.eh.dwarf.cfa
It's the same as D126181 for RISCV.

Differential Revision: https://reviews.llvm.org/D132012
2022-08-18 13:17:49 +08:00
Sam Clegg
fa306f1396 [WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: Fix signature of malloc in wasm64 mode
Differential Revision: https://reviews.llvm.org/D132091
2022-08-17 18:16:34 -07:00
Luo, Yuanke
28733d86cf [amdgpu] Change the RA to basic
Specifying `-regalloc=fast` is not reliable. With fast register allocation,
`LIS = getAnalysisIfAvailable<LiveIntervals>();` get nullptr
in "si-lower-sgpr-spills" pass, so the slot index is not created in the
pass for new inserted instructions. When verifying the machine
instructions, it fails on checking slot index. While greedy-ra is time
consuming basic-ra can be used to reduce compiling time for this test case.

Differential Revision: https://reviews.llvm.org/D131931
2022-08-18 08:16:19 +08:00
Jeffrey Byrnes
1c8d7ea973 [AMDGPU] Implement pipeline solver for non-trivial pipelines
Requested SchedGroup pipelines may be non-trivial to satisify. A minimimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. It seems minimizing the degree of misfit (as measured by the number of edges we can't add) w.r.t the choice we make when mapping an instruction -> SchedGroup is an NP problem. This patch implements the PipelineSolver class which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential time exact algorithm and a greedy algorithm. The patch includes some controls which allows the user to select the greedy/exact algorithm.

Differential Revision: https://reviews.llvm.org/D130797
2022-08-17 16:21:59 -07:00
Sanjay Patel
7f72a0f5bb [SDAG] avoid generating libcall to function with same name
This is a potentially better alternative to D131452 that also
should avoid the infinite loop bug from:
issue #56403

This is again a minimal fix to reduce merging pain for the
release. But if this makes sense, then we might want to guard
all of the RTLIB generation (and other libcalls?) with a
similar name check.

Differential Revision: https://reviews.llvm.org/D131521
2022-08-17 16:19:34 -04:00
Matthias Braun
19ce5e515f RAGreedyStats: Ignore identity COPYs; count COPYs from/to physregs
Improve copy statistics:

- Count copies from or to physical registers: They are used to model function parameters and calling conventions and the register allocator optimizes for them.
- Check physical registers assigned to virtual registers and stop counting "identity" `COPY`s where source and destination is the same physical registers; they will be removed in the `virtregmap` pass anyway.

Differential Revision: https://reviews.llvm.org/D131932
2022-08-17 12:53:29 -07:00
Archit Saxena
e170d955fe Split EH code by default
The current machine function splitter is reliant on profile data to do profile summary analysis to split blocks into cold section. This may sometimes limit the usage of machine function splitter especially in cases where we could do some form of static analysis to split out cold blocks if profile data is absent or profile data which may be faulty (Consider Sample PGO).

Of all code that could statically be marked cold Exception handling blocks are one of them (In fact BFI framework also tends to mark them as cold), and the most in size contribution. In my experiments I found out Exception handling pads and all code reachable from there account for up to 6-8% of the .text section on modern production binaries. This patch introduces a flag to split out all Exception handling blocks and blocks only reachable from Exceptional Handling pad to cold section. This flag has shown to give a performance win of up to 0.1% in terms of average cycles and instructions executed on internal facebook search service.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D131824
2022-08-17 12:40:31 -07:00
Sanjay Patel
8eddd1ec60 [AArch64] add test for recursive libcall lowering; NFC
Issue #56403
2022-08-17 14:54:50 -04:00
Vladislav Dzhidzhoev
4b57939583 [AArch64][GlobalISel] Fallback to generic lowering of G_CTPOP
Use generic lowering of G_CTPOP for s32 and s64 scalars when
noimplicitfloat is specified.

Differential Revision: https://reviews.llvm.org/D131454
2022-08-17 21:10:27 +03:00
Craig Topper
550fab53e1 [RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1).
Extracted from D131729 where we handled C==0. It's now generalized
to more constants.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132000
2022-08-17 09:50:08 -07:00
Nick Desaulniers
6b0e2fa6f0 [SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses
As part of re-architecting callbr to no longer use blockaddresses
(https://reviews.llvm.org/D129288), we don't really need them in MIR.
They make comparing MachineBasicBlocks of indirect targets during
MachineVerifier a PITA.

Suggested by @efriedma from the discussion:
https://reviews.llvm.org/D130290#3669531

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D130316
2022-08-17 09:34:31 -07:00
David Penry
1c9f0408bc Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules"
This reverts commit 8c4aea438c310816bb4e4f9a32d783381ef3182e.

Needed because buildbot failures (warnings) gave a clue that there was
a functional bug in the ARM rejection logic.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132037
2022-08-17 09:32:43 -07:00
Sotiris Apostolakis
848e9e454f [SelectOpti] Remove test on loop-level analysis
Remove a test that relied on the underlying instruction latency modeling.
Such dependency blocks efforts such as D79483 to improve this cost modeling.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D132029
2022-08-17 16:13:33 +00:00
David Penry
8c4aea438c [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D128941
2022-08-17 08:13:26 -07:00
Alex Bradbury
ce38128194 [RISCV] Avoid redundant branch-to-branch when expanding cmpxchg
If the success value of a cmpxchg is used in a branch, the expanded
cmpxchg sequence ends up with a redundant branch-to-branch (as the
backend atomics expansion happens as late as possible, passes to
optimise such cases have already run). This patch identifies this case
and avoid it when expanding the cmpxchg.

Note that a similar optimisation is possible for a BEQ on the cmpxchg
success value. As it's hard to imagine a case where real-world code may
do that, this patch doens't handle that case.

Differential Revision: https://reviews.llvm.org/D130192
2022-08-17 13:49:15 +01:00
OverMighty
232953f996 [AArch64] Add pattern for SQDML*Lv1i32_indexed
There was no pattern to fold into these instructions. This patch adds
the pattern obtained from the following ACLE intrinsics so that they
generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and
sqadd/sqsub instructions:
 - vqdmlalh_s16, vqdmlslh_s16
 - vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16,
   vqdmlslh_laneq_s16 (when the lane index is 0)

It also modifies the result of the existing pattern for the latter, when
the lane index is not 0, to use the v1i32_indexed instructions instead
of the v4i16_indexed ones.

Fixes #49997.

Differential Revision: https://reviews.llvm.org/D131700
2022-08-17 12:00:47 +01:00
Rainer Orth
d9993484ee [Sparc] Don't use SunStyleELFSectionSwitchSyntax
As discussed in D85414 <https://reviews.llvm.org/D85414>, two tests
currently `FAIL` on Sparc since that backend uses the Sun assembler syntax
for the `.section` directive, controlled by
`SunStyleELFSectionSwitchSyntax`.

Instead of adapting the affected tests, this patch changes that default.
The internal assembler still accepts both forms as input, only the output
syntax is affected.

Current support for the Sun syntax is cursory at best: the built-in
assembler cannot even assemble some of the directives emitted by GCC, and
the set supported by the Solaris assembler is even larger: SPARC Assembly
Language Reference Manual, 3.4 Pseudo-Op Attributes
<https://docs.oracle.com/cd/E37838_01/html/E61063/gmabi.html#scrolltoc>.

A few Sparc test cases need to be adjusted. At the same time, the patch
fixes the failures from D85414 <https://reviews.llvm.org/D85414>.

Tested on `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D85415
2022-08-17 12:59:29 +02:00
Craig Topper
39707c1a9a [RISCV] Add test coverage for (select (icmp X, Y), float, float). NFC
We fold integer setcc into SELECT_CC during DAG combine even if
the SELECT_CC has FP result type, but we had no test coverage.
2022-08-16 21:28:26 -07:00
Vitaly Buka
16fecdfa70 Revert "[AArch64] Add foldCSELOfCSEl DAG combine"
Breaks ubsan on buildbot, details in D125504

This reverts commit 6f9423ef06926a70af84b77cb290c91214cf791a.
2022-08-16 20:29:37 -07:00
Craig Topper
d8cdd78b6c [RISCV] Add test cases to show missed opportunity to fold (sub C, (xor (setcc), 1)). NFC
(sub C, (xori X, 1)) can be folded to (add X, C-1) if X is 0 or 1.

This would avoid the xori and in some cases remove an instruction
neede to materialize the constant.
2022-08-16 16:40:53 -07:00
Eli Friedman
cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00
Craig Topper
53ce22e429 Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.

Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:51:07 -07:00
Craig Topper
b5a18de651 [RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)).
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferrable. (addi X, -1) can be compressed, sub with x0 on
the LHS is never compressible.
2022-08-16 14:49:52 -07:00
Alexander Shaposhnikov
d68ba43ad2 [Intrinsics] Add initial support for NonNull attribute
Add initial support for NonNull attribute.
(https://github.com/llvm/llvm-project/issues/57113)

Test plan:

verify that for
__thread int x;
int main() {

int* y = &x;
return *y;
}
(with this patch) clang -O -fsanitize=null -S -emit-llvm -o -
doesn't emit a null-pointer check

Differential revision: https://reviews.llvm.org/D131872
2022-08-16 21:28:23 +00:00
Craig Topper
de6fd16971 [RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12.
We still need to materialize the constant in a register and we
may not be removing all uses of the original constant so it may
increase code size.
2022-08-16 14:11:31 -07:00
Craig Topper
1180ed41ee [RISCV] Add more test cases for (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). NFC
In these test cases we do the transform, but the immediate is too
large to form an ADDI so it didn't save any instructions.

If the constant is opaque or has additional users we shouldn't do
the transform if it doesn't form an ADDI.
2022-08-16 14:08:42 -07:00
Craig Topper
4854fa217f [RISCV] Move test from setcc-logic.ll to select-const.ll. NFC
Also add setne version of the test.

Add some common prefixes to reduce number of identical CHECK lines.
2022-08-16 14:08:42 -07:00
Craig Topper
4184edc691 [RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc.
This introduce an xori in some cases. I don't believe it was the
intention of the original patch. This was an accident because
nonan FP equality compares also use SETEQ/SETNE.

Also pass the correct type to getSetCCInverse.
2022-08-16 13:00:36 -07:00
Craig Topper
87e7837293 [RISCV] Add test cases to show where we inverted a fp setcc and introduced an extra xori.
In these tests we had (sub C, (seteq X, Y)) which we converted to
the (add (setne X, Y), C-1). We don't have a FNE compare instruction
so this created an XORI to invert an FEQ instruction.

This might be a good idea since it can save a constant materialization,
but does not appear to be the intention of the original patch.
2022-08-16 12:59:16 -07:00
Nicolas Miller
ccfabfbb1f Fix subrange liveness checking at rematerialization
This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead.

The code to check whether an instruction can be rematerialized at a given point or not was already checking for subranges to ensure that subregisters are live, but only when the instruction being moved was using a subregister, this patch changes that so the subranges are checked even when the moved instruction uses the full register.

This patch also adds a case to the original test for the subrange checking that trigger the issue described above.

The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278

And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209

Essentially the greedy register allocator attempts to move the following instruction:

```
%3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec
```

From `@3440` into the body of a loop `@16312`, but `%3078` has the following live ranges:

```
%3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r  L0000000000000003 [2224r,3440r:0) 0@2224r  L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r
```

So `@16312e` `%3078.sub1` is alive but `%3078.sub0` is dead, so this instruction being moved there leads to invalid memory accesses as `3078.sub0` ends up being trashed and the result of this instruction is used as part of an address calculation for a load.

On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908, this turned out to be because on gfx908 instead of moving the shift instruction into the loop, its value is spilled into an ACC register, gfx906 doesn't have ACC registers and for gfx90a ACC registers are used like regular vector registers and so aren't used for spilling.

With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of moving the instruction in the loop.

Original Author: npmiller

Reviewed by: rampitec

Submitted by: rampitec

Differential Revision: https://reviews.llvm.org/D131884
2022-08-16 10:50:09 -07:00
Arthur Eubanks
9181ce623f [Windows] Put init_seg(compiler/lib) in llvm.global_ctors
Currently we treat initializers with init_seg(compiler/lib) as similar
to any other init_seg, they simply have a global variable in the proper
section (".CRT$XCC" for compiler/".CRT$XCL" for lib) and are added to
llvm.used. However, this doesn't match with how LLVM sees normal (or
init_seg(user)) initializers via llvm.global_ctors. This
causes issues like incorrect init_seg(compiler) vs init_seg(user)
ordering due to GlobalOpt evaluating constructors, and the
ability to remove init_seg(compiler/lib) initializers at all.

Currently we use 'A' for priorities less than 200. Use 200 for
init_seg(compiler) (".CRT$XCC") and 400 for init_seg(lib) (".CRT$XCL"),
which do not append the priority to the section name. Priorities
between 200 and 400 use ".CRT$XCC${Priority}". This allows for
some wiggle room for people/future extensions that want to add
initializers between compiler and lib.

Fixes #56922

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D131910
2022-08-16 08:16:18 -07:00
Karl Meakin
6f9423ef06 [AArch64] Add foldCSELOfCSEl DAG combine
Differential Revision: https://reviews.llvm.org/D125504
2022-08-16 12:49:11 +01:00
Zain Jaffal
7155ed4289
[AArch64] Add support for 256-bit non temporal loads
Currenlty all temporal loads are mapped to `LDP` or `LDR`. This patch will map all the non temporal 256-bit loads into `LDNP`. Future patches should address other non-temporal loads.

Reviewed By: fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D131773
2022-08-16 12:19:36 +01:00
Bing1 Yu
807b8cb06c [X86] Fix a lowering issue of mask.compress which has undef float passthrough
Previously, LegaizeDAG didn't check mask.compress's passthrough might be float, and this lead to getConstant crash since it doesn't support fp

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131947
2022-08-16 17:54:45 +08:00
gonglingqin
a9d46d9af3 [LoongArch] Add codegen support for fabs
Differential Revision: https://reviews.llvm.org/D131871
2022-08-16 14:41:27 +08:00
Weining Lu
d1f36da9e0 [LoongArch] Encode LoongArch specific ELF e_flags to binary by LoongArchTargetStreamer
Reference: https://github.com/loongson/LoongArch-Documentation
The last commit hash (main branch) is:
99016636af64d02dee05e39974d4c1e55875c45b

Note:
There are several PRs [1][2][3] that may affect the e_flags.
After they got closed or merged, we should update the implementation here accordingly.

[1] https://github.com/loongson/LoongArch-Documentation/pull/33
[2] https://github.com/loongson/LoongArch-Documentation/pull/47
[2] https://github.com/loongson/LoongArch-Documentation/pull/61

Differential Revision: https://reviews.llvm.org/D130239
2022-08-16 13:41:50 +08:00
Rahman Lavaee
df2213f345 [EHStreamer] Omit @LPStart when function has no landing pads
When no landing pads exist for a function, `@LPStart` is undefined and must be omitted.

EH table is generally not emitted for functions without landing pads, except when the personality function is uknown (`!isNoOpWithoutInvoke(classifyEHPersonality(Per))`). In that case, we must omit `@LPStart` even when machine function splitting is enabled.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131626
2022-08-15 17:09:46 -07:00
Vitaly Buka
e0e960923f [AArch64] Fix signed integer overflow in CSINC case
Followup to D131815, which overlflows on different
values.
2022-08-15 15:04:20 -07:00
Amy Kwan
a5bef98c75 [PowerPC][NFC] Add additional vector_shuffle tests involving scalar_to_vector.
This patch adds additional test cases involving vector_shuffles where either its
left, right or both inputs are scalar_to_vector nodes. These test cases involve
v16i8, v2i64, v4i32 and v8i16 vector shuffles, and were generated in preparation
for D130487.

Differential Revision: https://reviews.llvm.org/D130485
2022-08-15 12:30:58 -05:00
Craig Topper
7a73ab5818 [RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64.
We have a good selection of W instructions, so promoting a truncated
value back to i64 is often free.

This appears to be a net code size reduction on SPECINT2006.

This has been split from D130397 as one of the patches needed to
complete that.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131819
2022-08-15 08:32:51 -07:00
Simon Pilgrim
a7b85e4c0c [X86] Freeze shl(x,1) -> add(x,x) vector fold (PR50468)
Vector fold shl(x,1) -> add(freeze(x),freeze(x)) to avoid the undef issues identified in PR50468

Differential Revision: https://reviews.llvm.org/D106675
2022-08-15 16:17:21 +01:00
Simon Pilgrim
41bdb8cd36 [X86] Fold insert_vector_elt(undef, elt, 0) --> scalar_to_vector(elt)
I had hoped to make this a generic fold in DAGCombine, but there's quite a few regressions in Thumb2 MVE that need addressing first.

Fixes regressions from D106675.
2022-08-15 14:56:30 +01:00
David Green
dfc95bab07 [DAG] Ensure more Legal BUILD_VECTOR elements types in shuffle->And combine
This is a followup to D131350, which caused another problem for i64
types being split into i32 on i32 targets. This patch tries to make sure
that either Illegal types are OK, or that the element types of a
buildvector are legal and bigger than or equal to the size of the
original elements.

Differential Revision: https://reviews.llvm.org/D131883
2022-08-15 14:41:45 +01:00
Luo, Yuanke
853bb192c4 Revert "(Reland) [fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 30f9e6ebd30b79d13f99eaca4d829e0da07186b3.
2022-08-15 20:33:15 +08:00
Ayke van Laethem
a560e57a7e
[AVR] Only push and clear R1 in interrupts when necessary
R1 is a reserved register, but LLVM gives the APIs to know when it is
used or not. So this patch uses these APIs to only save/clear/restore R1
in interrupts when necessary.

The main issue here was getting inline assembly to work. One could argue
that this is the job of Clang, but for consistency I've made sure that
R1 is always usable in inline assembly even if that means clearing it
when it might not be needed.

Information on inline assembly in AVR can be found here:

https://www.nongnu.org/avr-libc/user-manual/inline_asm.html#asm_code

Essentially, this seems to suggest that r1 can be freely used in avr-gcc
inline assembly, even without specifying it as an input operand.

Differential Revision: https://reviews.llvm.org/D117426
2022-08-15 14:29:38 +02:00
Ayke van Laethem
43a8dbc5be
[AVR] Use @earlyclobber instead of register scavenging
The code to support the case when the register allocator has assigned
the same register to the src and the dst register operand isn't actually
needed:

  * LDWRdPtr and LDDWRdPtrQ have an @earlyclobber on the output
    register, so the register allocator will make sure to allocate a
    different register for the output register.
  * LDDWRdYQ does not have an @earlyclobber, but the pointer register is
    the fixed Y register which is reserved. The register allocator won't
    use reserved registers for the output value.

This removes a special case in the code that makes the pseudo
instruction expansion pass more complicated than it needs to be.

Differential Revision: https://reviews.llvm.org/D131844
2022-08-15 14:29:38 +02:00
Ayke van Laethem
de48717fcf
[AVR] Support unaligned store
This patch really just extends D39946 towards stores as well as loads.
While the patch is in SelectionDAGBuilder, it only applies to AVR (the
only target that supports unaligned atomic operations).

Differential Revision: https://reviews.llvm.org/D128483
2022-08-15 14:29:37 +02:00