52796 Commits

Author SHA1 Message Date
Matt Arsenault
b59022b42e DAG: Handle lowering of unordered fcZero|fcSubnormal to fcmp 2023-07-11 18:30:15 -04:00
Brendan Dahl
220fe00a7c [WebAssembly] Support annotate clang attributes for marking functions.
Annotation attributes may be attached to a function to mark it with
custom data that will be contained in the final Wasm file. The
annotation causes a custom section named
"func_attr.annotate.<name>.<arg0>.<arg1>..." to be created that will
contain each function's index value that was marked with the annotation.

A new patchable relocation type for function indexes had to be created so
the custom section could be updated during linking.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D150803
2023-07-11 15:17:26 -07:00
Jon Chesterfield
980cd18354 [amdgpu][nfc] Drop lds strategy noise from some tests 2023-07-11 21:18:49 +01:00
Eduard Zingerman
18e13739b8 [BPF] Undo transformation for LICM.cpp:hoistMinMax()
Extended BPFCheckAndAdjustIR pass with sinkMinMax() transformation
that undoes LICM hoistMinMax pass.

The undo transformation converts the following patterns:

    x < min(a, b) -> x < a && x < b
    x > min(a, b) -> x > a || x > b
    x < max(a, b) -> x < a || x < b
    x > max(a, b) -> x > a && x > b

Where 'a' or 'b' is a constant.
Also supports `sext min(...) ...` and `zext min(...) ...`.

~~~

This was previously commited as 09feee559a29 and reverted in
0bf9bfeacc8c because of the testbot memory leak report:
  https://lab.llvm.org/buildbot/#/builders/5/builds/34931

The memory leak issue was caused by incorrect instruction removal
sequence in skinMinMaxBB():

    I->dropAllReferences();  -------->  I->eraseFromParent();
    I->removeFromParent();   fixed to

Differential Revision: https://reviews.llvm.org/D147990
2023-07-11 22:30:34 +03:00
Matt Arsenault
fbe4ff8149 AMDGPU: Partially fix not respecting dynamic denormal mode
The most notable issue was producing v_mad_f32 in functions with the
dynamic mode, since it just ignores the mode. fdiv lowering is still
somewhat broken because it involves a mode switch and we need to query
the original mode.
2023-07-11 15:14:52 -04:00
Philip Reames
5cd41dc62d [RISCV] Remove legacy TA/TU pseudo distinction for binary instructions
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.

This change handles most of the binary pseudos. I excluded pseudos which _TIED variants, and those that produce mask results. Both a bit different in functionality, and deserve their own change and review. As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand.

As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions.

Differential Revision: https://reviews.llvm.org/D154245
2023-07-11 10:21:42 -07:00
David Green
d96387f005 [AArch64] Extra tests for smull/umull, especially of smaller vector size. NFC
See D153632 and D154063
2023-07-11 18:16:23 +01:00
Simon Pilgrim
72fddda8af [X86] ReplaceNodeResults - widen vector truncate nodes on pre-SSSE3 targets
Building on the support for wider input vector types from D154592, try to more aggressively widen inputs instead of scalarizing them.
2023-07-11 17:21:20 +01:00
David Mo
ef7ca14fa5 [WebAssembly] Report error for inline assembly with unsupported opcodes
For inline WebAssembly, passing a numeric operand to global.get is
unsupported. This causes encodeInstruction to reach an llvm_unreachable
call, leading to undefined behaviors. This patch fixes the issue for
this invalid instruction encoding, making it report an error by adding
an MCContext field in class WebAssemblyMCCodeEmitter.

Reviewed By: sbc100, bryanpkc

Differential Revision: https://reviews.llvm.org/D154734
2023-07-11 10:36:25 -04:00
Phoebe Wang
db8b624de1 [X86][FP16] Fix mis-combination from FMULC to FCMULC
The combination was designed to combine a negative imaginary value
rather then a full negative complex value.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154213
2023-07-11 22:26:20 +08:00
Georgi Mirazchiyski
103df542b5 Fix shr/and pair replace with bfe
Co-Authored-By: Aidan Belton <aidan.belton@codeplay.com>
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D117118
2023-07-11 14:43:35 +01:00
Simon Wallis
82458ce69e [ARM] mark tMOVi32imm as killing flags
Mark the tMOVi32imm pseudo instr as killing the flags register.

The pseudo instruction expands to a sequence of 7 movs/lsls/adds
instructions, which are all Thumb-1 flag setting instructions.

For a test case, take an existing arm test which checks for
"Don't CSE a cmp across a call that clobbers CPSR."
and retarget it at thumbv6m execute-only.

Reviewed By: stuij

Differential Revision: https://reviews.llvm.org/D154845

Change-Id: I8f8209fbc40a833f8875629937b9606c1e2c021d
2023-07-11 14:42:07 +01:00
Simon Pilgrim
6656c8d01f [X86] shuffle-vs-trunc-256.ll - move comment outside test. NFC. 2023-07-11 14:31:50 +01:00
Jim Lin
b515133088 [RISCV] Merge rv32/rv64 vector reduction intrinsic tests that have the same content. NFC. 2023-07-11 19:04:56 +08:00
Simon Pilgrim
77b3f890cc [X86] combineAndMaskToShift - match constant splat with X86::isConstantSplat
Using X86::isConstantSplat instead of ISD::isConstantSplatVector allows us to detect constant masks after they've been lowered to constant pool loads.

Addresses regression from D154592
2023-07-11 11:25:34 +01:00
Ties Stuij
f0ae3c23b5 [ARM] in LowerConstantFP, make sure we cover armv6-m execute-only
Currently in LowerConstantFP, when we compile for execute-only (XO) we don't
check what architecture we're compiling for (v6m=< or >v6m). We shouldn't get
here for v6m, so put in an assert.

Reviewed By: simonwallis2, dmgreen

Differential Revision: https://reviews.llvm.org/D154506
2023-07-11 10:42:15 +01:00
Simon Pilgrim
842a6728d9 [X86] LowerTRUNCATE - improve handling during type legalization to PACKSS/PACKUS patterns
Extend coverage for lowering wide vector types during type legalization to allow us to use PACKSS/PACKUS patterns instead of dropping down to shuffle lowering.

First step towards avoiding premature folds of TRUNCATE to PACKSS/PACKUS nodes as described on Issue #63710 - which causes a large number of regressions on D152928 - we will next need to tweak the TRUNCATE widening in ReplaceNodeResults

Differential Revision: https://reviews.llvm.org/D154592
2023-07-11 10:39:44 +01:00
pvanhout
655714a300 [AArch64] Use GlobalISel MatchTable Combiner Backend
Only a few minor test changes needed because I removed the "helper" suffix from the combiner name, as it's not really a helper anymore but more like the implementation itself.

Depends on D153757

NOTE: This would land iff D153757 (RFC) lands too.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D153850
2023-07-11 11:27:14 +02:00
Jim Lin
0fd212c91d [RISCV] Merge rv32/rv64 vector widening intrinsic tests that have the same content. NFC. 2023-07-11 16:25:12 +08:00
Nabeel Omer
e148899ad9 [X86] Preserve volatile ATOMIC_LOAD_OR nodes
Fixes #63692.

In reference to volatile memory accesses, the langref says:
> the backend should never split or merge target-legal volatile load/store instructions.

Differential Revision: https://reviews.llvm.org/D154609
2023-07-11 08:05:38 +00:00
Zi Xuan Wu (Zeson)
2ccb2dbc8d [RISCV] Don't fold RISCVISD::VMV_V_X_VL series node and scalar load to vector load when scalar load is update load
We try to fold RISCVISD::VMV_V_X_VL series node + scalar load -> vector load.
But if scalar load is indexed load (load update form), it's not profitable to fold because load update node can't be removed after fold.

Differential Revision: https://reviews.llvm.org/D152222
2023-07-11 15:56:31 +08:00
wangpc
99809f4377 [RISCV] Simplify the definitions of interrupt CSRs
For `CSR_Interrupt`, we can generate the register list via a single
`sequence`.

For `CSR_XLEN_F32_Interrupt` and `CSR_XLEN_F64_Interrupt`, I don't
see the reason why we need to keep the order the same as how we used
to allocate registers (and we have changed the order in D146488), so
I fold them into one `sequence`.

There are some *.ll changes because of the order change.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154837
2023-07-11 11:20:24 +08:00
Piyou Chen
299b2c2d93 [RISCV] precommit for prefetch locality support
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D154690
2023-07-10 20:07:18 -07:00
Matt Arsenault
1d92b68ead DAG: Correct chain management for frexp libcalls
We need to replace the other uses of the call chain with the new load
chain.

Fixes not preserving the return def with unused x86_fp80
results. Regression reported here:
https://reviews.llvm.org/rGb15bf305ca3e9ce63aaef7247d32fb3a75174531#1224999
2023-07-10 21:39:15 -04:00
Wang Rui
f9e0845ef2 [LoongArch] Explicitly specify instruction properties
This revision explicitly specifies the machine instruction properties instead of relying on guesswork. This is because guessing instruction properties has proven to be inaccurate, such as the machine LICM not working:

```
void func(char *a, char *b)
{
    int i;

    for (i = 0; i != 72526; i++)
        a[i] = b[i];
}
```

Guessing instruction properties:

```
func:                                   # @func
        move    $a2, $zero
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        ldx.b   $a3, $a1, $a2
        stx.b   $a3, $a0, $a2
        addi.d  $a2, $a2, 1
        lu12i.w $a3, 17
        ori     $a3, $a3, 2894
        bne     $a2, $a3, .LBB0_1
        ret
.Lfunc_end0:
```

Explicitly specify instruction properties:

```
func:                                   # @func
        lu12i.w $a2, 17
        ori     $a2, $a2, 2894
        move    $a3, $zero
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        ldx.b   $a4, $a1, $a3
        stx.b   $a4, $a0, $a3
        addi.d  $a3, $a3, 1
        bne     $a3, $a2, .LBB0_1
        ret
.Lfunc_end0:
```

Reviewed By: SixWeining, xen0n

Differential Revision: https://reviews.llvm.org/D154192
2023-07-11 08:45:24 +08:00
Han Shen
8df75969ae [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.
The original MFS work D85368 shows good performance improvement with
Instrumented FDO. However, AutoFDO or Flow-Sensitive AutoFDO (FSAFDO)
does not show performance gain. This is mainly caused by a less
accurate profile compared to the iFDO profile.

For the past few months, we have been working to improve FSAFDO
quality, like in D145171. Taking advantage of this improvement, MFS
now shows performance improvements over FSAFDO profiles.

That being said, 2 minor changes need to be made, 1) An FS-AutoFDO
profile generation pass needs to be added right before MFS pass and an
FSAFDO profile load pass is needed when FS-AutoFDO is enabled and the
MFS flag is present. 2) MFS only applies to hot functions, because we
believe (and experiment also shows) FS-AutoFDO is more accurate about
functions that have plenty of samples than those with no or very few
samples.

With this improvement, we see a 1.2% performance improvement in clang
benchmark, 0.9% QPS improvement in our internal search benchmark, and
3%-5% improvement in internal storage benchmark.

This is #1 of the two patches that enables the improvement.

Reviewed By: wenlei, snehasish, xur

Differential Revision: https://reviews.llvm.org/D152399
2023-07-10 16:00:30 -07:00
David Green
44479b80a6 [AArch64] Ensure constrained register class in INS peephole.
Ensure we constrain the register class of the NewDef to that of OldDef, in case
they do not match.

Fixes #63777
2023-07-10 22:48:10 +01:00
Craig Topper
dbd47c4489 [RISCV] Don't allow X0 to be used for 'r' constraint in inline assembly
Some instructions treat x0 as a special encoding rather than as a
value of 0. Since we don't parse the inline assembly to know what
the instruction is, chooser the safest option of never using x0.

Fixes #63747.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D154744
2023-07-10 13:25:17 -07:00
Jake Egan
bbd0d123d3 Implement -frecord-command-line for XCOFF
This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D153600
2023-07-10 12:47:07 -04:00
Simon Pilgrim
1ab442464b [X86] Regenerate or-address.ll test checks 2023-07-10 16:55:52 +01:00
Nabeel Omer
862e5dcb7e [X86] Pre-commit test for lowerAtomicArith
Test for https://reviews.llvm.org/D154609
2023-07-10 14:21:20 +00:00
Jingu Kang
041db60bc1 [tests] Update precommit test for MachineLICM subloops
precommit test for https://reviews.llvm.org/D154205
2023-07-10 14:32:45 +01:00
Igor Kirillov
0aecf7ff0d [CodeGen] Fix incorrectly detected reduction bug in ComplexDeinterleaving pass
Using ACLE intrinsics, it is possible to create a loop that the
deinterleaving pass incorrectly classified as a reduction loop.
For example, for fixed-width vectors the loop was like below:

vector.body:
  %a = phi <4 x float> [ %init.a, %entry ], [ %updated.a, %vector.body ]
  %b = phi <4 x float> [ %init.b, %entry ], [ %updated.b, %vector.body ]
  ...
; Does not depend on %a or %b:
  %updated.a = ...
  %updated.b = ...

Differential Revision: https://reviews.llvm.org/D154598
2023-07-10 12:54:38 +00:00
Amara Emerson
3a80bdb316 [GlobalISel] Remove an erroneous oneuse check in the G_ADD reassociation combine.
This check was unnecessary/incorrect, it was already being done by the target
hook default implementation, and the one in the matcher was checking for a
completely different thing. This change:
 1) Removes the check and updates affected tests which now do some more reassociations.
 2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse
    check. Not sure why I didn't do this the first time.
2023-07-10 01:03:12 -07:00
Alex Bradbury
29f630a1dd [RISCV][MC] MC layer support for the experimental zacas extension
This implements the v1.0-rc1 draft extension.

amocas.d on RV32 and amocas.q have the restriction that rd and rs2 must
be even registers. I've opted to implement this restriction in
RISCVAsmParser::validateInstruction even though for codegen we'll need a
new register class and can then remove this validation. This also
sidesteps, for now, the issue of amocas.d being different on rv32 vs
rv64.

See <https://github.com/riscv-non-isa/riscv-c-api-doc/issues/37> for the
issue of needing an agreed asm register constraint for register pairs.

Differential Revision: https://reviews.llvm.org/D149248
2023-07-10 08:26:31 +01:00
Wang, Xin10
284a059b33 Revert "[X86]Remove TEST in AND32ri+TEST16rr in peephole-opt"
This reverts commit 2c64226d84174dd1d9f93e1884c1b0bd432f89b5.

revert first due to buildbot fail https://lab.llvm.org/buildbot/#/builders/85/builds/17571
2023-07-10 03:20:11 -04:00
XinWang10
2c64226d84 [X86]Remove TEST in AND32ri+TEST16rr in peephole-opt
Previously we remove a pattern like:
  %reg = and32ri %in_reg, 5
  ...                         // EFLAGS not changed.
  %src_reg = subreg_to_reg 0, %reg, %subreg.sub_index
  test64rr %src_reg, %src_reg, implicit-def $eflags
We can remove test64rr since it has same functionality as and subreg_to_reg avoid the opt in previous code, so we handle this case specially.
And this case is also can be opted for the same reason, like:
  %reg = and32ri %in_reg, 5
  ...                         // EFLAGS not changed.
  %src_reg = copy %reg.sub_16bit:gr32
  test16rr %src_reg, %src_reg, implicit-def $eflags
The COPY from gr32 to gr16 prevent the opt in previous code too, just handle it specially as what we did for test64rr.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D154193
2023-07-09 23:21:32 -04:00
Simon Pilgrim
7428739ea8 [X86] matchAddressRecursively - peek through ZEXT nodes to match foldMaskAndShiftToExtract
Handle (zero_extend (and (srl X, C1), C2)) patterns to allow foldMaskAndShiftToExtract to match h-register extractions from smaller types

Ideally matchAddressRecursively needs to be able to recurse through ZEXT/SEXT nodes generally but for now we should just handle specific cases when they occur

Addresses regressions in D146121
2023-07-09 15:41:38 +01:00
Simon Pilgrim
848f6abfdb [X86] Add tests showing failure by matchAddressRecursively to peek through ZEXT nodes to match foldMaskAndShiftToExtract
Test coverage for a similar regression in D146121
2023-07-09 15:41:38 +01:00
David Green
758c4640c9 [CGP] Enable CodeGenPrepares phi type convertion.
This is a recommit of 67121d7, enabling the CodeGenPrepare OptimizePhiTypes
option that can help with the type of phi instructions into ISel.
2023-07-09 10:32:11 +01:00
LiaoChunyu
1575063db2 [RISCV] Match shl_vl (ext_vl v, splat 1) to vwadd_vl
Similer to: D153112, match shl (v, splat 1) to vwadd

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154726
2023-07-08 08:03:15 +08:00
Matt Arsenault
310f839612 DAG: Lower is.fpclass fcInf to fcmp of fabs
InstCombine should have taken care of this, but I think
this is more useful in the future when the expansion
tries to handle multiple cases at a time with fcmp.

x87 looks worse to me but the only thing I know about it is that
I aggressively do not care about it.

https://reviews.llvm.org/D143198
2023-07-07 17:00:10 -04:00
Matt Arsenault
64d325454b AMDGPU: Delete custom combine on class intrinsic
This is no longer necessary as class-with-constant will always be
transformed to the generic class intrinsic.

https://reviews.llvm.org/D153901
2023-07-07 15:28:21 -04:00
Nemanja Ivanovic
b0e249d5e2 Reland "[PowerPC] Remove extend between shift and and"
The commit originally caused a bootstrap failure on the big endian
PPC bot as the combine was interfering with the legalizer when
applied on illegal types. This update restricts the combine to
the only types for which it is actually needed. Tested on PPC BE
bootstrap locally.
2023-07-07 14:45:05 -04:00
Christudasan Devadasan
7a98f084c4 [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.

Differential Revision: https://reviews.llvm.org/D124196
2023-07-07 23:14:32 +05:30
Christudasan Devadasan
b4a62b1fa5 [AMDGPU] Enable whole wave register copy
So far, we haven't exposed the allocation of whole-wave
registers to regalloc. We hand-picked them for various
whole wave mode operations. With a future patch, we
want the allocator to efficiently allocate them rather
than using the custom pre-allocation pass.

Any liverange split of virtual registers involved in
whole-wave operations require the resulting COPY
introduced with the split to be performed for all
lanes. It isn't implemented in the compiler yet.

This patch would identify all such copies and
manipulate the exec mask around them to enable all
lanes without affecting the value of exec mask
elsewhere.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D143762
2023-07-07 22:58:55 +05:30
Christudasan Devadasan
b78b36e1a2 [AMDGPU] Implement whole wave register spill
To reduce the register pressure during allocation,
when the allocator spills a virtual register that
corresponds to a whole wave mode operation, the
spill loads and restores should be activated for
all lanes by temporarily flipping all bits in exec
register to one just before the spills. It is not
implemented in the compiler as of today and this
patch enables the necessary support.

This is a pre-patch before the SGPR spill to virtual
VGPR lanes that would eventually causes the whole
wave register spills during allocation.

Reviewed By: arsenm, cdevadas

Differential Revision: https://reviews.llvm.org/D143759
2023-07-07 22:51:45 +05:30
Yashwant Singh
b7836d8562 [CodeGen]Allow targets to use target specific COPY instructions for live range splitting
Replacing D143754. Right now the LiveRangeSplitting during register allocation uses
TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a
problem as we have both vector and scalar copies. Vector copies perform a copy over
a vector register but only on the lanes(threads) that are active. This is mostly sufficient
however we do run into cases when we have to copy the entire vector register and
not just active lane data. One major place where we need that is live range splitting.

Allowing targets to use their own copy instructions(if defined) will provide a lot of
flexibility and ease to lower these pseudo instructions to correct MIR.

- Introduce getTargetCopyOpcode() virtual function and use if to generate copy in Live range
 splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline.

Reviewed By: arsenm, cdevadas, kparzysz

Differential Revision: https://reviews.llvm.org/D150388
2023-07-07 22:29:50 +05:30
Qiu Chaofan
a2b5117df7 [PowerPC] Update InputOps of Power10 SchedModel
Count of input operands affect pipeline forwarding in scheduling model.
Previous Power10 model definition arranges some instructions into
incorrect groups, by counting the wrong number of input operands.

This patch updates the model, setting the input operands count correctly
by excluding irrelevant immediate operands and count memory operands of
load instructions correctly.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D153842
2023-07-07 22:46:22 +08:00
Luke Lau
02bb33c3ce [RISCV] Check for alignment when lowering interleaved/deinterleaved loads/stores
As noted by @reames, we should be checking that the memory access is aligned to
the element size (or the unaligned vector memory access feature is enabled)
before lowering vlseg/vsseg intrinsics via the interleaved access pass.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D154536
2023-07-07 15:34:24 +01:00