50716 Commits

Author SHA1 Message Date
Craig Topper
247eb13fab [RISCV][GISel] Legalize G_BITREVERSE. 2023-11-09 16:27:21 -08:00
Maurice Heumann
8cbfc0b29d
[X86] Respect blockaddress offsets when performing X86 LEA fixups (#71641)
The X86FixupLEAs pass drops blockaddress offsets, when splitting up slow
3-ops LEAs, as can be seen in this example:

https://godbolt.org/z/bEsc3Poje

Before running the pass, the first instruction in bb.0 is a LEA with
ebp, ebx and a blockaddress.
After the transformation, the blockaddress is missing.

This happens because the 3-ops LEA is split up into a 2-ops LEA plus an
add instruction.
However, because hasLEAOffset does not take blockaddresses into
consideration, the add is not emitted, so the offset is dropped.

Taking blockaddresses into consideration fixes this issue and results in
the add instruction being emitted.
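
A minimal sketch of the failure mode described above, using a toy model
rather than LLVM's real MachineOperand API. The names `OffsetKind`,
`LeaOffset`, `hasOffset`, and `splitLea` are hypothetical, and the assumption
that immediates and globals were already handled is mine, not stated in this
message:

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a LEA displacement operand (hypothetical, not LLVM's type).
enum class OffsetKind { Immediate, Global, BlockAddress };

struct LeaOffset {
  OffsetKind kind;
  std::int64_t value;  // Immediate value, or the resolved symbol address.
};

// Decides whether the split needs a trailing add. With honorBlockAddress set
// to false this models the old behaviour, where block addresses are ignored.
bool hasOffset(const LeaOffset &off, bool honorBlockAddress) {
  if (off.kind == OffsetKind::Immediate)
    return off.value != 0;
  if (off.kind == OffsetKind::Global)
    return true;
  assert(off.kind == OffsetKind::BlockAddress);
  return honorBlockAddress;
}

// Models the split: a 2-operand LEA (base + index), then an optional add.
std::int64_t splitLea(std::int64_t base, std::int64_t index,
                      const LeaOffset &off, bool honorBlockAddress) {
  std::int64_t result = base + index;  // the 2-ops LEA
  if (hasOffset(off, honorBlockAddress))
    result += off.value;               // the add instruction
  return result;
}
```

With `honorBlockAddress` false, a block-address displacement is silently
lost; with it true, the add is emitted and the address is preserved.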

This fixes #71667
2023-11-10 08:12:18 +08:00
stephenpeckham
1d1fede493
[XCOFF] Ensure .file is emitted before any .info pseudo-ops (#71577)
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.
2023-11-09 16:03:45 -06:00
Craig Topper
8b98d5b813 [RISCV][GISel] Enable libcall expansion for G_FCEIL and G_FFLOOR. 2023-11-09 13:14:42 -08:00
Craig Topper
679cc16c99 [RISCV] Disable early promotion for Zbs in performANDCombine with riscv-experimental-rv64-legal-i32
We can match this directly in isel with the i32 type being legal.

The generic DAG combine will unpromote part of the pattern and
prevent it from being matched in isel.
2023-11-09 09:51:31 -08:00
Craig Topper
24577bd089 [RISCV] Add BSET/BCLR/BINV/BEXT patterns for riscv-experimental-rv64-legal-i32. 2023-11-09 09:17:22 -08:00
Juergen Ributzka
6d1d7be133
Obsolete WebKit Calling Convention (#71567)
The WebKit Calling Convention was created specifically for the WebKit
FTL. FTL doesn't use LLVM anymore and therefore this calling convention
is obsolete.

This commit removes the WebKit CC, its associated tests, and
documentation.
2023-11-09 09:08:41 -08:00
chuongg3
451bc3ec1d
[AArch64][GlobalISel] Legalize G_VECREDUCE_{MIN/MAX} (#69461)
Legalizes G_VECREDUCE_{MIN/MAX} and selects instructions for
vecreduce_{min/max}
2023-11-09 16:29:14 +00:00
Philip Reames
7ac8486e54
[RISCVInsertVSETVLI] Allow PRE with non-immediate AVLs (#71728)
Extend our PRE logic to cover non-immediate AVL values. This covers
large constant AVLs (which must be materialized in registers), and may
help some code written explicitly with intrinsics.

Looking at the existing code, I can't entirely figure out why I thought
we needed VL == AVL to perform the PRE. My best guess is that I was
worried about the VLMAX < VL < 2 * VLMAX case, but the spec explicitly
says that vsetvli must be deterministic for any particular AVL value.

That case was, possibly by accident, covering another legality
precondition. Specifically, by only returning true for immediate and
VLMAX AVL values, we didn't encounter the case where the AVL was a
register and that register wasn't available in the predecessor (e.g. if
AVL is a load in the MBB block itself).

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2023-11-09 08:03:13 -08:00
Shengchen Kan
c9017bc793
[X86] Support EGPR (R16-R31) for APX (#70958)
1. Map R16-R31 to DWARF registers 130-145.
2. Make R16-R31 caller-saved registers.
3. Make R16-31 allocatable only when feature EGPR is supported
4. Make R16-31 available for instructions in legacy maps 0/1 and EVEX
space, except XSAVE*/XRSTOR
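
As a small illustration of item 1, the mapping can be written as a single
offset computation. This is a sketch only: the helper name
`dwarfNumberForEGPR` is hypothetical, and it assumes the numbers are assigned
consecutively in register order (R16 to 130 through R31 to 145):

```cpp
#include <cassert>

// Hypothetical helper: DWARF register number for APX register Rn, assuming
// R16..R31 map in order onto DWARF registers 130..145.
int dwarfNumberForEGPR(int n) {
  assert(n >= 16 && n <= 31 && "only R16-R31 are covered by this mapping");
  return 130 + (n - 16);
}
```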

RFC:

https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4

Explanations for some seemingly unrelated changes:

inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir:
The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is
the encoding for the register class in the enum generated by tablegen.
This encoding will change any time a new register class is added. Since
the number is part of the input, this means it can become stale.

seh-directive-errors.s:
   R16-R31 makes ".seh_pushreg 17" legal

musttail-varargs.ll:
It seems some LLVM passes use the number of registers rather than the
number of allocatable registers as a heuristic.

This PR is to reland #67702 after #70222 in order to reduce some
compile-time regression when EGPR is not used.
2023-11-09 23:39:40 +08:00
Igor Kirillov
59a063d5c6
[ExpandMemCmp] Improve memcmp optimisation for boolean results (#71221)
This patch enhances the optimization of memcmp calls when only two
outcomes are needed and the comparison fits into one block, for example:

	bool result = memcmp(a, b, 6) > 0;

Previously, LLVM would generate unnecessary operations even when the
user of memcmp was only interested in a binary outcome.
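
To see why a single block is enough for a boolean result, note that loading
both 6-byte buffers most-significant-byte first turns the lexicographic byte
comparison into one unsigned integer comparison. The sketch below only
illustrates that equivalence; it is not claimed to be the sequence
ExpandMemCmp emits, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Pack the first 6 bytes into an integer, most significant byte first, so
// that unsigned integer order matches memcmp's lexicographic byte order.
static std::uint64_t loadBigEndian6(const unsigned char *p) {
  std::uint64_t v = 0;
  for (int i = 0; i < 6; ++i)
    v = (v << 8) | p[i];
  return v;
}

// Hypothetical helper with the same truth value as `memcmp(a, b, 6) > 0`.
bool greaterThan6(const void *a, const void *b) {
  assert(a && b && "both buffers must hold at least 6 readable bytes");
  return loadBigEndian6(static_cast<const unsigned char *>(a)) >
         loadBigEndian6(static_cast<const unsigned char *>(b));
}
```

No three-way result (negative, zero, positive) is ever materialised, which is
the work the binary-outcome case can skip.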
2023-11-09 11:52:04 +00:00
Craig Topper
e3c120a585 [RISCV] Add a Zbb+Zbs command line to rv*zbs.ll to get coverage on an existing isel pattern. NFC
This pattern wasn't tested

def : Pat<(XLenVT (and (rotl -2, (XLenVT GPR:$rs2)), GPR:$rs1)),
          (BCLR GPR:$rs1, GPR:$rs2)>;
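
The identity behind this pattern can be checked with a short sketch, assuming
XLEN = 64 and the usual Zbs semantics where BCLR uses only the low 6 bits of
rs2. Rotating -2 (all ones except bit 0) left by rs2 moves the single zero bit
to position rs2, so the AND clears exactly that bit. The function names are
hypothetical:

```cpp
#include <cstdint>

// Rotate left on 64 bits; the amount is reduced modulo 64.
std::uint64_t rotl64(std::uint64_t x, unsigned n) {
  n &= 63;
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

// Left-hand side of the pattern: and (rotl -2, rs2), rs1.
std::uint64_t andRotl(std::uint64_t rs1, std::uint64_t rs2) {
  return rotl64(~std::uint64_t{1}, static_cast<unsigned>(rs2 & 63)) & rs1;
}

// Assumed BCLR semantics on RV64: clear bit (rs2 mod 64) of rs1.
std::uint64_t bclr(std::uint64_t rs1, std::uint64_t rs2) {
  return rs1 & ~(std::uint64_t{1} << (rs2 & 63));
}
```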
2023-11-08 22:31:49 -08:00
Jianjian Guan
d36eb79ccc
[RISCV] Support Strict FP arithmetic Op when only have Zvfhmin (#68867)
Include: STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV,
STRICT_FSQRT and STRICT_FMA.
2023-11-09 09:55:48 +08:00
Jun Wang
54470176af
[AMDGPU] Add inreg support for SGPR arguments (#67182)
Function parameters marked with inreg are supposed to be allocated to
SGPRs. However, for compute functions, this is ignored and function
parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in
AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg.
---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-11-08 11:35:52 -08:00
Simon Pilgrim
671d10ad39 [X86] Add fabs test coverage for Issue #70947 2023-11-08 16:20:34 +00:00
Simon Pilgrim
45f1db4855 [X86] vec_fabs.ll - add AVX2 test coverage 2023-11-08 16:20:34 +00:00
Dinar Temirbulatov
3f9d385e58
[AArch64][SME] Shuffle lowering, assume that the minimal SVE register is 128-bit, when NEON is not available. (#71647)
We can assume that the minimal SVE register is 128-bit when NEON is not
available, and we can lower a shuffle operation with one operand to the
TBL1 SVE instruction.
2023-11-08 14:37:49 +00:00
alexfh
067632e141
Revert "[DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to getter constants." due to a miscompile (#71598)
- Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))`
to use rotate or to getter constants." - causes a miscompile, see
112e49b381 (commitcomment-131943923)
- Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral
types. NFC", which fixes a compiler warning in the commit above
2023-11-08 15:07:12 +01:00
Simon Pilgrim
33ecd93596 [X86] Add test coverage for ABDS/ABDU patterns with mismatching extension types 2023-11-08 10:33:18 +00:00
Jay Foad
d5f3b3b3b1
[RegScavenger] Simplify state tracking for backwards scavenging (#71202)
Track the live register state immediately before, instead of after,
MBBI. This makes it simple to track the state at the start or end of a
basic block without a separate (and poorly named) Tracking flag.

This changes the API of the backward(MachineBasicBlock::iterator I)
method, which now recedes to the state just before, instead of just
after, *I. Some clients are simplified by this change.

There is one small functional change shown in the lit tests where
multiple spilled registers all need to be reloaded before the same
instruction. The reloads will now be inserted in the opposite order.
This should not affect correctness.
2023-11-08 09:49:07 +00:00
Zhaoxuan Jiang
76b53a0216
[AArch64] (NFC) Fix test after loosening requirements for register renaming (#71634)
The landing of https://reviews.llvm.org/D88663 renders the existing
stp-opt-with-renaming-undef-assert test useless because the picked
register for renaming becomes q0 instead of q16.
2023-11-08 09:12:57 +00:00
Diana Picus
3b905a0be5 [AMDGPU] ISel for llvm.amdgcn.set.inactive.chain.arg
Add patterns to select int_amdgcn_set_inactive_chain_arg to
V_SET_INACTIVE.

This could probably use some more testing, but at least for simple cases
V_SET_INACTIVE seems to mostly work out of the box.

Differential Revision: https://reviews.llvm.org/D158605
2023-11-08 09:53:47 +01:00
Diana Picus
39830fea28 [AMDGPU][PEI] Set up SP for chain functions
Initialize the SP to 0 in the prologue of functions with the
`amdgpu_cs_chain` or `amdgpu_cs_chain_preserve` calling conventions, but
only if they need one (i.e. if they contain calls to `amdgpu_gfx`
functions or if they have stack objects).

Also make sure we don't try to realign the stack (since 0 is aligned
enough).

Differential Revision: https://reviews.llvm.org/D156413
2023-11-08 09:27:34 +01:00
Craig Topper
24b11ba24d [RISCV][GISel] Use default lowering for G_DYN_STACKALLOC. 2023-11-07 23:59:27 -08:00
Diana
1fa58c7790
[AMDGPU] Callee saves for amdgpu_cs_chain[_preserve] (#71526)
Teach prolog epilog insertion how to handle functions with the
amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions.

For amdgpu_cs_chain functions, we only need to preserve the inactive
lanes of VGPRs above v8, and only in the presence of calls via
@llvm.amdgcn.cs.chain.

For amdgpu_cs_chain_preserve functions, we will also need to preserve
the active lanes for registers above the last argument VGPR. AFAICT
there's no direct way to find out what the last argument VGPR is, so
instead the patch uses the fact that chain calls from
amdgpu_cs_chain_preserve functions can't use more VGPRs than the
caller's VGPR arguments. In other words, it removes the operands of
SI_CS_CHAIN_TC instructions from the list of callee saved registers.

For both calling conventions, registers v0-v7 never need to be saved and
restored, so we should never add them as WWM spills.

Differential Revision: https://reviews.llvm.org/D156412
2023-11-08 08:28:15 +01:00
Qiu Chaofan
5f295552f1
[PowerPC] Fix incorrect symbol name of frexp libcall (#71626)
frexpl is for ppc_fp128. The correct symbol name for f128 is frexpf128.
2023-11-08 14:41:19 +08:00
Luke Lau
11c182740a
[RISCV] Use masked pseudo peephole for reduction pseudos (#71508)
After #71483 we now have a way of marking masked pseudos as having an
unmasked equivalent, but their mask shouldn't be folded unless it's all
ones since it would affect the result.

This patch uses it to mark the pseudos for vredsum and friends, which in
turn allows us to remove the unmasked patterns, and catch some other
forms of vmerge.
2023-11-08 12:46:06 +08:00
Qiu Chaofan
d199fd76f7 [NFC] Add f128 frexp intrinsics for PowerPC 2023-11-08 11:27:40 +08:00
Carl Ritson
af6ff98c53
[AMDGPU] Move WWM register pre-allocation to during regalloc (#70618)
Move SIPreAllocateWWMRegs pass to just before VGPR allocation. This
saves recomputation of the virtual matrix and live reg map, with the
slight regression in O0 that live intervals and slot indexes must be
computed.
2023-11-08 11:54:28 +09:00
Craig Topper
a6c80c4f70 [RISCV][GISel] Add support for G_SITOFP/G_UITOFP with F and D extensions. 2023-11-07 16:40:58 -08:00
Michael Maitland
ac4ff6168a
[CodeGen][MachineVerifier] Use TypeSize instead of unsigned for getRe… (#70881)
…gSizeInBits

This patch changes getRegSizeInBits to return a TypeSize instead of an
unsigned in the case that a virtual register has a scalable LLT. In the
case that register is physical, a Fixed TypeSize is returned.

The MachineVerifier pass is updated to allow copies between fixed and
scalable operands as long as the Src size will fit into the Dest size.

This is a precommit which will be stacked on by a change to GISel to
generate COPYs with a scalable destination but a fixed size source.

This patch is stacked on https://github.com/llvm/llvm-project/pull/70893
for the ability to use scalable vector types in MIR tests.
2023-11-07 14:38:46 -05:00
Craig Topper
374fb4126f [RISCV][GISel] Add support for G_FPTOSI/G_FPTOUI with F and D extensions. 2023-11-07 10:14:37 -08:00
Philip Reames
a7f35d54ee
[SCEV] Extend isImpliedCondOperandsViaRanges to independent predicates (#71110)
As far as I can tell, there's nothing in this code which actually
assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred
RHS) are the same.

Noticed while investigating something else; this is purely an
opportunistic optimization while I'm looking at the code. Unfortunately,
this doesn't solve my original problem. :)
2023-11-07 07:25:47 -08:00
Pierre van Houtryve
5db63d29fd
[AMDGPU] PromoteAlloca: Handle load/store subvectors using non-constant indexes (#71505)
I assumed indexes were always ConstantInts, but that's not always the
case. They can be other things as well. We can easily handle that by
just emitting an add and letting InstSimplify do the constant folding
for cases where it's really a ConstantInt.

Solves SWDEV-429935
2023-11-07 15:29:41 +01:00
Mitch Phillips
9b2439167d Revert "RegisterCoalescer: Generate test checks"
This reverts commit 9832eb4bdd92e876a59fea5a3502572dc9bcf870.

Reason: Dependency on change that was reverted in
ba385ae210
2023-11-07 15:09:08 +01:00
Mitch Phillips
9e50c6e6b5 Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit ba385ae210b3659bc9dfb78ef1d280d03c2c3b5a.

Reason: Broke the MSan buildbot. See comments on
ba385ae210
for more information.
2023-11-07 15:08:45 +01:00
Pierre van Houtryve
4428b01faa Reland: [AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-11-07 12:23:03 +01:00
Nikita Popov
6e56c35d19 [SpeculativeExecution] Add only-if-divergent-target pass option
The optimization pipeline enables this option, but it was not
preserved in -print-pipeline-passes output.
2023-11-07 11:49:37 +01:00
Graham Hunter
a850dbcc5c
[AArch64] Sink vscale calls into loops for better isel (#70304)
For more recent SVE-capable CPUs it is beneficial to use the inc*
instruction to increment a value by vscale (potentially shifted or
multiplied) even in short loops.

This patch tells codegenprepare to sink appropriate vscale calls into
blocks where they are used so that isel can match them.
2023-11-07 10:29:42 +00:00
Luke Lau
fd4804423b [RISCV] Add tests for pseudos that shouldn't have vmerge folded into them. NFC 2023-11-07 18:25:37 +08:00
Jim Lin
4306cfd40e
[RISCV] Fix using undefined variable %pt2 in mask-reg-alloc.mir testcase (#70764)
First PseudoVMERGE_VIM_M1 should use %pt1 as its operand instead of
%pt2.

I found this error when I added the LiveIntervals analysis pass
downstream. It crashes with the message:

```
Use of %7 does not have a corresponding definition on every path:
112r %6:vrnov0 = PseudoVMERGE_VIM_M1 %pt2:vrnov0(tied-def 0), %2:vr, 1, %4:vmv0, 1, 3
LLVM ERROR: Use not jointly dominated by defs.
```
2023-11-07 17:05:03 +08:00
Amara Emerson
e09184ffe0 [AArch64][GlobalISel] Remove -O0 from a legalizer test, which causes legalization failures to be silent.
This was masking legalization failures in some functions in the test. Remove
those for now since they don't actually work.
2023-11-07 00:36:15 -08:00
Nikita Popov
17764d2c87
[IR] Remove FP cast constant expressions (#71408)
Remove support for the fptrunc, fpext, fptoui, fptosi, uitofp and sitofp
constant expressions. All places creating them have been removed
beforehand, so this just removes the APIs and uses of these constant
expressions in tests.

With this, the only remaining FP operation that still has constant
expression support is fcmp.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-07 09:34:16 +01:00
Yeting Kuo
a5c1ecada2
[RISCV] Disable performCombineVMergeAndVOps for PseudoVIOTA_M. (#71483)
This transformation might be illegal for `PseudoVIOTA_M`. The value of
`viota.m vd, vs2` is the prefix sum of vs2, and adding a mask to it may
produce a wrong prefix sum.
For example, the result of the following sequence is `{5, 5, 5, 3}`,
```
; v4 = {1, 1, 1, 1}
viota.m v1, v4
; v0 = {0, 0, 0, 1}, v1 = {0, 1, 2, 3}, v8 = {5, 5, 5, 5}
vmerge.vvm v8, v8, v1, v0.t
; v8 = {5, 5, 5, 3}
```
but if we merge them into `viota.m v8, v4, v0.t`, then the result is
`{5, 5, 5, 0}`.
Also, we still perform `performCombineVMergeAndVOps` for `viota.m` when
the mask of `vmerge.vvm` is a true mask.
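
The difference can be reproduced with a small scalar simulation. This is a
sketch under the assumption that masked `viota.m` counts only active elements
of vs2 and leaves inactive destination elements unchanged; the `Vec` alias and
both function names are hypothetical stand-ins for the instructions:

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Models viota.m vd, vs2, v0.t: each active element receives the number of
// set elements of vs2 among the active elements before it; inactive
// elements keep their old value from vd.
Vec viota(Vec vd, const Vec &vs2, const Vec &mask) {
  int count = 0;
  for (std::size_t i = 0; i < vd.size(); ++i) {
    if (!mask[i])
      continue;
    vd[i] = count;
    count += vs2[i] != 0;
  }
  return vd;
}

// Models vmerge.vvm: take vs1 where the mask is set, otherwise vs2.
Vec vmerge(const Vec &vs2, const Vec &vs1, const Vec &mask) {
  Vec vd(vs2.size());
  for (std::size_t i = 0; i < vd.size(); ++i)
    vd[i] = mask[i] ? vs1[i] : vs2[i];
  return vd;
}
```

Running the unmasked `viota` (an all-ones mask) and then `vmerge` with mask
`{0, 0, 0, 1}` keeps the full prefix sum in the last element, whereas the
folded, masked `viota` restarts the count because the first three elements
are inactive.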
2023-11-07 16:21:35 +08:00
Matt Arsenault
9832eb4bdd RegisterCoalescer: Generate test checks
Forgot to add the FileCheck part and generate checks before pushing this.
2023-11-07 16:58:47 +09:00
Matt Arsenault
ba385ae210 Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit e0f86ca2004b2d87ffe3c1e8242650a29fa98a82.

This was hitting some assertions which have since been relaxed.
2023-11-07 16:57:02 +09:00
Matt Arsenault
d34a10a47d
AMDGPU: Port AMDGPUAttributor to new pass manager (#71349) 2023-11-07 15:40:40 +09:00
Amara Emerson
6b69584660
[GlobalISel] Fall back for bf16 conversions. (#71470)
We don't support these correctly since we don't yet have FP types.
AMDGPU tests were silently miscompiling bf16 as if they were fp16.
2023-11-06 21:18:57 -08:00
Jay Foad
521ac12a25
[AMDGPU] Remove AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough (#71407)
The special handling for blocks ending with a long branch has been
unnecessary since D106445:
"[amdgpu] Add 64-bit PC support when expanding unconditional branches."
2023-11-06 16:29:52 +00:00
Jay Foad
1c6102d19b [AMDGPU] Regenerate checks for long-branch-reserve-register.ll 2023-11-06 15:33:23 +00:00