52796 Commits

Author SHA1 Message Date
Amara Emerson
8fb12f8ade [AArch64][GlobalISel] Re-generate stale test checks. 2023-09-01 08:29:34 -07:00
Sander de Smalen
b09a52d589 [AArch64] NFC: Move llvm.aarch64.sve.fadda tests back 2023-09-01 13:37:51 +00:00
David Green
03f338e7e0 [AArch64] Ensure we do not access illegal operands in tryCombineMULLWithUZP1
https://github.com/llvm/llvm-project/issues/65015 shows a case where
tryCombineMULLWithUZP1 could attempt to look at the wrong operand of another
user instruction. This adds an extra else because, if we don't find the right
opcode, we don't need to check the operands.

Differential Revision: https://reviews.llvm.org/D159282
2023-09-01 14:12:31 +01:00
Matt Arsenault
ee795fd1cf AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral
https://reviews.llvm.org/D158999
2023-09-01 08:22:16 -04:00
Matt Arsenault
def228553c AMDGPU: Use pown instead of pow if known integral
https://reviews.llvm.org/D158998
2023-09-01 08:22:16 -04:00
Matt Arsenault
deefda7074 AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32
These codegen correctly but f64 doesn't. This prevents losing fast
math flags on the way to the underlying intrinsic.

https://reviews.llvm.org/D158997
2023-09-01 08:22:16 -04:00
Matt Arsenault
dac8f974b5 AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion
https://reviews.llvm.org/D158996
2023-09-01 08:22:16 -04:00
Matt Arsenault
699685b718 AMDGPU: Enable assumptions in AMDGPULibCalls
https://reviews.llvm.org/D159006
2023-09-01 08:22:16 -04:00
Matt Arsenault
a45b787c91 AMDGPU: Turn pow libcalls into powr
powr is just pow with the assumption that x >= 0, otherwise nan. This
fires at least 6 times in luxmark
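
The relationship described above can be modelled in a few lines. This is a simplified sketch that follows only the wording of this message (pow restricted to x >= 0, NaN otherwise); the helper name `powr` is a stand-in, and the other edge cases a real powr defines are deliberately ignored.

```python
import math

def powr(x, y):
    """Simplified model: pow under the assumption x >= 0, NaN otherwise."""
    if x < 0:
        return math.nan
    return math.pow(x, y)

# For non-negative bases the two agree, which is what makes the rewrite legal
# when x is known to be >= 0.
same_result = powr(2.0, 3.0) == math.pow(2.0, 3.0)

# For a negative base they differ: pow gives -8.0, the model gives NaN.
negative_base_is_nan = math.isnan(powr(-2.0, 3.0))
```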

https://reviews.llvm.org/D158908
2023-09-01 08:22:16 -04:00
Matt Arsenault
f5d8a9b1bb AMDGPU: Simplify handling of constant vectors in libcalls
Also fixes not handling the partially undef case.

https://reviews.llvm.org/D158905
2023-09-01 08:22:16 -04:00
Matt Arsenault
afb24cbb69 AMDGPU: Don't require all flags to expand fast powr
This was requiring all fast math flags, which is practically
useless. This wouldn't fire using all the standard OpenCL fast math
flags. This only needs afn, nnan and ninf.

https://reviews.llvm.org/D158904
2023-09-01 08:22:16 -04:00
Sander de Smalen
9e9be99c97 [AArch64][SME] Disable remat of VL-dependent ops when function changes streaming mode.
This is a way to prevent the register allocator from inserting instructions
which behave differently for different runtime vector-lengths, inside a
call-sequence which changes the streaming-SVE mode before/after the call.

I've considered using BUNDLEs in Machine IR, but found that using this is
not possible for a few reasons:
* Most passes don't look inside BUNDLEs, but some passes would need to
  look inside these call-sequence bundles, for example the PrologEpilog
  pass (to remove the CALLSEQSTART/END), a PostRA pass to remove COPY
  instructions, or the AArch64PseudoExpand pass.
* Within the streaming-mode-changing call sequence, one of the instructions
  is a CALLSEQEND. The corresponding CALLSEQSTART (AArch64::ADJCALLSTACKDOWN)
  is outside this sequence. This means we'd end up with a BUNDLE that has
  [SMSTART, COPY, BL, ADJCALLSTACKUP, COPY, SMSTOP]. The MachineVerifier
  doesn't accept this, and we also can't move the CALLSEQSTART into the
  call sequence.

Maybe in the future we could model this differently by modelling
the runtime vector-length as a value that's used by certain operations
(similar to e.g. NZCV flags) and clobbered by SMSTART/SMSTOP, such that the
register allocator can consider these as actual dependences and avoid
rematerialization. For now we just want to address the immediate problem.

Reviewed By: paulwalker-arm, aemerson

Differential Revision: https://reviews.llvm.org/D159193
2023-09-01 12:13:27 +00:00
Sander de Smalen
7e815dd76d [AArch64][SME] Create new interface for isSVEAvailable.
When a function is compiled to be in Streaming(-compatible) mode, the full
set of SVE instructions may not be available. This patch adds an interface
to query that and changes the codegen for FADDA (not legal in Streaming-SVE
mode) to instead be expanded for fixed-length vectors, or otherwise not to
code-generate for scalable vectors.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D156109
2023-09-01 12:00:36 +00:00
Simon Pilgrim
2a81396b1b [DAG] SimplifyDemandedBits - add SMIN/SMAX KnownBits comparison analysis
Followup to D158364

Also, final fix for Issue #59902 which noted that the snippet should just return 1
2023-09-01 12:42:30 +01:00
Simon Pilgrim
b3d454950c [X86] Add tests showing failure to use KnownBits known comparison results to remove SMIN/SMAX
Followup to D158364
2023-09-01 12:04:57 +01:00
Simon Pilgrim
aca8b9d0d5 [DAG] SimplifyDemandedBits - if we're only demanding the signbits, a MIN/MAX node can be simplified to an OR or AND node
Extension to the signbit case: if the signbits extend down through all the demanded bits, then SMIN/SMAX/UMIN/UMAX nodes can be simplified to OR/AND/AND/OR, respectively.

Alive2: https://alive2.llvm.org/ce/z/mFVFAn (general case)
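
As an informal companion to the Alive2 proof, the claim can be checked exhaustively on a small bit width. The sketch below is not LLVM code: it assumes 8-bit values, and the helpers `num_sign_bits` and `holds_for` are hypothetical names. For every pair of operands whose sign bits cover the top k demanded bits, it compares those bits of each MIN/MAX result with the corresponding OR/AND.

```python
BITS = 8
MASK = (1 << BITS) - 1

def to_signed(v):
    """Interpret an unsigned BITS-wide pattern as two's complement."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v

def num_sign_bits(v):
    """Count the leading bits that equal the sign bit (always at least 1)."""
    sign = (v >> (BITS - 1)) & 1
    count = 0
    for i in range(BITS - 1, -1, -1):
        if ((v >> i) & 1) != sign:
            break
        count += 1
    return count

def holds_for(k):
    """Check the fold when only the top k bits are demanded."""
    demanded = (MASK << (BITS - k)) & MASK
    values = [v for v in range(1 << BITS) if num_sign_bits(v) >= k]
    for a in values:
        for b in values:
            smin = min(a, b, key=to_signed)
            smax = max(a, b, key=to_signed)
            umin, umax = min(a, b), max(a, b)
            if (smin & demanded) != ((a | b) & demanded):
                return False
            if (smax & demanded) != ((a & b) & demanded):
                return False
            if (umin & demanded) != ((a & b) & demanded):
                return False
            if (umax & demanded) != ((a | b) & demanded):
                return False
    return True

all_widths_hold = all(holds_for(k) for k in range(1, BITS + 1))
```

The intuition is that a negative value is the smaller one in a signed comparison but the larger one in an unsigned comparison, which is why SMIN and UMAX map to OR while SMAX and UMIN map to AND.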

Differential Revision: https://reviews.llvm.org/D158364
2023-09-01 10:56:32 +01:00
Rainer Orth
6ef767c075 [MC][ELF] Don't emit .note.GNU-stack sections on Solaris
LLVM currently emits `.note.GNU-stack` sections on all ELF targets.

However, Solaris ld doesn't know/care about them.  Even worse, with the
revised Solaris GNU ld patch (D85309 <https://reviews.llvm.org/D85309>),
there are hundreds of warnings:

  /usr/gnu/bin/ld: warning: /usr/lib/amd64/crtn.o: missing .note.GNU-stack section implies executable stack
  /usr/gnu/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker

The Solaris crts are not going to change here, and even if they were, GNU
ld would emit `PT_GNU_STACK` segments that Solaris `ld.so.1` ignores.

So the note sections are completely useless on Solaris and this patch
disables their creation.

Instead, Solaris has its own mechanisms to control stack executability:
`PT_SUNW_STACK`, `DT_SUNW_SX_NXSTACK` and the system-wide control via
`sxadm` where `nxstack` defaults to on.

Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11` with Solaris
ld and GNU ld, and `x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D159179
2023-09-01 11:20:42 +02:00
David Green
55dc73af97 [AArch64][GISel] Expand coverage of FRem.
This adds some more extensive test coverage for frem through global isel,
making sure that vector types are all scalarized and all fp16 become f32
libcalls.
2023-09-01 09:21:53 +01:00
Chen Zheng
a69cb20768 [NFC] Fix the PowerPC broken cases in D152215.
Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D159052
2023-09-01 02:07:48 -04:00
Amara Emerson
8ba1c38a0d [AArch64][GlobalISel] Add heuristics for G_FCONSTANT localization.
Now that an earlier commit has adopted SDAG's heuristics for expanding 32/64b
fpimms to either GPR materializations or CP loads, we can improve the localizer
to understand the same heuristics. This avoids localizing expensive immediates,
which would increase code size.

The combination of these two changes results in minor improvements in CTMark -Os,
and bigger improvements in some other cases.
2023-08-31 22:23:36 -07:00
Amara Emerson
49d5bb4b34 [AArch64][GlobalISel] Materialize 64b FP immediates instead of loading if profitable.
This just mimics what the SDAG backend does.
2023-08-31 22:23:36 -07:00
Craig Topper
319aba645f [RISCV] Teach MatInt to use (ADD_UW X, (SLLI X, 32)) to materialize some constants.
If the high and low 32 bits are the same, we try to use
(ADD X, (SLLI X, 32)) but that only works if bit 31 is clear since
the low 32 bits will be sign extended.

If we have Zba we can use add.uw to zero the sign extended bits.
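
The arithmetic can be illustrated with a small model of the 64-bit register operations. This is a sketch, not the MatInt implementation: the helper names are made up for illustration, and `add_uw` follows the Zba definition of add.uw (zero-extend the low 32 bits of the first operand, then add the second).

```python
MASK64 = (1 << 64) - 1

def sext32(x):
    """Sign-extend the low 32 bits into a 64-bit register pattern."""
    x &= 0xFFFFFFFF
    if x & 0x80000000:
        return x | 0xFFFFFFFF00000000
    return x

def slli(x, shamt):
    return (x << shamt) & MASK64

def add(a, b):
    return (a + b) & MASK64

def add_uw(a, b):
    """Model of add.uw: zext32(a) + b."""
    return ((a & 0xFFFFFFFF) + b) & MASK64

def materialize(lo32, use_add_uw):
    """Build a constant whose high and low 32-bit halves both equal lo32."""
    x = sext32(lo32)          # the low half arrives sign-extended
    hi = slli(x, 32)
    return add_uw(x, hi) if use_add_uw else add(x, hi)

# Bit 31 clear: plain ADD is enough.
ok_plain = materialize(0x12345678, use_add_uw=False) == 0x1234567812345678

# Bit 31 set: the sign-extended upper bits corrupt the high half with ADD,
# while add.uw zeroes them first and produces the intended value.
wrong_with_add = materialize(0x80000001, use_add_uw=False)
right_with_add_uw = materialize(0x80000001, use_add_uw=True)
```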

Reviewed By: reames, wangpc

Differential Revision: https://reviews.llvm.org/D159253
2023-08-31 20:24:34 -07:00
Jim Lin
10c8619701 [RISCV] Remove unused check prefixes for tests. NFC 2023-09-01 10:42:53 +08:00
Philip Reames
fe19822198 [RISCV] Add test coverage for high lmul non-constant build_vectors 2023-08-31 14:37:49 -07:00
Arthur Eubanks
2a2f02e19f [X86] Use 64-bit jump table entries for large code model PIC
With the large code model, the label difference may not fit into 32 bits.
Even if we assume that any individual function is no larger than 2^32
and use a difference from the function entry to the target destination,
things like BOLT can rearrange blocks (even if BOLT doesn't necessarily
work with the large code model right now).
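
A quick range check shows why a 32-bit entry is not safe here. This is only an illustration with made-up addresses; the helper names are hypothetical and nothing below reflects how the backend actually chooses the entry size.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def label_difference(target, base):
    """Signed distance stored in a relative jump table entry."""
    return target - base

def fits_in_int32(delta):
    return INT32_MIN <= delta <= INT32_MAX

# A block that stays within 2 GiB of the base still fits in 32 bits.
near = fits_in_int32(label_difference(0x1000 + 0x7FFFFFFF, 0x1000))

# Once a block is placed 2 GiB or more away, the difference overflows a
# 32-bit entry, so a 64-bit entry is needed.
far = fits_in_int32(label_difference(0x1000 + 0x80000000, 0x1000))
```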

set directives avoid static relocations in some 32-bit entry cases, but
don't worry about set directives for 64-bit jump table entries (we can
do that later if somebody really cares about it).

check-llvm in a bootstrapped clang with the large code model passes.

Fixes #62894

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D159297
2023-08-31 14:13:38 -07:00
Nick Desaulniers
fc5306d128 Revert "[RISCV] Teach RISCVMergeBaseOffset to handle inline asm"
This reverts commit f281543a48905e58359c6b0f1b9c3b42bd67e315.

Sami Tolvanen reports that this breaks the Linux kernel's arch=RISCV
defconfig.

Link: https://github.com/ClangBuiltLinux/linux/issues/1928
2023-08-31 14:02:53 -07:00
Hiroshi Yamauchi
8942d3047c [AArch64][WinCFI] Handle cases where no SEH opcodes in the prologue
but there are some in the epilogue.

Make a decision whether or not to have a startepilogue/endepilogue
based on whether we actually insert SEH opcodes in the epilogue,
rather than whether we had SEH opcodes in the prologue or not.

This fixes an assert failure when there are no SEH opcodes in the
prologue but there are SEH opcodes in the epilogue (for example, when
there is no stack frame but there are stack arguments) which was not
covered in https://reviews.llvm.org/D88641.

Assertion failed: HasWinCFI == MF.hasWinCFI(), file C:\Users\hiroshi\llvm-project\llvm\lib\Target\AArch64\AArch64FrameLowering.cpp, line 1988

Differential Revision: https://reviews.llvm.org/D159238
2023-08-31 12:43:26 -07:00
Daniel Paoliello
0c5c7b52f0 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table. It contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
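
To make the list concrete, here is a rough sketch of how a tool might hold these four pieces of information and turn a raw table entry into a branch target. It is an assumption-laden illustration: the class, field names and the `shift` parameter are hypothetical and do not reflect the actual CodeView record layout or its entry-type encoding.

```python
from dataclasses import dataclass
from enum import Enum

class EntryKind(Enum):
    ABSOLUTE = "absolute pointer"
    RELATIVE = "relative integer"
    RELATIVE_SHIFTED = "relative integer that is shifted"

@dataclass
class SwitchTableInfo:
    branch_address: int   # address of the branch that uses the table
    table_address: int    # address of the jump table itself
    base_address: int     # address the relative entries are measured from
    entry_kind: EntryKind

def resolve_target(info, raw_entry, shift=0):
    """Compute a possible branch target from one raw table entry.

    `shift` is a hypothetical parameter for the shifted form.
    """
    if info.entry_kind is EntryKind.ABSOLUTE:
        return raw_entry
    if info.entry_kind is EntryKind.RELATIVE:
        return info.base_address + raw_entry
    return info.base_address + (raw_entry << shift)

info = SwitchTableInfo(0x4010, 0x5000, 0x4000, EntryKind.RELATIVE_SHIFTED)
target = resolve_target(info, 0x21, shift=2)   # 0x4000 + (0x21 << 2)
```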

Together this information can be used by debuggers and binary analysis tools to understand what a jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-31 12:06:50 -07:00
Luke Lau
1664eb05d0 [RISCV] Fix crash during i1 vector bitreverse lowering
A shuffle of v256i1 with a large enough minimum vlen might make it through type
legalization and into lowering. In this case, zvl1024b was enough. The
bitreverse shuffle lowering would then try to convert this to a v1i256 type
which is invalid (v1i128 exists though, which is why the existing v128i1 tests
were fine).

This patch checks to make sure that the new type is not only legal but also
valid.

Reviewed By: craig.topper, reames

Differential Revision: https://reviews.llvm.org/D159215
2023-08-31 19:39:08 +01:00
Konstantina Mitropoulou
17fc78e7a4 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.
This reverts commit 48fa79a503a7cf380f98b6335fbd349afae1bd86.

Reviewed By: brooksmoses

Differential Revision: https://reviews.llvm.org/D159240
2023-08-31 11:36:50 -07:00
Craig Topper
d1c3784adf [RISCV] Prefer ShortForwardBranch over the fully generic Zicond expansion.
Short forward branch is shorter than (or (czero.eqz), (czero.nez)).

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D159295
2023-08-31 11:07:35 -07:00
hstk30
db8f6c009e [AArch64] Fix arm neon vstx lane memVT size
The StN lane memory size was set too large, which caused alias analysis to go wrong.

Fixes https://github.com/llvm/llvm-project/issues/64696

Differential Revision: https://reviews.llvm.org/D158611
2023-08-31 17:54:57 +01:00
Simon Pilgrim
9734b2256d [X86] combineCMP - use widenMaskVector to allow us to handle sub-i8 mask cases when just comparing a bool element against zero 2023-08-31 16:57:22 +01:00
Stephen Peckham
282da83756 [XCOFF][AIX] Issue an error when specifying an alias for a common symbol
Summary:

There is no support in XCOFF for labels on common symbols. Therefore, an alias for a common symbol is not supported. Issue an error in the front end when an aliasee is a common symbol. Issue a similar error in the back end in case an IR specifies an alias for a common symbol.

Reviewed by: hubert.reinterpretcast, DiggerLin

Differential Revision:  https://reviews.llvm.org/D158739
2023-08-31 11:43:47 -04:00
Sander de Smalen
a6293228fd Reland "[AArch64][SME] Add support for Copy/Spill/Fill of strided ZPR2/ZPR4 registers."
This patch contains a few changes:

* It changes the alignment of the strided/contiguous ZPR2/ZPR4 registers to
  128-bits. This is important, because when we spill these registers to the
  stack, the address doesn't need to be 256/512 bits aligned because we
  split the single-store/reload pseudo instruction up into multiple
  STR_ZXI/LDR_ZXI (single vector store/load) instructions, which only
  require a 128-bit alignment. Additionally, an alignment larger than the
  stack-alignment is not supported for scalable vectors.

* It adds support for these register classes in storeRegToStackSlot,
  loadRegFromStackSlot and copyPhysReg.

* It adds tests only for the strided forms. There is no need to also
  test the contiguous forms, because registers such as z2_z3 or
  z4_z5_z6_z7 are also part of the regular ZPR2 and ZPR4 register classes,
  respectively, which are already covered and tested.

Reviewed By: dtemirbulatov

Differential Revision: https://reviews.llvm.org/D159189
2023-08-31 15:03:19 +00:00
Sander de Smalen
d6bd6f244e Revert "[AArch64][SME] Add support for Copy/Spill/Fill of strided ZPR2/ZPR4 registers."
This reverts commit 64da981b8b259c18313560bf629e1a8b3b7c1d52.
2023-08-31 14:14:56 +00:00
Sander de Smalen
64da981b8b [AArch64][SME] Add support for Copy/Spill/Fill of strided ZPR2/ZPR4 registers.
This patch contains a few changes:

* It changes the alignment of the strided/contiguous ZPR2/ZPR4 registers to
  128-bits. This is important, because when we spill these registers to the
  stack, the address doesn't need to be 256/512 bits aligned because we
  split the single-store/reload pseudo instruction up into multiple
  STR_ZXI/LDR_ZXI (single vector store/load) instructions, which only
  require a 128-bit alignment. Additionally, an alignment larger than the
  stack-alignment is not supported for scalable vectors.

* It adds support for these register classes in storeRegToStackSlot,
  loadRegFromStackSlot and copyPhysReg.

* It adds tests only for the strided forms. There is no need to also
  test the contiguous forms, because registers such as z2_z3 or
  z4_z5_z6_z7 are also part of the regular ZPR2 and ZPR4 register classes,
  respectively, which are already covered and tested.

Reviewed By: dtemirbulatov

Differential Revision: https://reviews.llvm.org/D159189
2023-08-31 13:47:46 +00:00
Simon Pilgrim
239ab16ec1 [X86] combineCMP - attempt to simplify KSHIFTR mask element extractions when just comparing against zero
We can just bitcast the pre-shifted mask as an integer and use TEST/BT directly.

This can be extended further to better handle sub-i8 mask cases, but just getting rid of KSHIFTR nodes makes a notable difference.
2023-08-31 13:14:01 +01:00
Simon Pilgrim
f33c64dd56 [X86] addr-mode-matcher-2.ll - add more sext/zext nsw/nuw permutations
As suggested by D159198
2023-08-31 12:18:44 +01:00
Igor Kirillov
e2cb07c322 [CodeGen] Fix incorrect insertion point selection for reduction nodes in ComplexDeinterleavingPass
When replacing ComplexDeinterleavingPass::ReductionOperation, we can do it
either from the Real or Imaginary part. The correct way is to take whichever
is later in the BasicBlock, but before the patch, we just always took the
Real part.

Fixes https://github.com/llvm/llvm-project/issues/65044

Differential Revision: https://reviews.llvm.org/D159209
2023-08-31 10:38:01 +00:00
David Spickett
69f1cd58aa [llvm][AArch64] Disable BigByval with expensive checks
AArch64 incorrectly nests ADJCALLSTACKDOWN/ADJCALLSTACKUP which fails
to verify with expensive checks enabled.

See https://github.com/llvm/llvm-project/issues/62137 and
https://github.com/llvm/llvm-project/issues/62138.
2023-08-31 10:15:45 +00:00
wangpc
f281543a48 [RISCV] Teach RISCVMergeBaseOffset to handle inline asm
For inline asm with memory operands, we can merge the offset into
the second operand of memory constraint operands.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158062
2023-08-31 15:39:12 +08:00
wangpc
0d73259cf2 [RISCV] Precommit test for D158062
Tests for callbr, multi-operands and multi-asm are added.

Reviewed By: wangpc, craig.topper

Differential Revision: https://reviews.llvm.org/D158149
2023-08-31 15:39:12 +08:00
David Green
58a2f839fd [AArch64][GISel] Expand coverage of FDiv and move into place.
This adds some more extensive test coverage for fdiv through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases and moving it into the position of the existing code which is no longer
needed.
2023-08-30 22:09:53 +01:00
Philip Reames
079c968eb9 [RISCV] Form vmv.s.f/x from single element splats via DAG combine
This re-implements the special casing we had in lowerScalarSplat as a DAG combine. As can be seen in the tests, this ends up triggering in a bunch more cases.

The semantically interesting bit of this change is the use of the implicit truncate semantics for when XLEN > SEW. We'd already been doing this for vmv.v.x, but this change extends e.g. the constant matching to make the same assumption about vmv.s.x. Per my reading of the specification, this should be fine, and if anything, is more obviously true of vmv.s.x than vmv.v.x.

Differential Revision: https://reviews.llvm.org/D158874
2023-08-30 12:44:36 -07:00
Philip Reames
fd465f377c [RISCV] Move vmv_s_x and vfmv_s_f special casing to DAG combine
We'd discussed this in the original set of patches months ago, but decided against it. I think we should reverse ourselves here as the code is significantly more readable, and we do pick up cases we'd missed by not calling the appropriate helper routine.

Differential Revision: https://reviews.llvm.org/D158854
2023-08-30 12:04:48 -07:00
Matt Arsenault
5f8ee45d5a AMDGPU: Implement llvm.get.rounding
There are really two rounding modes, so only return the standard
values if both modes are the same. Otherwise, return a bitmask
representing the two modes.

Annoyingly the register doesn't use the same values as FLT_ROUNDS. Use
a simple integer table we can shift into to convert.
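
The "integer table we can shift into" idea can be sketched as follows. This is a simplified, single-mode illustration rather than the actual lowering: the hardware encoding listed in the comments is an assumption, the names are hypothetical, and the two-mode bitmask case described above is not modelled.

```python
# Assumed hardware encoding of one rounding-mode field (not taken from this
# commit): 0 = nearest even, 1 = +inf, 2 = -inf, 3 = toward zero.
# FLT_ROUNDS values: 0 = toward zero, 1 = nearest, 2 = +inf, 3 = -inf.
HW_TO_FLT_ROUNDS = [1, 2, 3, 0]

# Pack the four 2-bit results into one integer, entry i at bits [2i+1:2i].
TABLE = sum(value << (2 * i) for i, value in enumerate(HW_TO_FLT_ROUNDS))

def to_flt_rounds(hw_mode):
    """Convert by shifting the packed table instead of indexing memory."""
    return (TABLE >> (2 * hw_mode)) & 0x3

converted = [to_flt_rounds(mode) for mode in range(4)]
```

Packing the table into a constant turns the conversion into a shift and a mask, with no load from memory.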

https://reviews.llvm.org/D153158
2023-08-30 14:06:13 -04:00
Simon Pilgrim
967d95382d [X86] lowerShuffleAsVALIGN - extend to recognize basic shifted element masks
Try to use VALIGN as a cross-lane version of VSHLDQ/VSRLDQ
2023-08-30 18:32:55 +01:00
Simon Pilgrim
d3d71b8d5b [X86] Add shuffle test cases showing missed opportunity to use VALIGN 2023-08-30 18:32:55 +01:00
Philip Reames
eb5fe55b81 [RISCV] Expand codegen test coverage extract/insert element
In particular, at mixed LMULS, high LMULS, and types which require splitting.
2023-08-30 10:10:40 -07:00