llvm-project

Author	SHA1	Message	Date
Amara Emerson	8fb12f8ade	[AArch64][GlobalISel] Re-generate stale test checks.	2023-09-01 08:29:34 -07:00
Sander de Smalen	b09a52d589	[AArch64] NFC: Move llvm.aarch64.sve.fadda tests back	2023-09-01 13:37:51 +00:00
David Green	03f338e7e0	[AArch64] Ensure we do not access illegal operands in tryCombineMULLWithUZP1 https://github.com/llvm/llvm-project/issues/65015 shows a case where tryCombineMULLWithUZP1 could attempt to look at the wrong operand of another user instruction. This adds an extra else as if we don't find the right opcode, we don't need to check the operands. Differential Revision: https://reviews.llvm.org/D159282	2023-09-01 14:12:31 +01:00
Matt Arsenault	ee795fd1cf	AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral https://reviews.llvm.org/D158999	2023-09-01 08:22:16 -04:00
Matt Arsenault	def228553c	AMDGPU: Use pown instead of pow if known integral https://reviews.llvm.org/D158998	2023-09-01 08:22:16 -04:00
Matt Arsenault	deefda7074	AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32 These codegen correctly but f64 doesn't. This prevents losing fast math flags on the way to the underlying intrinsic. https://reviews.llvm.org/D158997	2023-09-01 08:22:16 -04:00
Matt Arsenault	dac8f974b5	AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion https://reviews.llvm.org/D158996	2023-09-01 08:22:16 -04:00
Matt Arsenault	699685b718	AMDGPU: Enable assumptions in AMDGPULibCalls https://reviews.llvm.org/D159006	2023-09-01 08:22:16 -04:00
Matt Arsenault	a45b787c91	AMDGPU: Turn pow libcalls into powr powr is just pow with the assumption that x >= 0, otherwise nan. This fires at least 6 times in luxmark https://reviews.llvm.org/D158908	2023-09-01 08:22:16 -04:00
Matt Arsenault	f5d8a9b1bb	AMDGPU: Simplify handling of constant vectors in libcalls Also fixes not handling the partially undef case. https://reviews.llvm.org/D158905	2023-09-01 08:22:16 -04:00
Matt Arsenault	afb24cbb69	AMDGPU: Don't require all flags to expand fast powr This was requiring all fast math flags, which is practically useless. This wouldn't fire using all the standard OpenCL fast math flags. This only needs afn nnan and ninf. https://reviews.llvm.org/D158904	2023-09-01 08:22:16 -04:00
Sander de Smalen	9e9be99c97	[AArch64][SME] Disable remat of VL-dependent ops when function changes streaming mode. This is a way to prevent the register allocator from inserting instructions which behave differently for different runtime vector-lengths, inside a call-sequence which changes the streaming-SVE mode before/after the call. I've considered using BUNDLEs in Machine IR, but found that using this is not possible for a few reasons: * Most passes don't look inside BUNDLEs, but some passes would need to look inside these call-sequence bundles, for example the PrologEpilog pass (to remove the CALLSEQSTART/END), a PostRA pass to remove COPY instructions, or the AArch64PseudoExpand pass. * Within the streaming-mode-changing call sequence, one of the instructions is a CALLSEQEND. The corresponding CALLSEQBEGIN (AArch64::ADJCALLSTACKUP) is outside this sequence. This means we'd end up with a BUNDLE that has [SMSTART, COPY, BL, ADJCALLSTACKUP, COPY, SMSTOP]. The MachineVerifier doesn't accept this, and we also can't move the CALLSEQSTART into the call sequence. Maybe in the future we could model this differently by modelling the runtime vector-length as a value that's used by certain operations (similar to e.g. NZCV flags) and clobbered by SMSTART/SMSTOP, such that the register allocator can consider these as actual dependences and avoid rematerialization. For now we just want to address the immediate problem. Reviewed By: paulwalker-arm, aemerson Differential Revision: https://reviews.llvm.org/D159193	2023-09-01 12:13:27 +00:00
Sander de Smalen	7e815dd76d	[AArch64][SME] Create new interface for isSVEAvailable. When a function is compiled to be in Streaming(-compatible) mode, the full set of SVE instructions may not be available. This patch adds an interface to query that and changes the codegen for FADDA (not legal in Streaming-SVE mode) to instead be expanded for fixed-length vectors, or otherwise not to code-generate for scalable vectors. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D156109	2023-09-01 12:00:36 +00:00
Simon Pilgrim	2a81396b1b	[DAG] SimplifyDemandedBits - add SMIN/SMAX KnownBits comparison analysis Followup to D158364 Also, final fix for Issue #59902 which noted that the snippet should just return 1	2023-09-01 12:42:30 +01:00
Simon Pilgrim	b3d454950c	[X86] Add tests showing failure to use KnownBits known comparison results to remove SMIN/SMAX Followup to D158364	2023-09-01 12:04:57 +01:00
Simon Pilgrim	aca8b9d0d5	[DAG] SimplifyDemandedBits - if we're only demanding the signbits, a MIN/MAX node can be simplified to a OR or AND node Extension to the signbit case, if the signbits extend down through all the demanded bits then SMIN/SMAX/UMIN/UMAX nodes can be simplified to a OR/AND/AND/OR. Alive2: https://alive2.llvm.org/ce/z/mFVFAn (general case) Differential Revision: https://reviews.llvm.org/D158364	2023-09-01 10:56:32 +01:00
Rainer Orth	6ef767c075	[MC][ELF] Don't emit .note.GNU-stack sections on Solaris LLVM currently emits `.note.GNU-stack` sections on all ELF targets. However, Solaris ld doesn't know/care about them. Even worse, with the revised Solaris GNU ld patch (D85309 <https://reviews.llvm.org/D85309>), there are hundreds of warnings: /usr/gnu/bin/ld: warning: /usr/lib/amd64/crtn.o: missing .note.GNU-stack section implies executable stack /usr/gnu/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker The Solaris crts are not going to change here, and even if they were, GNU ld would emit `PT_GNU_STACK` segments that Solaris `ld.so.1` ignores. So the note sections are completely useless on Solaris and this patch disables their creation. Instead, Solaris has its own mechanisms to control stack executability: `PT_SUNW_STACK`, `DT_SUNW_SX_NXSTACK` and the system-wide control via `sxadm` where `nxstack` defaults to on. Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11` with Solaris ld and GNU ld, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D159179	2023-09-01 11:20:42 +02:00
David Green	55dc73af97	[AArch64][GISel] Expand coverage of FRem. This adds some more extensive test coverage for frem through global isel, making sure that vector types are all scalarized and all fp16 become f32 libcalls.	2023-09-01 09:21:53 +01:00
Chen Zheng	a69cb20768	[NFC] Fix the PowerPC broken cases in D152215. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D159052	2023-09-01 02:07:48 -04:00
Amara Emerson	8ba1c38a0d	[AArch64][GlobalISel] Add heuristics for G_FCONSTANT localization. Now that in an earlier commit we adopt the heuristics for SDAG's expansion of 32/64b fpimms to either GPR materializations or CP load, we can also improve the localizer to also understand the same heuristics. This avoids localizing expensive immediates as that increases code size. The combination of these two changes results in minor improvements in CTMark -Os, and bigger improvements in some other cases.	2023-08-31 22:23:36 -07:00
Amara Emerson	49d5bb4b34	[AArch64][GlobalISel] Materialize 64b FP immediates instead of loading if profitable. This just mimics what the SDAG backend does.	2023-08-31 22:23:36 -07:00
Craig Topper	319aba645f	[RISCV] Teach MatInt to use (ADD_UW X, (SLLI X, 32)) to materialize some constants. If the high and low 32 bits are the same, we try to use (ADD X, (SLLI X, 32)) but that only works if bit 31 is clear since the low 32 bits will be sign extended. If we have Zba we can use add.uw to zero the sign extended bits. Reviewed By: reames, wangpc Differential Revision: https://reviews.llvm.org/D159253	2023-08-31 20:24:34 -07:00
Jim Lin	10c8619701	[RISCV] Remove unused check prefixes for tests. NFC	2023-09-01 10:42:53 +08:00
Philip Reames	fe19822198	[RISCV] Add test coverage for high lmul non-constant build_vectors	2023-08-31 14:37:49 -07:00
Arthur Eubanks	2a2f02e19f	[X86] Use 64-bit jump table entries for large code model PIC With the large code model, the label difference may not fit into 32 bits. Even if we assume that any individual function is no larger than 2^32 and use a difference from the function entry to the target destination, things like BOLT can rearrange blocks (even if BOLT doesn't necessarily work with the large code model right now). set directives avoid static relocations in some 32-bit entry cases, but don't worry about set directives for 64-bit jump table entries (we can do that later if somebody really cares about it). check-llvm in a bootstrapped clang with the large code model passes. Fixes #62894 Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D159297	2023-08-31 14:13:38 -07:00
Nick Desaulniers	fc5306d128	Revert "[RISCV] Teach RISCVMergeBaseOffset to handle inline asm" This reverts commit f281543a48905e58359c6b0f1b9c3b42bd67e315. Sami Tolvanen reports that this breaks the Linux kernel's arch=RISCV defconfig. Link: https://github.com/ClangBuiltLinux/linux/issues/1928	2023-08-31 14:02:53 -07:00
Hiroshi Yamauchi	8942d3047c	[AArch64][WinCFI] Handle cases where no SEH opcodes in the prologue but there are some in the epilogue. Make a decision whether or not to have a startepilogue/endepilogue based on whether we actually insert SEH opcodes in the epilogue, rather than whether we had SEH opcodes in the prologue or not. This fixes an assert failure when there are no SEH opcodes in the prologue but there are SEH opcodes in the epilogue (for example, when there is no stack frame but there are stack arguments) which was not covered in https://reviews.llvm.org/D88641. Assertion failed: HasWinCFI == MF.hasWinCFI(), file C:\Users\hiroshi\llvm-project\llvm\lib\Target\AArch64\AArch64FrameLowering.cpp, line 1988 Differential Revision: https://reviews.llvm.org/D159238	2023-08-31 12:43:26 -07:00
Daniel Paoliello	0c5c7b52f0	Emit the CodeView `S_ARMSWITCHTABLE` debug symbol for jump tables The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table. It contains the following information: * The address of the branch instruction that uses the jump table. * The address of the jump table. * The "base" address that the values in the jump table are relative to. * The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted). Together this information can be used by debuggers and binary analysis tools to understand what a jump table indirect branch is doing and where it might jump to. Documentation for the symbol can be found in the Microsoft PDB library dumper: `0fe89a942f/cvdump/dumpsym7.cpp (L5518)` This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D149367	2023-08-31 12:06:50 -07:00
Luke Lau	1664eb05d0	[RISCV] Fix crash during i1 vector bitreverse lowering A shuffle of v256i1 with a large enough minimum vlen might make it through type legalization and into lowering. In this case, zvl1024b was enough. The bitreverse shuffle lowering would then try to convert this to a v1i256 type which is invalid (v1i128 exists though, which is why the existing v128i1 tests were fine). This patch checks to make sure that the new type is not only legal but also valid. Reviewed By: craig.topper, reames Differential Revision: https://reviews.llvm.org/D159215	2023-08-31 19:39:08 +01:00
Konstantina Mitropoulou	17fc78e7a4	[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points. This reverts commit 48fa79a503a7cf380f98b6335fbd349afae1bd86. Reviewed By: brooksmoses Differential Revision: https://reviews.llvm.org/D159240	2023-08-31 11:36:50 -07:00
Craig Topper	d1c3784adf	[RISCV] Prefer ShortForwardBranch over the fully generic Zicond expansion. Short forward branch is shorter than (or (czero.eqz), (czero.nez)). Reviewed By: reames Differential Revision: https://reviews.llvm.org/D159295	2023-08-31 11:07:35 -07:00
hstk30	db8f6c009e	[AArch64] Fix arm neon vstx lane memVT size The StN lane memory size was set too big, which led alias analysis to go wrong. Fixes https://github.com/llvm/llvm-project/issues/64696 Differential Revision: https://reviews.llvm.org/D158611	2023-08-31 17:54:57 +01:00
Simon Pilgrim	9734b2256d	[X86] combineCMP - use widenMaskVector to allow us to handle sub-i8 mask cases when just comparing a bool element against zero	2023-08-31 16:57:22 +01:00
Stephen Peckham	282da83756	[XCOFF][AIX] Issue an error when specifying an alias for a common symbol Summary: There is no support in XCOFF for labels on common symbols. Therefore, an alias for a common symbol is not supported. Issue an error in the front end when an aliasee is a common symbol. Issue a similar error in the back end in case an IR specifies an alias for a common symbol. Reviewed by: hubert.reinterpretcast, DiggerLin Differential Revision: https://reviews.llvm.org/D158739	2023-08-31 11:43:47 -04:00
Sander de Smalen	a6293228fd	Reland "[AArch64][SME] Add support for Copy/Spill/Fill of strided ZPR2/ZPR4 registers." This patch contains a few changes: * It changes the alignment of the strided/contiguous ZPR2/ZPR4 registers to 128-bits. This is important, because when we spill these registers to the stack, the address doesn't need to be 256/512 bits aligned because we split the single-store/reload pseudo instruction up into multiple STR_ZXI/LDR_ZXI (single vector store/load) instructions, which only require a 128-bit alignment. Additionally, an alignment larger than the stack-alignment is not supported for scalable vectors. * It adds support for these register classes in storeRegToStackSlot, loadRegFromStackSlot and copyPhysReg. * It adds tests only for the strided forms. There is no need to also test the contiguous forms, because a register such as z2_z3 or z4_z5_z6_z7 are also part of the regular ZPR2 and ZPR4 register classes, respectively, which are already covered and tested. Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D159189	2023-08-31 15:03:19 +00:00
Sander de Smalen	d6bd6f244e	Revert "[AArch64][SME] Add support for Copy/Spill/Fill of strided ZPR2/ZPR4 registers." This reverts commit 64da981b8b259c18313560bf629e1a8b3b7c1d52.	2023-08-31 14:14:56 +00:00
Sander de Smalen	64da981b8b	[AArch64][SME] Add support for Copy/Spill/Fill of strided ZPR2/ZPR4 registers. This patch contains a few changes: * It changes the alignment of the strided/contiguous ZPR2/ZPR4 registers to 128-bits. This is important, because when we spill these registers to the stack, the address doesn't need to be 256/512 bits aligned because we split the single-store/reload pseudo instruction up into multiple STR_ZXI/LDR_ZXI (single vector store/load) instructions, which only require a 128-bit alignment. Additionally, an alignment larger than the stack-alignment is not supported for scalable vectors. * It adds support for these register classes in storeRegToStackSlot, loadRegFromStackSlot and copyPhysReg. * It adds tests only for the strided forms. There is no need to also test the contiguous forms, because a register such as z2_z3 or z4_z5_z6_z7 are also part of the regular ZPR2 and ZPR4 register classes, respectively, which are already covered and tested. Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D159189	2023-08-31 13:47:46 +00:00
Simon Pilgrim	239ab16ec1	[X86] combineCMP - attempt to simplify KSHIFTR mask element extractions when just comparing against zero We can just bitcast the pre-shifted mask as an integer and use TEST/BT directly. This can be extended further to better handle sub-i8 mask cases, but just getting rid of KSHIFTR nodes makes a notable difference.	2023-08-31 13:14:01 +01:00
Simon Pilgrim	f33c64dd56	[X86] addr-mode-matcher-2.ll - add more sext/zext nsw/nuw permutations As suggested by D159198	2023-08-31 12:18:44 +01:00
Igor Kirillov	e2cb07c322	[CodeGen] Fix incorrect insertion point selection for reduction nodes in ComplexDeinterleavingPass When replacing ComplexDeinterleavingPass::ReductionOperation, we can do it either from the Real or Imaginary part. The correct way is to take whichever is later in the BasicBlock, but before the patch, we just always took the Real part. Fixes https://github.com/llvm/llvm-project/issues/65044 Differential Revision: https://reviews.llvm.org/D159209	2023-08-31 10:38:01 +00:00
David Spickett	69f1cd58aa	[llvm][AArch64] Disable BigByval with expensive checks AArch64 incorrectly nests ADJCALLSTACKDOWN/ADJCALLSTACKUP which fails to verify with expensive checks enabled. See https://github.com/llvm/llvm-project/issues/62137 and https://github.com/llvm/llvm-project/issues/62138.	2023-08-31 10:15:45 +00:00
wangpc	f281543a48	[RISCV] Teach RISCVMergeBaseOffset to handle inline asm For inline asm with memory operands, we can merge the offset into the second operand of memory constraint operands. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158062	2023-08-31 15:39:12 +08:00
wangpc	0d73259cf2	[RISCV] Precommit test for D158062 Tests for callbr, multi-operands and multi-asm are added. Reviewed By: wangpc, craig.topper Differential Revision: https://reviews.llvm.org/D158149	2023-08-31 15:39:12 +08:00
David Green	58a2f839fd	[AArch64][GISel] Expand coverage of FDiv and move into place. This adds some more extensive test coverage for fdiv through global isel, switching the opcodes to use the more complete ActionDefinitions to handle more cases and moving it into the position of the existing code which is no longer needed.	2023-08-30 22:09:53 +01:00
Philip Reames	079c968eb9	[RISCV] Form vmv.s.f/x from single element splats via DAG combine This re-implements the special casing we had in lowerScalarSplat as a DAG combine. As can be seen in the tests, this ends up triggering in a bunch more cases. The semantically interesting bit of this change is the use of the implicit truncate semantics for when XLEN > SEW. We'd already been doing this for vmv.v.x, but this change extends e.g. the constant matching to make the same assumption about vmv.s.x. Per my reading of the specification, this should be fine, and if anything, is more obviously true of vmv.s.x than vmv.v.x. Differential Revision: https://reviews.llvm.org/D158874	2023-08-30 12:44:36 -07:00
Philip Reames	fd465f377c	[RISCV] Move vmv_s_x and vfmv_s_f special casing to DAG combine We'd discussed this in the original set of patches months ago, but decided against it. I think we should reverse ourselves here as the code is significantly more readable, and we do pick up cases we'd missed by not calling the appropriate helper routine. Differential Revision: https://reviews.llvm.org/D158854	2023-08-30 12:04:48 -07:00
Matt Arsenault	5f8ee45d5a	AMDGPU: Implement llvm.get.rounding There are really two rounding modes, so only return the standard values if both modes are the same. Otherwise, return a bitmask representing the two modes. Annoyingly the register doesn't use the same values as FLT_ROUNDS. Use a simple integer table we can shift into to convert. https://reviews.llvm.org/D153158	2023-08-30 14:06:13 -04:00
Simon Pilgrim	967d95382d	[X86] lowerShuffleAsVALIGN - extend to recognize basic shifted element masks Try to use VALIGN as a cross-lane version of VSHLDQ/VSRLDQ	2023-08-30 18:32:55 +01:00
Simon Pilgrim	d3d71b8d5b	[X86] Add shuffle test cases showing missed opportunity to use VALIGN	2023-08-30 18:32:55 +01:00
Philip Reames	eb5fe55b81	[RISCV] Expand codegen test coverage extract/insert element In particular, at mixed LMULS, high LMULS, and types which require splitting.	2023-08-30 10:10:40 -07:00
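
The constant-materialization trick described in commit 319aba645f (`[RISCV] Teach MatInt to use (ADD_UW X, (SLLI X, 32))`) can be illustrated with a small model of the three instructions involved. This is only a sketch, not LLVM's MatInt code: the function names (`sext32`, `slli`, `add`, `add_uw`, `materialize`) and the `has_zba` parameter are hypothetical, and it assumes RV64 semantics where `X` holds the low 32 bits sign-extended to 64 bits and `add.uw` zero-extends the low 32 bits of its first operand before adding.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers as Python ints masked to 64 bits


def sext32(v):
    """Sign-extend the low 32 bits of v to a 64-bit register value."""
    v &= 0xFFFFFFFF
    return (v | ~0xFFFFFFFF) & MASK64 if v & 0x80000000 else v


def slli(x, shamt):
    """SLLI: logical left shift, truncated to 64 bits."""
    return (x << shamt) & MASK64


def add(a, b):
    """ADD: 64-bit wrapping addition."""
    return (a + b) & MASK64


def add_uw(a, b):
    """ADD.UW (Zba): zero-extend the low 32 bits of a, then add b."""
    return ((a & 0xFFFFFFFF) + b) & MASK64


def materialize(c, has_zba):
    """Return the value built by the two-instruction tail, or None if the
    trick does not apply (hypothetical helper, for illustration only)."""
    lo, hi = c & 0xFFFFFFFF, (c >> 32) & 0xFFFFFFFF
    if lo != hi:
        return None                # only handles equal high and low halves
    x = sext32(lo)                 # X as a 32-bit materialization leaves it
    shifted = slli(x, 32)          # sign-extension bits are shifted out here
    if not lo & 0x80000000:
        return add(x, shifted)     # bit 31 clear: plain ADD is enough
    if has_zba:
        return add_uw(x, shifted)  # add.uw clears the sign-extended bits of X
    return None                    # bit 31 set and no Zba: trick unavailable


if __name__ == "__main__":
    for c in (0x1234567812345678, 0x8000000180000001):
        for zba in (False, True):
            r = materialize(c, zba)
            print(hex(c), "Zba" if zba else "no Zba", "->",
                  hex(r) if r is not None else None)
```

In this model, the plain `add` path fails for a constant such as `0x8000000180000001` because the sign-extended upper half of `X` is added into the result, which is the case the commit message says `add.uw` addresses.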

52796 Commits