llvm-project

Author	SHA1	Message	Date
David Sherwood	37f8ffc64c	[AArch64][SME2] Add LLVM IR intrinsics for the vertical dot products Adds intrinsics for the following SME2 instructions: * BFVDOT (32-bit) * FVDOT (32-bit) * SVDOT (2-way) (32-bit) * SVDOT (4-way) (32-bit and 64-bit) * UVDOT (2-way) (32-bit) * UVDOT (4-way) (32-bit and 64-bit) * SUVDOT (32-bit) * USVDOT (32-bit) NOTE: These intrinsics are still in development and are subject to future changes. Differential Revision: https://reviews.llvm.org/D142000	2023-01-20 13:01:03 +00:00
Kerry McLaughlin	2e35d684d7	[AArch64][SME2] Add multi-vector multiply-add long intrinsics. Adds (single, multi & indexed) intrinsics for the following: - bfmlal/bfmlsl - fmlal/fmlsl - smlal/smlsl - umlal/umlsl This patch also extends SelectSMETileSlice to handle scaled vector select offsets. NOTE: These intrinsics are still in development and are subject to future changes. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D142004	2023-01-20 11:33:29 +00:00
Kerry McLaughlin	cfd3a0e04a	[AArch64][SME2] Add multi-vector fused multiply-add/subtract intrinsics Adds intrinsics for the following: - fmla (single, multi & indexed) - fmls (single, multi & indexed) NOTE: These intrinsics are still in development and are subject to future changes. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D141946	2023-01-20 11:14:41 +00:00
Craig Topper	f4fa34c359	Revert "[X86][WIP] Change precision control to FP80 during u64->fp32 conversion on Windows." This reverts commit 928a1764d6bdf84073c9d85875f45c1716d6ff12. Committed accidentally	2023-01-20 00:41:14 -08:00
Craig Topper	928a1764d6	[X86][WIP] Change precision control to FP80 during u64->fp32 conversion on Windows. This is an alternative to D141074 to fix the problem by adjusting the precision control dynamically. This isn't quite complete yet. I want to support fadd with an load folded into it too. That's the code we will usually generate. Posting for early review so we can do some testing of this solution. Differential Revision: https://reviews.llvm.org/D142178	2023-01-20 00:34:05 -08:00
Craig Topper	1692dff0b3	Revert "[X86] Avoid converting u64 to f32 using x87 on Windows" This reverts commit a6e3027db7ebe6863e44bafcfeaacc16bdc88a3f. Chrome and Halide are both reporting issues with importing builtins. Maybe the better direction is to manually adjust FPCW for the inline sequence on Windows.	2023-01-19 21:36:07 -08:00
Ben Shi	c919ea5b48	[AVR] Fix incorrectly printed global symbol operands in inline-asm Fixes https://github.com/llvm/llvm-project/issues/58879 Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D142096	2023-01-20 09:45:00 +08:00
Jeffrey Byrnes	1f08d3bc3a	[AMDGPU] Further reduce attaching of implicit operands to spills Extension of https://reviews.llvm.org/D141101 to even further reduce the amount of implicit operands we attach. The main benefit is to improve the capability of the post-ra scheduler, and reduce unneeded dependency resolution (e.g. inserting snops). Unfortunately, if we completely minimize the amount of implicit operands (naively), we run into some regressions (e.g. dual_movs are replaced with multiple calls to v_mov). This is even more reason to switch to LiveRegUnits. Nonetheless, this patch removes the operands which we can for free (more or less). Change-Id: Ib4f409202b36bdbc59eed615bc2d19fa8bd8c057 Differential Revision: https://reviews.llvm.org/D141557 Change-Id: I8b039e3c0d39436b384083f8beb947ee1b1730b2	2023-01-19 14:31:07 -08:00
Evgenii Stepanov	bd3ee371e9	Revert "[AArch64][v8.3A] Avoid inserting implicit landing pads (PACI*SP)" Linux kernel sets SCTLR_EL1.BT0 and BT1 to 1 unconditionally, which makes PACIASP equivalent to BTI C + PACIA LR,SP. Use the shorter instruction sequence by default. I'm not aware of anyone who needs the opposite. They are welcome to revert to the current behavior under a subtarget feature or an environment check. This reverts commit 571c8c5263a79293aaadae07b11feb36726eaf53. Differential Revision: https://reviews.llvm.org/D141978	2023-01-19 14:09:22 -08:00
Paul Kirth	af9a452e57	[llvm][codegen] Fix non-determinism in StackFrameLayoutAnalysisPass output We were iterating over a SmallPtrSet when outputting slot variables. This is still correct but made the test fail under reverse iteration. This patch replaces the SmallPtrSet with a SmallVector. Also remove the "Stack Frame Layout" lines from arm64-opt-remarks-lazy-bfi test, since those also break under reverse iteration. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D142127	2023-01-19 20:04:14 +00:00
Stanislav Mekhanoshin	63e7e9c875	[AMDGPU] Treat WMMA the same as MFMA for sched_barrier MFMA and WMMA are essentially the same thing, but appear on different ASICs. Differential Revision: https://reviews.llvm.org/D142062	2023-01-19 10:52:31 -08:00
Stanislav Mekhanoshin	e7f080b359	[AMDGPU] Introduce separate register limit bias in scheduler Current implementation abuses ErrorMargin to apply an additional bias to VGPR and SGPR limits under a high register pressure. The ErrorMargin exists to account for inaccuracies of the RP tracker and not to tackle an excess pressure. Introduce separate bias for this purpose and also make it different for SGPRs and VGPRs as we may want to use different values in the future. This is supposed to be NFC, however there is a subtle difference when subtracting a margin overflows the limit. Doing two subtractions makes it less probable, although manifests only in mir tests with an artificially small register budget. Differential Revision: https://reviews.llvm.org/D142051	2023-01-19 10:51:40 -08:00
Zino Benaissa	68f45796ed	[AARCH64][SVE] Do not optimize vector conversions shuffle_vector instructions are serialized targeting SVE fixed vectors, see https://reviews.llvm.org/D139111. This patch disables optimizeExtendOrTruncateConversion peepholes that generate shuffle_vector. Differential Revision: https://reviews.llvm.org/D141439	2023-01-19 16:50:31 +00:00
Jonas Paulsson	a9c5a98f81	[SystemZ] Improvement in tryRxSBG(). Only allow replacements of nodes that have a single user. This is better as simple instructions (e.g. XGRK) are one cycle faster, and it helps in cases where both inputs share a common node. Review: Ulrich Weigand	2023-01-19 10:43:52 -06:00
Michal Paszkowski	2bcedd4643	[SPIR-V] Emit OpExecutionMode ContractionOff for no FP_CONTRACT metadata This change makes the AsmPrinter emit OpExecutionMode ContractionOff when both opencl.enable.FP_CONTRACT and spirv.ExecutionMode metadata are not present. Differential Revision: https://reviews.llvm.org/D141734	2023-01-19 15:26:34 +01:00
Alex Brachet	67bd3c58c0	[X86] Add register definitions for cfi directives Add {e,r}flags, {g,f}s.base registers so they can be referenced in cfi directives. They are not otherwise usable in any instructions, but can be implicitly pushed to the stack like with pushf for {e,r}flags. Differential Revision: https://reviews.llvm.org/D141879	2023-01-19 14:10:31 +00:00
Amaury Séchet	7e5681cf29	[DAG] Peek through ZEXT/TRUNC in foldAddSubMasked1 Fix a regression in D141883 Depends on D141883 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D141884	2023-01-19 13:23:42 +00:00
Michal Paszkowski	786cb151d9	[SPIR-V] Add -opaque-pointers=0 to some LIT tests Differential Revision: https://reviews.llvm.org/D142061	2023-01-19 14:02:14 +01:00
Amaury Séchet	2826869d7b	[DAG] Do not combine any_ext when we combine and into zext. This transform loses information that can be useful for other transforms. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D141883	2023-01-19 12:37:05 +00:00
David Sherwood	871815e062	[AArch64][SVE2p1] Add SVE2.1 while (predicate-pair) intrinsics Adds intrinsics for the following instructions: * WHILEGE (predicate pair) * WHILEGT (predicate pair) * WHILEHI (predicate pair) * WHILEHS (predicate pair) * WHILELE (predicate pair) * WHILELO (predicate pair) * WHILELS (predicate pair) * WHILELT (predicate pair) I've added an opcode selector called SelectOpcodeFromVT to AArch64ISelDAGToDAG.cpp that we will extend in future to select opcodes from different MVTs. For now, the only use is for selecting predicate types. NOTE: These intrinsics are still in development and are subject to future changes. Differential Revision: https://reviews.llvm.org/D141936	2023-01-19 09:32:20 +00:00
wanglei	7fa0a3c923	[LoongArch] Add an option for MCInstPrinter to print numeric reg names `-loongarch-numeric-reg` for llvm-mc and llc. `-M numeric` (which matches GNU objdump) for llvm-objdump and llvm-mc. Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D141743	2023-01-19 16:29:22 +08:00
icedrocket	a6e3027db7	[X86] Avoid converting u64 to f32 using x87 on Windows The code below currently prints less accurate values only on Windows 32-bit. On Windows, the default precision control on x87 is only 53-bit, and FADD triggers rounding with that precision, so the final result may be less accurate. This revision avoids less accurate conversions by using library calls instead. ``` int main() { int64_t n = 0b0000000000111111111111111111111111011111111111111111111111111111; printf("%lld, %.0f, %.0f", n, (float)n, (float)(uint64_t)n); return 0; } ``` Reviewed By: craig.topper, lebedev.ri Differential Revision: https://reviews.llvm.org/D141074	2023-01-18 22:41:34 -08:00
Paul Kirth	557a5bc336	[codegen] Add StackFrameLayoutAnalysisPass Issue #58168 describes the difficulty diagnosing stack size issues identified by -Wframe-larger-than. For simple code, it's easy to understand the stack layout and where space is being allocated, but in more complex programs, where code may be heavily inlined, unrolled, and have duplicated code paths, it is no longer easy to manually inspect the source program and understand where stack space can be attributed. This patch implements a machine function pass that emits remarks with a textual representation of stack slots, and also outputs any available debug information to map source variables to those slots. The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout` to the compiler invocation. Like other remarks, the diagnostic information can be saved to a file in a machine-readable format by adding -fsave-optimization-record. Fixes: #58168 Reviewed By: nickdesaulniers, thegameg Differential Revision: https://reviews.llvm.org/D135488	2023-01-19 01:51:14 +00:00
Jeffrey Byrnes	f0e7ae085f	[AMDGPU] Run autogen checks on test Change-Id: I46f2ced9ceac592c2a93a00631014a806d4b0693	2023-01-18 16:12:18 -08:00
Tulio Magno Quites Machado Filho	1136cf1721	[SystemZ] Implement lowering of GET_ROUNDING Add support for _FLT_ROUNDS_ in SystemZ. Patch by Tulio Magno Quites Machado Filho. Reviewed By: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D140988	2023-01-18 14:41:19 -06:00
Roman Lebedev	7460842fb2	[DAGCombiner] `combineShuffleOfSplatVal()`: don't assert that shuffle is non-undef As per the test case from Steven Johnson in https://reviews.llvm.org/rGf8d9097168b7#1165311 we can indeed encounter such shuffles, that produce all-undef after folding, before something else manages to optimize them away.	2023-01-18 18:45:08 +03:00
Samuel Parker	32af267447	[NFC][WebAssembly] Add tests Add more variations to fpclamptosat.	2023-01-18 13:30:53 +00:00
Dmitry Bushev	3cba33c56f	[RISCV][ISelLowering] Fix select lowering issue Fix bug that leads to some pseudo instructions not being lowered. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D141395	2023-01-18 15:09:40 +03:00
David Green	21df504399	[DAG][ARM][AArch64] Transform max(a,b) - min(a,b) -> abd(a,b) This adds both signed and unsigned transforms for max(a, b) - min(a, b) -> abd(a, b). unsigned: https://alive2.llvm.org/ce/z/RF4jGQ signed: https://alive2.llvm.org/ce/z/Cjr2zE Fixes: #59894 Differential Revision: https://reviews.llvm.org/D141706	2023-01-18 11:44:26 +00:00
Tim Northover	3ed58d4df6	AArch64: allocate small fixed args properly in varargs functions. On Darwin, function arguments occupy their real size when passed on the stack (e.g. an i16 only consumes 2 bytes). This means that, even for fixed args in varargs calls we need to keep track of the original type being passed before any DAG/GISel promotions. Existing logic only applied this fix to the non-varargs case leading to mismatch between caller & callee in those situations. On Linux & Windows these arguments always occupy a 64-bit slot anyway so there's no special handling needed.	2023-01-18 11:35:24 +00:00
chenglin.bi	45299fb0f9	Reapply [AArch64] fold subs ugt/ult to ands when the second operand is mask/pow2 The original patch made a mistake: the reverse cc of ugt should be ule. And ule < C will be generalized to ult < C + 1, so the new patch adds support for the ult < Pow2 case. https://alive2.llvm.org/ce/z/naBw5A Reviewed By: samtebbs, chapuni Differential Revision: https://reviews.llvm.org/D141829	2023-01-18 19:24:20 +08:00
Luke Lau	a0d80c2398	[RISCV] Generalize performFP_TO_INTCombine to vectors Like in the scalar domain, combine calls to (fp_to_int (ftrunc X)) on scalable and fixed-length vectors into a single vfcvt instruction. For truncating rounds, the static vfcvt.rtz rounding mode is used. Otherwise use the VFCVT_RM_ variants to set the rounding mode dynamically. Closes https://github.com/llvm/llvm-project/issues/56737 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D141599	2023-01-18 10:53:24 +00:00
Luke Lau	98b9340c07	[RISCV][NFC] Add test cases for rounding vfcvt Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D141600	2023-01-18 10:53:21 +00:00
David Green	e26ec330c4	[DAG][AArch64][ARM] Combine abs(sub(x, y)) to abd if the sub is nsw This implements the fold (abs (sub nsw x, y)) -> abds(x, y). Providing the sub is nsw this appears to be valid without the extensions that are usually used for abds. https://alive2.llvm.org/ce/z/XHVaB3. The equivalent abdu combine seems to not be valid. Differential Revision: https://reviews.llvm.org/D141665	2023-01-18 10:10:52 +00:00
Nikita Popov	9ed2f14c87	[AsmParser] Remove typed pointer auto-detection IR is now always parsed in opaque pointer mode, unless -opaque-pointers=0 is explicitly given. There is no automatic detection of typed pointers anymore. The -opaque-pointers=0 option is added to any remaining IR tests that haven't been migrated yet. Differential Revision: https://reviews.llvm.org/D141912	2023-01-18 09:58:32 +01:00
Pierre van Houtryve	fd3300123d	[CodeGen] Prevent overlapping subregs in getCoveringSubRegIndexes If `getCoveringSubRegIndexes` returns a set of subregister indexes where some subregisters overlap others, it can create unsatisfiable copy bundles that eventually cause VirtRegRewriter to error out due to "cycles in copy bundle". We can simply prevent this by making the algorithm skip over subregisters indexes that would cause an overlap with already-covered lanes. Note that in the case of AMDGPU, this problem is caused by the lack of subregisters indexes for 13/14/15-register tuples. We have everything up until 12, then we have 16 and 32 but nothing between 12 and 16. This means that the best candidate to do the least amount of copies when splitting a 29-register tuple was to copy (e.g.) 0-15 and 14-29, causing an overlap. With this change, getCoveringSubRegIndexes will now prefer using something like 0-15, 16-28 and 1 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141576	2023-01-18 03:50:17 -05:00
Pierre van Houtryve	6a60a68e72	[AMDGPU] Precommit test for D141576 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141903	2023-01-18 03:49:37 -05:00
chendewen	d0942df43e	[AArch64][SVE] Add more intrinsics in 'isZeroingInactiveLanes'. The REINTERPRET_CAST operation generates `and` and `ptrue` instructions. For some instructions, this is redundant, because their inactive lanes are zeroed by construction. For example, codegen before: ``` facgt p2.d, p0/z, z4.d, z1.d ptrue p1.d and p1.b, p2/z, p2.b, p1.b ``` After: ``` facgt p1.d, p0/z, z4.d, z1.d ``` ref: https://reviews.llvm.org/D129851 Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D141469	2023-01-18 11:06:13 +08:00
Anshil Gandhi	5073a622a7	[MachineBasicBlock] Explicit FT branching param Introduce a parameter in getFallThrough() to optionally allow returning the fall through basic block in spite of an explicit branch instruction to it. This parameter is set to false by default. Introduce getLogicalFallThrough() which calls getFallThrough(false) to obtain the block while avoiding insertion of a jump instruction to its immediate successor. This patch also reverts the changes made by D134557 and solves the case where a jump is inserted after another jump (branch-relax-no-terminators.mir). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D140790	2023-01-17 17:12:08 -07:00
Rahman Lavaee	3d6841b2b1	[Propeller] Use Fixed MBB ID instead of volatile MachineBasicBlock::Number. Let Propeller use specialized IDs for basic blocks, instead of MBB number. This allows optimizations not just prior to asm-printer, but throughout the entire codegen. This patch only implements the functionality under the new `LLVM_BB_ADDR_MAP` version, but the old version is still being used. A later patch will change the used version. ####Background Today Propeller uses machine basic block (MBB) numbers, which already exist, to map native assembly to machine IR. This is done as follows. - Basic block addresses are captured and dumped into the `LLVM_BB_ADDR_MAP` section just before the AsmPrinter pass which writes out object files. This ensures that we have a mapping that is close to assembly. - Profiling mapping works by taking a virtual address of an instruction and looking up the `LLVM_BB_ADDR_MAP` section to find the MBB number it corresponds to. - While this works well today, we need to do better when we scale Propeller to target other Machine IR optimizations like spill code optimization. Register allocation happens earlier in the Machine IR pipeline and we need an annotation mechanism that is valid at that point. - The current scheme will not work in this scenario because the MBB number of a particular basic block is not fixed and changes over the course of codegen (via renumbering, adding, and removing the basic blocks). - In other words, the volatile MBB numbers do not provide a one-to-one correspondence throughout the lifetime of Machine IR. Profile annotation using MBB numbers is restricted to a fixed point; only valid at the exact point where it was dumped. - Further, the object file can only be dumped before AsmPrinter and cannot be dumped at an arbitrary point in the Machine IR pass pipeline. Hence, MBB numbers are not suitable and we need something else. 
####Solution We propose using fixed unique incremental MBB IDs for basic blocks instead of volatile MBB numbers. These IDs are assigned upon the creation of machine basic blocks. We modify `MachineFunction::CreateMachineBasicBlock` to assign the fixed ID to every newly created basic block. It assigns `MachineFunction::NextMBBID` to the MBB ID and then increments it, which ensures having unique IDs. To ensure correct profile attribution, multiple equivalent compilations must generate the same Propeller IDs. This is guaranteed as long as the MachineFunction passes run in the same order. Since the `NextBBID` variable is scoped to `MachineFunction`, interleaving of codegen for different functions won't cause any inconsistencies. The new encoding is generated under the new version number 2 and we keep backward-compatibility with older versions. ####Impact on Size of the `LLVM_BB_ADDR_MAP` Section Emitting the Propeller ID results in a 23% increase in the size of the `LLVM_BB_ADDR_MAP` section for the clang binary. Reviewed By: tmsriram Differential Revision: https://reviews.llvm.org/D100808	2023-01-17 15:25:29 -08:00
Craig Topper	29f5e9e6f0	[RISCV] Use zeroext instead of signext in mask reduction tests. NFC This is more consistent with ABI and how bools on RISC-V are represented. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D141963	2023-01-17 15:15:45 -08:00
Mircea Trofin	5898be19e6	[mlgo] Remove the protobuf dependency The dependency was due to the log format. This change switches to the previously-introduced (D139370) "dependency-free" logger instead of the protobuf-based one. A subsequent change will clean out the unnecessary abstraction left behind. This change drops the logger unittest; we have sufficient test coverage via lit tests, and a unit test would require adding, unnecessarily, a log reader (the reader is expected to be python, for the ML side, and there is a reader for that under Analysis/models, used for tests). Differential Revision: https://reviews.llvm.org/D141720	2023-01-17 13:12:27 -08:00
Craig Topper	b8b756c6f1	[RISCV] Add missing check prefixes to vreductions-mask.ll. NFC There's a conflict between the riscv32 and riscv64 output for some tests which caused the script to drop the check lines. Add specific check prefixes for these cases.	2023-01-17 12:57:51 -08:00
Noah Goldstein	ca5d11751e	Add additional tests for ctlz{_zero_undef} to test folding with xor; NFC Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D141549	2023-01-17 11:04:26 -08:00
David Green	1f2c37afbe	[AArch64][SVE] Implement isVScaleKnownToBeAPowerOfTwo According to https://developer.arm.com/documentation/102105/ia-00/?lang=en > Arm is making a retrospective change to the SVE architecture to remove > the capability of selecting a non-power-of-two vector length in > non-Streaming SVE as well as in Streaming SVE mode. Specific updates as > a result of this change will be communicated in due course. This patch implements the isVScaleKnownToBeAPowerOfTwo method to teach DAG Combines that VScale will be known to be a power of 2, which helps reduce or simplify some expressions (notably the udiv in vector trip count expressions). Differential Revision: https://reviews.llvm.org/D141486	2023-01-17 15:49:29 +00:00
Francesco Petrogalli	229162d4d7	[MIScheduler] Print top/down cycle in the SUnit dump. Add an extra command line option to `llc` that allows checking at what cycle an instruction has been scheduled by the machine scheduler. Differential Revision: https://reviews.llvm.org/D141289	2023-01-17 15:55:43 +01:00
zhongyunde	2deb10c108	[AArch64][SVE] Fix crash for DestructiveBinaryComm zero merging This fix is similar to D124325, and I find the DestructiveBinaryComm operation type also may be allocated same register, so insert the LSL. movprfx z0.s, p0/z, z0.s lsl z0.b, p0/m, z0.b, #0 fmul z0.s, p0/m, z0.s, z0.s Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D141471	2023-01-17 20:45:59 +08:00
David Green	e49367e7f3	[ARM] Fix i1 shuffle lowering with multiple operands. The existing lowering of i1 vector shuffle was only considering single-source shuffles, always assuming the second was undef. This extends that to properly handle both operands.	2023-01-17 11:29:51 +00:00
chenglin.bi	6c37fbdfcf	Revert "[AArch64] fold subs ugt/ult to ands when the second operand is a mask" This reverts commit 4a64024c1410692197e4b54e27e7b269a67c78f4. The original commit made a mistake: the reverse of ugt should be ule	2023-01-17 18:41:44 +08:00
Samuel Parker	bba9221d9f	[NFC][WebAssembly] Update test Run update_llc_test_checks.py on address-offsets.ll	2023-01-17 10:34:43 +00:00

46631 Commits