llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	cda2b01df7	[X86] combineUIntToFP - fold vXiY -> vXf16 using SINT_TO_FP(ZEXT()) AVX512 targets can just as easily use UINT_TO_FP/SINT_TO_FP, but pre-AVX512 only have SINT_TO_FP instructions	2023-11-15 11:51:38 +00:00
chuongg3	692fbd6c00	[AArch64][GlobalISel] Support udot lowering for vecreduce add (#70784 ) vecreduce_add(mul(ext, ext)) -> vecreduce_add(udot) vecreduce_add(ext) -> vecreduce_add(ext) Vectors of scalar size of 8-bits with element count of multiples of 8	2023-11-15 11:41:46 +00:00
Jay Foad	0b2c3c66e2	[AMDGPU] Add test case for issue #71685 The bug was fixed by #71710.	2023-11-15 11:23:03 +00:00
Jay Foad	1e8c17e9c7	[AMDGPU] Allow folding to FMAMK with SGPR and immediate operand on GFX10+ (#72258 ) Allow foldImmediate to create instructions like: v_fmamk_f32 v0, s0, 0x42000000, v0 This instruction has two "scalar values": s0 and 0x42000000. On GFX10+ this is allowed. This fold was originally implemented before the compiler supported GFX10, when all ASICs were limited to one scalar value.	2023-11-15 10:58:00 +00:00
Momchil Velikov	dedf2c6bb5	[AArch64] Refactor allocation of locals and stack realignment (#72028 ) Factor out some stack allocation into a separate function. This patch splits out the generic portion of a larger refactoring done as a part of stack clash protection support. The patch is almost, but not quite NFC. The only difference should be that where we have adjacent allocation of stack space for local SVE objects and non-local SVE objects the order of `sub sp, ...` and `addvl sp, ...` instructions is reversed, because now it's done with a single call to `emitFrameOffset` and it happens to add/subtract the fixed part before the scalable part, e.g. addvl sp, sp, #-2 sub sp, sp, #16, lsl #12 sub sp, sp, #16 becomes sub sp, sp, #16, lsl #12 sub sp, sp, #16 addvl sp, sp, #-2	2023-11-15 09:27:01 +00:00
Qiu Chaofan	426ad99bb2	[PowerPC] Forbid f128 SELECT_CC optimized into fsel (#71497 )	2023-11-15 12:20:06 +08:00
Craig Topper	c44ac52e7d	[RISCV][GISel] Consider ABI copies when picking register bank for G_LOAD/STORE. This is partially based on AArch64, but reduced to handle just the case we currently have a test for.	2023-11-14 18:57:08 -08:00
Michael Maitland	a4f77f1ca3	[RISCV][GISEL] Use MO_PLT when Callee is a Global or Symbol (#71982 ) SelectionDAG does the same thing in 74c83649547c2	2023-11-14 18:55:39 -05:00
Karthika Devi C	6726c99f88	[AArch64] Fix tryMergeAdjacentSTG function in PrologEpilog pass (#68873 ) The tryMergeAdjacentSTG function tries to merge multiple stg/st2g/stg_loop instructions. It doesn't verify the liveness of the NZCV flags before moving around STGloop, which also alters the NZCV flags. This was not an issue before patch 5e612bc, as these stack tag stores did not alter the NZCV flags. But after the change, this merge function leads to miscompilation because of the control flow change in instructions. Added a check to see whether the first instruction after the insert point reads or writes the NZCV flags, and their live-out state. This check happens after the merge list is filled, just before the merge, and bails out if necessary.	2023-11-14 14:43:33 -08:00
Christudasan Devadasan	8f7e9f3793	[AMDGPU] Precommit lit test for #72140 .	2023-11-15 02:18:16 +05:30
Changpeng Fang	011c9eeb9a	GlobalISel: Guard return in llvm::getIConstantSplatVal (#71989 ) getIConstantVRegValWithLookThrough could return NULL.	2023-11-14 12:23:54 -08:00
AdityaK	b7669ed95f	Fix error message when regalloc eviction advisor analysis could not be created (#72165 )	2023-11-14 11:17:17 -08:00
Michael Maitland	a7bbcc4690	[RISCV][GISEL] Add support for lowerFormalArguments that contain scalable vector types (#70882 ) Scalable vector types from LLVM IR can be lowered to scalable vector types in MIR according to the RISCVAssignFn.	2023-11-14 13:15:41 -05:00
Acim Maravic	01c1c7a19e	[AMDGPU][CodeGen] Update support (soffset + offset) s_buffer_load's (#68302 ) getBaseWithConstantOffset() is used for scalar and non-scalar buffer loads. The difference between the s_load and load instructions is that the s_load instruction extends the 32-bit offset to 64 bits, so a 32-bit (address + offset) should not cause unsigned 32-bit integer wraparound, because it performs the addition in 64 bits.	2023-11-14 19:06:45 +01:00
Acim-Maravic	f3138524db	[AMDGPU] Generic lowering for rint and nearbyint (#69596 ) There are three different rounding intrinsics that are brought down to the same instruction. Co-authored-by: Acim Maravic <acim.maravic@amd.com>	2023-11-14 18:49:21 +01:00
Qiongsi Wu	c8b11091e8	[SelectionDAG] Handling Oversized Alloca Types under 32 bit Mode to Avoid Code Generator Crash (#71472 ) Situations may arise leading to a negative `NumElements` argument of an `alloca` instruction. In this case the `NumElements` is treated as a large unsigned value. Such large arrays may cause the size constant to overflow during code generation under 32 bit mode, leading to a crash. This PR limits the constant's bit width to the width of the pointer on the target. With this fix, ``` alloca i32, i32 -1 ``` and ``` alloca [4294967295 x i32], i32 1 ``` generate the exact same PowerPC assembly code under 32 bit mode.	2023-11-14 10:52:51 -05:00
Momchil Velikov	33374c445d	[CFIFixup] Allow function prologues to span more than one basic block (#68984 ) The CFIFixup pass assumes a function prologue is contained in a single basic block. This assumption is broken with upcoming support for stack probing (`-fstack-clash-protection`) in AArch64 - the emitted probing sequence in a prologue may contain loops, i.e. more than one basic block. The generated CFG is not arbitrary though: * CFI instructions are outside of any loops * for any two CFI instructions of the function prologue one dominates and is post-dominated by the other Thus, for the prologue CFI instructions, if one is executed then all are executed, there is a total order of executions, and the last instruction in that order can be considered the end of the prologue for the purpose of inserting the initial `.cfi_remember_state` directive. That last instruction is found by finding the first block in the post-order traversal which contains prologue CFI instructions.	2023-11-14 15:02:03 +00:00
David Sherwood	bdc0afc871	[CodeGen][AArch64] Set min jump table entries to 13 for AArch64 targets (#71166 ) There are some workloads that are negatively impacted by using jump tables when the number of entries is small. The SPEC2017 perlbench benchmark is one example of this, where increasing the threshold to around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum threshold based on empirical evidence rather than science, and just manually increased the threshold until I got the best performance without impacting other workloads. For neoverse-v1 I saw around ~0.2% improvement in the SPEC2017 integer geomean, and no overall change for neoverse-n1. If we find issues with this threshold later on we can always revisit this. The most significant SPEC2017 score changes on neoverse-v1 were: 500.perlbench_r: +1.6% 520.omnetpp_r: +0.6% and the rest saw changes < 0.5%. I updated CodeGen/AArch64/min-jump-table.ll to reflect the new threshold. For most of the affected tests I manually set the min number of entries back to 4 on the RUN line because the tests seem to rely upon this behaviour.	2023-11-14 13:00:28 +00:00
Simon Pilgrim	074e4ae0e7	[DAG] foldABSToABD - support abs(ext(x) - ext(y)) -> zext(abd*(x, y)) from different extension source types (#71670 ) We currently limit the fold to cases where we're extending from the same source type, but we can safely perform this using the wider of mismatching source types (we're really just interested in having extension bits on both sources), ensuring we don't create additional extensions/truncations.	2023-11-14 12:56:42 +00:00
Simon Pilgrim	668454183a	[X86] Regenerate expand-vp-int-intrinsics.ll Add missing X86 checks	2023-11-14 12:48:52 +00:00
Diana	eb3c02fdc2	[AMDGPU] Use immediates for stack accesses in chain funcs (#71913 ) Switch to using immediate offsets instead of the SP register to access objects on the current stack frame in chain functions. This means we no longer need to reserve a SP register just for accessing stack objects and it also allows us to set the SP (when one is actually needed) to the stack size from the very beginning. This only works if we use a FixedObject for the ScavengeFI, which is what we do for entry functions anyway (and we generally want to keep chain functions close to amdgpu_cs behaviour where we don't have a good reason to diverge).	2023-11-14 13:17:46 +01:00
Matthew Devereau	cc1244980b	[AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (#71795 ) Adds the builtins: void svldr_zt(uint64_t zt, const void *rn) void svstr_zt(uint64_t zt, void *rn) And the intrinsics: call void @llvm.aarch64.sme.ldr.zt(i32, ptr) tail call void @llvm.aarch64.sme.str.zt(i32, ptr) Patch by: Kerry McLaughlin <kerry.mclaughlin@arm.com>	2023-11-14 11:27:41 +00:00
Momchil Velikov	65eaec82c0	[CFIFixup] Precommit test ahead of multi-block prologues support (#72033 ) Precommit test for https://github.com/llvm/llvm-project/pull/68984	2023-11-14 10:45:28 +00:00
Nikita Popov	56c1d30183	[IR] Remove support for lshr/ashr constant expressions (#71955 ) Remove support for the lshr and ashr constant expressions. All places creating them have been removed beforehand, so this just removes the APIs and uses of these constant expressions in tests. This is part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.	2023-11-14 09:25:14 +01:00
Kai Luo	acdf7c8f27	[PowerPC] Precommit test to show impact of early-ifcvt on target without `isel`. NFC.	2023-11-14 06:10:05 +00:00
Craig Topper	028ed6125f	[RISCV][GISel] Support G_UMIN/UMAX/SMIN/SMAX legal with Zbb. (#72182 )	2023-11-13 20:57:38 -08:00
Craig Topper	0a459dd4e9	[RISCV] Add tests for selecting G_BRCOND+G_ICMP. NFC These should have been part of e0e0891d741588684b0803d7724e5080f9c75537	2023-11-13 15:29:34 -08:00
Craig Topper	29d75cb9dc	[RISCV][GISel] Update legalize-smin.mir and legalize-smax.mir to test G_SMIN/G_SMAX. Looks like an incomplete fixup was done after copying the umin/umax tests.	2023-11-13 13:49:14 -08:00
Craig Topper	915e092400	[RISCV] Select zext as sext when sign bit is 0 for -riscv-experimental-rv64-legal-i32 In our default SelectionDAG where i32 isn't legal, the zext will become an i64 AND and often get optimized out on its own. With i32 legal, we need to turn it into sext.w and rely on RISCVOptWInstrs to remove it.	2023-11-13 12:21:36 -08:00
Craig Topper	05300222ba	[RISCV][GISel] Add really basic support for FP regbank selection for G_LOAD/G_STORE. (#70896 ) Coerce the register bank based on the users of the G_LOAD or the defining instruction for the G_STORE. s64 on rv32 is handled by forcing the FPRB register bank.	2023-11-13 12:12:16 -08:00
Craig Topper	70ce047f7e	[RISCV] Legalize G_CTLZ/G_CTLZ_ZERO_UNDEF/G_CTTZ/G_CTTZ_ZERO_UNDEF. (#72014 ) The base ISA does not support these operations. A future patch will enable them for Zbb.	2023-11-13 11:22:48 -08:00
Tom Stellard	877226f01f	[X86] Simplify regex in pr42616.ll test (#71980 )	2023-11-13 11:05:42 -08:00
Craig Topper	d8576e4542	[RISCV][GISel] Update RV64 legalize-ctpop.mir to account for constant shift amounts being i64 now. This changed while the ctpop patch was in review and I forgot to update it.	2023-11-13 10:38:36 -08:00
Craig Topper	90dd4c470f	[RISCV][GISel] Legalize G_CTPOP. (#72005 ) The base ISA does not have an instruction for this so we need to lower. Zbb support will come in a future patch.	2023-11-13 10:26:32 -08:00
Felipe de Azevedo Piovezan	83729e6716	[SelectionDAG] Disable FastISel for swiftasync functions (#70741 ) Most (x86) swiftasync functions tend to use both SelectionDAGISel and FastISel lowering: * FastISel argument lowering can only handle C calling convention. * FastISel fails mid-BB in a number of ways, including in simple `ret void` instructions under certain circumstances. This dance of SelectionDAG (argument) -> FastISel (some instructions) -> SelectionDAG(remaining instructions) is lossy; in particular, Argument information lowering is cleared after that first SelectionDAG run. Since swiftasync functions rely heavily on proper Argument lowering for debug information, this patch disables the use of FastISel in such functions.	2023-11-13 08:27:29 -08:00
Yingwei Zheng	d64d5ea102	[RISCV][CodeGenPrepare] Remove duplicated transform for zext. NFC. (#72053 ) After #71534 and #72052, the transform `zext -> zext nneg` in `RISCVCodeGenPrepare` is redundant.	2023-11-13 22:45:33 +08:00
David Green	2238363a5f	[AArch64] Prevent v1f16 vselect/setcc type expansion. (#72048 ) PR #71614 identified an issue in the lowering of v1f16 vector compares, where the `v1i1 setcc` is expanded to `v1i16 setcc`, and the `v1i16 setcc` tries to be expanded to a `v2i16 setcc` which fails. For floating point types we can let them scalarize instead though, generating a `setcc f16` that can be lowered using normal fp16 lowering. 07a8ff4892b2a54f0bd5843f863bcffa7a258f1f added a special case combine for v1 vselect to expand the predicate type to the same size as the fcmp operands. This turns that off for float types, allowing them to scalarize naturally, which hopefully fixes the issue by preventing the v1i16 setcc, meaning it won't try to widen to larger vectors. The codegen might not be optimal, but as far as I can tell everything generated successfully, providing that no `v1i16 setcc v1f16` instructions get generated.	2023-11-13 14:42:52 +00:00
Jay Foad	a4196666ac	[AMDGPU] Revert "Preliminary patch for divergence driven instruction selection. Operands Folding 1." (#71710 ) This reverts commit 201f892b3b597f24287ab6a712a286e25a45a7d9.	2023-11-13 13:53:10 +00:00
Simon Pilgrim	1a9fbf6166	[X86] combineLoad - reuse an existing VBROADCAST_LOAD constant for a smaller vector load of the same constant Extends the existing code that performed something similar for SUBV_BROADCAST_LOAD, but this is just for cases where AVX2 targets load full width 128-bit constant vectors but broadcast the equivalent 256-bit constant vector. Fixes AVX2 case for Issue #70947	2023-11-13 11:59:04 +00:00
Jay Foad	47f29043f0	[AMDGPU] Fix a GlobalISel RUN line This was added in D149795 without actually enabling GlobalISel.	2023-11-13 11:30:15 +00:00
Nemanja Ivanovic	563720c3be	[RISCV] Fix lowering of negative zero with Zdinx 32-bit (#71869 ) The compiler currently abends with an impossible reg-to-reg copy when producing a negative zero FP immediate on RV32 with -Zdinx. This is because we emit a negation that uses FP registers. Emit the right node to produce correct code.	2023-11-13 07:38:14 +01:00
Craig Topper	44e8bea400	[GISel][AArch64] Notify the Observer when CTTZ lowering changes the opcode to CTPOP. (#72008 )	2023-11-12 19:36:24 -08:00
Carl Ritson	edc38a6cbd	[AMDGPU] Add option to pre-allocate SGPR spill VGPRs (#70626 ) SGPR spill VGPRs are WWM registers so allow them to be allocated by SIPreAllocateWWMRegs pass. This intentionally prevents spilling of these VGPRs when enabled.	2023-11-13 12:21:18 +09:00
Carl Ritson	52b247b1d3	[PHIElimination] Handle subranges in LiveInterval updates (#69429 ) Add subrange tracking and handling for LiveIntervals during PHI elimination. This requires extending MachineBasicBlock::SplitCriticalEdge to also update subrange intervals.	2023-11-13 12:16:26 +09:00
Han Shen	ca10e3b2e5	[LLVM][NVPTX] Add BF16 vector instruction and fix lowering rules (#69415 ) Add support for bf16x2 instructions such as setp, fneg, fabs, etc; Fix the instructions that were not differentiated between sm_80 and sm_90 support, such as fpround etc. Add more bf16 test cases to ensure the correct behavior. --------- Co-authored-by: shenhan03 <shenhan03@kuaishou.com>	2023-11-12 21:48:31 +08:00
David Green	a31538d29c	[AArch64] Add a test showing inefficient register allocation around loop IVs. NFC	2023-11-12 13:11:03 +00:00
Craig Topper	ee95819503	[RISCV][GISel] Legalize G_FSHL/G_FSHR.	2023-11-11 20:23:29 -08:00
Craig Topper	b0e97c7757	[RISCV][GISel] Legalize G_ROTL/G_ROTR.	2023-11-11 20:12:07 -08:00
Craig Topper	7965a21f7a	[RISCV] Add more packh patterns.	2023-11-11 19:31:23 -08:00
Craig Topper	bfb7843580	[RISCV] Add packw/packh patterns for -riscv-experimental-rv64-legal-i32	2023-11-11 17:52:22 -08:00
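
The foldABSToABD entry above (074e4ae0e7) states that abs(ext(x) - ext(y)) can be folded to zext(abd*(x, y)) using the wider of two mismatched source types. The following is a minimal arithmetic sketch of why that holds, assuming an i8 and an i16 source extended to i32; the helper names `abdu`, `abds`, `fold_matches_unsigned` and `fold_matches_signed` are hypothetical and only model the integer math, not the SelectionDAG code.

```python
def abdu(a, b, bits):
    """Unsigned absolute difference of two `bits`-wide values."""
    mask = (1 << bits) - 1
    assert 0 <= a <= mask and 0 <= b <= mask
    return max(a, b) - min(a, b)


def abds(a, b, bits):
    """Signed absolute difference, returned as an unsigned `bits`-wide value."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    assert lo <= a <= hi and lo <= b <= hi
    return (max(a, b) - min(a, b)) & ((1 << bits) - 1)


def fold_matches_unsigned(x8, y16):
    # Reference: zero-extend both operands to i32, subtract, take abs.
    reference = abs(x8 - y16)
    # Fold: zero-extend x to the wider source type (i16), then abdu there.
    # Zero extension keeps the integer value, so x8 is passed unchanged.
    return reference == abdu(x8, y16, 16)


def fold_matches_signed(x8, y16):
    # Reference: sign-extend both operands to i32, subtract, take abs.
    reference = abs(x8 - y16)
    # Fold: sign-extend x to i16, abds there, zero-extend the result.
    return reference == abds(x8, y16, 16)


# Sample the i8 range fully and the i16 range with a stride.
unsigned_ok = all(
    fold_matches_unsigned(x, y)
    for x in range(0, 256)
    for y in range(0, 65536, 257)
)
signed_ok = all(
    fold_matches_signed(x, y)
    for x in range(-128, 128)
    for y in range(-32768, 32768, 257)
)
```

The sketch samples the inputs rather than proving the fold; the point is that the absolute difference always fits in the wider source type, so no bits are lost before the final zero extension.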

52796 Commits
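
The oversized-alloca entry above (c8b11091e8) says that, once the size constant is limited to the pointer width, `alloca i32, i32 -1` and `alloca [4294967295 x i32], i32 1` produce the same code in 32 bit mode. A minimal sketch of the underlying modular arithmetic, assuming a 32-bit pointer and a 4-byte `i32`; the function `alloca_size_bytes` is hypothetical and is not the SelectionDAG implementation.

```python
PTR_BITS = 32  # assumed pointer width in 32 bit mode


def alloca_size_bytes(elem_size, num_elements, ptr_bits=PTR_BITS):
    """Total allocation size, truncated to the pointer width."""
    mask = (1 << ptr_bits) - 1
    # A negative element count is reinterpreted as a large unsigned value.
    count = num_elements & mask
    # The product is kept to ptr_bits bits, as the fix describes.
    return (elem_size * count) & mask


# alloca i32, i32 -1
size_negative_count = alloca_size_bytes(4, -1)
# alloca [4294967295 x i32], i32 1
size_huge_array = alloca_size_bytes(4 * 4294967295, 1)
```

Under these assumptions both sizes reduce to 0xFFFFFFFC, which is consistent with the two forms generating identical assembly.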