llvm-project

Author	SHA1	Message	Date
Matt Arsenault	35c2a7542c	AMDGPU: Fix asserting on fast f16 pown https://reviews.llvm.org/D158903	2023-08-25 19:56:20 -04:00
Snehasish Kumar	3dbabeadd6	[CodeGen] Remove unused option in MachineFunctionSplitter. The option was added in github.com/llvm/llvm-project/commit/90ab85a but it doesn't seem to be used. The triple check has been removed so this shouldn't be required going forward. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D158885	2023-08-25 21:24:28 +00:00
Craig Topper	398c855457	[RISCV] Improve splatPartsI64WithVL for vlmax scalable vector constants where Hi and Lo are the same. We can use a 32-bit splat and bitcast to i64 vector. This only handles the case where we are using vlmax so that the new vl is cheap to compute. This could be generalized to double the VL. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D158879	2023-08-25 14:15:41 -07:00
Craig Topper	4184bafa9b	[RISCV] Refactor lowerSPLAT_VECTOR_PARTS to use splatPartsI64WithVL for scalable vectors. There was quite a bit of duplication between splatPartsI64WithVL and the scalable vector handling in lowerSPLAT_VECTOR_PARTS, but scalable vector had one additional case. Move that case to splatPartsI64WithVL which improves some fixed vector tests. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D158876	2023-08-25 14:15:40 -07:00
Jeffrey Byrnes	3ba8dabbf3	[AMDGPU] Add sdot4 / sdot8 intrinsics for gfx11 This provides a uniform way to lower into the relevant instructions across all generations. Differential Revision: https://reviews.llvm.org/D158468 Change-Id: I1f7ba4b15ee470738535cf1c7d177a11fc471e43	2023-08-25 11:45:55 -07:00
Daniel Paoliello	8d0c3db388	Emit the CodeView `S_ARMSWITCHTABLE` debug symbol for jump tables The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table; it contains the following information: * The address of the branch instruction that uses the jump table. * The address of the jump table. * The "base" address that the values in the jump table are relative to. * The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted). Together this information can be used by debuggers and binary analysis tools to understand what a jump table indirect branch is doing and where it might jump to. Documentation for the symbol can be found in the Microsoft PDB library dumper: `0fe89a942f/cvdump/dumpsym7.cpp (L5518)` This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D149367	2023-08-25 10:19:17 -07:00
Yolanda Chen	291101aa8e	[WebAssembly] Optimize vector shift using a splat value from outside block The vector shift operation in WebAssembly uses an i32 shift amount type, while LLVM IR requires a binary operator's operands to have the same type. When the shift amount operand is splatted from a different block, the splat source will not be exported and the vector shift will be unrolled to scalar shifts. This patch enables the vector shift to identify the splat source value from the other block, and generate the expected WebAssembly bytecode when lowering. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D158399	2023-08-25 08:13:27 -07:00
LiaoChunyu	1b12427c01	[VP][RISCV] Add vp.is.fpclass and RISC-V support There is no vp.fpclass after FCLASS_VL(D151176), try to support vp.fpclass. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152993	2023-08-25 15:40:55 +08:00
Konstantina Mitropoulou	48fa79a503	Revert "[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points." This reverts commit 5ec13535235d07eafd64058551bc495f87c283b1.	2023-08-24 20:39:04 -07:00
Daniel Hoekwater	8c249c44d4	[CodeGen][AArch64] Don't split functions with a red zone on AArch64 Because unconditional branch relaxation on AArch64 grows the stack to spill a register, splitting a function would cause the red zone to be overwritten. Explicitly disable MFS for such functions. Differential Revision: https://reviews.llvm.org/D157127	2023-08-24 21:57:35 +00:00
Daniel Hoekwater	c9f328844d	Reland "[CodeGen] Fix unconditional branch duplication issue in bbsections" Reverted in 4c8d056f50342d5401f5930ed60e5e48b211c3fb because it broke buildbot `llvm-clang-x86_64-expensive-checks-debian` due to the AArch64 test generating invalid code. The issue still exists, but it's fixed in D156767, so the AArch64 test should be added there. Differential Revision: https://reviews.llvm.org/D158674	2023-08-24 21:27:55 +00:00
Felipe de Azevedo Piovezan	6be47fb8be	[CodeGen] Separate X86 and Aarch entry_value test Addresses the bot issues raised in D158636.	2023-08-24 17:07:21 -04:00
Felipe de Azevedo Piovezan	e070a5d230	[CodeGen] Separate X86 and Aarch tests The directory this test used to live in is exclusive to Aarch. Addresses the failure reported in D158636.	2023-08-24 16:33:13 -04:00
Konstantina Mitropoulou	5ec1353523	[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points. CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C) CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C) If the operands are proven to be non NaN, then the optimization can be applied for all predicates. We can apply the optimization for the following predicates for FMINNUM/FMAXNUM (for quiet and signaling NaNs) and for FMINNUM_IEEE/FMAXNUM_IEEE if we can prove that the operands are not signaling NaNs. - ordered lt/le and || - ordered gt/ge and || - unordered lt/le and && - unordered gt/ge and && Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155267	2023-08-24 10:48:56 -07:00
Daniel Hoekwater	4c8d056f50	Revert "[CodeGen] Fix unconditional branch duplication issue in bbsections" This reverts commit 994eb5adc40cd001d82d0f95d18d1827b57e496c. Breaks buildbot `llvm-clang-x86_64-expensive-checks-debian` https://lab.llvm.org/buildbot/#/builders/16/builds/53620	2023-08-24 16:59:17 +00:00
Daniel Hoekwater	994eb5adc4	[CodeGen] Fix unconditional branch duplication issue in bbsections If an end section basic block ends in an unconditional branch to its fallthrough, BasicBlockSections will duplicate the unconditional branch. This doesn't break x86, but it is a (slight) size optimization and more importantly prevents AArch64 builds from breaking. Ex: ``` bb1 (bbsections Hot): jmp bb2 bb2 (bbsections Cold): /* do work... */ ``` After running sortBasicBlocksAndUpdateBranches(): ``` bb1 (bbsections Hot): jmp bb2 jmp bb2 bb2 (bbsections Cold): /* do work... */ ``` Differential Revision: https://reviews.llvm.org/D158674	2023-08-24 16:22:55 +00:00
Luke Lau	515bd40b4e	[RISCV] Fix test using wrong variable. NFC Looks like this test was trying to check if two shifts were combined, but it was accidentally using the insertelement instead of the splat.	2023-08-24 15:45:43 +01:00
Yeting Kuo	243d8cdb03	[RISCV] Add missed HasRoundModeOp for VPseudoUnaryMask_FRM/VPseudoUnaryMask_FRM. Missed HasRoundModeOp makes performCombineVMergeAndVOps use wrong operands for VFCVT_RM instructions. Reviewed By: luke Differential Revision: https://reviews.llvm.org/D158711	2023-08-24 21:45:22 +08:00
Felipe de Azevedo Piovezan	35f4ef1fee	[SelectionDAG][DebugInfo] Handle entry_value dbg.value DIExprs earlier When SelectionDAG converts dbg.value intrinsics, it first ensures we have already generated code for the value operator of the intrinsic. The rationale being that if we haven't had the need to generate code for this value, it won't be a debug value that causes the generation. For example, if the first use of the physical register of an argument is a dbg.value, we are going to hit this code path. However, this is irrelevant for entry value expressions: by definition we are not interested in the _current_ value of the physical register, but rather in its value at the start of the function. To deal with this, this patch changes lowering to handle this case as early as possible. Differential Revision: https://reviews.llvm.org/D158649	2023-08-24 09:33:53 -04:00
Oliver Stannard	40614e1c14	[ARM] Save and restore CPSR around tMOVimm32 When resolving a frame index with a large offset for v6M execute-only, we emit a tMOVimm32 pseudo-instruction, which later gets lowered to a sequence of instructions, all of which are flag-setting. However, a frame index may be generated for a register spill or reload instruction, which can be inserted at a point where CPSR is live. This patch inserts MRS and MSR instructions around the tMOVimm32 to save and restore the value of CPSR, if CPSR is live at that point. This may need up to two virtual registers (one to build the immediate value, one to save CPSR) during frame index lowering, which happens after register allocation, so we need to ensure two spill slots are available to the register scavenger to ensure it can free up enough registers for this. There is no test for the emission (or not) of the MRS/MSR pair, because it requires a spill or reload to be inserted at a point where CPSR is live, which requires a large, complex function and is fragile enough that any optimisation changes will break the test. This bug was easily found by csmith with -verify-machineinstrs, which I now run regularly on v6M execute-only (and many other combinations). Patch by John Brawn and myself. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D158404	2023-08-24 14:15:02 +01:00
Felipe de Azevedo Piovezan	27425aec86	[CodeGen][DebugInfo] Add x86 entry value tests We should also test the x86 target, since it has different backend defaults from ARM. Differential Revision: https://reviews.llvm.org/D158636	2023-08-24 08:48:48 -04:00
Simon Pilgrim	19777deba4	[X86] matchAddressRecursively - add foldMaskedShiftToBEXTR handling to ZERO_EXTEND nodes.	2023-08-24 13:14:41 +01:00
Simon Pilgrim	69a0f23598	[X86] extract-bits.ll - add test showing failure to match BEXTR through ZERO_EXTEND node	2023-08-24 13:14:41 +01:00
Matt Arsenault	d86a7d631c	GlobalISel: Add constant fold combine for zext/sext/anyext Could use more work for vectors. https://reviews.llvm.org/D156534	2023-08-24 08:10:01 -04:00
Matt Arsenault	e52acb817d	GlobalISel: Add shifts to constant_fold combine Currently we're getting away with post-selection constant folding on these (a hack which exists for the DAG). https://reviews.llvm.org/D156534	2023-08-24 08:09:57 -04:00
Luke Lau	e772c0ecd8	[RISCV] Use vmv.v.x if Hi bits are undef when lowering splat_vector_parts When lowering a splat_vector_parts, if the hi bits are undefined then we can splat the lo bits without having to check if it's going to be sign extended or not, because those bits will be undefined anyway. I've handled it for both fixed and scalable vectors, but there's no diff on the scalable vror tests, since the hi bits aren't combined away to undef in SimplifyDemanded for scalable vectors. I'm not sure why that is. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158625	2023-08-24 12:19:09 +01:00
David Green	8ec2392622	[AArch64][GISel] Expand coverage of FNeg. This adds some more extensive test coverage for fneg through global isel, switching the opcodes to use the more complete ActionDefinitions to handle more cases.	2023-08-24 12:00:23 +01:00
Simon Pilgrim	e283ef7e93	[X86] matchAddressRecursively - add foldMaskAndShiftToScale handling to ZERO_EXTEND nodes.	2023-08-24 11:47:07 +01:00
Simon Pilgrim	f84cd7e579	[X86] fold-and-shift-x86_64.ll - add zext test case where upper bits are known zero (and won't get simplified to any_extend) Add test coverage showing failure to use foldMaskAndShiftToScale with zero_extend nodes	2023-08-24 11:47:07 +01:00
David Green	e2fa9610c6	[AArch64][GISel] Expand coverage of FMul. This adds some more extensive test coverage for fmul through global isel, switching the opcodes to use the more complete ActionDefinitions to handle more cases.	2023-08-24 11:41:15 +01:00
Jingu Kang	3b485a6622	[AArch64] Mark known zero for high 16-bits of uaddlv intrinsic output with v8i8 The uaddlv with v8i8 returns 16-bits value but clang generates 32-bits intrinsic and trunc for it. In this case, we can mark known zero for the high 16-bits of the intrinsic output. Differential Revision:	2023-08-24 10:55:19 +01:00
Simon Pilgrim	19cdd45b08	[X86] X86DAGToDAGISel::matchIndexRecursively - add SIGN_EXTEND(ADD_NSW(X,C)) handling Split an index register from IndexReg = SIGN_EXTEND(ADD_NSW(X,C)) to IndexReg = SIGN_EXTEND(X), Offset = SIGN_EXTEND(C)	2023-08-24 10:19:37 +01:00
Serge Pavlov	6862f0fab1	[FPEnv] Intrinsics for access to FP control modes The change introduces intrinsics 'get_fpmode', 'set_fpmode' and 'reset_fpmode'. They manage all target dynamic floating-point control modes, which include, for instance, rounding direction, precision, treatment of denormals and so on. The intrinsics do the same operations as the C library functions 'fegetmode' and 'fesetmode'. By default they are lowered to calls to these functions. Two main use cases are supported by this implementation. 1. Local modification of the control modes. In this case the code usually has a pattern (in pseudocode): saved_modes = get_fpmode() set_fpmode(<new_modes>) ... <do operations under the new modes> ... set_fpmode(saved_modes) In the case when it is known that the current FP environment is default, the code may be shorter: set_fpmode(<new_modes>) ... <do operations under the new modes> ... reset_fpmode() Such patterns appear not only in user code but also in implementations of various FP controlling pragmas. In particular, the implementation of `#pragma STDC FENV_ROUND` requires similar code if the target does not support static rounding mode. 2. Portable control of FP modes. Usually FP control modes are set by writing to some control register. Different targets have different layouts of this register, and the way the register is accessed may also differ. Using a set of target-specific definitions for the control register bits together with these intrinsic functions provides a sufficiently portable way to handle control modes across a wide range of hardware. This change defines only the LLVM intrinsic functions, which implement the access required for the aforementioned use cases. Differential Revision: https://reviews.llvm.org/D82525	2023-08-24 15:52:19 +07:00
Craig Topper	2ad50f354a	[DAGCombiner][RISCV][AArch64][PowerPC] Restrict foldAndOrOfSETCC from using SMIN/SMAX where an OR/AND would do. This removes some diffs created by D153502. I'm assuming an AND/OR won't be worse than an SMIN/SMAX. For RISC-V at least, AND/OR can be a shorter encoding than SMIN/SMAX. It's weird that we have two different functions responsible for folding logic of setccs, but I'm not ready to try to untangle that. I'm unclear if the PowerPC change is a regression or not. It looks like it might use more registers, but I don't understand PowerPC registers so I'm not sure. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D158292	2023-08-23 20:26:23 -07:00
Kai Luo	1ceaec3e81	[PowerPC][altivec] Optimize codegen of vec_promote According to https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-vec-promote, elements not specified by the input index argument are undefined, so we don't need to set these elements to zero. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D158487	2023-08-24 02:10:13 +00:00
Matt Arsenault	8ce75acd1a	AMDGPU: Expand and modernize llvm.sqrt.f32 tests	2023-08-23 20:39:18 -04:00
Matt Arsenault	16bc07ac91	AMDGPU: Select f64 fmul by negative power of 2 to ldexp Select fmul x, -K -> ldexp(-x, log2(fabsK)) Select fmul fabs(x), -K -> ldexp(-|x|, log2(fabsK)) https://reviews.llvm.org/D158173	2023-08-23 20:36:01 -04:00
Matt Arsenault	4c4ff50361	AMDGPU: Add more baseline test for fmul to ldexp patterns	2023-08-23 20:31:54 -04:00
Matt Arsenault	a738bdf35e	AMDGPU: Permit more rsq formation in AMDGPUCodeGenPrepare We were basing the decision to defer the fast case to codegen on the fdiv itself, and not looking for a foldable sqrt input. https://reviews.llvm.org/D158127	2023-08-23 20:06:50 -04:00
Craig Topper	1f395115da	[RISCV] Add Zicond instructions to RISCVOptWInstrs like XVentanaCondOps.	2023-08-23 16:57:16 -07:00
Matt Arsenault	e954085f80	AMDGPU: Fix more unsafe rsq formation Introducing rsq contract flags is wrong, and also requires some level of approximate functions. AMDGPUCodeGenPrepare already should handle the f32 cases with appropriate flags, and I don't see how new situations to handle would arise during legalization (other than cases involving the rcp intrinsic, which instcombine tries to handle). AMDGPUCodeGenPrepare does need to learn better handling of rcp/rsq for f64 though, which we never bothered to handle well. Removes another obstacle to correctly lowering sqrt. https://reviews.llvm.org/D158099	2023-08-23 19:28:49 -04:00
Nitin John Raj	c07062a2e9	[RISCV][GlobalISel] Select G_CONSTANT, G_ANYEXT, COPY We select G_CONSTANT generic opcodes by materializing the constant in a register. G_ANYEXT is replaced with COPY. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158504	2023-08-23 15:33:44 -07:00
David Tellenbach	979e8ae4fc	[AArch64] Check opcode before trying to extract register from operand When matching FNEG patterns for the MachineCombiner we need to check for opcodes first, before trying to extract a register from an operand. Otherwise handling of instructions with non-register operands causes the compiler to crash. Differential Revision: https://reviews.llvm.org/D158473	2023-08-23 14:46:31 -07:00
Simon Pilgrim	5d79a8d148	[X86] fold-and-shift.ll - add x86-64 test coverage Although we already have fold-and-shift-x86_64.ll - this adds additional test coverage for various and-shift patterns split by sign/zero extensions from i32 index patterns to i64 pointers	2023-08-23 22:16:54 +01:00
Yingwei Zheng	d6639f83a9	[SDAG][RISCV] Avoid folding `setcc (xor C1, -1), C2, cond` into `setcc (xor C2, -1), C1, cond` This patch fixes https://github.com/llvm/llvm-project/issues/64935. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158654	2023-08-24 04:18:17 +08:00
Changpeng Fang	ffa7c7897c	[AMDGPU] Emit .actual_access metadata Summary: Emit .actual_access metadata for the deduced argument access qualifier, and .access for kernel_arg_access_qual. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D157451	2023-08-23 12:57:29 -07:00
Peter Rong	f58fbfc746	[X86][CodeGen] Add a dag pattern to fix #64323 After recent patch D30189, #64323's error message changed to a new one. When DAGCombiner was optimizing `(vextract (scalar_to_vector val, 0) -> val`, it didn't consider the possibility that the inserted value type has fewer bits than the dest type. This patch fixes that. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D158355	2023-08-23 10:50:32 -07:00
David Green	adaf545a50	[GlobalISel] Limit shift_of_shifted_logic_chain to non-zero folds After D157690 we are seeing some crashes from Global ISel, which seem to be related to the shift_of_shifted_logic_chain combine that can remove too many instructions if the shift amount is zero. This limits the fold to non-zero shifts, under the assumption that it is better in that case to fold away the shift to a COPY. Differential Revision: https://reviews.llvm.org/D158596	2023-08-23 18:17:37 +01:00
Felipe de Azevedo Piovezan	88417098bb	[CodeGen][DebugInfo] Append OP_deref when converting an EntryValue dbg.declare When we convert an EntryValue dbg.declare into an entry of the MF side table, we currently copy its DIExpression as is, and rely on subsequent layers to "know" that this expression is implicitly indirect. This is bad because it adds an implicit assumption to the IR representation, and requires subsequent layers to know about this assumption. This also limits the reusability of this table: what if, in the future, we want to use this table for dbg.values? This patch changes existing behavior so that the entities converting dbg_declares explicitly add an OP_deref when converting EntryValue dbg.declares. Differential Revision: https://reviews.llvm.org/D158437	2023-08-23 12:25:12 -04:00
Neumann Hon	d00f59893e	[SystemZ][z/OS] Fix the entry point marker for leaf functions The function emitFunctionEntryLabel does not look at whether or not a function is a leaf when setting the entry flags, and instead blindly marks all functions as non-leaf routines. Differential Revision: https://reviews.llvm.org/D157701 Reviewed By: uweigand	2023-08-23 09:50:01 -04:00
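
The fold described in commit 5ec1353523 above (later reverted by 48fa79a503) can be illustrated with a small sketch. This is not LLVM code: it is a plain Python model of the identity for the ordered less-than predicate only, the helper names `fold_or_lt` and `fold_and_lt` are hypothetical, and Python's built-in `min`/`max` stand in for the DAG min/max nodes without matching the NaN semantics of FMINNUM or FMINNUM_IEEE.

```python
import itertools
import math


def fold_or_lt(a, b, c):
    """Hypothetical folded form of (a < c) or (b < c): one compare on the minimum."""
    return min(a, b) < c


def fold_and_lt(a, b, c):
    """Hypothetical folded form of (a < c) and (b < c): one compare on the maximum."""
    return max(a, b) < c


# Non-NaN sample values, including infinities and both zeros.
samples = [-math.inf, -1.5, -0.0, 0.0, 2.0, math.inf]

# With no NaN operands, the folded and unfolded forms agree on every triple.
for a, b, c in itertools.product(samples, repeat=3):
    assert ((a < c) or (b < c)) == fold_or_lt(a, b, c)
    assert ((a < c) and (b < c)) == fold_and_lt(a, b, c)

# With a NaN operand a naive min can change the answer, which is why the
# commit message restricts the fold depending on what is known about NaNs.
unfolded = (math.nan < 1.0) or (0.0 < 1.0)  # ordered compare with NaN is False
folded = fold_or_lt(math.nan, 0.0, 1.0)     # Python's min keeps the leading NaN here
print(unfolded, folded)
```

The loop only covers a handful of values, so it is a sanity check rather than a proof. The final two lines show the unfolded and folded results disagreeing once a NaN is involved.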

52796 Commits