llvm-project

Author	SHA1	Message	Date
Joseph Huber	ffabcbcf8f	[NVVMReflect][Reland] Force dead branch elimination in NVVMReflect (#81189 ) Summary: The `__nvvm_reflect` function is used to guard invalid code that varies between architectures. One problem with this feature is that if it is used without optimizations, it will leave invalid code in the module that will then make it to the backend. The `__nvvm_reflect` pass is already mandatory, so it should do some trivial branch removal to ensure that constants are handled correctly. This dead branch elimination only works in the trivial case of a compare on a branch and does not touch any conditionals that were not related to the `__nvvm_reflect` call in order to preserve `O0` semantics as much as possible. This should allow the following to work on NVPTX targets ```c int foo() { if (__nvvm_reflect("__CUDA_ARCH") >= 700) asm("valid;\n"); } ``` Relanding after fixing a bug.	2024-02-08 20:09:44 -06:00
Joseph Huber	0800a36053	Revert "[NVVMReflect] Force dead branch elimination in NVVMReflect (#81189 )" This reverts commit 9211e67da36782db44a46ccb9ac06734ccf2570f. Summary: This seemed to crash on one of the CUDA math tests. Revert until it can be fixed.	2024-02-08 17:32:04 -06:00
Joseph Huber	9211e67da3	[NVVMReflect] Force dead branch elimination in NVVMReflect (#81189 ) Summary: The `__nvvm_reflect` function is used to guard invalid code that varies between architectures. One problem with this feature is that if it is used without optimizations, it will leave invalid code in the module that will then make it to the backend. The `__nvvm_reflect` pass is already mandatory, so it should do some trivial branch removal to ensure that constants are handled correctly. This dead branch elimination only works in the trivial case of a compare on a branch and does not touch any conditionals that were not related to the `__nvvm_reflect` call in order to preserve `O0` semantics as much as possible. This should allow the following to work on NVPTX targets ```c int foo() { if (__nvvm_reflect("__CUDA_ARCH") >= 700) asm("valid;\n"); } ```	2024-02-08 17:16:31 -06:00
Alex MacLean	9affa177b5	[NVPTX] Add support for calling aliases (#81170 ) The current implementation of aliases tries to remove all the aliases in the module to prevent the generic version of `AsmPrinter` from emitting them incorrectly. Unfortunately, if the aliases are used this will fail. Instead let's override the function to print aliases directly. In addition, the declarations of the alias functions must occur before the uses. To fix this we emit alias declarations as part of `emitDeclarations` and only emit the `.alias` directives at the end (where we can assume the aliasee has also already been declared).	2024-02-08 17:14:13 -06:00
Luke Lau	06c89bd59c	[RISCV] Check type is legal before combining mgather to vlse intrinsic (#81107 ) Otherwise we will crash since target intrinsics don't have their types legalized. Let the mgather get legalized first, then do the combine on the legal type. Fixes #81088 Co-authored-by: Craig Topper <craig.topper@sifive.com>	2024-02-09 06:51:11 +08:00
Philip Reames	b8545e1ece	[RISCV] Consider all subvector extracts within a single VREG cheap (#81032 ) This adjusts the isSubVectorExtractCheap callback to consider any extract which fits entirely within the first VLEN bits of the src vector (and uses a 5 bit immediate for the slide) as cheap. These can be done via a single m1 vslide1down.vi instruction. This allows our generic DAG combine logic to kick in and recognize a few more cases where shuffle source is longer than the dest, but that using a wider shuffle is still profitable. (Or as shown in the test diff, we can split the wider source and do two narrower shuffles.)	2024-02-08 12:15:33 -08:00
Philip Reames	d0f72f8860	[RISCV] Consider truncate semantics in performBUILD_VECTORCombine (#81168 ) Fixes https://github.com/llvm/llvm-project/issues/80910. Per the documentation in ISDOpcodes.h, for BUILD_VECTOR "The types of the operands must match the vector element type, except that integer types are allowed to be larger than the element type, in which case the operands are implicitly truncated." This transform was assuming that the scalar operand type matched the result type. This resulted in essentially performing a truncate before a binop, instead of after. As demonstrated by the test case changes, this is often not legal.	2024-02-08 11:28:06 -08:00
alex-t	88e52511ca	[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586 ) This change implements synthesizing the private buffer resource descriptor in the kernel prolog instead of using the preloaded kernel argument.	2024-02-08 20:27:36 +01:00
Philip Reames	c8d431e0ed	[riscv] Add test coverage in advance of an upcoming fix This is a reduced test case for a fix for the issue identified in https://github.com/llvm/llvm-project/issues/80910.	2024-02-08 09:45:57 -08:00
Simon Pilgrim	bef25ae297	[X86] X86FixupVectorConstants - use explicit register bitwidth for the loaded vector instead of using constant pool bitwidth Fixes #81136 - we might be loading from a constant pool entry wider than the destination register bitwidth, affecting the vextload scale calculation. ConvertToBroadcastAVX512 doesn't yet set an explicit bitwidth (it will default to the constant pool bitwidth) due to difficulties in looking up the original register width through the fold tables, but as we only use rebuildSplatCst this shouldn't cause any miscompilations, although it might prevent folding to broadcast if only the lower bits match a splatable pattern.	2024-02-08 17:39:19 +00:00
Simon Pilgrim	eb85c8edf5	[X86] Add test case for #81136	2024-02-08 16:35:13 +00:00
Ivan Kosarev	7d19dc50de	[AMDGPU][True16] Support VOP3 source DPP operands. (#80892 )	2024-02-08 16:23:00 +00:00
ostannard	5452cbc4a6	[AArch64] Indirect tail-calls cannot use x16 with pac-ret+pc (#81020 ) When using -mbranch-protection=pac-ret+pc, x16 is used in the function epilogue to hold the address of the signing instruction. This is used by a HINT instruction which can only use x16, so we can't change this. This means that we can't use it to hold the function pointer for an indirect tail-call. There is existing code to force indirect tail-calls to use x16 or x17 when BTI is enabled, so there are now 4 combinations (bti / pac-ret+pc: valid function pointer registers): off / off: any non callee-saved register; on / off: x16 or x17; off / on: any non callee-saved register except x16; on / on: x17.	2024-02-08 15:31:54 +00:00
Evgeniy	49ee2ffc65	[X86][GlobalISel] Reorganize br/brcond tests (NFC) (#80204 ) Removing duplicating tests under GlobalISel, consolidating to perform checks with all three selectors.	2024-02-08 15:36:22 +05:30
Pierre van Houtryve	9ff3b82948	[AMDGPU] Revert Metadata Version Upgrade (#80995 ) Metadata is still 1.2, not 1.3 after V6. I thought that amdhsa.version mapped to the COV version but it's separate, and there are no MD changes in V6, hence it doesn't need to be updated.	2024-02-08 08:30:59 +01:00
David Green	7da1dda01e	[AArch64][GlobalISel] Update GISel check line and regenerate tests. NFC	2024-02-08 02:58:09 +00:00
Luke Lau	ece66dbc60	[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR (#80452 ) This handles two cases where we can work out some known-zero bits for ISD::STEP_VECTOR. The first case handles when we know the low bits are zero because the step amount is a power of two. This is taken from https://reviews.llvm.org/D128159, and even though the original patch didn't end up landing this case due to it not having any test difference, I've included it here for completeness's sake. The second case handles the case when we have an upper bound on vscale_range. We can use this to work out the upper bound on the number of elements, and thus what the maximum step will be. From the maximum step we then know which hi bits are zero. On its own, computing the known hi bits results in some small improvements for RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping to be able to use this later to reduce the LMUL in index calculations for vrgather/indexed accesses. --------- Co-authored-by: Philip Reames <preames@rivosinc.com>	2024-02-08 10:04:55 +08:00
Visoiu Mistrih Francis	514686acfd	[RISCV] Add correct Uses, Defs, isReturn to Zcmp (#81039 ) * they all do stack adjustments, so they all use and def x2. * popret and popretz also return * popretz also defines x10 This adds that to the TD file and updates the PushPopOptimizer to preserve the extra implicit operands added during frame lowering when converting to popret(z).	2024-02-07 14:30:45 -08:00
Ilya Leoshkevich	9c75a98155	[SystemZ] Implement A, O and R inline assembly format flags (#80685 ) Implement the following assembly format flags, which are already supported by GCC: 'A': On z14 or higher: If operand is a mem print the alignment hint usable with vl/vst prefixed by a comma. 'O': print only the displacement of a memory reference or address. 'R': print only the base register of a memory reference or address. Implement 'A' conservatively, since the memory operand alignment information is not available for INLINEASM at the moment.	2024-02-07 20:41:40 +01:00
Jeffrey Byrnes	3115ad8980	[AMDGPU] Accept arbitrary sized sources in CalculateByteProvider (#70240 ) Reland the original patch with additional commit containing fix for two issues: 1. Attempting to bitcast using MVTs with no corresponding LLVM type. getDWordFromOffset now works directly with the original vector to get the corresponding elements given the DWordOffset. 2. Improper bit tracking in CalculateByteProvider for vector types using certain ops. Previously, bit tracking for certain ops (e.g. ISD::TRUNCATE) assumed operands were scalar types, which is not correct since these ops have different semantics depending on vector / scalar. CalculateByteProvider / CalculateSrcByte now exit on vector types, handling which is a TODO.	2024-02-07 11:34:50 -08:00
Arthur Eubanks	5a83bccb35	[X86] Fix lowering TLS under darwin large code model (#80907 ) OpFlag and WrapperKind should be chosen consistently with each other in regards to PIC, otherwise we hit asserts later on. Broken by c04a05d8. Fixes #80831.	2024-02-07 09:16:36 -08:00
Jeremy Morse	d109f94f29	[DebugInfo][RemoveDIs] Re-enable some test coverage We disabled these extra-special RUNlines due to unexpected interactions between the various things we've been fixing. Re-enable them (they'll run on the llvm-new-debug-iterators buildbot) as they all now pass.	2024-02-07 12:41:32 +00:00
Simon Pilgrim	c2a91d4a33	[X86] combine-movmsk-avx.ll - add full AVX1/AVX2 VTEST/MOVMSK test coverage Test all combos of avx1/avx2 and prefer-movmsk-over-vtest	2024-02-07 11:12:29 +00:00
Carl Ritson	7d508eb5d3	Revert "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104 )" This reverts commit d6c7253d32e4bdff619c39708170f1c1fa01ff95. Change causing CTS failures due to incomplete metadata.	2024-02-07 17:09:56 +09:00
Carl Ritson	9bda1de0b6	[TwoAddressInstruction] Propagate undef flags for partial defs (#79286 ) If part of a register (lowered from REG_SEQUENCE) is undefined then we should propagate undef flags to uses of those lanes. This is only performed when live intervals are present as it requires live intervals to correctly match uses to defs, and the primary goal is to allow precise computation of subrange intervals.	2024-02-07 16:46:00 +09:00
Serge Pavlov	b0785cd1cb	[GlobalISel][ARM] Support missing case for G_CONSTANT (#80555 ) Global Instruction Selector could not select the code: %0:gprb(s32) = G_CONSTANT i32 -1 In DAG selector the similar code is selected to the instruction MVNi using custom operand `mod_imm_not`. Changing its definition from `PatLeaf` to `ImmLeaf` and providing counterpart for `imm_not_XFORM` make the relevant rule available for GlobalISel too.	2024-02-07 12:53:20 +07:00
Visoiu Mistrih Francis	69a661cbae	[RISCV] Remove CalleeSavedInfo for Zcmp/save-restore-libcalls registers (#79535 ) Registers that are pushed/popped by Zcmp or libcalls have pre-defined frame indices that are never allocated in MachineFrameInfo. They're being used throughout PEI, but the rest of codegen doesn't work that way and expects each frame index to be a valid index in MFI. This patch keeps it local to PEI and removes them from the CalleeSavedInfo list at the end of the pass. Before this pass, any MIR testing post-PEI is broken and asserts (see issue #79491).	2024-02-06 18:18:49 -08:00
Fangrui Song	1c22d3f55d	[ARC] Convert tests to opaque pointers (NFC)	2024-02-06 12:55:16 -08:00
Fangrui Song	423ac3d9ee	[CSKY] Convert tests to opaque pointers (NFC)	2024-02-06 12:54:21 -08:00
Fangrui Song	cd0d11be7a	[M68k] Convert tests to opaque pointers (NFC)	2024-02-06 12:53:16 -08:00
stephenpeckham	90e8dc0f7c	Fix failing testcases (#80902 )	2024-02-06 15:35:21 -05:00
Jeremy Morse	5ce2f73b2e	[DebugInfo][RemoveDIs] Add some missing test coverage In github PR #78731 it looks like I added test coverage for RemoveDIs to either the wrong test, or not enough. Adding --try-experimental-debuginfo-iterators to this particular test is enough to restore some coverage it seems.	2024-02-06 19:34:53 +00:00
Craig Topper	2faeea313f	[RISCV] Add Ssqosid support to -march. (#80747 )	2024-02-06 10:06:01 -08:00
Craig Topper	cca49663a5	[FastISel][X86] Use getTypeForExtReturn in GetReturnInfo. (#80803 ) The comment and code here seem to match getTypeForExtReturn. The history shows that at the time this code was added, similar code existed in SelectionDAGBuilder. SelectionDAGBuilder code has since been refactored into getTypeForExtReturn. This patch makes FastISel match SelectionDAGBuilder. The test changes are because X86 has customization of getTypeForExtReturn. So now we only extend returns to i8. Stumbled onto this difference by accident.	2024-02-06 09:38:25 -08:00
Fangrui Song	6b2fd7aed6	[MIPS] Use generic isBlockOnlyReachableByFallthrough (#80799 ) FastISel may create a redundant BGTZ terminal which falls through. ``` BGTZ %2:gpr32, %bb.1, implicit-def $at bb.1.bb1: ; predecessors: %bb.0 ``` The `!I->isBarrier()` check in MipsAsmPrinter::isBlockOnlyReachableByFallthrough will incorrectly not print a label, leading to an `Undefined temporary symbol ` error when we try assembling the output assembly file. See the updated `Fast-ISel/pr40325.ll` and https://github.com/rust-lang/rust/issues/108835 In addition, the `SwitchInst` condition is too conservative and prints many unneeded labels (see the updated tests). Just use the generic isBlockOnlyReachableByFallthrough, updated by commit 1995b9fead62f2f6c0ad217bd00ce3184f741fdb for SPARC, which also handles MIPS.	2024-02-06 09:23:33 -08:00
choikwa	e5638c5a00	[AMDGPU] Use correct number of bits needed for div/rem shrinking (#80622 ) There was an error where dividend of type i64 and actual used number of bits of 32 fell into path that assumes only 24 bits being used. Check that AtLeast field is used correctly when using computeNumSignBits and add necessary extend/trunc for 32 bits path. Regolden and update testcases. @jrbyrnes @bcahoon @arsenm @rampitec	2024-02-06 21:32:28 +05:30
David Stuttard	d6c7253d32	[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104 ) PAL Metadata 3.0 introduces an explicit structure in metadata for the programmable registers written out by the compiler backend. The previous approach used opaque registers which can change between different architectures and required encoding the bitfield information in the backend, which may change between versions. This change is an extension of the previously added support, which only handled entry functions. This adds support for all functions. The change also includes some refactoring to separate common code.	2024-02-06 15:34:36 +00:00
stephenpeckham	b1acb7a315	[XCOFF] Add compiler version to an auxiliary symbol table entry (#80162 ) C_FILE symbols. To match the behavior of the assembler and the legacy compiler, this includes using the generic ".file" name for the C_FILE symbol and generating the actual file name in an auxiliary entry.	2024-02-06 09:08:18 -06:00
Thorsten Schütt	364f781344	[GlobalIsel] Combine logic of icmps (#77855 ) Inspired by InstCombinerImpl::foldAndOrOfICmpsUsingRanges with some adaptations to MIR.	2024-02-06 15:58:02 +01:00
David Green	2e3de997ab	[DAG] Generalize setcc(setcc) fold to use known bits. If we have a `SETCC (SETCC), 0, NE` and ZeroOrOneBooleanContent, we can remove the outer setcc as it will produce the same value as the inner. This can be generalized to anything where the top bits are known to be 0, as the value will remain as 1 or 0.	2024-02-06 12:39:48 +00:00
Simon Pilgrim	b8cdc2638e	[DAG] visitCTPOP - if only the upper half of the ctpop operand is zero then see if its profitable to only count the lower half. (#80473 )	2024-02-06 12:19:31 +00:00
Rin Dobrescu	7f292b8fb1	[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c), concat(b,d)) (#80674 ) We can convert concat(v4i16 uhadd(a,b), v4i16 uhadd(c,d)) to v8i16 uhadd(concat(a,c), concat(b,d)), which can lead to further simplifications.	2024-02-06 11:02:06 +00:00
Qiu Chaofan	292d9e869f	[PowerPC] Mask constant operands in ValueBit tracking (#67653 ) In IR or C code, shift amount larger than value size is undefined behavior. But in practice, backend lowering for shift_parts produces add/sub of shift amounts, thus constant shift amounts might be negative or larger than value size, which depends on ISA definition. PowerPC ISA says, the lowest 7 bits (6 bits for 32-bit instruction) will be taken, and if the highest among them is 1, result will be zero, otherwise the low 6 bits (or 5 on 32-bit) are used as shift amount. This commit emulates the behavior and avoids array overflow in bit permutation's value bits calculator.	2024-02-06 18:37:31 +08:00
Luke Lau	bc569f6eb3	[RISCV] Add test case for shufflevector that gets scalarized. NFC This shufflevector gets scalarized into a build_vector of extract_vector_elts because the output type doesn't match the input vector type. Normally this is combined back into a vector_shuffle in DAGCombine, but this one fails because we don't consider an extract_subvector to be cheap, specifically because it's at an index > 31. This should be canonicalized back into a vector_shuffle at some point so we can lower it as a vrgather.vv.	2024-02-06 18:35:18 +08:00
Sjoerd Meijer	35904ec4e1	[AArch64] MI Scheduler STP combine (#80188 ) Add opcodes for different store instructions to the target hook that can enable more STP pairs. This is split off from the patch that does the same for some load instructions (#79003). Patch co-authored by Cameron McInally.	2024-02-06 10:29:42 +00:00
paperchalice	c9fd738388	[CodeGen] Port DeadMachineInstructionElim to new pass manager (#80582 ) A simple enough op pass so we can test standard instrumentations in future.	2024-02-06 17:56:56 +08:00
Matt Arsenault	42b5b720ca	AMDGPU/GlobalISel: Fix not running -global-isel in global isel test	2024-02-06 14:55:48 +05:30
Derek Schuff	c0cb0be85c	Mark llvm/test/CodeGen/WebAssembly/immediates.ll as passing on MIPS (#80771 ) Fixes #80533	2024-02-05 17:38:54 -08:00
Congcong Cai	a71147dd28	[WebAssembly] improve getRegForPromotedValue to avoid meaningless value copy (#80469 ) When promoting a value, it is meaningless to copy the value from one reg to another reg with the same type. This PR adds an additional check for these cases to reduce the code size. Fixes: #80053.	2024-02-06 09:07:58 +08:00
Philip Reames	e722d9662d	[DAG] Avoid a crash when checking size of scalable type in visitANDLike Fixes https://github.com/llvm/llvm-project/issues/80744. This transform doesn't handle vectors at all. The fixed length ones pass the first check, but would fail the constant operand checks which immediately follow. This patch takes the simplest approach, and just guards the transform for scalar integers.	2024-02-05 14:30:10 -08:00
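
The shift-amount rule quoted in the PowerPC commit 292d9e869f above can be illustrated with a small model. This is only a sketch of the rule as worded in that commit message (take the low 7 bits of the amount, or 6 for 32-bit; if the highest of those bits is set the result is zero; otherwise shift by the remaining low bits). It is not the LLVM implementation, and the function name `ppc_shift_left` is hypothetical.

```python
def ppc_shift_left(value: int, amount: int, bits: int = 64) -> int:
    """Model a left shift of a `bits`-wide value (32 or 64) using the
    shift-amount rule described in the commit message. Hypothetical helper."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")

    # Number of low bits of the shift amount that are looked at:
    # 7 for 64-bit instructions, 6 for 32-bit instructions.
    field_bits = 7 if bits == 64 else 6
    field = amount & ((1 << field_bits) - 1)

    # If the highest bit of that field is set, the result is zero.
    if field >> (field_bits - 1):
        return 0

    # Otherwise the remaining low bits (6 or 5) are the real shift amount.
    return (value << field) & ((1 << bits) - 1)


if __name__ == "__main__":
    print(ppc_shift_left(1, 3))       # ordinary shift
    print(ppc_shift_left(1, 64))      # amount equal to the value size
    print(ppc_shift_left(1, -1))      # negative constant amount
```

Masking the amount first is what keeps negative or oversized constant amounts inside a fixed range, which is the property the commit relies on to avoid indexing outside the value-bits array.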

52796 Commits