llvm-project

Author	SHA1	Message	Date
WANG Rui	42dccf9c16	[LoongArch] Implement isLegalAddImmediate This brings a trivial improvement in the and-add-lsr.ll test case. Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: SixWeining, xen0n Differential Revision: https://reviews.llvm.org/D154762	2023-07-24 17:17:24 +08:00
WANG Rui	c100f35f02	[LoongArch] Add tests for (and (add x, c1), (lshr y, c2)) Add tests for (and (add x, c1), (lshr y, c2)). Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: SixWeining, xen0n Differential Revision: https://reviews.llvm.org/D154809	2023-07-24 17:12:10 +08:00
WANG Rui	595d5f36f4	[DAGCombine] Canonicalize operands for visitANDLike During the construction of SelectionDAG, there are no explicit canonicalization rules to adjust the order of operands for AND nodes. This may prevent the optimization in DAGCombiner::visitANDLike from being triggered. This patch canonicalizes the operands before matches, which can be observed to improve optimization on the RISC-V target architecture. Canonicalize: ``` and(x, add) -> and(add, x) ``` Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D154760	2023-07-24 16:52:04 +08:00
WANG Rui	cea980f380	[RISCV] Add tests for (and (add x, c1), (lshr y, c2)) Add tests for (and (add x, c1), (lshr y, c2)). Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154808	2023-07-24 16:52:02 +08:00
Sander de Smalen	6865fbd3da	[AArch64][SME] Use `fmov` instead of NEON `movi` for FP value. NEON `movi` is not valid in Streaming SVE mode, so use an `fmov` instruction instead for zero-initializing a FP value. Reviewed By: hassnaa-arm Differential Revision: https://reviews.llvm.org/D155432	2023-07-24 08:48:19 +00:00
Antonio Frighetto	2dea969d83	[clang][CodeGen] Introduce `-frecord-command-line` for MachO Allow clang driver command-line recording when targeting MachO object files as well. Reviewed-by: sgraenitz Differential Revision: https://reviews.llvm.org/D155716	2023-07-24 09:24:59 +02:00
Craig Topper	74d16b212b	[RISCV] Add Zicond RUN lines to xaluo.ll. NFC A couple of these tests show a need for computeKnownBits support for Zicond.	2023-07-23 23:03:18 -07:00
Jim Lin	37b474a20e	[RISCV] Remove unused check prefixes for tests. NFC Also remove the warning lines stating that these prefixes are unused. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156048	2023-07-24 13:42:49 +08:00
eopXD	78d91df452	[RISCV] Support register allocation for GHC when f/d is not specified in the architecture This patch supports register allocation for floating-point types when `zfinx` and `zdinx` are specified in the architecture for the GHC calling convention. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155910	2023-07-23 22:40:10 -07:00
esmeyi	776195865d	[XCOFF] Write source language ID and CPU version ID into C_FILE symbol. Summary: The source language ID and CPU version ID are required by debuggers on AIX. AIX's system assembler determines the source language ID based on the source file's name suffix, and the behavior in this patch is consistent with it. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D155684	2023-07-24 00:35:24 -04:00
Pravin Jagtap	c48ed93cf8	[AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic. When the input to the intrinsic is a uniform value, the reduced value is the same as the input, whereas if the input value is divergent we need to iterate over all active lanes of the wavefront to perform the reduction. The control flow for a `loop` has been set up, which iterates over `only` the active lanes to perform the reduction. Introduced WAVE_REDUCE_UMIN_PSEUDO_U32 and WAVE_REDUCE_UMAX_PSEUDO_U32 pseudos, which are lowered post-ISel (in `EmitInstrWithCustomInserter`). Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D154858	2023-07-24 00:06:00 -04:00
Zhongyunde	0aaeb88532	[AArch64][GlobalISel] Legalize <2 x s8> and <4 x s8> for G_BUILD_VECTOR Refer to commit ccffc27, the remaining types <2 x s8> and <4 x s8> should also be promoted to <2 x s32> and <4 x s16>. Fixes https://github.com/llvm/llvm-project/issues/58274 Reviewed By: aemerson, tschuett, paquette, dmgreen Differential Revision: https://reviews.llvm.org/D153394	2023-07-24 11:25:26 +08:00
Jun Sha (Joshua)	f375ee36c4	[RISCV] Add codegen for Zfbfmin instructions The implementation in https://reviews.llvm.org/D151313 is done for the circumstance without Zfbfmin. This patch adds codegen support for the 6 instructions provided in Zfbfmin extension. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153234	2023-07-24 10:37:58 +08:00
David Green	495bdfc7bb	[AArch64] Lower fcvtl2 (fpext) via tablegen patterns. This patch does two things. First it removes the tryHighFPExt DAG2DAG method used to select fcvtl2 instructions, using tablegen patterns through SelectExtractHigh instead. This essentially undoes D71515, in a way that should hopefully avoid any regressions. The second is that a GI equivalent of SelectExtractHigh is added in selectExtractHigh, from G_UNMERGE_VALUES. The end result is that GlobalISel (and some constrained fpext) can now make use of the fcvtl2 instructions, saving an extra dup/ext. Differential Revision: https://reviews.llvm.org/D155871	2023-07-23 19:17:11 +01:00
David Green	6edc9a7662	[AArch64][GISel] Additional FPExt vector lowering Similar to D155311, this adds lowering for more vector cases for FPExt Differential Revision: https://reviews.llvm.org/D155601	2023-07-23 16:58:13 +01:00
Phoebe Wang	88b6d291bb	[X86][FP16] Split v32f16 shuffle when feature BWI is off Found this problem when investigating #63017 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D156050	2023-07-23 20:56:09 +08:00
Simon Pilgrim	da0f24873d	[X86] LowerFunnelShift - manually expand funnel shifts by splat constant patterns. Followup to af32e51a43fb4343f - where the undef funnel shift amounts (during widening from v2i32 -> v4i32) were being constant folded to 0 when the shift amounts are created during expansion, losing the splat'd shift amounts.	2023-07-23 10:57:11 +01:00
Simon Pilgrim	92bf83cf60	[X86] Add basic test coverage for funnel shifts of sub-128-bit vector types	2023-07-23 10:57:11 +01:00
Kishan Parmar	41af6ece6c	[PowerPC/SPE] powerpcspe load and store instructions have an 8-bit offset instead of 16-bit, unlike other load/store instructions, so if the stack grows any further than 8 bits can address, create one emergency slot for spilling.	2023-07-23 13:24:35 +05:30
Amaury Séchet	88452508f3	[DAG] Improve carry reconstruction in combineCarryDiamond. The gain is usually sufficient to go the extra mile and reconstruct a carry in some cases. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D154533	2023-07-22 22:49:48 +00:00
Simon Pilgrim	af32e51a43	[X86] LowerRotate - manually expand rotate by splat constant patterns. Fixes issue identified on #63980 where the undef rotate amounts (during widening from v2i32 -> v4i32) were being constant folded to 0 when the shift amounts are created during expansion, losing the splat'd shift amounts.	2023-07-22 17:54:57 +01:00
Phoebe Wang	04527f1d32	[X86][BF16] Customize INSERT_VECTOR_ELT for bf16 when feature BF16 is on Fixes root cause of #63017. The reason is similar to BUILD_VECTOR. We have legal vector type but still soft promote for scalar type. So we need to customize these scalar to vector nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D155961	2023-07-22 20:26:34 +08:00
Phoebe Wang	f11526b091	[X86][BF16] Do not scalarize masked load for BF16 when we have AVX512BF16 Fixes #63017 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D155952	2023-07-22 18:16:49 +08:00
Jon Roelofs	62a1fbe9f7	Enable compact unwind in all darwin simulators ... since they've always supported it. rdar://104359594 Differential revision: https://reviews.llvm.org/D155988	2023-07-21 16:13:47 -07:00
Philip Reames	785939c15e	Revert "[RISCV] Add test which shows alignment of constant pools and the functions which followed" This reverts commit cbf2a6ce197e8176c01316fe25400aae0b7390c4. This was a precommitted test for a change which is being abandoned.	2023-07-21 16:03:15 -07:00
Nathan Chancellor	17f4f262fc	Revert "Reapply [IR] Mark and constant expressions as undesirable" This reverts commit 086ee99564afbb11449c08ea2e094f7f49fadde5. This patch causes an infinite loop when building arch/mips/mm/c-r4k.c in the Linux kernel. See the comment in Phabricator for a reduced reproducer: https://reviews.llvm.org/rG086ee99564afbb11449c08ea2e094f7f49fadde5	2023-07-21 15:57:03 -07:00
Matt Arsenault	8406c3568a	AMDGPU: Implement new 2ulp fdiv lowering Extends the new frexp scaled reciprocal to the general case. The reciprocal case is just the same thing when frexp of 1 is constant folded. Could probably clean up the code to rely on that constant folding. Improves results for the IEEE path for the default OpenCL division. We used to only emit the fdiv.fast intrinsic with a 2.5 ulp accuracy threshold with DAZ, which uses explicit range checks. This gives us a better fast option with the default IEEE behavior.	2023-07-21 18:55:42 -04:00
Matt Arsenault	6699c37028	AMDGPU: Refactor AMDGPUCodeGenPrepare fdiv handling NFC-ish. Does trigger some reordering of the fdiv scalarization. Also skips scalarizing in more cases where nothing was going to happen. We can still scalarize in some no-op edge cases. https://reviews.llvm.org/D155740	2023-07-21 18:55:42 -04:00
Philip Reames	cbf2a6ce19	[RISCV] Add test which shows alignment of constant pools and the functions which followed	2023-07-21 15:02:43 -07:00
Joseph Huber	f4381d4644	[NVPTX] Add initial support for '.alias' in PTX This patch adds initial support for using aliases when targeting PTX. We perform a pretty strict conversion from the globals referenced to the expected output, as described in the PTX documentation at https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-alias These cannot currently be used due to a bug in the `nvlink` implementation that causes aliases to pruned functions to crash the linker. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D155211	2023-07-21 16:43:46 -05:00
Matt Arsenault	8287f3af9d	AMDGPU: Overhaul and improve rcp and rsq f32 formation The highlight change is a new denormal safe 1ulp lowering which uses rcp after using frexp to perform input scaling. This saves 2 instructions compared to other implementations which performed an explicit denormal range change. This improves the OpenCL default, and requires a flag for HIP. I don't believe there's any flag wired up for OpenMP to emit the necessary fpmath metadata. This provides several improvements and changes that were hard to separate without regressing one case or another. Disturbingly the OpenCL conformance test seems to have the reciprocal test commented out. I locally hacked it back in to test this. Starts introducing f32 rsq intrinsics in AMDGPUCodeGenPrepare. Like the rcp case, we could do this in codegen if !fpmath were preserved (although we would lose some computeKnownFPClass tricks). Start requiring contract flags to form rsq. The rsq fusion actually improves the result from ~2ulp to ~1ulp. We have some older fusion in codegen which only keys off unsafe math which should be refined. Expand rsq patterns by checking for denormal inputs and pre/post multiplying like the current library code does. We also take advantage of computeKnownFPClass to avoid the scaling when we can statically prove the input cannot be a denormal. We could do the same for the rcp case, but unlike rsq a large input can underflow to denormal. We need additional upper bound exponent checks on the input in order to do the same for rcp. This rsq handling also now starts handling the negated case. We introduce rsq with an fneg. In the case the fneg doesn't fold into its user, it's a neutral change but provides improvement if it is foldable as a source modifier. Also starts respecting the arcp attribute properly, and more strictly interprets afn. We were previously interpreting afn as implying you could do the reciprocal expansion of an fdiv. 
The codegen handling of these also needs to be revisited. This also effectively introduces the optimization combineRepeatedFPDivisors enables, just done in the IR instead (and only for f32). This is almost across the board better. The one minor regression is for gfx6/buggy frexp case where for multiple reciprocals, we could previously reuse rematerialized constants per instance (it's neutral for a single rcp). The fdiv.fast and sqrt handling need to be revisited next. https://reviews.llvm.org/D155593	2023-07-21 16:35:53 -04:00
Daniel Hoekwater	0315fca912	[AArch64] Move branch relaxation after bbsection assignment Because branch relaxation needs to factor in whether branches target a block in the same section or a different one, it needs to run after the Basic Block Sections / Machine Function Splitting passes. Because jump table compression relies on block offsets remaining fixed after the table is compressed, we must also move the JT compression pass. The only tests affected are ones enforcing just the ordering and a few that have basic block ids changed because RenumberBlocks hasn't run yet. Differential Revision: https://reviews.llvm.org/D153829	2023-07-21 20:24:52 +00:00
Matt Arsenault	37512d7629	AMDGPU: Add baseline test for fdiv combine	2023-07-21 16:04:12 -04:00
Simon Pilgrim	65c9153cf0	[X86] combineBitcastvxi1 - don't prematurely create PACKSS nodes. Similar to Issue #63710 - by truncating the v8i16 result with a PACKSS node before type legalization, we fail to make use of various folds that rely on TRUNCATE nodes. This required tweaks to LowerTruncateVecPackWithSignBits to recognise when the truncation source has been widened and to more closely match combineVectorSignBitsTruncation wrt truncating with PACKSS/PACKUS on AVX512 targets. One of the last stages before we can finally get rid of combineVectorSignBitsTruncation.	2023-07-21 19:10:18 +01:00
Fangrui Song	9996e71f2d	[Support] Implement LLVM_ENABLE_REVERSE_ITERATION for StringMap ProgrammersManual.html says > StringMap iteration order, however, is not guaranteed to be deterministic, so any uses which require that should instead use a std::map. This patch makes -DLLVM_REVERSE_ITERATION=on (currently -DLLVM_ENABLE_REVERSE_ITERATION=on works as well) shuffle StringMap iteration order (actually flipping the hash so that elements not in the same bucket are reversed) to catch violations, similar to D35043 for DenseMap. This should help change the hash function (e.g., D142862, D155781). With a lot of fixes, there are still some violations. This patch implements the "reverse_iteration" lit feature to skip such tests. Eventually we should remove this feature. `ninja check-{llvm,clang,clang-tools}` are clean with `#define LLVM_ENABLE_REVERSE_ITERATION 1`. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D155789	2023-07-21 08:46:51 -07:00
Fangrui Song	ffa829c4c5	[RISCV] Allow delayed decision for ADD/SUB relocations For a label difference `A-B` in assembly, if A and B are separated by a linker-relaxable instruction, we should emit a pair of ADD/SUB relocations (e.g. R_RISCV_ADD32/R_RISCV_SUB32, R_RISCV_ADD64/R_RISCV_SUB64). However, the decision is made upfront at parsing time with inadequate heuristics (`requiresFixup`). As a result, LLVM integrated assembler incorrectly suppresses R_RISCV_ADD32/R_RISCV_SUB32 for the following code: ``` // Simplified from a workaround https://android-review.googlesource.com/c/platform/art/+/2619609 // Both end and begin are not defined yet. We decide ADD/SUB relocations upfront and don't know they will be needed. .4byte end-begin begin: call foo end: ``` To fix the bug, make two primary changes: * Delete `requiresFixups` and the overridden emitValueImpl (from D103539). This deletion requires accurate evaluateAsAbolute (D153097). * In MCAssembler::evaluateFixup, call handleAddSubRelocations to emit ADD/SUB relocations. However, there is a remaining issue in MCExpr.cpp:AttemptToFoldSymbolOffsetDifference. With MCAsmLayout, we may incorrectly fold A-B even when A and B are separated by a linker-relaxable instruction. This deficiency is acknowledged (see D153097), but was previously bypassed by eagerly emitting ADD/SUB using `requiresFixups`. To address this, we partially reintroduce `canFold` (from D61584, removed by D103539). Some expressions (e.g. .size and .fill) need to take the `MCAsmLayout` code path in AttemptToFoldSymbolOffsetDifference, avoiding relocations (weird, but matching GNU assembler and needed to match user expectation). Switch to evaluateKnownAbsolute to leverage the `InSet` condition. As a bonus, this change allows for the removal of some relocations for the FDE `address_range` field in the .eh_frame section. riscv64-64b-pcrel.s contains the main test. Add a linker relaxable instruction to dwarf-riscv-relocs.ll to test what it intends to test. 
Merge fixups-relax-diff.ll into fixups-diff.ll. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D155357	2023-07-21 08:37:58 -07:00
Phoebe Wang	fbae3d1d3c	Revert "[X86][BF16] Do not scalarize masked load for BF16 when we have BWI" This reverts commit ca1c05208ed35ba72869c65ad773b2cca4bbd360. It caused a Buildbot failure: https://lab.llvm.org/buildbot#builders/220/builds/24870	2023-07-21 23:29:11 +08:00
Phoebe Wang	ca1c05208e	[X86][BF16] Do not scalarize masked load for BF16 when we have BWI Fixes #63017 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D155952	2023-07-21 23:18:54 +08:00
Jay Foad	6c8f4472b4	[ARM] Extend regression test for D154281 Add a test case with a larger call frame which does not satisfy ARMFrameLowering::hasReservedCallFrame.	2023-07-21 15:48:45 +01:00
Simon Pilgrim	ae60706da0	[DAG] SimplifyDemandedBits - call ComputeKnownBits for constant non-uniform ISD::SRL shift amounts We only attempted to determine KnownBits for uniform constant shift amounts, but ComputeKnownBits is able to handle some non-uniform cases as well that we can use as a fallback.	2023-07-21 14:52:57 +01:00
Simon Pilgrim	be62041e7e	[X86] matchBinaryShuffle - match PACKUS for v2i64 -> v4i32 shuffle truncation patterns. Handle PACKUSWD on +SSE41 targets, or fallback to PACKUSBW on any +SSE2 target	2023-07-21 13:32:04 +01:00
Simon Pilgrim	c0a1f4624b	[X86] Add packus.ll test coverage Similar to the existing packss.ll tests	2023-07-21 13:32:04 +01:00
Simon Pilgrim	7196eb2541	[X86] packss.ll - add SSE4.2 test coverage	2023-07-21 13:32:03 +01:00
Jay Foad	e45a0c2994	[AMDGPU][RFC] Update isLegalAddressingMode for GFX9 SMEM signed offsets Differential Revision: https://reviews.llvm.org/D155587	2023-07-21 10:56:43 +01:00
Jay Foad	787bef0bee	[AMDGPU] Add tests for SMEM addressing modes in CodeGenPrepare Differential Revision: https://reviews.llvm.org/D155854	2023-07-21 10:56:43 +01:00
Luke Lau	33a83c5486	[RISCV] Add SDNode patterns for vrol.[vv,vx] and vror.[vv,vx,vi] These correspond to ROTL/ROTR nodes Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155439	2023-07-21 10:22:46 +01:00
Luke Lau	24628a14c4	[RISCV] Add patterns for vnsr[a,l].wx where shift amount has different type than vector element We're currently only matching scalar shift amounts where the type is the same as the vector element type. But because only the bottom log2(2*SEW) bits are used, only 7 bits will be used at most so we can use any scalar type >= i8. This patch adds patterns for the case above, as well as for when the shift amount type is the same as the widened element type and doesn't need extended. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155698	2023-07-21 10:13:28 +01:00
Luke Lau	418e678ba3	[RISCV] Add tests for vnsr[l,a].wx patterns that could be matched These patterns of ([l,a]shr v, ([s,z]ext splat)) only pick up the cases where the scalar has the same type as the vector element. However since only the low log2(SEW) bits of the scalar are read, we could use any scalar type that has been extended. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155697	2023-07-21 10:13:26 +01:00
Nikita Popov	f060f095aa	[X86] Expand constant expressions in test (NFC)	2023-07-21 10:40:47 +02:00
Nikita Popov	086ee99564	Reapply [IR] Mark and constant expressions as undesirable Reapply after fixing an issue in canonicalizeLogicFirst() exposed by this change (218f97578b26f7a89f7f8ed0748c31ef0181f80a). ----- In preparation for removing support for and expressions, mark them as undesirable. As such, we will no longer implicitly create such expressions, but they still exist.	2023-07-21 10:10:50 +02:00

52796 Commits