llvm-project

Author	SHA1	Message	Date
Matt Arsenault	460ffcddd9	AMDGPU: Make bf16/v2bf16 legal types (#76215 ) Some intrinsics use i16 vectors in place of bfloat vectors. Move towards making bf16 vectors legal so these can migrate. Leave the larger vectors for a later change. Depends on #76213 and #76214	2024-01-04 22:31:18 +07:00
Simon Pilgrim	f45b75949d	[DAG] SimplifyDemandedBits - call demanded elts variant directly for SELECT/SELECT_CC nodes. Don't rebuild the demanded elts mask every time.	2024-01-04 10:53:45 +00:00
Simon Pilgrim	5b38ecff6e	[DAG] BaseIndexOffset::equalBaseIndex - early out on failed matches. NFCI. If we successfully cast only the first base node as GlobalAddressSDNode / ConstantPoolSDNode / FrameIndexSDNode then we can early out as we know that base won't cast as a later type. Noticed while investigating profiles for potential compile time improvements.	2024-01-04 10:50:43 +00:00
Simon Pilgrim	43e0723899	[DAG] BaseIndexOffset::computeAliasing - early out on failed matches. NFCI. Don't wait to test that all base ptr matches have succeeded	2024-01-04 10:50:42 +00:00
Simon Pilgrim	72db578d71	[DAG] Fix typo in VSELECT SimplifyDemandedVectorElts handling. NFC. Rename UndefZero -> UndefSel (undefined elements from Sel operand).	2024-01-04 10:50:42 +00:00
Jie Fu	ff0c1f20a7	[CodeGen] Remove unused variables in TargetLoweringBase.cpp (NFC) llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:12: error: unused variable 'ModeN' [-Werror,-Wunused-variable] 570 \| unsigned ModeN, ModelN; \| ^~~~~ llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:19: error: unused variable 'ModelN' [-Werror,-Wunused-variable] 570 \| unsigned ModeN, ModelN; \| ^~~~~~ 2 errors generated.	2024-01-04 18:45:55 +08:00
Thomas Preud'homme	ce61b0e9a4	Add out-of-line-atomics support to GlobalISel (#74588 ) This patch implements the GlobalISel counterpart to 4d7df43ffdb460dddb2877a886f75f45c3fee188.	2024-01-04 10:15:16 +00:00
Jannik Silvanus	7954c57124	[IR] Fix GEP offset computations for vector GEPs (#75448 ) Vectors are always bit-packed and don't respect the elements' alignment requirements. This is different from arrays. This means offsets of vector GEPs need to be computed differently than offsets of array GEPs. This PR fixes many places that rely on an incorrect pattern that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`. We replace these by usages of `GTI.getSequentialElementStride(DL)`, which is a new helper function added in this PR. This changes behavior for GEPs into vectors with element types for which the (bit) size and alloc size is different. This includes two cases: * Types with a bit size that is not a multiple of a byte, e.g. i1. GEPs into such vectors are questionable to begin with, as some elements are not even addressable. * Overaligned types, e.g. i16 with 32-bit alignment. Existing tests are unaffected, but a miscompilation of a new test is fixed. --------- Co-authored-by: Nikita Popov <github@npopov.com>	2024-01-04 10:08:21 +01:00
David Green	5550e9c841	[GlobalISel][AArch64] Add libcall lowering for fpowi. (#67114 ) This adds legalization, notably libcall lowering for fpowi. It is a little different to other methods as the function takes both a float and integer register. Otherwise all vectors get scalarized and fp16 is promoted to fp32.	2024-01-04 07:26:23 +00:00
Micah Weston	7df28fd61a	[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Implements PGOAnalysisMap emitting in AsmPrinter with tests. (#75202 ) Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF section. Implements filecheck tests to verify emitting new fields. This patch emits optional PGO related analyses into the bb-addr-map ELF section during AsmPrinter. This currently supports Function Entry Count, Machine Block Frequencies, and Machine Branch Probabilities. Each is independently enabled via the `feature` byte of `bb-addr-map` for the given function. A part of [RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis](https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902).	2024-01-03 19:17:44 -05:00
Craig Topper	bdcd7c0ba0	[DAGCombiner][RISCV] Preserve disjoint flag in folding (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) (#76860 ) Since we are shifting both inputs to the original Or by the same amount and inserting zeros in the LSBs, the result should still be disjoint.	2024-01-03 13:14:13 -08:00
Craig Topper	47a1704ac9	[SelectionDAG][X86] Use disjoint flag in SelectionDAG::isADDLike. (#76847 ) Keep the haveNoCommonBitsSet check because we haven't started inferring the flag yet. I've added tests for two transforms, but these are not the only transforms that use isADDLike.	2024-01-03 11:54:29 -08:00
Arthur Eubanks	c4146121e9	Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"" This reverts commit 0e46b49de43349f8cbb2a7d4c6badef6d16e31ae. Causes crashes, see repro on `0e46b49de4`.	2024-01-03 17:09:46 +00:00
David Green	771fd1ad2a	[DAG] Extend input types if needed in combineShiftToAVG. (#76791 ) This attempts to fix #76734, which is a crash on invalid TRUNC node types from unoptimized input code in combineShiftToAVG. The NVT can be VT if the larger type was legal and the adds will not overflow, in which case the inputs should be extended. From what I can tell this appears to be valid (if not optimal for this case): https://alive2.llvm.org/ce/z/fRieHR The result has also been changed to getExtOrTrunc in case VT==NVT, which is not handled by SEXT/ZEXT.	2024-01-03 10:52:01 +00:00
David Green	d659bd1635	[GlobalISel][AArch64] Tail call libcalls. (#74929 ) This tries to allow libcalls to be tail called, using a similar method to DAG where the type is checked to make sure they match, and if so the backend, through lowerCall checks that the tailcall is valid for all arguments.	2024-01-03 07:59:36 +00:00
David Green	5b5614c92f	[AArch64][GlobalISel] Add legalization for vecreduce.fmul (#73309 ) There are no native operations that we can use for floating point mul, so lower by splitting the vector into chunks multiple times. There is still a missing fold for fmul_indexed, that could help the gisel test cases a bit.	2024-01-03 07:49:20 +00:00
Craig Topper	bbd57e1832	[SelectionDAG] Add initial plumbing for the disjoint flag. (#76751 ) This copies the flag from IR to the SDNode in SelectionDAGBuilder, clears the flag in SimplifyDemandedBits, and adds it to canCreateUndefOrPoison. Uses of the flag will come in later patches.	2024-01-02 21:58:00 -08:00
Thorsten Schütt	4b9194952d	[GlobalIsel] Combine selects with constants (#76089 ) A first small step at combining selects.	2024-01-02 17:26:39 +01:00
Alex Bradbury	a181b42565	[llvm][NFC] Use SDValue::getConstantOperandAPInt(i) where possible The helper function allows examples like `cast<ConstantSDNode>(Op.getOperand(0))->getAPIntValue();` to be changed to `Op.getConstantOperandAPInt(0);`. See #76708 for further context. Although there are far fewer opportunities for replacement, I used a similar git grep and sed combo as before, given I already had it to hand: `git grep -l "cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getAPIntValue" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getAPIntValue/\1->getConstantOperandAPInt(\2)/'` and `git grep -l "cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getAPIntValue" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getAPIntValue/\1.getConstantOperandAPInt(\2)/'`	2024-01-02 14:43:55 +00:00
Alex Bradbury	80aeb62211	[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708 ) This helper function shortens examples like `cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to `Node->getConstantOperandVal(1);`. Implemented with: `git grep -l "cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue/\1->getConstantOperandVal(\2)/'` and `git grep -l "cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue/\1.getConstantOperandVal(\2)/'`. With a couple of simple manual fixes needed. Result then processed by `git clang-format`.	2024-01-02 13:14:28 +00:00
Shao-Ce SUN	9f6bf00b25	[DAGCombine] Add DAG optimisation for BF16_TO_FP (#69426 ) fold bf16_to_fp(op & 0xffff) -> bf16_to_fp(op)	2023-12-27 17:20:54 +08:00
Vettel	dc1fadef23	[MCP] Enhance MCP copy Instruction removal for special case(reapply) (#74239 ) Machine Copy Propagation Pass may lose some opportunities to further remove the redundant copy instructions during the ForwardCopyPropagateBlock procedure. When we Clobber a "Def" register, we also need to remove the record from the copy maps that indicates "Src" defined "Def" to ensure the correct semantics of the ClobberRegister function. This patch reapplies #70778 and addresses the corner case bug #73512 specific to the AMDGPU backend. Additionally, it refines the criteria for removing empty records from the copy maps, thereby enhancing overall safety. For more information, please see the C++ test case generated code in "vector.body" after the MCP Pass: https://gcc.godbolt.org/z/nK4oMaWv5.	2023-12-26 16:22:42 +08:00
Kazu Hirata	41cb686d0f	[CodeGen] Use range-based for loops (NFC)	2023-12-24 22:45:50 -08:00
HaohaiWen	536b043219	[RegAllocFast] Lazily initialize InstrPosIndexes for each MBB (#76275 ) Most basic blocks do not need dominance queries. Defer initialization of InstrPosIndexes to the first query for each MBB.	2023-12-25 09:42:31 +08:00
Simon Pilgrim	1e710cfc80	[DAG] Add TLI::isTruncateFree(SDValue, EVT) wrapper. Similar to the existing isZExtFree(SDValue, EVT) wrapper, this will allow targets to override for specific cases (e.g. free truncation of an ext/extload node). But for now its just used to wrap the existing isTruncateFree(EVT, EVT) call.	2023-12-24 13:19:10 +00:00
Felipe de Azevedo Piovezan	acacec3bbf	[LiveDebugValues][nfc] Reduce memory usage of InstrRef (#76051 ) Commit 1b531d54f623 (#74203) removed the usage of unique_ptrs of arrays in favour of using vectors, but inadvertently increased peak memory usage by removing the ability to deallocate vector memory that was no longer needed mid-LDV. In that same review, it was pointed out that `FuncValueTable` typedef could be removed, since it was "just a vector". This commit addresses both issues by making `FuncValueTable` a real data structure, capable of mapping BBs to ValueTables and able to free ValueTables as needed. This reduces peak memory usage in the compiler by 10% in the benchmarks flagged by the original review. As a consequence, we had to remove a handful of instances of the "declare-then-initialize" antipattern in unittests, as the FuncValueTable class is no longer default-constructible.	2023-12-23 13:44:45 -03:00
Matt Arsenault	ed6dc62862	DAG: Handle equal size element build_vector promotion (#76213 )	2023-12-23 20:43:14 +07:00
Nikita Popov	d82eccc752	[RegAllocFast] Avoid duplicate hash lookup (NFC)	2023-12-22 16:52:20 +01:00
HaohaiWen	40ec791b15	[RegAllocFast] Refactor dominates algorithm for large basic block (#72250 ) The original brute-force dominates algorithm has O(n) complexity, so it is very slow for very large machine basic blocks, which are very common at O0. This patch adds InstrPosIndexes to assign an index to each instruction and uses it to determine dominance. The complexity is now O(1).	2023-12-22 23:06:16 +08:00
Matt Arsenault	f7c3627338	DAG: Implement promotion for strict_fpextend (#74310 ) Test is a placeholder, will be merged into the existing test after additional bug fixes for illegal f16 targets are fixed.	2023-12-22 17:15:52 +07:00
Matt Arsenault	0e46b49de4	Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" This reverts commit c398fa009a47eb24f88383d5e911e59e70f8db86. PPC backend was fixed in 2f82662ce901c6666fceb9c6c5e0de216a1c9667	2023-12-22 16:46:22 +07:00
Wang Pengcheng	17858ce6f3	[MacroFusion] Remove createBranchMacroFusionDAGMutation (#76209 ) Instead, we add a `BranchOnly` parameter to indicate that only branches with its predecessors will be fused. X86 is the only user of `createBranchMacroFusionDAGMutation`.	2023-12-22 16:31:38 +08:00
Matt Arsenault	4d1cd38c95	DAG: Handle promotion of fcanonicalize This avoids a regression in a future commit	2023-12-22 12:50:18 +07:00
Felipe de Azevedo Piovezan	058e527434	[AccelTable][NFC] Fix typos and duplicated code (#76155 ) Renaming a member variable from "Endoding" to "Encoding". Also replace inlined code for "isNormalized" with a call to the function, so that if the definition of normalization ever changes, we only need to change the one place.	2023-12-21 16:10:30 -03:00
Craig Topper	0dcff0db3a	[RISCV] Add codegen support for experimental.vp.splice (#74688 ) IR intrinsics were already defined, but no codegen support had been added. I extracted this code from our downstream. Some of it may have come from https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.	2023-12-21 08:38:32 -08:00
yan zhou	cd09f4b951	[CodeGen] Fix a bug that may cause an error for a self-defined target in SelectionDAG::getNode (#75320 ) We first need to check that N1.getNumOperands() > 0. If lowering generates SDNodes like ``` v2i32 t20: TargetOpNode. i32 t21: extract_vector_elt t20 0 i32 t22: extract_vector_elt t20 1 ``` it will cause an error.	2023-12-21 19:39:05 +07:00
Paschalis Mpeis	2e3d77d6ed	[TLI] Pass replace-with-veclib works with Scalable Vectors. (#73642 ) The pass is heavily refactored. It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors. Improve tests for ArmPL and SLEEF Intrinsics: - Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines. - Update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]` - Add simplifycfg pass and noalias to ensure tail folding. `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.	2023-12-21 12:37:57 +00:00
Matt Arsenault	9e574a3936	DAG: Fix expansion of bf16 sourced extloads Also fix assorted vector extload failures for AMDGPU.	2023-12-20 19:24:27 +07:00
Yusra Syeda	0768253c20	[SystemZ][z/OS] Add exception handling for XPLINK (#74638 ) Adds emitting the exception table and the EH registers for XPLINK. --------- Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>	2023-12-19 13:58:33 -05:00
Jonas Paulsson	e32e147d6c	[DAGCombiner] Don't drop alignment info of original load. (#75626 ) Pass the original MMO instead of different individual values. getAlign() was used before where actually getOriginalAlign() would have been better, and this patch has the same effect.	2023-12-19 16:30:47 +01:00
Rin	0894c2ee5f	[DAGCombiner] Avoid the pre-truncate of BUILD_VECTOR sources. (#75792 ) Avoid the pre-truncate of BUILD_VECTOR sources when there is more than one use. This can avoid using unnecessary movs later down the instruction selection pipeline.	2023-12-19 15:25:38 +00:00
Matt Arsenault	5781d79a20	ShadowStackGCLowering: Remove unnecessary std::string	2023-12-19 17:12:52 +07:00
Wang Pengcheng	9348d437f5	[SelectionDAG] Add space-optimized forms of OPC_EmitRegister (#73291 ) The followed byte of `OPC_EmitRegister` is a MVT type, which is usually i32 or i64. We add `OPC_EmitRegisterI32` and `OPC_EmitRegisterI64` so that we can reduce one byte. Overall this reduces the llc binary size with all in-tree targets by about 10K.	2023-12-19 17:31:49 +08:00
Matt Arsenault	e8d98fa16b	ShadowGCLowering: Drop typed pointer handling	2023-12-19 14:03:54 +07:00
paperchalice	72c75501ec	[CodeGen] Port `LowerEmuTLS` to new pass manager (#75171 ) In fact, this pass needs `llc` to test. `TargetMachine` seems redundant, because before adding this pass `CodeGenPassBuilder` already checks it: `ed4194bb8d/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h (L590-L592)`	2023-12-19 14:44:35 +08:00
Felipe de Azevedo Piovezan	da2db4a9e8	[InstrRef][NFC] Delete unused variables (#75501 ) `V` was unused, and all the other deletions follow from that observation.	2023-12-18 11:53:18 -08:00
Simon Pilgrim	7b1e4239b3	[DAG] Fold (vt trunc (extload (vt x))) -> (vt load x) (#75229 ) We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads). reduceLoadWidth can handle this for scalar loads, but not for vectors. Noticed while triaging D152928	2023-12-18 16:21:11 +00:00
Ulrich Weigand	82a1bffd34	[SelectionDAG] Do not crash on large integers in CheckInteger (#75787 ) The CheckInteger routine called from TableGen-generated selection logic uses getSExtValue - which will abort if the underlying APInt does not fit into an int64_t. This case is now triggered by the SystemZ back-end since i128 is a legal type on certain machines. While we do not have any regular instructions that take 128-bit immediates (like most other platforms), there are patterns in the .td files that recognize an i128 "xor ..., -1" as a "not". These patterns cause code to be generated that calls the CheckInteger routine on some i128-valued integer, which may trigger the assert. Fix by using trySExtValue instead. Fixes https://github.com/llvm/llvm-project/issues/75710	2023-12-18 14:03:57 +01:00
Kazu Hirata	2570c7e284	[CodeGen] Remove unused forward declarations (NFC)	2023-12-17 09:09:39 -08:00
Kazu Hirata	4b3078ef2d	[CodeGen] Remove unnecessary includes (NFC)	2023-12-17 09:09:38 -08:00

35122 Commits