llvm-project

Author	SHA1	Message	Date
Felipe de Azevedo Piovezan	acacec3bbf	[LiveDebugValues][nfc] Reduce memory usage of InstrRef (#76051 ) Commit 1b531d54f623 (#74203) removed the usage of unique_ptrs of arrays in favour of using vectors, but inadvertently increased peak memory usage by removing the ability to deallocate vector memory that was no longer needed mid-LDV. In that same review, it was pointed out that `FuncValueTable` typedef could be removed, since it was "just a vector". This commit addresses both issues by making `FuncValueTable` a real data structure, capable of mapping BBs to ValueTables and able to free ValueTables as needed. This reduces peak memory usage in the compiler by 10% in the benchmarks flagged by the original review. As a consequence, we had to remove a handful of instances of the "declare-then-initialize" antipattern in unittests, as the FuncValueTable class is no longer default-constructible.	2023-12-23 13:44:45 -03:00
Matt Arsenault	ed6dc62862	DAG: Handle equal size element build_vector promotion (#76213 )	2023-12-23 20:43:14 +07:00
Nikita Popov	d82eccc752	[RegAllocFast] Avoid duplicate hash lookup (NFC)	2023-12-22 16:52:20 +01:00
HaohaiWen	40ec791b15	[RegAllocFast] Refactor dominates algorithm for large basic block (#72250 ) The original brute-force dominates algorithm has O(n) complexity, so it is very slow for very large machine basic blocks, which are very common at O0. This patch adds InstrPosIndexes to assign an index to each instruction and uses it to determine dominance. The complexity is now O(1).	2023-12-22 23:06:16 +08:00
Matt Arsenault	f7c3627338	DAG: Implement promotion for strict_fpextend (#74310 ) Test is a placeholder, will be merged into the existing test after additional bugs for illegal f16 targets are fixed.	2023-12-22 17:15:52 +07:00
Matt Arsenault	0e46b49de4	Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" This reverts commit c398fa009a47eb24f88383d5e911e59e70f8db86. PPC backend was fixed in 2f82662ce901c6666fceb9c6c5e0de216a1c9667	2023-12-22 16:46:22 +07:00
Wang Pengcheng	17858ce6f3	[MacroFusion] Remove createBranchMacroFusionDAGMutation (#76209 ) Instead, we add a `BranchOnly` parameter to indicate that only branches and their predecessors will be fused. X86 is the only user of `createBranchMacroFusionDAGMutation`.	2023-12-22 16:31:38 +08:00
Matt Arsenault	4d1cd38c95	DAG: Handle promotion of fcanonicalize This avoids a regression in a future commit	2023-12-22 12:50:18 +07:00
Felipe de Azevedo Piovezan	058e527434	[AccelTable][NFC] Fix typos and duplicated code (#76155 ) Renaming a member variable from "Endoding" to "Encoding". Also replace inlined code for "isNormalized" with a call to the function, so that if the definition of normalization ever changes, we only need to change the one place.	2023-12-21 16:10:30 -03:00
Craig Topper	0dcff0db3a	[RISCV] Add codegen support for experimental.vp.splice (#74688 ) IR intrinsics were already defined, but no codegen support had been added. I extracted this code from our downstream. Some of it may have come from https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.	2023-12-21 08:38:32 -08:00
yan zhou	cd09f4b951	[CodeGen] This patch fix a bug that may caused error for a self-defined target in SelectionDAG::getNode (#75320 ) We first need to check that N1.getNumOperands() > 0. If lowering generates SDNodes like ``` v2i32 t20: TargetOpNode. i32 t21: extract_vector_elt t20 0 i32 t22: extract_vector_elt t20 1 ``` this will cause an error.	2023-12-21 19:39:05 +07:00
Paschalis Mpeis	2e3d77d6ed	[TLI] Pass replace-with-veclib works with Scalable Vectors. (#73642 ) [TLI] Pass replace-with-veclib works with Scalable Vectors. The pass is heavily refactored. It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors. Improve tests for ArmPL and SLEEF Intrinsics: - Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines. - Update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]` - Add simplifycfg pass and noalias to ensure tail folding. `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.	2023-12-21 12:37:57 +00:00
Matt Arsenault	9e574a3936	DAG: Fix expansion of bf16 sourced extloads Also fix assorted vector extload failures for AMDGPU.	2023-12-20 19:24:27 +07:00
Yusra Syeda	0768253c20	[SystemZ][z/OS] Add exception handling for XPLINK (#74638 ) Adds emitting the exception table and the EH registers for XPLINK. --------- Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>	2023-12-19 13:58:33 -05:00
Jonas Paulsson	e32e147d6c	[DAGCombiner] Don't drop alignment info of original load. (#75626 ) Pass the original MMO instead of different individual values. getAlign() was used before where actually getOriginalAlign() would have been better, and this patch has the same effect.	2023-12-19 16:30:47 +01:00
Rin	0894c2ee5f	[DAGCombiner] Avoid the pre-truncate of BUILD_VECTOR sources. (#75792 ) Avoid the pre-truncate of BUILD_VECTOR sources when there is more than one use. This can avoid using unnecessary movs later down the instruction selection pipeline.	2023-12-19 15:25:38 +00:00
Matt Arsenault	5781d79a20	ShadowStackGCLowering: Remove unnecessary std::string	2023-12-19 17:12:52 +07:00
Wang Pengcheng	9348d437f5	[SelectionDAG] Add space-optimized forms of OPC_EmitRegister (#73291 ) The byte following `OPC_EmitRegister` is an MVT type, which is usually i32 or i64. We add `OPC_EmitRegisterI32` and `OPC_EmitRegisterI64` so that we can save one byte. Overall this reduces the llc binary size with all in-tree targets by about 10K.	2023-12-19 17:31:49 +08:00
Matt Arsenault	e8d98fa16b	ShadowGCLowering: Drop typed pointer handling	2023-12-19 14:03:54 +07:00
paperchalice	72c75501ec	[CodeGen] Port `LowerEmuTLS` to new pass manager (#75171 ) In fact, this pass needs `llc` to test. `TargetMachine` seems redundant, because before adding this pass `CodeGenPassBuilder` already checks it: `ed4194bb8d/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h (L590-L592)`	2023-12-19 14:44:35 +08:00
Felipe de Azevedo Piovezan	da2db4a9e8	[InstrRef][NFC] Delete unused variables (#75501 ) `V` was unused, and all the other deletions follow from that observation.	2023-12-18 11:53:18 -08:00
Simon Pilgrim	7b1e4239b3	[DAG] Fold (vt trunc (extload (vt x))) -> (vt load x) (#75229 ) We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads). reduceLoadWidth can handle this for scalar loads, but not for vectors. Noticed while triaging D152928	2023-12-18 16:21:11 +00:00
Ulrich Weigand	82a1bffd34	[SelectionDAG] Do not crash on large integers in CheckInteger (#75787 ) The CheckInteger routine called from TableGen-generated selection logic uses getSExtValue - which will abort if the underlying APInt does not fit into an int64_t. This case is now triggered by the SystemZ back-end since i128 is a legal type on certain machines. While we do not have any regular instructions that take 128-bit immediates (like most other platforms), there are patterns in the .td files that recognize an i128 "xor ..., -1" as a "not". These patterns cause code to be generated that calls the CheckInteger routine on some i128-valued integer, which may trigger the assert. Fix by using trySExtValue instead. Fixes https://github.com/llvm/llvm-project/issues/75710	2023-12-18 14:03:57 +01:00
Kazu Hirata	2570c7e284	[CodeGen] Remove unused forward declarations (NFC)	2023-12-17 09:09:39 -08:00
Kazu Hirata	4b3078ef2d	[CodeGen] Remove unnecessary includes (NFC)	2023-12-17 09:09:38 -08:00
Stefan Pintilie	c398fa009a	Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"" This reverts commit f4b5be1ecdc85ca4257b739afb8d57e23c7a8030. The above change was breaking the clang-ppc64le-linux-test-suite bot.	2023-12-16 07:30:53 -06:00
Paul Kirth	9a578a9f60	Revert "[StackColoring] Delete dead stack slots (#75351 )" (#75655 ) This reverts commit 08b306dc8e7c0b2498f4f194a3c51686d56dbd20. it causes the following assertion failure: llvm/include/llvm/CodeGen/MachineFrameInfo.h:530: int64_t llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion `!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead object?"' failed.	2023-12-15 13:32:39 -08:00
Philip Reames	e8a15eca92	[RISCV] Prefer whole register loads and stores when VL=VLMAX (#75531 ) If we're lowering a fixed length vector load or store which happens to be exactly VLEN in size (when VLEN is exactly known), we can use a whole register load or store instead of the unit strided variants. This doesn't require a vsetvli in some cases, allows additional flexibility of vsetvli cases in others, and doesn't have a runtime dependency on the value of VL.	2023-12-15 09:26:57 -08:00
Youngsuk Kim	67aec2f58b	[llvm] Remove no-op ptr-to-ptr casts (NFC) Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr bitcasts. Opaque ptr cleanup effort (NFC).	2023-12-15 11:04:48 -06:00
Simon Pilgrim	163aeca33d	[CodeGenPrepare] Remove unused TypePromotionTransaction::moveBefore to fix gcc Wunused-function warning. NFC.	2023-12-15 14:45:17 +00:00
mohammed-nurulhoque	08b306dc8e	[StackColoring] Delete dead stack slots (#75351 ) Deletes slots that have lifetime markers and whose lifetime ranges are empty.	2023-12-15 09:58:19 +00:00
paperchalice	f94c410ef4	[CodeGen] Use `MachinePassKey` for machine passes (#75567 ) Machine passes define `AnalysisKey`, which is counterintuitive. Add `MachinePassKey` and `MachinePassInfoMixin` to avoid this.	2023-12-15 17:49:33 +08:00
paperchalice	d63f54f91f	[CodeGen][NewPM] Add necessary codegen options (#70904 ) These options are used by `TargetPassConfig` to build CodeGen pass pipeline, add them to `CGPassBuilderOption` so `CodeGenPassBuilder` can use them. Currently not all options are added, but it is enough to build a prototype of `CodeGenPassBuilder`. Part of #69879.	2023-12-15 17:03:28 +08:00
paperchalice	fde91d1b37	Reland "[Pass][CodeGen] Add some necessary passes for codegen" (#71783 ) Unfortunately, there are two `KCFI` passes in source tree, `CodeGen/KCFI.cpp` and `Transforms/Instrumentation/KCFI.cpp`, use `MachineKCFIPass` for machine function pass. `MIRProfileLoaderPass` is resolved to the legacy one when `LLVM_ENABLE_MODULES=ON`, use `MIRProfileLoaderNewPass` as a workaround.	2023-12-15 12:30:53 +08:00
Craig Topper	2a21260ea8	[SelectionDAG] Use getVectorElementPointer in DAGCombiner::replaceStoreOfInsertLoad. (#74249 ) This ensures we clip the index to be in bounds of the vector we are inserting into. If the index is out of bounds, the result of the insert element is poison. If we don't clip the index, we can write memory that was not part of the original store. Fixes #74248 #75557.	2023-12-14 20:25:16 -08:00
Matt Arsenault	f4b5be1ecd	Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" This reverts commit 69c4930aad9659ec6ab846c8e7124d6afe044b1e. See if this sticks after a few more coalescer assertions are fixed.	2023-12-15 10:51:47 +07:00
Arthur Eubanks	239a41e8f2	Re-Reland [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386 ) For non-GlobalValue references, the small and medium code models can use 32 bit constants. For GlobalValue references, use TargetMachine::isLargeGlobalObject(). Look through aliases for determining if a GlobalValue is small or large. Even the large code model can reference small objects with 32 bit constants as long as we're in no-pic mode, or if the reference is offset from the GOT. Original commit broke the build... First reland broke large PIC builds referencing small data since it was using GOTOFF as a 32-bit constant.	2023-12-14 14:12:37 -08:00
Saleem Abdulrasool	23ccb02c59	CodeGen: add a missing check for bit-slice overlap in CV (#75504 ) Type dereferenced fragments are specified by offset and length in bits. The representation in CodeView is defined in terms of byte offsets. If the bit slice overlaps at a byte that is included, we would create invalid definition ranges. Consider the following scenario: ~~~ 01234567 01234567 ---------+--------- ==== ====== ~~~ Here bits 1-4 are marked as defined as well as bits 7-9. The byte range for the second portion overlaps and so we would say that bytes 1 and 2 are valid though there is potentially a hole. There is no way to represent this in the defined range for the local variable in CodeView. In such a scenario we can simply drop the fragment definition and treat the variable as "optimized out". Thanks to @rnk and @hjyamauchi for the discussion around this.	2023-12-14 13:59:15 -08:00
Jon Roelofs	b071b70317	[GlobalISel] Always direct-call IFuncs and Aliases (#74902 ) This is safe because for both cases, the use must be in the same TU as the definition, and they cannot be forward declared.	2023-12-14 14:58:20 -07:00
Jon Roelofs	640c1d3dd1	[llvm] Support IFuncs on Darwin platforms (#73686 ) ... by lowering them as lazy resolve-on-first-use symbol resolvers. Note that this is subtly different timing than on ELF platforms, where ifunc resolution happens at load time. Since ld64 and ld-prime don't support all the cases we need for these, we lower them manually in the AsmPrinter.	2023-12-14 14:40:52 -07:00
Arthur Eubanks	15617d14f7	Revert "Reland [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386 )" This reverts commit ec92d74a0ef89b9dd46aee6ec8aca6bfd3c66a54. Breaks some compiler-rt tests, e.g. https://lab.llvm.org/buildbot/#/builders/37/builds/28834	2023-12-14 12:28:50 -08:00
Zequan Wu	ab3430f891	[Profile] Add binary profile correlation for code coverage. (#69493 )
## Motivation
Since we don't need the metadata sections at runtime, we can somehow offload them from memory at runtime. Initially, I explored [debug info correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113), which is used for PGO with value profiling disabled. However, it currently only works with DWARF, and it would be hard to add such artificial debug info for every function into CodeView, which is used on Windows. So, offloading profile metadata sections at runtime seems to be a platform-independent option.
## Design
The idea is to use new section names for profile name and data sections and mark them as metadata sections. Under this mode, the new sections are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime and can be stripped away as a post-linking step. After the process exits, the generated raw profiles will contain only headers + counters. llvm-profdata can be used to correlate raw profiles with the unstripped binary to generate an indexed profile.
## Data
For chromium base_unittests with code coverage on linux, the binary size overhead due to instrumentation is reduced from 64M to 38.8M (39.4%) and the raw profile file size is reduced from 128M to 68M (46.9%).
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL

$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filtering raw profiles by binary id yet, so when a raw profile doesn't belong to the binary being digested by llvm-profdata, merging will fail. Once this is implemented, llvm-profdata should be able to merge only raw profiles with the same binary id as the binary and discard the rest (with mismatched/missing binary id). The workflow I have in mind is to have scripts invoke llvm-profdata to get all binary ids for all raw profiles, and selectively pass the raw profiles with matching binary id and the binary to llvm-profdata for merging.
2. Note: In COFF, the metadata sections are currently still loaded into memory but not used. I didn't do it in this patch because I noticed that `.lcovmap` and `.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should work with PGO when value profiling is disabled, as debug info correlation currently does, though I haven't tested this yet.	2023-12-14 14:16:38 -05:00
Arthur Eubanks	ec92d74a0e	Reland [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386 ) For non-GlobalValue references, the small and medium code models can use 32 bit constants. For GlobalValue references, use TargetMachine::isLargeGlobalObject(). Look through aliases for determining if a GlobalValue is small or large. Even the large code model can reference small objects with 32 bit constants as long as we're in no-pic mode, or if the reference is offset from the GOT. Original commit broke the build...	2023-12-14 09:49:35 -08:00
Arthur Eubanks	f0c03da63c	Revert "[X86] Respect code models more when determining if a global reference can fit in 32 bits" (#75500 ) Reverts llvm/llvm-project#75386 Breaks build.	2023-12-14 09:32:55 -08:00
Arthur Eubanks	5e38ba26d2	[X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386 ) For non-GlobalValue references, the small and medium code models can use 32 bit constants. For GlobalValue references, use TargetMachine::isLargeGlobalObject(). Look through aliases for determining if a GlobalValue is small or large. Even the large code model can reference small objects with 32 bit constants as long as we're in no-pic mode, or if the reference is offset from the GOT.	2023-12-14 09:28:27 -08:00
Felipe de Azevedo Piovezan	1b531d54f6	[InstrRef][nfc] Remove usage of unique_ptrs of arrays (#74203 ) These are usually difficult to reason about, and they were being used to pass raw pointers around with array semantics (i.e., we were using operator [] on raw pointers). To put it in InstrRef terminology: we were passing a pointer to a ValueTable but using it as if it were a FuncValueTable. These could have easily been SmallVectors, which now allow us to have reference semantics in some places, as well as simpler initialization. In the future, we can use even more pass-by-reference with some extra changes in the code.	2023-12-14 13:22:32 -03:00
Simon Pilgrim	39093102ca	[DAG] visitTRUNCATE - format (truncate (load x)) fold code. Reduces diff in #75229	2023-12-14 15:13:38 +00:00
Thorsten Schütt	26616c62d1	[GlobalIsel][NFC] Harden MachineIRBuilder (#75465 ) Protective measures against https://github.com/llvm/llvm-project/pull/74502	2023-12-14 14:04:57 +01:00
DianQK	7649d22306	[AArch64] ORRWrs is copy instruction when there's no implicit def of the X register (#75184 ) Follows https://github.com/llvm/llvm-project/pull/74682#issuecomment-1850268782. Fixes #74680.	2023-12-14 19:19:55 +08:00
Simon Pilgrim	a0c7a29655	[GlobalISel] IRTranslator::translateGetElementPtr - don't assume a gep constant offset is representable as i64 Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=65052	2023-12-14 11:02:38 +00:00
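
The RegAllocFast entry above (40ec791b15) describes replacing a linear scan with per-instruction indexes to answer "does A come before B in this block?". The following is a minimal sketch of that general idea only, not the code from the commit: `Block`, `PositionIndex`, and `comesBeforeLinear` are hypothetical names, instructions are modelled as unique strings, and the sketch ignores how indexes are maintained when instructions are inserted or removed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a machine basic block: an ordered list of
// uniquely named instructions. This is not LLVM's MachineBasicBlock.
using Block = std::vector<std::string>;

// Hypothetical helper (not LLVM's InstrPosIndexes): number every instruction
// once, so each later ordering query is a pair of hash lookups.
class PositionIndex {
public:
  explicit PositionIndex(const Block &Blk) {
    std::uint64_t Next = 0;
    for (const std::string &Instr : Blk)
      Index.emplace(Instr, Next++);
  }

  // True if A appears strictly before B. Both must be in the indexed block;
  // std::unordered_map::at throws std::out_of_range otherwise.
  bool comesBefore(const std::string &A, const std::string &B) const {
    return Index.at(A) < Index.at(B);
  }

private:
  std::unordered_map<std::string, std::uint64_t> Index;
};

// Brute-force baseline: walk the block until A or B is found, O(n) per query.
bool comesBeforeLinear(const Block &Blk, const std::string &A,
                       const std::string &B) {
  for (const std::string &Instr : Blk) {
    if (Instr == B)
      return false; // Reached B first (or A == B).
    if (Instr == A)
      return true; // Reached A first, so A precedes B.
  }
  return false; // Neither instruction is in the block.
}
```

Building the index costs one pass over the block; after that each query takes expected constant time, which is the trade-off the commit message points to for very large blocks.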

35097 Commits