llvm-project

Author	SHA1	Message	Date
Matt Arsenault	d6f9428e46	GlobalISel: Pass MachineIRBuilder to applyMappingImpl The target should not have to construct MachineIRBuilders during RegBankSelect (we should perhaps hide the constructors for it). The pass should own the builder setup with the desired CSE configuration (although currently the pass does not use the CSE builder, which is what I want to fix). https://reviews.llvm.org/D156479	2023-07-31 10:03:38 -04:00
Vedant Paranjape	259d56d41d	[LoopAccessAnalysis] Add a const qualifier to getMaxSafeDepDistBytes() Add a const qualifier to this API call, since this is a member of MemoryDepChecker and LoopAccessInfo returns an object of this class as a const, as follows: const MemoryDepChecker &getDepChecker() const { return *DepChecker; } If one tries to use the function as follows: LAI->getDepChecker().getMaxSafeDepDistBytes() this results in the following error: passing ‘const llvm::MemoryDepChecker’ as ‘this’ argument discards qualifiers Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D156304	2023-07-31 09:45:01 +00:00
Nikita Popov	41895843b5	[InstCombine] Only perform one iteration InstCombine is a worklist-driven algorithm, which works roughly as follows: * All instructions are initially pushed to the worklist. The initial order is reverse post-order (RPO). * All newly inserted instructions get added to the worklist. * When an instruction is folded, its users get added back to the worklist. * When the use-count of an instruction decreases, it gets added back to the worklist. * And a few other heuristics on when we should revisit instructions. On top of the worklist algorithm, InstCombine layers an additional fix-point iteration: If any fold was performed in the previous iteration, then InstCombine will re-populate the worklist from scratch and fold the entire function again. This continues until a fix-point is reached. In the vast majority of cases, InstCombine will reach a fix-point within a single iteration. However, a second iteration is performed to verify that this is indeed the fixpoint. We can see this in the statistics for llvm-test-suite: "instcombine.NumOneIteration": 411380, "instcombine.NumTwoIterations": 117921, "instcombine.NumThreeIterations": 236, "instcombine.NumFourOrMoreIterations": 2, The way to read these numbers is that in 411380 cases, InstCombine performs no folds. In 117921 cases it performs a fold and reaches the fix-point within one iteration (the second iteration verifies the fixpoint). In the remaining 238 cases, more than one iteration is needed to reach the fixpoint. In other words, only in 0.04% of cases are additional iterations needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine performs a completely useless extra iteration to verify the fix point. This patch removes the fixpoint iteration from InstCombine and always performs only a single iteration. This results in a major compile-time improvement of around 4% at negligible codegen impact. This explicitly does accept that we will not reach a fixpoint in all cases. However, this is mitigated by two factors: First, the data suggests that this happens very rarely in practice. Second, InstCombine runs many times during the optimization pipeline (8 times even without LTO), so there are many chances to recover such cases. In order to prevent accidental optimization regressions in the future, this implements a verify-fixpoint option, which is enabled by default when instcombine is specified in -passes and disabled when InstCombinePass() is constructed from C++. This means that test cases need to explicitly use the no-verify-fixpoint option if they fail to reach a fixed point (for a well-understood reason that we cannot, or do not want to, avoid). Differential Revision: https://reviews.llvm.org/D154579	2023-07-31 10:56:49 +02:00
Francesco Petrogalli	c4b21d57bc	[llc] Add the command line option `-sched-model-force-enable-intervals`. The option is used to force the use of resource intervals in the machine scheduler, effectively ignoring the value of `EnableIntervals` in the instance of the `SchedMachineModel`. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D156540	2023-07-31 10:10:18 +02:00
Alexandros Lamprineas	893d3a61c0	Reland [FuncSpec] Add Phi nodes to the InstCostVisitor. This patch allows constant folding of PHIs when estimating the user bonus. Phi nodes are a special case since some of their inputs may remain unresolved until all the specialization arguments have been processed by the InstCostVisitor. Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes once the user bonus has been computed for all the specialization arguments. Differential Revision: https://reviews.llvm.org/D154852	2023-07-31 08:25:48 +01:00
Mel Chen	5962942902	[LV][NFC] Refine comments related to reduction idioms.	2023-07-31 00:06:45 -07:00
Sameer Sahasrabuddhe	d9847cde48	[GlobalISel] convergent intrinsics Introduced the convergent equivalent of the existing G_INTRINSIC opcodes: - G_INTRINSIC_CONVERGENT - G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS Out of the targets that currently have some support for GlobalISel, the patch assumes that the convergent intrinsics are only relevant to SPIRV and AMDGPU. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154766	2023-07-31 12:15:39 +05:30
Carl Ritson	2e6530c8e0	[LiveIntervals] Fix comment to match code for getNextValue (NFC) Comment mentions MIIdx, but the actual parameter is def. Fix comment, but also rename parameter to Def to match current coding standards while touching the code.	2023-07-31 14:06:25 +09:00
Lang Hames	7bd481d9af	[ORC] Add ExecutionSession::removeJITDylibs (plural), use it in endSession. The ExecutionSession::removeJITDylibs operation will remove all JITDylibs in the given list (i.e. first clear them, then remove them from the session). ExecutionSession::endSession is updated to remove JITDylibs rather than just clearing them. This prevents new code from being added to any JITDylib once endSession has been called.	2023-07-30 08:51:42 -07:00
Jay Foad	e2e3f06813	Revert "[MachineScheduler] Track physical register dependencies per-regunit" This reverts commit 1a54671d5405a39de362e9692ce963c0638023bc. It was causing lit test failures in an LLVM_ENABLE_EXPENSIVE_CHECKS build.	2023-07-29 18:05:25 +01:00
Jay Foad	1a54671d54	[MachineScheduler] Track physical register dependencies per-regunit Change the scheduler's physical register dependency tracking from registers-and-their-aliases to regunits. This has a couple of advantages when subregisters are used: - The dependency tracking is more accurate and creates fewer useless edges in the dependency graph. An AMDGPU example, edited for clarity: SU(0): $vgpr1 = V_MOV_B32 $sgpr0 SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1 SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0 There is a data dependency on $vgpr1 from SU(0) to SU(1) and from SU(1) to SU(2). But the old dependency tracking code also added a useless edge from SU(0) to SU(2) because it thought that SU(0)'s def of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1. - On targets like AMDGPU that make heavy use of subregisters, each register can have a huge number of aliases - it can be quadratic in the size of the largest defined register tuple. There is a much lower bound on the number of regunits per register, so iterating over regunits is faster than iterating over aliases. The LLVM compile-time tracker shows a tiny overall improvement of 0.03% on X86. I expect a larger compile-time improvement on targets like AMDGPU. Differential Revision: https://reviews.llvm.org/D156552	2023-07-29 15:34:53 +01:00
Nikita Popov	ed23609bc2	[PatternMatch] Do not match constant expressions for binops Currently, m_Mul() style matchers also match constant expressions. This is a regular source of assertion failures (usually by trying to do a match and then cast to Instruction or BinaryOperator) and infinite combine loops. At the same time, I don't think this provides useful optimization capabilities (all of the tests affected here are regression tests for crashes / infinite loops). Long term, all of these constant expressions (apart from possibly add/sub) are slated for removal per https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179 -- but doing those removals can itself expose new crashes and infinite loops due to the current PatternMatch behavior. Differential Revision: https://reviews.llvm.org/D156401	2023-07-29 11:21:22 +02:00
Matt Arsenault	3240ae7034	AMDGPU/GlobalISel: Set dead on scc on manually selected instructions In SelectionDAG, InstrEmitter automatically puts dead flags on unused physreg defs everywhere. The generated selectors should also set dead on physreg defs that were not used in the pattern.	2023-07-28 14:14:06 -04:00
Aaron Ballman	1a53b5c367	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292. Addressing issues found by: https://lab.llvm.org/buildbot/#/builders/245/builds/11732 https://lab.llvm.org/buildbot/#/builders/187/builds/12251 https://lab.llvm.org/buildbot/#/builders/186/builds/11099 https://lab.llvm.org/buildbot/#/builders/182/builds/6976	2023-07-28 09:41:38 -04:00
Job Noorman	76f023bddf	[RISCV] Make mapping symbols SF_FormatSpecific This ensures that llvm-symbolizer ignores them for symbolization. Note: unlike aarch64-mapping-symbol.s, the test included here does not test if the mapping symbols are actually in the symbol table. The reason is that llvm-mc support for RISC-V mapping symbols (D153260) has not landed yet, so the mapping symbols simply aren't there. However, D153260 would like to depend on this patch together with D156190 to avoid having to update a large amount of tests. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D156236	2023-07-28 10:21:55 +02:00
Freddy Ye	c9d92e6638	[X86] Support -march=arrowlake,arrowlake-s,lunarlake Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D156239	2023-07-28 15:05:54 +08:00
Fangrui Song	9ea44c6894	[llvm-objdump] -d: don't display mapping symbols as labels Similar to D96617 for llvm-symbolizer. This patch matches the GNU objdump -d behavior to suppress printing labels for mapping symbols. Mapping symbol names don't convey much information. When --show-all-symbols (not in GNU) is specified, we still print mapping symbols. Note: the `for (size_t SI = 0, SE = Symbols.size(); SI != SE;)` loop needs to iterate over all mapping symbols, even if they are not displayed. We use the new field `IsMappingSymbol` to recognize mapping symbols. This field also enables simplification after D139131. ELF/ARM/disassemble-all-mapping-symbols.s is enhanced to add `.space 2`. If `End = std::min(End, Symbols[SI].Addr);` is not correctly set, we would print a `.word`. Reviewed By: jhenderson, jobnoorman, peter.smith Differential Revision: https://reviews.llvm.org/D156190	2023-07-27 20:51:42 -07:00
Fangrui Song	c06a314150	[CSKY] Make mapping symbols SF_FormatSpecific and omit them from llvm-nm output unless --special-syms is specified, similar to ARM and AArch64. This is a prerequisite of D156190 as llvm-objdump will only perform mapping symbol recognition for SF_FormatSpecific symbols.	2023-07-27 20:37:43 -07:00
Fangrui Song	e09a1b51ab	[MCDisassembler] Reorder XCOFF specific constructor parameters. NFC to prevent overload resolution confusion. In particular, if we add another parameter to the generic constructor, the XCOFF-specific constructor calls in MCDisassemblerTest.cpp would be resolved to the generic constructor, which is unintended.	2023-07-27 19:59:26 -07:00
Jun Sha (Joshua)	2b6df4a336	[RISCV] Add codegen support for bf16 vector This patch adds codegen support for vectors with bfloat16 element type in the LLVM backend. With this patch, Zvbfmin/Zvbfwma instructions as well as vle16/vse16 can be generated from newly added bf16 IR intrinsics. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156287	2023-07-28 09:54:23 +08:00
William Huang	66ba71d913	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as a key, so that existing code still works. It will check for MD5 collision (unlikely but not too unlikely, since we only take the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all (see the exception in (5)). (3) Any SampleProfileMap::emplace() followed by SampleContext assignment (if newly inserted) should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensured an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key is now an MD5 hash, only the value keeps the context; in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (once to index into FuncOffsetTable, once into SampleProfileMap, more if there are additional sections); in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive). Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740	2023-07-27 23:08:27 +00:00
Derek Schuff	1b21067cf2	[WebAssembly][Objcopy] Write output section headers identically to inputs Previously when objcopy generated section headers, it padded the LEB that encodes the section size out to 5 bytes, matching the behavior of clang. This is correct, but results in a binary that differs from the input. This can sometimes have undesirable consequences (e.g. breaking source maps). This change makes the object reader remember the size of the LEB encoding in the section header, so that llvm-objcopy can reproduce it exactly. For sections not read from an object file (e.g. that llvm-objcopy is adding itself), pad to 5 bytes. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D155535	2023-07-27 15:43:51 -07:00
Douglas Yung	32683b231e	Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor." This reverts commit 96ff464dd3aac255adc52787a1e28487a9cd4c35. The test in this change was failing on many buildbots: https://lab.llvm.org/buildbot/#/builders/164/builds/41292 https://lab.llvm.org/buildbot/#/builders/258/builds/4491 https://lab.llvm.org/buildbot/#/builders/192/builds/3566 https://lab.llvm.org/buildbot/#/builders/123/builds/20411 https://lab.llvm.org/buildbot/#/builders/58/builds/42553 https://lab.llvm.org/buildbot/#/builders/247/builds/7037 https://lab.llvm.org/buildbot/#/builders/139/builds/46259 https://lab.llvm.org/buildbot/#/builders/216/builds/24650 https://lab.llvm.org/buildbot/#/builders/234/builds/12571 https://lab.llvm.org/buildbot/#/builders/232/builds/12574 https://lab.llvm.org/buildbot/#/builders/235/builds/975	2023-07-27 13:47:52 -07:00
Alexandros Lamprineas	96ff464dd3	[FuncSpec] Add Phi nodes to the InstCostVisitor. This patch allows constant folding of PHIs when estimating the user bonus. Phi nodes are a special case since some of their inputs may remain unresolved until all the specialization arguments have been processed by the InstCostVisitor. Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes once the user bonus has been computed for all the specialization arguments. In addition to the last revision this one fixes the bug reported on Phabricator. Differential Revision: https://reviews.llvm.org/D154852	2023-07-27 19:24:11 +01:00
spupyrev	bc59faa863	A new code layout algorithm for function reordering [2/3] We are bringing a new algorithm for function layout (reordering) based on the call graph (extracted from profile data). The algorithm is an improvement on top of a known heuristic, C^3. It tries to co-locate hot functions and functions that are frequently executed together in the resulting ordering. Unlike C^3, it explores a larger search space and has an objective closely tied to the performance of instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort. The algorithm can be used at the linking or post-linking (e.g., BOLT) stage. The algorithm shares some similarities with C^3 and an approach for basic block reordering (ext-tsp). It works with chains (ordered lists) of functions. Initially all chains are isolated functions. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the objective, which is a weighted combination of frequency-based and distance-based locality. That is, we try to co-locate hot functions together (so they can share the cache lines) and functions frequently executed together. The merging process stops when there is only one chain left, or when merging does not improve the objective. In the latter case, the remaining chains are sorted by density in decreasing order. Complexity: We regularly apply the algorithm to large data-center binaries containing 10K+ (hot) functions, and the algorithm takes only a few seconds. For some extreme cases with 100K-1M nodes, the runtime is within minutes. Perf-impact: We tested the implementation extensively on a benchmark of isolated binaries and prod services. The impact is measurable for "larger" binaries that are front-end bound: the CPU time improvement (on top of C^3) is in the range of [0% .. 1%], which is a result of a reduced i-TLB miss rate (by up to 20%) and i-cache miss rate (up to 5%). Reviewed By: rahmanl Differential Revision: https://reviews.llvm.org/D152834	2023-07-27 09:20:53 -07:00
Cyndy Ishida	27459a3a2b	[TextAPI] Update missing enum cases & utility functions * Expand understood `FileType`s that InterfaceFile class can represent. * Add `hasTarget` function. * Cleanup symbol `<` comparator to account for SymbolSet operations.	2023-07-27 08:24:42 -07:00
Nikita Popov	70aca7b122	[InstCombine] Explicitly track dead edges This allows us to handle dead blocks with multiple incoming edges, where we can determine that all of those edges are dead (or cycles). This allows InstCombine to handle certain dead code patterns that can be produced by LoopVectorize in a single iteration. This is in preparation for D154579.	2023-07-27 16:41:03 +02:00
Jay Foad	2dcf051259	[CodeGen] Store call frame size in MachineBasicBlock Record the call frame size on entry to each basic block. This is usually zero except when a basic block has been split in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156113	2023-07-27 10:32:00 +01:00
Sameer Sahasrabuddhe	7c760b224b	Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556 This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.	2023-07-27 14:49:17 +05:30
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, but a few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Sameer Sahasrabuddhe	d0f7850b01	Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. The changes did not cover all occurrences of the deleted function MachineInstr::getIntrinsicID().	2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe	baa3386edb	[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556	2023-07-27 10:00:45 +05:30
Sameer Sahasrabuddhe	b14e30f10d	[LLVM] refactor GenericSSAContext and its specializations Fix the GenericSSAContext template so that it actually declares all the necessary typenames and the methods that must be implemented by its specializations SSAContext and MachineSSAContext. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156288	2023-07-27 09:54:50 +05:30
Jeff Niu	e76ac8074f	[llvm][orc] Consider other ELF init sections as well ELF object files can contain `.ctors` and `.dtors` sections that also participate as initializers. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D154802	2023-07-26 13:44:41 -07:00
Jessica Del	93dc66a289	[AMDGPU] - Mark inverse.ballot as not convergent `inverse.ballot` checks if a cc bit is set for the current lane. Therefore, it is not convergent. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D156088	2023-07-26 19:52:52 +02:00
Shilei Tian	10068cd654	[OpenMP] Introduce kernel environment This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not set on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depends on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-26 13:35:14 -04:00
Craig Topper	e5df0481f0	[FunctionSpecialization] Use SmallVector::operator== to simplify some code. NFC Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D156260	2023-07-26 09:46:37 -07:00
Ivan Kosarev	e9df4c9892	[ADT] Support iterating size-based integer ranges. It seems the ranges start with 0 in most cases. Reviewed By: dblaikie, gchatelet Differential Revision: https://reviews.llvm.org/D156135	2023-07-26 16:28:41 +01:00
Alexandros Lamprineas	c52ab9ea2f	Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor." Reverting due to the crash reported in D154852. Also reverting the subsequent commit as collateral damage: "[FuncSpec] Split the specialization bonus into CodeSize and Latency."	2023-07-26 12:33:41 +01:00
Alexandros Lamprineas	20c8f58c11	[FuncSpec] Split the specialization bonus into CodeSize and Latency. Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency when estimating the specialization bonus. This is suboptimal, and in some cases erroneous. For example we shouldn't be weighting the codesize decrease attributed to constant propagation by the block frequency of the dead code. Instead only the latency savings should be weighted by block frequency. The total codesize savings from all the specialization arguments should be deducted from the specialization cost. Differential Revision: https://reviews.llvm.org/D155103	2023-07-26 12:03:46 +01:00
Johannes Doerfert	b3fec1067a	[Attributor] Improve NonNull deduction We can improve our deduction if we stop at PHI and select instructions and also iterate the returned values explicitly. The latter helps with isImpliedByIR deductions.	2023-07-25 20:31:21 -07:00
Kai Luo	11a02de782	[JITLink][PowerPC] Change method to check if a symbol is external to current object After PrePrunePass `claimOrExternalizeWeakAndCommonSymbols`, a defined symbol might become external. So determining whether a function call is external when building the linkgraph is not accurate. This largely affects updating the TOC pointer on PowerPC. The TOC pointer is supposed to be the same within one object file (assuming there are not multiple TOCs) and is updated when control flow is transferred to another object file. This patch defers checking whether a function call is external to `buildTables_ELF_ppc64`, which is a PostPrunePass. This patch fixes failures when `jitlink -orc-runtime=/path/to/liborc_rt.a` is used. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D155925	2023-07-26 03:20:56 +00:00
Weining Lu	c56514f21b	Reland "[LoongArch] Support -march=native and -mtune=" As described in [1][2], `-mtune=` is used to select the type of target microarchitecture and defaults to the value of `-march`. The set of possible values should be a superset of `-march` values. Currently possible values of `-march=` and `-mtune=` are `native`, `loongarch64` and `la464`. D136146 added support for `-march={loongarch64,la464}` and this patch adds support for `-march=native` and `-mtune=`. A new ProcessorModel called `loongarch64` is defined in LoongArch.td to support `-mtune=loongarch64`. `llvm::sys::getHostCPUName()` returns `generic` on unknown or future LoongArch CPUs, e.g. the not yet added `la664`, leading to `llvm::LoongArch::isValidArchName()` failing to parse the arch name. In this case, use `loongarch64` as the default arch name for 64-bit CPUs. And these two preprocessor macros are defined: - __loongarch_arch - __loongarch_tune [1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc [2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc Reviewed By: xen0n, wangleiat Differential Revision: https://reviews.llvm.org/D155824	2023-07-26 10:26:38 +08:00
Fangrui Song	4553dc46a0	[Support] Rewrite GlobPattern The current implementation has two primary issues: * Matching `aaab` against `aaaaaa` has exponential complexity. BitVector harms data cache and is inefficient for literal matching. There is also a minor issue: `\` at the end may cause an out-of-bounds access in `StringRef::operator[]`. Switch to an O(|S||P|) greedy algorithm instead: factor the pattern into segments split by '*'. Each segment is matched sequentially by finding the first occurrence past the end of the previous match. This algorithm is used by lots of fnmatch implementations, including musl and NetBSD's. In addition, `optional<StringRef> Exact, Suffix, Prefix` wastes space. Instead, match the non-metacharacter prefix against the haystack, then match the pattern with the rest. In practice `*suffix` style patterns are less common and our new algorithm is fast enough, so don't bother storing the non-metacharacter suffix. Note: brace expansions (D153587) can leverage the `matchOne` function. Differential Revision: https://reviews.llvm.org/D156046	2023-07-25 18:46:55 -07:00
Johannes Doerfert	4223c9b354	[Attributor] Always deduce nosync from readonly + non-convergent This adds the deduction also if the function is not IPO amendable.	2023-07-25 17:47:33 -07:00
Fangrui Song	6a684dbc44	[Support] Remove llvm::is_trivially_{copy/move}_constructible This restores D132311, which was reverted in 29c841ce93e087fa4e0c5f3abae94edd460bc24a (Sep 2022) due to certain files not being buildable with GCC 7.3.0. The previous attempt was reverted by 6cd9608fb37ca2418fb44b57ec955bb5efe10689 (Dec 2020). This time, GCC 7.3.0 has had build errors for a long time due to structured bindings in many files, e.g. ``` llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9098:13: error: cannot decompose class type ‘std::pair<llvm::Value, const llvm::SCEV>’: both it and its base class ‘std::pair<llvm::Value, const llvm::SCEV>’ have non-static data members for (auto [_, Stride] : Legal->getLAI()->getSymbolicStrides()) { ^~~~~~~~~~~ ``` ... and also some `error: duplicate initialization of` instances due to llvm/Transforms/IPO/Attributor.h. --- GCC 7.5.0 has a bug that, without this change, certain `SmallVector` with a `std::pair` element type like `SmallVector<std::pair<Instruction * const, Info>, 0> X;` lead to spurious ``` /tmp/opt/gcc-7.5.0/include/c++/7.5.0/type_traits:878:48: error: constructor required before non-static data member for ‘...’ has been parsed ``` Switching to std::is_trivially_{copy/move}_constructible fixes the error.	2023-07-25 17:21:16 -07:00
Shubham Sandeep Rastogi	b66b176dc3	Emit DW_RLE_base_addressx + DW_RLE_offset_pairs instead of DW_RLE_start_length in debug_rnglists section This patch tries to reduce the size of the debug_rnglists section by replacing the DW_RLE_start_length opcodes currently emitted by dsymutil with DW_RLE_base_addressx + DW_RLE_offset_pair. The DW_RLE_start_length is one AddressSize followed by a ULEB per entry, whereas the DW_RLE_base_addressx + DW_RLE_offset_pair will use one ULEB for the base address, and then the DW_RLE_offset_pair is a pair of ULEBs. This will be more efficient. Differential Revision: https://reviews.llvm.org/D156166	2023-07-25 16:17:53 -07:00
Fangrui Song	1b162fabe8	[Support] Change SetVector's default template parameter to SmallVector<*, 0> Similar to D156016 for MapVector. This brings back commit fae7b98c221b5b28797f7b56b656b6b819d99f27 with a fix to llvm/unittests/Support/ThreadPool.cpp's `_WIN32` code path.	2023-07-25 13:13:35 -07:00
Johannes Doerfert	08a220764b	Reapply "[OpenMP] Add the `ompx_attribute` clause for target directives" This reverts commit 0d12683046ca75fb08e285f4622f2af5c82609dc and reapplies ef9ec4bbcca2fa4f64df47bc426f1d1c59ea47e2 with an extension to fix the Flang build. Differential Revision: https://reviews.llvm.org/D156184	2023-07-25 10:40:35 -07:00
Thomas Köppe	6f1395a1fe	[llvm-objcopy] --set-section-flags: allow "large" to add SHF_X86_64_LARGE Currently, objcopy cannot set the new flag SHF_X86_64_LARGE. This change introduces the named flag "large" which translates to that section flag. An "invalid argument" error is produced if a user attempts to set the flag on an architecture other than X86_64. Reviewed By: jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D153262	2023-07-25 09:47:02 -07:00
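
The percentages quoted in the InstCombine commit message (41895843b5) can be re-derived from the four llvm-test-suite counters it lists, treating each counter value as a number of InstCombine runs:

```latex
\begin{aligned}
N &= 411380 + 117921 + 236 + 2 = 529539,\\
\frac{236 + 2}{N} &= \frac{238}{529539} \approx 0.045\%,\\
\frac{117921}{N} &\approx 22.3\%.
\end{aligned}
```

The message's 0.04% is this 0.045% figure truncated; the 22.3% figure matches to one decimal place.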
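
The greedy matching strategy described in the GlobPattern rewrite (4553dc46a0) can be sketched as follows. This is a simplified, hypothetical illustration rather than LLVM's `GlobPattern` code: `globMatch` and `matchesAt` are names invented here, and only `*` and `?` are supported. The pattern is split on `*`; the first segment must match a prefix, each middle segment is matched at its first occurrence past the previous match, and the last segment must match a suffix.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Does `Seg` match `S` starting exactly at offset `Pos`? '?' matches any one char.
static bool matchesAt(std::string_view S, std::size_t Pos, std::string_view Seg) {
  assert(Pos <= S.size() && "offset out of range");
  if (Seg.size() > S.size() - Pos)
    return false;
  for (std::size_t I = 0; I < Seg.size(); ++I)
    if (Seg[I] != '?' && Seg[I] != S[Pos + I])
      return false;
  return true;
}

// Hypothetical greedy glob matcher for patterns using only '*' and '?'.
// O(|S| * |P|) time: earlier segments are never revisited (no backtracking).
bool globMatch(std::string_view Pat, std::string_view S) {
  std::size_t Star = Pat.find('*');
  if (Star == std::string_view::npos) // No '*': the whole string must match.
    return Pat.size() == S.size() && matchesAt(S, 0, Pat);

  // The segment before the first '*' must match a prefix of S.
  if (!matchesAt(S, 0, Pat.substr(0, Star)))
    return false;
  std::size_t Pos = Star; // End of the text consumed so far.
  Pat.remove_prefix(Star + 1);

  // Each middle segment is matched at its first occurrence at or after Pos.
  for (std::size_t Next; (Next = Pat.find('*')) != std::string_view::npos;) {
    std::string_view Seg = Pat.substr(0, Next);
    Pat.remove_prefix(Next + 1);
    std::size_t P = Pos;
    while (!matchesAt(S, P, Seg)) {
      if (P == S.size())
        return false; // Segment occurs nowhere after Pos.
      ++P;
    }
    Pos = P + Seg.size();
  }

  // The segment after the last '*' must match a suffix starting at or after Pos.
  return Pat.size() <= S.size() - Pos &&
         matchesAt(S, S.size() - Pat.size(), Pat);
}
```

For example, `globMatch("a*a*a*b", "aaaaaa")` returns false after one left-to-right pass instead of backtracking through every way of assigning characters to the stars. LLVM's actual `GlobPattern` also handles bracket expressions and `\` escapes, which this sketch leaves out.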
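
The size argument in the debug_rnglists commit (b66b176dc3) can be made concrete with a byte-count sketch. It assumes DWARF 5 range list encodings (every entry starts with a one-byte DW_RLE_* kind), 8-byte addresses, and a single base address covering all ranges. The helper names are hypothetical, and neither the base address's own slot in .debug_addr nor the DW_RLE_end_of_list terminator is counted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of bytes needed to encode Value as ULEB128 (7 payload bits per byte).
unsigned ulebSize(std::uint64_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7;
    ++Size;
  } while (Value != 0);
  return Size;
}

struct AddrRange {
  std::uint64_t Start, End; // Half-open range [Start, End).
};

// DW_RLE_start_length per range: 1-byte kind + AddrSize-byte address + ULEB length.
std::uint64_t startLengthBytes(const std::vector<AddrRange> &Ranges,
                               unsigned AddrSize) {
  std::uint64_t Total = 0;
  for (const AddrRange &R : Ranges)
    Total += 1 + AddrSize + ulebSize(R.End - R.Start);
  return Total;
}

// One DW_RLE_base_addressx (1-byte kind + ULEB index into .debug_addr), then
// one DW_RLE_offset_pair per range (1-byte kind + two ULEB offsets from Base).
std::uint64_t baseAddrxOffsetPairBytes(const std::vector<AddrRange> &Ranges,
                                       std::uint64_t Base,
                                       std::uint64_t AddrIndex) {
  std::uint64_t Total = 1 + ulebSize(AddrIndex);
  for (const AddrRange &R : Ranges) {
    assert(Base <= R.Start && R.Start <= R.End && "ranges must lie above Base");
    Total += 1 + ulebSize(R.Start - Base) + ulebSize(R.End - Base);
  }
  return Total;
}
```

With two 64-byte ranges at 0x1000 and 0x1080 and an 8-byte address size, `startLengthBytes` gives 20 bytes while `baseAddrxOffsetPairBytes` (base 0x1000, index 0) gives 10, because each offset from the base fits in one or two ULEB bytes instead of a full 8-byte address.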


52324 Commits