llvm-project

Author	SHA1	Message	Date
Sjoerd Meijer	8af859d514	[MachineLoop] New helper isLoopInvariant() This factors out code from MachineLICM that determines whether an instruction is loop-invariant, which is a generally useful function. Thus this allows to use that helper elsewhere too. Differential Revision: https://reviews.llvm.org/D94082	2021-01-08 09:04:56 +00:00
Christudasan Devadasan	ae25a397e9	AMDGPU/GlobalISel: Enable sret demotion	2021-01-08 10:56:35 +05:30
Kazu Hirata	8febb2e0f5	[CodeGen] Remove unused function isCallerPreservedOrConstPhysReg (NFC) The last use of the function was removed on Oct 20, 2018 in commit 8d6ff4c0af843e1a61b76d89812aed91e358de34.	2021-01-07 20:29:32 -08:00
Matt Arsenault	2cbbc6e87c	GlobalISel: Fail legalization on narrowing extload below memory size	2021-01-07 17:40:34 -05:00
Matt Arsenault	1f9b6ef91f	GlobalISel: Add combine for G_UREM by power of 2 Really I want this in the legalizer, but this is a start.	2021-01-07 16:36:35 -05:00
Wouter van Oortmerssen	5c38ae36c5	[WebAssembly] Fixed byval args missing DWARF DW_AT_LOCATION A struct in C passed by value did not get debug information. Such values are currently lowered to a Wasm local even in -O0 (not to an alloca like on other archs), which becomes a Target Index operand (TI_LOCAL). The DWARF writing code was not emitting locations in for TI's specifically if the location is a single range (not a list). In addition, the ExplicitLocals pass which removes the ARGUMENT pseudo instructions did not update the associated DBG_VALUEs, and couldn't even find these values since the code assumed such instructions are adjacent, which is not the case here. Also fixed asm printing of TIs needed by a test. Differential Revision: https://reviews.llvm.org/D94140	2021-01-07 10:31:38 -08:00
Matt Arsenault	c9122ddef5	CodeGen: Refactor regallocator command line and target selection Make the sequence of passes to select and rewrite instructions to physical registers be a target callback. This is to prepare to allow targets to split register allocation into multiple phases.	2021-01-07 13:13:25 -05:00
Simon Pilgrim	350ab7aa1c	[DAG] Simplify OR(X,SHL(Y,BW/2)) eq/ne 0/-1 'all/any-of' style patterns Attempt to simplify all/any-of style patterns that concatenate 2 smaller integers together into an and(x,y)/or(x,y) + icmp 0/-1 instead. This is mainly to help some bool predicate reduction patterns where we end up concatenating bool vectors that have been bitcasted to integers. Differential Revision: https://reviews.llvm.org/D93599	2021-01-07 12:03:19 +00:00
Sanjoy Das	a855c9403f	[NFC] Don't copy MachineFrameInfo on each invocation of HasAlias Also fix a typo in a comment. This fixes a compile time issue in XLA (https://www.tensorflow.org/xla). Differential Revision: https://reviews.llvm.org/D94182	2021-01-06 18:59:20 -08:00
Kazu Hirata	cfeecdf7b6	[llvm] Use llvm::all_of (NFC)	2021-01-06 18:27:36 -08:00
Simon Pilgrim	1307e3f6c4	[TargetLowering] Add icmp ne/eq (srl (ctlz x), log2(bw)) vector support.	2021-01-06 16:13:51 +00:00
Sander de Smalen	aa280c99f7	[AArch64][SVE] Emit DWARF location expr for SVE (dbg.declare) When using dbg.declare, the debug-info is generated from a list of locals rather than through DBG_VALUE instructions in the MIR. This patch is different from D90020 because it emits the DWARF location expressions from that list of locals directly. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D90044	2021-01-06 11:45:05 +00:00
Sander de Smalen	84a1120943	[LiveDebugValues] Handle spill locations with a fixed and scalable component. This patch fixes the two LiveDebugValues implementations (InstrRef/VarLoc)Based to handle cases where the StackOffset contains both a fixed and scalable component. This depends on the `TargetRegisterInfo::prependOffsetExpression` being added in D90020. Feel free to leave comments on that patch if you have them. Reviewed By: djtodoro, jmorse Differential Revision: https://reviews.llvm.org/D90046	2021-01-06 11:30:13 +00:00
Sander de Smalen	a7e3339f3b	[AArch64][SVE] Emit DWARF location expression for SVE stack objects. Extend PEI to emit a DWARF expression for StackOffsets that have a fixed and scalable component. This means the expression that needs to be added is either: <base> + offset or: <base> + offset + scalable_offset * scalereg where for SVE, the scale reg is the Vector Granule Dwarf register, which encodes the number of 64bit 'granules' in an SVE vector and which the debugger can evaluate at runtime. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D90020	2021-01-06 09:40:53 +00:00
Kazu Hirata	cea1c63756	[MachineSink] Construct SmallVector with iterator ranges (NFC)	2021-01-05 21:15:57 -08:00
Christudasan Devadasan	d68458bd56	[GlobalISel] Base implementation for sret demotion. If the return values can't be lowered to registers SelectionDAG performs the sret demotion. This patch contains the basic implementation for the same in the GlobalISel pipeline. Furthermore, targets should bring relevant changes during lowerFormalArguments, lowerReturn and lowerCall to make use of this feature. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92953	2021-01-06 10:30:50 +05:30
David Blaikie	ad18b075fd	DebugInfo: Add support for always using ranges (rather than low/high pc) in DWARFv5 Given the ability provided by DWARFv5 rnglists to reuse addresses in the address pool, it can be advantageous to object file size to use range encodings even when the range could be described by a direct low/high pc. Add a flag to allow enabling this in DWARFv5 for the purpose of experimentation/data gathering. It might be that it makes sense to enable this functionality by default for DWARFv5 + Split DWARF at least, where the tradeoff/desire to optimize for .o file size is more explicit and .o bytes are higher priority than .dwo bytes.	2021-01-05 16:36:22 -08:00
Craig Topper	4ef91f5871	[DAGCombiner] Don't speculatively create an all ones constant in visitREM that might not be used. This looks to have been done to save some duplicated code under two different if statements, but it ends up being harmful to D94073. This speculative constant can be called on a scalable vector type with i64 element size when i64 scalars aren't legal. The code tries and fails to find a vector type with i32 elements that it can use. So only create the node when we know it will be used.	2021-01-05 12:45:57 -08:00
Matt Arsenault	a427f15d60	GlobalISel: Add isKnownToBeAPowerOfTwo helper function	2021-01-05 12:59:08 -05:00
Jinsong Ji	f26bc0ddd5	[RegisterClassInfo] Return non-zero for RC without allocatable reg In some case, the RC may have 0 allocatable reg. eg: VRSAVERC in PowerPC, which has only 1 reg, but it is also reserved. The curreent implementation will keep calling the computePSetLimit because getRegPressureSetLimit assume computePSetLimit will return a non-zero value. The fix simply early return the value from TableGen for such special case. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92907	2021-01-05 16:18:34 +00:00
Fraser Cormack	9a1ac97d3a	[CodeGen] Format SelectionDAG::getConstant methods (NFC)	2021-01-05 12:59:46 +00:00
QingShan Zhang	2962f1149c	[NFC] Add the getSizeInBytes() interface for MachineConstantPoolValue Current implementation assumes that, each MachineConstantPoolValue takes up sizeof(MachineConstantPoolValue::Ty) bytes. For PowerPC, we want to lump all the constants with the same type as one MachineConstantPoolValue to save the cost that calculate the TOC entry for each const. So, we need to extend the MachineConstantPoolValue that break this assumption. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89108	2021-01-05 03:22:45 +00:00
Roman Lebedev	b4f519bddd	[NFCI] DwarfEHPrepare: update DomTree in non-permissive mode, when present Being stricter will catch issues that would be just papered over in permissive mode, and is likely faster.	2021-01-05 01:26:36 +03:00
Cameron McInally	92be640bd7	[FPEnv][AMDGPU] Disable FSUB(-0,X)->FNEG(X) DAGCombine when subnormals are flushed This patch disables the FSUB(-0,X)->FNEG(X) DAG combine when we're flushing subnormals. It requires updating the existing AMDGPU tests to use the fneg IR instruction, in place of the old fsub(-0,X) canonical form, since AMDGPU is the only backend currently checking the DenormalMode flags. Note that this will require follow-up optimizations to make sure the FSUB(-0,X) form is handled appropriately Differential Revision: https://reviews.llvm.org/D93243	2021-01-04 14:44:10 -06:00
Kazu Hirata	eb198f4c3c	[llvm] Use llvm::any_of (NFC)	2021-01-04 11:42:47 -08:00
Florian Hahn	ed936aad78	[InterleavedAccess] Return correct 'modified' status. Both tryReplaceExtracts and replaceBinOpShuffles may modify the IR, even if no interleaved loads are generated, but currently the pass pretends no changes were made. This patch updates the pass to return true if either of the functions made any changes. In case of tryReplaceExtracts, changes are made if there are any Extracts and true is returned. `replaceBinOpShuffles` always makes changes if BinOpShuffles is not empty. It also always returned true, so I went ahead and change it to just `replaceBinOpShuffles`. Fixes PR48208. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D93997	2021-01-04 15:49:47 +00:00
Kazu Hirata	ba82c0b315	[llvm] Call *(Set\|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-01-03 09:57:47 -08:00
Brandon Bergren	8f004471c2	[PowerPC] Add the LLVM triple for powerpcle [1/5] Add a triple for powerpcle--. This is a little-endian encoding of the 32-bit PowerPC ABI, useful in certain niche situations: 1) A loader such as the FreeBSD loader which will be loading a little endian kernel. This is required for PowerPC64LE to load properly in pseries VMs. Such a loader is implemented as a freestanding ELF32 LSB binary. 2) Userspace emulation of a 32-bit LE architecture such as x86 on 64-bit hosts such as PowerPC64LE with tools like box86 requires having a 32-bit LE toolchain and library set, as they operate by translating only the main binary and switching to native code when making library calls. 3) The Void Linux for PowerPC project is experimenting with running an entire powerpcle userland. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D93918	2021-01-02 12:17:22 -06:00
Kazu Hirata	171c5fd43e	[llvm] Use llvm::erase_value and llvm::erase_if (NFC)	2021-01-02 09:24:15 -08:00
Roman Lebedev	f4ea21947d	[NFCI][CodeGen] DwarfEHPrepare: don't actually pass DTU into simplifyCFG by default also, don't verify DomTree unless we intend to maintain it. This is a very dumb think-o, i guess i was even warned about it by subconsciousness in 4b80647367950ba3da6a08260487fd0dbc50a9c5's commit message.. Fixes a compile-time regression reported by Martin Storsjö in post-commit review of 2461cdb41724298591133c811df82b0064adfa6b.	2021-01-02 14:38:52 +03:00
Yang Fan	471dec3801	[CodeGen][NFC] Fix a build warning due to an extra semicolon	2021-01-02 10:42:58 +08:00
Roman Lebedev	2461cdb417	[CodeGen][SimplifyCFG] Teach DwarfEHPrepare to preserve DomTree Once the default for SimplifyCFG flips, we can no longer pass nullptr instead of DomTree to SimplifyCFG, so we need to propagate it here. We don't strictly need to actually preserve DomTree in DwarfEHPrepare, but we might as well do it, since it's trivial.	2021-01-02 01:01:19 +03:00
Roman Lebedev	e6b1a27fb9	[NFC][CodeGen] Split DwarfEHPrepare pass into an actual transform and an legacy-PM wrapper This is consistent with the layout of other passes, and simplifies further refinements regarding DomTree handling. This is indended to be a NFC commit.	2021-01-02 01:01:19 +03:00
Roman Lebedev	c38739ad8f	[NFC] clang-format the entire DwarfEHPrepare.cpp	2021-01-02 01:01:19 +03:00
Sanjay Patel	c74e8539ff	[Analysis] flatten enums for recurrence types This is almost all mechanical search-and-replace and no-functional-change-intended (NFC). Having a single enum makes it easier to match/reason about the reduction cases. The goal is to remove `Opcode` from reduction matching code in the vectorizers because that makes it harder to adapt the code to handle intrinsics. The code in RecurrenceDescriptor::AddReductionVar() is the only place that required closer inspection. It uses a RecurrenceDescriptor and a second InstDesc to sometimes overwrite part of the struct. It seem like we should be able to simplify that logic, but it's not clear exactly which cmp+sel patterns that we are trying to handle/avoid.	2021-01-01 12:20:16 -05:00
Fangrui Song	d1fd72343c	Refactor how -fno-semantic-interposition sets dso_local on default visibility external linkage definitions The idea is that the CC1 default for ELF should set dso_local on default visibility external linkage definitions in the default -mrelocation-model pic mode (-fpic/-fPIC) to match COFF/Mach-O and make output IR similar. The refactoring is made available by 2820a2ca3a0e69c3f301845420e0067ffff2251b. Currently only x86 supports local aliases. We move the decision to the driver. There are three CC1 states: * -fsemantic-interposition: make some linkages interposable and make default visibility external linkage definitions dso_preemptable. * (default): selected if the target supports .Lfoo$local: make default visibility external linkage definitions dso_local * -fhalf-no-semantic-interposition: if neither option is set or the target does not support .Lfoo$local: like -fno-semantic-interposition but local aliases are not used. So references can be interposed if not optimized out. Add -fhalf-no-semantic-interposition to a few tests using the half-based semantic interposition behavior.	2020-12-31 13:59:45 -08:00
Juneyoung Lee	5cdf6ed744	[CodeGen] recognize select form of and/ors when splitting branch conditions Recently a few patches are made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a \|\| b" in C/C++. "a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a. This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353). In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison. The discussion at D93065 has more context about this. This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize select form of and/or as well using m_LogicalAnd/Or. Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93853	2021-01-01 04:46:10 +09:00
Kazu Hirata	7bc76fd0ec	[CodeGen] Construct SmallVector with iterator ranges (NFC)	2020-12-31 09:39:11 -08:00
Fangrui Song	f731839584	[LowerEmuTls] Copy dso_local from <var> to __emutls_v.<var> This effect is not testable until we drop the implied dso_local for ELF static/PIE defined symbols from TargetMachine::shouldAssumeDSOLocal.	2020-12-30 16:11:32 -08:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Luo, Yuanke	981a0bd858	[X86] Add x86_amx type for intel AMX. The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. It can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927	2020-12-30 13:52:13 +08:00
Yuanfang Chen	480936e741	Reland "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline" (again) This reverts commit 16c8f6e91344ec9840d6aa9ec6b8d0c87a104ca3 with fix. -Wswitch catched an unhandled enum value due to recent commits in TargetPassConfig.cpp.	2020-12-29 16:39:55 -08:00
Yuanfang Chen	16c8f6e913	Revert "Reland "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline"" This reverts commit 21314940c4856e0cb81b664fd2d2117d1b7dc3e3. Build failure in some bots.	2020-12-29 16:29:07 -08:00
Yuanfang Chen	21314940c4	Reland "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline" This reverts commit 94427af60c66ffea655a3084825c6c3a9deec1ad (relands 4646de5d75cfce3da4ddeffb6eb8e66e38238800 with fix). Use "return std::move(AsmStreamer);" instead of "return AsmStreamer;" in LVMTargetMachine::createMCStreamer. Unlike Clang, GCC seems having trouble inserting a implicit lvalue->rvalue conversion.	2020-12-29 15:17:23 -08:00
Kazu Hirata	1e3ed09165	[CodeGen] Use llvm::append_range (NFC)	2020-12-28 19:55:16 -08:00
Yuanfang Chen	94427af60c	Revert "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline" This reverts commit 4646de5d75cfce3da4ddeffb6eb8e66e38238800. Some bots have build failure.	2020-12-28 17:44:22 -08:00
Yuanfang Chen	4646de5d75	[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline Following up on D67687. Please refer to the RFC here http://lists.llvm.org/pipermail/llvm-dev/2020-July/143309.html `CodeGenPassBuilder` is the NPM counterpart of `TargetPassConfig` with below differences. - Debugging features (MIR print/verify, disable pass, start/stop-before/after, etc.) living in `TargetPassConfig` are moved to use PassInstrument as much as possible. (Implementation also lives in `TargetPassConfig.cpp`) - `TargetPassConfig` is a polymorphic base (virtual inheritance) to build the target-dependent pipeline whereas `CodeGenPassBuilder` is the CRTP base/helper to implement the target-dependent pipeline. The motivation is flexibility for targets to customize the pipeline, inlining opportunity, and fits the overall NPM value semantics design. - `TargetPassConfig` is a legacy immutable pass to declare hooks for targets to customize some target-independent codegen layer behavior. This is partially ported to TargetMachine::options. The rest, such as `createMachineScheduler/createPostMachineScheduler`, are left out for now. They should be implemented in LLVMTargetMachine in the future. Reviewed By: arsenm, aeubanks Differential Revision: https://reviews.llvm.org/D83608	2020-12-28 17:36:36 -08:00
Gabriel Hjort Åkerlund	b9a7c89d43	[MIRPrinter] Fix incorrect output of unnamed stack names The MIRParser expects unnamed stack entries to have empty names (''). In case of unnamed alloca instructions, the MIRPrinter would output '<unnamed alloca>', which caused the MIRParser to reject the generated code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93685	2020-12-28 18:01:40 +01:00
Chen Zheng	31c2b93d83	[MachineSink] add threshold in machinesink pass to reduce compiling time.	2020-12-27 23:23:07 -05:00
Kazu Hirata	789d250613	[CodeGen, Transforms] Use *Map::lookup (NFC)	2020-12-27 09:57:27 -08:00

... 42 43 44 45 46 ...

32115 Commits