llvm-project

Author	SHA1	Message	Date
Dawid Jurczak	9ba5bb4309	[NFC][LoopIdiom] Make for loops more readable Patch simplifies for loops in LIR following LLVM guidelines: https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible. Differential Revision: https://reviews.llvm.org/D112077	2021-10-21 12:17:44 +02:00
Frederic Cambus	9635b2951d	[docs] Fix broken link rendering in the LLVM Coding Standards.	2021-10-21 11:12:33 +02:00
Evgeniy Brevnov	1a8ec24efb	[NARY-REASSOCIATE][NFC] Simplify min/max handling In order to explore different variants of reassociation current implementation uses "swap in a loop" approach. Unfortunately, the implementation is more complicated than it could be. This is an attempt to streamline the code. New approach is to extract core functionality into a helper function and call it explicitly as many times as required. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112128	2021-10-21 15:45:53 +07:00
David Sherwood	9448cdc900	[SVE][Analysis] Tune the cost model according to the tune-cpu attribute This patch introduces a new function: AArch64Subtarget::getVScaleForTuning that returns a value for vscale that can be used for tuning the cost model when using scalable vectors. The VScaleForTuning option in AArch64Subtarget is initialised according to the following rules: 1. If the user has specified the CPU to tune for we use that, else 2. If the target CPU was specified we use that, else 3. The tuning is set to "generic". For CPUs of type "generic" I have assumed that vscale=2. New tests added here: Analysis/CostModel/AArch64/sve-gather.ll Analysis/CostModel/AArch64/sve-scatter.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D110259	2021-10-21 09:33:50 +01:00
eopXD	76db6d8080	[NFC][LoopIdiom] Add more test case to runtime-determined memset size This patch supplements missing test case for D107353. - Fix wrong descriptions in 64-bit mode test case - Added testcase under 32-bit mode Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D108507	2021-10-21 00:05:18 -07:00
Yi Kong	1123e03a9d	[opt-viewer] Use safe yaml load_all Differential Revision: https://reviews.llvm.org/D112075	2021-10-21 14:00:03 +08:00
Vitaly Buka	66b650f3da	[NFC][msan] Add NormalArgAfterNoUndef testcase	2021-10-20 21:08:12 -07:00
Vitaly Buka	60a8db6dc5	[NFC][msan] Rerun update_test_checks.py for a test	2021-10-20 21:08:12 -07:00
Vitaly Buka	6742c8a2d8	[NFC][msan] Break the loop when done We have nothing to do after the Argument is found.	2021-10-20 21:08:12 -07:00
Shengchen Kan	edff0070a1	[Codegen] Set ARITH_FENCE as meta-instruction ARITH_FENCE, which was added by https://reviews.llvm.org/D99675, should be a meta-instruction b/c it only emits comments "ARITH_FENCE". Reviewed By: pengfei, LuoYuanke Differential Revision: https://reviews.llvm.org/D112127	2021-10-21 10:19:22 +08:00
Craig Topper	b75f3dd88e	[ARM] Use correct name of floating point ceil intrinsic in test. The intrinsic is called llvm.ceil not llvm.fceil. The checks weren't strong enough to notice that a call to llvm.fceil was emitted in the final assembly.	2021-10-20 17:30:26 -07:00
Arthur Eubanks	6ea7437ca5	[SelectionDAG] Bail out of mergeTruncStores when not optimizing With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores. Fixes PR51827. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D111596	2021-10-20 16:58:22 -07:00
Nikita Popov	8e4ae603d6	[Tests] Add tests for non-speculatable ephemeral values The loads in these examples are currently not considered ephemeral because they are not speculatable.	2021-10-20 23:33:36 +02:00
Sanjay Patel	40163f1df8	[x86] add special-case lowering for usubsat for AVX512 This is a small extension of D112095 to avoid another regression seen with D112085. In this case, we allow the same conversion from usubsat to ALU ops if the target supports vpternlog. That pattern will get converted later in X86DAGToDAGISel::tryVPTERNLOG(). This seems better than putting a magic immediate constant directly in this code to create the exact vpternlog that we need. It's possible that there are other special-cases along these lines, so we should try to keep all of the vpternlog magic in one place. Differential Revision: https://reviews.llvm.org/D112138	2021-10-20 16:41:13 -04:00
Stanislav Mekhanoshin	b92412fb28	[InstCombine] Fold `(a & ~b) & ~c` to `a & ~(b | c)` %not1 = xor i32 %b, -1 %not2 = xor i32 %c, -1 %and1 = and i32 %a, %not1 %and2 = and i32 %and1, %not2 => %i1 = or i32 %b, %c %i2 = xor i32 %i1, -1 %and2 = and i32 %i2, %a Differential Revision: https://reviews.llvm.org/D112108	2021-10-20 13:05:46 -07:00
Stanislav Mekhanoshin	3c59cdee5c	Precommit updated InstCombine/and-xor-or.ll test. NFC.	2021-10-20 12:50:23 -07:00
Florian Hahn	8977bd5806	[IndVars] Invalidate SCEV when IR is changed in rewriteLoopExitValue. At the moment, rewriteLoopExitValue forgets the current phi node in the loop that collects phis to rewrite. A few lines after the value is forgotten, SCEV is used again to analyze incoming values and potentially expand SCEV expression. This means that another SCEV is created for PN, before the IR is actually updated in the next loop. This leads to accessing invalid cached expression in combination with D71539. PN should only be changed once the actual incoming exit value is set in the next loop. Moving invalidation there should ensure that PN is invalidated in all relevant cases. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D111495	2021-10-20 20:48:33 +01:00
Jon Roelofs	b046eb19b8	[AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0 https://godbolt.org/z/h8ejrG4hb rdar://83597585 Differential Revision: https://reviews.llvm.org/D111856	2021-10-20 12:11:52 -07:00
Stanislav Mekhanoshin	c80d8a8cea	[AMDGPU] MachineLICM cannot hoist VALU MachineLoop::isLoopInvariant() returns false for all VALU because of the exec use. Check TII::isIgnorableUse() to allow hoisting. That unfortunately results in higher register consumption since MachineLICM does not adequately estimate pressure. Therefore I think it shall only be enabled after D107677 even though it does not depend on it. Differential Revision: https://reviews.llvm.org/D107859	2021-10-20 11:47:24 -07:00
Stanislav Mekhanoshin	6185835656	[AMDGPU] Allow rematerialization of SOP with virtual registers D106408 was doing this for all targets although it was reverted due to couple performance regressions on some targets. The difference for AMDGPU is the ability to rematerialize SOP instructions with virtual register uses like we already do for VOP. Differential Revision: https://reviews.llvm.org/D110743	2021-10-20 11:46:50 -07:00
Leonard Grey	5d57578a4e	[MC] Recursively calculate symbol offset This is speculative since I'm not sure if there's some implicit contract that a variable symbol must not have another variable symbol in its evaluation tree. Downstream bug: https://bugs.chromium.org/p/chromium/issues/detail?id=471146#c23. Test is based on alias.s (removed checks since we just need to know it didn't crash). Differential Revision: https://reviews.llvm.org/D109109	2021-10-20 14:29:43 -04:00
Sanjay Patel	80ab06c599	[InstCombine] fold fake vector insert to bit-logic bitcast (inselt (bitcast X), Y, 0) --> or (and X, MaskC), (zext Y) https://alive2.llvm.org/ce/z/Ux-662 Similar to D111082 / db231ebdb07f : We want to avoid relatively opaque vector ops on types that are likely supported by the backend as scalar integers. The bitwise logic ops are more likely to allow further combining. We probably want to generalize this to allow a shift too, but that would oppose instcombine's general rule of not creating extra instructions, so that's left as a potential follow-up. Alternatively, we could do that transform in VectorCombine with the help of the TTI cost model. This is part of solving: https://llvm.org/PR52057	2021-10-20 14:21:40 -04:00
Stanislav Mekhanoshin	503d061dc7	Precommit InstCombine/and-xor-or.ll test. NFC.	2021-10-20 11:13:12 -07:00
Arthur Eubanks	00500d5bad	[NFC] De-template LazyCallGraph::visitReferences() and move into .cpp file This makes changing it and recompiling it much faster.	2021-10-20 10:50:00 -07:00
Itay Bookstein	08ed216000	[IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol As discussed in: * https://reviews.llvm.org/D94166 * https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html The GlobalIndirectSymbol class lost most of its meaning in https://reviews.llvm.org/D109792, which disambiguated getBaseObject (now getAliaseeObject) between GlobalIFunc and everything else. In addition, as long as GlobalIFunc is not a GlobalObject and getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee is a GlobalIFunc cannot currently be modeled properly. Creating aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition, calling getAliaseeObject on a GlobalIFunc will currently return nullptr, which is undesirable because it should return the object itself for non-aliases. This patch refactors the GlobalIFunc class to inherit directly from GlobalObject, and removes GlobalIndirectSymbol (while inlining the relevant parts into GlobalAlias and GlobalIFunc). This allows for calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc itself, making getAliaseeObject() more consistent and enabling alias-to-ifunc to be properly modeled in the IR. I exercised some judgement in the API clients of GlobalIndirectSymbol: some were 'monomorphized' for GlobalAlias and GlobalIFunc, and some remained shared (with the type adapted to become GlobalValue). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D108872	2021-10-20 10:29:47 -07:00
Zhi An Ng	e1fb13401e	[WebAssembly] Add prototype relaxed float min max instructions Add relaxed. f32x4.min, f32x4.max, f64x2.min, f64x2.max. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112146	2021-10-20 09:41:51 -07:00
Sanjay Patel	ea9a0556b4	[InstCombine] add tests for casted insertelement; NFC	2021-10-20 12:17:58 -04:00
Fraser Cormack	eabf11f9ea	[CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz This patch fixes a crash when despeculating ctlz/cttz intrinsics with scalable-vector types. It is not safe to speculatively get the size of the vector type in bits in case the vector type is not a fixed-length type. As it happens this isn't required as vector types are skipped anyway. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112141	2021-10-20 16:45:55 +01:00
Bjorn Pettersson	3d152bc49d	[NewPM][test] Strictly use -passes in some more lit tests Removed/replaced RUN lines using legacy PM syntax in favor of using -passes in lit tests for Float2Int, MetaRenamer, StripDeadPrototypes and StripSymbols.	2021-10-20 17:06:47 +02:00
Bjorn Pettersson	a3ca7dd0ab	[NewPM][test] Use -passes syntax in Mem2Reg lit tests The legacy PM is deprecated, so use the new PM syntax in lit tests verifying the mem2reg pass.	2021-10-20 17:06:47 +02:00
Craig Topper	fe1f0de003	[RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors. Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP isn't legal or custom for a vector type we would scalarize the CTLZ/CTTZ. This is different than CTPOP itself which would use a vector expansion. This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP expansion instead of scalarizing. To do this I had to add additional checks to make sure the operations used by CTPOP expansions are all supported. Some of the operations were already needed for the CTLZ/CTTZ expansion. This is a huge improvement to the RISCV which doesn't have a scalar ctlz or cttz in the base ISA. For WebAssembly, I've added Custom lowering to keep the scalarizing behavior. I've also extended the scalarizing to CTPOP. Differential Revision: https://reviews.llvm.org/D111919	2021-10-20 07:46:41 -07:00
Sanjay Patel	3efd2a0bec	[x86] make helper for useVPTERNLOG; NFC See D112085 for another use case.	2021-10-20 10:26:53 -04:00
Jeremy Morse	89950ade21	[DebugInfo][InstrRef] Track a single variable at a time Here's another performance patch for InstrRefBasedLDV: rather than processing all variable values in a scope at a time, instead, process one variable at a time. The benefits are twofold: * It's easier to reason about one variable at a time in your mind, * It improves performance, apparently from increased locality. The downside is that the value-propagation code gets indented one level further, plus there's some churn in the unit tests. Differential Revision: https://reviews.llvm.org/D111799	2021-10-20 15:03:52 +01:00
Bjorn Pettersson	e9320b1a95	[NewPM][test] Only use -passes syntax in Scalarizer lit tests With legacy PM being deprecated it should be enough to verify the scalarizer pass using the new-PM syntax when invoking opt.	2021-10-20 15:16:18 +02:00
Bjorn Pettersson	5e4dbd7a2f	[NewPM][test] Use -passes syntax in VectorCombine lit tests The legacy PM is deprecated, so use the new PM syntax in lit tests running the vector-combine pass.	2021-10-20 15:16:17 +02:00
Bjorn Pettersson	15f1fb5a30	[NewPM][test] Use -passes syntax in BoundsChecking lit tests The legacy PM is deprecated, so use the new PM syntax in lit tests running the bounds-checking pass.	2021-10-20 15:16:17 +02:00
Bjorn Pettersson	57bd67abfc	[NewPM][test] Use -passes syntax in SpeculativeExecution lit tests The legacy PM is deprecated, so use the new PM syntax in lit tests running the speculative-execution pass.	2021-10-20 15:16:17 +02:00
Bjorn Pettersson	a413663d8f	[NewPM][test] Avoid using -enable-new-pm=1 since -passes implies new PM	2021-10-20 15:16:17 +02:00
Sander de Smalen	be6c8dc765	[SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors. When inserting a scalable subvector into a scalable vector through the stack, the index to store to needs to be scaled by vscale. Before this patch, that didn't yet happen, so it would generate the wrong offset, thus storing a subvector to the incorrect address and overwriting the wrong lanes. For some insert: nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2) The offset was not scaled by vscale: orr x8, x8, #0x4 st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8] ld1h { z0.h }, p0/z, [sp] And is changed to: mov x8, sp st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8, #1, mul vl] ld1h { z0.h }, p0/z, [sp] Differential Revision: https://reviews.llvm.org/D111633	2021-10-20 13:55:24 +01:00
Simon Pilgrim	a3c05982ac	[SLP][X86] Improve SLP tests for division/multiplication by +/- pow2 Add PR51436 test as well as some basic multiply tests, and include SSE2 division coverage	2021-10-20 13:30:27 +01:00
Simon Pilgrim	5b395bd633	[CostModel][X86] Add costs for multiply-by-pow2 constants These are folded to left shifts in the backend. We should be able to extend this for multiply-by-negpow2 after D111968 has landed to resolve PR51436	2021-10-20 13:11:21 +01:00
Simon Pilgrim	9fc523d114	[X86] Remove X86ProcFamilyEnum::IntelSLM Replace X86ProcFamilyEnum::IntelSLM enum with a TuningUseSLMArithCosts flag instead, matching what we already do for Goldmont. This just leaves X86ProcFamilyEnum::IntelAtom to replace with general Tuning/Feature flags and we can finally get rid of the old X86ProcFamilyEnum enum. Differential Revision: https://reviews.llvm.org/D112079	2021-10-20 11:58:39 +01:00
Daniel Kiss	f903c85055	[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions. The autiasp and autibsp instructions are the counterparts of paciasp/pacibsp, therefore let's emit .cfi_negate_ra_state for these too. In the case of the Armv8.3 instruction set, retaa/retab do the return and authentication in one step; here we can't emit .cfi_negate_ra_state because it would be placed after the ret* instruction. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D111780	2021-10-20 11:03:52 +02:00
Joerg Sonnenberger	ec428f7b78	[SPARC] Recognize the prefetch instruction Reviewed By: LemonBoy Differential Revision: https://reviews.llvm.org/D96311	2021-10-20 10:59:01 +02:00
David Green	862e8d7e55	[AArch64] Improve div and rem costmodel tests. NFC Copied from the X86 tests, these give better test coverage than the existing tests.	2021-10-20 09:58:35 +01:00
Paulo Matos	6d0c7bc17d	[WebAssembly] Implementation of table.get/set for reftypes in LLVM IR This change implements new DAG nodes TABLE_GET/TABLE_SET, and lowering methods for load and stores of reference types from IR arrays. These global LLVM IR arrays represent tables at the Wasm level. Differential Revision: https://reviews.llvm.org/D111154	2021-10-20 10:31:31 +02:00
Zi Xuan Wu	de10a02fc0	[CSKY] Complete to add basic integer instruction set Complete the basic integer instruction set and add related predictor in CSKY.td. And it includes the instruction definition and asm parser support. Differential Revision: https://reviews.llvm.org/D111701	2021-10-20 15:50:44 +08:00
Evgeniy Brevnov	269f563a2b	[NARY-REASSOCIATE] Fix infinite recursion optimizing min\max To guarantee convergence of the algorithm each optimization step should decrease number of instructions when IR is modified. This property is not held in this test case. The problem is that SCEV Expander may do "unexpected" reassociation what results in creation of new min/max chains and introduction of extra instructions. As a result on each step we indefinitely optimize back and forth. The solution is to restrict SCEV Expander to perform uncontrolled reassociations by means of "Unknown" expressions. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112060	2021-10-20 14:23:03 +07:00
Wenlei He	e8c245dcd3	[llvm-profgen] Skip duplication factor outside of body sample computation We incorrectly use duplication factor for total samples even though we already accumulate samples instead of taking MAX. It causes profile to have bloated total samples for functions with loop unrolled or vectorized. The change fix the issue for total sample, head sample and call target samples. Differential Revision: https://reviews.llvm.org/D112042	2021-10-19 23:10:45 -07:00
Shao-Ce SUN	9378ca52ca	[NFC] Fix typos	2021-10-20 11:47:26 +08:00
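
Two of the folds listed above (b92412fb28 and b046eb19b8) are plain bitwise identities, so they can be sanity-checked by brute force. The sketch below is not LLVM code: it assumes a 4-bit width to keep the search small, and the helper names (`not_w`, `fold_and_not_holds`, `fold_and_or_holds`) are hypothetical.

```python
from itertools import product

WIDTH = 4                    # assumed small bit width, for an exhaustive check
MASK = (1 << WIDTH) - 1
VALUES = range(MASK + 1)


def not_w(x):
    """Bitwise NOT truncated to WIDTH bits (models `xor x, -1`)."""
    return ~x & MASK


def fold_and_not_holds(a, b, c):
    """b92412fb28: (a & ~b) & ~c  ==  a & ~(b | c)."""
    return ((a & not_w(b)) & not_w(c)) == (a & not_w(b | c))


def fold_and_or_holds(x, c1, c2):
    """b046eb19b8: (x | c1) & c2  ==  x & c2, claimed only if c1 & c2 == 0."""
    return ((x | c1) & c2) == (x & c2)


# The first fold is unconditional: check every triple of WIDTH-bit values.
and_not_ok = all(fold_and_not_holds(a, b, c)
                 for a, b, c in product(VALUES, repeat=3))

# The second fold is only checked where the precondition c1 & c2 == 0 holds.
and_or_ok = all(fold_and_or_holds(x, c1, c2)
                for x, c1, c2 in product(VALUES, repeat=3)
                if c1 & c2 == 0)

# Without the precondition the rewrite is wrong, e.g. x=0, c1=1, c2=1.
precondition_needed = not fold_and_or_holds(0, 1, 1)
```

The precondition on the second fold matters: when `c1` and `c2` share a set bit, the `or` forces that bit on and the `and` keeps it, so dropping the `or` changes the result.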

223056 Commits