llvm-project

Author	SHA1	Message	Date
Daniel Kiss	131c06e6da	Revert "[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions." This reverts commit f903c8505515f15e956febbd8cdfa0037fbaf689.	2022-01-06 19:17:45 +01:00
Craig Topper	ec4dd862bf	[RISCV] Use simm5_plus1_nonzero in isel patterns for vmsgeu.vi/vmsltu.vi intrinsics. The 0 immediate can't be selected to vmsgtu.vi/vmsleu.vi by decrementing the immediate. To prevent this we had special patterns that provided alternate lowering for the 0 cases. This relied on tablegen prioritizing the 0 pattern over the simm5_plus1 range. This patch introduces simm5_plus1_nonzero that excludes 0. It also excludes the special case for vmsltu.vi since we can just use vmsltu.vx and let the 0 be selected to X0. This is an alternative to some of the changes in D116584. Reviewed By: Chenbing.Zheng, asb Differential Revision: https://reviews.llvm.org/D116723	2022-01-06 08:27:27 -08:00
Craig Topper	56ca11e31e	[RISCV] Add an MIR pass to replace redundant sext.w instructions with copies. Function calls and compare instructions tend to cause sext.w instructions to be inserted. If we make good use of W instructions, these operations can often end up being redundant. We don't always detect these during SelectionDAG due to things like phis. There are also some cases caused by failure to turn extload into sextload in SelectionDAG. extload selects to LW allowing later sext.ws to become redundant. This patch adds a pass that examines the input of sext.w instructions trying to determine if it is already sign extended. Either by finding a W instruction, other instructions that produce a sign extended result, or looking through instructions that propagate sign bits. It uses a worklist and visited set to search as far back as necessary. Reviewed By: asb, kito-cheng Differential Revision: https://reviews.llvm.org/D116397	2022-01-06 08:23:42 -08:00
Craig Topper	75117fb340	[RISCV] Don't advertise i32->i64 zextload as free for RV64. The zextload hook is only used to determine whether to insert a zero_extend or any_extend for narrow types leaving a basic block. Returning true from this hook tends to cause any load whose output leaves the basic block to become an LWU instead of an LW. Since we tend to prefer sexts for i32 compares on RV64, this can cause extra sext.w instructions to be created in other basic blocks. If we use LW instead of LWU this gives the MIR pass from D116397 a better chance of removing them. Another option might be to teach getPreferredExtendForValue in FunctionLoweringInfo.cpp about our preference for sign_extend of i32 compares. That would cause SIGN_EXTEND to be chosen for any value used by a compare instead of using the isZExtFree heuristic. That will require code to convert from the llvm::Type* to EVT/MVT as well as querying the type legalization actions to get the promoted type in order to call TargetLowering::isSExtCheaperThanZExt. That seemed like many extra steps when no other target wants it. Though it would avoid us needing to lean on the MIR pass in some cases. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D116567	2022-01-06 08:13:42 -08:00
Nikita Popov	f430c1eb64	[Tests] Add elementtype attribute to indirect inline asm operands (NFC) This updates LLVM tests for D116531 by adding elementtype attributes to operands that correspond to indirect asm constraints.	2022-01-06 14:23:51 +01:00
David Green	ba927f66c0	[AArch64] Regenerate arith overflow test, and add a few more select tests. NFC	2022-01-06 11:02:14 +00:00
Christudasan Devadasan	50b5b367c1	[AMDGPU] Iterate LoweredEndCf in the reverse order The function that optimally inserts the exec mask restore operations by combining the blocks currently visits the lowered END_CF pseudos in the forward direction as it iterates the setvector in the order the entries are inserted in it. Due to the absence of BranchFolding at -O0, the irregularly placed BBs cause the forward traversal to incorrectly place two unconditional branches in certain BBs while combining them, especially when an intervening block later gets optimized away in subsequent iterations. It is avoided by reverse iterating the setvector. The blocks at the bottom of a function will get optimized first before processing those at the top. Fixes: SWDEV-315215 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116273	2022-01-06 00:27:11 -05:00
Ikhlas Ajbar	2819e5de42	[Hexagon] Handle instruction selection for select(I1,Q,Q) Lower select(I1,Q,Q) by converting vector predicate Q to vector register V, doing select(I1,V,V), and then converting the resulting V back to Q. Also, try to avoid creating such situations in the first place.	2022-01-05 14:50:12 -08:00
Stefan Pintilie	04496201e0	[PowerPC] Add support for ROP protection for 32 bit. Add support for Return Oriented Programming (ROP) protection for 32 bit. This patch also adds testing for AIX on both 64 and 32 bit. Reviewed By: amyk Differential Revision: https://reviews.llvm.org/D111362	2022-01-05 15:15:53 -06:00
David Green	ca7ffe09dc	[AArch64] Rename CPY to DUP. NFC These instructions have nothing to do with the new MOP CPY instructions, and are better named DUP to avoid confusion. Differential Revision: https://reviews.llvm.org/D116655	2022-01-05 20:02:39 +00:00
Nico Weber	085f078307	Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."" This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62. The change contained many unrelated changes and e.g. restored unit test failures for the old lld port.	2022-01-05 13:10:25 -05:00
David Salinas	859ebca744	Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30. That commit caused performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ibf8e397df94001f248fba609f072088a46abae08 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D115960 Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105	2022-01-05 17:57:32 +00:00
Christudasan Devadasan	e7b89f3222	[AMDGPU] Regenerate test checks in collapse-endcf.mir. NFC	2022-01-05 12:06:11 -05:00
Shubham Pawar	41085357df	[Hexagon] Extend OptAddrMode pass to vgather This change extends the addressing mode optimization pass to HVX vgather. This is specifically intended to resolve compiler not generating indexed addresses for vgather stores to vtcm. Changed the vgather pseudo instructions to accept an immediate operand and handled addition of appropriate immediate operand in addressing mode optimization pass.	2022-01-05 08:44:21 -08:00
David Green	c30f97872f	[AArch64] Regenerate some mir tests to new format. NFC	2022-01-05 15:12:22 +00:00
Nicholas Guy	13992498cd	[AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops Differential Revision: https://reviews.llvm.org/D114879	2022-01-05 12:54:31 +00:00
Nicholas Guy	73d92faa2f	[CodeGen] Emit alignment "Max Skip" operand The current AsmPrinter has support to emit the "Max Skip" operand (the 3rd of .p2align), however has no support for it to actually be specified. Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a per-block basis. Leaving the value as default (0) causes no observable differences in behaviour. Differential Revision: https://reviews.llvm.org/D114590	2022-01-05 12:54:30 +00:00
Paul Walker	3728a7de34	[SVE] Add ISel for fabs(fsub(a,b)) ==> FABD. Differential Revision: https://reviews.llvm.org/D116227	2022-01-05 11:59:25 +00:00
Paul Walker	4325fd7402	[AArch64ISelLowering] Don't look through scalable extract_subvector when optimising DUPLANE. When constructDup is passed an extract_subvector it tries to use extract_subvector's operand directly when creating the DUPLANE. This is invalid when extracting from a scalable vector because the necessary DUPLANE ISel patterns do not exist. NOTE: This patch is an update to https://reviews.llvm.org/D110524 that originally fixed this but introduced a bug when the result VT is 64bits. I've restructured the code so the critical final else block is entered when necessary. Differential Revision: https://reviews.llvm.org/D116442	2022-01-05 11:56:59 +00:00
Victor Perez	96e220e688	[LegalizeTypes][VP] Add integer promotion support for vp.select Promote select, vselect and vp.select in a similar way. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116400	2022-01-05 11:01:52 +00:00
Victor Perez	df5226dfb3	[LegalizeTypes][VP] Add widening support for vp.select Widen vp.select the same way as select and vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116407	2022-01-05 09:21:11 +00:00
Zi Xuan Wu	9566cf16ad	[CSKY] Add codegen of select/br/cmp instruction and some frame lowering infra Add basic integer codegen of select/br/cmp instruction. It also includes frame lowering code such as prologue/epilogue.	2022-01-05 15:59:03 +08:00
Heejin Ahn	f2a43f06dd	[WebAssembly] Use llvm utility functions in EH/SjLj This uses `changeToCall` and `changeToInvokeAndSplitBasicBlock` from `lib/Transforms/Utils`, replacing the custom logic. One difference of those functions from our previous logic is they delete the original `CallInst`/`InvokeInst`, which makes them tricky to use while iterating through instructions/BBs. So this CL gathers the candidate calls first and run them through `changeToInvokeAndSplitBasicBlock` later. Also this renames some variables. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D116620	2022-01-04 17:47:20 -08:00
Heejin Ahn	f178f61e1d	[WebAssembly] Nullify unnecessary setjmp calls D107530 did a small optimization that, if a function contains `setjmp` calls but not other calls that can `longjmp`, we don't do SjLj transformation on those `setjmp` calls, because they don't have possibilities of returning from `longjmp`. But we should remove those `setjmp` calls even in that case, because Emscripten doesn't provide that function, assuming it is lowered away by SjLj transformation. `setjmp` always returns 0 when called directly, so this CL replaces them with `i32 0`. Fixes https://github.com/emscripten-core/emscripten/issues/15679. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D116619	2022-01-04 17:44:32 -08:00
Sumanth Gundapaneni	822448635e	[Hexagon] Fix MachineSink not to hoist FP instructions that update USR. Ideally we should make USR as Def for these floating point instructions. However, it violates some assembler MCChecker rules. This patch fixes the issue by marking these FP instructions as non-sinkable.	2022-01-04 15:55:22 -08:00
SANTANU DAS	52f347010a	[Hexagon] Make A2_tfrsi not cheap for operands exceeding 16 bits This patch aids to reduce code size since it removes generation of back-to-back A2_tfrsi instructions. It is enabled only at -Os/-Oz.	2022-01-04 15:46:26 -08:00
Krzysztof Parzyszek	60944d132f	[Hexagon] Convert codegen testcase from .ll to .mir	2022-01-04 15:41:32 -08:00
Harsha Jagasia	2b1c6df5a6	[Hexagon] Performance regression with b2b For code below: { r7 = addasl(r3,r0,#2) r8 = addasl(r3,r2,#2) r5 = memw(r3+r0<<#2) r6 = memw(r3+r2<<#2) } { p1 = cmp.gtu(r6,r5) if (p1.new) memw(r8+#0) = r5 if (p1.new) memw(r7+#0) = r6 } { r0 = mux(p1,r2,r4) } In packetizer, a new packet is created for the cmp instruction since there aren't enough resources in previous packet. Also it is determined that the cmp stalls by 2 cycles since it depends on the prior load of r5. In current packetizer implementation, the predicated store is evaluated for whether it can go in the same packet as compare, and since the compare stalls, the stall of the predicated store does not matter and it can go in the same packet as the cmp. However the predicated store will stall for more cycles because of its dependence on the addasl instruction and to avoid that stall we can put it in a new packet. Improve the packetizer to check if an instruction being added to packet will stall longer than instruction already in packet and if so create a new packet.	2022-01-04 14:09:47 -08:00
Craig Topper	a04b532505	[LegalizeIntegerTypes][RISCV] Teach PromoteSetCCOperands to check sign bits of unsigned compares. Unsigned compares work with either zero extended or sign extended inputs just like equality comparisons. I didn't allow this when I refactored the code in D116421 due to lack of tests. But I've since found a simple C test case that demonstrates when this can be useful. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D116617	2022-01-04 12:38:47 -08:00
Brendon Cahoon	db5b791595	[Hexagon] Fix an instruction move in HexagonVectorCombine The HexagonVectorCombine pass was moving an instruction incorrectly, which caused a use in a GEP that was not yet defined. HexagonVectorCombine removes a load from a group due to its dependences, but in realignGroup, the load is processed anyways. In realignGroup, when determining the maximum alignment, only those instructions still in the group should be considered.	2022-01-04 11:41:42 -08:00
Tasmia Rahman	e88eb6443f	[Hexagon] Fix buildVector32 for v4i8 constants The code for constructing a 32-bit constant from 4 8-bit constants has a typo and uses one of the constants twice	2022-01-04 11:19:15 -08:00
Krzysztof Parzyszek	78f5014fea	[Hexagon] Conversions to/from FP types, HVX and scalar Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com> Co-authored-by: Sumanth Gundapaneni <sgundapa@quicinc.com>	2022-01-04 11:03:51 -08:00
Craig Topper	df2e728b77	[RISCV] Teach RISCVGatherScatterLowering to handle more complex recurrence start values. Previously we only recognized strided loads/store when the initial value for the phi was a strided constant vector. This patch extends the support to a strided_constant added to a splatted value. The rewritten loop will add the splat value to the first element of the strided constant vector to use as the scalar start value. The stride is unaffected. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D115958	2022-01-04 10:13:34 -08:00
Philip Reames	b061d86c69	[SCEV] Compute exit count from overflow check expressed w/ x.with.overflow intrinsics This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them. Motivation comes from compiling a simple example with -ftrapv. Differential Revision: https://reviews.llvm.org/D116499	2022-01-04 09:44:23 -08:00
Ben Shi	99e7bf46c9	[AVR] Optimize int16 shift operation for shift amount greater than 8 Skip operation on the lower byte in int16 logical left shift when shift amount is greater than 8. Skip operation on the higher byte in int16 logical & arithmetic right shift when shift amount is greater than 8. Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115594	2022-01-04 11:48:50 +00:00
Ben Shi	f4ef79306c	[AVR] Optimize int8 arithmetic right shift 6 bits Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115593	2022-01-04 10:36:03 +00:00
Nikita Popov	4ef560ec60	[ELF] Handle .init_array prefix consistently Currently, the code in TargetLoweringObjectFile only assigns @init_array section type to plain .init_array sections, but not prioritized sections like .init_array.00001. This is inconsistent with the interpretation in the AsmParser (see `791523bae6/llvm/lib/MC/MCParser/ELFAsmParser.cpp (L621-L632)`) and upcoming expectations in LLD (see https://github.com/rust-lang/rust/issues/92181 for context). This patch assigns @init_array section type to all sections with an .init_array prefix. The same is done for .fini_array and .preinit_array as well. With that, the logic matches the AsmParser. Differential Revision: https://reviews.llvm.org/D116528	2022-01-04 09:42:58 +01:00
Ben Shi	9fb4e79d06	Revert "[AVR] Optimize int8 arithmetic right shift 6 bits" This reverts commit 5723261370b45fa4d0d295845c6ef9e223f2ff4a. There are failures as reported in https://lab.llvm.org/buildbot#builders/16/builds/21638 https://lab.llvm.org/buildbot#builders/104/builds/5394	2022-01-04 04:14:15 +00:00
Ben Shi	5723261370	[AVR] Optimize int8 arithmetic right shift 6 bits Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115593	2022-01-04 03:20:29 +00:00
Erik Desjardins	a390c9905d	[X86] Improve selection of the mov instruction in FrameLowering MOV64ri results in a significantly longer encoding, and use of this operator is fairly avoidable as we can always check the size of the immediate we're using. This is an updated version of D99045. Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116458	2022-01-03 11:10:16 -08:00
Erik Desjardins	95cf30401c	[X86] autogen segmented stacks tests (NFC) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116420	2022-01-03 11:09:35 -08:00
Victor Perez	5527139302	[RISCV][VP] Add RVV codegen for [nX]vXi1 vp.select Expand [nX]vXi1 vp.select the same way as [nX]vXi1 vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115546	2022-01-02 23:12:32 -08:00
Craig Topper	d00e438cfe	[RISCV][LegalizeIntegerTypes] Teach PromoteSetCCOperands not to sext i32 comparisons for RV64 if the promoted values are already zero extended. This is similar to what is done for targets that prefer zero extend where we avoid using a zero extend if the promoted values are sign extended. We'll also check for zero extended operands for ugt, ult, uge, and ule when the target prefers sign extend. This is different than preferring zero extend, where we only check for sign bits on equality comparisons. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D116421	2021-12-31 17:15:20 -08:00
Random	2edcde00cb	[MIPS] Add -mfix4300 flag to enable vr4300 mulmul bugfix pass Early revisions of the VR4300 have a hardware bug where two consecutive multiplications can produce an incorrect result in the second multiply. This revision adds the `-mfix4300` flag to llvm (and clang) which, when passed, provides a software fix for this issue. More precise description of the "mulmul" bug: ``` mul.[s,d] fd,fs,ft mul.[s,d] fd,fs,ft or [D]MULT[U] rs,rt ``` When the above sequence is executed by the CPU, if at least one of the source operands of the first mul instruction happens to be `sNaN`, `0` or `Infinity`, then the second mul instruction may produce an incorrect result. This can happen both if the two mul instructions are next to each other and if the first one is in a delay slot and the second is the first instruction of the branch target. Description of the fix: This fix adds a backend pass to llvm which scans for mul instructions in each basic block and inserts a nop whenever the following conditions are met: - The current instruction is a single or double-precision floating-point mul instruction. - The next instruction is either a mul instruction (any kind) or a branch instruction. Differential Revision: https://reviews.llvm.org/D116238	2021-12-31 15:59:44 +03:00
Jay Foad	866b195cb9	[AMDGPU] Regenerate checks for waitcnt-overflow.mir	2021-12-31 11:27:15 +00:00
wangpc	41454ab256	[RISCV] Use constant pool for large integers For large integers (for example, magic numbers generated by TargetLowering::BuildSDIV when dividing by constant), we may need about 4~8 instructions to build them. At the same time, it just takes two instructions to load constants (with extra cycles to access memory), so it may be profitable to put these integers into constant pool. Reviewed By: asb, craig.topper Differential Revision: https://reviews.llvm.org/D114950	2021-12-31 14:48:48 +08:00
jacquesguan	05f82dc877	[RISCV] Fix incorrect cases of vmv.s.f in the VSETVLI insert pass. Fix incorrect cases of vmv.s.f and add test cases for it. Differential Revision: https://reviews.llvm.org/D116432	2021-12-31 14:17:03 +08:00
Krzysztof Parzyszek	db83e3e507	[Hexagon] Generate HVX/FP arithmetic instructions Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com> Co-authored-by: Sumanth Gundapaneni <sgundapa@quicinc.com> Co-authored-by: Joshua Herrera <joshherr@quicinc.com>	2021-12-30 12:47:30 -08:00
Krzysztof Parzyszek	9e6afbedb0	[Hexagon] Generate HVX/FP compare instructions Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com>	2021-12-30 12:17:22 -08:00
Craig Topper	15787ccd45	[RISCV] Add support for STRICT_LRINT/LLRINT/LROUND/LLROUND. Tests for other strict intrinsics. This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND. It also adds test cases for f32 and f64 constrained intrinsics that correspond to the intrinsics in float-intrinsics.ll and double-intrinsics.ll. Support for promoting the integer argument of STRICT_FPOWI was added. I've skipped adding tests for f16 intrinsics, since we don't have libcalls for them and we have inconsistent support for promoting them in LegalizeDAG. This will need to be examined more closely. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D116323	2021-12-30 11:54:32 -08:00
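
As a rough illustration of the nop-insertion rule described in commit 2edcde00cb ([MIPS] Add -mfix4300 flag), the Python sketch below models a basic block as a list of mnemonics. It is not the LLVM pass itself: the function name `insert_mulmul_nops`, the mnemonic sets, and the short branch list are assumptions made for illustration, and the delay-slot case mentioned in the commit message is not modelled.

```python
# Hypothetical model of the "mulmul" workaround: insert a nop after a
# floating-point multiply when the next instruction is any multiply or a branch.

FP_MULS = {"mul.s", "mul.d"}                       # single/double-precision FP mul
INT_MULS = {"mult", "multu", "dmult", "dmultu"}    # integer multiplies
BRANCHES = {"b", "beq", "bne", "j", "jal", "jr"}   # illustrative subset only


def insert_mulmul_nops(block):
    """Return a copy of `block` (a list of mnemonics) with nops inserted."""
    fixed = []
    for index, insn in enumerate(block):
        fixed.append(insn)
        # Condition 1: the current instruction is an FP multiply.
        if insn not in FP_MULS or index + 1 >= len(block):
            continue
        # Condition 2: the next instruction is a multiply (any kind) or a branch.
        following = block[index + 1]
        if following in FP_MULS or following in INT_MULS or following in BRANCHES:
            fixed.append("nop")
    return fixed
```

For example, `insert_mulmul_nops(["mul.s", "mul.d", "add"])` separates the two multiplies with a nop, while a block that starts with an integer multiply is left unchanged because only a floating-point multiply triggers the check.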

41506 Commits