llvm-project

Author	SHA1	Message	Date
Jon Roelofs	83e6d2edfc	Revert "[ARM] Always lower direct calls as direct when the outliner is enabled (#66434)" This reverts commit 003bcad9a8b21e15e3786a52b1dafa844075ab84. ARM folks say it regresses some of their benchmarks: https://github.com/llvm/llvm-project/pull/66434#issuecomment-1722424162	2023-09-18 09:45:46 -07:00
Yingwei Zheng	b423e1f05d	[SDAG][RISCV] Avoid neg instructions when lowering atomic_load_sub with a constant rhs This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs. Comparison with GCC: https://godbolt.org/z/c5zPdP7j4 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158673	2023-09-16 17:09:41 +08:00
Jon Roelofs	003bcad9a8	[ARM] Always lower direct calls as direct when the outliner is enabled (#66434) The indirect lowering hinders the outliner's ability to see that sequences are in fact common, since the sequence similarity is rendered opaque by the register callee. The size savings from making them indirect seem to be dwarfed by the outliner's savings from de-duplication. rdar://115178034 rdar://115459865	2023-09-15 10:04:56 -07:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Matt Arsenault	b14e83d1a4	IR: Add llvm.exp10 intrinsic We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10 to fix this asymmetry. AMDGPU already has most of the code for f32 exp10 expansion implemented alongside exp, so the current implementation is duplicating nearly identical effort between the compiler and library which is inconvenient. https://reviews.llvm.org/D157871	2023-09-01 19:45:03 -04:00
Nicholas Guy	d65feccb12	[ARM] Set preferred function alignment Aligning functions yields small performance gains on embedded cores, more so with numerous small function calls. Similar to aligning loops, if the function can fit within a single cache line then the performance overhead of fetching more instructions can be limited. Differential Revision: https://reviews.llvm.org/D157514	2023-08-16 17:31:21 +01:00
Jay Foad	fdbc944385	Fix typos in comments	2023-08-15 13:57:21 +01:00
Bjorn Pettersson	e53b28c833	[llvm] Drop some bitcasts and references related to typed pointers Differential Revision: https://reviews.llvm.org/D157551	2023-08-10 15:07:07 +02:00
Jay Foad	2dcf051259	[CodeGen] Store call frame size in MachineBasicBlock Record the call frame size on entry to each basic block. This is usually zero except when a basic block has been split in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156113	2023-07-27 10:32:00 +01:00
John Brawn	cee7e7b245	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-19 13:56:36 +01:00
Oliver Stannard	aea8db8eb9	Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI." This reverts commit 58d1eaa3b6ce4f7285c51f83faff7a3ac374c746.	2023-07-13 14:25:39 +01:00
Caslyn Tonelli	6d9065a716	Revert "[ARM] Correctly handle execute-only in EmitStructByval" This reverts commit 210f61cbddeddac47b347db072d674ee142520f6. Differential Revision: https://reviews.llvm.org/D155138	2023-07-12 23:29:54 +00:00
Jay Foad	58d1eaa3b6	[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI. Record the SP adjustment on entry to each basic block. This is almost always zero except on targets like ARM which can split a basic block in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D154281	2023-07-12 14:29:26 +01:00
John Brawn	210f61cbdd	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-12 11:48:01 +01:00
Ties Stuij	f0ae3c23b5	[ARM] in LowerConstantFP, make sure we cover armv6-m execute-only Currently in LowerConstantFP, when we compile for execute-only (XO) we don't check what architecture we're compiling for (v6m=< or >v6m). We shouldn't get here for v6m, so put in an assert. Reviewed By: simonwallis2, dmgreen Differential Revision: https://reviews.llvm.org/D154506	2023-07-11 10:42:15 +01:00
John Brawn	4fb0e0114f	[ARM] Generate out-of-line jump tables for XO without 32-bit branch When we only have a 16-bit pc-relative branch instruction we generate a table of addresses for a jump table. Currently this is placed inline, but this won't work with execute-only memory. In this case generate the jump table out-of-line. Differential Revision: https://reviews.llvm.org/D153774	2023-06-28 13:30:39 +01:00
Ties Stuij	4f19c6a7c7	[ARM] allow long-call codegen for armv6-M eXecute Only (XO) Recently eXecute Only (XO) codegen was also allowed for armv6-M. Previously this was only implemented for ~armv7+, effectively if MOVW/MOVT is available. Regarding long calls, we remove the check for MOVW/MOVT when generating code for XO, which already was redundant as in the subtarget initialization we already check if XO is valid for the target. And targets that generate valid XO code should be able to handle the (wrapper globaladdress) node. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D153782	2023-06-28 10:50:24 +01:00
Maurice Heumann	249bd9eab0	[ARM] Fix codegen of unaligned volatile load/store of i64 Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE. However, these instructions require the addresses to be aligned. Unaligned loads/stores therefore should be ignored by this handling. Differential Revision: https://reviews.llvm.org/D152790	2023-06-26 10:45:41 -07:00
Ties Stuij	2273741ea2	[ARM] generate armv6m eXecute Only (XO) code [ARM] generate armv6m eXecute Only (XO) code for immediates, globals Previously eXecute Only (XO) support was implemented for targets that support MOVW/MOVT (~armv7+). See: https://reviews.llvm.org/D27449 XO prevents the compiler from generating data accesses to code sections. This patch implements XO codegen for armv6-M, which does not support MOVW/MOVT, and must resort to the following general pattern to avoid loads: `movs r3, :upper8_15:foo; lsls r3, #8; adds r3, :upper0_7:foo; lsls r3, #8; adds r3, :lower8_15:foo; lsls r3, #8; adds r3, :lower0_7:foo; ldr r3, [r3]` This is equivalent to the code pattern generated by GCC. The above relocations are new to LLVM and have been implemented in a parent patch: https://reviews.llvm.org/D149443. This patch limits itself to implementing codegen for this pattern and enabling XO for armv6-M in the backend. Separate patches will follow for: - switch tables - replacing specific loads from constant islands which are spread out over the ARM backend codebase. Amongst others: FastISel, call lowering, stack frames. Reviewed By: john.brawn Differential Revision: https://reviews.llvm.org/D152795	2023-06-23 10:50:47 +01:00
Igor Kirillov	40a81d3100	[CodeGen] Refactor IR generation functions to use IRBuilder in ComplexDeinterleaving pass This patch updates several functions in LLVM's IR generation code to accept an IRBuilder object as an argument, rather than an Instruction that indicates the insertion point for new instructions. This change is necessary to handle sophisticated -Ofast optimization cases from D148558 where it's unclear which instructions should be used as the insertion point for new operations. Differential Revision: https://reviews.llvm.org/D148703	2023-05-30 16:18:28 +00:00
Sergei Barannikov	01a7967447	[CodeGen] Replace CCState's getNextStackOffset with getStackSize (NFC) The term "next stack offset" is misleading because the next argument is not necessarily allocated at this offset due to alignment constraints. It also does not make much sense when allocating arguments at negative offsets (introduced in a follow-up patch), because the returned offset would be past the end of the next argument. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D149566	2023-05-17 21:51:45 +03:00
Jay Foad	d8229e2f14	[KnownBits] Define and use intersectWith and unionWith Define intersectWith and unionWith as two complementary ways of combining KnownBits. The names are chosen for consistency with ConstantRange. Deprecate commonBits as a synonym for intersectWith. Differential Revision: https://reviews.llvm.org/D150443	2023-05-16 09:23:51 +01:00
Zequan Wu	3977b77a6b	[CodeGen] Fix nomerge attribute not working in tail calls. In D79537, `nomerge` was made to only apply to non-tail calls. This fixes it by also applying it to tail calls. For ARM, I only made the new MI to inherit the flag under `TCRETURNdi` and `TCRETURNri`, because that's the place tail calls got replaced. Not sure if there's any other place needed. Fixes #61545. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D146749	2023-05-10 14:25:11 -04:00
NAKAMURA Takumi	c1221251fb	Restore CodeGen/MachineValueType.h from `Support` This is a rework of: - rG13e77db2df94 (r328395; MVT) Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h` can be restored as well. Depends on D148767 Differential Revision: https://reviews.llvm.org/D149024	2023-05-03 00:13:20 +09:00
Sergei Barannikov	e744e51b12	[SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC) This will make them consistent with other overflow-aware nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D148196	2023-04-29 21:59:58 +03:00
David Green	d321f3aa64	[ARM] Enable shouldFoldSelectWithIdentityConstant for MVE We already have tablegen patterns for a lot of these, but performing the combine earlier in DAG can help in a few extra cases. Differential Revision: https://reviews.llvm.org/D149269	2023-04-28 14:57:51 +01:00
Daniel Kiss	d75e70d7ae	[AArch64] Add preserve_all calling convention. Clang accepts preserve_all for AArch64 while it is missing from the backend. Fixes #58145 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D135652	2023-04-28 14:55:38 +02:00
David Green	15d2821263	[ARM] Fix qsat for armv5te/armv6 + thumb-mode This is a Thumb1 target, so will not have qsat instructions available. There was a mismatch between hasBaseDSP and the instruction patterns when +dsp was present, which is set by clang (but maybe shouldn't be). The target being thumb1-only should override that, implying that it does not have any qadds. Fixes #62273	2023-04-23 17:20:28 +01:00
Archibald Elliott	9ee4fe63bc	[ARM] Fix Crashes in fp16/bf16 Inline Asm We were still seeing occasional crashes with inline assembly blocks using fp16/bf16 after my previous patches: - https://reviews.llvm.org/rGff4027d152d0 - https://reviews.llvm.org/rG7d15212b8c0c - https://reviews.llvm.org/rG20b2d11896d9 It turns out: - The original two commits were wrong, and we should have always been choosing the SPR register class, not the HPR register class, so that LLVM's SelectionDAGBuilder correctly did the right splits/joins. - The `splitValueIntoRegisterParts`/`joinRegisterPartsIntoValue` changes from rG20b2d11896d9 are still correct, even though they sometimes result in inefficient codegen of casts between fp16/bf16 and i32/f32 (which is visible in these tests). This patch fixes crashes in `getCopyToParts` and when trying to select `(bf16 (bitconvert (fp16 ...)))` dags when Neon is enabled. This patch also adds support for passing fp16/bf16 values using the 'x' constraint that is LLVM-specific. This should broadly match how we pass with 't' and 'w', but with a different set of valid S registers. Differential Revision: https://reviews.llvm.org/D147715	2023-04-13 15:34:04 +01:00
David Green	b4df2b2c6c	[ARM] Combine fadd into fcmla This is the MVE equivalent of https://reviews.llvm.org/D146407. It adds a target combine for fadd(a, vcmla(b, c, d)) -> vcmla(fadd(a, b), c, d), pushing the fadd into the operands of the fcmla, which can help simplify away some additions. Differential Revision: https://reviews.llvm.org/D147200	2023-04-05 10:31:19 +01:00
Craig Topper	219ff07f72	[Targets] Rename Flag->Glue. NFC Long long ago Glue was called Flag, and it was never completely renamed.	2023-04-02 19:28:51 -07:00
Simon Pilgrim	8153b92d9b	[DAG] Add SelectionDAG::SplitScalar helper Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source. Differential Revision: https://reviews.llvm.org/D147264	2023-03-31 18:35:40 +01:00
Kazu Hirata	847b7f358b	[ARM] Use isNullConstant and isOneConstant (NFC)	2023-03-29 21:50:34 -07:00
Caleb Zulawski	71dc3de533	[ARM] Improve min/max vector reductions on Arm This patch adds some more efficient lowering for vecreduce.min/max under NEON, using sequences of pairwise vpmin/vpmax to reduce to a single value. This nearly resolves issues such as #50466, #40981, #38190. Differential Revision: https://reviews.llvm.org/D146404	2023-03-22 16:00:19 +00:00
Archibald Elliott	b189218d44	[ARM] Fix Chain/Glue Bug in PerformVMOVhrCombine In this optimisation, the Chain and Glue from the original CopyFromReg were being lost, which resulted in miscompiles. This fix just ensures that the input chains are correctly updated, and that any users are also updated with the new chain from the new CopyFromReg. Fixes #60510. Differential Revision: https://reviews.llvm.org/D143713	2023-03-06 11:55:54 +00:00
Archibald Elliott	20b2d11896	[ARM] Fix Crash in 't'/'w' handling without fp16/bf16 After https://reviews.llvm.org/rGff4027d152d0 and https://reviews.llvm.org/rG7d15212b8c0c we saw crashes in SelectionDAG when trying to use these constraints when you don't have the fp16 or bf16 extensions. However, it is still possible to move 16-bit floating point values into the right place in S registers with a normal `vmov`, even if we don't have fp16 instructions we can use within the inline assembly string. This patch therefore fixes the crash. I think the reason we weren't getting this crash before is because I think the __fp16 and __bf16 types got an error diagnostic in the Clang frontend when you didn't have the right architectural extensions to use them. This restriction was recently relaxed. The approach for bf16 needs a bit more explanation. Exactly how BF16 is legalized was changed in rGb769eb02b526e3966847351e15d283514c2ec767 - effectively, whether you have the right instructions to get a bf16 value into/out of a S register with MoveTo/FromHPR depends on hasFullFP16, but whether you use a HPR for a value of type MVT::bf16 depends on hasBF16. This is why the tests are not changed by `+bf16` vs `-bf16`, but I've left both sets of RUN lines in case this changes in the future. Test Changes: - Added more testing for testing inline asm (the core part) - fp16-promote.ll and pr47454.ll show improvements where unnecessary fp16-fp32 up/down-casts are no longer emitted. This results in fewer libcalls where those casts would be done with a libcall. - aes-erratum-fix.ll is fairly noisy, and I need to revisit this test so that the IR is more minimal than it is right now, because most of the changes in this commit do not relate to what AES is actually trying to verify. Differential Revision: https://reviews.llvm.org/D143711	2023-03-06 11:55:08 +00:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Kazu Hirata	cbde2124f1	Use APInt::popcount instead of APInt::countPopulation (NFC) This is for consistency with the C++20-style bit manipulation functions in <bit>.	2023-02-19 11:29:12 -08:00
Kazu Hirata	7e6e636fb6	Use llvm::has_single_bit<uint32_t> (NFC) This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.	2023-02-15 22:17:27 -08:00
Jake Egan	08533f8b86	Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation" These commits are causing a test-suite build failure on AIX. Revert for now for time to investigate. https://lab.llvm.org/buildbot/#/builders/214/builds/5779/steps/9/logs/stdio This reverts commit bd87a2449da0c82e63cebdf9c131c54a5472e3a7 and 4c72266830ffa332ebb7cf1d3bbd6c56d001fa0f.	2023-02-14 15:20:06 -05:00
Alex Richardson	bd87a2449d	[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation This function was added for ARM targets, but aligning global/stack pointer arguments passed to memcpy/memmove/memset can improve code size and performance for all targets that don't have fast unaligned accesses. This adds a generic implementation that adjusts the alignment to pointer size if unaligned accesses are slow. Review D134168 suggests that this significantly improves performance on synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134282	2023-02-09 10:11:40 +00:00
Simon Pilgrim	2c580884c1	[ARM] Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC. Use APInt::setBit() method instead of OR'ing individual bits.	2023-02-08 15:27:05 +00:00
Archibald Elliott	62c7f035b4	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.	2023-02-07 12:39:46 +00:00
David Green	734d113a6c	[ARM] Remove reduce(shuffle) if all the lanes are used This looks for vaddv(shuffle) or vmlav(shuffle, shuffle), with a shuffle where all the lanes are used once. Due to the reduction being commutative the shuffle can be removed. Differential Revision: https://reviews.llvm.org/D143382	2023-02-07 10:44:35 +00:00
David Green	c56846a892	[ARM] Remove FlattenVectorShuffle and add PerformVQDMULHCombine. This removes the FlattenVectorShuffle that folds shuffles through certain binops. This is now handled by generic DAG combines for all but ARMISD::VQDMULH where a PerformVQDMULHCombine is added to compensate. It pushes identical shuffles down through the operation, in a similar way to the other combines in DAG.	2023-02-05 20:59:49 +00:00
Simon Tatham	60ea6f35a2	[ARM] Allow selecting hard-float ABI in integer-only MVE. Armv8.1-M can be configured to support the integer subset of the MVE vector instructions, and no floating point. In that situation, the FP and vector registers still exist, and so do the load, store and move instructions that transfer data in and out of them. So there's no reason the hard floating point ABI can't be supported, and you might reasonably want to use it, for the sake of intrinsics-based code passing explicit MVE vector types between functions. But the selection of the hard float ABI in the backend was gated on Subtarget->hasVFP2Base(), which is false in the case of integer MVE and no FP. As a result, you'd silently get the soft float ABI even if you deliberately tried to select it, e.g. with clang options such as --target=arm-none-eabi -mfloat-abi=hard -march=armv8.1m.main+nofp+mve The hard float ABI should have been gated on the weaker condition Subtarget->hasFPRegs(), because the only requirement for being able to pass arguments in the FP registers is that the registers themselves should exist. I haven't added a new test, because changing the existing CodeGen/Thumb2/float-ops.ll test seemed sufficient. But I've added a comment explaining why the results are expected to be what they are. Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D142703	2023-02-01 09:05:12 +00:00
Kazu Hirata	e078201835	[Target] Use llvm::count{l,r}_{zero,one} (NFC)	2023-01-28 09:23:07 -08:00
Guillaume Chatelet	355cc3fd8c	[NFC] Deprecate SelectionDag functions taking Alignment as unsigned	2023-01-24 10:40:12 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Kazu Hirata	188ec33726	[llvm] Use llvm::bit_width (NFC)	2023-01-21 14:48:32 -08:00
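
Note on commit 2273741ea2 (armv6-M execute-only codegen): the instruction pattern quoted in that message builds a 32-bit address one byte at a time, so that no literal-pool load from the code section is needed. The following is a minimal Python model of the arithmetic only, not LLVM code. The helper names `split_address` and `materialize` are hypothetical, and the mapping of the `:upper8_15:`, `:upper0_7:`, `:lower8_15:` and `:lower0_7:` operators to bits 31-24, 23-16, 15-8 and 7-0 is an assumption based on their names and on the order in which the pattern uses them.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit values


def split_address(addr):
    """Return the four 8-bit chunks of addr, most significant first.

    Assumed correspondence (hypothetical helper, see lead-in):
    :upper8_15:, :upper0_7:, :lower8_15:, :lower0_7:
    """
    return [(addr >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def materialize(addr):
    """Model the movs / lsls / adds sequence from the commit message."""
    upper8_15, upper0_7, lower8_15, lower0_7 = split_address(addr)
    r3 = upper8_15                          # movs r3, :upper8_15:foo
    for chunk in (upper0_7, lower8_15, lower0_7):
        r3 = (r3 << 8) & MASK32             # lsls r3, #8
        r3 = (r3 + chunk) & MASK32          # adds r3, :<chunk>:foo
    return r3                               # r3 now holds the address of foo


if __name__ == "__main__":
    foo = 0x20001234  # made-up address, for illustration only
    print(hex(materialize(foo)))  # prints 0x20001234
```

The final `ldr r3, [r3]` in the quoted pattern then loads the value stored at `foo`. That is a data access to a data address, which execute-only memory permits.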

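
Note on commit d8229e2f14 (KnownBits intersectWith/unionWith): the two operations combine per-bit knowledge in complementary ways. The sketch below is a small Python model, assuming the usual representation of known bits as a pair of masks (bits known to be zero, bits known to be one). The class and its snake_case method names are illustrative stand-ins, not the LLVM C++ API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBits:
    """Illustrative model: a bit set in `zero` is known to be 0,
    a bit set in `one` is known to be 1."""
    zero: int
    one: int

    def intersect_with(self, other):
        # Keep only facts that hold in BOTH inputs, e.g. when the
        # value may come from either of two sources.
        return KnownBits(self.zero & other.zero, self.one & other.one)

    def union_with(self, other):
        # Merge facts from both inputs, e.g. when two independent
        # analyses describe the same value.
        return KnownBits(self.zero | other.zero, self.one | other.one)


if __name__ == "__main__":
    a = KnownBits(zero=0b1100, one=0b0001)
    b = KnownBits(zero=0b1000, one=0b0011)
    print(a.intersect_with(b))  # KnownBits(zero=8, one=1)
    print(a.union_with(b))      # KnownBits(zero=12, one=3)
```

In this model, intersecting can only lose information and uniting can only gain it, which is why the two are described as complementary.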

2214 Commits