llvm-project

Author	SHA1	Message	Date
Igor Kirillov	40a81d3100	[CodeGen] Refactor IR generation functions to use IRBuilder in ComplexDeinterleaving pass This patch updates several functions in LLVM's IR generation code to accept an IRBuilder object as an argument, rather than an Instruction that indicates the insertion point for new instructions. This change is necessary to handle sophisticated -Ofast optimization cases from D148558 where it's unclear which instructions should be used as the insertion point for new operations. Differential Revision: https://reviews.llvm.org/D148703	2023-05-30 16:18:28 +00:00
Sergei Barannikov	01a7967447	[CodeGen] Replace CCState's getNextStackOffset with getStackSize (NFC) The term "next stack offset" is misleading because the next argument is not necessarily allocated at this offset due to alignment constraints. It also does not make much sense when allocating arguments at negative offsets (introduced in a follow-up patch), because the returned offset would be past the end of the next argument. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D149566	2023-05-17 21:51:45 +03:00
Jay Foad	d8229e2f14	[KnownBits] Define and use intersectWith and unionWith Define intersectWith and unionWith as two complementary ways of combining KnownBits. The names are chosen for consistency with ConstantRange. Deprecate commonBits as a synonym for intersectWith. Differential Revision: https://reviews.llvm.org/D150443	2023-05-16 09:23:51 +01:00
Zequan Wu	3977b77a6b	[CodeGen] Fix nomerge attribute not working in tail calls. In D79537, `nomerge` was made to only apply to non-tail calls. This fixes it by also applying it to tail calls. For ARM, I only made the new MI inherit the flag under `TCRETURNdi` and `TCRETURNri`, because that's the place where tail calls get replaced. Not sure if there's any other place needed. Fixes #61545. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D146749	2023-05-10 14:25:11 -04:00
NAKAMURA Takumi	c1221251fb	Restore CodeGen/MachineValueType.h from `Support` This is a rework of: - rG13e77db2df94 (r328395; MVT) Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h` can be restored as well. Depends on D148767 Differential Revision: https://reviews.llvm.org/D149024	2023-05-03 00:13:20 +09:00
Sergei Barannikov	e744e51b12	[SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC) This will make them consistent with other overflow-aware nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D148196	2023-04-29 21:59:58 +03:00
David Green	d321f3aa64	[ARM] Enable shouldFoldSelectWithIdentityConstant for MVE We already have tablegen patterns for a lot of these, but performing the combine earlier in DAG can help in a few extra cases. Differential Revision: https://reviews.llvm.org/D149269	2023-04-28 14:57:51 +01:00
Daniel Kiss	d75e70d7ae	[AArch64] Add preserve_all calling convention. Clang accepts preserve_all for AArch64 while it is missing from the backend. Fixes #58145 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D135652	2023-04-28 14:55:38 +02:00
David Green	15d2821263	[ARM] Fix qsat for armv5te/armv6 + thumb-mode This is a Thumb1 target, so will not have qsat instructions available. There was a mismatch between hasBaseDSP and the instruction patterns when +dsp was present, which is set by clang (but maybe shouldn't be). The target being thumb1-only should override that, implying that it does not have any qadds. Fixes #62273	2023-04-23 17:20:28 +01:00
Archibald Elliott	9ee4fe63bc	[ARM] Fix Crashes in fp16/bf16 Inline Asm We were still seeing occasional crashes with inline assembly blocks using fp16/bf16 after my previous patches: - https://reviews.llvm.org/rGff4027d152d0 - https://reviews.llvm.org/rG7d15212b8c0c - https://reviews.llvm.org/rG20b2d11896d9 It turns out: - The original two commits were wrong, and we should have always been choosing the SPR register class, not the HPR register class, so that LLVM's SelectionDAGBuilder correctly did the right splits/joins. - The `splitValueIntoRegisterParts`/`joinRegisterPartsIntoValue` changes from rG20b2d11896d9 are still correct, even though they sometimes result in inefficient codegen of casts between fp16/bf16 and i32/f32 (which is visible in these tests). This patch fixes crashes in `getCopyToParts` and when trying to select `(bf16 (bitconvert (fp16 ...)))` dags when Neon is enabled. This patch also adds support for passing fp16/bf16 values using the 'x' constraint that is LLVM-specific. This should broadly match how we pass with 't' and 'w', but with a different set of valid S registers. Differential Revision: https://reviews.llvm.org/D147715	2023-04-13 15:34:04 +01:00
David Green	b4df2b2c6c	[ARM] Combine fadd into fcmla This is the MVE equivalent of https://reviews.llvm.org/D146407. It adds a target combine for fadd(a, vcmla(b, c, d)) -> vcmla(fadd(a, b), c, d), pushing the fadd into the operands of the fcmla, which can help simplify away some additions. Differential Revision: https://reviews.llvm.org/D147200	2023-04-05 10:31:19 +01:00
Craig Topper	219ff07f72	[Targets] Rename Flag->Glue. NFC Long long ago Glue was called Flag, and it was never completely renamed.	2023-04-02 19:28:51 -07:00
Simon Pilgrim	8153b92d9b	[DAG] Add SelectionDAG::SplitScalar helper Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source. Differential Revision: https://reviews.llvm.org/D147264	2023-03-31 18:35:40 +01:00
Kazu Hirata	847b7f358b	[ARM] Use isNullConstant and isOneConstant (NFC)	2023-03-29 21:50:34 -07:00
Caleb Zulawski	71dc3de533	[ARM] Improve min/max vector reductions on Arm This patch adds some more efficient lowering for vecreduce.min/max under NEON, using sequences of pairwise vpmin/vpmax to reduce to a single value. This nearly resolves issues such as #50466, #40981, #38190. Differential Revision: https://reviews.llvm.org/D146404	2023-03-22 16:00:19 +00:00
Archibald Elliott	b189218d44	[ARM] Fix Chain/Glue Bug in PerformVMOVhrCombine In this optimisation, the Chain and Glue from the original CopyFromReg were being lost, which resulted in miscompiles. This fix just ensures that the input chains are correctly updated, and that any users are also updated with the new chain from the new CopyFromReg. Fixes #60510. Differential Revision: https://reviews.llvm.org/D143713	2023-03-06 11:55:54 +00:00
Archibald Elliott	20b2d11896	[ARM] Fix Crash in 't'/'w' handling without fp16/bf16 After https://reviews.llvm.org/rGff4027d152d0 and https://reviews.llvm.org/rG7d15212b8c0c we saw crashes in SelectionDAG when trying to use these constraints when you don't have the fp16 or bf16 extensions. However, it is still possible to move 16-bit floating point values into the right place in S registers with a normal `vmov`, even if we don't have fp16 instructions we can use within the inline assembly string. This patch therefore fixes the crash. I think the reason we weren't getting this crash before is because I think the __fp16 and __bf16 types got an error diagnostic in the Clang frontend when you didn't have the right architectural extensions to use them. This restriction was recently relaxed. The approach for bf16 needs a bit more explanation. Exactly how BF16 is legalized was changed in rGb769eb02b526e3966847351e15d283514c2ec767 - effectively, whether you have the right instructions to get a bf16 value into/out of a S register with MoveTo/FromHPR depends on hasFullFP16, but whether you use a HPR for a value of type MVT::bf16 depends on hasBF16. This is why the tests are not changed by `+bf16` vs `-bf16`, but I've left both sets of RUN lines in case this changes in the future. Test Changes: - Added more testing for testing inline asm (the core part) - fp16-promote.ll and pr47454.ll show improvements where unnecessary fp16-fp32 up/down-casts are no longer emitted. This results in fewer libcalls where those casts would be done with a libcall. - aes-erratum-fix.ll is fairly noisy, and I need to revisit this test so that the IR is more minimal than it is right now, because most of the changes in this commit do not relate to what AES is actually trying to verify. Differential Revision: https://reviews.llvm.org/D143711	2023-03-06 11:55:08 +00:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Kazu Hirata	cbde2124f1	Use APInt::popcount instead of APInt::countPopulation (NFC) This is for consistency with the C++20-style bit manipulation functions in <bit>.	2023-02-19 11:29:12 -08:00
Kazu Hirata	7e6e636fb6	Use llvm::has_single_bit<uint32_t> (NFC) This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.	2023-02-15 22:17:27 -08:00
Jake Egan	08533f8b86	Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation" These commits are causing a test-suite build failure on AIX. Revert for now for time to investigate. https://lab.llvm.org/buildbot/#/builders/214/builds/5779/steps/9/logs/stdio This reverts commit bd87a2449da0c82e63cebdf9c131c54a5472e3a7 and 4c72266830ffa332ebb7cf1d3bbd6c56d001fa0f.	2023-02-14 15:20:06 -05:00
Alex Richardson	bd87a2449d	[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation This function was added for ARM targets, but aligning global/stack pointer arguments passed to memcpy/memmove/memset can improve code size and performance for all targets that don't have fast unaligned accesses. This adds a generic implementation that adjusts the alignment to pointer size if unaligned accesses are slow. Review D134168 suggests that this significantly improves performance on synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134282	2023-02-09 10:11:40 +00:00
Simon Pilgrim	2c580884c1	[ARM] Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC. Use APInt::setBit() method instead of OR'ing individual bits.	2023-02-08 15:27:05 +00:00
Archibald Elliott	62c7f035b4	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.	2023-02-07 12:39:46 +00:00
David Green	734d113a6c	[ARM] Remove reduce(shuffle) if all the lanes are used This looks for vaddv(shuffle) or vmlav(shuffle, shuffle), with a shuffle where all the lanes are used once. Due to the reduction being commutative the shuffle can be removed. Differential Revision: https://reviews.llvm.org/D143382	2023-02-07 10:44:35 +00:00
David Green	c56846a892	[ARM] Remove FlattenVectorShuffle and add PerformVQDMULHCombine. This removes the FlattenVectorShuffle that folds shuffles through certain binops. This is now handled by generic DAG combines for all but ARMISD::VQDMULH where a PerformVQDMULHCombine is added to compensate. It pushes identical shuffles down through the operation, in a similar way to the other combines in DAG.	2023-02-05 20:59:49 +00:00
Simon Tatham	60ea6f35a2	[ARM] Allow selecting hard-float ABI in integer-only MVE. Armv8.1-M can be configured to support the integer subset of the MVE vector instructions, and no floating point. In that situation, the FP and vector registers still exist, and so do the load, store and move instructions that transfer data in and out of them. So there's no reason the hard floating point ABI can't be supported, and you might reasonably want to use it, for the sake of intrinsics-based code passing explicit MVE vector types between functions. But the selection of the hard float ABI in the backend was gated on Subtarget->hasVFP2Base(), which is false in the case of integer MVE and no FP. As a result, you'd silently get the soft float ABI even if you deliberately tried to select it, e.g. with clang options such as --target=arm-none-eabi -mfloat-abi=hard -march=armv8.1m.main+nofp+mve The hard float ABI should have been gated on the weaker condition Subtarget->hasFPRegs(), because the only requirement for being able to pass arguments in the FP registers is that the registers themselves should exist. I haven't added a new test, because changing the existing CodeGen/Thumb2/float-ops.ll test seemed sufficient. But I've added a comment explaining why the results are expected to be what they are. Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D142703	2023-02-01 09:05:12 +00:00
Kazu Hirata	e078201835	[Target] Use llvm::count{l,r}_{zero,one} (NFC)	2023-01-28 09:23:07 -08:00
Guillaume Chatelet	355cc3fd8c	[NFC] Deprecate SelectionDag functions taking Alignment as unsigned	2023-01-24 10:40:12 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Kazu Hirata	188ec33726	[llvm] Use llvm::bit_width (NFC)	2023-01-21 14:48:32 -08:00
David Green	e49367e7f3	[ARM] Fix i1 shuffle lowering with multiple operands. The existing lowering of i1 vector shuffle was only considering single-source shuffles, always assuming the second was undef. This extends that to properly handle both operands.	2023-01-17 11:29:51 +00:00
Fangrui Song	6052eac2a8	[ARM] Properly fix -Wsign-compare after D141791	2023-01-16 23:57:44 -08:00
Simon Pilgrim	cf47a8d383	Silence signed/unsigned comparison warnings. NFC.	2023-01-16 18:52:04 +00:00
Simon Pilgrim	f4f8f9f185	[Thumb2][MVE] Recognise shuffle truncation patterns suitable for ARMISD::MVETRUNC I'm helping with the remaining regressions on D127115, and one of my candidate fixes caused some regressions with MVE interleaved shuffles due to poor handling of 'truncation' style shuffle masks (0,2,4,6,...). This patch attempts to use the ARMISD::MVETRUNC node to handle these cases, based off existing code in LowerTruncate. It handles both (0,2,4,6,...) and (1,3,5,7,....) 'top' style patterns (assuming no endian problems). I shift down the 'top' patterns - a basic search of ARM docs suggests MVE has some top/bottom truncation/narrowing instructions but I don't seem to be able to get them to be used. Differential Revision: https://reviews.llvm.org/D141791	2023-01-16 17:59:45 +00:00
Roman Lebedev	cc39c3b17f	[Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts: shift through stack https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas that have variably-indexed loads. That does bring up questions of cost model, since that requires creating wide shifts. Indeed, our legalization for them is not optimal. We either split it into parts, or lower it into a libcall. But if the shift amount is a multiple of CHAR_BIT, we can also legalize it through the stack. The basic idea is very simple: 1. Get a stack slot 2x the width of the shift type 2. Store the value we are shifting into one half of the slot 3. Pad the other half of the slot: with zero for logical shifts, with the sign bit for arithmetic shifts 4. Index into the slot (starting from the base half into which we spilled, either upwards or downwards) 5. Load 6. Split the loaded integer This works for both little-endian and big-endian machines: https://alive2.llvm.org/ce/z/YNVwd5 And better yet, if the original shift amount was not a multiple of CHAR_BIT, we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K I think, if we are going to perform shift->shift-by-parts expansion more than once, we should instead go through stack, which is what this patch does. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D140638	2023-01-14 19:12:18 +03:00
Craig Topper	79858d1908	[CodeGen][Target] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC Use isPhysical/isVirtual methods.	2023-01-13 23:12:48 -08:00
Guillaume Chatelet	8fd5558b29	[NFC] Use TypeSize::getFixedValue() instead of TypeSize::getFixedSize() This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.	2023-01-11 16:49:38 +00:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Qiu Chaofan	a40ef656d8	[Intrinsic] Rename flt.rounds intrinsic to get.rounding Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding intrinsic to replace flt.rounds. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D139507	2022-12-19 15:22:39 +08:00
Nicholas Guy	a3dc5b534a	[ARM][CodeGen] Add integer support for complex deinterleaving Differential Revision: https://reviews.llvm.org/D139628	2022-12-12 11:38:19 +00:00
Peter Rong	ee31a4a702	[ARM] IselLowering unsigned overflow to crash using APInt in PerformSHLSimplify This diff fixes issue https://github.com/llvm/llvm-project/issues/59317 We should check if bitwidth is lower than the shift amount before we subtract them to avoid unsigned overflow. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D139238	2022-12-06 09:58:27 -08:00
Krzysztof Parzyszek	864aaa21b4	TargetLowering: convert Optional to std::optional	2022-12-01 16:19:10 -08:00
Stanislav Mekhanoshin	bcaf31ec3f	[AMDGPU] Allow finer grain control of an unaligned access speed A target can return if a misaligned access is 'fast' as defined by the target or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed. A target can still define it as it wants and the direct translation of the current code uses 0 and 1 for current false and true. This makes the change an NFC. A subsequent patch will start using an actual value of speed in the load/store vectorizer to compare if a vectorized access is going to be not just fast, but not slower than before. Differential Revision: https://reviews.llvm.org/D124217	2022-11-17 09:23:53 -08:00
Nicholas Guy	d52e2839f3	[ARM][CodeGen] Add support for complex deinterleaving Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic. Differential Revision: https://reviews.llvm.org/D114174	2022-11-14 14:02:27 +00:00
David Green	f970b007e5	[ARM] Fix vector ule zero lowering The instruction icmp ule <4 x i32> %0, zeroinitializer will usually be simplified to icmp eq <4 x i32> %0, zeroinitializer. It is not guaranteed though, and the code for lowering vector compares could pick the wrong form of the instruction if this happened. I've tried to make the code more explicit about the supported conditions. This fixes NEON being unable to select VCMPZ with HS conditions, and fixes some incorrect MVE patterns. Fixes #58514. Differential Revision: https://reviews.llvm.org/D136447	2022-11-02 22:34:05 +00:00
Zhiyao Ma	7e8af2fc0c	[ARM] Support -mexecute-only with -mlong-calls. Instead of using constant pools, use movw movt pair. Differential Revision: https://reviews.llvm.org/D136203	2022-10-24 11:41:24 -07:00
Archibald Elliott	7d15212b8c	[ARM] Support fp16/bf16 using w constraint fp16 and bf16 values can be used in GCC's inline assembly using the "w" constraint, which means "VFP floating-point registers d0-d31" - fp16 and bf16 values are stored in S registers (which alias the D registers). This change ensures that LLVM is compatible with GCC for programs that use fp16 and the 'w' constraint. Differential Revision: https://reviews.llvm.org/D135662	2022-10-13 10:32:06 +01:00
David Green	f2fde99461	[ARM] More bf16 shuffle handling, including perfect shuffles.	2022-10-02 14:31:51 +01:00
Archibald Elliott	ff4027d152	[ARM] Support fp16/bf16 using t constraint fp16 and bf16 values can be used in GCC's inline assembly using the "t" constraint, which means "VFP floating-point registers s0-s31" - fp16 and bf16 values are stored in S registers too. This change ensures that LLVM is compatible with GCC for programs that use fp16 and the 't' constraint. Fixes #57753 Differential Revision: https://reviews.llvm.org/D134553	2022-09-28 14:48:21 +01:00
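
Note on commit cc39c3b17f (shift through stack): the six steps in that message can be modelled outside the compiler. The following is only an illustrative sketch in Python, not the LegalizeIntegerTypes implementation. The function name `shift_through_stack`, the `kind` strings, the 128-bit default width, and the use of a little-endian byte string as the "stack slot" are all assumptions made for this example.

```python
def shift_through_stack(value, amount, width_bits=128, kind="lshr"):
    """Model a wide shift as a store into a double-width slot plus an offset load.

    Hypothetical helper for illustration; `kind` is "shl", "lshr" or "ashr".
    """
    if kind not in ("shl", "lshr", "ashr"):
        raise ValueError("unknown shift kind")
    if not 0 <= amount < width_bits:
        raise ValueError("shift amount out of range")

    nbytes = width_bits // 8
    mask = (1 << width_bits) - 1
    value &= mask
    # Whole bytes are handled by indexing; leftover bits by a final small shift.
    byte_off, bit_rem = divmod(amount, 8)
    data = value.to_bytes(nbytes, "little")
    sign = kind == "ashr" and bool((value >> (width_bits - 1)) & 1)

    if kind == "shl":
        # Steps 1-3: value in the upper half, zero padding below it.
        slot = bytes(nbytes) + data
        start = nbytes - byte_off          # step 4: index downwards
    else:
        # Steps 1-3: value in the lower half, zero or sign padding above it.
        pad = (b"\xff" if sign else b"\x00") * nbytes
        slot = data + pad
        start = byte_off                   # step 4: index upwards

    # Step 5: load one full-width integer from the chosen offset.
    loaded = int.from_bytes(slot[start:start + nbytes], "little")

    # Remainder shift for amounts that are not a multiple of 8 bits.
    if kind == "shl":
        return (loaded << bit_rem) & mask
    if sign:
        # Arithmetic shift of a negative value: refill the vacated top bits.
        return ((loaded >> bit_rem) | (mask << (width_bits - bit_rem))) & mask
    return loaded >> bit_rem
```

For example, `shift_through_stack(x, 20, kind="lshr")` loads from byte offset 2 of the slot and then shifts right by the remaining 4 bits. Step 6 of the commit message (splitting the loaded integer into legal parts) has no counterpart here because Python integers are unbounded.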


2195 Commits