llvm-project

Author	SHA1	Message	Date
David Green	dc1b772933	[AArch64][GlobalISel] Add missing legalization for v16i8 extract element.	2024-02-19 07:26:57 +00:00
XinWang10	bb91b43719	[X86] Handle repeated blend mask in combineConcatVectorOps (#82155) https://github.com/llvm/llvm-project/commit/1d27669e8ad07f8f2 added support for folding a 512-bit concat(blendi(x,y,c0),blendi(z,w,c1)) into an AVX512BW mask select. But when the subvector type is v16i16, we need to generate a repeated mask to make the result correct. The subnode looks like t87: v16i16 = X86ISD::BLENDI t132, t58, TargetConstant:i8<-86>.	2024-02-19 09:24:21 +08:00
Craig Topper	d5167c84f9	[DAGCombiner] Allow tryToFoldExtOfLoad to use a sextload for zext nneg. (#81714) If the load is used by any signed setccs, we can use a sextload instead of zextload. Then we don't have to give up on extending the load.	2024-02-17 11:37:13 -08:00
David Green	3a77522387	[AArch64][GlobalISel] Improve and expand fcopysign lowering (#71283) This alters the lowering of G_COPYSIGN to support vector types. The general idea is that we just lower it to vector operations using and/or and a mask, which are now converted to a BIF/BIT/BSP. In the process the existing AArch64LegalizerInfo::legalizeFCopySign can be removed, relying on expanding the scalar versions to vector instead, which just needs a small adjustment to allow widening scalars to vectors.	2024-02-17 10:19:27 +00:00
David Green	47c65cf62d	[AArch64][GlobalISel] Fail legalization for unknown libcalls. (#81873) If, like powi on Windows, the libcall is unavailable, we should fall back to SDAG. Currently we try to generate a call to "".	2024-02-17 08:57:14 +00:00
Philip Reames	db836f6094	[RISCV] Narrow vector absolute value (#82041) If we have an abs(sext a) we can legally perform this as a zext (abs a). (See the same combine in instcombine - note that the IntMinIsPoison flag doesn't exist in SDAG yet.) On RVV, this is likely profitable because it may allow us to perform the arithmetic operations involved in the abs at a narrower LMUL before widening for the user. We could arguably avoid narrowing below DLEN, but the transform should at worst move around the extend and create one extra vsetvli toggle if the source could previously be handled via loads explicit w/EEW.	2024-02-16 19:26:06 -08:00
Prabhuk	ea9ec80b7a	Revert "[AArch64] Add soft-float ABI (#74460)" (#82032) This reverts commit 9cc98e336980f00cbafcbed8841344e6ac472bdc. Issue: https://github.com/ClangBuiltLinux/linux/issues/1997	2024-02-16 16:43:50 -08:00
Sumanth Gundapaneni	0e6a48c3e8	[Hexagon] Optimize post-increment load and stores in loops. (#82011) This patch optimizes the post-increment instructions so that we can packetize them together: `v1 = phi(v0, v3')`, `v2,v3 = post_load v1, 4`, `v2',v3' = post_load v3, 4`. This can be optimized in two ways: `v1 = phi(v0, v3')`, `v2,v3' = post_load v1, 8`, `v2' = load v1, 4`	2024-02-16 16:47:54 -06:00
Philip Reames	4265ad1215	[RISCV] Adjust sdloc when creating an extend for widening instructions (#82040) We were using the SDLoc corresponding to the original arithmetic instruction, but here using the SDLoc corresponding to the original extend if we need to introduce a new narrower extend seems cleaner. As can be seen in the test diffs, this very minorly impacts scheduling and register allocation by giving the scheduler a hint from the original program order.	2024-02-16 13:00:25 -08:00
Shilei Tian	46734aa1e5	[AMDGPU] Use `bf16` instead of `i16` for bfloat (#80908) Currently we generally use `i16` to represent `bf16` in those tablegen files. This patch is trying to use `bf16` directly. Fix #79369.	2024-02-16 15:58:30 -05:00
Florian Hahn	2f083b364f	[AArch64] Fix resource length computation for STP. (#81749) On some uArchs, `STP [s|d], [s|d]` first combines the 2 input registers into a single register using a vector execution unit. IIUC AArch64StorePairSuppress tries to prevent forming STPs in case the critical resources are the vector units, in order to avoid adding more pressure on those units. The implementation, however, simply computes the new critical resource length by adding the resources for another STP. If load/store units are the critical resource, this means we increase that length by one, and incorrectly prevent forming the STP. This patch adjusts the resource computation by also removing 2 STRs, as introducing a STP will remove 2 single stores. This should more accurately reflect the resource usage after introducing an STP, and does not prevent forming STPs if load/store units are the critical resources; in those cases, STP can actually help to reduce resource usage. PR: https://github.com/llvm/llvm-project/pull/81749	2024-02-16 16:10:10 +00:00
Florian Hahn	35f3298ead	[AArch64] Add extra stp suppression tests. Add test cases for store suppression that still triggers after https://github.com/llvm/llvm-project/pull/81749	2024-02-16 14:58:54 +00:00
Yingwei Zheng	a300a1a711	[RISCV][ISel] Add codegen support for the experimental zabha extension (#80192) This patch implements codegen support for the zabha (Byte and Halfword Atomic Memory Operations) v1.0-rc1 extension. See also https://github.com/riscv/riscv-zabha/blob/v1.0-rc1/zabha.adoc. --------- Co-authored-by: Craig Topper <craig.topper@sifive.com>	2024-02-16 15:35:09 +08:00
Craig Topper	feee627974	[RISCV] Make sure ADDI replacement in optimizeCondBranch has a virtual reg destination. (#81938) If it isn't virtual, we may extend the live range of the physical register past where it is valid, for example, across a call. Found while trying to enable -riscv-enable-sink-fold, which enables some copy propagation in machine sink that led to ADDIs with physical register destinations.	2024-02-15 16:34:40 -08:00
Hiroshi Yamauchi	692566a8b2	Fix an assert failure with a funclet in a swifttailcc function. (#78806) The failure happens in the livedebugvalues pass.	2024-02-15 15:54:03 -08:00
Arthur Eubanks	5b51d45f49	[X86] Use ".lrodata" prefix for large mergeable constants (#81900) Otherwise with a small enough large-data-threshold, we can get .rodata.* sections marked large, making .rodata large in the final binary.	2024-02-15 12:50:26 -08:00
Krzysztof Drewniak	b497234146	[AMDGPU] Make maximum hard clause size a subtarget feature (#81287) gfx11 chips may, in some conditions, behave incorrectly with S_CLAUSE instructions (hard clauses) containing more than 32 operations (that is, whose arguments exceed 0x1f). However, gfx10 targets will work successfully with clauses of up to length 63. Therefore, define the MaxHardClauseLength property on GCNSubtarget and make it a subtarget feature via tablegen, thus allowing us to specify, both now and in the future, the maximum viable size of clauses on various hardware from the tablegen definition. If MaxHardClauseLength is 0, which is the default, the hardware does not support hard clauses.	2024-02-15 13:58:31 -06:00
Craig Topper	b57ba8ec51	[RISCV] Use APInt in useInversedSetcc to prevent crashes when mask is larger than UINT64_MAX. (#81888) There are no checks that the type is legal so we need to handle any type.	2024-02-15 10:48:52 -08:00
David Green	3564000490	[AArch64][GlobalISel] FNeg constant materialization (#80643) This is a Global ISel equivalent of #80641, creating fneg(movi) instead of the alternative constant pool load or gpr dup.	2024-02-15 16:22:12 +00:00
ostannard	9cc98e3369	[AArch64] Add soft-float ABI (#74460) This adds support for the AArch64 soft-float ABI. The specification for this ABI was added by https://github.com/ARM-software/abi-aa/pull/232. Because all existing AArch64 hardware has floating-point hardware, we expect this to be a niche option, only used for embedded R-profile systems. We are going to document that SysV-like systems should only ever use the base (hard-float) PCS variant: https://github.com/ARM-software/abi-aa/pull/233. For that reason, I've not added an option to select the ABI independently of the FPU hardware; instead, the new ABI is enabled iff the target architecture does not have an FPU. For testing, I have run this through an ABI fuzzer, but since this is the first implementation it can only test for internal consistency (callers and callees agree on the PCS), not for conformance to the ABI spec.	2024-02-15 12:39:16 +00:00
Simon Pilgrim	b279ca2783	[DAG] visitCTPOP - CTPOP(SHIFT(X)) -> CTPOP(X) iff the shift doesn't affect any non-zero bits If the source is being (logically) shifted, but doesn't affect any active bits, then we can call CTPOP on the shift source directly.	2024-02-15 10:41:08 +00:00
Jay Foad	2df652a691	[CodeGen] Simplify updateLiveIn in MachineSink (#79831) When a whole register is added to a basic block's liveins, use LaneBitmask::getAll for the live lanes instead of trying to calculate an accurate mask of the lanes that comprise the register. This simplifies the code and matches other places where a whole register is marked as livein. This also avoids problems when regunits that are synthesized by TableGen to represent ad hoc aliasing have a lane mask of 0. Fixes #78942	2024-02-15 10:39:05 +00:00
Vyacheslav Levytskyy	9552a396ed	add support for the SPV_KHR_linkonce_odr extension (#81512) This PR adds support for the SPV_KHR_linkonce_odr extension and modifies an existing negative test with a positive check for the extension and proper linkage type in the case when the extension is enabled. SPV_KHR_linkonce_odr adds a "LinkOnceODR" linkage type, allowing proper translation of, for example, C++ template classes merging during linking from different modules and supporting any other cases when a global variable/function must be merged with equivalent global variable(s)/function(s) from other modules during the linking process.	2024-02-15 11:30:17 +01:00
Vyacheslav Levytskyy	dfb9bf35c4	let a user select preferred/unpreferred capabilities in a list of enabling capabilities (#81476) Per the SPIR-V specification: "If an instruction, enumerant, or other feature specifies multiple enabling capabilities, only one such capability needs to be declared to use the feature." However, one capability may be preferred over another. One important case is the Shader capability, which may not be supported by a backend but is always inserted if "OpDecorate SpecId" is found, because the Enabling Capabilities for the latter are Shader and Kernel, where Shader comes first and is thus always selected as the first available option. In this PR we address the problem by keeping the current behaviour of selecting the first option among enabling capabilities as is, but giving a user a way to filter capabilities during the selection process via a newly introduced "--avoid-spirv-capabilities" command line option. This option avoids selecting certain capabilities if other enabling capabilities are available. This PR also changes the existing pruneCapabilities() function. It no longer removes a capability from the module requirements, but only adds implicitly required capabilities recursively, so its name is changed accordingly. This change fixes the present bug in collecting the capabilities required by a module. Before this PR, pruneCapabilities() had been removing, for example, the Kernel capability from those required by a module, because Kernel is initially required and the second time it was needed pruneCapabilities() removed it by mistake.	2024-02-15 11:28:58 +01:00
chuongg3	f6f8e202f5	[AArch64][GlobalISel] Refactor Combine G_CONCAT_VECTOR (#80866) The combine now works using tablegen and checks if the new instruction is legal before creating it.	2024-02-15 10:09:20 +00:00
Luke Lau	cd55e230e6	[RISCV] Use $noreg in vsetvli-insert.mir test. NFC This reflects what actually comes out of SelectionDAG after the noreg passthru peephole added in a63bd7e99b00c.	2024-02-15 17:12:52 +08:00
Rohit Aggarwal	36adfec155	Adding support of AMDLIBM vector library (#78560) Hi, AMD has its own implementation of vector calls. This patch includes the changes to enable the use of AMD's math library using -fveclib=AMDLIBM. Please refer to https://github.com/amd/aocl-libm-ose --------- Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>	2024-02-15 12:13:07 +05:30
Philip Reames	8d32654292	[RISCV] Add coverage for an upcoming set of vector narrowing changes	2024-02-14 13:34:08 -08:00
YunQiang Su	c007fbb198	MipsAsmParser/O32: Don't add redundant $ to $-prefixed symbol in the la macro (#80644) When parsing the `la` macro, we add a duplicate `$` prefix in `getOrCreateSymbol`, leading to `error: Undefined temporary symbol $$yy` for code like: `xx: la $2,$yy` / `$yy: nop`. Remove the duplicate prefix. In addition, recognize `.L`-prefixed symbols as local for O32. See: #65020. --------- Co-authored-by: Fangrui Song <i@maskray.me>	2024-02-14 12:48:55 -08:00
sgundapa	de16a05af0	[Hexagon] Fix zero extension of bit predicates with vtrunehb (#81772) vector extension from v4i1 to v4i8 generates an incorrect word. This patch uses a vtrunehb for truncation to fix the bug.	2024-02-14 13:10:18 -06:00
Daniel Hoekwater	ea06384bf6	[CodeGen][AArch64] Only split safe blocks in BBSections (#81553) Some types of machine function and machine basic block are unsafe to split on AArch64: basic blocks that contain jump table dispatch or targets (D157124), and blocks that contain inline ASM GOTO blocks or their targets (D158647) all cause issues and have been excluded from Machine Function Splitting on AArch64. These issues are caused by any transformation pass that places same-function basic blocks in different text sections (MachineFunctionSplitter and BasicBlockSections) and must be special-cased in both passes.	2024-02-14 10:58:07 -08:00
Philip Reames	275eeda32f	[RISCV] Split long build_vector sequences to reduce critical path (#81312) If we have a long chain of vslide1down instructions to build e.g. a <16 x i8> from scalars, we end up with a critical path going through the entire chain. We can instead build two halves, and then combine them with a vselect. This costs one additional temporary register, but reduces the critical path by roughly half. To avoid needing to change VL, we fill each half with undefs for the elements which will come from the other half. The vselect will at worst become a vmerge, but is often folded back into the final instruction of the sequence building the lower half. A couple of notes on the heuristic here: * This is restricted to LMUL1 to avoid quadratic costing reasoning. * This only splits once. In future work, we can explore recursive splitting here, but I'm a bit worried about register pressure and thus decided to be conservative. It also happens to be "enough" at the default zvl of 128. * "8" is picked somewhat arbitrarily as being "long". In practice, our build_vector codegen for 2 defined elements in a VL=4 vector appears to need some work. 4 defined elements in a VL=8 vector seems to generally produce reasonable results. * Halves may not be an optimal split point. I went down the rabbit hole of trying to find the optimal one, and decided it wasn't worth the effort to start with. --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>	2024-02-14 10:15:24 -08:00
Pierre van Houtryve	43c7eb5d7b	[AMDGPU] Replace '.' with '-' in generic target names (#81718) The dot is too confusing for tools. Output temporaries would have '10.3-generic', so tools could parse it as an extension; device libs and the associated clang driver logic are also confused by the dot. After discussions, we decided it's better to just remove the '.' from the target name than to fix each issue one by one.	2024-02-14 15:19:04 +01:00
David Green	6c84709eff	[AArch64] Materialize constants via fneg. (#80641) This is already done as a special case for copysign; this patch extends it to apply more generally. If we are trying to materialize a negative constant (notably -0.0, 0x80000000), then there may be no movi encoding that creates the immediate, but a fneg(movi) might. Some of the existing patterns for RADDHN needed to be adjusted to keep them in line with the new immediates.	2024-02-14 13:55:51 +00:00
Philipp Tomsich	6cab375b4b	[AArch64] Add tests for fusion on Ampere1/1A/1B (#81725) As commented on PR #81293, the Ampere1 family does not have test cases for the common fusion cases it implements. This adds the Ampere1 targets to the relevant misched-fusion test cases: addadrp, addr, and aes.	2024-02-14 13:05:22 +01:00
Simon Pilgrim	f82e0809ba	[X86] Add v8i64/v16i32/v16i64 ctpop reduction test coverage Add test coverage for types wider than legal	2024-02-14 10:54:23 +00:00
Luke Lau	0fee2115bb	[RISCV] Remove -riscv-v-fixed-length-vector-lmul-max from tests. NFC (#78299) Some fixed vector tests in test/CodeGen/RISCV/rvv have multiple run lines that check various configurations of -riscv-v-fixed-length-vector-lmul-max. From what I understand this flag was introduced in the early days of fixed length vector support, but now that fixed vector codegen has matured I'm not sure if it's as relevant today. This patch proposes to remove the various lmul-max run lines from the tests to make them more readable, and any changes to fixed vector codegen easier to review. We have removed them before for the same reason, so this would take care of the remaining test cases: https://reviews.llvm.org/D157973#4593268 (I don't have any strong motivation to remove the actual flag itself, my own personal motivation is just to clean up the tests)	2024-02-14 16:12:37 +08:00
Craig Topper	86ce491f30	[DAGCombiner] Remove unneeded commonAlignment from reduceLoadWidth. (#81707) We already have the PtrOff factored into MachinePointerInfo. Any calls to getAlign on the new load will do commonAlignment with the MachinePointerInfo offset and the base alignment.	2024-02-13 23:26:25 -08:00
Jeffrey Byrnes	7180c23cf6	[SeparateConstOffsetFromGEP] Reland: Reorder trivial GEP chains to separate constants (#81671) Actually update tests w.r.t `9e5a77f252` and reland https://github.com/llvm/llvm-project/pull/73056	2024-02-13 17:10:23 -08:00
Pranav Kant	21630efb5a	[X86][CodeGen] Restrict F128 lowering to GNU environment (#81664) Otherwise, it breaks some environments, like X64 Android, that don't have f128 functions available in their libc. Followup to #79611.	2024-02-13 16:39:59 -08:00
Craig Topper	0de2b26942	[RISCV] Register fixed stack slots for callee saved registers for -msave-restore/Zcmp (#81392) PEI previously used fake frame indices for these callee saved registers. These fake frame indices are not registered with MachineFrameInfo. This required them to be deleted from CalleeSavedInfo after PEI to avoid breaking later passes. See #79535. Unfortunately, removing the registers from CalleeSavedInfo pessimizes Interprocedural Register Allocation. The RegUsageInfoCollector pass runs after PEI and uses CalleeSavedInfo. This patch replaces #79535 by properly creating fixed stack objects through MachineFrameInfo. This changes the stack size and offsets returned by MachineFrameInfo, which requires changes to how RISCVFrameLowering uses that information. In addition to the individual object for each register, I've also created a single large fixed object that covers the entire stack area covered by cm.push or the libcalls. cm.push must always push a multiple of 16 bytes and the save restore libcall pushes a multiple of stack align. I think this leaves holes in the stack where we could spill other registers, but it matches what we did previously. Maybe we can optimize this in the future. The only test changes are due to stack alignment handling after the callee save registers. Since we now have the fixed objects on the stack, the offset is non-zero when an aligned object is processed, so the offset gets rounded up, increasing the stack size. I suspect we might need some more updates for RVV related code. There is very little or maybe even no testing of RVV mixed with Zcmp and save-restore.	2024-02-13 14:59:28 -08:00
Heejin Ahn	473ef10b0f	[WebAssembly] Demote PHIs in catchswitch BB only (#81570) The `DemoteCatchSwitchPHIOnly` option in the `WinEHPrepare` pass was added in `99d60e0dab`, because Wasm EH uses `WinEHPrepare`, but it doesn't need to demote all PHIs. PHIs in `catchswitch` BBs have to be removed (= demoted) because `catchswitch`s are removed in ISel and `catchswitch` BBs are removed as well, so they can't have other instructions. But because Wasm EH doesn't use funclets, PHIs in `catchpad` or `cleanuppad` BBs don't need to be demoted. That was the reason the `DemoteCatchSwitchPHIOnly` option was added, in order not to demote more instructions unnecessarily. The problem is it should have been set to `true` for Wasm EH. (Its default value is `false` for WinEH.) And I mistakenly set it to `false` and wasn't aware of this for more than 5 years. This was not the end of the world; it just means we've been demoting more instructions than we should, possibly hurting code size. In practice I think it would've had hardly any effect on real performance, given that PHIs in `catchpad` or `cleanuppad` BBs do not occur very frequently and many people run other optimizers like Binaryen anyway.	2024-02-13 13:43:21 -08:00
Philip Reames	99c5a66c62	Revert "[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056)" and follow ups "ninja check-llvm" is failing on tip of tree. This reverts commit ec0aa1646e9953d1a8d0d15dc381d3250c854572. This reverts commit 1b65742f8c71f576381fe85d5e34579b24f2d874.	2024-02-13 13:29:23 -08:00
James Y Knight	c1a99b2c77	[Sparc] limit MaxAtomicSizeInBitsSupported to 32 for 32-bit Sparc. (#81655) When in 32-bit mode, the backend doesn't currently implement 64-bit atomics, even though the hardware is capable if you have specified a V9 CPU. Thus, limit the width to 32-bit, for now, leaving behind a TODO. This fixes a regression triggered by PR #73176.	2024-02-13 15:40:51 -05:00
Jeffrey Byrnes	1b65742f8c	[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056) In this case, a trivial GEP chain has the form: `%ptr = getelementptr sameType, %base, constant` / `%val = getelementptr sameType, %ptr, %variable`. That is, a one-index GEP consumes another (of the same basis and result type) one-index GEP, where the inner GEP uses a constant index and the outer GEP uses a variable index. For chains of this type, it is trivial to reorder them (by simply swapping the indexes). The result of doing so is better AddrMode matching for users of the ultimate ptr produced by the GEP chain. Future patches can extend this to support non-trivial GEP chains (e.g. those with different basis types and/or multiple indices).	2024-02-13 11:22:49 -08:00
Danila Malyutin	e20462a069	[StatepointLowering] Use Constant instead of TargetConstant for undef value (#81635) Prevents isel errors when trying to lower a gc relocate of an undef value (which turns into CopyToReg of TargetConstant). Such relocates may occur after DCE (e.g. after GVN removes some dead blocks) if there are no passes like instcombine scheduled afterwards to clean them up. Fixes #80294 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2024-02-13 21:58:01 +03:00
Craig Topper	7d40ea85d5	[RISCV] Enable the TypePromotion pass from AArch64/ARM. This pass looks for unsigned icmps that have illegal types and tries to widen the use/def graph to improve the placement of the zero extends that type legalization would need to insert. I've explicitly disabled it for i32 by adding a check for isSExtCheaperThanZExt to the pass. The generated code isn't perfect, but my data shows a net dynamic instruction count improvement on spec2017 for both base and Zba+Zbb+Zbs.	2024-02-13 09:57:48 -08:00
Craig Topper	9838c8512b	[RISCV] Copy typepromotion-overflow.ll from AArch64. NFC	2024-02-13 09:57:48 -08:00
Joseph Huber	11fcae69db	[LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (#81331) Summary: This patch adds a new intrinsic and builtin function mirroring the existing `__builtin_readcyclecounter`. The difference is that this implementation targets a separate counter that some targets have, which returns a fixed frequency clock that can be used to determine elapsed time; this is different from the cycle counter, which often has variable frequency. This patch only adds support for the NVPTX and AMDGPU targets. This is done as a new and separate builtin rather than an argument to `readcyclecounter` to avoid needing to change existing code and to make the separation more explicit.	2024-02-13 10:06:25 -06:00
Nikita Popov	25b9ed6e49	[DAGCombine] Fix multi-use miscompile in load combine (#81586) The load combine replaces a number of original loads with one new load and also replaces the output chains of the original loads with the output chain of the new load. This is incorrect if the original load is retained (due to multi-use), as it may get incorrectly reordered. Fix this by using makeEquivalentMemoryOrdering() instead, which will create a TokenFactor with both chains. Fixes https://github.com/llvm/llvm-project/issues/80911.	2024-02-13 16:41:00 +01:00


52796 Commits
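Illustrative sketch for bb91b43719 (repeated blend mask): a Python model, assuming that a v16i16 X86ISD::BLENDI behaves like the AVX2 `vpblendw` on 256-bit vectors, where one 8-bit immediate is reused for both 128-bit lanes. Under that assumption, the immediate must be repeated to get one mask bit per 16-bit element before the two blends can be merged into a single 32-bit AVX512BW select mask. All function names here are hypothetical, not LLVM APIs.

```python
def blendi_v16i16(x, y, imm8):
    # Assumed v16i16 word blend: the 8-bit immediate is reused for both
    # 128-bit lanes, so element i comes from y when bit (i % 8) is set.
    imm8 &= 0xFF
    return [y[i] if (imm8 >> (i % 8)) & 1 else x[i] for i in range(16)]

def repeated_mask(imm8):
    # Widen the 8-bit immediate to one bit per 16-bit element.
    imm8 &= 0xFF
    return imm8 | (imm8 << 8)

def concat_mask(c0, c1):
    # 32-bit select mask for concat(blendi(x,y,c0), blendi(z,w,c1)).
    return repeated_mask(c0) | (repeated_mask(c1) << 16)

def mask_select(mask, a, b):
    # Per-element select: b[i] when bit i of mask is set, else a[i].
    return [b[i] if (mask >> i) & 1 else a[i] for i in range(len(a))]
```

With c0 = -86 (0xAA), the repeated mask is 0xAAAA; using 0xAA unrepeated would make elements 9, 11, 13 and 15 of the first half select the wrong source.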
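Illustrative sketch for 3a77522387 (fcopysign lowering): the "and/or and a mask" idea written in Python on IEEE-754 binary32 bit patterns. The result keeps every bit of x except the sign and takes only the sign bit from y, a bitwise select that the commit says is now converted to BIF/BIT/BSP. Helper names are hypothetical; this models the arithmetic, not the GlobalISel lowering code.

```python
import struct

SIGN_MASK = 0x8000_0000  # sign bit of a binary32 value
MAG_MASK = 0x7FFF_FFFF   # exponent and mantissa bits

def f32_to_bits(f: float) -> int:
    return struct.unpack("<I", struct.pack("<f", f))[0]

def bits_to_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def copysign_and_or(x: float, y: float) -> float:
    # Magnitude bits from x, sign bit from y.
    return bits_to_f32((f32_to_bits(x) & MAG_MASK) | (f32_to_bits(y) & SIGN_MASK))

def copysign_vector(xs, ys):
    # Lane-by-lane version, as a vector and/or with a splatted mask would compute.
    return [copysign_and_or(x, y) for x, y in zip(xs, ys)]
```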
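Illustrative sketch for db836f6094 (narrow vector absolute value): an exhaustive Python check that abs(sext a) equals zext(abs a), using an abs that does not treat INT_MIN as poison (the commit notes SDAG has no IntMinIsPoison flag yet). INT_MIN is the interesting case: abs on i8 maps 0x80 back to 0x80, and zero-extending that gives 128, which is exactly abs of the sign-extended value. Helper names are hypothetical.

```python
def to_signed(v: int, bits: int) -> int:
    # Interpret the low `bits` bits of v as a two's-complement integer.
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v

def sext(v: int, src: int, dst: int) -> int:
    return to_signed(v, src) & ((1 << dst) - 1)

def zext(v: int, src: int) -> int:
    return v & ((1 << src) - 1)

def abs_wrap(v: int, bits: int) -> int:
    # abs without IntMinIsPoison: INT_MIN wraps back to itself.
    s = to_signed(v, bits)
    return (-s if s < 0 else s) & ((1 << bits) - 1)

def narrowing_holds(src: int = 8, dst: int = 16) -> bool:
    # Compare abs(sext a) with zext(abs a) for every src-bit value a.
    return all(abs_wrap(sext(a, src, dst), dst) == zext(abs_wrap(a, src), src)
               for a in range(1 << src))
```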
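Illustrative sketch for 0e6a48c3e8 (Hexagon post-increment): a Python model of the rewrite quoted in the commit message, checking that both forms load the same values and leave the same updated pointer. `post_load(mem, addr, inc)` and `load(mem, addr, offset)` are hypothetical helpers, not Hexagon APIs. In the rewritten form both loads address off v1, so the second load no longer waits for the first one's pointer update, which is what allows the two to be packetized together.

```python
def post_load(mem, addr, inc):
    # Load from addr, then return the post-incremented address.
    return mem[addr], addr + inc

def load(mem, addr, offset):
    # Plain base+offset load; the base register is not updated.
    return mem[addr + offset]

def original(mem, v1):
    v2, v3 = post_load(mem, v1, 4)
    v2p, v3p = post_load(mem, v3, 4)  # depends on v3 from the first load
    return v2, v2p, v3p

def optimized(mem, v1):
    v2, v3p = post_load(mem, v1, 8)   # one increment of 8
    v2p = load(mem, v1, 4)            # addresses off v1, independent of v3p
    return v2, v2p, v3p
```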
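Illustrative sketch for 2f083b364f (STP resource length): a toy Python model of the resource check, not the code in AArch64StorePairSuppress. It assumes each instruction uses one abstract unit of each resource listed, that an STP of s/d registers also occupies a vector unit (as the commit describes for some uArchs), and that the critical resource length is simply the usage of the busiest unit. The `block` counter is assumed to already include the two STRs that the STP would replace; the unit names are hypothetical.

```python
from collections import Counter

# Hypothetical per-instruction resource usage.
STP = Counter({"ldst": 1, "vec": 1})  # STP s/d also occupies a vector unit
STR = Counter({"ldst": 1})

def resource_length(usage: Counter) -> int:
    # Simplified critical resource length: usage of the busiest unit.
    return max(usage.values(), default=0)

def allow_stp(block: Counter, remove_paired_strs: bool) -> bool:
    # Form the STP only if it does not lengthen the critical resource.
    after = block + STP
    if remove_paired_strs:
        after = after - STR - STR  # the STP replaces two single stores
    return resource_length(after) <= resource_length(block)
```

When load/store units are critical, the old computation (without removing the STRs) suppresses the STP while the adjusted one allows it; when the vector units are critical, the STP is still suppressed.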
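Illustrative sketch for b279ca2783 (visitCTPOP): an exhaustive 8-bit Python check of CTPOP(SHIFT(X)) -> CTPOP(X) when the logical shift drops no set bits. The DAG combine reasons about known-zero bits; this sketch uses the concrete value of X instead, an assumption made to keep the check simple. `WIDTH` and the helper names are hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def ctpop(x: int) -> int:
    return bin(x & MASK).count("1")

def lost_bits(amt: int, left: bool) -> int:
    # Bits that a logical shift by `amt` pushes out of the WIDTH-bit value.
    if left:
        return (MASK << (WIDTH - amt)) & MASK
    return (1 << amt) - 1

def fold_ctpop_shift(x: int, amt: int, left: bool) -> int:
    # CTPOP(SHIFT(X)) -> CTPOP(X) when every shifted-out bit of X is zero.
    if (x & lost_bits(amt, left)) == 0:
        return ctpop(x)
    shifted = (x << amt) & MASK if left else x >> amt
    return ctpop(shifted)
```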
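Illustrative sketch for 275eeda32f (splitting long build_vector sequences): a Python model in which `vslide1down` moves every lane down by one and inserts the scalar at the top lane, and `None` stands for an undef lane. It shows that building each half separately, with undef in the lanes owned by the other half, and then merging with a lane select gives the same vector as one long serial chain. The two chains do not depend on each other, which is where the shorter critical path comes from; instruction selection, VL handling and the LMUL1 restriction are not modeled, and the helper names are hypothetical.

```python
def vslide1down(vec, scalar):
    # Every lane moves down by one; the scalar enters at the top lane.
    return vec[1:] + [scalar]

def build_chain(elems, vl):
    # Serial chain: each step depends on the previous one.
    vec = [None] * vl  # None stands for an undef lane
    for e in elems:
        vec = vslide1down(vec, e)
    return vec

def build_split(elems):
    vl = len(elems)
    half = vl // 2
    # Two independent chains; lanes owned by the other half stay undef.
    lo = build_chain(elems[:half] + [None] * (vl - half), vl)
    hi = build_chain(elems[half:], vl)
    # vselect/vmerge: low lanes from lo, high lanes from hi.
    return [lo[i] if i < half else hi[i] for i in range(vl)]
```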
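Illustrative sketch for 1b65742f8c (reordering trivial GEP chains): a Python model of the index swap. A GEP is represented as a hypothetical `GEP(elem_size, base, index)` record whose address is `base + index * elem_size`; an integer index is a constant and a string names a variable. Swapping the indices leaves the final address unchanged but moves the constant to the outermost GEP, where it can fold into the user's addressing mode.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass(frozen=True)
class GEP:
    elem_size: int           # byte size of the shared element type
    base: Union[str, "GEP"]  # a named pointer or another GEP
    index: Union[int, str]   # int: constant index, str: variable name

def address(g: Union[str, GEP], env: Dict[str, int]) -> int:
    # Evaluate a pointer expression given concrete values for names.
    if isinstance(g, str):
        return env[g]
    idx = g.index if isinstance(g.index, int) else env[g.index]
    return address(g.base, env) + idx * g.elem_size

def reorder_trivial_chain(outer: GEP) -> GEP:
    # gep(gep(base, C), v) -> gep(gep(base, v), C) when both use the same
    # element type, the inner index is constant and the outer one is variable.
    inner = outer.base
    if (isinstance(inner, GEP)
            and inner.elem_size == outer.elem_size
            and isinstance(inner.index, int)
            and isinstance(outer.index, str)):
        return GEP(outer.elem_size, GEP(inner.elem_size, inner.base, outer.index), inner.index)
    return outer
```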