llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	a277dd82d8	[X86] vector-half-conversions.ll - add v4f16->v4i32 fptosi/fptoui test coverage	2024-03-22 13:54:42 +00:00
Simon Pilgrim	74c3150ffc	[X86] Add shuffle tests from Issue #86076 SLP should be doing a better job, but both shuffles lower to poorer codegen than necessary	2024-03-22 11:23:36 +00:00
Farzon Lotfi	79c32eb03d	[DXIL] Add lowerings for cosine and floor (#86173 ) Completes #86170 Completes #86172 - `DXIL.td` - Add changes to lower the cosine and floor intrinsics to dxilOps.	2024-03-22 07:02:47 -04:00
Farzon Lotfi	d8e5c0b4e5	[DXIL] Complete abs lowering (#86158 ) This change completes #86155 - `DXIL.td` - lowering `fabs` intrinsic to the float dxil op. - `DXILIntrinsicExpansion.cpp` - Add intrinsic expansion for the abs case.	2024-03-22 07:01:01 -04:00
XChy	cb4453dc69	[SelectionDAG] Prevent combination on inconsistent type in `combineCarryDiamond` (#84888 ) Fixes #84831. When matching a carry pattern with `getAsCarry`, it may produce a different type of carryout. This patch checks for that case and exits early. I'm new to DAG, any suggestion is appreciated.	2024-03-22 16:05:20 +05:30
David Green	99d8c25b31	[AArch64] Extra tests for v2i8 concat loads. NFC	2024-03-22 09:55:18 +00:00
Chen Zheng	90454a6098	[PowerPC][AIX] support explicit sections for -ffunction-sections (#85351 ) Fix crashes in https://godbolt.org/z/6voEa1o6Y	2024-03-22 13:23:36 +08:00
Pravin Jagtap	e1a8120a63	[AMDGPU] Support double type in atomic optimizer. (#84307 ) Presently the atomic optimizer supports only 32-bit operations. Plan is to extend the atomic optimizer for 64-bit operations for compute and graphics. This patch extends support for double type for `uniform values` only. Going forward, will extend the support for divergent values. Adding support for divergent values requires extending/legalizing readfirstlane, readlane, writelane, etc ops for 64-bit operations to avoid `bitcast` noise that we have currently. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-22 09:25:06 +05:30
Craig Topper	c67ed2f1e1	[SelectionDAG][RISCV] Use TypeSize version of ComputeValueVTs in TargetLowering::LowerCallTo. (#86166 ) This is needed to support non-intrinsic functions returning tuple types which are represented as structs with scalable vector types in IR. I suspect this may have been broken since https://reviews.llvm.org/D158115	2024-03-21 20:35:08 -07:00
Freddy Ye	3e4caa9da4	[X86] Support DomainReassignment for APX NDD instructions (#85737 )	2024-03-22 08:52:40 +08:00
paperchalice	a2dfc9ac7d	[NewPM][AMDGPU] Add AMDGPUPassRegistry.def (#86095 ) Move the pass registry to a separate file, prepare for porting dag-isel.	2024-03-22 08:49:29 +08:00
Jonas Paulsson	7564566779	Reapply "Move assertion for AdjustsStack from PEI to MachineVerifier (#85698 )" - The check is now actually done in both PEI and the MachineVerifier. - More .mir tests trivially updated with "adjustsStack: true" as needed.	2024-03-21 20:24:57 -04:00
Luke Lau	51d5b65819	[RISCV] Handle scalable ops with < EEW / 2 narrow types in combineBinOp_VLToVWBinOp_VL (#84158 ) We can remove the restriction that the narrow type needs to be exactly EEW / 2 for scalable ISD::{ADD,SUB,MUL} nodes. This allows us to perform the combine even if we can't fully fold the extend into the widening op. VP intrinsics already do this, since they are lowered to _VL nodes which don't have this restriction. The "exactly EEW / 2" narrow type restriction prevented us from emitting V{S,Z}EXT_VL nodes with i1 element types which crash when we try to select them, since no other legal type is double the size of i1, see the test case added in this PR `i1_zext`. So to preserve this, this adds a check for i1 narrow types instead.	2024-03-22 07:26:29 +08:00
Luke Lau	06d245242e	[RISCV] Recursively split concat_vector into smaller LMULs when lowering (#85825 ) This is a reimplementation of the combine added in #83035 but as a lowering instead of a combine, so we don't regress the test case added in e59f120e3a14ccdc55fcb7be996efaa768daabe0 by interfering with the strided load combine. Previously the combine had to concatenate the split vectors with insert_subvector instead of concat_vectors to prevent an infinite combine loop. The reasoning behind keeping it as a combine was that if we emitted the insert_subvector during lowering then we didn't fold away inserts of undef subvectors. However, it turns out we can avoid this if we just do this in lowering and select a concat_vectors directly, since we get the undef folding for free with `DAG.getNode(ISD::CONCAT_VECTORS, ...)` via foldCONCAT_VECTORS.	2024-03-22 07:08:51 +08:00
Simon Pilgrim	3218570620	[X86] Add shuffle test case for Issue #86068	2024-03-21 17:56:58 +00:00
Craig Topper	f5c90f3000	[RISCV] Use BuildPairF64 and SplitF64 for bitcast i64<->f64 on rv32 regardless of Zfa. (#85982 ) Previously we used BuildPairF64 and SplitF64 only if Zfa was supported since they will select register file moves that are only available with Zfa. We recently changed the handling of BuildPairF64/SplitF64 for Zdinx to not go through memory so we should use that for bitcast. That leaves the D without Zfa case that does need to go through memory. Previously we let type legalization expand to loads and stores using a new stack temporary created for each bitcast. After this patch we will create the loads and stores in the custom inserter and share the same stack slot for all. This also allows DAGCombiner to optimize when bitcast is mixed with BuildPairF64/SplitF64.	2024-03-21 08:52:51 -07:00
Craig Topper	7678e6e562	[RISCV] Lower the alignment requirement for a GPR pair spill for Zdinx on RV32. (#85871 ) I believe we can use XLen alignment as long as eliminateFrameIndex limits the maximum folded offset to 2043. This way when we split the load/store into two instructions we'll be able to add 4 without overflowing simm12.	2024-03-21 08:14:48 -07:00
Jonas Paulsson	b4b5e8277a	Check for all frame instructions in finalize isel. (#85945 ) Check for all frame instructions in finalize isel, not just for the frame setup opcode. This was proven necessary, see #78001 for discussion.	2024-03-21 11:00:08 -04:00
David Green	686f4599cf	[ARM] Regenerate some check lines. NFC	2024-03-21 13:45:44 +00:00
SahilPatidar	3ac243bc0d	Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 (#81394 ) Resolve #78226	2024-03-21 16:52:08 +05:30
AtariDreams	7e72cafd68	[SelectionDAG] Add MaskedValueIsZero check to allow folding of zero extended variables we know are safe to extend (#85573 ) Add ones for every high bit that will be cleared. This will allow us to evaluate variables that have their bits known to see if they have no risk of overflow despite the shift amount being greater than the difference between the two types.	2024-03-21 16:45:17 +05:30
Pierre van Houtryve	95a834a16c	(Reland) [AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#85626 ) Reland of #75333	2024-03-21 11:44:47 +01:00
Pierre van Houtryve	ccb3a8feaa	[AMDGPU][LowerModuleLDS] Refactor partially lowered module detection (#85793 ) Refactor the logic that checks if a module contains mixed absolute/non-lowered LDS GVs. The check now happens later when the "worklists" are formed. This is because in some cases (OpenMP) we can have non-lowered GVs in a lowered module, and this is normal because those GVs are just unused and removed from the list at some point before the end of `getUsesOfLDSByFunction`. Doing the check later ensures that if a mixed module is spotted, then it's a _real_ mixed module that needs rejection, not a module containing an intentionally ignored GV.	2024-03-21 11:28:35 +01:00
Matt Arsenault	b6b703b2df	AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948 ) SIMachineFunctionInfo has a scan of the function body for inline asm which may use AGPRs, or callees in SIMachineFunctionInfo. Move this into the attributor, so it actually works interprocedurally. Could probably avoid most of the test churn if this bothered to avoid adding this on subtargets without AGPRs. We should also probably try to delete the MIR scan in usesAGPRs but it seems to be trickier to eliminate.	2024-03-21 14:24:06 +05:30
Madhur Amilkanthwar	7bb87d5338	[AArch64][GlobalISel] Take abs scalar codegen closer to SDAG (#84886 ) This patch improves codegen for the scalar (< 128 bits) version of the llvm.abs intrinsic by using the existing non-XOR based lowering. This takes the generated code closer to SDAG. Codegen with GISel for > 128 bit types is not very good with this method, so it is not applied to them.	2024-03-21 09:54:03 +05:30
Thorsten Schütt	deefe3fbc9	[GlobalIsel] Post-review combine ADDO (#85961 ) https://github.com/llvm/llvm-project/pull/82927	2024-03-21 03:56:40 +01:00
Freddy Ye	07a5e31cb3	Move pre-commit test for #85737 (#86062 )	2024-03-21 10:55:26 +08:00
Freddy Ye	35a66f965c	Precommit test for #85737 (#86056 ) Copied from llvm/test/CodeGen/X86/domain-reassignment.mir	2024-03-21 10:19:28 +08:00
Paul Kirth	f6f474c4ef	[llvm][lld] Pre-commit tests for RISCV TLSDESC symbols Currently, we mistakenly mark the local labels used in RISC-V TLSDESC as TLS symbols, when they should not be. This patch adds tests with the current incorrect behavior, and subsequent patches will address the issue. Reviewers: MaskRay, topperc Reviewed By: MaskRay Pull Request: https://github.com/llvm/llvm-project/pull/85816	2024-03-20 13:39:39 -07:00
S. Bharadwaj Yadavalli	3f39571228	[DirectX][DXIL] Distinguish return type for overload type resolution. (#85646 ) Return type of DXIL Ops may be different from valid overload type of the parameters, if any. Such DXIL Ops are correctly represented in DXIL.td. However, DXILEmitter assumes the return type to be the same as parameter overload type, if one exists. This results in generation of an incorrect overload index value in DXILOperation.inc for the DXIL Op and an incorrect DXIL operation function call in DXILOpLowering pass. This change distinguishes return types correctly from parameter overload types in DXILEmitter backend to handle such DXIL ops. Add specification for DXIL Op `isinf` and corresponding tests to verify the above change. Fixes issue #85125	2024-03-20 14:48:16 -04:00
Craig Topper	891172d9be	[RISCV] Use 'riscv-isa' module flag to set ELF flags and attributes. (#85155 ) Walk all the ISA strings and set the subtarget bits for any extension we find in any string. This allows LTO output to have ELF attributes from the union of all of the files used to compile it.	2024-03-20 11:35:19 -07:00
Vyacheslav Levytskyy	c2483ed52d	[SPIRV] Add __spirv_ builtins for existing instructions (#85654 ) This PR: * adds __spirv_ builtins for existing instructions; * fixes parsing of "syncscope" values in atomic instructions; * fixes a special case of binary header emission.	2024-03-20 19:28:29 +01:00
Vyacheslav Levytskyy	949d70d5e0	[SPIR-V] Fix incorrect bitwise instructions applied to the bool type (#85929 ) This PR ensures that LLVM IR bitwise instructions result in logical SPIR-V instructions when applied to i1 type.	2024-03-20 19:23:12 +01:00
Jonas Paulsson	9ebd329ad8	Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698 )" This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92. Reverting due to verifier complaints with expensive checks on build-bot.	2024-03-20 11:48:30 -04:00
Craig Topper	576d81baa5	[RISCV] Use REG_SEQUENCE/EXTRACT_SUBREG to move between individual GPRs and GPRPair. (#85887 ) Previously we used memory like we do to move between GPRs and FPR64 with the D extension on RV32. We can instead use REG_SEQUENCE/EXTRACT_SUBREG to inform register allocation how to do the copy without memory.	2024-03-20 08:44:24 -07:00
Thomas Lively	767e0c8bce	[WebAssembly] Select BUILD_VECTOR with large unsigned lane values (#85880 ) Previously we expected lane constants to be in the range of signed values for each lane size, but the included test case produced large unsigned values that fall outside that range. Allow instruction selection to proceed in this case rather than failing. Fixes #63817.	2024-03-20 08:42:42 -07:00
Neumann Hon	5fb2797f23	[GOFF][z/OS] Change PrivateGlobalPrefix and PrivateLabelPrefix to be L# (#85730 ) The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@ and @ respectively) are, in hindsight, poor choices for multiple reasons: First, there exist externally visible routines from the language environment that begin with @@. These functions are certainly not local/private by any means and they should not share a prefix with private globals. Secondly, both private globals and private labels should be handled the same way by GOFF, so it doesn't make much sense for them to have separate prefixes. GOFF remains the only file format where these are different and there is no reason for that to be the case	2024-03-20 10:30:30 -04:00
Jonas Paulsson	05bde30585	Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698 ) Have the verifier report a missing AdjustsStack flag rather than waiting until PEI asserts.	2024-03-20 10:29:12 -04:00
Benjamin Kramer	5f5a64134b	Revert "[DAGCombiner] Simplifying `{si|ui}tofp` when only signbit is needed" This reverts commit 353fbeb0a294d2c7cef6d88607fa0fd50ee81462. It crashes when it encounters an UINT_TO_FP. llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1618 in SDValue llvm::SelectionDAG::getConstant(const ConstantInt &, const SDLoc &, EVT, bool, bool): VT.isInteger() && "Cannot create FP integer constant!"	2024-03-20 15:08:37 +01:00
Pravin Jagtap	e52a687871	[AMDGPU][NFC] Test clean up (#85922 ) Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-20 17:29:42 +05:30
Pravin Jagtap	070d1e8321	[AMDGPU] Add test for fpext & fptrunc with bf16. (#85909 ) Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-20 14:45:38 +05:30
YunQiang Su	d7e28cd82b	MIPS: Support -m(no-)unaligned-access for r6 (#85174 ) The MIPSr6 ISA requires normal load/store instructions to support misaligned memory access, but hardware does not always do so. On some microarchitectures, or in some corner cases, it may need support from the OS. Don't confuse this with the pre-R6 lwl/lwr family: MIPSr6 doesn't support them; instead, r6 requires the lw instruction to support misaligned memory access. So, if -mstrict-align is used for pre-R6, lwl/lwr won't be disabled. If -mstrict-align is used for r6 and the access is not well aligned, some lb/lh instructions will be used to replace lw. This is useful for OS kernels. To be backward-compatible with GCC, -m(no-)unaligned-access are also added as Neg-Alias of -m(no-)strict-align.	2024-03-20 14:18:24 +08:00
Peter Rong	4a026b5092	[AMDGCN] Use ZExt when handling indices in insertment element (#85718 ) When i1 true is used as an index, SExt extends it to i32 -1. This would cause BitVector to overflow. The language reference specifies that the index shall be treated as an unsigned number; this patch fixes that. (https://llvm.org/docs/LangRef.html#insertelement-instruction) This patch fixes #85717 --------- Signed-off-by: Peter Rong <PeterRong96@gmail.com>	2024-03-19 21:44:08 -07:00
Jiahan Xie	4bf06bebb9	[GISEL][RISCV] IRTranslator for scalable vector load (#80006 ) Add IRTranslator for scalable vector load instruction and include corresponding tests with alignment argument included, which can be smaller/equal/larger than element size or smaller/equal/larger than the minimum total vector size.	2024-03-19 20:12:26 -04:00
Alex MacLean	888e284903	[NVPTX] Use PTX prmt for llvm.bswap (#85545 )	2024-03-19 15:18:53 -07:00
Noah Goldstein	353fbeb0a2	[DAGCombiner] Simplifying `{si|ui}tofp` when only signbit is needed If we only need the signbit, `uitofp` simplifies to 0 and `sitofp` simplifies to `bitcast`. Closes #85138	2024-03-19 17:17:35 -05:00
Noah Goldstein	ebd1379663	[DAGCombiner] Add tests for simplifying `{si|ui}tofp`; NFC	2024-03-19 17:17:35 -05:00
quic-areg	31f4b329c8	[Hexagon] ELF attributes for Hexagon (#85359 ) Defines a subset of attributes and emits them to a section called .hexagon.attributes. The current attributes recorded are the attributes needed by llvm-objdump to automatically determine target features and eliminate the need to manually pass features.	2024-03-19 16:22:30 -05:00
Simon Pilgrim	2377b9773d	[DAG] SimplifyShift - shift i1/vXi1 X, Y --> X (any non-zero shift amount is undefined). Alive2: https://alive2.llvm.org/ce/z/SdESbg Fixes #85681	2024-03-19 20:18:37 +00:00
Changpeng Fang	ab76052fa9	AMDGPU: Treat SWMMAC the same as MFMA and other WMMA for sched_barrier (#85721 )	2024-03-19 09:58:09 -07:00
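
The offset arithmetic in commit 7678e6e562 above (a folded offset capped at 2043 so that adding 4 for the second half of a split GPR-pair access still fits in simm12) can be checked with a small sketch. This is only an illustration of the arithmetic, not LLVM code: the names `fits_simm12`, `split_pair_offsets` and `MAX_FOLDED_OFFSET` are hypothetical, and the 4-byte step assumes XLen is 32 bits, as on RV32.

```python
# Illustration only: these names are hypothetical and do not exist in LLVM.

# A signed 12-bit immediate (simm12) covers -2048 .. 2047.
SIMM12_MIN = -(1 << 11)
SIMM12_MAX = (1 << 11) - 1

# Cap on the folded offset taken from the commit message.
MAX_FOLDED_OFFSET = 2043


def fits_simm12(value: int) -> bool:
    """Return True if value is representable as a signed 12-bit immediate."""
    return SIMM12_MIN <= value <= SIMM12_MAX


def split_pair_offsets(offset: int, xlen_bytes: int = 4) -> tuple[int, int]:
    """Offsets of the two XLen-sized accesses that replace one pair access.

    Assumes RV32, so the second access sits xlen_bytes = 4 bytes higher.
    """
    return offset, offset + xlen_bytes


if __name__ == "__main__":
    lo, hi = split_pair_offsets(MAX_FOLDED_OFFSET)
    print(lo, hi, fits_simm12(lo) and fits_simm12(hi))
```

With the cap at 2043 the second access lands at 2047, the largest simm12 value; a folded offset of 2044 would put it at 2048, which no longer fits.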

52796 Commits