llvm-project

Author	SHA1	Message	Date
Michael Maitland	3967510032	[RISCV][GISel] First mask argument placed in v0 according to RISCV Vector CC. (#79343)	2024-01-24 16:03:38 -05:00
Jonas Paulsson	84dcf3d35b	[SystemZ] Require D12 for i128 accesses in isLegalAddressingMode() (#79221 ) Machines with vector support handle i128 in vector registers and therefore only have the small displacement available for memory accesses. Update isLegalAddressingMode() to reflect this.	2024-01-24 20:16:05 +01:00
Alex MacLean	3b8539c9dc	[NVPTX] use incomplete aggregate initializers (#79062) The PTX ISA specifies that initializers may be incomplete ([5.4.4. Initializers](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#initializers)): "As in C, array initializers may be incomplete, i.e., the number of initializer elements may be less than the extent of the corresponding array dimension, with remaining array locations initialized to the default value for the specified array type." Emitting initializers in this form is preferable because it reduces the size of the PTX, in some cases significantly, and can improve compile time of ptxas as a result.	2024-01-24 09:24:28 -08:00
Philip Reames	396b6bbc5e	[RISCV] Recurse on second operand of two operand shuffles (#79197) This builds on bdc41106ee48dce59c500c9a3957af947f30c8c3. This change completes the migration to a recursive shuffle lowering strategy where, when we encounter an unknown two argument shuffle, we lower each operand as a single source permute, and then use a vselect (i.e. a vmerge) to combine the results. For code quality, this relies on the post-isel combine, which will aggressively fold that vmerge back into the materialization of the second operand if possible. Note: The change includes only the most immediately obvious of the stylistic cleanup. There's a bunch of code movement that this enables that I'll do as a separate patch, since rolling it into this one creates an unreadable diff.	2024-01-24 08:29:28 -08:00
quic-asaravan	dc5b4daae7	[HEXAGON] Inlining Division (#79021 ) This patch inlines float division function calls for hexagon. Co-authored-by: Awanish Pandey <awanpand@codeaurora.org>	2024-01-24 09:30:33 -06:00
Jay Foad	70fc970378	[AMDGPU] Move architected SGPR implementation into isel (#79120 )	2024-01-24 15:06:20 +00:00
Felipe de Azevedo Piovezan	380ac53dfa	[DebugNames] Implement Entry::GetParentEntry query (#78760) This commit introduces a helper function to DWARFAcceleratorTable::Entry which follows DW_IDX_Parent attributes to return the corresponding parent Entry in the table. It is tested by enhancing dwarfdump so that it now prints: 1. When data is corrupt. 2. When parent information is present, but the parent is not indexed. 3. The parent entry offset, when the parent is present and indexed. This is printed in terms of a real entry offset (the same that gets printed at the start of each entry: "Entry @ 0x..."), instead of the encoded number in the table (which is an offset from the start of the Entry list). This makes it easy to visually inspect the dwarfdump and check what the parent is.	2024-01-24 06:44:03 -08:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Florian Hahn	98509c7f97	[AArch64] Add vec3 tests with different load/store alignments. Add extra tests with different load/store alignments for https://github.com/llvm/llvm-project/pull/78637.	2024-01-24 14:19:33 +00:00
Simon Pilgrim	8b43c1be23	[X86] X86FixupVectorConstants - shrink vector load to movsd/movsd/movd/movq 'zero upper' instructions (#79000) If we're loading a vector constant that is known to be zero in the upper elements, then attempt to shrink the constant and just scalar load the lower 32/64 bits. Always choose the vzload/broadcast with the smallest constant load, and prefer vzload over broadcasts for the same bitwidth to avoid domain flips (mainly an AVX1 issue). Fixes #73783	2024-01-24 14:00:51 +00:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Simon Pilgrim	72f10f7eb5	[X86] Fold not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2) Fixes #78888	2024-01-24 12:04:45 +00:00
Simon Pilgrim	6255bae6c9	[X86] Add test coverage based on #78888	2024-01-24 12:04:44 +00:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414) Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): V_CVT_F32_FP8, V_CVT_F32_BF8, V_CVT_PK_F32_FP8, V_CVT_PK_F32_BF8, V_CVT_PK_FP8_F32, V_CVT_PK_BF8_F32, V_CVT_SR_FP8_F32, V_CVT_SR_BF8_F32. Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Petar Avramovic	c46109d0d7	Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#79274 ) Reverts llvm/llvm-project#78482	2024-01-24 12:18:34 +01:00
Petar Avramovic	149ed9d2c5	AMDGPU: update GFX11 wmma hazards (#76143 ) One V_NOP or unrelated VALU instruction in between is required for correctness when matrix A or B of current WMMA instruction overlaps with matrix D of previous WMMA instruction. Remaining cases of WMMA operand overlaps are handled by the hardware and do not require handling in hazard recognizer. Hardware may stall in cases where: - matrix C of current WMMA instruction overlaps with matrix D of previous WMMA instruction - VALU instruction reads matrix D of previous WMMA instruction - matrix A,B or C of WMMA instruction reads result of previous VALU instruction	2024-01-24 12:00:35 +01:00
Petar Avramovic	91ddcba83a	AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#78482) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in the same way SILowerI1Copies selects VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA, and all cases of uses outside of a cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with sgpr register class and S1 LLT. TODO: The general goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: https://github.com/llvm/llvm-project/pull/73337	2024-01-24 11:58:32 +01:00
Shengchen Kan	33ecef9812	[X86][CodeGen] Fix crash when commute operands of Instruction for code size (#79245 ) Reported in 134fcc62786d31ab73439201dce2d73808d1785a Incorrect opcode is used b/c there is a `[[fallthrough]]` at line 2386.	2024-01-24 17:10:28 +08:00
Shengchen Kan	71d64ed80f	[X86][Peephole] Add NDD entries for EFLAGS optimization	2024-01-24 15:47:58 +08:00
Shengchen Kan	d119ecb958	[X86][NFC] Pre-commit test for RA hints for APX NDD instructions	2024-01-24 14:44:34 +08:00
Douglas Yung	a01195ff5c	Update compiler version expected that seems to be embedded in CHECK line of test at llvm/test/CodeGen/SystemZ/zos-ppa2.ll. The test contains a CHECK line that verifies an .ascii line, which originally checked for 18001970010100000000. After the compiler version was bumped to 19, the test started to fail because the string is now 19001970010100000000. This should fix the failing test on the bots.	2024-01-23 22:05:17 -08:00
Shengchen Kan	f7b61f81b5	[X86][CodeGen] Transform NDD SUB to CMP if dest reg is dead (#79135 )	2024-01-24 13:58:48 +08:00
Brandon Wu	33d804c6c2	[RISCV] Allow VCIX with SE to reorder (#77049 ) This patch allows VCIX instructions that have side effect to be reordered with memory and other side effecting instructions. However we don't want VCIX instructions to be reordered with each other, so we propose a dummy register called VCIX_STATE and make these instructions implicitly define and use it.	2024-01-24 11:30:12 +08:00
Christudasan Devadasan	230c13d59d	[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669) CSR SGPR spilling currently uses the earliest available physical VGPRs. This imposes high register pressure when trying to allocate large VGPR tuples within the default register budget. This patch changes the spilling strategy to pick VGPRs in reverse order, highest available VGPR first, and after regalloc shift them back to the lowest available range. With that, the initial VGPRs remain available for allocation, and the chance of finding a large number of contiguous registers increases.	2024-01-24 07:08:43 +05:30
Aiden Grossman	b1778c7d7b	[AsmPrinter] Remove mbb-profile-dump flag (#76595 ) Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF sections has landed, there is no longer a need to keep around the mbb-profile-dump flag.	2024-01-23 16:48:10 -08:00
Paul Kirth	03a61d34eb	[RISCV] Support TLSDESC in the RISC-V backend (#66915 ) This patch adds basic TLSDESC support in the RISC-V backend. Specifically, we add new relocation types for TLSDESC, as prescribed in https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a new pseudo instruction to simplify code generation. This patch does not try to optimize the local dynamic case, which can be improved in separate patches. Linker side changes will also be handled separately. The current implementation is only enabled when passing the new `-enable-tlsdesc` codegen flag.	2024-01-23 16:16:07 -08:00
Paul Kirth	9d476e1e1a	[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061 ) Currently, the UnifiedLTO pipeline seems to have trouble with several LTO features, like SplitLTO units, which means we cannot use important optimizations like Whole Program Devirtualization or security hardening instrumentation like CFI. This patch reverts FatLTO to using distinct pipelines for Full LTO and ThinLTO. It still avoids module cloning, since that was error prone.	2024-01-23 14:04:52 -08:00
Philip Reames	f05dd29cee	[RISCV] Regenerate autogen test to remove spurious diff	2024-01-23 10:57:54 -08:00
Philip Reames	bdc41106ee	[RISCV] Recurse on first operand of two operand shuffles (#79180 ) This is the first step towards an alternate shuffle lowering design for the general two vector argument case. The goal is to leverage the existing lowering for single vector permutes to avoid as many of the vrgathers as required - even if we do need the other. This patch handles only the first argument, and is arguably a slightly weird half-step. However, the test changes from the full two argument recurse patch are a lot harder to reason about. Taking this half step gives much more easily reviewable changes, and is thus worthwhile. I intend to post the patch for the second argument once this has landed.	2024-01-23 10:49:55 -08:00
Philip Reames	bb8a8770e2	[RISCV] Exploit register boundaries when lowering shuffle with exact vlen (#79072) If we have a shuffle which is larger than m1, we may be able to split it into a series of individual m1 shuffles. This patch starts with the subcase where the mask allows a 1-to-1 mapping from source register to destination register - each with a possible permutation of their own. We can potentially extend this later, though in practice this seems to already catch a number of the most interesting cases.	2024-01-23 10:36:22 -08:00
gulfemsavrun	7fe951ad8a	Revert "Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass #78606" (#79186) This reverts commit 13c6f1ea2e7eb15fe492d8fca4fa1857c6f86370 because it causes an assertion failure in DebugInfoMetadata.cpp:1968 in Clang Linux builders for Fuchsia. https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8758111613576762817/+/u/clang/build/stdout	2024-01-23 10:12:10 -08:00
Changpeng Fang	32073b8356	AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (#79104) int_amdgcn_global_load_tr does not specify a non-temporal load transpose, so we should not generate the non-temporal hint for the load. We need to implement getTgtMemIntrinsic to create the corresponding MemSDNode, and we don't set the non-temporal flag because the intrinsic did not specify it. NOTE: We need to implement getTgtMemIntrinsic for any memory intrinsics.	2024-01-23 10:05:32 -08:00
Craig Topper	d360963aaa	[RISCV] Add regalloc hints for Zcb instructions. (#78949 ) This hints the register allocator to use the same register for source and destination to enable more compression.	2024-01-23 09:33:06 -08:00
Jay Foad	6cf37dd504	[AMDGPU] Enable architected SGPRs for GFX12 (#79160 )	2024-01-23 16:36:30 +00:00
Simon Pilgrim	e1aa5b1fd1	[DAG] visitSCALAR_TO_VECTOR - don't fold scalar_to_vector(bin(extract(x),extract(y))) -> bin(x,y) if extracts have other uses Fixes #78897 - although the test case still has a number of poor codegen issues (in particular for i686 triples) that will need addressing (combining the nodes in topological order should help).	2024-01-23 16:28:43 +00:00
Mirko Brkušanin	6bb7d515c3	[AMDGPU] Properly check op_sel in GCNDPPCombine (#79122 )	2024-01-23 17:21:16 +01:00
Simon Pilgrim	8c41e3fcb1	[X86] Add test case for Issue #78897	2024-01-23 15:44:01 +00:00
Jeremy Morse	087172258a	[DebugInfo][RemoveDIs] Handle non-instr debug-info in GlobalISel (#75228 ) The RemoveDIs project is aiming to eliminate debug intrinsics like dbg.value and dbg.declare from LLVM, and replace them with DPValue objects attached to instructions. ISel is one of the "terminals" where that information needs to be converted into MIR format: this patch implements support for that in GlobalISel. We aim for the output of LLVM to be identical with/without RemoveDIs debug-info. This patch should be NFC, as we're handling the same data about variables stored in a different format -- it now appears in a DPValue object rather than as an intrinsic. To that end, I've refactored the handling of dbg.values into a dedicated function, and call it whenever a dbg.value or a DPValue is encountered. dbg.declare is handled in a similar way. Testing: adding the --try-experimental-debuginfo-iterators switch to llc causes it to try and convert to the "new" debug-info format if it's built in (LLVM_EXPERIMENTAL_DEBUGINFO_ITERATORS=On), and it'll be covered by our buildbot. One test has a few extra wildcard-regexes added: this is because there's some extra data printed about attached debug-info, which is safe to ignore.	2024-01-23 15:04:08 +00:00
Danial Klimkin	16df714e77	[test] Update stack_guard_remat.ll (#79139) Replace cp with cat. This allows creating a writable file when the original one is read-only.	2024-01-23 14:48:33 +01:00
Florian Hahn	e7b4ff8119	[AArch64] Add vec3 tests with add between load and store. Extra tests for https://github.com/llvm/llvm-project/pull/78637 https://github.com/llvm/llvm-project/pull/78632	2024-01-23 12:38:00 +00:00
Simon Pilgrim	4318b033bd	[MC][X86] Merge lane/element broadcast comment printers. (#79020 ) This is /almost/ NFC - the only annoyance is that for some reason we were using "<C1,C2,..>" for ConstantVector types unlike all other cases - these now use the same "[C1,C2,..]" format as the other constant printers.	2024-01-23 12:33:52 +00:00
Pierre van Houtryve	42b0884238	[AMDGPU] Handle V_PERMLANE64_B32 in fixVcmpxPermlaneHazards (#79125 ) Fixes #78856	2024-01-23 13:10:58 +01:00
Simon Pilgrim	5c7bbe383b	[X86] canonicalizeShuffleWithOp - recognise constant vectors with getTargetConstantFromNode Allows shuffle to fold constant vectors that have already been lowered to constant pool - shuffle combining can then constant fold this. Noticed while triaging #79100	2024-01-23 11:30:06 +00:00
OCHyams	13c6f1ea2e	Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass #78606 llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to update the second expression. Fixes #76545	2024-01-23 11:24:21 +00:00
Shengchen Kan	66237d647e	[X86][CodeGen] Add entries for NDD SHLD/SHRD to the commuteInstructionImpl	2024-01-23 17:05:09 +08:00
David Spickett	f20556678c	Reland "[llvm][AArch64] Copy all operands when expanding BLR_BTI bundle (#78267)" (#78719) This reverts commit 955417ade2648c2b1a4e5f0be697f61570590a88. The problem with the previous version was that the bundle instruction had arguments like "target arg1 arg2". When it's expanded we produced a BL or BLR which can only accept one argument, the target of the branch. Now I realise why expandCALL_RVMARKER has to copy them in multiple steps. The operands for the called function need to be changed to implicit arguments of the branch instruction. * Copy the branch target. * Copy all register operands, marking them as implicit. * Copy any other operands without modifying them. Prior to any attempt to fix #77915: BL @_setjmp, csr_aarch64_aapcs, implicit-def $lr, implicit $sp, implicit-def dead $lr, implicit $sp, implicit-def $sp Which is dropping the use of the arguments for the called function. My first fix attempt produced: BL @_setjmp, $x0, $w1, <regmask $fp ...>, implicit-def $lr, implicit $sp, implicit-def dead $lr, implicit $sp, implicit-def $sp It copied the arguments but as explicit arguments to the BL which only expects 1, failing verification. With this new change we produce: BL @_setjmp, csr_aarch64_aapcs, implicit-def $lr, implicit $sp, implicit $x0, implicit $w1, implicit-def dead $lr, implicit $sp, implicit-def $sp Note specifically the added "implicit $x0, implicit $w1". So BL only has 1 explicit argument, but the arguments to the function are still used.	2024-01-23 08:45:47 +00:00
yjijd	44ba6ebc99	[CodeGen][LoongArch] Set FP_TO_SINT/FP_TO_UINT to legal for vector types (#79107 ) Support the following conversions: v4f32->v4i32, v2f64->v2i64(LSX) v8f32->v8i32, v4f64->v4i64(LASX) v4f32->v4i64, v4f64->v4i32(LASX)	2024-01-23 15:57:06 +08:00
yjijd	f799f93692	[CodeGen][LoongArch] Set SINT_TO_FP/UINT_TO_FP to legal for vector types (#78924 ) Support the following conversions: v4i32->v4f32, v2i64->v2f64(LSX) v8i32->v8f32, v4i64->v4f64(LASX) v4i32->v4f64, v4i64->v4f32(LASX)	2024-01-23 15:16:23 +08:00
Simeon K	297b77036e	[RISCV] Fix stack size computation when M extension disabled (#78602 ) Ensure that getVLENFactoredAmount does not fail when the scale amount requires the use of a non-trivial multiplication but the M extension is not enabled. In such case, perform the multiplication using shifts and adds.	2024-01-22 23:10:25 -08:00
Ami-zhang	fcb8342a21	[LoongArch] Add definitions and feature 'frecipe' for FP approximation intrinsics/builtins (#78962) This PR adds definitions and the 'frecipe' feature for FP approximation intrinsics/builtins. In addition, it adds and complements related test cases.	2024-01-23 14:24:58 +08:00


51752 Commits
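Commit 297b77036e says that when the M extension is not enabled, getVLENFactoredAmount performs the scale multiplication using shifts and adds. The following is a minimal Python sketch of that general technique only, not the backend's code (which emits RISC-V instructions rather than computing values); the helper name mul_by_shift_add is hypothetical. It adds one shifted copy of the value for every set bit of a non-negative constant scale.

```python
def mul_by_shift_add(value: int, scale: int) -> int:
    """Return value * scale using only left shifts and additions.

    Hypothetical helper: for each set bit k of the non-negative constant
    `scale`, add (value << k) to the running total.
    """
    if scale < 0:
        raise ValueError("scale must be non-negative")
    total = 0
    shift = 0
    while scale:
        if scale & 1:
            total += value << shift  # contributes value * 2**shift
        scale >>= 1
        shift += 1
    return total


# Example: a scale of 5 (0b101) becomes (value << 2) + (value << 0).
print(mul_by_shift_add(16, 5))  # prints 80
```

The number of add instructions grows with the number of set bits in the scale, which is why a real M extension multiply is preferred when it is available.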
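Commit 72f10f7eb5 folds not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2). The reasoning per lane: because CstPow2 has exactly one set bit, and(X,CstPow2) is either 0 or CstPow2, so "not equal to 0" means the same as "equal to CstPow2". The Python sketch below models a single 8-bit vector lane (the lane width and all helper names are assumptions for illustration, not the DAG combine itself) and checks the rewrite exhaustively, including a non-power-of-two constant where it fails.

```python
LANE_BITS = 8
ALL_ONES = (1 << LANE_BITS) - 1


def pcmpeq(a: int, b: int) -> int:
    """One lane of PCMPEQ: all-ones when equal, zero otherwise."""
    return ALL_ONES if a == b else 0


def before_fold(x: int, c: int) -> int:
    # not(pcmpeq(and(X, C), 0)), masked back to the lane width
    return ~pcmpeq(x & c, 0) & ALL_ONES


def after_fold(x: int, c: int) -> int:
    # pcmpeq(and(X, C), C)
    return pcmpeq(x & c, c)


def fold_is_valid(c: int) -> bool:
    """Check the rewrite for every possible 8-bit lane value X."""
    return all(before_fold(x, c) == after_fold(x, c) for x in range(ALL_ONES + 1))


print(all(fold_is_valid(1 << k) for k in range(LANE_BITS)))  # prints True
print(fold_is_valid(3))  # prints False: 3 is not a power of two
```

For C = 3 and X = 1, and(X,C) is 1, which is non-zero but not equal to C, so the two forms disagree; that is why the fold is restricted to power-of-two constants.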
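Commit 90ba33099c canonicalizes getelementptr instructions with constant indices to the i8 source element type, so that different spellings of the same offset become identical. As a rough sketch of the arithmetic involved, the Python below computes the byte offset N for which a constant GEP equals getelementptr i8, ptr %p, i64 N. It uses a deliberately simplified, hypothetical type model (scalars with a given byte size and arrays; no structs, no vectors, no alignment padding), so it illustrates the idea rather than InstCombine's implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Scalar:
    size: int  # allocation size in bytes, e.g. 4 for i32


@dataclass(frozen=True)
class Array:
    count: int
    elem: "IRType"


IRType = Union[Scalar, Array]


def alloc_size(t: IRType) -> int:
    """Allocation size in bytes (no alignment padding in this model)."""
    if isinstance(t, Scalar):
        return t.size
    return t.count * alloc_size(t.elem)


def constant_gep_offset(source: IRType, indices: List[int]) -> int:
    """Byte offset N such that the GEP equals `getelementptr i8, ptr %p, i64 N`.

    The first index steps over whole `source` objects; each later index
    steps into the current array's element type. Structs are not modelled.
    """
    offset = indices[0] * alloc_size(source)
    current = source
    for idx in indices[1:]:
        if not isinstance(current, Array):
            raise TypeError("this sketch can only index into arrays")
        current = current.elem
        offset += idx * alloc_size(current)
    return offset


i32 = Scalar(4)
# getelementptr [4 x i32], ptr %p, i64 0, i64 2  and  getelementptr i32, ptr %p, i64 2
print(constant_gep_offset(Array(4, i32), [0, 2]), constant_gep_offset(i32, [2]))  # prints 8 8
```

Both GEPs in the example reduce to the same byte offset, which is the kind of equivalence the i8 form makes directly visible to later passes such as SimplifyCFG.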
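Commit 3b8539c9dc emits incomplete aggregate initializers for NVPTX, relying on the quoted PTX ISA rule that array locations not covered by an initializer take the default value for the array type. A minimal sketch of the trimming step, assuming the default value for the element type is 0 (the helper name trim_default_tail is hypothetical and this is not the NVPTX AsmPrinter code), looks like this:

```python
from typing import List


def trim_default_tail(values: List[int], default: int = 0) -> List[int]:
    """Drop the trailing run of elements equal to `default`.

    Hypothetical helper: elements past the end of an incomplete
    initializer take the default value, so this tail carries no information.
    """
    end = len(values)
    while end > 0 and values[end - 1] == default:
        end -= 1
    return values[:end]


# An 8-element array {1, 2, 3, 0, 0, 0, 0, 0} only needs {1, 2, 3};
# zeros that precede a non-zero element must be kept.
print(trim_default_tail([1, 2, 3, 0, 0, 0, 0, 0]))  # prints [1, 2, 3]
print(trim_default_tail([0, 5, 0, 0]))  # prints [0, 5]
```

Only the trailing run can be dropped, because an initializer always fills array locations from the start; the declared array extent still has to be emitted in full.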