llvm-project

Author	SHA1	Message	Date
Felipe de Azevedo Piovezan	380ac53dfa	[DebugNames] Implement Entry::GetParentEntry query (#78760 ) This commit introduces a helper function to DWARFAcceleratorTable::Entry which follows DW_IDX_Parent attributes to return the corresponding parent Entry in the table. It is tested by enhancing dwarfdump so that it now prints: 1. When data is corrupt. 2. When parent information is present, but the parent is not indexed. 3. The parent entry offset, when the parent is present and indexed. This is printed in terms of a real entry offset (the same that gets printed at the start of each entry: "Entry @ 0x..."), instead of the encoded number in the table (which is an offset from the start of the Entry list). This makes it easy to visually inspect the dwarfdump and check what the parent is.	2024-01-24 06:44:03 -08:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Florian Hahn	98509c7f97	[AArch64] Add vec3 tests with different load/store alignments. Add extra tests with different load/store alignments for https://github.com/llvm/llvm-project/pull/78637.	2024-01-24 14:19:33 +00:00
Simon Pilgrim	8b43c1be23	[X86] X86FixupVectorConstants - shrink vector load to movss/movsd/movd/movq 'zero upper' instructions (#79000 ) If we're loading a vector constant that is known to be zero in the upper elements, then attempt to shrink the constant and just scalar load the lower 32/64 bits. Always choose the vzload/broadcast with the smallest constant load, and prefer vzload over broadcasts for the same bitwidth to avoid domain flips (mainly an AVX1 issue). Fixes #73783	2024-01-24 14:00:51 +00:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Simon Pilgrim	72f10f7eb5	[X86] Fold not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2) Fixes #78888	2024-01-24 12:04:45 +00:00
Simon Pilgrim	6255bae6c9	[X86] Add test coverage based on #78888	2024-01-24 12:04:44 +00:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414 ) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Petar Avramovic	c46109d0d7	Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#79274 ) Reverts llvm/llvm-project#78482	2024-01-24 12:18:34 +01:00
Petar Avramovic	149ed9d2c5	AMDGPU: update GFX11 wmma hazards (#76143 ) One V_NOP or unrelated VALU instruction in between is required for correctness when matrix A or B of current WMMA instruction overlaps with matrix D of previous WMMA instruction. Remaining cases of WMMA operand overlaps are handled by the hardware and do not require handling in hazard recognizer. Hardware may stall in cases where: - matrix C of current WMMA instruction overlaps with matrix D of previous WMMA instruction - VALU instruction reads matrix D of previous WMMA instruction - matrix A,B or C of WMMA instruction reads result of previous VALU instruction	2024-01-24 12:00:35 +01:00
Petar Avramovic	91ddcba83a	AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#78482 ) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in same way SILowerI1Copies select VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA and all cases of uses outside of cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with sgpr register class and S1 LLT. TODO: General goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: https://github.com/llvm/llvm-project/pull/73337	2024-01-24 11:58:32 +01:00
Shengchen Kan	33ecef9812	[X86][CodeGen] Fix crash when commute operands of Instruction for code size (#79245 ) Reported in 134fcc62786d31ab73439201dce2d73808d1785a Incorrect opcode is used b/c there is a `[[fallthrough]]` at line 2386.	2024-01-24 17:10:28 +08:00
Shengchen Kan	71d64ed80f	[X86][Peephole] Add NDD entries for EFLAGS optimization	2024-01-24 15:47:58 +08:00
Shengchen Kan	d119ecb958	[X86][NFC] Pre-commit test for RA hints for APX NDD instructions	2024-01-24 14:44:34 +08:00
Douglas Yung	a01195ff5c	Update compiler version expected that seems to be embedded in CHECK line of test at llvm/test/CodeGen/SystemZ/zos-ppa2.ll. The test contains a CHECK line which verifies an .ascii line which originally checks for 18001970010100000000. After the bump of the compiler version to 19, the test started to fail with the string now being 19001970010100000000. This should fix this failing test on bots.	2024-01-23 22:05:17 -08:00
Shengchen Kan	f7b61f81b5	[X86][CodeGen] Transform NDD SUB to CMP if dest reg is dead (#79135 )	2024-01-24 13:58:48 +08:00
Brandon Wu	33d804c6c2	[RISCV] Allow VCIX with SE to reorder (#77049 ) This patch allows VCIX instructions that have side effects to be reordered with memory and other side-effecting instructions. However, we don't want VCIX instructions to be reordered with each other, so we propose a dummy register called VCIX_STATE and make these instructions implicitly define and use it.	2024-01-24 11:30:12 +08:00
Christudasan Devadasan	230c13d59d	[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669 ) CSR SGPR spilling currently uses the earliest available physical VGPRs. This imposes high register pressure while trying to allocate large VGPR tuples within the default register budget. This patch changes the spilling strategy by picking the VGPRs in reverse order, highest available VGPR first, and shifting them back to the lowest available range after regalloc. With that, the initial VGPRs remain available for allocation, which makes it more likely that a large number of contiguous registers can be found.	2024-01-24 07:08:43 +05:30
Aiden Grossman	b1778c7d7b	[AsmPrinter] Remove mbb-profile-dump flag (#76595 ) Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF sections has landed, there is no longer a need to keep around the mbb-profile-dump flag.	2024-01-23 16:48:10 -08:00
Paul Kirth	03a61d34eb	[RISCV] Support TLSDESC in the RISC-V backend (#66915 ) This patch adds basic TLSDESC support in the RISC-V backend. Specifically, we add new relocation types for TLSDESC, as prescribed in https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a new pseudo instruction to simplify code generation. This patch does not try to optimize the local dynamic case, which can be improved in separate patches. Linker side changes will also be handled separately. The current implementation is only enabled when passing the new `-enable-tlsdesc` codegen flag.	2024-01-23 16:16:07 -08:00
Paul Kirth	9d476e1e1a	[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061 ) Currently, the UnifiedLTO pipeline seems to have trouble with several LTO features, like SplitLTO units, which means we cannot use important optimizations like Whole Program Devirtualization or security hardening instrumentation like CFI. This patch reverts FatLTO to using distinct pipelines for Full LTO and ThinLTO. It still avoids module cloning, since that was error prone.	2024-01-23 14:04:52 -08:00
Philip Reames	f05dd29cee	[RISCV] Regenerate autogen test to remove spurious diff	2024-01-23 10:57:54 -08:00
Philip Reames	bdc41106ee	[RISCV] Recurse on first operand of two operand shuffles (#79180 ) This is the first step towards an alternate shuffle lowering design for the general two vector argument case. The goal is to leverage the existing lowering for single vector permutes to avoid as many of the vrgathers as required - even if we do need the other. This patch handles only the first argument, and is arguably a slightly weird half-step. However, the test changes from the full two argument recurse patch are a lot harder to reason about. Taking this half step gives much more easily reviewable changes, and is thus worthwhile. I intend to post the patch for the second argument once this has landed.	2024-01-23 10:49:55 -08:00
Philip Reames	bb8a8770e2	[RISCV] Exploit register boundaries when lowering shuffle with exact vlen (#79072 ) If we have a shuffle which is larger than m1, we may be able to split it into a series of individual m1 shuffles. This patch starts with the subcase where the mask allows a 1-to-1 mapping from source register to destination register - each with a possible permutation of their own. We can potentially extend this later, though in practice this seems to already catch a number of the most interesting cases.	2024-01-23 10:36:22 -08:00
gulfemsavrun	7fe951ad8a	Revert "Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass … (#79186 ) …#78606" This reverts commit 13c6f1ea2e7eb15fe492d8fca4fa1857c6f86370 because it causes an assertion in DebugInfoMetadata.cpp:1968 in Clang Linux builders for Fuchsia. https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8758111613576762817/+/u/clang/build/stdout	2024-01-23 10:12:10 -08:00
Changpeng Fang	32073b8356	AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (#79104 ) int_amdgcn_global_load_tr did not specify non-temporal load transpose, thus we should not generate the non-temporal hint for the load. We need to implement getTgtMemIntrinsic to create the corresponding MemSDNode. And we don't set the non-temporal flag because the intrinsic did not specify it. NOTE: We need to implement getTgtMemIntrinsic for any memory intrinsics.	2024-01-23 10:05:32 -08:00
Craig Topper	d360963aaa	[RISCV] Add regalloc hints for Zcb instructions. (#78949 ) This hints the register allocator to use the same register for source and destination to enable more compression.	2024-01-23 09:33:06 -08:00
Jay Foad	6cf37dd504	[AMDGPU] Enable architected SGPRs for GFX12 (#79160 )	2024-01-23 16:36:30 +00:00
Simon Pilgrim	e1aa5b1fd1	[DAG] visitSCALAR_TO_VECTOR - don't fold scalar_to_vector(bin(extract(x),extract(y)) -> bin(x,y) if extracts have other uses Fixes #78897 - although the test case still has a number of poor codegen issues (in particular for i686 triples) that will need addressing (combining the nodes in topological order should help).	2024-01-23 16:28:43 +00:00
Mirko Brkušanin	6bb7d515c3	[AMDGPU] Properly check op_sel in GCNDPPCombine (#79122 )	2024-01-23 17:21:16 +01:00
Simon Pilgrim	8c41e3fcb1	[X86] Add test case for Issue #78897	2024-01-23 15:44:01 +00:00
Jeremy Morse	087172258a	[DebugInfo][RemoveDIs] Handle non-instr debug-info in GlobalISel (#75228 ) The RemoveDIs project is aiming to eliminate debug intrinsics like dbg.value and dbg.declare from LLVM, and replace them with DPValue objects attached to instructions. ISel is one of the "terminals" where that information needs to be converted into MIR format: this patch implements support for that in GlobalISel. We aim for the output of LLVM to be identical with/without RemoveDIs debug-info. This patch should be NFC, as we're handling the same data about variables stored in a different format -- it now appears in a DPValue object rather than as an intrinsic. To that end, I've refactored the handling of dbg.values into a dedicated function, and call it whenever a dbg.value or a DPValue is encountered. dbg.declare is handled in a similar way. Testing: adding the --try-experimental-debuginfo-iterators switch to llc causes it to try and convert to the "new" debug-info format if it's built in (LLVM_EXPERIMENTAL_DEBUGINFO_ITERATORS=On), and it'll be covered by our buildbot. One test has a few extra wildcard-regexes added: this is because there's some extra data printed about attached debug-info, which is safe to ignore.	2024-01-23 15:04:08 +00:00
Danial Klimkin	16df714e77	[test] Update stack_guard_remat.ll (#79139 ) Replace cp with cat. This allows creating a writable file when the original one is read-only.	2024-01-23 14:48:33 +01:00
Florian Hahn	e7b4ff8119	[AArch64] Add vec3 tests with add between load and store. Extra tests for https://github.com/llvm/llvm-project/pull/78637 https://github.com/llvm/llvm-project/pull/78632	2024-01-23 12:38:00 +00:00
Simon Pilgrim	4318b033bd	[MC][X86] Merge lane/element broadcast comment printers. (#79020 ) This is /almost/ NFC - the only annoyance is that for some reason we were using "<C1,C2,..>" for ConstantVector types unlike all other cases - these now use the same "[C1,C2,..]" format as the other constant printers.	2024-01-23 12:33:52 +00:00
Pierre van Houtryve	42b0884238	[AMDGPU] Handle V_PERMLANE64_B32 in fixVcmpxPermlaneHazards (#79125 ) Fixes #78856	2024-01-23 13:10:58 +01:00
Simon Pilgrim	5c7bbe383b	[X86] canonicalizeShuffleWithOp - recognise constant vectors with getTargetConstantFromNode Allows shuffle to fold constant vectors that have already been lowered to constant pool - shuffle combining can then constant fold this. Noticed while triaging #79100	2024-01-23 11:30:06 +00:00
OCHyams	13c6f1ea2e	Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass #78606 llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to update the second expression. Fixes #76545	2024-01-23 11:24:21 +00:00
Shengchen Kan	66237d647e	[X86][CodeGen] Add entries for NDD SHLD/SHRD to the commuteInstructionImpl	2024-01-23 17:05:09 +08:00
David Spickett	f20556678c	Reland "[llvm][AArch64] Copy all operands when expanding BLR_BTI bundle (#78267 )" (#78719 ) This reverts commit 955417ade2648c2b1a4e5f0be697f61570590a88. The problem with the previous version was that the bundle instruction had arguments like "target arg1 arg2". When it's expanded we produced a BL or BLR which can only accept one argument, the target of the branch. Now I realise why expandCALL_RVMARKER has to copy them in multiple steps. The operands for the called function need to be changed to implicit arguments of the branch instruction. * Copy the branch target. * Copy all register operands, marking them as implicit. * Copy any other operands without modifying them. Prior to any attempt to fix #77915: BL @_setjmp, csr_aarch64_aapcs, implicit-def $lr, implicit $sp, implicit-def dead $lr, implicit $sp, implicit-def $sp Which is dropping the use of the arguments for the called function. My first fix attempt produced: BL @_setjmp, $x0, $w1, <regmask $fp ...>, implicit-def $lr, implicit $sp, implicit-def dead $lr, implicit $sp, implicit-def $sp It copied the arguments but as explicit arguments to the BL which only expects 1, failing verification. With this new change we produce: BL @_setjmp, csr_aarch64_aapcs, implicit-def $lr, implicit $sp, implicit $x0, implicit $w1, implicit-def dead $lr, implicit $sp, implicit-def $sp Note specifically the added "implicit $x0, implicit $w1". So BL only has 1 explicit argument, but the arguments to the function are still used.	2024-01-23 08:45:47 +00:00
yjijd	44ba6ebc99	[CodeGen][LoongArch] Set FP_TO_SINT/FP_TO_UINT to legal for vector types (#79107 ) Support the following conversions: v4f32->v4i32, v2f64->v2i64(LSX) v8f32->v8i32, v4f64->v4i64(LASX) v4f32->v4i64, v4f64->v4i32(LASX)	2024-01-23 15:57:06 +08:00
yjijd	f799f93692	[CodeGen][LoongArch] Set SINT_TO_FP/UINT_TO_FP to legal for vector types (#78924 ) Support the following conversions: v4i32->v4f32, v2i64->v2f64(LSX) v8i32->v8f32, v4i64->v4f64(LASX) v4i32->v4f64, v4i64->v4f32(LASX)	2024-01-23 15:16:23 +08:00
Simeon K	297b77036e	[RISCV] Fix stack size computation when M extension disabled (#78602 ) Ensure that getVLENFactoredAmount does not fail when the scale amount requires the use of a non-trivial multiplication but the M extension is not enabled. In such case, perform the multiplication using shifts and adds.	2024-01-22 23:10:25 -08:00
Ami-zhang	fcb8342a21	[LoongArch] Add definitions and feature 'frecipe' for FP approximation intrinsics/builtins (#78962 ) This PR adds definitions and the 'frecipe' feature for FP approximation intrinsics/builtins. In addition, it adds and completes the related test cases.	2024-01-23 14:24:58 +08:00
Eli Friedman	a6065f0fa5	Arm64EC entry/exit thunks, consolidated. (#79067 ) This combines the previously posted patches with some additional work I've done to more closely match MSVC output. Most of the important logic here is implemented in AArch64Arm64ECCallLowering. The purpose of the AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for other targets, and generate most of the Arm64EC-specific bits: generating thunks, mangling symbols, generating aliases, and generating the .hybmp$x table. This is all done late for a few reasons: to consolidate the logic as much as possible, and to ensure the IR exposed to optimization passes doesn't contain complex arm64ec-specific constructs. The other changes are supporting changes, to handle the new constructs generated by that pass. There's a global llvm.arm64ec.symbolmap representing the .hybmp$x entries for the thunks. This gets handled directly by the AsmPrinter because it needs symbol indexes that aren't available before that. There are two new calling conventions used to represent calls to and from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few changes to handle the associated exception-handling info, SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX. I've intentionally left out handling for structs with small non-power-of-two sizes, because that's easily separated out. The rest of my current work is here. I squashed my current patches because they were split in ways that didn't really make sense. Maybe I could split out some bits, but it's hard to meaningfully test most of the parts independently. Thanks to @dpaoliello for extensive testing and suggestions. (Originally posted as https://reviews.llvm.org/D157547 .)	2024-01-22 21:28:07 -08:00
Shengchen Kan	5c68c6d70f	[X86] Support encoding/decoding and lowering for APX variant SHL/SHR/SAR/ROL/ROR/RCL/RCR/SHLD/SHRD (#78853 ) Four variants: promoted legacy, ND (new data destination), NF (no flags update) and NF_ND (NF + ND). The syntax of NF instructions is aligned with GNU binutils. https://sourceware.org/pipermail/binutils/2023-September/129545.html	2024-01-23 10:23:27 +08:00
Jim Lin	b8e708b9d3	[RISCV] Merge ADDI with X0 into base offset (#78940 ) If offset is `addi rd, x0, imm`, merge imm into base offset.	2024-01-23 09:57:05 +08:00
Carl Ritson	4db4d7f282	[AMDGPU] SILowerSGPRSpills: do not update MRI reserve registers (#77888 ) VGPRs used for spilling do not require explicit reservation with MRI. freezeReservedRegs() executed before register allocation ensures these are placed in the reserve set. The only pass after SILowerSGPRSpills is SIPreAllocateWWMRegs which explicitly tests for interference before register allocation so should not reuse a WWM VGPR holding spill data. reserveReg prevents calculation of correct liveness for physical registers which could be used to extend SIPreAllocateWWMRegs.	2024-01-23 10:49:26 +09:00
Simeon K	58cfd56356	[VP][RISCV] Introduce llvm.vp.minimum/maximum intrinsics (#74840 ) Although there are predicated versions of minnum/maxnum, the ones for minimum/maximum are currently missing. This patch introduces these intrinsics and implements their lowering to RISC-V.	2024-01-22 16:46:39 -08:00
David Green	a2d68b4bec	[SelectOpt] Add handling for Select-like operations. (#77284 ) Some operations behave like selects. For example `or(zext(c), y)` is the same as `select(c, y|1, y)` and instcombine can canonicalize the select to the or form. These operations can still be worthwhile converting to branch as opposed to keeping as a select or or instruction. This patch attempts to add some basic handling for them, creating a SelectLike abstraction in the select optimization pass. The backend can opt into handling `or(zext(c),x)` as a select if it could be profitable, and the select optimization pass attempts to handle them in much the same way as a `select(c, x|1, x)`. The Or(x, 1) may need to be added as a new instruction, generated as the or is converted to branches. This helps fix a regression from selects being converted to or's recently.	2024-01-22 23:46:58 +00:00
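
The equivalence cited in the SelectOpt commit (a2d68b4bec), that `or(zext(c), y)` computes the same value as a select between `y|1` and `y` on an `i1` condition `c`, can be checked exhaustively for a small bit width. The following is a minimal Python sketch that models the IR operations with plain integers; the function names are hypothetical and are not part of any LLVM API.

```python
def zext_i1(c: bool) -> int:
    """Model zero-extension of an i1 condition: False -> 0, True -> 1."""
    return 1 if c else 0


def or_form(c: bool, y: int) -> int:
    """Model or(zext(c), y)."""
    return zext_i1(c) | y


def select_form(c: bool, y: int) -> int:
    """Model select(c, y|1, y): pick y|1 when c is true, otherwise y."""
    return (y | 1) if c else y


def forms_agree(bits: int = 8) -> bool:
    """Compare both forms for every condition and every `bits`-wide value of y."""
    return all(
        or_form(c, y) == select_form(c, y)
        for c in (False, True)
        for y in range(1 << bits)
    )


if __name__ == "__main__":
    print(forms_agree())
```

The check only covers value equality on unsigned integers of the chosen width; it says nothing about the profitability question the commit addresses, which is whether converting the operation to a branch is worthwhile.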

52796 Commits