llvm-project

Author	SHA1	Message	Date
DianQK	a58dcc5e08	Reland "[SimplifyCFG] Improve the precision of `PtrValueMayBeModified`" This relands commit f890f010f6a70addbd885acd0c8d1b9578b6246f. The result value of `getelementptr inbounds (TY, null, not zero)` is a poison value. We can think of it as undefined behavior.	2024-01-25 06:42:14 +08:00
DianQK	a0c1b5bdda	Reland "[SimplifyCFG] Check if the return instruction causes undefined behavior" This relands commit b6a0be8ce3114d0c57e7a7d6c3c222986ca506ad. Returning `undef` for a `noundef` return value is undefined behavior. Example: ``` define noundef i32 @test_ret_noundef(i1 %cond) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br label %bb2 bb2: %r = phi i32 [ undef, %entry ], [ 1, %bb1 ] ret i32 %r } ```	2024-01-25 06:42:14 +08:00
Stanislav Mekhanoshin	6384b6239b	[AMDGPU] Simplify VOP3PWMMA_Profile. NFC. (#79377 )	2024-01-24 14:33:00 -08:00
Micah Weston	23faa81d3f	[SHT_LLVM_BB_ADDR_MAP] Avoids side-effects in addition since order is unspecified. (#79168 ) Turns out the problem with https://github.com/llvm/llvm-project/issues/60013 is due to the fact that order of operation is unspecified in C++: https://en.cppreference.com/w/cpp/language/eval_order. A small example of where this manifests with MSVC can be seen here https://ooo.godbolt.org/z/bxqKeqzqn. This patch does the following: * Removes the addition operations where we sequence more than one side-effect based expression. * Removes test guards to now run on Windows	2024-01-24 17:26:48 -05:00
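The evaluation-order problem described in the entry above can be illustrated with a minimal sketch. The `Cursor` type and both decode functions are hypothetical and are not taken from the patch; they only show why two side-effecting reads inside one addition are fragile, and how splitting them into separate statements fixes the order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cursor over a byte buffer. Every read advances the position,
// so each call to readByte() has a side effect.
struct Cursor {
  const std::vector<std::uint8_t> &Data;
  std::size_t Pos = 0;

  std::uint32_t readByte() {
    assert(Pos < Data.size() && "read past end of buffer");
    return Data[Pos++];
  }
};

// Fragile: the two operands of '+' may be evaluated in either order, so the
// result can differ between compilers.
inline std::uint32_t decodeFragile(Cursor &C) {
  return (C.readByte() << 8) + C.readByte();
}

// Robust: each read is a separate statement, so the first byte is always the
// high byte and the second is always the low byte.
inline std::uint32_t decodeSequenced(Cursor &C) {
  std::uint32_t High = C.readByte();
  std::uint32_t Low = C.readByte();
  return (High << 8) + Low;
}
```

With a buffer holding `0x12, 0x34`, `decodeSequenced` always yields `0x1234`, whereas `decodeFragile` may yield either `0x1234` or `0x3412` depending on the compiler.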
Alexey Bataev	36e4a7ecca	[SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict weak ordering. Try to make PHICompare meet strict weak ordering criteria.	2024-01-24 13:46:05 -08:00
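For readers unfamiliar with the requirement, here is a minimal sketch of what a strict weak ordering means for a sort comparator. `Item`, `brokenLess` and `strictLess` are hypothetical names and do not reflect the actual `PHICompare` code; the tie-break on `NumUses` is only an assumption, loosely inspired by the related fix in ca654acc16.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the values being sorted.
struct Item {
  unsigned TypeID;
  unsigned NumUses;
};

// Broken: '<=' makes brokenLess(A, A) true, which violates irreflexivity.
// Passing this to std::sort is undefined behavior.
inline bool brokenLess(const Item &A, const Item &B) {
  return A.TypeID <= B.TypeID;
}

// Valid: lexicographic comparison through std::tie is irreflexive,
// asymmetric and transitive, so it is a strict weak ordering.
inline bool strictLess(const Item &A, const Item &B) {
  return std::tie(A.TypeID, A.NumUses) < std::tie(B.TypeID, B.NumUses);
}

inline void sortItems(std::vector<Item> &Items) {
  std::sort(Items.begin(), Items.end(), strictLess);
  assert(std::is_sorted(Items.begin(), Items.end(), strictLess));
}
```

A quick sanity check for any comparator is that `Cmp(A, A)` is false and that `Cmp(A, B)` and `Cmp(B, A)` are never both true.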
Jay Foad	fe9f3903f2	[AMDGPU] Update isLegalAddressingMode for GFX12 SMEM loads (#78728 )	2024-01-24 21:04:43 +00:00
Michael Maitland	3967510032	[RISCV][GISel] First mask argument placed in v0 according to RISCV Vector CC. (#79343 )	2024-01-24 16:03:38 -05:00
Joseph Huber	a551703cb5	[Offload] Fix the offloading wrapper when merged multiple times. (#79231 ) Summary: The offloading wrapper is an object file that contains code necessary to register offloading entries for the given runtime. Currently, we expected only one of these to be present when we make the final executable. However, in the case of redistributable linking with `-r` we can end up with multiple of these being generated before finally creating the executable. This patch simply changes the definitions of these globals to be mergeable. This allows multiples of these to participate in a single link job. For ELF, we just make the dummy variable internal and used so it sets up the section as expected. For COFF we make the entries weak_odr so they merge to a single symbol	2024-01-24 13:50:35 -06:00
Alexey Bataev	48bbd76587	[SLP]Fix PR79229: Check that extractelement is used only in a single node before erasing. Before trying to erase the extractelement instruction, it is not enough to check for a single use; we also need to check that it is not used in several nodes because of the preliminary node reordering.	2024-01-24 11:22:22 -08:00
Andy Kaylor	bb65f5a5d9	Move raw_string_ostream back to raw_ostream.cpp (#79224 ) The implementation of raw_string_ostream::write_impl() was moved to raw_socket_stream.cpp when the raw_socket_ostream support was separated. This patch moves it back to facilitate disabling socket support in downstream projects.	2024-01-24 11:20:23 -08:00
Jonas Paulsson	84dcf3d35b	[SystemZ] Require D12 for i128 accesses in isLegalAddressingMode() (#79221 ) Machines with vector support handle i128 in vector registers and therefore only have the small displacement available for memory accesses. Update isLegalAddressingMode() to reflect this.	2024-01-24 20:16:05 +01:00
Michael Maitland	d2d42dcfde	[CodeGen][MISched] Rename instance of Cycle -> ReleaseAtCycle b1ae461a5358932851de42b66ffde8748da51a83 renamed Cycle -> ReleaseAtCycle. 7e09239e24b339f45f63a670e2e831150826bf70 was committed without rebasing but used the old Cycle syntax. This caused a build failure when 7e09239e24b339f45f63a670e2e831150826bf70 was squash-and-merged. This patch fixes this problem.	2024-01-24 10:54:14 -08:00
Michael Maitland	7e09239e24	[CodeGen][MISched] Handle empty sized resource usage. (#75951 ) TargetSchedule.td explicitly allows the usage of a ProcResource for zero cycles, in order to represent that the ProcResource must be available but is not consumed by the instruction. On the other hand, ResourceSegments explicitly does not allow for a zero sized interval. In order to remedy this, this patch handles the special case of when there is an empty interval usage of a resource by not adding an empty interval. We ran into this issue downstream, but it makes sense to have this upstream since it is explicitly allowed by TargetSchedule.td.	2024-01-24 13:40:23 -05:00
Alexey Bataev	ca654acc16	[SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict weak ordering. Compared NumUses to meet the requirements of the strict weak ordering.	2024-01-24 09:36:25 -08:00
Alex MacLean	3b8539c9dc	[NVPTX] use incomplete aggregate initializers (#79062 ) The PTX ISA specifies that initializers may be incomplete ([5.4.4. Initializers](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#initializers)) > As in C, array initializers may be incomplete, i.e., the number of initializer elements may be less than the extent of the corresponding array dimension, with remaining array locations initialized to the default value for the specified array type. Emitting initializers in this form is preferable because it reduces the size of the PTX, in some cases significantly, and can improve compile time of ptxas as a result.	2024-01-24 09:24:28 -08:00
Kazu Hirata	1605bf5815	[ConstraintElimination] Use std::move in the constructor (NFC) (#79259 ) Moving the contents of Coefficients saves 0.43% of heap allocations during the compilation of a large preprocessed file, namely X86ISelLowering.cpp, for the X86 target.	2024-01-24 09:18:57 -08:00
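The idea behind moving a container in a constructor can be sketched as follows. The `Row` class is hypothetical and is not the ConstraintElimination code; it only shows the general pattern of taking a vector by value and moving it into a member so that no second heap buffer is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical row type, loosely modelled on the commit description.
class Row {
  std::vector<std::int64_t> Coefficients;

public:
  // The parameter is taken by value and then moved into the member, so the
  // heap buffer is transferred instead of being copied.
  explicit Row(std::vector<std::int64_t> Coeffs)
      : Coefficients(std::move(Coeffs)) {}

  std::size_t size() const { return Coefficients.size(); }
  const std::int64_t *data() const { return Coefficients.data(); }

  std::int64_t at(std::size_t I) const {
    assert(I < Coefficients.size() && "index out of range");
    return Coefficients[I];
  }
};
```

A caller that no longer needs its vector writes `Row R(std::move(V));`, and the row then owns the buffer that `V` originally allocated.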
Philip Reames	e9311f9c5a	[RISCV] Separate single source and dual source lowering code [nfc] The two single source cases aren't affected by the swap or select matching as those are dual operand specific. Similarly, a two source shuffle can't be a rotate. We can extend this idea for some of the shuffle types above, but some of them are validly either single or dual source. We don't want to lose that and the code complexity of versioning early and having to repeat some shuffle kinds doesn't (currently) seem worth it.	2024-01-24 09:16:50 -08:00
ostannard	56602a48c7	[TableGen] Include source location in JSON dump (#79028 ) This adds a '!loc' field to each record containing the file name and line number of the record declaration.	2024-01-24 17:07:20 +00:00
Philip Reames	fd817249f4	[RISCV] Sink code into using branch in shuffle lowering [nfc] Follow up to 396b6bbc, sink code into consuming branch, and fix one comment I realized used the misleading wording. (Permute is a specific sub-type of single source shuffle.)	2024-01-24 08:52:07 -08:00
Jan Svoboda	6c1dbd5359	[clang] NFC: Remove `{File,Directory}Entry::getName()` (#74910 ) The files and directories that Clang accesses are uniqued by their inode. For each inode `FileManager` will create exactly one `FileEntry` or `DirectoryEntry` object, which makes answering the question _"Are these two files/directories the same?"_ a simple pointer equality check. However, since the same inode can be accessed through multiple different paths, asking the `FileEntry` or `DirectoryEntry` object _"What is your name?"_ doesn't have clear semantics. In c0ff9908 we started reporting the most recent name used to access the entry, which turned out to be necessary for Clang modules. However, the long-term solution has always been to explicitly track the as-requested name. This has been implemented in 4dc5573a as `FileEntryRef` and `DirectoryEntryRef`. The `DirectoryEntry::getName()` interface has been deprecated since the Clang 17 release and `FileEntry::getName()` since Clang 18. We have replaced uses of these deprecated APIs in `main` with `DirectoryEntryRef::getName()` and `FileEntryRef::getName()` respectively. This makes it possible to remove `{File,Directory}Entry::getName()` for good along with the `FileManager` code that implements them.	2024-01-24 08:41:14 -08:00
Philip Reames	396b6bbc5e	[RISCV] Recurse on second operand of two operand shuffles (#79197 ) This builds on bdc41106ee48dce59c500c9a3957af947f30c8c3. This change completes the migration to a recursive shuffle lowering strategy where when we encounter an unknown two argument shuffle, we lower each operand as a single source permute, and then use a vselect (i.e. a vmerge) to combine the results. This relies for code quality on the post-isel combine which will aggressively fold that vmerge back into the materialization of the second operand if possible. Note: The change includes only the most immediately obvious of the stylistic cleanup. There's a bunch of code movement that this enables that I'll do as a separate patch as rolling it into this creates an unreadable diff.	2024-01-24 08:29:28 -08:00
Ivan Kosarev	2e81ac25b4	[AMDGPU][NFC] Simplify AGPR/VGPR load/store operand definitions. (#79289 ) Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-01-24 15:38:16 +00:00
quic-asaravan	dc5b4daae7	[HEXAGON] Inlining Division (#79021 ) This patch inlines float division function calls for hexagon. Co-authored-by: Awanish Pandey <awanpand@codeaurora.org>	2024-01-24 09:30:33 -06:00
Jeremy Morse	0065d06760	[NFC][DebugInfo] Maintain RemoveDIs flag when attributor creates functions (#79143 ) We're using this flag (IsNewDbgInfoFormat) to detect the boundaries in LLVM of what's treating debug-info as intrinsics (i.e. dbg.value), and what's using DPValue objects (the non-intrinsic replacement). The attributor tends to create new wrapper functions and doesn't insert them into Modules in the usual way, thus we have to manually update that flag to signal what debug-info mode it's using. I've added some --try-experimental-debuginfo-iterators RUN lines to tests that would otherwise crash because of this, so that they're exercised by our new-debuginfo-iterators buildbot. NB: there's an attributor test with a dbg.value in it, however attributes re-order themselves in RemoveDIs mode for various reasons, so we're going to address that in a different patch.	2024-01-24 15:20:05 +00:00
Jay Foad	70fc970378	[AMDGPU] Move architected SGPR implementation into isel (#79120 )	2024-01-24 15:06:20 +00:00
Felipe de Azevedo Piovezan	380ac53dfa	[DebugNames] Implement Entry::GetParentEntry query (#78760 ) This commit introduces a helper function to DWARFAcceleratorTable::Entry which follows DW_IDX_Parent attributes to return the corresponding parent Entry in the table. It is tested by enhancing dwarfdump so that it now prints: 1. When data is corrupt. 2. When parent information is present, but the parent is not indexed. 3. The parent entry offset, when the parent is present and indexed. This is printed in terms of a real entry offset (the same that gets printed at the start of each entry: "Entry @ 0x..."), instead of the encoded number in the table (which is an offset from the start of the Entry list). This makes it easy to visually inspect the dwarfdump and check what the parent is.	2024-01-24 06:44:03 -08:00
Florian Hahn	3d91d9613e	[ConstraintElim] Make sure min/max intrinsic results are not poison. The result of umin may be poison and in that case the added constraints are not valid in contexts where poison doesn't cause UB. Only queue facts for min/max intrinsics if the result is guaranteed to not be poison. This could be improved in the future, by only adding the fact when solving conditions using the result value. Fixes https://github.com/llvm/llvm-project/issues/78621.	2024-01-24 14:25:55 +00:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
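The reason a constant-index GEP can be rewritten over `i8` is that it only ever denotes a fixed byte offset from the base pointer. The following C++ sketch shows the same equivalence at source level; `Pair`, `typedAddress` and `byteAddress` are hypothetical names and this is an analogy, not the InstCombine implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Pair {
  std::int32_t A;
  std::int32_t B;
};

// Typed addressing: index into the array, then select a field. This is
// analogous to a GEP with a struct source element type.
inline const char *typedAddress(const Pair *Base, std::size_t Index) {
  assert(Base && "base pointer must not be null");
  return reinterpret_cast<const char *>(&Base[Index].B);
}

// Byte addressing: fold the index and the field into one byte offset. This
// is analogous to the canonical i8 form described above.
inline const char *byteAddress(const Pair *Base, std::size_t Index) {
  assert(Base && "base pointer must not be null");
  std::size_t Offset = Index * sizeof(Pair) + offsetof(Pair, B);
  return reinterpret_cast<const char *>(Base) + Offset;
}
```

Both functions return the same address for every index, which is why a single byte-offset form makes it easier to see that two differently typed address computations are identical.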
Simon Pilgrim	8b43c1be23	[X86] X86FixupVectorConstants - shrink vector load to movss/movsd/movd/movq 'zero upper' instructions (#79000 ) If we're loading a vector constant that is known to be zero in the upper elements, then attempt to shrink the constant and just scalar load the lower 32/64 bits. Always choose the vzload/broadcast with the smallest constant load, and prefer vzload over broadcasts for same bitwidth to avoid domain flips (mainly an AVX1 issue). Fixes #73783	2024-01-24 14:00:51 +00:00
Rainer Orth	182ab1c703	[Support] Adjust .note.GNU-stack guard in Support/BLAKE3/blake3__x86-64_unix.S (#76229 ) When using GNU ld 2.41 on FreeBSD 14.0/amd64, there are linker warnings like ``` /vol/gcc/bin/gld-2.41: warning: blake3_avx512_x86-64_unix.S.o: missing .note.GNU-stack section implies executable stack /vol/gcc/bin/gld-2.41: NOTE: This behaviour is deprecated and will be removed in a future version of the linker ``` This can be fixed by adjusting the guard of the `.note.GNU-stack` sections in `blake3__x86-64_unix.S` to match `llvm/lib/MC/MCAsmInfoELF.cpp:MCAsmInfoELF::getNonexecutableStackSection` which emits the section on all ELF targets but Solaris. Tested on `amd64-pc-freebsd14.0`.	2024-01-24 14:33:45 +01:00
ostannard	5469010ba7	[AArch64] FP/SIMD is not mandatory for v8-R (#79004 ) The FP/SIMD instructions are optional for v8-R, so they should not be marked as a dependency of HasV8_0rOps. This had the effect of disabling some v8R-specific system registers when any of these features was disabled. I've moved these features to be enabled by default for Cortex-R82 (currently the only v8-R AArch64 core), matching the previous behavior, and clang's default. Based on a patch by Simi Pallipurath <simi.pallipurath@arm.com>	2024-01-24 13:12:03 +00:00
Nikita Popov	89dae798cc	[Loads] Use BatchAAResults for available value APIs (NFCI) This allows caching AA queries both within and across the calls, and enables us to use a custom AAQI configuration.	2024-01-24 14:04:21 +01:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Simon Pilgrim	72f10f7eb5	[X86] Fold not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2) Fixes #78888	2024-01-24 12:04:45 +00:00
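The fold in the entry above relies on a simple bit identity: when the mask has exactly one bit set, "the masked value is not zero" and "the masked value equals the mask" are the same test. Here is a scalar sketch with hypothetical helper names; the real transform operates on vector compares in the X86 backend.

```cpp
#include <cassert>
#include <cstdint>

// not(pcmpeq(and(X, Mask), 0)) on a single lane: the masked value is nonzero.
inline bool maskedNotEqualZero(std::uint32_t X, std::uint32_t Mask) {
  return !((X & Mask) == 0);
}

// pcmpeq(and(X, Mask), Mask) on a single lane. The equivalence with the
// function above holds only when Mask is a power of two.
inline bool maskedEqualMask(std::uint32_t X, std::uint32_t Mask) {
  assert(Mask != 0 && (Mask & (Mask - 1)) == 0 && "mask must be a power of 2");
  return (X & Mask) == Mask;
}
```

With a mask that is not a power of two, for example `X = 1` and `Mask = 3`, the masked value is nonzero but not equal to the mask, so the rewrite would not be valid.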
Simon Pilgrim	17cfc15d6b	Fix spelling typo. NFC commutatvity -> commutativity	2024-01-24 12:04:44 +00:00
Ivan Kosarev	78d8ce316f	[AMDGPU] Require explicit immediate offsets for SGPR+IMM SMEM instructions. (#79131 ) Otherwise, SGPR+IMM instructions are not distinguishable from SGPR-only ones in AsmParser, leading to ambiguities. GFX12 doesn't have special SGPR-only variants, so we still allow optional immediate offsets for the subtarget. Also rename the offset operand classes while there. Part of <https://github.com/llvm/llvm-project/issues/69256>.	2024-01-24 11:46:05 +00:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414 ) Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Petar Avramovic	c46109d0d7	Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#79274 ) Reverts llvm/llvm-project#78482	2024-01-24 12:18:34 +01:00
Petar Avramovic	149ed9d2c5	AMDGPU: update GFX11 wmma hazards (#76143 ) One V_NOP or unrelated VALU instruction in between is required for correctness when matrix A or B of current WMMA instruction overlaps with matrix D of previous WMMA instruction. Remaining cases of WMMA operand overlaps are handled by the hardware and do not require handling in hazard recognizer. Hardware may stall in cases where: - matrix C of current WMMA instruction overlaps with matrix D of previous WMMA instruction - VALU instruction reads matrix D of previous WMMA instruction - matrix A,B or C of WMMA instruction reads result of previous VALU instruction	2024-01-24 12:00:35 +01:00
Petar Avramovic	91ddcba83a	AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#78482 ) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in the same way SILowerI1Copies selects VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA, and all cases of uses outside of a cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with the sgpr register class and S1 LLT. TODO: The general goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: https://github.com/llvm/llvm-project/pull/73337	2024-01-24 11:58:32 +01:00
Jeremy Morse	fe0e632b00	[DebugInfo][RemoveDIs] Support DPValues in HWAsan (#78731 ) This patch extends HWASAN to support maintenance of debug-info that isn't stored as intrinsics, but is instead in a DPValue object. This is straightforward: we collect any such objects in StackInfoBuilder, and apply the same operations to them as we would to dbg.value and similar intrinsics. I've also replaced some calls to getNextNode with debug-info skipping next calls, and use iterators for instruction insertion rather than instruction pointers. This avoids any difference in output between intrinsic / non-intrinsic debug-info, but also means that any debug-info comes before code inserted by HWAsan, rather than afterwards. See the test modifications, where the variable assignment (presented as a dbg.value) jumps up over all the code inserted by HWAsan. Seeing how the code inserted by HWAsan is always (AFAIUI) given the source-location of the instruction being instrumented, I don't believe this will have any effect on which lines variable assignments become visible on; it may extend the number of instructions covered by the assignments though.	2024-01-24 10:38:35 +00:00
Nikita Popov	cd7ea4ea65	[LAA] Drop alias scope metadata that is not valid across iterations (#79161 ) LAA currently adds memory locations with their original AATags to AST. However, scoped alias AATags may be valid only within one loop iteration, while LAA reasons across iterations. Fix this by determining which alias scopes are defined inside the loop, and drop AATags that reference these scopes. Fixes https://github.com/llvm/llvm-project/issues/79137.	2024-01-24 11:20:16 +01:00
Shengchen Kan	303e64826b	[X86][NFC] Remove dead code for "_REV" instructions ADC/SBB with reverse encoding is never emitted by compiler before encoding optimization, which is called after flag-copy lowering. This is a partial reland for 8bbf100799a97f8342bf1a8409c6fb48f03e837f	2024-01-24 17:26:57 +08:00
Nikita Popov	a7a1b8b17e	[MSSAUpdater] Handle simplified accesses when updating phis (#78272 ) This is a followup to #76819. After those changes, we can still run into an assertion failure for a slight variation of the test case: When fixing up MemoryPhis, we map the incoming access to the access of the cloned instruction -- which may now no longer exist. Fix this by reusing the getNewDefiningAccessForClone() helper, which will look upwards for a new defining access in that case.	2024-01-24 10:15:42 +01:00
Shengchen Kan	33ecef9812	[X86][CodeGen] Fix crash when commute operands of Instruction for code size (#79245 ) Reported in 134fcc62786d31ab73439201dce2d73808d1785a Incorrect opcode is used b/c there is a `[[fallthrough]]` at line 2386.	2024-01-24 17:10:28 +08:00
Kazu Hirata	b0763a1ae9	[DebugInfo] Use std::size (NFC)	2024-01-24 00:27:38 -08:00
Kazu Hirata	18a3c7a01e	[AMDGPU] Use llvm::none_of (NFC)	2024-01-24 00:27:37 -08:00
Kazu Hirata	873a7bb129	[Transforms] Use llvm::pred_size and llvm::predecessors (NFC)	2024-01-24 00:27:35 -08:00
Felix Kellenbenz	11ca56eaf1	[llvm-objcopy] Don't remove .gnu_debuglink section when using --strip-all (#78919 ) This fixes the issue mentioned here: https://github.com/llvm/llvm-project/issues/57407 It prevents `llvm-objcopy` from removing the `.gnu_debuglink` section when used with the `--strip-all` flag. Since `--strip-all` is the default of `llvm-strip` the patch also prevents `llvm-strip` from removing the `.gnu_debuglink` section.	2024-01-24 08:12:16 +00:00
Shengchen Kan	71d64ed80f	[X86][Peephole] Add NDD entries for EFLAGS optimization	2024-01-24 15:47:58 +08:00

177895 Commits