263469 Commits

Author SHA1 Message Date
DianQK
a58dcc5e08
Reland "[SimplifyCFG] Improve the precision of PtrValueMayBeModified"
This relands commit f890f010f6a70addbd885acd0c8d1b9578b6246f.

The result value of `getelementptr inbounds (TY, null, not zero)` is a poison value.
We can think of it as undefined behavior.
2024-01-25 06:42:14 +08:00
DianQK
a0c1b5bdda
Reland "[SimplifyCFG] Check if the return instruction causes undefined behavior"
This relands commit b6a0be8ce3114d0c57e7a7d6c3c222986ca506ad.

Return undefined to a noundef return value is undefined.

Example:

```
define noundef i32 @test_ret_noundef(i1 %cond) {
entry:
  br i1 %cond, label %bb1, label %bb2
bb1:
  br label %bb2
bb2:
  %r = phi i32 [ undef, %entry ], [ 1, %bb1 ]
  ret i32 %r
}
```
2024-01-25 06:42:14 +08:00
Stanislav Mekhanoshin
6384b6239b
[AMDGPU] Simplify VOP3PWMMA_Profile. NFC. (#79377) 2024-01-24 14:33:00 -08:00
Micah Weston
23faa81d3f
[SHT_LLVM_BB_ADDR_MAP] Avoids side-effects in addition since order is unspecified. (#79168)
Turns out the problem with
https://github.com/llvm/llvm-project/issues/60013 is due to the fact
that order of operation is unspecified in C++:
https://en.cppreference.com/w/cpp/language/eval_order. A small example
of where this manifests with MSVC can be seen here
https://ooo.godbolt.org/z/bxqKeqzqn.

This patch does the following:
* Removes the addition operations where we sequence more than one
side-effect based expression.
* Removes test guards to now run on Windows
2024-01-24 17:26:48 -05:00
Alexey Bataev
36e4a7ecca [SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict
weak ordering.
Try to make PHICompare to meat strict weak ordering criteria.
2024-01-24 13:46:05 -08:00
Jay Foad
fe9f3903f2
[AMDGPU] Update isLegalAddressingMode for GFX12 SMEM loads (#78728) 2024-01-24 21:04:43 +00:00
Michael Maitland
3967510032
[RISCV][GISel] First mask argument placed in v0 according to RISCV Ve… (#79343)
…ctor CC.
2024-01-24 16:03:38 -05:00
Joseph Huber
a551703cb5
[Offload] Fix the offloading wrapper when merged multiple times. (#79231)
Summary:
The offloading wrapper is a object file that contains code necessary to
register offloading entries for the given runtime. Currently, we
expected only one of these to be present when we make the final
executable. However, in the case of redistributable linking with `-r` we
can end up with multiple of these being generated before finally
creating the executable.

This patch simply changes the defintiions of these globals to be
mergable. This allows multiples of these to participate in a single link
job. For ELF, we just make the dummy variable internal and used so it
sets up the section as expected. For COFF we make the entries weak_odr
so they merge to a single symbol
2024-01-24 13:50:35 -06:00
Alexey Bataev
48bbd76587 [SLP]Fix PR79229: Check that extractelement is used only in a single node
before erasing.

Before trying to erase the extractelement instruction, not enough to
check for single use, need to check that it is not used in several nodes
because of the preliminary nodes reordering.
2024-01-24 11:22:22 -08:00
Andy Kaylor
bb65f5a5d9
Move raw_string_ostream back to raw_ostream.cpp (#79224)
The implementation of raw_string_ostream::write_impl() was moved to
raw_socket_stream.cpp when the raw_socket_ostream support was separated.
This patch moves it back to facilitate disabling socket support in
downstream projects.
2024-01-24 11:20:23 -08:00
Jonas Paulsson
84dcf3d35b
[SystemZ] Require D12 for i128 accesses in isLegalAddressingMode() (#79221)
Machines with vector support handle i128 in vector registers and
therefore only have the small displacement available for memory
accesses. Update isLegalAddressingMode() to reflect this.
2024-01-24 20:16:05 +01:00
Nico Weber
d8a34c25bc [opt] Remove trailing space that accidentally got added 2024-01-24 14:09:54 -05:00
Nico Weber
609695b23e [gn] port 32f7922646d5 (LLVMOptDriver) 2024-01-24 14:03:33 -05:00
Nico Weber
3135984024 Reland "[CMake/Bazel] Support usage of opt driver as a library (#79205)"
This reverts commit be08be5d5de97cd593fb99affa1fa994d104eb70.
The build error was due to a different change, apologies!
2024-01-24 13:57:07 -05:00
Michael Maitland
d2d42dcfde [CodeGen][MISched] Rename instance of Cycle -> ReleaseAtCycle
b1ae461a5358932851de42b66ffde8748da51a83 renamed Cycle ->
ReleaseAtCycle.

7e09239e24b339f45f63a670e2e831150826bf70 was committed without rebasing
but used the old Cycle syntax.

This caused a build failure when
7e09239e24b339f45f63a670e2e831150826bf70 was squash-and-merged. This
patch fixes this problem.
2024-01-24 10:54:14 -08:00
Nico Weber
be08be5d5d Revert "[CMake/Bazel] Support usage of opt driver as a library (#79205)"
This reverts commit 32f7922646d5903f63d16c9fbfe3d508b0f8cda7.

Doesn't build, see
https://github.com/llvm/llvm-project/pull/79205#issuecomment-1908730527
2024-01-24 13:52:10 -05:00
Michael Maitland
7e09239e24
[CodeGen][MISched] Handle empty sized resource usage. (#75951)
TargetSchedule.td explicitly allows the usage of a ProcResource for zero
cycles, in order to represent that the ProcResource must be available
but is not consumed by the instruction. On the other hand,
ResourceSegments explicitly does not allow for a zero sized interval. In
order to remedy this, this patch handles the special case of when there
is an empty interval usage of a resource by not adding an empty
interval.

We ran into this issue downstream, but it makes sense to have
this upstream since it is explicitly allowed by TargetSchedule.td.
2024-01-24 13:40:23 -05:00
William Moses
32f7922646
[CMake/Bazel] Support usage of opt driver as a library (#79205)
In Bazel, Clang current separates the clang executable into a
clang-driver library, and the actual clang executable. This allows
downstream users to make their own variations of clang, without having
to redo/maintain separate build pipelines.

This adds the same for opt for both CMake and Bazel.
2024-01-24 13:39:27 -05:00
Jeremy Morse
604a6c409e
[BPI] Transfer value-handles when assign/move constructing BPI (#77774)
Background: BPI stores a collection of edge branch-probabilities, and
also a set of Callback value-handles for the blocks in the
edge-collection. When a block is deleted, BPI's eraseBlock method is
called to clear the edge-collection of references to that block, to
avoid dangling pointers.

However, when move-constructing or assigning a BPI object, the
edge-collection gets moved, but the value-handles are discarded. This
can lead to to stale entries in the edge-collection when blocks are
deleted without the callback -- not normally a problem, but if a new
block is allocated with the same address as an old block, spurious
branch probabilities will be recorded about it. The fix is to transfer
the handles from the source BPI object.

This was exposed by an unrelated debug-info change, it probably just
shifted around allocation orders to expose this. Detected as
nondeterminism and reduced by Zequan Wu:


f1b0a54451 (commitcomment-136737090)

(No test because IMHO testing for a behaviour that varies with memory
allocators is likely futile; I can add the reproducer with a CHECK for
the relevant branch weights if it's desired though)
2024-01-24 17:45:43 +00:00
Nico Weber
4a9a1d83be [gn build] port 4a582845597e (tablegen'd clang builtins) 2024-01-24 12:43:35 -05:00
Alexey Bataev
ca654acc16 [SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict
weak ordering.

Compared NumUses to meet the reaquirements of the strict weak ordering.
2024-01-24 09:36:25 -08:00
Alex MacLean
3b8539c9dc
[NVPTX] use incomplete aggregate initializers (#79062)
The PTX ISA specifies that initializers may be incomplete ([5.4.4.
Initializers](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#initializers))
> As in C, array initializers may be incomplete, i.e., the number of
initializer elements may be less than the extent of the corresponding
array dimension, with remaining array locations initialized to the
default value for the specified array type.

Emitting initializers in this form is preferable because it reduces the
size of the PTX, in some cases significantly, and can improve compile
time of ptxas as a result.
2024-01-24 09:24:28 -08:00
Kazu Hirata
1605bf5815
[ConstraintElimination] Use std::move in the constructor (NFC) (#79259)
Moving the contents of Coefficients saves 0.43% of heap allocations
during the compilation of a large preprocessed file, namely
X86ISelLowering.cpp, for the X86 target.
2024-01-24 09:18:57 -08:00
Philip Reames
e9311f9c5a [RISCV] Separate single source and dual source lowering code [nfc]
The two single source cases aren't effected by the swap or select matching
as those are dual operand specific.  Similarly, a two source shuffle can't
be a rotate.

We can extend this idea for some of the shuffle types above, but some of
them are validly either single or dual source.  We don't want to loose that
and the code complexity of versioning early and having to repeat some shuffle
kinds doesn't (currently) seem worth it.
2024-01-24 09:16:50 -08:00
ostannard
56602a48c7
[TableGen] Include source location in JSON dump (#79028)
This adds a '!loc' field to each record containing the file name and
line number of the record declaration.
2024-01-24 17:07:20 +00:00
Philip Reames
fd817249f4 [RISCV] Sink code into using branch in shuffle lowering [nfc]
Follow up to 396b6bbc, sink code into consuming branch, and fix one
comment I realized used the misleading wording.  (Permute is a specific
sub-type of single source shuffle.)
2024-01-24 08:52:07 -08:00
Jan Svoboda
6c1dbd5359
[clang] NFC: Remove {File,Directory}Entry::getName() (#74910)
The files and directories that Clang accesses are uniqued by their
inode. For each inode `FileManager` will create exactly one `FileEntry`
or `DirectoryEntry` object, which makes answering the question _"Are
these two files/directories the same?"_ a simple pointer equality check.

However, since the same inode can be accessed through multiple different
paths, asking the `FileEntry` or `DirectoryEntry` object _"What is your
name?"_ doesn't have clear semantics. In c0ff9908 we started reporting
the most recent name used to access the entry, which turned out to be
necessary for Clang modules. However, the long-term solution has always
been to explicitly track the as-requested name. This has been
implemented in 4dc5573a as `FileEntryRef` and `DirectoryEntryRef`.

The `DirectoryEntry::getName()` interface has been deprecated since the
Clang 17 release and `FileEntry::getName()` since Clang 18. We have
replaced uses of these deprecated APIs in `main` with
`DirectoryEntryRef::getName()` and `FileEntryRef::getName()`
respectively.

This makes it possible to remove `{File,Directory}Entry::getName()` for
good along with the `FileManager` code that implements them.
2024-01-24 08:41:14 -08:00
Nikita Popov
56444d5687
Remove fork handling from release issue workflow (#79310)
This is currently broken, because the check is performed on the wrong
repository. repo here is llvm/llvm-project, which is not a fork (so this
will always trigger), then we'll push a new branch to
llvmbot/llvm-project, and then again set the wrong owner, so we'll look
for the branch in llvm/llvm-project rather than llvmbot/llvm-project.

Rather than fixing this, I'm removing the code entirely, as it shouldn't
be needed anymore (llvmbot/llvm-project is a fork of llvm/llvm-project).
2024-01-24 17:39:10 +01:00
Philip Reames
396b6bbc5e
[RISCV] Recurse on second operand of two operand shuffles (#79197)
This builds on bdc41106ee48dce59c500c9a3957af947f30c8c3.

This change completes the migration to a recursive shuffle lowering
strategy where when we encounter an unknown two argument shuffle, we
lower each operand as a single source permute, and then use a vselect
(i.e. a vmerge) to combine the results. This relies for code quality on
the post-isel combine which will aggressively fold that vmerge back into
the materialization of the second operand if possible.

Note: The change includes only the most immediately obvious of the
stylistic cleanup. There's a bunch of code movement that this enables
that I'll do as a separate patch as rolling it into this creates an
unreadable diff.
2024-01-24 08:29:28 -08:00
Ivan Kosarev
2e81ac25b4
[AMDGPU][NFC] Simplify AGPR/VGPR load/store operand definitions. (#79289)
Part of <https://github.com/llvm/llvm-project/issues/62629>.
2024-01-24 15:38:16 +00:00
quic-asaravan
dc5b4daae7
[HEXAGON] Inlining Division (#79021)
This patch inlines float division function calls for hexagon.

Co-authored-by: Awanish Pandey <awanpand@codeaurora.org>
2024-01-24 09:30:33 -06:00
Jeremy Morse
0065d06760
[NFC][DebugInfo] Maintain RemoveDIs flag when attributor creates functions (#79143)
We're using this flag (IsNewDbgInfoFormat) to detect the boundaries in
LLVM of what's treating debug-info as intrinsics (i.e. dbg.value), and
what's using DPValue objects (the non-intrinsic replacement). The
attributor tends to create new wrapper functions and doesn't insert them
into Modules in the usual way, thus we have to manually update that flag
to signal what debug-info mode it's using.

I've added some --try-experimental-debuginfo-iterators RUN lines to
tests that would otherwise crash because of this, so that they're
exercised by our new-debuginfo-iterators buildbot.

NB: there's an attributor test with a dbg.value in it, however
attributes re-order themselves in RemoveDIs mode for various reasons, so
we're going to address that in a different patch.
2024-01-24 15:20:05 +00:00
Jay Foad
70fc970378
[AMDGPU] Move architected SGPR implementation into isel (#79120) 2024-01-24 15:06:20 +00:00
Felipe de Azevedo Piovezan
380ac53dfa
[DebugNames] Implement Entry::GetParentEntry query (#78760)
This commit introduces a helper function to DWARFAcceleratorTable::Entry
which follows DW_IDX_Parent attributes to returns the corresponding
parent Entry in the table.

It is tested by enhancing dwarfdump so that it now prints:

1. When data is corrupt.
2. When parent information is present, but the parent is not indexed.
3. The parent entry offset, when the parent is present and indexed. This
is printed in terms a real entry offset (the same that gets printed at
the start of each entry: "Entry @ 0x..."), instead of the encoded number
in the table (which is an offset from the start off the Entry list).
This makes it easy to visually inspect the dwarfdump and check what the
parent is.
2024-01-24 06:44:03 -08:00
Florian Hahn
3d91d9613e
[ConstraintElim] Make sure min/max intrinsic results are not poison.
The result of umin may be poison and in that case the added constraints
are not be valid in contexts where poison doesn't cause UB. Only queue
facts for min/max intrinsics if the result is guaranteed to not be
poison.

This could be improved in the future, by only adding the fact when
solving conditions using the result value.

Fixes https://github.com/llvm/llvm-project/issues/78621.
2024-01-24 14:25:55 +00:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Florian Hahn
98509c7f97
[AArch64] Add vec3 tests with different load/store alignments.
Add extra tests with different load/store alignments for
https://github.com/llvm/llvm-project/pull/78637.
2024-01-24 14:19:33 +00:00
Nikita Popov
7143b451d7 [JumpThreading] Add test for #79175 (NFC) 2024-01-24 15:03:52 +01:00
Simon Pilgrim
8b43c1be23
[X86] X86FixupVectorConstants - shrink vector load to movsd/movsd/movd/movq 'zero upper' instructions (#79000)
If we're loading a vector constant that is known to be zero in the upper elements, then attempt to shrink the constant and just scalar load the lower 32/64 bits.

Always chose the vzload/broadcast with the smallest constant load, and prefer vzload over broadcasts for same bitwidth to avoid domain flips (mainly a AVX1 issue).

Fixes #73783
2024-01-24 14:00:51 +00:00
Rainer Orth
182ab1c703
[Support] Adjust .note.GNU-stack guard in Support/BLAKE3/blake3_*_x86-64_unix.S (#76229)
When using GNU ld 2.41 on FreeBSD 14.0/amd64, there are linker warnings
like
```
/vol/gcc/bin/gld-2.41: warning: blake3_avx512_x86-64_unix.S.o: missing .note.GNU-stack section implies executable stack
/vol/gcc/bin/gld-2.41: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
```
This can be fixed by adjusting the guard of the `.note.GNU-stack`
sections in `blake3_*_x86-64_unix.S` to match `llvm/lib/MC/MCAsmInfoELF.cpp:MCAsmInfoELF::getNonexecutableStackSection` which emits the section on all ELF targets
but Solaris.

Tested on `amd64-pc-freebsd14.0`.
2024-01-24 14:33:45 +01:00
ostannard
5469010ba7
[AArch64] FP/SIMD is not mandatory for v8-R (#79004)
The FP/SIMD instructions are optional for v8-R, so they should not be
marked as a dependency of HasV8_0rOps. This had the effect of disabling
some v8R-specific system registers when any of these features was
disabled.

I've moved these features to be enabled by default for Cortex-R82
(currently the only v8-R AArch64 core), matching the previous behavior,
and clang's default.

Based on a patch by Simi Pallipurath <simi.pallipurath@arm.com>
2024-01-24 13:12:03 +00:00
Nikita Popov
89dae798cc [Loads] Use BatchAAResults for available value APIs (NFCI)
This allows caching AA queries both within and across the calls,
and enables us to use a custom AAQI configuration.
2024-01-24 14:04:21 +01:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Simon Pilgrim
72f10f7eb5 [X86] Fold not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2)
Fixes #78888
2024-01-24 12:04:45 +00:00
Simon Pilgrim
6255bae6c9 [X86] Add test coverage based on #78888 2024-01-24 12:04:44 +00:00
Simon Pilgrim
17cfc15d6b Fix spelling typo. NFC
commutatvity -> commutativity
2024-01-24 12:04:44 +00:00
Florian Hahn
c83180c124
[ConstraintElim] Add tests for #78621.
Tests with umin where the result may be poison for
https://github.com/llvm/llvm-project/issues/78621.
2024-01-24 11:56:03 +00:00
Ivan Kosarev
78d8ce316f
[AMDGPU] Require explicit immediate offsets for SGPR+IMM SMEM instructions. (#79131)
As otherwise SGPR+IMM instructions are not distinguishable to SGPR-only
ones in AsmParser, leading to ambiguities.

GFX12 doesn't have special SGPR-only variants, so we still allow
optional immediate offsets for the subtarget.

Also rename the offset operand classes while there.

Part of <https://github.com/llvm/llvm-project/issues/69256>.
2024-01-24 11:46:05 +00:00
Mariusz Sikora
cfddb59be2
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Petar Avramovic
c46109d0d7
Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#79274)
Reverts llvm/llvm-project#78482
2024-01-24 12:18:34 +01:00