518027 Commits

Author SHA1 Message Date
Craig Topper
fcacda899f [RISCV] Remove constant_fold_cast_op from RISCVPostLegalizerCombiner.
This is no longer tested after other recent changes. AArch64 does
have this in their PostLegalizerCombiner.
2024-11-12 23:28:48 -08:00
Kazu Hirata
9571cc2b28
[ARM] Remove unused includes (NFC) (#115995)
Identified with misc-include-cleaner.
2024-11-12 23:15:21 -08:00
Kazu Hirata
735ab61ac8
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
2024-11-12 23:15:06 -08:00
Jianjian Guan
a6f8af676a
[RISCV] Improve vmsge and vmsgeu selection (#115435)
Select vmsge(u) vs, C to vmsgt(u) vs, C-1 if C is not in the imm range
and not the minimum value.
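
The rewrite relies on the identity `x >= C` if and only if `x > C-1`, which is valid as long as `C-1` does not wrap, i.e. `C` is not the minimum value of the type. A minimal Python sketch that checks the identity exhaustively; the 8-bit width and the helper names are assumptions made for illustration, not part of the patch:

```python
def signed_values(bits=8):
    """All signed integers representable in `bits` bits."""
    return range(-(1 << (bits - 1)), 1 << (bits - 1))


def unsigned_values(bits=8):
    """All unsigned integers representable in `bits` bits."""
    return range(0, 1 << bits)


def ge_equals_gt_minus_one(values):
    """Check x >= c  <=>  x > c - 1 for every c except the minimum.

    The minimum is skipped because c - 1 would not be representable
    in a fixed-width register (Python integers do not wrap).
    """
    lowest = min(values)
    return all(
        (x >= c) == (x > c - 1)
        for c in values if c != lowest
        for x in values
    )


signed_ok = ge_equals_gt_minus_one(signed_values())
unsigned_ok = ge_equals_gt_minus_one(unsigned_values())
```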

Fix https://github.com/llvm/llvm-project/issues/114505.
2024-11-13 15:05:08 +08:00
Boaz Brickner
9a365bc9a0
[Clang] [NFC] Add "human" diagnostic argument format (#115835)
This allows formatting large integers in a human friendly way. Example:
"5321584" -> "5.32M".
Use it where such human numbers are generated manually today.
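
A rough Python sketch of this kind of formatting, reproducing the example above. The function name, the suffix table, and the two-decimal rounding are assumptions for illustration and may differ from what Clang actually does:

```python
def human_format(n):
    """Format a non-negative integer with a magnitude suffix, e.g. 5.32M."""
    # The suffix table is an assumption for this sketch.
    scales = ((10**12, "T"), (10**9, "G"), (10**6, "M"), (10**3, "k"))
    for threshold, suffix in scales:
        if n >= threshold:
            return f"{n / threshold:.2f}{suffix}"
    # Small values are printed unchanged.
    return str(n)


example = human_format(5321584)  # the input from the commit message
```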
2024-11-13 07:58:11 +01:00
Boaz Brickner
edfa75de33
[clang] [NFC] Split checkAttributesAfterMerging() to multiple functions (#115464) 2024-11-13 07:42:50 +01:00
Luke Lau
1294ddabbc [RISCV] Add cost model tests for vp.{s,u}{min,max}. NFC 2024-11-13 14:32:44 +08:00
Kasper Nielsen
1824e45cd7
[MLIR,Python] Support converting boolean numpy arrays to and from mlir attributes (unrevert) (#115481)
This PR re-introduces the functionality of
https://github.com/llvm/llvm-project/pull/113064, which was reverted in
0a68171b3c
due to memory lifetime issues.

Note that I was not able to reproduce the ASan results myself, so I
have not been able to verify that this PR really fixes the issue.

---

Currently it is unsupported to:
1. Convert a MlirAttribute with type i1 to a numpy array
2. Convert a boolean numpy array to a MlirAttribute

Currently the entire Python application violently crashes with a quite
poor error message https://github.com/pybind/pybind11/issues/3336

The complication in handling these conversions is that MlirAttribute
represents booleans as a bit-packed i1 type, whereas numpy represents
booleans as a byte array with 8 bits used per boolean.

This PR proposes the following approach:
1. When converting an i1 typed MlirAttribute to a numpy array, we cannot
directly use the underlying raw data backing the MlirAttribute as a
buffer to Python, as done for other types. Instead, a copy of the data
is generated using numpy's unpackbits function, and the result is sent
back to Python.
2. When constructing a MlirAttribute from a numpy array, first the
python data is read as a uint8_t to get it converted to the endianness
used internally in mlir. Then the booleans are bitpacked using numpy's
packbits function, and the bitpacked array is saved as the MlirAttribute
representation.
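
The two directions above can be sketched in plain Python. numpy's `packbits` and `unpackbits` perform the analogous operations on arrays; the least-significant-bit-first order used here is an assumption for illustration, and the helper names are hypothetical:

```python
def pack_bits(bools):
    """Pack booleans into bytes, 8 per byte, least significant bit first."""
    out = bytearray((len(bools) + 7) // 8)
    for i, flag in enumerate(bools):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bits(data, count):
    """Expand packed bytes back into `count` booleans (one per element)."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]


values = [True, False, True, True, False, False, False, False, True, True]
packed = pack_bits(values)                   # 10 booleans fit in 2 bytes
restored = unpack_bits(packed, len(values))  # a copy, not a view of `packed`
```

The element count has to be carried separately, because the packed form pads the last byte with zero bits.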
2024-11-13 01:23:10 -05:00
Matthias Springer
804d3c4ce1
[mlir][IR] Add Block::isReachable helper function (#114928)
Add a new helper function `isReachable` to `Block`. This function
traverses all successors of a block to determine if another block is
reachable from the current block.

This functionality has been reimplemented in multiple places in MLIR,
and possibly in downstream projects as well. This change therefore moves
it to a common place.
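
A minimal Python sketch of such a successor traversal. The `Block` class is a hypothetical stand-in, and the choice that a block reaches itself only through a cycle is an assumption of this sketch, not a statement about the MLIR helper:

```python
class Block:
    """Hypothetical stand-in for a CFG block; it only tracks successors."""

    def __init__(self, name):
        self.name = name
        self.successors = []


def is_reachable(start, target):
    """Return True if `target` can be reached by following successors of `start`."""
    visited = set()
    worklist = list(start.successors)
    while worklist:
        block = worklist.pop()
        if block is target:
            return True
        if id(block) in visited:
            continue  # already explored; avoids looping forever on cycles
        visited.add(id(block))
        worklist.extend(block.successors)
    return False


entry, body, exit_, dead = Block("entry"), Block("body"), Block("exit"), Block("dead")
entry.successors = [body]
body.successors = [body, exit_]  # self-loop plus an exit edge
dead.successors = [exit_]        # nothing branches to `dead`
```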
2024-11-13 14:58:09 +09:00
Sushant Gokhale
9991ea28fc
[CostModel][AArch64] Make extractelement, with fmul user, free whenever possible (#111479)

In case of Neon, if there exists extractelement from lane != 0 such that
  1. extractelement does not necessitate a move from vector_reg -> GPR
  2. extractelement result feeds into fmul
  3. Other operand of fmul is a scalar or extractelement from lane 0 or
     lane equivalent to 0
then the extractelement can be merged with fmul in the backend and it
incurs no cost.

e.g.
```
define double @foo(<2 x double> %a) {
  %1 = extractelement <2 x double> %a, i32 0
  %2 = extractelement <2 x double> %a, i32 1
  %res = fmul double %1, %2
  ret double %res
}
```
`%2` and `%res` can be merged in the backend to generate:
`fmul    d0, d0, v0.d[1]`

The change was tested with SPEC FP(C/C++) on Neoverse-v2. 
**Compile time impact**: None
**Performance impact**: Observing 1.3-1.7% uplift on lbm benchmark with -flto depending upon the config.
2024-11-13 11:10:49 +05:30
Kazu Hirata
95554cbd77
[memprof] Teach extractCallsFromIR to recognize heap allocation functions (#115938)
This patch teaches extractCallsFromIR to recognize heap allocation
functions.  Specifically, when we encounter a callee that is known to
be a heap allocation function like "new", we set the callee GUID to 0.

Note that I am planning to do the same for the caller-callee pairs
extracted from the profile.  That is, when I encounter a frame that
does not have a callee, I will assume that the frame is calling some
heap allocation function with GUID 0.

Technically, I'm not recognizing enough functions in this patch.
TCMalloc is known to drop certain frames in the call stack immediately
above new.  This patch is meant to lay the groundwork, setting up
GetTLI, plumbing it to extractCallsFromIR, and adjusting the unit
tests.  I'll address remaining issues in subsequent patches.
2024-11-12 21:37:29 -08:00
Matt Arsenault
5911fbb39d
AMDGPU: Do not fold copy to physreg from operation on frame index (#115977) 2024-11-12 21:35:51 -08:00
Alex Bradbury
2baead09b2 [docs] Add blank line before bulletpoint list to fix HowToAddABuilder
The bulletpoint list wasn't rendering properly due to a missing blank
line.
2024-11-13 05:26:02 +00:00
Valentin Clement (バレンタイン クレメン)
2583071fb4
[flang][cuda] Compute size of derived type arrays (#115914) 2024-11-12 21:23:58 -08:00
Jonas Devlieghere
4714215efb
[lldb] Support true/false in ValueObject::SetValueFromCString (#115780)
Support "true" and "false" (and "YES" and "NO" in Objective-C) in
ValueObject::SetValueFromCString.

Fixes #112597
2024-11-12 21:18:22 -08:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Han-Kuan Chen
5a5502b9e1
[SLP] NFC. Use Value instead of template. (#115440) 2024-11-13 11:58:19 +08:00
Justin Fargnoli
274feef7dd
Reland "[NVPTX] Emit prmt selection value in hex" (#115952)
Initially landed in 3ed4b0b0efca7a9467ce83fc62de9413da38006d. 

Reverted in 375d1925dbd0c051fe2d4a86fe98ed08f4a502c5 because the
[`load-store.ll`](https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/NVPTX/load-store.ll)
test was not updated after 5e75880165553e9afb721239689a9c79ec84a108.

5e75880165553e9afb721239689a9c79ec84a108 is now updated in
7a99f2322c324972f2c5091dddd7752fa21d5a78.
2024-11-12 19:21:34 -08:00
Petr Hosek
5fa47d8c52
[libc] Support multilib with runtimes build (#115357)
This adds minimal support for multilibs akin to libc++.
2024-11-12 19:16:27 -08:00
Sirui Mu
e887f8290d
[mlir][LLVM] Add !invariant.group metadata to llvm.load and llvm.store (#115723)
This patch adds support for the `!invariant.group` metadata to the
`llvm.load` and the `llvm.store` operation.
2024-11-13 10:54:34 +08:00
Valentin Clement (バレンタイン クレメン)
37143fe27e
[flang][cuda] Make launch configuration optional for cuf kernel (#115947) 2024-11-12 16:49:44 -08:00
Tarun Prabhu
01d233ff40
Revert "[clang][flang] Support -time in both clang and flang"
Reverts llvm/llvm-project#109165

This created a buildbot failure on
[Fuchsia](https://lab.llvm.org/buildbot/#/builders/11/builds/8080).
2024-11-12 17:08:02 -07:00
Sterling-Augustine
7ba864b592
[SandboxVectorizer] Register erase callback for seed collection (#115951) 2024-11-12 16:03:27 -08:00
Matthias Springer
b0a4e958e8
[mlir][bufferization] Add support for non-unique func.return (#114017)
Multiple `func.return` ops inside of a `func.func` op are now supported
during bufferization. This PR extends the code base in 3 places:

- When inferring function return types, `memref.cast` ops are folded
away only if all `func.return` ops have matching buffer types. (E.g., we
don't fold if two `return` ops have operands with different layout
maps.)
- The alias sets of all `func.return` ops are merged. That's because
aliasing is a "may be" property.
- The equivalence sets of all `func.return` ops are taken only if they
match. If different `func.return` ops have different equivalence sets
for their operands, the equivalence information is dropped. That's
because equivalence is a "must be" property.
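
The difference between the two merge rules can be sketched in Python. The data model (alias sets as sets of argument names, equivalence as a single name per `func.return`) is a simplification assumed for illustration and is not how the bufferization analysis stores this information:

```python
def merge_alias_sets(per_return):
    """May-alias: keep an argument if any return op may alias it (union)."""
    merged = set()
    for aliases in per_return:
        merged |= aliases
    return merged


def merge_equivalence(per_return):
    """Must-equivalent: keep the result only if every return op agrees."""
    first = per_return[0]
    return first if all(entry == first for entry in per_return) else None


# Two hypothetical return ops in one function.
aliases = merge_alias_sets([{"arg0"}, {"arg0", "arg1"}])
agreeing = merge_equivalence(["arg0", "arg0"])
conflicting = merge_equivalence(["arg0", "arg1"])  # information is dropped
```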

This commit is in preparation of removing the deprecated
`func-bufferize` pass. That pass can bufferize functions with multiple
`return` ops.
2024-11-13 08:51:39 +09:00
Michael Jones
d6219e6599
[libc] Make fstatvfs test less flakey (#115949) 2024-11-12 18:40:52 -05:00
Min-Yih Hsu
84e95beae9
[RISCV] Update SiFive P600's scheduling model on RVV instructions (#115243)
The biggest change is assigning vector crypto instructions to the
correct processor resource.

The majority of these changes are guided by our RVV-capable
llvm-exegesis.
2024-11-12 15:29:40 -08:00
Rahul Joshi
7b5e285d16
[NFC][Clang] Use range for loops in ClangDiagnosticsEmitter (#115573)
Use range based for loops in Clang diagnostics emitter.
2024-11-12 14:39:02 -08:00
Shlomi Regev
13317502da
[mlir] Add a null pointer check in symbol lookup (#115165)
Dead code analysis crashed because a symbol that is called/used didn't
appear in the symbol table.
This patch fixes this by adding a nullptr check after symbol table lookup.
2024-11-12 23:31:25 +01:00
LLVM GN Syncbot
5a5122cac6 [gn build] Port 0e97b4d05a0b 2024-11-12 22:23:40 +00:00
Thorsten Schütt
0e97b4d05a
[GlobalISel] Combine G_MERGE_VALUES of x and undef (#113616)
into anyext x

; CHECK-NEXT: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[TRUNC]](s32),
[[DEF]](s32)

Please continue padding merge values.

//   %bits_8_15:_(s8) = G_IMPLICIT_DEF
//   %0:_(s16) = G_MERGE_VALUES %bits_0_7:(s8), %bits_8_15:(s8)

%bits_8_15 is defined by undef. Its value is undefined and we can pick
an arbitrary value. For optimization, we pick anyext, which plays well
with the undefinedness.

//   %0:_(s16) = G_ANYEXT %bits_0_7:(s8)

The upper bits of %0 are undefined and the lower bits come from
%bits_0_7.
2024-11-12 23:23:32 +01:00
Jorge Gorbe Moya
9d85ba5724
[SandboxIR] Preserve the order of switch cases after revert. (#115577)
Preserving the case order is not strictly necessary to preserve
semantics (for example, operations like SwitchInst::removeCase will
happily swap cases around). However, I'm planning to introduce an
optional verification step for SandboxIR that will use StructuralHash to
compare IR after a revert to the original IR to help catch tracker bugs,
and the order difference triggers a difference there.
2024-11-12 14:10:46 -08:00
Nikolas Klauser
a2042521a0
[libc++] Remove _AlgPolicy from std::copy and algorithms using std::copy (#115887)
`std::copy` doesn't use the `_AlgPolicy` for anything other than calling
itself with it, so we can just remove the argument. This also removes
the need in a few other algorithms which had an `_AlgPolicy` argument
only to call `copy`.
2024-11-12 23:03:52 +01:00
Alex Bradbury
8da61a3434
[llvm][docs] Expand HowToAddABuilder with guidance on testing locally (#115024)
With <https://github.com/llvm/llvm-zorg/pull/289> and <https://github.com/llvm/llvm-zorg/pull/293> landed, it's now reasonable to ask people to test their builder configurations locally. This patch adds documentation on how to do so.
2024-11-12 22:02:20 +00:00
lialan
24a8092be7
[MLIR] Avoid vector.extract_strided_slice when not needed (#115941)
In `staticallyExtractSubvector`, when the extracted slice is the same
as the source vector, there is no need to emit
`vector.extract_strided_slice`.

This fixes the lit test case `@vector_store_i4` in
`mlir\test\Dialect\Vector\vector-emulate-narrow-type.mlir`, where
converting from `vector<8xi4>` to `vector<4xi8>` does not need slice
extraction.

The issue was introduced in #113411 and #115070, CI failure link:
https://buildkite.com/llvm-project/github-pull-requests/builds/118845

This PR does not include a lit test case because it is a fix and the
above-mentioned `@vector_store_i4` test already tests the mechanism.

Signed-off-by: Alan Li <me@alanli.org>
2024-11-12 13:58:58 -08:00
Nikolas Klauser
36fa8bdfa0
[libc++][NFC] Remove unused functions from <__split_buffer> (#115735) 2024-11-12 22:55:59 +01:00
Krzysztof Drewniak
49f90e798f
[mlir][affine] Cancel exactly-matching delinearize/linearize pairs (#115758)
If we linearize values (with an assertion that they are disjoint) and
then delinearize that linear index with the exact same basis, we know
that these operations are exact inverses of each other and can be
replaced with the original inputs to the linearization.

Similarly, if we take a linear index, delinearize it with some basis,
and then re-linearize it with that same basis (noting that the outputs
of the delinearization are guaranteed to be `disjoint`, even if this is
not asserted on the linearize_index operation), the re-linearization is
the inverse of the delinearization, so those two operations can also be
canceled out.

This commit adds canonicalization patterns for these simple
cancelations.
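
The inverse relationship can be sketched in Python, assuming a row-major basis in which every index is within its bound (the "disjoint" case). The function names are illustrative and the sketch does not model every form the affine ops accept:

```python
def linearize(indices, basis):
    """Combine multi-dimensional indices into one linear index (row-major)."""
    linear = 0
    for index, size in zip(indices, basis):
        # In-range indices are what makes the round trip exact.
        assert 0 <= index < size, "index out of range for its basis element"
        linear = linear * size + index
    return linear


def delinearize(linear, basis):
    """Split a linear index into per-dimension indices for the same basis."""
    indices = []
    for size in reversed(basis):
        indices.append(linear % size)
        linear //= size
    return list(reversed(indices))


basis = (3, 4, 5)
linear = linearize((2, 1, 3), basis)
round_trip = delinearize(linear, basis)
```

With the same basis on both sides, each function undoes the other, which is why a matching pair can be replaced by its original inputs.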
2024-11-12 15:36:07 -06:00
Peng Sun
fe83a7282e
[TOSA] Introduce Tosa_ElementwiseUnaryOp with Type and Shape Enforcement (#115784)
* Enforce that Tosa_ElementwiseUnaryOp requires output tensors to match
the input tensor's type and shape.
* Update the following ops to conform to Tosa_ElementwiseUnaryOp: clamp,
erf, sigmoid, tanh, cos, sin, abs, bitwise_not, ceil, clz, exp, floor,
log, logical_not, negate, reciprocal, rsqrt.
* Add invalid tests for each operator to ensure compliance with TOSA
v1.0 Specification.

Signed-off-by: Peng Sun <peng.sun@arm.com>
2024-11-12 13:35:47 -08:00
Gábor Horváth
d2db9bd708
[clang][APINotes] Add support for the SwiftEscapable attribute (#115866)
This is similar to SwiftCopyable. Also fix missing SwiftCopyable dump
for TagInfo.
2024-11-12 21:34:56 +00:00
Tex Riddell
5c2a133b13
Emit constrained atan2 intrinsic for clang builtin (#113636)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.

Last part of Implement the atan2 HLSL Function. Fixes #70096.
2024-11-12 13:34:29 -08:00
Tarun Prabhu
f5396748c7
[clang][flang] Support -time in both clang and flang
The -time option prints timing information for the subcommands
(compiler, linker) in a format similar to that used by gcc/gfortran.

This partially addresses requests from #89888
2024-11-12 14:27:22 -07:00
John Harrison
e5ba117274
[lldb-dap] Remove g_dap references from lldb-dap/LLDBUtils. (#115933)
This refactor removes g_dap references from lldb-dap/LLDBUtils.{h,cpp}
to allow us to create more than one g_dap instance in the future.
2024-11-12 13:19:17 -08:00
Nikolas Klauser
5b67372aec [libc++] Remove a few unused includes from <__algorithm/find_end.h> 2024-11-12 22:11:15 +01:00
Craig Topper
4bd6e15a45 [RISCV][GISel] Sync MaxIterations/ObserverLvl/EnableFullDCE for PreLegalizer combiners with AArch64. 2024-11-12 13:07:51 -08:00
Michael Jones
6aa7403858
[libc] Fix fpbits test running 80bit ld everywhere (#115937)
After #115084, the 80-bit long double tests fail if sizeof(long double)
isn't 96 or 128 bits. This caused failures on systems where long double
is double (since long double is 64 bits there), so I've disabled the
80-bit long double tests on systems that don't use them.
2024-11-12 12:52:08 -08:00
Benjamin Maxwell
014455a587
[SDAG] Limit sincos/frexp stack slot folding to stores chained to entry (#115906)
When the chain is not the entry node there is a risk the stores are
within a (CALLSEQ_START, CALLSEQ_END), which when the node is expanded
will lead to nested call sequences.

It should be possible to check for this and allow more cases, but for
now, let's limit this to cases where it's definitely safe.

Fixes #115323
2024-11-12 20:48:41 +00:00
Miguel A. Arroyo
5cd6e21bdd
[LLD][COFF] allow saving intermediate files with /lldsavetemps (#115131)
* Parity with the `-save-temps=` flag in the `ELF` `lld` driver.
2024-11-12 22:30:48 +02:00
Haojian Wu
70d6789c7a [bazel] Port for 7302c8dbe71b7c03b73a35a21fa4b415fa1f4505 2024-11-12 21:06:19 +01:00
Maksim Panchenko
d922045381
[BOLT] Use AsmInfo for address size. NFCI (#115932)
Use AsmInfo instead of DWARFObj interface for extracting address size
and format.
2024-11-12 11:53:34 -08:00
Maksim Panchenko
be89e794f7
[BOLT][AArch64] Add support for long absolute LLD thunks/veneers (#113408)
Absolute thunks generated by LLD reference function addresses recorded
as data in code. Since they are generated by the linker, they don't have
relocations associated with them and thus the addresses are left
undetected. Use pattern matching to detect such thunks and handle them
in VeneerElimination pass.
2024-11-12 11:27:08 -08:00
Krystian Stasiowski
3ab5927b97
[Clang][Comments] Make @relates an inline comment command (#115040)
According to the Doxygen documentation,
the `relates`, `related`, `relatesalso`, and `relatedalso` commands all
have a single argument. This patch changes their classification from
`VerbatimLineCommand` to `InlineCommand` so the argument is correctly
parsed.
2024-11-12 14:18:28 -05:00