This allows formatting large integers in a human-friendly way, e.g.
"5321584" -> "5.32M".
Use it where such human-readable numbers are currently generated manually.
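A minimal standalone sketch of the kind of formatting this provides (the helper name and exact rounding here are assumptions for illustration, not the actual API added by the PR):
```
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical sketch: format a large integer with a metric suffix,
// e.g. 5321584 -> "5.32M".
std::string formatHuman(uint64_t N) {
  static const char *Suffixes[] = {"", "k", "M", "G", "T"};
  double Value = static_cast<double>(N);
  unsigned Idx = 0;
  while (Value >= 1000.0 && Idx < 4) {
    Value /= 1000.0;
    ++Idx;
  }
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%.2f%s", Value, Suffixes[Idx]);
  return Buf;
}
```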
This PR re-introduces the functionality of
https://github.com/llvm/llvm-project/pull/113064, which was reverted in
0a68171b3c
due to memory lifetime issues.
Note that I was not able to reproduce the ASan results myself, so I
have not been able to verify that this PR really fixes the issue.
---
Currently it is unsupported to:
1. Convert an MlirAttribute with type i1 to a numpy array
2. Convert a boolean numpy array to an MlirAttribute
In both cases, the entire Python application crashes with a rather poor
error message (https://github.com/pybind/pybind11/issues/3336).
The complication in handling these conversions is that MlirAttribute
represents booleans as a bit-packed i1 type, whereas numpy represents
booleans as a byte array with 8 bits used per boolean.
This PR proposes the following approach:
1. When converting an i1-typed MlirAttribute to a numpy array, we cannot
directly use the underlying raw data backing the MlirAttribute as a
buffer to Python, as is done for other types. Instead, a copy of the data
is generated using numpy's unpackbits function, and the result is sent
back to Python.
2. When constructing an MlirAttribute from a numpy array, the Python
data is first read as uint8_t to convert it to the endianness used
internally in MLIR. Then the booleans are bit-packed using numpy's
packbits function, and the bit-packed array is stored as the MlirAttribute
representation.
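To illustrate the two layouts involved, here is a plain C++ sketch of the bit unpacking/packing that the PR performs via numpy's unpackbits/packbits (the LSB-first bit order is an assumption of this sketch, not necessarily what MLIR uses):
```
#include <cstdint>
#include <vector>

// i1 MlirAttribute storage -> numpy-style bool array: expand each bit
// into its own byte.
std::vector<uint8_t> unpackBits(const std::vector<uint8_t> &Packed,
                                size_t NumBools) {
  std::vector<uint8_t> Bytes(NumBools);
  for (size_t I = 0; I < NumBools; ++I)
    Bytes[I] = (Packed[I / 8] >> (I % 8)) & 1;
  return Bytes;
}

// numpy-style bool array -> bit-packed i1 storage: 8 booleans per byte.
std::vector<uint8_t> packBits(const std::vector<uint8_t> &Bytes) {
  std::vector<uint8_t> Packed((Bytes.size() + 7) / 8, 0);
  for (size_t I = 0; I < Bytes.size(); ++I)
    if (Bytes[I])
      Packed[I / 8] |= uint8_t(1) << (I % 8);
  return Packed;
}
```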
Add a new helper function `isReachable` to `Block`. This function
traverses all successors of a block to determine if another block is
reachable from the current block.
This functionality has been reimplemented in multiple places in MLIR,
and there are possibly additional copies in downstream projects.
Therefore, move it to a common place.
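A self-contained sketch of the traversal (the `Node` type is a stand-in used for illustration; the real helper walks `Block` successors):
```
#include <set>
#include <vector>

// Stand-in for a block with successor edges; illustration only.
struct Node {
  std::vector<Node *> Successors;
};

// Returns true if Target can be reached by following successor edges
// starting from From's successors (worklist-based DFS with a visited set).
bool isReachable(Node *From, Node *Target) {
  std::set<Node *> Visited;
  std::vector<Node *> Worklist(From->Successors.begin(),
                               From->Successors.end());
  while (!Worklist.empty()) {
    Node *N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue;
    if (N == Target)
      return true;
    for (Node *Succ : N->Successors)
      Worklist.push_back(Succ);
  }
  return false;
}
```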
In the case of Neon, if there exists an extractelement from lane != 0 such that
1. the extractelement does not necessitate a move from vector_reg -> GPR,
2. the extractelement result feeds into an fmul, and
3. the other operand of the fmul is a scalar or an extractelement from lane 0
or a lane equivalent to 0,
then the extractelement can be merged with the fmul in the backend and it
incurs no cost.
e.g.
```
define double @foo(<2 x double> %a) {
%1 = extractelement <2 x double> %a, i32 0
%2 = extractelement <2 x double> %a, i32 1
%res = fmul double %1, %2
ret double %res
}
```
`%2` and `%res` can be merged in the backend to generate:
`fmul d0, d0, v0.d[1]`
The change was tested with SPEC FP (C/C++) on Neoverse V2.
**Compile time impact**: None
**Performance impact**: Observing a 1.3-1.7% uplift on the lbm benchmark with -flto, depending on the config.
This patch teaches extractCallsFromIR to recognize heap allocation
functions. Specifically, when we encounter a callee that is known to
be a heap allocation function like "new", we set the callee GUID to 0.
Note that I am planning to do the same for the caller-callee pairs
extracted from the profile. That is, when we encounter a frame that
does not have a callee, we will assume that the frame is calling some heap
allocation function with GUID 0.
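A hedged sketch of the special-casing described above (the helper and allocator names are illustrative assumptions, not the actual extractCallsFromIR code, which consults TargetLibraryInfo):
```
#include <cstdint>
#include <string>

// Illustration only: a real implementation asks TargetLibraryInfo whether
// the callee is a known heap allocation function.
bool isHeapAllocationName(const std::string &Name) {
  return Name == "malloc" || Name == "_Znwm"; // _Znwm: operator new(size_t)
}

// Callees recognized as heap allocators are collapsed onto the sentinel
// GUID 0, matching profile frames that have no callee.
uint64_t calleeGUID(const std::string &CalleeName, uint64_t RealGUID) {
  return isHeapAllocationName(CalleeName) ? 0 : RealGUID;
}
```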
Technically, I'm not recognizing enough functions in this patch.
TCMalloc is known to drop certain frames in the call stack immediately
above new. This patch is meant to lay the groundwork, setting up
GetTLI, plumbing it to extractCallsFromIR, and adjusting the unit
tests. I'll address remaining issues in subsequent patches.
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
Initially landed in 3ed4b0b0efca7a9467ce83fc62de9413da38006d.
Reverted in 375d1925dbd0c051fe2d4a86fe98ed08f4a502c5 because the
[`load-store.ll`](https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/NVPTX/load-store.ll)
test was not updated after 5e75880165553e9afb721239689a9c79ec84a108.
The test has now been updated for 5e75880165553e9afb721239689a9c79ec84a108 in
7a99f2322c324972f2c5091dddd7752fa21d5a78.
Multiple `func.return` ops inside of a `func.func` op are now supported
during bufferization. This PR extends the code base in 3 places:
- When inferring function return types, `memref.cast` ops are folded
away only if all `func.return` ops have matching buffer types. (E.g., we
don't fold if two `return` ops have operands with different layout
maps.)
- The alias sets of all `func.return` ops are merged. That's because
aliasing is a "may be" property.
- The equivalence sets of all `func.return` ops are taken only if they
match. If different `func.return` ops have different equivalence sets
for their operands, the equivalence information is dropped. That's
because equivalence is a "must be" property.
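The distinction between the last two points can be sketched in plain C++ (the set type and names here are hypothetical, not the actual bufferization code):
```
#include <set>
#include <string>

using ValueSet = std::set<std::string>;

// Aliasing is a "may be" property, so information from all return ops is
// accumulated: the merged alias set is the union.
ValueSet mergeAliasSets(const ValueSet &A, const ValueSet &B) {
  ValueSet Merged = A;
  Merged.insert(B.begin(), B.end());
  return Merged;
}

// Equivalence is a "must be" property, so it is kept only if every return
// op agrees; otherwise the information is dropped.
ValueSet mergeEquivalenceSets(const ValueSet &A, const ValueSet &B) {
  return A == B ? A : ValueSet{};
}
```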
This commit is in preparation for removing the deprecated
`func-bufferize` pass, which can already bufferize functions with multiple
`return` ops.
The biggest change is assigning vector crypto instructions to the
correct processor resource.
The majority of these changes are guided by our RVV-capable
llvm-exegesis.
The dead code analysis crashed because a symbol that is called/used did not
appear in the symbol table.
This patch fixes the crash by adding a nullptr check after the symbol table lookup.
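The shape of the fix, sketched with stand-in types (the actual code guards the result of the MLIR symbol table lookup):
```
#include <map>
#include <string>

struct Operation {}; // stand-in for the symbol's defining op

// Stand-in symbol table; the real analysis uses MLIR's SymbolTable.
std::map<std::string, Operation *> SymbolTable;

void visitCall(const std::string &CalleeName) {
  auto It = SymbolTable.find(CalleeName);
  // The fix: check for a missing symbol instead of dereferencing the
  // lookup result unconditionally, which previously crashed the analysis.
  if (It == SymbolTable.end() || It->second == nullptr)
    return; // conservatively treat the callee as unknown
  Operation *Callee = It->second;
  (void)Callee; // ... continue the analysis with the callee ...
}
```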
into anyext x
; CHECK-NEXT: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[TRUNC]](s32), [[DEF]](s32)
Please continue padding merge values.
// %bits_8_15:_(s8) = G_IMPLICIT_DEF
// %0:_(s16) = G_MERGE_VALUES %bits_0_7:(s8), %bits_8_15:(s8)
%bits_8_15 is defined by undef. Its value is undefined and we can pick
an arbitrary value. For optimization, we pick anyext, which plays well
with the undefinedness.
// %0:_(s16) = G_ANYEXT %bits_0_7:(s8)
The upper bits of %0 are undefined and the lower bits come from
%bits_0_7.
Preserving the case order is not strictly necessary to preserve
semantics (for example, operations like SwitchInst::removeCase will
happily swap cases around). However, I'm planning to introduce an
optional verification step for SandboxIR that will use StructuralHash to
compare the IR after a revert against the original IR to help catch tracker
bugs, and the order difference triggers a mismatch there.
`std::copy` doesn't use the `_AlgPolicy` for anything other than calling
itself with it, so we can just remove the argument. This also removes
the need for it in a few other algorithms that had an `_AlgPolicy` argument
only to call `copy`.
In `staticallyExtractSubvector`, when the extracted slice is the same
as the source vector, there is no need to emit `vector.extract_strided_slice`.
This fixes the lit test case `@vector_store_i4` in
`mlir/test/Dialect/Vector/vector-emulate-narrow-type.mlir`, where
converting from `vector<8xi4>` to `vector<4xi8>` does not need slice
extraction.
The issue was introduced in #113411 and #115070; CI failure link:
https://buildkite.com/llvm-project/github-pull-requests/builds/118845
This PR does not include a lit test case because it is a fix and the
above-mentioned `@vector_store_i4` test already exercises the mechanism.
Signed-off-by: Alan Li <me@alanli.org>
If we linearize values (with an assertion that they are disjoint) and
then delinearize that linear index with the exact same basis, we know
that these operations are exact inverses of each other and can be
replaced with the original inputs to the linearization.
Similarly, if we take a linear index, delinearize it with some basis,
and then re-linearize it with that same basis (noting that the outputs
of the delinearization are guaranteed to be `disjoint`, even if this is
not asserted on the linearize_index operation), the re-linearization is
the inverse of the delinearization, so those two operations can also be
canceled out.
This commit adds canonicalization patterns for these simple
cancellations.
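The round-trip property being exploited, sketched as plain C++ arithmetic (row-major convention assumed; this is just the index math, not the MLIR ops or patterns):
```
#include <cassert>
#include <cstdint>
#include <vector>

// Linearize multi-dimensional indices against a basis.
int64_t linearize(const std::vector<int64_t> &Idx,
                  const std::vector<int64_t> &Basis) {
  int64_t Lin = 0;
  for (size_t I = 0; I < Idx.size(); ++I)
    Lin = Lin * Basis[I] + Idx[I];
  return Lin;
}

// Delinearize a linear index with the same basis.
std::vector<int64_t> delinearize(int64_t Lin,
                                 const std::vector<int64_t> &Basis) {
  std::vector<int64_t> Idx(Basis.size());
  for (size_t I = Basis.size(); I-- > 0;) {
    Idx[I] = Lin % Basis[I];
    Lin /= Basis[I];
  }
  return Idx;
}

int main() {
  std::vector<int64_t> Basis = {4, 5, 6};
  std::vector<int64_t> Idx = {3, 2, 1};
  // Delinearizing the linearized index with the same basis recovers the
  // original (in-bounds, i.e. "disjoint") indices, so the pair cancels.
  assert(delinearize(linearize(Idx, Basis), Basis) == Idx);
  return 0;
}
```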
* Enforce that Tosa_ElementwiseUnaryOp requires output tensors to match
the input tensor's type and shape.
* Update the following ops to conform to Tosa_ElementwiseUnaryOp: clamp,
erf, sigmoid, tanh, cos, sin, abs, bitwise_not, ceil, clz, exp, floor,
log, logical_not, negate, reciprocal, rsqrt.
* Add invalid tests for each operator to ensure compliance with the TOSA
v1.0 specification.
Signed-off-by: Peng Sun <peng.sun@arm.com>
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.
Last part of implementing the atan2 HLSL function. Fixes #70096.
The -time option prints timing information for the subcommands
(compiler, linker) in a format similar to that used by gcc/gfortran.
This partially addresses requests from #89888.
After #115084, the 80-bit long double tests error if sizeof(long double)
isn't 96 or 128 bits. This caused failures on systems where long double is
double (since long double is then 64 bits), so I've disabled the 80-bit long
double tests on systems that don't use them.
When the chain is not the entry node, there is a risk that the stores are
within a (CALLSEQ_START, CALLSEQ_END) pair, which, when the node is expanded,
will lead to nested call sequences.
It should be possible to check for this and allow more cases, but for
now, let's limit this to cases where it's definitely safe.
Fixes #115323
Absolute thunks generated by LLD reference function addresses recorded
as data in code. Since they are generated by the linker, they don't have
relocations associated with them and thus the addresses are left
undetected. Use pattern matching to detect such thunks and handle them
in the VeneerElimination pass.
According to the Doxygen documentation,
the `relates`, `related`, `relatesalso`, and `relatedalso` commands all
have a single argument. This patch changes their classification from
`VerbatimLineCommand` to `InlineCommand` so the argument is correctly
parsed.
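For example, in a doc comment like the following (a generic illustration, not taken from the patch), only the class name `Container` is the command's single argument; the surrounding text is ordinary comment content:
```
class Container;

/// Swaps the contents of two containers.
/// \relates Container
void swap(Container &A, Container &B);
```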