This reverts commit 6f62979a5a5bcf70d65f23e0991a274e6df5955b.
This reapplies commit 80ea5f46df3e365a0a2112889bb91732167b6214.
That commit was reverted because it was causing compiler-rt test
failures due to tysan not having its dependencies set up properly within
CMake. That situation has since been rectified in
3cef099ceddccefca8e11268624397cde9e04af6.
Reviewers: lnihlen, rnk, gburgessiv, cmtice
Reviewed By: rnk, cmtice
Pull Request: https://github.com/llvm/llvm-project/pull/144033
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to keep stack usage low. PHIs with incoming values which have not yet been visited are pushed back onto the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal, instead of a simple traversal over the collected users. This reordering ensures that the uses of an instruction are replaced before replacing the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does not have a replacement.
This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
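A minimal sketch of the traversal order from point 2, not the PtrReplacer code itself: it computes a reverse postorder over a def-use graph with an explicit stack instead of recursion. The graph, the node names and the `reverse_postorder` helper are hypothetical and exist only for illustration.

```python
def reverse_postorder(root, users):
    """Return the nodes reachable from `root` in reverse postorder.

    `users` maps a value to the list of values that use it. An explicit
    stack replaces recursion, so deep use chains do not grow the call stack.
    """
    visited = {root}
    postorder = []
    # Each stack entry is (node, iterator over that node's users).
    stack = [(root, iter(users.get(root, ())))]
    while stack:
        node, pending = stack[-1]
        for user in pending:
            if user not in visited:
                visited.add(user)
                stack.append((user, iter(users.get(user, ()))))
                break
        else:
            # Every user of `node` is finished, so `node` is finished too.
            postorder.append(node)
            stack.pop()
    return postorder[::-1]


# Hypothetical def-use graph: a PHI fed by a GEP and an addrspacecast.
users = {
    "alloca": ["gep", "cast"],
    "gep": ["phi"],
    "cast": ["phi"],
    "phi": ["load"],
}
order = reverse_postorder("alloca", users)
```

In this ordering the PHI appears only after both of its incoming values, which is the property the replacement step relies on when there are no back edges.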
This is a reland of https://github.com/llvm/llvm-project/pull/137215.
As was established
[previously](https://github.com/llvm/llvm-project/pull/140957), we
created a structure to model a resource range and to detect an overlap
in a given set of these.
However, a resource range only overlaps with another resource range if
they have:
- equivalent ResourceClass (SRV, UAV, CBuffer, Sampler)
- equivalent resource name-space
- overlapping shader visibility
For instance, the following don't overlap even though they have the same
register range:
- `CBV(b0)` and `SRV(t0)` (different resource class)
- `CBV(b0, space = 0)` and `CBV(b0, space = 1)` (different space)
- `CBV(b0, visibility = Pixel)` and `CBV(b0, visibility = Domain)`
(non-overlapping visibility)
The first two clauses are naturally modelled by grouping all the
`RangeInfo`s that have equivalent `ResourceClass` and `Space` values
together and checking whether there is any overlap on a `ResourceRange`
for all these `RangeInfo`s. However, `Visibility` is not quite as easily
mapped (`Visibility = All` would overlap with any other visibility). So
we will instead need to track a `ResourceRange` for each of the
`Visibility` types in a group. Then, when inserting a range into a group,
we can determine whether it overlaps with any range of an overlapping
visibility.
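The grouping scheme can be sketched as follows. This is an illustration under simplifying assumptions, not the `SemaHLSL` implementation: ranges are plain tuples with inclusive bounds, each per-visibility `ResourceRange` is modelled as a list of intervals, and `find_overlaps` is a hypothetical helper name.

```python
from collections import defaultdict

ALL = "All"


def find_overlaps(ranges):
    """Return index pairs of overlapping ranges.

    Each range is (resource_class, space, visibility, lower, upper) with
    inclusive register bounds.
    """
    # One interval list per (class, space) group and per visibility.
    groups = defaultdict(lambda: defaultdict(list))
    overlaps = []
    for idx, (cls, space, vis, lo, hi) in enumerate(ranges):
        by_vis = groups[(cls, space)]
        # `All` collides with every visibility seen so far; any other
        # visibility collides only with itself and with `All`.
        candidates = list(by_vis) if vis == ALL else [vis, ALL]
        for other_vis in candidates:
            for other_idx, other_lo, other_hi in by_vis.get(other_vis, ()):
                if lo <= other_hi and other_lo <= hi:
                    overlaps.append((other_idx, idx))
        by_vis[vis].append((idx, lo, hi))
    return overlaps


ranges = [
    ("CBV", 0, "All", 0, 0),
    ("SRV", 0, "All", 0, 0),     # different resource class
    ("CBV", 1, "All", 0, 0),     # different space
    ("CBV", 2, "Pixel", 0, 0),
    ("CBV", 2, "Domain", 0, 0),  # non-overlapping visibility
    ("CBV", 0, "Pixel", 0, 0),   # overlaps with the first entry
]
print(find_overlaps(ranges))
```

Only the first and last entries are reported, since they share class, space and register while `All` overlaps with `Pixel`.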
The collection of `RangeInfo` for `RootDescriptor`s, sorting of the
`RangeInfo`s into the groups and finally the insertion of each point
into their respective `ResourceRange`s are implemented. Furthermore, we
integrate this into `SemaHLSL` to provide a diagnostic for each entry
function that uses the invalid root signature.
- Implements collection of `RangeInfo` for `RootDescriptors`
- Implements resource range validation in `SemaHLSL`
- Add diagnostic testing of error production in
`RootSignature-resource-ranges-err.hlsl`
- Add testing to ensure no errors are raised in valid root signatures
`RootSignature-resource-ranges.hlsl`
Part 2 of https://github.com/llvm/llvm-project/issues/129942
A final PR will be produced to integrate the analysis of
`DescriptorTable`, `StaticSampler` and `RootConstants` by defining how
to construct the `RangeInfo` from their elements respectively.
Remove SymbolToFileName mapping from every local symbol to its
containing FILE symbol name, and reuse FileSymbols to disambiguate
local symbols instead.
Also removes the check for `ld-temp.o` file symbol which was added to
prevent LTO build mode from affecting the disambiguated name. This may
cause incompatibility when using the profile collected on a binary built
in a different mode than the input binary.
Addresses #90661.
Speeds up discover file objects by 5-10% for large binaries:
- binary with ~1.2M symbols: 12.6422s -> 12.0297s
- binary with ~4.5M symbols: 48.8851s -> 43.7315s
I happened to notice that when legalizing get.active.lane.mask with
large vectors we were materializing via constant pool instead of just
shifting by a constant.
We should probably be doing a full cost comparison for the different
lowering strategies as opposed to our current ad hoc heuristics, but the
few cases this regresses seem pretty minor. (Given the reduction in vset
toggles, they might not be regressions at all.)
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
This patch adds support for the Haswell sub-architecture (x86_64h) to
scripted processes.
rdar://147208252
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
While the source code isn't supposed to change during a build, in some
environments it does. This adds an option that disables caching of stat
failures, meaning that source files can be added to the build during
scanning.
This adds a `-no-cache-negative-stats` option to clang-scan-deps to
enable this behavior. There are no clang-scan-deps tests for it, as
there's no reliable way to exercise this behavior through the tool.
Instead, a unit test has been added that modifies the filesystem between
scans.
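The effect of not caching failed lookups can be illustrated with a toy cache. This is only a sketch of the idea, not the dependency scanner's filesystem layer; `StatCache`, `cache_negative_stats` and `demo` are hypothetical names.

```python
import os
import tempfile


class StatCache:
    """Toy existence cache that can optionally skip caching failures."""

    def __init__(self, cache_negative_stats=True):
        self.cache_negative_stats = cache_negative_stats
        self._cache = {}

    def exists(self, path):
        if path in self._cache:
            return self._cache[path]
        found = os.path.exists(path)
        # Successful lookups are always cached; failures only when allowed.
        if found or self.cache_negative_stats:
            self._cache[path] = found
        return found


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        header = os.path.join(tmp, "generated.h")
        caching = StatCache(cache_negative_stats=True)
        non_caching = StatCache(cache_negative_stats=False)
        first = (caching.exists(header), non_caching.exists(header))
        # The build creates the file between two scans.
        with open(header, "w") as f:
            f.write("// generated\n")
        second = (caching.exists(header), non_caching.exists(header))
        return first, second
```

With negative caching the second scan still reports the file as missing, while the non-caching variant picks it up.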
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
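The core idea can be sketched as a target-resolution step. This is an assumption-laden illustration rather than lld's implementation: the `tail_call_thunks` map, the symbol names and the decision to follow whole chains with a cycle guard are all hypothetical.

```python
def resolve_target(target, tail_call_thunks):
    """Follow functions that only tail call another function.

    `tail_call_thunks` maps such a function to the function it branches to.
    The `seen` set stops the walk if the thunks form a cycle.
    """
    seen = set()
    while target in tail_call_thunks and target not in seen:
        seen.add(target)
        target = tail_call_thunks[target]
    return target


# Hypothetical jump table entry that only branches to the real function.
thunks = {"foo.cfi_jt": "foo"}
print(resolve_target("foo.cfi_jt", thunks))  # the call is redirected to "foo"
print(resolve_target("bar", thunks))         # not a thunk, left unchanged
```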
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
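For readers who want to check the summary lines, the figures can be recomputed from the two rows of the table. This is a sketch of the arithmetic only; the critical value 1.96 is an assumption (the large-sample approximation), not something stated in the output.

```python
import math

n = 512  # samples per row
avg_x, sd_x = 1.2965788, 0.018620888        # "x" row
avg_plus, sd_plus = 1.3209327, 0.019443971  # "+" row

# Difference of the means.
diff = avg_plus - avg_x

# Pooled standard deviation for two samples of equal size.
pooled_sd = math.sqrt(((n - 1) * sd_x**2 + (n - 1) * sd_plus**2) / (2 * n - 2))

# Half-width of the 95% confidence interval (assumed critical value).
T_CRIT = 1.96
half_width = T_CRIT * pooled_sd * math.sqrt(1 / n + 1 / n)

# Difference relative to the "x" row, in percent.
relative_pct = 100 * diff / avg_x

print(f"{diff:.7f} +/- {half_width:.8f}")
print(f"{relative_pct:.5f}%  (pooled s = {pooled_sd:.7f})")
```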
Pull Request: https://github.com/llvm/llvm-project/pull/138366
CPA stands for Checked Pointer Arithmetic and is part of the 2023 MTE
architecture extensions for A-profile.
The new CPA instructions perform regular pointer arithmetic (such as
base register + offset) but check for overflow in the most significant
bits of the result, enhancing security by detecting address tampering.
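As a conceptual model only, the check can be pictured as an addition that reports whether the top bits of the pointer changed. This is not the architectural pseudocode of the CPA instructions: the 8-bit width of the checked field, the reporting via a flag and the `checked_ptr_add` name are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1
TOP_BITS = 8  # assumed width of the checked most significant field


def checked_ptr_add(base, offset):
    """Add `offset` to a 64-bit `base` and report whether the top bits survived."""
    result = (base + offset) & MASK64
    shift = 64 - TOP_BITS
    # The arithmetic is considered valid only if the top bits are unchanged.
    ok = (result >> shift) == (base >> shift)
    return result, ok


print(checked_ptr_add(0x1000, 0x10))                 # stays in range
print(checked_ptr_add(0x00FF_FFFF_FFFF_FFF0, 0x20))  # carries into the top bits
```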
In this patch we intend to capture the semantics of pointer arithmetic
when it is not folded into loads/stores, then generate the appropriate
scalar CPA instructions. In order to preserve pointer arithmetic
semantics through the backend, we use the PTRADD SelectionDAG node type.
Use backend option `-aarch64-use-featcpa-codegen=true` to enable CPA
CodeGen (for a target with CPA enabled).
The story of this PR is that initially it introduced the PTRADD
SelectionDAG node and the respective visitPTRADD() function, adapted
from the CHERI/Morello LLVM tree. The original authors are
@davidchisnall, @jrtc27, @arichardson.
After a while, @ritter-x2a took the part of the code that was
target-independent and merged it separately in #140017. This PR thus
remains as the AArch64-part only.
More details about the CPA extension can be found at:
- https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
- https://developer.arm.com/documentation/ddi0602/2023-09/ (e.g. ADDPT
instruction)
This PR follows #79569.
It does not address vector FEAT_CPA instructions.
This change speeds up fragment matching for large BOLTed binaries where
all fragments of global parent functions are put under `bolt-pseudo.o`
file symbol:
- before: iterating over symbols under `bolt-pseudo.o` only to fail
to find a parent,
- after: bail out immediately and use a global parent by name.
Test Plan: NFC, updated register-fragments-bolt-symbols.s
`BoltAddressTranslation::getFallthroughsInTrace` iterates over address
translation map entries and therefore has direct access to both original
and translated offsets. Return the translated offsets in fall-throughs
list to avoid duplicate address translation inside `doTrace`.
Test Plan: NFC
Fixes #144608
- There is a getPointerOperandIndex function, so we don't need to iterate
over the operands trying to find the pointer. This resulted in a small
cleanup to visitStoreInst and visitLoadInst.
- The meat of this change was in visitGetElementPtrInst to account for
allocas and not bail when we don't find a global.
Implements descriptor table parsing from root signature metadata. This
is required to support root signatures in HLSL.
Closes: #[126640](https://github.com/llvm/llvm-project/issues/126640)
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Going mostly by the comment here - but it says "vscale is not
necessarily a power-of-2". Both in-tree targets have vscale as a power
of two, and we have an existing TTI hook for that.
This patch adds the `PtrLikeTypeInterface` type interface to identify
pointer-like types. This interface is defined as:
```
A ptr-like type represents an object storing a memory address. This object
is constituted by:
- A memory address called the base pointer. This pointer is treated as a
bag of bits without any assumed structure. The bit-width of the base
pointer must be a compile-time constant. However, the bit-width may remain
opaque or unavailable during transformations that do not depend on the
base pointer. Finally, it is considered indivisible in the sense that as
a `PtrLikeTypeInterface` value, it has no metadata.
- Optional metadata about the pointer. For example, the size of the memory
region associated with the pointer.
Furthermore, all ptr-like types have two properties:
- The memory space associated with the address held by the pointer.
- An optional element type. If the element type is not specified, the
pointer is considered opaque.
```
This patch adds this interface to `!ptr.ptr` and the `memref` type.
Furthermore, this patch adds necessary ops and type to handle casting
between `!ptr.ptr` and ptr-like types.
First, it defines the `!ptr.ptr_metadata` type, an opaque type that
represents the metadata of a ptr-like type. The rationale behind adding
this type is that, at a high level, the metadata of a type like `memref`
cannot be specified, as its structure is tied to its lowering.
The `ptr.get_metadata` operation was added to extract the opaque pointer
metadata. The concrete structure of the metadata is only known when the
op is lowered.
Finally, this patch adds the `ptr.from_ptr` and `ptr.to_ptr` operations,
which allow casting back and forth between `!ptr.ptr` and ptr-like types.
```mlir
func.func @func(%mr: memref<f32, #ptr.generic_space>) -> memref<f32, #ptr.generic_space> {
%ptr = ptr.to_ptr %mr : memref<f32, #ptr.generic_space> -> !ptr.ptr<#ptr.generic_space>
%mda = ptr.get_metadata %mr : memref<f32, #ptr.generic_space>
%res = ptr.from_ptr %ptr metadata %mda : !ptr.ptr<#ptr.generic_space> -> memref<f32, #ptr.generic_space>
return %res : memref<f32, #ptr.generic_space>
}
```
It's future work to replace and remove the `bare-ptr-convention` through
the use of these ops.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Newly introduced Atomic.cpp fails to compile on its own, but somehow
compiles fine in the build. Maybe it's because of PCH, but it needs to be
fixed nevertheless.
Fixes #141136
- Implement `visitExtractElementInst` and `visitInsertElementInst` in
`DXILDataScalarizerVisitor` to scalarize `extractelement` and
`insertelement` instructions whose index operand is not a `ConstantInt`
by converting the vector to an array and then loading from the array
- Rename the `replaceVectorWithArray` helper function to
`equivalentArrayTypeFromVector`, relocate the function toward the top of
the file, and remove the unused `Ctx` parameter
This tries to be a bit smarter for the OLD behaviour of CMP0116, to glob
more relevant directories looking for possible dependencies.
The changes are:
- Remove some duplication of lines in the `tablegen` function.
- Put CURRENT_SOURCE_DIR into `tblgen_includes` (at the front)
- Glob all directories in `tblgen_includes`
- Give up on `local_tds` which was wrong when using tablegen to compile
a file in a different directory (as TargetParser does)
- Use `EXTRA_INCLUDES` in TargetParser `tablegen` calls.
This is still an under-approximation of what might be included, at least
comparing the RISCVTargetParserDef.inc.d (after building
`target_parser_gen`), and the list of deps in the ninja file when
explicitly setting CMP0116 to OLD.
Fixes #144639
Changes:
* Decouple layout propagation from subgroup distribution and move it to
an independent pass.
* Refine layout assignment to handle control-flow ops correctly (scf.for, scf.while).
* Refine test cases.
In Aya/Rust, BPF map definitions are wrapped in two nested types:
* A struct representing the map type (e.g., `HashMap`, `RingBuf`) that
provides methods for interacting with the map type (e.g. `HashMap::get`,
`RingBuf::reserve`).
* An `UnsafeCell`, which informs the Rust compiler that the type is
thread-safe and can be safely mutated even as a global variable. The
kernel guarantees map operation safety.
This leads to a type hierarchy like:
```rust
pub struct HashMap<K, V, const M: usize, const F: usize = 0>(
core::cell::UnsafeCell<HashMapDef<K, V, M, F>>,
);
const BPF_MAP_TYPE_HASH: usize = 1;
pub struct HashMapDef<K, V, const M: usize, const F: usize = 0> {
r#type: *const [i32; BPF_MAP_TYPE_HASH],
key: *const K,
value: *const V,
max_entries: *const [i32; M],
map_flags: *const [i32; F],
}
```
Then used in the BPF program code as a global variable:
```rust
#[link_section = ".maps"]
static HASH_MAP: HashMap<u32, u32, 1337> = HashMap::new();
```
This is equivalent to the following BPF map definition in C:
```c
#define BPF_MAP_TYPE_HASH 1
struct {
int (*type)[BPF_MAP_TYPE_HASH];
typeof(int) *key;
typeof(int) *value;
int (*max_entries)[1337];
} map_1 __attribute__((section(".maps")));
```
Accessing the actual map definition requires traversing:
```
HASH_MAP -> __0 -> value
```
Previously, the BPF backend only visited the pointee types of the
outermost struct, and didn’t descend into inner wrappers. This caused
issues when the key/value types were custom structs:
```rust
// Define custom structs for key and values.
pub struct MyKey(u32);
pub struct MyValue(u32);
#[link_section = ".maps"]
#[export_name = "HASH_MAP"]
pub static HASH_MAP: HashMap<MyKey, MyValue, 10> = HashMap::new();
```
These types weren’t fully visited and appeared in BTF as forward
declarations:
```
#30: <FWD> 'MyKey' kind:struct
#31: <FWD> 'MyValue' kind:struct
```
The fix is to enhance `visitMapDefType` to recursively visit inner
composite members. If a member is a composite type (likely a wrapper),
it is now also visited using `visitMapDefType`, ensuring that the
pointee types of the innermost struct members, like `MyKey` and
`MyValue`, are fully resolved in BTF.
With this fix, the correct BTF entries are emitted:
```
#6: <STRUCT> 'MyKey' sz:4 n:1
#00 '__0' off:0 --> [7]
#7: <INT> 'u32' bits:32 off:0
#8: <PTR> --> [9]
#9: <STRUCT> 'MyValue' sz:4 n:1
#00 '__0' off:0 --> [7]
```
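The recursion can be sketched on a toy type model. This is not the BPF backend code: the `Struct`, `Pointer` and `Int` classes and the `visit_map_def_type` function below are hypothetical stand-ins for the debug-info types the backend walks.

```python
from dataclasses import dataclass, field


@dataclass
class Int:
    name: str


@dataclass
class Pointer:
    pointee: object


@dataclass
class Struct:
    name: str
    members: list = field(default_factory=list)  # (member name, type) pairs


def visit_map_def_type(ty, fully_visited):
    """Record the struct pointees reachable through nested wrapper structs."""
    if not isinstance(ty, Struct):
        return
    for _, member_ty in ty.members:
        if isinstance(member_ty, Struct):
            # A composite member is likely a wrapper: descend into it.
            visit_map_def_type(member_ty, fully_visited)
        elif isinstance(member_ty, Pointer) and isinstance(member_ty.pointee, Struct):
            # Pointee of a map definition member: visit it fully.
            fully_visited.append(member_ty.pointee.name)


u32 = Int("u32")
my_key = Struct("MyKey", [("__0", u32)])
my_value = Struct("MyValue", [("__0", u32)])
map_def = Struct("HashMapDef", [("key", Pointer(my_key)), ("value", Pointer(my_value))])
unsafe_cell = Struct("UnsafeCell", [("value", map_def)])
hash_map = Struct("HashMap", [("__0", unsafe_cell)])

visited = []
visit_map_def_type(hash_map, visited)
print(visited)
```

Without the descent into composite members, the walk would stop at the outermost `HashMap` wrapper and never reach `MyKey` or `MyValue`.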
Fixes: #143361
This patch adds an off-by-default flag which, when enabled via `-mllvm -msan-poison-undef-vectors=true`, fixes a false negative in MSan (partially-undefined constant fixed-length vectors). It is currently off by default since, by fixing the false negative, code/tests that previously passed MSan may start failing. The default will be changed in a future patch.
Prior to this patch, MSan computes that partially-undefined constant fixed-length vectors are fully initialized, which leads to false negatives; moreover, benign vector rewriting could theoretically flip MSan's shadow computation from initialized to uninitialized or vice-versa (*). `-msan-poison-undef-vectors=true` calculates the shadow precisely: for each element of the vector, the corresponding shadow is fully uninitialized if the element is undefined/poisoned, otherwise it is fully initialized.
Updates the test from https://github.com/llvm/llvm-project/pull/143823
(*) For example:
```
%x = insertelement <2 x i64> <i64 0, i64 poison>, i64 42, i64 0
%y = insertelement <2 x i64> <i64 poison, i64 poison>, i64 42, i64 0
```
%x and %y are equivalent but, prior to this patch, MSan incorrectly computes the shadow of %x as <0, 0> rather than <0, -1>.
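The per-element shadow rule can be sketched as below. This is a toy model, not MSan's instrumentation: `POISON`, `constant_vector_shadow` and `insertelement_shadow` are hypothetical names, and a shadow of `-1` stands for a fully uninitialized element.

```python
POISON = None  # stand-in for an undef/poison vector element


def constant_vector_shadow(elements):
    """Precise shadow: -1 for undef/poison elements, 0 for defined ones."""
    return [-1 if e is POISON else 0 for e in elements]


def insertelement_shadow(vector_shadow, value_shadow, index):
    """Shadow of an insertelement: the inserted lane takes the value's shadow."""
    out = list(vector_shadow)
    out[index] = value_shadow
    return out


# Inserting the initialized constant 42 (shadow 0) into lane 0.
shadow_x = insertelement_shadow(constant_vector_shadow([0, POISON]), 0, 0)
shadow_y = insertelement_shadow(constant_vector_shadow([POISON, POISON]), 0, 0)
print(shadow_x, shadow_y)
```

With the precise rule both vectors end up with the same shadow, matching the fact that %x and %y are equivalent.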
Define the __dmr1024 type used to manipulate the new DMR registers
introduced by the Dense Math Facility (DMF) on PowerPC, and add six
Clang builtins that correspond to the integer outer-product accumulate
to ACC PowerPC instructions:
* __builtin_mma_dmxvi8gerx4
* __builtin_mma_pmdmxvi8gerx4
* __builtin_mma_dmxvi8gerx4pp
* __builtin_mma_pmdmxvi8gerx4pp
* __builtin_mma_dmxvi8gerx4spp
* __builtin_mma_pmdmxvi8gerx4spp
isComplete previously meant different things for different conversion
directions.
Refactored bytes_processed to bytes_stored, which now consistently
increments on every push and decrements on every pop, making both
directions consistent with each other.
This patch disables unexpected_disabled_cpp17.verify.cpp under clang
modules builds because it changes diagnostics criteria post #143423,
causing the test to fail.
This patch follows a similar style to 853059a15011fd8b57dd0.
This was found when working on trying to land #144033.
See work detail: https://github.com/iree-org/iree/issues/20920
Add support for f4E2M1FN in `arith.truncf` and `arith.extf` ops through software emulation
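For reference, the format has 1 sign bit, 2 exponent bits and 1 mantissa bit, with no infinities or NaNs. The sketch below models the two conversions on plain integers and floats; it is not the MLIR lowering. The function names are hypothetical, and the rounding policy (nearest, ties to even, saturating at the largest magnitude) is an assumption. NaN inputs are not handled.

```python
import math


def extf_f4e2m1(bits):
    """Decode a 4-bit f4E2M1FN pattern into a Python float."""
    sign = -1.0 if bits & 0b1000 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 0b1
    if exp == 0:
        magnitude = 0.5 * man  # zero or the single subnormal value
    else:
        magnitude = (1.0 + 0.5 * man) * 2.0 ** (exp - 1)  # exponent bias is 1
    return sign * magnitude


def truncf_f4e2m1(x):
    """Encode a float as the nearest f4E2M1FN pattern (assumed policy)."""
    sign_bit = 0b1000 if math.copysign(1.0, x) < 0 else 0
    mag = min(abs(x), 6.0)  # saturate at the largest representable magnitude
    # Nearest magnitude wins; on a tie prefer the code with an even mantissa.
    best = min(range(8), key=lambda code: (abs(extf_f4e2m1(code) - mag), code & 1))
    return sign_bit | best


print([extf_f4e2m1(code) for code in range(8)])
print(truncf_f4e2m1(2.5), truncf_f4e2m1(-1.0), truncf_f4e2m1(100.0))
```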
---------
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Adds support to MSan for `free_sized` and `free_aligned_sized` from C23.
Other sanitizers will be handled with their own separate PRs.
For https://github.com/llvm/llvm-project/issues/144435
Signed-off-by: Justin King <jcking@google.com>