Some of the changes in the patch include:
1. Using iterators instead of instruction pointers when applicable.
2. Modifying Polly functions to accept iterators instead of instruction
pointers.
3. Updating API usages, such as using begin instead of front.
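As a rough illustration of the iterator-based style described above, here is a minimal sketch that uses a `std::list` as a stand-in for a basic block. `ToyBlock`, `countFrom`, and `countAll` are hypothetical names for this example only; they are not Polly or LLVM APIs.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for a basic block: an ordered list of instructions.
using ToyBlock = std::list<std::string>;

// Iterator-style interface: the position is passed as an iterator rather
// than as a pointer to an element.
std::size_t countFrom(ToyBlock::const_iterator it,
                      ToyBlock::const_iterator end) {
  return static_cast<std::size_t>(std::distance(it, end));
}

// begin() yields a valid iterator even for an empty block, whereas calling
// front() on an empty std::list is undefined behaviour.
std::size_t countAll(const ToyBlock &block) {
  return countFrom(block.begin(), block.end());
}
```

With this shape, a caller that used to pass `&block.front()` would pass `block.begin()` instead, and the empty-block case needs no special handling.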
`let constructor` is legacy (do not use in tree!) since the TableGen
backend emits most of the glue logic to build a pass.
Note: The following constructor has been retired:
```cpp
std::unique_ptr<Pass> createAsyncParallelForPass(bool asyncDispatch,
int32_t numWorkerThreads,
int32_t minTaskSize);
```
To update your codebase, replace it with the new options-based API:
```cpp
AsyncParallelForPassOptions options{/*asyncDispatch=*/, /*numWorkerThreads=*/, /*minTaskSize=*/};
createAsyncParallelForPass(options);
```
i8 and i16 vector extracts/inserts have to go via an i32 to make sure the
types are legal. This patch adds patterns for an extract from an i8/i16
vector that is inserted into an i16/i32 vector. This avoids the round trip
via a GPR, which can limit performance.
Support the ptrtoint(gep null, x) -> x and ptrtoint(gep inttoptr(x), y)
-> x+y folds for the case where there is a chain of geps that ends in
null or inttoptr. This avoids some regressions from #137297.
While here, also be a bit more careful about edge cases like pointer to
vector splats and mismatched pointer and index size.
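The arithmetic behind these folds can be sketched with a toy model in which a GEP chain is a base plus a list of byte offsets. `ToyGepChain` and `foldPtrToInt` are hypothetical names, the offsets are concrete integers rather than IR values, and the model assumes null is address 0 and that pointer and index sizes match; the real fold has to check these cases.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a GEP chain: a base pointer plus byte offsets.
struct ToyGepChain {
  bool baseIsNull;                  // chain ends in the null pointer
  std::optional<uint64_t> baseInt;  // chain ends in inttoptr(x) for this x
  std::vector<int64_t> offsets;     // one byte offset per GEP in the chain
};

// Returns the integer value of ptrtoint(chain) when the base is null or
// inttoptr(x); returns nullopt for any other base, where no fold applies.
std::optional<uint64_t> foldPtrToInt(const ToyGepChain &chain) {
  uint64_t acc = 0;
  if (chain.baseIsNull)
    acc = 0;
  else if (chain.baseInt)
    acc = *chain.baseInt;
  else
    return std::nullopt;
  // Walk the whole chain, accumulating offsets with wrapping arithmetic.
  for (int64_t off : chain.offsets)
    acc += static_cast<uint64_t>(off);
  return acc;
}
```

The point of the chain handling is that the accumulation does not stop at the first GEP: every offset on the way down to the null or inttoptr base contributes to the result.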
As reported in issue #103477, visibility of instantiated member
functions used to be ignored when calculating visibility of a
specialization.
This patch modifies `getLVForClassMember` to look up the source
template of an instantiated member, and changes `mergeTemplateLV` to
apply it.
A similar issue was reported in #31462, but it seems that an `extern`
declaration with visibility prevents the function from being emitted
as hidden. This behavior seems correct, even though GCC emits it
with default visibility instead.
Both tests from #103477 and #31462 are added as LIT tests `test72` and
`test73` respectively.
MemberPointerType may refer to a dependent class (qualifier), for
which getMostRecentCXXRecordDecl returns NULL. It seems that the
compiler never executed this code path before patch #136128 where the
issue was reported.
LIT tests 74 and 75 are reduced from Chromium and LLVM libc test
harness as reported in #136128.
Function member (test74):
MemberPointerType 'type-parameter-0-0 (type-parameter-0-1::*)(void)' dependent
|-TemplateTypeParmType 'type-parameter-0-1' dependent depth 0 index 1
`-FunctionProtoType 'type-parameter-0-0 (void)' dependent cdecl
  `-TemplateTypeParmType 'type-parameter-0-0' dependent depth 0 index 0
Template parameter (test75):
MemberPointerType 'type-parameter-0-1 type-parameter-0-0::*' dependent
|-TemplateTypeParmType 'type-parameter-0-0' dependent depth 0 index 0
`-TemplateTypeParmType 'type-parameter-0-1' dependent depth 0 index 1
This patch fixes:
mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp:1304:3: error: default
label in switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]
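For context, Clang emits this warning when a switch lists every enumerator and still has a default label. One common way to satisfy it, not necessarily the change made in this patch, is to drop the default label and handle out-of-range values after the switch. `ToyKind` and `toyKindName` below are hypothetical names.

```cpp
#include <string_view>

// Hypothetical enumeration; every enumerator is handled in the switch below.
enum class ToyKind { Load, Store };

std::string_view toyKindName(ToyKind kind) {
  switch (kind) {
  case ToyKind::Load:
    return "load";
  case ToyKind::Store:
    return "store";
  }
  // No default label above: if a new enumerator is added, -Wswitch will flag
  // the missing case. This return is reached only for out-of-range values.
  return "invalid";
}
```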
…when SVE is available
Currently,
```
sdiv(x, y) --> cmlt + usra + sshr , where y is positive pow-2 integer
sdiv(x, y) --> cmlt + usra + sshr + neg , where y is negative pow-2 integer
```
This patch aims to transform this into
```
sdiv(x, y) --> ptrue + asrd , where y is positive pow-2 integer
sdiv(x, y) --> ptrue + asrd + subr , where y is negative pow-2 integer
```
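Both sequences compute a signed division that rounds toward zero. A scalar sketch of the arithmetic, assuming 32-bit elements, an arithmetic right shift for negative values, and wrapping unsigned-to-signed conversion (guaranteed only from C++20, but what mainstream compilers do), could look like this. `sdivPow2` and `sdivNegPow2` are hypothetical helper names, and the comments map each step loosely onto the NEON sequence.

```cpp
#include <cassert>
#include <cstdint>

// Divide x by 2^k, rounding toward zero as sdiv does.
int32_t sdivPow2(int32_t x, unsigned k) {
  assert(k >= 1 && k <= 30 && "sketch only covers 2^1 .. 2^30");
  // All-ones for negative x, zero otherwise (roughly the cmlt step).
  uint32_t signMask = x < 0 ? 0xFFFFFFFFu : 0u;
  // Bias of 2^k - 1 for negative inputs only; adding the logically shifted
  // mask to x is roughly the usra step.
  uint32_t bias = signMask >> (32 - k);
  int32_t biased = static_cast<int32_t>(static_cast<uint32_t>(x) + bias);
  // Arithmetic shift right (the sshr step).
  return biased >> k;
}

// Divide x by -(2^k): the same sequence followed by a negation.
int32_t sdivNegPow2(int32_t x, unsigned k) { return -sdivPow2(x, k); }
```

Without the bias, a plain arithmetic shift would round toward negative infinity, so -7 / 2 would yield -4 instead of -3.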
Address-discriminated __ptrauth types do not have unique object
representations so they are not trivially comparable. Test all other
trivialities too even though they are not incorrect.
Fixes #137473
The last use was removed by:
commit f8afb8fdedae04ad2670857c97925c439d47d862
Author: Aaron Puchert <aaron.puchert@sap.com>
Date: Fri Apr 29 22:12:21 2022 +0200
I noticed this when working on a patch downstream, and I don't think
this is an issue upstream yet.
But if a VPWidenIntrinsicRecipe is created without an underlying
CallInst, e.g. in createEVLRecipe, it will crash if you try to clone it
because it assumes the CallInst always exists.
This fixes it by using the CallInst-less constructor in this case.
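The shape of such a fix can be sketched with toy classes. Everything below (`ToyCall`, `ToyWidenIntrinsicRecipe`, and their members) is hypothetical and only mirrors the idea of picking the constructor based on whether an underlying call exists; it is not the VPlan code.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CallInst.
struct ToyCall {
  std::string callee;
};

// Hypothetical recipe that may or may not have an underlying call.
class ToyWidenIntrinsicRecipe {
  const ToyCall *underlying; // null when built without a call
  int intrinsicID;
  std::vector<int> operands;

public:
  ToyWidenIntrinsicRecipe(const ToyCall &call, int id, std::vector<int> ops)
      : underlying(&call), intrinsicID(id), operands(std::move(ops)) {}
  ToyWidenIntrinsicRecipe(int id, std::vector<int> ops)
      : underlying(nullptr), intrinsicID(id), operands(std::move(ops)) {}

  // Clone without assuming the underlying call exists.
  std::unique_ptr<ToyWidenIntrinsicRecipe> clone() const {
    if (underlying)
      return std::make_unique<ToyWidenIntrinsicRecipe>(*underlying,
                                                       intrinsicID, operands);
    return std::make_unique<ToyWidenIntrinsicRecipe>(intrinsicID, operands);
  }

  bool hasUnderlyingCall() const { return underlying != nullptr; }
  int getIntrinsicID() const { return intrinsicID; }
};
```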
With EVL tail folding an AnyOf reduction will end up emitting an i1
vp.merge.
Unfortunately, because RVV has no tail-undisturbed mask instructions, an
i1 vp.merge gets expanded to a lengthy sequence:
```asm
vsetvli a1, zero, e64, m1, ta, ma
vid.v v10
vmsltu.vx v10, v10, a0
vmand.mm v9, v9, v10
vmandn.mm v8, v8, v9
vmand.mm v9, v0, v9
vmor.mm v0, v9, v8
```
This patch addresses this by matching this specific AnyOf pattern in
RISCVCodegenPrepare and widening it from i1 to i8, which will end up
producing a single masked i8 vor.vi inside the loop. The IR before and
after the transform:
```llvm
loop:
%phi = phi <vscale x 4 x i1> [ zeroinitializer, %entry ], [ %rec, %loop ]
%cmp = icmp ...
%rec = call <vscale x 4 x i1> @llvm.vp.merge(%cmp, true, %phi, %evl)
```
```llvm
loop:
%phi = phi <vscale x 4 x i8> [ zeroinitializer, %entry ], [ %rec, %loop ]
%cmp = icmp ...
%rec = call <vscale x 4 x i8> @llvm.vp.merge(%cmp, true, %phi, %evl)
%trunc = trunc <vscale x 4 x i8> %rec to <vscale x 4 x i1>
```
I ended up adding this in RISCVCodegenPrepare instead of the
LoopVectorizer itself since it would have required adding a target hook.
It may also be possible to generalize this to other i1 vp.merges in
future.
Normally the trunc will be sunk outside of the loop. The transform doesn't
check whether all the non-phi users of the vp.merge are outside of the
loop, though: if there are in-loop users this still seems to be
profitable, see the test diff in `@widen_anyof_rdx_use_in_loop`.
Fixes #132180
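To see why widening preserves the result, here is a scalar, per-lane model of the two loops. It is a sketch under simplifying assumptions: a fixed lane count instead of a scalable vector, and hypothetical names (`vpMerge`, `ToyIteration`, `anyOfI1`, `anyOfI8`). The merge helper follows the vp.merge rule that lanes at or beyond the EVL keep the on-false value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-lane model of vp.merge: lanes below evl take onTrue where cond is set;
// all other lanes keep the onFalse value.
template <typename T>
std::vector<T> vpMerge(const std::vector<bool> &cond, T onTrue,
                       const std::vector<T> &onFalse, std::size_t evl) {
  assert(cond.size() == onFalse.size());
  std::vector<T> result(onFalse);
  for (std::size_t i = 0; i < evl && i < result.size(); ++i)
    if (cond[i])
      result[i] = onTrue;
  return result;
}

// One loop iteration: the compare mask and the explicit vector length.
struct ToyIteration {
  std::vector<bool> cmp;
  std::size_t evl;
};

// Original form: the recurrence is carried as i1 lanes.
std::vector<bool> anyOfI1(const std::vector<ToyIteration> &iters,
                          std::size_t lanes) {
  std::vector<bool> phi(lanes, false);
  for (const ToyIteration &it : iters)
    phi = vpMerge<bool>(it.cmp, true, phi, it.evl);
  return phi;
}

// Widened form: the recurrence is carried as i8 lanes and truncated at the end.
std::vector<bool> anyOfI8(const std::vector<ToyIteration> &iters,
                          std::size_t lanes) {
  std::vector<uint8_t> phi(lanes, 0);
  for (const ToyIteration &it : iters)
    phi = vpMerge<uint8_t>(it.cmp, 1, phi, it.evl);
  std::vector<bool> out(lanes);
  for (std::size_t i = 0; i < lanes; ++i)
    out[i] = (phi[i] & 1) != 0; // trunc i8 -> i1 keeps the low bit
  return out;
}
```

Each i8 lane only ever holds 0 or 1, so the final truncation recovers exactly the i1 recurrence.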
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.
This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.
This does the following:
* Rename GlobalValue::getGUID(StringRef) to
getGUIDAssumingExternalLinkage. This is simply making explicit at the
callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
getGUIDAssumingExternalLinkage, to make the assumption explicit.
There are a few cases where neither of the above is possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
The debugserver code predates modern C++, but with C++11 and later
there's no need to have something like PThreadMutex. This migrates
MachProcess away from PThreadMutex in preparation for removing it.
The debugserver code predates modern C++, but with C++11 and later
there's no need to have something like PThreadMutex. This migrates
RNBRemote away from PThreadMutex in preparation for removing it.
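As a general illustration of this kind of migration, the standard library's `std::mutex` and `std::lock_guard` cover the usual lock-on-entry, unlock-on-exit pattern that a hand-written pthread wrapper provides. `ToyCounter` is a hypothetical class for this sketch and is not taken from debugserver.

```cpp
#include <mutex>

// Hypothetical class guarding its state with a standard mutex.
class ToyCounter {
  mutable std::mutex mutex_; // mutable so const readers can lock it
  int value_ = 0;

public:
  void increment() {
    // The guard locks here and unlocks when it goes out of scope.
    std::lock_guard<std::mutex> guard(mutex_);
    ++value_;
  }

  int get() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return value_;
  }
};
```

If the code being migrated relies on recursive locking, `std::recursive_mutex` would be the closer replacement.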