This patch contains a list of tests that are currently failing in the
LLVM_ENABLE_PROFCHECK=ON build, which enables passing them to lit
through the LIT_XFAIL environment variable. This is necessary for
getting a buildbot spun up to catch regressions while work is being done
to fix the existing issues.
We need to keep this in the LLVM tree so that tests can be removed from
the list at the same time the passes causing issues are fixed.
Issue #147390
LSan was recently refactored to call GetMaxUserVirtualAddress for
diagnostic purposes. This leads to failures in some of our downstream
tests that run only with LSan. This occurs because
GetMaxUserVirtualAddress depends on the shadow being set up via a call
to __sanitizer_shadow_bounds, but shadow bounds are never set for
standalone LSan because it doesn't use shadow. This updates the function
to invoke the same syscall that __sanitizer_shadow_bounds uses to get
the memory limit. Ideally this function would be called only once, since
we only need to get the bounds once.
More context in https://fxbug.dev/437346226.
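A minimal sketch of the idea, assuming Zircon's root-VMAR query is the
underlying syscall (the names and structure here are assumptions, not
the actual runtime code):

```cpp
#include <zircon/process.h>
#include <zircon/syscalls.h>

#include <cstdint>

// Query the root VMAR once and cache the result, since the bounds
// cannot change for the lifetime of the process.
uintptr_t GetMaxUserVirtualAddress() {
  static const uintptr_t kMax = [] {
    zx_info_vmar_t Info;
    _zx_object_get_info(_zx_vmar_root_self(), ZX_INFO_VMAR, &Info,
                        sizeof(Info), nullptr, nullptr);
    return static_cast<uintptr_t>(Info.base + Info.len - 1);
  }();
  return kMax;
}
```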
Operate directly on the existing Ops vector instead of copying to
a new vector. This is similar to what the autogenerated codegen
does for other intrinsics.
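For illustration, a hedged sketch of the pattern (variable names are
assumed; the actual intrinsic-lowering code differs):

```cpp
// Before: copy the operand list just to append to it.
//   SmallVector<SDValue, 8> NewOps(Ops.begin(), Ops.end());
//   NewOps.push_back(Glue);
//   return DAG.getNode(Opc, DL, VTs, NewOps);
// After: append to the existing vector in place.
Ops.push_back(Glue);
return DAG.getNode(Opc, DL, VTs, Ops);
```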
Currently only cases rooted at a full copy of an MFMA result are
handled.
Prepare to relax that by testing more intricate subregister usage.
Currently only full copies are handled; add some tests to help work
towards handling subregisters.
Previously it would just assert if the extract needed elements from
both halves. Extract the individual elements from both halves and
create a new vector, as the simplest implementation. This could
try to do better and create a partial extract or shuffle (or
maybe that's best left for the combiner to figure out later).
Fixes secondary issue noticed as part of #153808
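A sketch of that simplest strategy in SelectionDAG style (variable names
are assumed; the actual patch may live in a different framework):

```cpp
// Extract each requested element from whichever half holds it, then
// build the result vector from the pieces.
SmallVector<SDValue, 8> Elts;
for (unsigned I = 0; I != NumSubElts; ++I) {
  unsigned SrcIdx = BaseIdx + I;
  bool InLo = SrcIdx < LoNumElts;
  Elts.push_back(DAG.getNode(
      ISD::EXTRACT_VECTOR_ELT, DL, EltVT, InLo ? Lo : Hi,
      DAG.getVectorIdxConstant(InLo ? SrcIdx : SrcIdx - LoNumElts, DL)));
}
SDValue Result = DAG.getBuildVector(SubVT, DL, Elts);
```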
`VPEVLBasedIVPHIRecipe` lowers to a scalar VPInstruction phi, so it
occupies a single scalar register just like other phi recipes.
This patch fixes the register usage for `VPEVLBasedIVPHIRecipe` from
vector to scalar, which is closer to the generated vector IR.
https://godbolt.org/z/6Mzd6W6ha shows that there are no register spills
when choosing `<vscale x 16>`.
Note that this test is basically copied from AArch64.
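A minimal sketch of the accounting, assuming a hypothetical helper (not
the actual calculateRegisterUsage code):

```cpp
// Phis that lower to scalar phis should be tallied against the scalar
// register class, not the vector one.
bool usesScalarRegister(const VPRecipeBase &R) {
  // The EVL-based IV phi lowers to a scalar phi like other IV phis.
  return isa<VPEVLBasedIVPHIRecipe>(&R) || isa<VPCanonicalIVPHIRecipe>(&R);
}
```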
SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a
few PHI recipes in the loop header. With more IV-step optimizations, the
canonical widen-canonical-iv can be replaced by a canonical
VPWidenIntOrFpInduction, which the pass did not handle, causing
regressions (missed simplifications).
This patch replaces the canonical VPWidenIntOrFpInduction with a
StepVector in the vector preheader, since the vector loop region only
executes once.
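A hedged sketch of the transform (the VPlan API details are assumed, not
the exact patch; the real StepVector opcode may also carry a result
type):

```cpp
// The vector region executes exactly once, so the canonical wide IV is
// just the step vector <0, 1, ..., VF-1> materialized in the preheader.
auto *StepVec = new VPInstruction(VPInstruction::StepVector, {}, DL);
Preheader->appendRecipe(StepVec);
WideIV->replaceAllUsesWith(StepVec);
```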
I'm trying to remove the redirection in SmallSet.h:

```cpp
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType *, N> : public SmallPtrSet<PointeeType *, N> {};
```

to make it clear that we are using SmallPtrSet. Only a handful of places
rely on this redirection.
This patch replaces SmallSet with SmallPtrSet where the element type is
a pointer.
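The replacement is mechanical; for example (illustrative names):

```cpp
// Before: relies on the partial specialization forwarding to SmallPtrSet.
SmallSet<BasicBlock *, 8> Visited;
// After: name the pointer-set type directly.
SmallPtrSet<BasicBlock *, 8> Visited;
```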
#153975 added a new test,
`test/CodeGen/AMDGPU/disable-preload-kernargs.ll`, that triggers an
assertion under `LLVM_ENABLE_EXPENSIVE_CHECKS` complaining that analyses
were not invalidated even though the pass made changes. This was caused
by the pass only invalidating the analyses when the number of explicit
arguments is greater than zero, while some functions may be removed even
when there isn't any explicit argument, hence the missed invalidation.
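A sketch of the fix under that reading (the pass and helper names here
are assumed):

```cpp
// Key the invalidation off whether the pass changed anything at all,
// not off the explicit-argument count.
PreservedAnalyses AMDGPUPreloadKernArgsPass::run(Module &M,
                                                 ModuleAnalysisManager &AM) {
  bool Changed = runImpl(M); // may remove functions with no explicit args
  return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
}
```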
If src2 and dst aren't the same register, to fold a copy
to AGPR into the instruction we also need to reassign src2
to an available AGPR. All the other uses of src2 also need
to be compatible with the AGPR replacement in order to avoid
inserting other copies somewhere else.
Perform this transform after verifying that all other uses are
compatible with AGPR and that an AGPR is available at all points (which
effectively means rewriting a full chain of MFMAs and loads/stores at
once).
Currently the privatization recipe of a scalar allocatable is as follows:
```
acc.private.recipe @privatization_ref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> init {
^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>):
%0 = fir.alloca !fir.box<!fir.heap<i32>>
%1:2 = hlfir.declare %0 {uniq_name = "acc.private.init"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
acc.yield %1#0 : !fir.ref<!fir.box<!fir.heap<i32>>>
}
```
This change adds the allocation for the scalar.
Summary:
This patch changes the Linux build to use wide reads in the memory
operations by default. These memory functions will now potentially read
outside of the bounds explicitly allowed by the current function. While
this is technically undefined behavior in the standard, plenty of C
library implementations do it. It will not cause a segmentation fault on
Linux as long as you do not cross a page boundary, and because we are
only *reading* memory it should not have atomic effects.
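As an illustration of the technique (a self-contained sketch, not the
llvm-libc implementation):

```cpp
#include <cstddef>
#include <cstdint>

// Word-at-a-time scan in the spirit described above. The aligned 8-byte
// loads may read past the terminator, but an aligned load never crosses
// a page boundary, since the word size divides the page size.
size_t wide_strlen(const char *S) {
  const char *P = S;
  // Advance byte-by-byte until 8-byte aligned.
  for (; reinterpret_cast<uintptr_t>(P) % 8 != 0; ++P)
    if (*P == '\0')
      return P - S;
  // Scan a word at a time; the bit trick flags any zero byte in V.
  for (const uint64_t *W = reinterpret_cast<const uint64_t *>(P);; ++W) {
    uint64_t V = *W;
    if ((V - 0x0101010101010101ULL) & ~V & 0x8080808080808080ULL) {
      const char *Q = reinterpret_cast<const char *>(W);
      while (*Q != '\0')
        ++Q;
      return Q - S;
    }
  }
}
```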
Operate directly on the existing Ops vector instead of copying to
a new vector. This is similar to what the autogenerated codegen
does for other intrinsics.
This reduced the clang binary size by ~96kb on my local Release+Asserts
build.
We used to abuse the Operands list to store an instruction encoding's
DecoderMethod. Let's store it in the InstructionEncoding class instead,
where it belongs.
`ASTReader::FinishedDeserializing()` calls
`adjustDeducedFunctionResultType(...)` [0], which in turn calls
`FunctionDecl::getMostRecentDecl()` [1]. In modules builds,
`getMostRecentDecl()` may reach out to the `ASTReader` and start
deserializing again. Starting deserialization starts `ReadTimer`;
however, `FinishedDeserializing()` doesn't call `stopTimer()` until
after its call to `adjustDeducedFunctionResultType(...)` [2]. As a
result, we hit an assert checking that we don't start an already started
timer [3]. To fix this, we simply don't start the timer if it's already
running.
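A minimal sketch of the fix, assuming llvm::Timer's running check is
used at the point where deserialization begins:

```cpp
// Only start the read timer if it is not already ticking; a nested
// deserialization would otherwise trip the "already started" assert.
if (ReadTimer && !ReadTimer->isRunning())
  ReadTimer->startTimer();
```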
Unfortunately I don't have a test case for this yet as modules builds
are notoriously difficult to reduce.
[0]:
4d2288d318/clang/lib/Serialization/ASTReader.cpp (L11053)
[1]:
4d2288d318/clang/lib/AST/ASTContext.cpp (L3804)
[2]:
4d2288d318/clang/lib/Serialization/ASTReader.cpp (L11065-L11066)
[3]:
4d2288d318/llvm/lib/Support/Timer.cpp (L150)
ActionCache is used to store a mapping from CASID to CASID. The current
implementation of the ActionCache can only associate a key/value from
the same hash context.
ActionCache has two operations: `put` to store the key/value mapping and
`get` to look it up. ActionCache uses the same TrieRawHashMap data
structure to store the mapping, where the CASID of the key is the hash
used to index the map.
While the CASIDs for key/value are often associated with an actual CAS
ObjectStore, the ActionCache does not guarantee the existence of such
objects in any ObjectStore.
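As a toy illustration of that shape (not LLVM's actual ActionCache; all
names here are stand-ins):

```cpp
#include <map>
#include <optional>
#include <string>

using CASID = std::string; // stand-in for the real hash-based ID

class ToyActionCache {
  std::map<CASID, CASID> Map; // keyed by the key CASID's hash
public:
  // Record the association; this does not ensure either ID's object
  // exists in any ObjectStore.
  void put(const CASID &Key, const CASID &Value) { Map[Key] = Value; }
  // Look up a previously recorded association, if any.
  std::optional<CASID> get(const CASID &Key) const {
    auto It = Map.find(Key);
    if (It == Map.end())
      return std::nullopt;
    return It->second;
  }
};
```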
When providing allocation and deallocation traces, the ASan compiler-rt
runtime already provides call addresses (`TracePCType::Calls`). On
Darwin, the system sanitizers library (libsanitizers) provides return
addresses. It also discards a few non-user frames at the top of the
stack, because these internal libmalloc/libsanitizers stack frames do
not provide any value when diagnosing memory errors.
Introduce and add handling for `TracePCType::ReturnsNoZerothFrame` to
cover this case and enable line-level testing of libsanitizers traces.
rdar://157596927
---
Commit 1 is a mechanical refactoring to introduce and adopt the
`TracePCType` enum, replacing the `pcs_are_call_addresses` bool. It
preserves the current behavior:
```
pcs_are_call_addresses:
false -> TracePCType::Returns (default)
true -> TracePCType::Calls
```
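A sketch of what that enum might look like (the exact definition is
assumed):

```cpp
enum class TracePCType {
  Returns,              // PCs are return addresses (default)
  Calls,                // PCs are call addresses (ASan runtime)
  ReturnsNoZerothFrame, // return addresses, top internal frames dropped
};
```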
Best reviewed commit by commit.
Prevent an assertion failure in the cstring checker when library
functions like memcpy are defined with non-default address spaces.
Adds a test for this case.
The previous implementation of getExactInverse used the following check
to identify powers of two:

```cpp
// Check that the number is a power of two by making sure that only the
// integer bit is set in the significand.
if (significandLSB() != semantics->precision - 1)
  return false;
```
This condition verifies that the only set bit in the significand is the
integer bit, which is correct for normal numbers. However, this logic is
not correct for subnormal values.
APFloat represents subnormal numbers by shifting the significand right
while holding the exponent at its minimum value. For a power of two in
the subnormal range, its single set bit will therefore be at a position
lower than precision - 1. The original check would consequently fail,
causing the function to determine that these numbers do not have an
exact multiplicative inverse.
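A concrete instance (a sketch; 2^-127 sits below the smallest normal,
2^-126, in IEEE single precision):

```cpp
#include "llvm/ADT/APFloat.h"
using namespace llvm;

// 2^-127 is subnormal in IEEE single: it is stored as significand
// 1 << 22 with the minimum exponent, so significandLSB() == 22 rather
// than precision - 1 == 23, and the check above rejects it even though
// its inverse, 2^127, is exactly representable.
APFloat X(APFloat::IEEEsingle(), "0x1p-127");
APFloat Inv(APFloat::IEEEsingle());
bool HasExactInverse = X.getExactInverse(&Inv); // false under this check
```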
The new logic calculated this correctly, but it seems that
test/CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll expected the old
behavior.
Seeing as how getExactInverse does not have tests or documentation, we
conservatively maintain (and document) this behavior.
This reverts commit 47e62e846beb267aad50eb9195dfd855e160483e.
Adds the `#llvm.target<triple = $TRIPLE, chip = $CHIP, features =
$FEATURES>` attribute, along with a `-llvm-target-to-data-layout` pass
to derive an MLIR data layout from the LLVM data layout string (using
the existing `DataLayoutImporter`). The attribute implements the
relevant DLTI interfaces to expose the `triple`, `chip` (AKA `cpu`), and
`features` on `#llvm.target`, as well as the full
`DataLayoutSpecInterface`. The pass combines the generated
`#dlti.dl_spec` with an existing `dl_spec` in case one is already
present, e.g. a `dl_spec` that specifies the size of the `index` type.
Adds a `TargetAttrInterface` which can be implemented by all attributes
representing LLVM targets.
Similar to the draft PR https://github.com/llvm/llvm-project/pull/78073.
RFC on which this PR is based:
https://discourse.llvm.org/t/mandatory-data-layout-in-the-llvm-dialect/85875
Static analysis flagged the `columns - 1` code; it was correct, but the
assumption was not obvious, so I documented the assumption with
assertions. While digging through related code I found that
getColumnNumber looks wrong at first inspection; adding parentheses
makes it clearer.
The goal is simply to reduce direct usage of getLength and setLength so
that if we end up moving memset.pattern (whose length is in elements)
there are fewer places to audit.
Given the test case:
```llvm
define fastcc i16 @testbtst(i16 %a) nounwind {
entry:
switch i16 %a, label %no [
i16 11, label %yes
i16 10, label %yes
i16 9, label %yes
i16 4, label %yes
i16 3, label %yes
i16 2, label %yes
]
yes:
ret i16 1
no:
ret i16 0
}
```
We currently get this result:
```asm
testbtst: ; @testbtst
; %bb.0: ; %entry
move.l %d0, %d1
and.l #65535, %d1
sub.l #11, %d1
bhi .LBB0_3
; %bb.1: ; %entry
and.l #65535, %d0
move.l #3612, %d1
btst %d0, %d1
bne .LBB0_3 ; <------- Erroneous condition
; %bb.2: ; %yes
moveq #1, %d0
rts
.LBB0_3: ; %no
moveq #0, %d0
rts
```
The cause of this is a line that explicitly reverses the `btst`
condition code. But on M68k, `btst` sets condition codes the same as
`and` with a bitmask, meaning `EQ` indicates failure (bit is zero) and
not success, so the condition does not need to be reversed.
In my testing, I've only been able to get switch statements to lower to
`btst`, so I wasn't able to explicitly test other options for lowering.
But (if possible to trigger) I believe they have the same logical error.
For example, in `LowerAndToBTST()`, a comment specifies that it's
lowering a case where the `and` result is compared against zero, which
means the corresponding `btst` condition should also not be reversed.
This patch simply flips the ternary expression in
`getBitTestCondition()` to match the ISD condition code with the same
M68k code, instead of the opposite.
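A hedged sketch of that flip (the surrounding code in
`getBitTestCondition()` is assumed, not quoted): since `btst` sets the
CCR like an `and` with a bitmask, an ISD::SETEQ comparison maps straight
to the M68k EQ condition.

```cpp
// Before: return CC == ISD::SETEQ ? M68k::COND_NE : M68k::COND_EQ;
// After: btst behaves like `and` with a mask, so keep the sense.
return CC == ISD::SETEQ ? M68k::COND_EQ : M68k::COND_NE;
```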