Currently the privatization recipe of a scalar allocatable is as follow:
```
acc.private.recipe @privatization_ref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> init {
^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>):
%0 = fir.alloca !fir.box<!fir.heap<i32>>
%1:2 = hlfir.declare %0 {uniq_name = "acc.private.init"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
acc.yield %1#0 : !fir.ref<!fir.box<!fir.heap<i32>>>
}
```
This change adds the allocation for the scalar.
Summary:
This patch changes the linux build to use the wide reads on the memory
operations by default. These memory functions will now potentially read
outside of the bounds explicitly allowed by the current function. While
technically undefined behavior in the standard, plenty of C library
implementations do this. it will not cause a segmentation fault on linux
as long as you do not cross a page boundary, and because we are only
*reading* memory it should not have atomic effects.
Operate directly on the existing Ops vector instead of copying to
a new vector. This is similar to what the autogenerated codegen
does for other intrinsics.
This reduced the clang binary size by ~96kb on my local Release+Asserts
build.
We used to abuse Operands list to store instruction encoding's
DecoderMethod there. Let's store it in the InstructionEncoding class
instead, where it belongs.
`ASTReader::FinishedDeserializing()` calls
`adjustDeductedFunctionResultType(...)` [0], which in turn calls
`FunctionDecl::getMostRecentDecl()`[1]. In modules builds,
`getMostRecentDecl()` may reach out to the `ASTReader` and start
deserializing again. Starting deserialization starts `ReadTimer`;
however, `FinishedDeserializing()` doesn't call `stopTimer()` until
after it's call to `adjustDeductedFunctionResultType(...)` [2]. As a
result, we hit an assert checking that we don't start an already started
timer [3]. To fix this, we simply don't start the timer if it's already
running.
Unfortunately I don't have a test case for this yet as modules builds
are notoriously difficult to reduce.
[0]:
4d2288d318/clang/lib/Serialization/ASTReader.cpp (L11053)
[1]:
4d2288d318/clang/lib/AST/ASTContext.cpp (L3804)
[2]:
4d2288d318/clang/lib/Serialization/ASTReader.cpp (L11065-L11066)
[3]:
4d2288d318/llvm/lib/Support/Timer.cpp (L150)
ActionCache is used to store a mapping from CASID to CASID. The current
implementation of the ActionCache can only be used to associate the
key/value from the same hash context.
ActionCache has two operations: `put` to store the key/value and `get`
to
lookup the key/value mapping. ActionCache uses the same TrieRawHashMap
data structure to store the mapping, where is CASID of the key is the
hash to index the map.
While CASIDs for key/value are often associcate with actual CAS
ObjectStore, it doesn't provide the guarantee of the existence of such
object in any ObjectStore.
When providing allocation and deallocation traces,
the ASan compiler-rt runtime already provides call
addresses (`TracePCType::Calls`).
On Darwin, system sanitizers (libsanitizers)
provides return address. It also discards a few
non-user frames at the top of the stack, because
these internal libmalloc/libsanitizers stack
frames do not provide any value when diagnosing
memory errors.
Introduce and add handling for
`TracePCType::ReturnsNoZerothFrame` to cover this
case and enable libsanitizers traces line-level
testing.
rdar://157596927
---
Commit 1 is a mechanical refactoring to introduce
and adopt `TracePCType` enum to replace
`pcs_are_call_addresses` bool. It preserve the
current behavior:
```
pcs_are_call_addresses:
false -> TracePCType::Returns (default)
true -> TracePCType::Calls
```
Best reviewed commit by commit.
Prevent an assertion failure in the cstring checker when library
functions like memcpy are defined with non-default address spaces.
Adds a test for this case.
The previous implementation of getExactInverse used the following check
to identify powers of two:
// Check that the number is a power of two by making sure that only the
// integer bit is set in the significand.
if (significandLSB() != semantics->precision - 1)
return false;
This condition verifies that the only set bit in the significand is the
integer bit, which is correct for normal numbers. However, this logic is
not correct for subnormal values.
APFloat represents subnormal numbers by shifting the significand right
while holding the exponent at its minimum value. For a power of two in
the subnormal range, its single set bit will therefore be at a position
lower than precision - 1. The original check would consequently fail,
causing the function to determine that these numbers do not have an
exact multiplicative inverse.
The new logic calculated this correctly but it seems that
test/CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll expected the old
behavior.
Seeing as how getExactInverse does not have tests or documentation, we
conservatively maintain (and document) this behavior.
This reverts commit 47e62e846beb267aad50eb9195dfd855e160483e.
Adds the `#llvm.target<triple = $TRIPLE, chip = $CHIP, features =
$FEATURES>` attribute and along with a `-llvm-target-to-data-layout`
pass to derive a MLIR data layout from the LLVM data layout string
(using the existing `DataLayoutImporter`). The attribute implements the
relevant DLTI-interfaces, to expose the `triple`, `chip` (AKA `cpu`) and
`features` on `#llvm.target` and the full `DataLayoutSpecInterface`. The
pass combines the generated `#dlti.dl_spec` with an existing `dl_spec`
in case one is already present, e.g. a `dl_spec` which is there to
specify size of the `index` type.
Adds a `TargetAttrInterface` which can be implemented by all attributes
representing LLVM targets.
Similar to the Draft PR https://github.com/llvm/llvm-project/pull/78073.
RFC on which this PR is based:
https://discourse.llvm.org/t/mandatory-data-layout-in-the-llvm-dialect/85875
Static analysis flagged the columns - 1 code, it was correct but the
assumption was not obvious. I document the assumption w/ assertions.
While digging through related code I found getColumnNumber that looks
wrong at first inspection and adding parentheses makes it clearer.
Goal is simply to reduce direct usage of getLength and setLength so that
if we end up moving memset.pattern (whose length is in elements) there
are fewer places to audit.
Given the test case:
```llvm
define fastcc i16 @testbtst(i16 %a) nounwind {
entry:
switch i16 %a, label %no [
i16 11, label %yes
i16 10, label %yes
i16 9, label %yes
i16 4, label %yes
i16 3, label %yes
i16 2, label %yes
]
yes:
ret i16 1
no:
ret i16 0
}
```
We currently get this result:
```asm
testbtst: ; @testbtst
; %bb.0: ; %entry
move.l %d0, %d1
and.l #65535, %d1
sub.l #11, %d1
bhi .LBB0_3
; %bb.1: ; %entry
and.l #65535, %d0
move.l #3612, %d1
btst %d0, %d1
bne .LBB0_3 ; <------- Erroneous condition
; %bb.2: ; %yes
moveq #1, %d0
rts
.LBB0_3: ; %no
moveq #0, %d0
rts
```
The cause of this is a line that explicitly reverses the `btst`
condition code. But on M68k, `btst` sets condition codes the same as
`and` with a bitmask, meaning `EQ` indicates failure (bit is zero) and
not success, so the condition does not need to be reversed.
In my testing, I've only been able to get switch statements to lower to
`btst`, so I wasn't able to explicitly test other options for lowering.
But (if possible to trigger) I believe they have the same logical error.
For example, in `LowerAndToBTST()`, a comment specifies that it's
lowering a case where the `and` result is compared against zero, which
means the corresponding `btst` condition should also not be reversed.
This patch simply flips the ternary expression in
`getBitTestCondition()` to match the ISD condition code with the same
M68k code, instead of the opposite.
Remove the ArrayRef<const Value*> Args operand from
getOperandsScalarizationOverhead and require that the callers
de-duplicate arguments and filter constant operands.
Removing the Value * based Args argument enables callers where no Value
* operands are available to use the function in a follow-up: computing
the scalarization cost directly for a VPlan recipe.
It also allows more accurate cost-estimates in the future: for example,
when vectorizing a loop, we could also skip operands that are live-ins,
as those also do not require scalarization.
PR: https://github.com/llvm/llvm-project/pull/154126
When building the base type for constructor initializer, the case of an
UnresolvedUsingType was not being handled.
For the non-dependent case, we are also skipping adding the UsingType,
but this is just missing information in the AST. A FIXME for this is
added.
This fixes a regression introduced in #147835, which was never released,
so there are no release notes.
Fixes#154436
This patch implements the `RandomGenerator`, a new input generator that
enables conformance testing for functions with large input spaces (e.g.,
double-precision math functions).
**Architectural Refactoring**
To support different generation strategies in a clean and extensible
way, the existing `ExhaustiveGenerator` was refactored into a new class
hierarchy:
* A new abstract base class, `RangeBasedGenerator`, was introduced using
the Curiously Recurring Template Pattern (CRTP). It contains the common
logic for generators that operate on a sequence of ranges.
* `ExhaustiveGenerator` now inherits from this base class, simplifying
its implementation.
**New Components**
* The new `RandomGenerator` class also inherits from
`RangeBasedGenerator`. It implements a strategy that randomly samples a
specified number of points from the total input space.
* Random number generation is handled by a new, self-contained
`RandomState` class (a `xorshift64*` PRNG seeded with `splitmix64`) to
ensure deterministic and reproducible random streams for testing.
**Example Usage**
As a first use case and demonstration of this new capability, this patch
also adds the first double-precision conformance test for the `log`
function. This test uses the new `RandomGenerator` to validate the
implementations from the `llvm-libm`, `cuda-math`, and `hip-math`
providers.
Summary:
Clang has support for boolean vectors, these builtins expose the LLVM
instruction of the same name. This differs from a manual load and select
by potentially suppressing traps from deactivated lanes.
Fixes: https://github.com/llvm/llvm-project/issues/107753
Summary:
We added boolean vectors in clang 15 and wish to extend them further in
clang-22. However, there's no way to query for their support as they are
separate to the normal extended vector type. This adds a feature so we
can check for it as a feature directly.
This pr implements the boiler plate required to use `llvm-objcopy` for
`DXContainer` object files.
It defines a minimal structure `object` to represent the `DXContainer`
header and the following parts.
This structure is a simple representation of the object data to allow
for simple modifications at the granularity of each part. It follows
similarily to how the respective `object`s are defined for `ELF`,
`wasm`, `XCOFF`, etc.
This is the first step to implement
https://github.com/llvm/llvm-project/issues/150275 and
https://github.com/llvm/llvm-project/issues/150277 as compiler actions
that invoke `llvm-objcopy` for functionality.
Add `mlir-capi-global-constructors-test` to `MLIR_TEST_DEPENDS` when
`MLIR_ENABLE_EXECUTION_ENGINE` is enabled, to ensure that it is also
built during standalone builds, and therefore fix test failure due to
the executable being missing.
I don't understand the purpose of `LLVM_ENABLE_PIC AND TARGET
${LLVM_NATIVE_ARCH}` block, but the condition is not true in standalone
builds.
Fixes 7610b1372955da55e3dc4e2eb1440f0304a56ac8.
`GetOmpObjectList` takes a clause, and returns the pointer to the
contained OmpObjectList, or nullptr if the clause does not contain one.
Some clauses with object list were not recognized: handle all clauses,
and move the implementation to flang/Parser/openmp-utils.cpp.
This PR adds support for complex power operations (`cpow`) in the
`ComplexToROCDLLibraryCalls` conversion pass, specifically targeting
AMDGPU architectures. The implementation optimises complex
exponentiation by using mathematical identities and special-case
handling for small integer powers.
- Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa`
target instead of using library calls
- Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using
mathematical identity