There semantic analysis of the ATOMIC construct will require additional
rewriting (reassociation of certain expressions for user convenience),
and that will be driven by diagnoses made in the semantic checks.
While the rewriting of min/max is not required to be done in semantic
analysis, moving it there will make all rewriting for ATOMIC construct
be located in a single location.
This change introduces the #cir.global_view attribute and adds support
for using that attribute to handle initializing a global variable with
the address of another global variable.
This does not yet include support for the optional list of indices to
get an offset from the base address. Those will be added in a follow-up
patch.
This adds support for initializing the vptr member of a dynamic class in
the constructor of that class.
This does not include support for lowering the
`cir.vtable.address_point` operation to the LLVM dialect. That handling
will be added in a follow-up patch.
Support for normal cleanups was introduced with a simplified
implementation compared to what's in the incubator (which corresponds
closely to the classic codegen implementation).
This change introduces more of the infrastructure that will later be
needed to handle non-trivial cleanup cases, including exception
handling.
fixes#151764
This fix has two parts first we track all lifetime intrinsics and if
they are users of an alloca of a target extention like dx.RawBuffer then
we eliminate those memory intrinsics when we visit the alloca.
We do step one to allow us to use the Dead Store Elimination Pass. This
removes the alloca and simplifies the use of the target extention back
to using just the global. That keeps things in a form the
DXILBitcodeWriter is expecting.
Obviously to pull this off we needed to bring back the legacy pass
manager plumbing for the DSE pass and hook it up into the DirectX
backend.
The net impact of this change is that DML shader pass rate went from
89.72% (4268 successful compilations) to 90.98% (4328 successful
compilations).
Not quite NFC as it looks like the original intrinsic-handling code
never got updated to use records. This was never caught because that
code wasn't tested. I've adjusted an existing test so the behaviour is
now covered.
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16div
- bf16divf
- bf16divl
- bf16divf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
The mapa.shared.cluster variant that takes in address-space 3 now should
output address-space 7. This patch updates the NVVMOps.td file to reflect this.
The tests largely cover AVX-VNNI (Vector Neural Network Instructions):
- vpdpbusd, vpdpbusds
- vpdpwssd, vpdpwssds
AVX-VNNI-INT8:
- vpdpbssd, vpdpbssds
- vpdpbsud, vpdpbsuds
- vpdpbuud, vpdpbuuds
AVX-VNNI-INT16:
- vpdpwsud, vpdpwsuds
- vpdpwusd, vpdpwusds
- vpdpwuud, vpdpwuuds
These instructions are currently heuristically handled (by OR'ing
together the vectors). This is incorrect because:
1) multiplication by a zero should result in an initialized value 2) the
addition is horizontal (within vectors, not "vertically" between
vectors).
Future work can improve the instrumentation by applying the updated
handleVectorPmaddIntrinsic() from
https://github.com/llvm/llvm-project/pull/152941
This reverts commit 29ad073c6c325dbf92c1aa5a285ca48e55cb918b i.e.,
relands 927e19f5f3b357823f86f6c4f1378abedccadf27.
It was reverted because of buildbot breakages. This reland adds
"-pthread" and also moves the test to Posix-only.
Original commit message:
ASan's instrumentation pass uses
`ASanStackFrameLayout::ComputeASanStackFrameLayout()` to calculate the
offset of variables, taking into account alignment. However, the fake
stack frames returned by the runtime's `GetFrame()` are not guaranteed
to be sufficiently aligned (and in some cases, even guaranteed to be
misaligned), hence the offset addresses may sometimes be misaligned.
This change fixes the misalignment issue by padding the FakeStack. Every
fake stack frame is guaranteed to be aligned to the size of the frame.
The memory overhead is low: 64KB per FakeStack, compared to the
FakeStack size of ~700KB (min) to 11MB (max).
Updates the test case from
https://github.com/llvm/llvm-project/pull/152889.
Reverts llvm/llvm-project#147422
Seems to be causing problems with tracebacks. Probably the trackback
code doesn't know how to switch back to the regular stack after it gets
to the top of the signal stack.
When an undef/poison value is lowered as a an immediate, it becomes -1.
When reaching the backend, the -1 was printed as operand to
OpVectorShuffle instead of the proper 0xFFFFFFFF.
From the SPIR-V spec:
A Component literal may also be FFFFFFFF, which means the
corresponding result component has no source and is undefined.
The reason the existing tests were passing `spirv-val` was because the
binary format was used as output, meaning the `-1` was lowered to
`0xFFFFFFFF`. But when the text format is used, `-1` is emitted as-is
which is wrong.
Fixes#151691
M68k's SETCC instruction (`scc`) distinctly fills the destination byte
with all 1s. If boolean contents are set to `ZeroOrOneBooleanContent`,
LLVM can mistakenly think the destination holds `0x01` instead of `0xff`
and emit broken code as a result. This change corrects the boolean
content type to `ZeroOrNegativeOneBooleanContent`.
For example, this IR:
```llvm
define dso_local signext range(i8 0, 2) i8 @testBool(i32 noundef %a) local_unnamed_addr #0 {
entry:
%cmp = icmp eq i32 %a, 4660
%. = zext i1 %cmp to i8
ret i8 %.
}
```
would previously build as:
```asm
testBool: ; @testBool
cmpi.l #4660, (4,%sp)
seq %d0
and.l #255, %d0
rts
```
Notice the `zext` is erroneously not clearing the low bits, and thus the
register returns with 255 instead of 1. This patch fixes the issue:
```asm
testBool: ; @testBool
cmpi.l #4660, (4,%sp)
seq %d0
and.l #1, %d0
rts
```
Most of the tests containing `scc` suffered from the same value error as
described above, so those tests have been updated to match the new
output (which also logically corrects them).
The following intrinsics were replaced by a combination of
`__builtin_shufflevector` and `__builtin_convertvector`:
- `__builtin_ia32_vcvtph2ps`
- `__builtin_ia32_vcvtph2ps256`
Fixes#152749
Fixes#153043.
This is another case of debug location not getting updated when the
insert point is changed by the `restoreIP`. Fixed by using the wrapper
function that updates the debug location.
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- fmaximumbf16
- fmaximum_magbf16
- fmaximum_mag_numbf16
- fmaximum_numbf16
- fminimumbf16
- fminimum_magbf16
- fminimum_mag_numbf16
- fminimum_numbf16
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR adds support for the following instructions to the RISC-V
VLOptimizer: vandn.vx, vandn.vv, vbrev.v, vclz.v, vcpop.v, vctz.v,
vror.vi, vror.vx, vror.vv, vrol.vx, vrol.vv.
Fixes#147485.
I changed the regexp for the ARM targets making the part `@+[\t]*@"?(?P=func)"?` optional since when the -asm-verbose=false is passed it is not generated and this led to the issue.
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.
Automap Ref: OpenMP 6.0 section 7.9.7.
This change implements several small improvements to
`Intrinsic::getAttributes`:
1. Use `SequenceToOffsetTable` to emit `ArgAttrIdTable`. This enables
reuse of entries when they share a common prefix. This reduces the size
of this table from 546 to 484 entries, which is 248 bytes.
2. Fix `AttributeComparator` to purely compare argument attributes and
not look at function attributes. This avoids unnecessary duplicates in
the uniqueing process and eliminates 2 entries from
`ArgAttributesInfoTable`, saving 8 bytes.
3. Improve `Intrinsic::getAttributes` code to not initialize all entries
of `AS` always. Currently, we initialize all entries of the array `AS`
even if we may not use all of them. In addition to the runtime cost, for
Clang release builds, since the initialization loop is unrolled, it
consumes ~330 bytes of code to initialize the `AS` array. Address this
by declaring the storage for AS using just a char array with appropriate
`alignas` (similar to how `SmallVectorStorage` defines its inline
elements).
Previously, the NVVM dialect's ldmatrix operation could only generate a
limited subset of the available NVVM ldmatrix intrinsics. The intrinsics
generating new ops introduced in BlackWell are not accessible through
the NVVM ops. This commit extends the ldmatrix operation to support all
available ldmatrix intrinsics.
This patch handles the strided update in the `#pragma omp target update
from(data[a🅱️c])` directive where 'c' represents the strided access
leading to non-contiguous update in the `data` array when the offloaded
execution returns the control back to host from device using the `from`
clause.
Issue: Clang CodeGen where info is generated for the particular
`MapType` (to, from, etc), it was failing to detect the strided access.
Because of this, the `MapType` bits were incorrect when passed to
runtime. This led to incorrect execution (contiguous) in the
libomptarget runtime code.
Added a minimal testcase that verifies the working of the patch.
Append `-nostdlib++` and `-nostdinc++` flags to `CMAKE_REQUIRED_FLAGS`
only if we are actually building with Clang. These flags are also
passed to the C compiler, which is not allowed in GCC. Since CMake
implicitly performs some tests using the C compiler, this can lead
to incorrect check results. This should be safe, since FWIU we only
need them when bootstrapping Clang.
Even though we know that Clang supports these flags, we still need
to explicitly check if they work, as in some scenarios adding
`-nostdlib++` actually breaks the build. See PR #108357 for examples
of that.
Fixes#90332
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.
This is one of the last few steps needed to model the full vector
skeleton in VPlan.
This is mostly NFC, although in some cases we remove some unused
computations.
PR: https://github.com/llvm/llvm-project/pull/152879
Reopen#128938.
Attempt to shrink the size of vector loads where only some of the
incoming lanes are used for rebroadcasts in shufflevector instructions.
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
This along with IntrReadMem means that the Intrinsic only reads memory
through the given argument ptr and its derivatives. This allows passes
like Inliner to attach alias.scope to the call instruction as it sees
that no other memory is accessed.
Discovered via SWDEV-543741
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
By default, `mlir-translate` writes all output into a single file even
when `--split-input-file` is used. This is not an issue for text files
as they can be easily split with an output separator. However, this
causes issues with binary SPIR-V modules.
Firstly, a binary file with multiple modules is not a valid SPIR-V, but
will be created if multiple modules are specified in the same file and
separated by "// -----". This does not cause issues with MLIR internal
tools but does not work with SPIRV-Tools.
Secondly, splitting binary files after serialization is non-trivial,
when compared to text files, so using an external tool is not desirable.
This patch adds a SPIR-V serialization option that write SPIR-V modules
to separate files in addition to writing them to the `mlir-translate`
output file. This is not the ideal solution and ideally `mlir-translate`
would allow generating multiple output files when `--split-input-file`
is used, however adding such functionality is again non-trival due to
how processing and splitting is done: output is written to a
single `os` that is passed around, and the number of split buffers is not
known ahead of time. As such a I propose to have a SPIR-V internal
option that will dump modules to files in the form they can be processed
by `spirv-val`. The behaviour of the new added argument may be
confusing, but benefits from being internal to SPIR-V target.
Alternatively, we could expose the spirv option in
`mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp`, and slice the
output file on the SPIR-V magic number, and not keep the file generated
by default by `mlir-translate`. This would be a bit cleaner in API
sense, as it would not generate the additional file containing all
modules together. However, it pushes SPIR-V specific code into the
generic part of the `mlir-translate` and slicing is potentially more
error prone that just writing a single module after it was serialized.
Partially fix#149023.
The original code `MRI.def_begin(Reg)->getParent()` may return the
incorrect MI, as the physical register `Reg` may have multiple
definitions.
This patch selects the correct MI to verify by comparing the MBB of each
definition.
New testcase hangs with -O1/2/3 enabled. The BranchFolding may be to
blame.