We already use cc rules from `@rules_cc//cc:defs.bzl` in a few files,
but this uses them everywhere. Done automatically by running `buildifier
--lint=fix
--warnings=native-cc-binary,native-cc-library,native-cc-test,load` over
all the files. I also ran `buildifier` once more to ensure there wasn't
any missing formatting, which caused a few unrelated diffs.
The LLDB standalone build using Xcode currently fails because the headers
are attached to multiple targets, none of which depend on each other.
This commit resolves this by creating those dependencies.
This is a rewrite of the current strided store optimization to be a DAG
combine. This allows it to kick in slightly more broadly, in particular
for the scalable lowering paths.
bd66fd0 ([CostModel/RISCV] Fix costs of vector [l](lrint|lround))
introduced buildbot failures by using a temporary ArrayRef when a
SmallVector should have been used. Fix this.
Failure: https://lab.llvm.org/buildbot/#/builders/186/builds/11133
This partially reverts https://github.com/llvm/llvm-project/pull/140744,
restoring the original TheLoop->isLoopInvariant check instead of the more
powerful Legal->isInvariant, which uses SCEV.
This causes a mis-compile, because SCEV can prove that the stored value
is loop-invariant, which in turn converts the store to a uniform store.
But in VPlan, we aren't yet able to determine that the stored value is
loop-invariant, so we extract the last lane, which is incorrect, because
it does not account for the mask of the store.
Restoring the original code is a safe fix and avoids this subtle
divergence.
Fixes https://github.com/llvm/llvm-project/issues/149347.
PR: https://github.com/llvm/llvm-project/pull/150828
This patch fixes:
lldb/source/Plugins/Process/wasm/ProcessWasm.cpp:107:25: error:
format specifies type 'unsigned long long' but the argument has type
'lldb::tid_t' (aka 'unsigned long') [-Werror,-Wformat]
Code written in assembly can have missing code markers. In BOLT, we can
compensate by recognizing that a function entry point should start a
code sequence.
Such code has been seen in the LuaJIT library.
Add entity mapping mode to llvm-ir2vec and improve triplet generation format for knowledge graph embedding training.
This change streamlines the workflow for training the vocabulary embeddings with IR2Vec by:
1. Directly generating numeric IDs instead of requiring string-to-ID preprocessing
2. Providing entity mappings in standard knowledge graph embedding format
3. Structuring triplet output in train2id format compatible with knowledge graph embedding frameworks
4. Adding metadata headers to simplify post-processing and training setup
These improvements make IR2Vec more compatible with standard knowledge graph embedding training pipelines and reduce the preprocessing steps needed before training.
See #149215 for more details on how it is used.
(Tracking issues - #141817, #141834)
Take the actual instruction cost into account, and don't fall through to
code that doesn't apply to [l]lrint. Also strip invalid costs for
[b]f16, as a companion to #146507, and unify it with [l]lround costs as
a companion to #147713.
Implicitly declared types (like __NSConstantString_tag, etc) will be
declared with visibility attributes. This causes problems when merging
ASTs because we currently reject declaration merging for declarations
with attributes.
This relaxes that restriction somewhat; implicit declarations can now
have attributes when merging; we assume that if the compiler generated
it, it's fine.
Add XeVM dialect to LLVMIR translation.
Currently no ops are translated.
Only xevm.DecorationCacheControl is translated, to metadata for the
SPIR-V decoration !spirv.DecorationCacheControlINTEL.
Co-authored-by: Artem Kroviakov <artem.kroviakov@intel.com>
This was written out of an abundance of caution because the changes were
being added to the release branch. Now we can be a little less cautious
and switch to using an assert. No behavioral changes are expected.
To initialize all elements of a primitive array at once. This saves us
from creating the InitMap just to destroy it again after all elements
have been initialized.
## Description
Adds support for ternary equivalent operations of the form `ternary(A,
X, and(B,C))` where `X=[xor(B,C)| nor(B,C)| eqv(B,C)| not(B)| not(C)]`.
List of `xxeval` equivalent ternary operations added and the
corresponding `imm` value required:
Ternary Operator| Imm Value
--|--
ternary(A, xor(B,C), and(B,C)) | 22
ternary(A, nor(B,C), and(B,C)) | 24
ternary(A, eqv(B,C), and(B,C)) | 25
ternary(A, not(C), and(B,C)) | 26
ternary(A, not(B), and(B,C)) | 28
e.g. `xxeval XT,XA,XB,XC,22`
- performs `XA ? xor(XB, XC) : and(XB,XC)` and places the result in `XT`.
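The immediates in the table above can be reproduced from the truth table of each ternary function. The following is a minimal sketch, not the backend code from this patch. It assumes that the most significant bit of the 8-bit immediate holds the result for (A, B, C) = (0, 0, 0) and the least significant bit holds the result for (1, 1, 1); the helper names `xxeval_imm` and `ternary` are hypothetical.

```python
def xxeval_imm(f):
    """Build an 8-bit immediate from a ternary bitwise function f(a, b, c).

    Assumption: the input (0, 0, 0) maps to the most significant bit and
    (1, 1, 1) to the least significant bit of the immediate.
    """
    imm = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                # Results are appended in order, so the first one ends up
                # in the most significant position.
                imm = (imm << 1) | (f(a, b, c) & 1)
    return imm


def ternary(a, x, y):
    """Select x when a is set, otherwise y (a ? x : y on single bits)."""
    return x if a else y


# The five X operands from the table, on single bits.
ops = {
    "xor": lambda b, c: b ^ c,
    "nor": lambda b, c: 1 - (b | c),
    "eqv": lambda b, c: 1 - (b ^ c),
    "not(C)": lambda b, c: 1 - c,
    "not(B)": lambda b, c: 1 - b,
}

# ternary(A, X(B, C), and(B, C)) for each X.
imms = {
    name: xxeval_imm(lambda a, b, c, op=op: ternary(a, op(b, c), b & c))
    for name, op in ops.items()
}

for name, imm in imms.items():
    print(f"ternary(A, {name}, and(B,C)) -> {imm}")
```

Under that bit-ordering assumption, the computed values agree with the table: 22, 24, 25, 26 and 28.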
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
Extend support in LLDB for WebAssembly. This PR adds a new Process
plugin (ProcessWasm) that extends ProcessGDBRemote for WebAssembly
targets. It adds support for WebAssembly's memory model with separate
address spaces, and the ability to fetch the call stack from the
WebAssembly runtime.
I have tested this change with the WebAssembly Micro Runtime (WAMR,
https://github.com/bytecodealliance/wasm-micro-runtime) which implements
a GDB debug stub and supports the qWasmCallStack packet.
```
(lldb) process connect --plugin wasm connect://localhost:4567
Process 1 stopped
* thread #1, name = 'nobody', stop reason = trace
frame #0: 0x40000000000001ad
wasm32_args.wasm`main:
-> 0x40000000000001ad <+3>: global.get 0
0x40000000000001b3 <+9>: i32.const 16
0x40000000000001b5 <+11>: i32.sub
0x40000000000001b6 <+12>: local.set 0
(lldb) b add
Breakpoint 1: where = wasm32_args.wasm`add + 28 at test.c:4:12, address = 0x400000000000019c
(lldb) c
Process 1 resuming
Process 1 stopped
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
1 int
2 add(int a, int b)
3 {
-> 4 return a + b;
5 }
6
7 int
(lldb) bt
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
* frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
frame #1: 0x40000000000001e5 wasm32_args.wasm`main at test.c:12:12
frame #2: 0x40000000000001fe wasm32_args.wasm
```
This PR is based on an unmerged patch from Paolo Severini:
https://reviews.llvm.org/D78801. I intentionally stuck to the
foundations to keep this PR small. I have more PRs in the pipeline to
support the other features/packets.
My motivation for supporting Wasm is to support debugging Swift compiled
to WebAssembly:
https://www.swift.org/documentation/articles/wasm-getting-started.html
When OpenACC is enabled and Fortran loops are annotated with `acc loop`,
they are lowered to the `acc.loop` operation, while the rest of the
contained loops use the normal FIR lowering path.
However, the OpenACC specification has special provisions related to
contained loops and their induction variable. In order to adhere to
this, we convert all valid contained loops to `acc.loop` in order to
store this information appropriately.
The provisions in the spec that motivated this change (line numbers are
from OpenACC 3.4):
- 1353 Loop variables in Fortran do statements within a compute
construct are predetermined to be private to the thread that executes
the loop.
- 3783 When do concurrent appears without a loop construct in a kernels
construct it is treated as if it is annotated with loop auto. If it
appears in a parallel construct or an accelerator routine then it is
treated as if it is annotated with loop independent.
By valid loops, we mean do loops and do concurrent loops that have an
induction variable. Unstructured loops are not handled.
D16 pseudo instructions are introduced in true16 mode to represent a D16
load/store. In MC lowering, the pseudo instructions are lowered to the
corresponding D16 Lo/Hi MC Inst respecting the register allocation.
However, the pseudo instruction has size 0, which causes an issue in the
Inst size estimation. Use D16 Lo when calculating inst size.
The Cygwin target is generally very similar to the MinGW target. They
share the default auto-import behavior, the default calling convention,
the `.dll.a` import library extension, the
`__GXX_TYPEINFO_EQUALITY_INLINE` pre-define by `g++`, and the long double
configuration.
Co-authored-by: Mateusz Mikuła <oss@mateuszmikula.dev>
Cygwin and MinGW share the auto import behavior that could result in
__stack_check_guard being non-dso-local. Allow windres to assume a
Cygwin target as well as a MinGW one, so defines like _WIN32 would not
be present on Cygwin.
This PR fixes the computation of padded shapes for convolution-style
affine maps (e.g., d0 + d1) in `PadTilingInterface`. Previously, the
code used the direct sum of loop upper bounds, leading to over-padding.
For example, when only the c dimensions of the following
`conv_2d_nhwc_fhwc` op are padded to multiples of 16, the convolved
dimensions are also incorrectly padded, generating the wrong input shape:
```
%padded = tensor.pad %arg0 low[0, 0, 0, 0] high[0, 1, 1, 12] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
tensor.yield %cst : f32
} : tensor<1x16x16x4xf32> to tensor<1x17x17x16xf32>
%padded_0 = tensor.pad %arg1 low[0, 0, 0, 0] high[0, 0, 0, 12] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
tensor.yield %cst : f32
} : tensor<16x3x3x4xf32> to tensor<16x3x3x16xf32>
%0 = linalg.conv_2d_nhwc_fhwc {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%padded, %padded_0 : tensor<1x17x17x16xf32>, tensor<16x3x3x16xf32>) outs(%arg2 : tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32>
return %0 : tensor<1x14x14x16xf32>
```
The new implementation uses the maximum accessed index as the input for
the affine map and then adds 1 after aggregating all the terms to get the
final padded size. This fixes
https://github.com/llvm/llvm-project/issues/148679.
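The difference between the two computations can be illustrated with the height dimension of the example above (14 output rows, 3 kernel rows, 16 input rows). This is a minimal sketch rather than the `PadTilingInterface` code: it assumes an affine map that is a plain sum with unit coefficients (stride and dilation of 1), and the function names are hypothetical.

```python
def padded_size_old(upper_bounds):
    """Previous behaviour as described: sum the loop upper bounds directly."""
    return sum(upper_bounds)


def padded_size_new(upper_bounds):
    """Evaluate d0 + d1 + ... at the maximum accessed index, then add 1.

    Each loop with upper bound ub accesses indices 0 .. ub - 1, so the
    largest index reached by the sum is sum(ub - 1). Adding 1 turns that
    index back into a size.
    """
    return sum(ub - 1 for ub in upper_bounds) + 1


# d0 ranges over the 14 output rows, d1 over the 3 kernel rows.
old = padded_size_old([14, 3])
new = padded_size_new([14, 3])
print(old, new)  # 17 16
```

The old result, 17, matches the over-padded `tensor<1x17x17x16xf32>` shown above, while the new result, 16, matches the original input height, so no padding is added along that dimension.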
There is a pattern that rewrites
elementwise_op(broadcast(x1 : T to U), broadcast(x2 : T to U), ...) to
broadcast(elementwise_op(x1, x2, ...) : T to U).
This pattern did not, however, account for the case where a broadcast
constant is represented as a SplatElementsAttr, which can safely be
reshaped or scalarized but is not a `vector.broadcast` or `vector.splat`
operation.
This patch fixes this oversight, preventing premature broadcasting.
This did result in the need to update some linalg dialect tests, which
now feature a less-broadcast computation and/or more constant folding.
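The equivalence the pattern relies on can be checked with plain Python lists standing in for vectors. This is only an illustration of the algebra, not MLIR code; `broadcast` and `elementwise` are hypothetical helpers.

```python
def broadcast(x, n):
    """Replicate the scalar x into a vector of n lanes."""
    return [x] * n


def elementwise(op, *vectors):
    """Apply op lane by lane across vectors of equal length."""
    return [op(*lanes) for lanes in zip(*vectors)]


def add(a, b):
    return a + b


x1, x2, n = 3, 4, 8

# elementwise_op(broadcast(x1), broadcast(x2)): n additions.
before = elementwise(add, broadcast(x1, n), broadcast(x2, n))

# broadcast(elementwise_op(x1, x2)): a single addition, then one broadcast.
after = broadcast(add(x1, x2), n)

# A splat constant has the same value in every lane, so it can be
# scalarized to one element and treated like a broadcast operand.
splat_constant = [5] * n
scalarized = splat_constant[0]
with_splat = broadcast(add(x1, scalarized), n)

print(before == after)  # True
```

Treating the splat constant as a scalar lets the whole computation stay on scalars until the final broadcast.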
`LocationDescription` contains both the insertion point and the debug
location. When `LocationDescription` is available, it is better to use
`updateToLocation`, which will update both. This PR replaces
`restoreIP(Loc.IP)` with `updateToLocation(Loc)`, as the former may not
update the debug location in all cases.
I am not checking the return value of `updateToLocation` because that is
checked just a few lines above in all cases and we would have returned
early if it failed.
I was observing segfaults at executable exit in the rtsan instrumented
unit tests. Bisecting the offending test led to observing that this test
is not using our safe test fixture for anything involving a file
descriptor. Changing to use the fixture eliminated the segfault on exit.
This reverts commit 83dfdd8f5485f6b50213c88f02878f86b3f53852.
Temporary revert, as the above patch contains some Python code requiring at
least version 3.10, while the minimum required by LLVM is 3.8.
My aim here is to make these a little easier to maintain by relying on
aliases where these instructions overlap with the Hint instructions they
are based on.
The following instructions have not been converted to aliases as they
have complex mappings from their immediate encodings to the immediate
encoding of the underlying instruction (setting high bits):
- qc.pputci
- qc.sync, qc.syncr, qc.syncwf, qc.syncwl
- qc.c.sync, qc.c.syncr, qc.c.syncwf, qc.c.syncwl
Co-authored-by: Sudharsan Veeravalli <quic_svs@quicinc.com>
This PR implements the fabsbf16 math function for the BFloat16 type,
along with tests.
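For intuition, the absolute value of a bfloat16 can be obtained by clearing the sign bit of its 16-bit pattern. The sketch below is not the libc implementation. It models a bfloat16 as the upper 16 bits of an IEEE-754 binary32 value, converts by truncation (no rounding), and uses hypothetical helper names.

```python
import struct


def float_to_bf16_bits(x):
    """Return the upper 16 bits of x encoded as binary32 (truncating)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bf16_bits_to_float(bits16):
    """Expand a 16-bit bfloat16 pattern back into a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return x


def fabs_bf16_bits(bits16):
    """Clear the sign bit (bit 15), leaving exponent and mantissa intact."""
    return bits16 & 0x7FFF


bits = float_to_bf16_bits(-1.5)
print(hex(bits), bf16_bits_to_float(fabs_bf16_bits(bits)))  # 0xbfc0 1.5
```

Because only the sign bit changes, the same operation maps -0.0 to +0.0 and leaves infinities and NaN payloads otherwise untouched.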
---------
Signed-off-by: krishna2803 <kpandey81930@gmail.com>
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
Co-authored-by: OverMighty <its.overmighty@gmail.com>
This PR introduces the initial version of a C++ framework for the
conformance testing of GPU math library functions, building upon the
skeleton provided in #146391.
The main goal of this framework is to systematically measure the
accuracy of math functions in the GPU libc, verifying correctness or at
least conformance to standards like OpenCL via exhaustive or random
accuracy tests.
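The description does not say how accuracy is quantified. One common choice, assumed here purely for illustration, is the distance in units in the last place (ULPs) between the computed and the reference result. The sketch below does this for binary32 values; the helper names are hypothetical, it is not taken from the framework, and NaN inputs are not handled.

```python
import struct


def f32_ordered(x):
    """Map a binary32 value to an integer that preserves numeric order.

    Non-negative floats keep their bit pattern; negative floats are
    mirrored so that -0.0 and +0.0 both map to 0.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits if bits < 0x80000000 else 0x80000000 - bits


def ulp_distance(a, b):
    """Number of representable binary32 steps between a and b."""
    return abs(f32_ordered(a) - f32_ordered(b))


reference = 1.0
computed = 1.0 + 2.0 ** -23  # the next binary32 value above 1.0
print(ulp_distance(reference, computed))  # 1
```

An exhaustive test would apply such a metric to every input bit pattern, while a random test would sample inputs and track the maximum distance observed.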