There will be more schedule definitions for vendor extentions and
we need to add these `UnsupportedSchedXXX` to exsiting models every
time we add new schedule definitions.
The fact is that each vendor will barely implement other vendors'
extensions, so we can package these definitions into one.
This removes the CostKind == TCK_RecipThroughput limitation from
getCmpSelInstrCost, allowing it to return more accurate costs for CodeSize and
Lat / SizeLat. Especially for larger vectors under CodeSize, the returned costs
are currently 1, not the legalization cost.
Fixes asserting with windows-elf triples. Should fix regression
reported in https://github.com/llvm/llvm-project/pull/147225#issuecomment-3054190938
I'm not sure this is a valid triple, but I'm guessing the MCJIT usage
is an accident. This does change the behavior from trying to use WinEH
to DwarfCFI; however the backend crashes with WinEH so I'm assuming following
ELF is the more correct option.
Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering. So
that we can use EVT::getVectorVT to generate EVT type in
getOptimalMemOpType.
Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
C++23 mandates that temporaries used in range-based for loops are
lifetime-extended
to cover the full loop. This patch adds a check for loop variables and
compiler-
generated `__range` bindings to apply the correct extension.
Includes test cases based on examples from CWG900/P2644R1.
Fixes https://github.com/llvm/llvm-project/issues/109793
To prepare for other platforms, such as 64-bit AIX, that have a non-zero
mmap beginning address.
---------
Co-authored-by: David Justo <david.justo.1996@gmail.com>
Changes in this PR:
* Declare most of workitem functions in clc and opencl folders.
* Call clc workitem function in corresponding OpenCL workitem function.
* Move ptx-nvidiacl workitem built-in implementations into clc.
* Move a few amdgcn workitem built-in implementations into clc.
* Include only needed headers in OpenCL workitem functions.
* Implement get_local_linear_id, get_max_sub_group_size,
get_num_sub_groups,
get_sub_group_id, get_sub_group_local_id, get_sub_group_size for
ptx-nvidiacl.
llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc.
llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and
nvptx64--.bc.
These scripts belong in the `mlgo-utils` directory when directly used
with python3. But since they are also used to package with pip, symlink
the entrypoint scripts to mlgo-utils directory. Adjust the bazel paths
to account for this as well. This loosely follows the same structure as lit.
Verified that I was also able to build the package successfully and use
the script.
Added emission of the 2-element reduction instead of 2 extracts + scalar
op, when trying to vectorize operands of the instruction, if it is more
profitable.
* Introduce an error code for illegal_line_offset in sampleprof_error
namespace, and use it for line offset parsing error.
* Add `const` for `LineLocation::serialize`.
* Use structured binding, make_first/second_range in loops.
I'm working on a [sample-profile format
change](https://github.com/llvm/llvm-project/compare/users/mingmingl-llvm/samplefdo-profile-format)
to extend SampleFDO profile with vtable profiles. And this change splits
the non-functional changes.
These instructions operate on bytes so we need to round the demanded
bits up to the nearest byte which we aren't doing. I think we forgot to
update this when we changed from hasAllWUsers to hasNBitUsers.
We don't have any test case for these instruction so remove them until
we can put together a test.
This PR addresses instances of compiler warning C4146 that can be
replaced with std::numeric_limits. Specifically, these are cases where a
literal such as '-1ULL' was used to assign a value to a uint64_t
variable. The intent is much cleaner if we use the appropriate
std::numeric_limits value<Type>::max() for these cases.
Addresses #147439
This PR adds a new transformation that turns sequences of `vector.to_elements` and `vector.from_elements` into a binary tree of `vector.shuffle` operations.
(Related RFC:
https://discourse.llvm.org/t/rfc-adding-vector-to-elements-op-to-the-vector-dialect/86779).
Example:
```
%0:4 = vector.to_elements %a : vector<4xf32>
%1:4 = vector.to_elements %b : vector<4xf32>
%2:4 = vector.to_elements %c : vector<4xf32>
%3 = vector.from_elements %0#0, %0#1, %0#2, %0#3,
%1#0, %1#1, %1#2, %1#3,
%2#0, %2#1, %2#2, %2#3 : vector<12xf32>
==>
%0 = vector.shuffle %a, %b [0, 1, 2, 3, 4, 5, 6, 7] : vector<4xf32>, vector<4xf32>
%1 = vector.shuffle %c, %c [0, 1, 2, 3, -1, -1, -1, -1] : vector<4xf32>, vector<4xf32>
%2 = vector.shuffle %0, %1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] : vector<8xf32>, vector<8xf32>
```
The algorithm leverages the structured extraction/insertion information
of `vector.to_elements` and `vector.from_elements` operations and builds
a set of intervals to determine the vector length that should be used at
each level of the tree to combine the level inputs in pairs.
There are a few improvements that can be implemented in the future, such
as shuffle mask compression to avoid unnecessarily large vector lengths
with poison values, but I decided to keep things "simpler" and spend
more time documenting the different steps of the algorithm so that
people can follow along.
This is so that we'll be able to use it in compiler-rt as well.
Dependencies on LLVM Support were removed from the header by restoring
code from the original SipHash implementation.
Reviewers: kuhar, dwblaikie, ahmedbougacha
Reviewed By: dwblaikie
Pull Request: https://github.com/llvm/llvm-project/pull/134197
This patch bumps the runner version from v3.222.0 to v3.226.0 as
v3.222.0 is too old at this point to connect to Github. This is needed
for the new premerge system given we are directly using this container.
This did not impact the existing libc++ CI as the runner was contained
in a separate container image.
When an llvm tool crashes (e.g. from a segmentation fault),
SignalHandler will re-raise the signal. The effect is that crash reports
now contain SignalHandler in the stack trace. The crash reports are
still useful, but the presence of SignalHandler can confuse tooling and
automation that deduplicate or analyze crash reports.
rdar://150464802
When we create a lambda, we would skip over declaration contexts
representing a require expression body, which would lead to wrong
lookup.
Note that I wasn't able to establish why the code
in `Sema::createLambdaClosureType` was there to begin with (it's not
exactly recent)
The changes to mangling only ensure the status quo is preserved and do
not attempt to address the known issues of
mangling lambdas in require clauses.
In particular the itanium mangling is consistent with Clang before this
patch but differs from GCC's.
Fixes#147650
This just moves the test from `libcxx` to `generic`. There are currently
no `std::function` formatters for libstdc++ so I didn't add a test-case
for it.
Split out from https://github.com/llvm/llvm-project/pull/146740
This PR adds a test for parsing the bitfield_info attribute.
Additionally, it updates the `storage_type` and `is_signed` fields to
match the style used in the incubator ASM format guide.