Test other splitting situations that appear in greedy.
This includes ensuring we have a case that hits a local split
and instruction split (most of the tests hit the region split path).
Also test a few cases where the final result isn't fully used, resulting
in partial copy bundles instead of a simple full copy. Test physreg
and virtreg agpr interference with a reassignment candidate.
I'm accumulating too many failure cases, and MIR tests are very prone
to painful merge conflicts, so I've added a few more tests and extracted
new tests from #147975.
Closes#149026
Print a more detailed error message when new/modified IR could not be
legalized with `allowPatternRollback = false`. This is useful to
understand why a pattern is incompatible with the new One-Shot Dialect
Conversion driver.
---------
Co-authored-by: Jeremy Kun <jkun@google.com>
This is to support a new inline function reduction in llvm-reduce,
which should pre-filter callsites that are not eligible for inlining.
This code was mostly structured as a match and apply, with a few
exceptions. The ugliest piece is for propagating and verifying
compatible
getGC and personalities. Also collection of EHPad and the convergence
token
to use are now cached in InlineFunctionInfo.
I was initially confused by the split between the checks performed here
and isInlineViable, so better document how this system is supposed to
work.
It turns out this split does make sense, in that isInlineViable checks
if it's possible based on the callee content and the ultimate inline
depended on the callsite context. I think more renames of these
functions
would help, and isInlineViable should probably move out of InlineCost to
be
with these transfoms.
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.
Move this information to ArgFlags to make it directly available in all
relevant places.
I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
Update the easy add/sub/mul/logic/cmp/scalar_to_vector intrinsics to be
constexpr compatible.
I'm not expecting anyone to be very interested in using MMX intrinsics,
but they're smaller than the other types and are useful to test the
constexpr handling and test methods before we start applying them to
SSE/AVX2/AVX512 intrinsics.
Instead of relying on getOrCreateVectorTripCount to initialize
EPI.VectorTripCount, delay initialization after we retrieved the resume
phi and get the trip count from there. This makes the code independent
of legacy vector trip count creation.
When a conversion pattern is initialized without a type converter, the
driver implementation currently looks up the most recently mapped value.
This is undesirable because the most recently mapped value could be a
materialization. I.e., the type of the value being looked up could
depend on which other patterns have run before. Such an implementation
makes the type conversion infrastructure fragile and unpredictable.
The current implementation also contradicts the documentation in the
markdown file. According to that documentation, the values provided by
the adaptor should match the types of the operands of the match
operation when running without a type converter. This mechanism is not
desirable, either, for two reasons:
1. Some patterns have started to rely on receiving the most recently
mapped value. Changing the behavior to the documented behavior will
cause regressions. (And there would be no easy way to fix those without
forcing the use of a type converter or extending the `getRemappedValue`
API.)
2. It is more useful to receive the most recently mapped value. A value
of the original operand type can be retrieved by using the operand of
the matched operation. The adaptor is not needed at all in that case.
To implement the new behavior, materializations are now annotated with a
marker attribute. The marker is needed because not all
`unrealized_conversion_cast` ops are materializations that act as "pure
type conversions". E.g., when erasing an operation, its results are
mapped to newly-created "out-of-thin-air values", which are
materializations (with no input) that should be treated like regular
replacement values during a lookup. This marker-based lookup strategy is
also compatible with the One-Shot Dialect Conversion implementation
strategy, which does not utilize the mapping infrastructure anymore and
queries all necessary information by examining the IR.
Add a helper function to `ConversionPatternRewriter` that returns the
dialect conversion configuration. This flag is useful when migrating
conversion patterns to the new One-Shot Conversion Driver: patterns can
check if they are running in rollback mode or not. They can then work
around API changes and makes sure that the pattern keeps working with
both the old and new driver.
Also remove the `config` field from `OperationLegalizer`. That field was
never needed.
This commit fixes the false positive that C++ class methods with libc
function names would be false warned about. For example,
```
struct T {void strcpy() const;};
void test(const T& t) { str.strcpy(); // no warn }
```
rdar://156264388
Reviewed in #152379
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
Add support for 3-input fmaxnum/fminnum/fmaximum/fminimum introduced in
PTX 8.8 for sm_100+:
- Use a tree reduction when 3-input operations are supported and the
reduction has the `reassoc` flag.
- If not on sm_100+/PTX 8.8, fallback to 2-input operations and use the
default shuffle reduction.
Previously we could only scalably vectorize interleave groups with
factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now
support all factors (available on RISC-V). So this adds the remaining
check lines for the scalable VFs.
https://github.com/llvm/llvm-project/pull/152192 forgot to make the
argument changes to Android code in UBsan minimal causing a build error
for Android LLVM:
```
/b/f/w/src/git/out/llvm-project/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp:102:3: error: no matching function for call to 'format_msg'
102 | format_msg(kind, caller, msg_buf, msg_buf + sizeof(msg_buf));
| ^~~~~~~~~~
/b/f/w/src/git/out/llvm-project/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp:37:13:
note: candidate function not viable: requires 5 arguments, but 4 were
provided
37 | static void format_msg(const char *kind, uintptr_t caller,
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
38 | const uintptr_t *address, char *buf,
const char *end) {
```
This change adds the address argument to abort_with_message just like
__ubsan_report_error_fatal so it can be passed to format_msg.
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.
We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.
We also need to move the creation of the backedge value to the latch so
that EVL dominates it.
With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.
The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
When the rhs is a an array element, the assert was triggered but this is
still a valid transfer. Remove the assert. The operation has a verifier
to check its validity.
fixes#140819
SROA pass is making it so that some globals get loaded into stack
allocations. This means we find an alloca where we use to expect a load
and now need to walk an alloca -> store -> maybe load chain before we
find the global. Doing so fixes All but two instances of #137715 And
fixes every instance of `Load of "8.sroa.0" is not a global resource
handle we are currently seeing in the DML shaders.
This adds commentary to explain why ASan does not work for AArch64
non-Android Linux with 39-bit and 42-bit VMAs (e.g.,
https://github.com/llvm/llvm-project/issues/145259).
Additionally, it updates the 42-bit VMA shadow map comment, which has
been outdated for the last 10 years
(18b2258c92df93c83bc7fce94c20baff3c06e2c6 changed 39-bit and 42-bit to
use the same offset), and adds a comment for the 48-bit VMA shadow map.
For RV32 we don't need the byte shifted by 24 to be zero extend
since the extended bits are shifted out.
For RV64, we don't need the byte shifted by 24 to be zero extended
if the upper 32 bits of the result aren't demanded.
The target's output bitcode `libclc_builtins_lib` is located in a
sub-directory in clang resource directory since df7473673214. Setting
TARGET_FILE property can allow targets in non-libclc project to obtain
the path to `libclc_builtins_lib`.
The new `sys::detail::getHostCPUNameForARM` for Windows (#151596) was
implemented using a C++ bit-field, which caused the associated unit
tests to fail on big-endian machines as it assumed a little-endian
layout.
This change switches from the C++ bit-field to LLVM's `BitField` type
instead.
Introduce the `enumToStringRef` enum into `ScopedPrinter.h` that
replicates `enumToString` behaviour, expect that instead of returning a
hex value string, it just returns an empty string. This allows us to
return a StringRef and easily check if an invalid enum was provided
based on the StringRef size
This then uses `enumToStringRef` to remove the redundant
`getResourceName` and `printEnum` functions.
Resolves: https://github.com/llvm/llvm-project/issues/151200.
Addressed TODO to template the StringConverter pop functions to have a
single implementation (combine popUTF8 and popUTF32 into a single
templated pop function)
When two threads are accessing the same `pcm`, it is possible that the
reading thread sees the timestamp update, while the file on disk is not
updated.
This PR moves timestamp update from `writeAST` to
`compileModuleAndReadASTImpl`, so we only update the timestamp after the
file has been committed to disk.
rdar://152097193
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
This patch refactors big chunks of the common functionality shared
between monolithic-linux.sh and monolithic-windows.sh to a separate
script, utils.sh, that is then sourced in both of the files. This makes
it a bit easier to maintain the scripts.
Platform differences should not be a large deal for the setup here as we
are using bash as the shell on both Linux and Windows.
Reviewers:
lnihlen, gburgessiv, Keenuts, DavidSpickett, dschuff, cmtice, Endilll
Reviewed By: DavidSpickett, cmtice
Pull Request: https://github.com/llvm/llvm-project/pull/152199
Enables AMD data center class GPUs to use memory manager memory pooling
up to 3GB allocation by default, up from the "1 << 13" threshold that
all plugin-nextgen devices use.
Globally addressable scratch is a new feature introduced in gfx1250.
However, this feature changes how scratch space is mapped into the flat
aperture, making address space casts from private to flat no longer
uniform.
Add information to the libcxx testing documentation, about the names of
the new CI libcxx runner sets, their current values, and how to change
the values or the runner set being used.
private/firstprivate typically do copy operations, however copying a VLA
isn't really possible. This patch introduces a warning to alert the
person that this copy isn't happening correctly.
As a future direction, we MIGHT consider doing additional work to make
sure they are initialized/copied/deleted/etc correctly.