Looks like some of these were not properly formatted at some point. This
patch reformats these files so that future diffs are cleaner when
running the formatter over the whole file.
This is the x64 equivalent of #121516
Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618)
the section and constant names relating to this all mention "retpoline"
and we need to mark indirect calls, control-flow guard calls and jumps
for jump tables in the section alongside calls to imported functions.
As with the AArch64 feature, this emits a new section into the obj which
is used by the MSVC linker to generate the Dynamic Value Relocation
Table and the section itself does not appear in the final binary.
The Windows Loader requires a specific sequence of instructions be
emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.
This patch makes llvm-exegesis emit an error when the machine function
fails in MachineVerification rather than aborting. This allows
downstream users (particularly https://github.com/google/gematria) to
handle these errors rather than having the entire process crash. This
essentially be NFC from the user perspective minus the addition of the
new error message.
## Purpose
This patch prepares the llvm/CodeGen library for public interface
annotations in support of an LLVM Windows DLL (shared library) build,
tracked in #109483. The purpose of this patch is to make the upcoming
codemod of this library more straight-forward. It is not expected to
impact any functionality.
The `LLVM_ABI` annotations will be added in a subsequent patch. These
changes are required to build with visibility annotations using Clang
and gcc on Linux/Darwin/etc; Windows DLL can build fine without them.
## Overview
This PR does four things in preparation for adding `LLVM_ABI`
annotations to llvm/CodeGen:
1. Explicitly include `Machine.h` and `Function.h` headers from
`MachinePassManager.cpp` so that `Function` and `Machine` types are
available for the instantiations of `InnerAnalysisManagerProxy`. Without
this change, Clang only will only export one of the templates after
visibility annotations are added to them. Unclear if this is a Clang bug
or expected behavior, but this change avoids the issue and should be
harmless.
2. Refactor the definition of `MachineFunctionAnalysisManager` to its
own header file. Without this change, it is not possible to add
visibility annotations to the declaration with causing gcc to produce
`-Wattribute` warnings.
3. Remove the redundant specialization of the
`DominatorTreeBase<MachineBasicBlock, false>::addRoot` method. The
specialization is the same as implemented in `DominatorTreeBase` so
should be unnecessary. Without this change, it is not possible to
annotate the subsequent instantiations of `DominatorTreeBase` in the
header file without gcc producing `-Wattribute` warnings. Mark
unspecialized `addRoot` as `inline` to match the removed specialized
version.
4. Move the explicit instantiations of the `GenericDomTreeUpdater`
template earlier in the header file. These need to appear before being
used in the `MachineDomTreeUpdater` class definition or gcc will produce
warnings once visibility annotations are added.
## Background
The LLVM Windows DLL effort is tracked in #109483. Additional context is
provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307).
Clang and gcc handle visibility attributes on explicit template
instantiations a bit differently; gcc is pickier and generates
`-Wattribute` warnings when an explicit instantiation with a visibility
annotation appears after the type has already appeared in the
translation unit. These warnings can be avoided by moving explicit
template instantiations so they always appear first.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Now that MLIR accepts nuw and nusw in getelementptr, this patch emits
the inbounds and nuw attributes when lower memref to LLVM in load and
store operators.
This patch also strengthens the memref.load and memref.store spec about
undefined behaviour during lowering.
This patch also lifts the |rewriter| parameter in getStridedElementPtr
ahead so that LLVM::GEPNoWrapFlags can be added at the end with a
default value and grouped together with other operators' parameters.
Signed-off-by: Lin, Peiyong <linpyong@gmail.com>
There are several type traits that produce a boolean value or type based
on the well-formedness of some expression (more precisely, the immediate
context, i.e. for example excluding nested template instantiation):
* `__is_constructible` and variants,
* `__is_convertible` and variants,
* `__is_assignable` and variants,
* `__reference_{binds_to,{constructs,converts}_from}_temporary`,
* `__is_trivially_equality_comparable`,
* `__builtin_common_type`.
(It should be noted that the standard doesn't always base this on the
immediate context being well-formed: for `std::common_type` it's based
on whether some expression "denotes a valid type." But I assume that's
an editorial issue and means the same thing.)
Errors in the immediate context are suppressed, instead the type traits
return another value or produce a different type if the expression is
not well-formed. This is achieved using an `SFINAETrap` with
`AccessCheckingSFINAE` set to true. If the type trait is used outside of
an SFINAE context, errors are discarded because in that case the
`SFINAETrap` sets `InNonInstantiationSFINAEContext`, which makes
`isSFINAEContext` return an `optional(nullptr)`, which causes the errors
to be discarded in `EmitDiagnostic`. However, in an SFINAE context this
doesn't happen, and errors are added to `SuppressedDiagnostics` in the
`TemplateDeductionInfo` returned by `isSFINAEContext`. Once we're done
with deducing template arguments and have decided which template is
going to be instantiated, the errors corresponding to the chosen
template are then emitted. At this point we get errors from those type
traits that we wouldn't have seen if used with the same arguments
outside of an SFINAE context. That doesn't seem right.
So what we want to do is always set `InNonInstantiationSFINAEContext`
when evaluating these well-formed-testing type traits, regardless of
whether we're in an SFINAE context or not. This should only affect the
immediate context, as nested contexts add a new `CodeSynthesisContext`
that resets `InNonInstantiationSFINAEContext` for the time it's active.
Going through uses of `SFINAETrap` with `AccessCheckingSFINAE` = `true`,
it occurred to me that all of them want this behavior and we can just
use this parameter to decide whether to use a non-instantiation context.
The uses are precisely the type traits mentioned above plus the
`TentativeAnalysisScope`, where I think it is also fine. (Though I think
we don't do tentative analysis in SFINAE contexts anyway.)
Because the parameter no longer just sets `AccessCheckingSFINAE` in Sema
but also `InNonInstantiationSFINAEContext`, I think it should be renamed
(along with uses, which also point the reviewer to the affected places).
Since we're testing for validity of some expression, `ForValidityCheck`
seems to be a good name.
The added tests should more or less correspond to the users of
`SFINAETrap` with `AccessCheckingSFINAE` = `true`. I added a test for
errors outside of the immediate context for only one type trait, because
it requires some setup and is relatively noisy.
We put the `ForValidityCheck` condition first because it's constant in
all uses and this would then allow the compiler to prune the call to
`isSFINAEContext` when true.
Fixes#132044.
Motivation example:
```
> lldb -c altmain2.core
...
(lldb) var F
(const char *) F = 0x0804a000 ""
```
The variable `F` points to a read-only memory page not dumped to the
core file, so `Process::ReadMemory()` cannot read the data. The patch
switches to `Target::ReadMemory()`, which can read data both from the
process memory and the application binary.
DWARF linetable entries are usually emitted as a sequence of
MCDwarfLineAddrFragment fragments containing the line-number difference
and an MCExpr describing the instruction-range the linetable entry
covers. These then get relaxed during assembly emission.
However, a large number of these instruction-range expressions are
ranges within a fixed MCDataFragment, i.e. a range over fixed-size
instructions that are not subject to relaxation at a later stage. Thus,
we can compute the address-delta immediately, and not spend time and
memory describing that computation so it can be deferred.
Add more test coverage for peeling the last iteration with variable trip
counts. Separate test cases for constant and variable trip counts in
different files.
This addresses post-commit review feedback from someone who discovered
that we diagnosed code like the following:
```
struct S {
int len;
const char fam[];
} s;
```
despite it being invalid to initialize the flexible array member.
Note, this applies to flexible array members and zero-sized arrays at
the end of a structure (an old-style flexible array member), but it does
not apply to one-sized arrays at the end of a structure because those do
occupy storage that can be initialized.
This patch fixes:
flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp:1377:10:
error: variable 'isValid' set but not used
[-Werror,-Wunused-but-set-variable]
This change adds handling for C++ member operator calls, implicit no-op
casts, and l-value call expressions. Together, these changes enable
handling of range for loops based on iterators.
While investigating an issue with code coverage reporting around
exceptions it was useful to have a baseline of what works today.
This change adds end-to-end testing to validate code coverage behavior
that is currently working with regards to exception handling.
When conditional tail call is located in old code while BOLT is
operating in lite mode, the call will require optional pending
relocation with a type that is currently not supported resulting in a
build-time crash.
Before a proper fix is implemented, ignore conditional tail calls for
relocation purposes and mark their target functions to be patched, i.e.
to be served as veneers/thunks.
This PR adds the XeGPU workgroup (wg) to subgroup (sg) pass. The wg to
sg pass transforms the xegpu wg level operations to subgroup operations
based on the sg_layout and sg_data attribute. The PR adds transformation
patterns for following Ops
1. CreateNdDesc
2. LoadNd
3. StoreNd
4. PrefetchNd
4. UpdateNdOffset
5. Dpas
In 'EmitStoreThroughExtVectorComponentLValue', move the code which ZExts
in the case the Destination Scalar Type is larger than the Source Scalar
Type, to the top of the function, to ensure each condition is handled.
The previous code missed this case:
```
bool4 b = true.xxxx;
b.xyz = false.xxx;
```
Leading to a bad shuffle vector.
Closes#140564
We need the top bits to be zeroes, but an v8i8->i32 EXTRACT_VECTOR_ELT will
anyext into the top bits. The instruction we create (UADDV) is known to be
zeroes in the upper bits, so we can convert to a larger v2i32 vector and
extract from there, similar to the operation currently performed for i64
types.
Fixes#140707
A default-initialized union with a const member is generally reasonable
in C and isn't necessarily incompatible with C++, so we now silence the
diagnostic in that case. However, we do still diagnose a const-
qualified, default-initialized union as that is incompatible with C++.
Change the default from 256 to 24. The argument is that 256 is too large to be scanned
by eye, and too large to print in a terminal which can be only 40-50 lines in height.
When all children must be shown, `frame variable` and `expression` both support the `-A`
(`--show-all-children`) flag.
rdar://145327522
This is a NFC patch.
This patch duplicate GFX11plus runlines and apply them with
"+mattr=+real-true16" and "+mattr=-real-true16" on fmax3/fmin3 tests,
and putting '-real-true16' on gisel testline. And then update the test
with the update script
CSE may delete operations from hlfir.exactly_once and reuse
the equivalent results from the parent region(s), e.g. from the parent
hlfir.region_assign. This makes it problematic to clone
hlfir.exactly_once before the top-level hlfir.where.
This patch adds a "canonicalizer" that pulls in such operations
back into hlfir.exactly_once.
Now CFINV follows AXFLAGS behaviour for CRm.
It looks like (0) in the instruction encoding means that the behaviour
is UNPREDICTABLE if that bit is not zero.
In a sub-subscript of an array-section, it is actually an array section.
So make sure we get the location correct when there isn't a 'colon' to
look at.
isLoadFromStackSlot/isStoreToStackSlot/getMemOperandsWithOffsetWidth
The first 2 probably requires spills/reloads which we don't use
LD_RV32/SD_RV32 for yet.
I think getMemOperandsWithOffsetWidth is mainly used for load/store
clustering. I think we can assume this just works.
llvm-objdump currently calls MCDisassembler::getInstruction with
unadjusted address and MCInstPrinter::printInst with adjusted address.
The decoded branch targets will be adjusted as expected for most targets
(as the getInstruction address is insignificant) but not for SystemZ
(where the getInstruction address is displayed).
Specify an adjust address to fix SystemZInstPrinter output.
The added test utilizes llvm/utils/update_test_body.py to make updates
easier and additionally checks that we don't adjust SHN_ABS symbol
addresses.
Pull Request: https://github.com/llvm/llvm-project/pull/140471
This patch fixes:
llvm/lib/Target/X86/X86ISelLowering.cpp:39622:12: error: explicitly
assigning value of variable of type 'SDValue' to itself
[-Werror,-Wself-assign-overloaded]
These are identical in IR as the 'compute' constructs, but require a
little additional work since we have 2 operations to work around, not
just 1. Note that the test is nearly identical to the compute version,
except that the combined 'tag's are present, plus the 'loop' construct.
Very similar to what we do in lowerShuffleAsVALIGN
I've updated isTargetShuffleEquivalent to correctly handle SM_SentinelZero in the expected shuffle mask, but it only allows an exact match (or the test mask was undef) - it can't be used to match zero elements with MaskedVectorIsZero.
Noticed while working on #140516