The SelectionDAG Isel supports the both version of combines mentioned
below :
```
select Cond, Pow2, 0 --> (zext Cond) << log2(Pow2)
select Cond, 0, Pow2 --> (zext !Cond) << log2(Pow2)
```
The GlobalIsel for now only supports the first one defined in it's
generic combinerHelper.cpp. This patch adds the missing second one.
Limits #117900 to only fold when scalar_to_vector doesn't perform implicit truncation, as the scaled shift calculation doesn't currently account for this - this can be addressed in a future update.
Fixes#121306
For global function merging, the target of the arc-attached call must be
a constant and cannot be parameterized.
This change adds a check to bypass this case in `canParameterizeCallOperand()`.
Loop latches often have a loop-carried dependency, and if they have
several SelectLike instructions in one select group, it is usually
profitable to convert it to branches rather than keep selects.
AIX assembly is very different from the gas syntax. We don't expect
other targets to share these differences. Unify the numerous,
essentially AIX-specific variables.
…e Graph
In MachinePipeliner, a DAG class is used to represent the Data
Dependence Graph. Data Dependence Graph generally contains cycles, so
it's not appropriate to use DAG classes. In fact, some "hacks" are used
to express back-edges in the current implementation. This patch adds a
new class to provide a better interface for manipulating dependencies.
Our approach is as follows:
- To build the graph, we use the ScheduleDAGInstrs class as it is,
because it has powerful functions and the current implementation depends
heavily on it.
- After the graph construction is finished (i.e., during scheduling), we
use the new class DataDependenceGraph to manipulate the dependencies.
Since we don't change the dependencies during scheduling, the new class
only provides functions to read them. Also, this patch is just a
refactoring, i.e., scheduling results should not change with or without
this patch.
llvm-mc --assemble prints an initial `.text` from `initSections`.
This is weird for quick assembly tasks that do not specify `.text`.
Omit the .text by moving section directive printing from `changeSection`
to `switchSection`. switchSectionNoPrint now correctly calls the
`changeSection` hook (needed by MachO).
The initial directives of clang -S are now reordered. On ELF targets, we
get `.file "a.c"; .text` instead of `.text; .file "a.c"`.
If there is no function, `.text` will be omitted.
Here we add two methods `getCommonMinimalPhysRegClass` and a LLT
version `getCommonMinimalPhysRegClassLLT`, which return the most
sub register class of the right type that contains these two input
registers.
We don't overload the `getMinimalPhysRegClass` as there will be
ambiguities.
We use it to simplify some code in RISC-V target.
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
We used to skip fixed registers, but fixed registers are not enough
because there are some runtime unusable registers like registers
reserved by `-ffixed-xxx` options.
Here we change to use reserved registers so that the estimated
pressure is more accurate.
This reverts commit 4307198d51487cc16f98eebb2113caf4a1905914.
Broke bot ppc64le-clang-multistage-test:
undefined reference to
`llvm::DroppedVariableStats::populateVarIDSetAndInlinedMap in
In function `llvm::DroppedVariableStatsIR::visitEveryInstruction
To get Dropped variable statistics for MIR, we need to move the base
class DroppedVariableStats code to the CodeGen library because we cannot
have CodeGen link against Passes.
Also moved the code for the virtual functions to the header because
clang/lib/CodeGen doesn't link against llvm/lib/CodeGen however it does
link against Passes which contains the `class StandardInstrumentations`
code but not the definition for the virtual functions leading to the
error about not finding vtable for `class DroppedVariableStatsIR`
There are a number of backends (specifically AArch64, AMDGPU, Mips, and
RISCV) which contain a “TODO: make CombinerHelper methods const”
comment. This PR does just that and makes all of the CombinerHelper
methods const, removes the TODO comments and makes the associated
instances const. This change makes some sense because the CombinerHelper
class simply modifies the state of _other_ objects to which it holds
pointers or references.
Note that AMDGPU contains an identical comment for an instance of
AMDGPUCombinerHelper (a subclass of CombinerHelper). I deliberately
haven’t modified the methods of that class in order to limit the scope
of the change. I’m happy to do so either now or as a follow-up.
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specifiction of target specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
consistently name them
- move `isTriviallyScalarizable` to VectorUtils
- update all uses of the api and provide the TTI parameter
Resolves#117030
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
Most of these are just places that want the first user and aren't
iterating over the whole list.
While there I changed some use_size() == 1 to hasOneUse() which
is more efficient.
This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.
I've long beeen annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
Scalarize vector FPOWI instead of promoting the type. This allows the
scalar FPOWIs to be visited and converted to libcalls before promoting
the type.
FIXME: This should be done in LegalizeVectorOps/LegalizeDAG, but call
lowering needs the unpromoted EVT.
Without this patch, in some backends, such as RISCV64 and LoongArch64,
the i32 type is illegal and will be promoted. This causes exponent type
check to fail when ISD::FPOWI node generates a libcall.
Fix https://github.com/llvm/llvm-project/issues/118079
The Complex Deinterleaving pass assumes that all values emitted will
result in complex numbers, this patch aims to remove that assumption and
adds support for emitting just the real or imaginary components, not
both.
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logics to
adjust the limit by removing reserved registers.
It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just like the comment "This limit must be adjusted
dynamically for reserved registers" said.
Thus we should use `RegisterClassInfo::getRegPressureSetLimit` and
remove replicated code.
Separate from https://github.com/llvm/llvm-project/pull/118787
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logics to
adjust the limit by removing reserved registers.
It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just like the comment "This limit must be adjusted
dynamically for reserved registers" said.
Separate from https://github.com/llvm/llvm-project/pull/118787
There was a buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/24/builds/3329/steps/11/logs/stdio):
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-error-all-regs-reserved.ll:9:10:
error: CHECK: expected string not found in input
; CHECK: error: <unknown>:0:0: no registers from class available to
allocate in function 'no_registers_from_class_available_to_allocate'
2: ==75198==ERROR: AddressSanitizer: stack-use-after-scope on address
0xfa23f9f1c270 at pc 0xb2660dda9340 bp 0xfffffe8ab340 sp 0xfffffe8ab338
caused by https://github.com/llvm/llvm-project/pull/120184, which made a
partial fix but also renabled the tests. This patch attempts to fix
forward by applying the same fix to the error message highlighted in the
buildbot.
This patch introduces the LLVM components of a type sanitizer: a
sanitizer for type-based aliasing violations.
It is based on Hal Finkel's https://reviews.llvm.org/D32198.
C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.
For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.
The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the byes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.
The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.
The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.
Clang's TBAA representation currently has a problem representing unions,
as demonstrated by the one XFAIL'd test in the runtime patch. We'll
update the TBAA representation to fix this, and at the same time, update
the sanitizer.
When the sanitizer is active, we disable actually using the TBAA
metadata for AA. This way we're less likely to use TBAA to remove memory
accesses that we'd like to verify.
As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.
It goes together with the corresponding clang changes
(https://github.com/llvm/llvm-project/pull/76260) and compiler-rt
changes (https://github.com/llvm/llvm-project/pull/76261)
PR: https://github.com/llvm/llvm-project/pull/76259
This adds a new helper `canFoldStoreIntoLibCallOutputPointers()` to
check that it is safe to fold a store into a node that will expand to a
library call that takes output pointers. This requires checking for two
(independent) properties:
1. The store is not within a CALLSEQ_START..CALLSEQ_END pair
* If it is, the expansion would lead to nested call sequences (which is
invalid)
2. The node does not appear as a predecessor to the store
* If it does, attempting to merge the store into the call would result
in a cycle in the DAG
These two properties are checked as part of the same traversal in
`canFoldStoreIntoLibCallOutputPointers()`
This makes sure no optimizations are applied that assume the
bigger alignment or size, which could be incorrect if we link
together with non-instrumented code.
This reverts commit f7443905af1e06eaacda1e437fff8d54dc89c487.
This is to avoid an assertion if an undef operand appears in a
stackmap. This is important to avoid hitting verifier errors
when register allocation starts adding undefs in error scenarios.
Rather than trying to treat undef operands as special, leave them
alone and avoid producing an invalid spill. It would a bit more
precise to produce a spill of an undef register here, but that's not
exposed through the storeRegToStackSlot API.
https://reviews.llvm.org/D122605
This was an alternative to https://reviews.llvm.org/D122582
ISD::isBuildVectorAllOnes can peek through bitcasts, so this can match against FP NAN (ish) data (e.g. double (bitcast i64 -1)) under certain circumstances - bail if the type isn't an integer and let bitcast folding handle it first.
Fixes#120093
A previous PR introduced a threshold where we would mask out a LR that
had been evicted a certain number of times to combat pathological
compile time cases with a somewhat adversarial model. However, this
patch did not take into account urgent LRs which led to compilation
failures when greedy would expect us to provide an eviction and we could
not due to the newly introduced logic.
This patch make a couple of improvements to ReduceLoadOpStoreWidth.
When determining the minimum size of "NewBW" we now take byte boundaries
into account. If we for example touch bits 6-10 we shouldn't accept
NewBW=8, because we would fail later when detecting that we can't access
bits from two different bytes in memory using a single load. Instead we
make sure to align LSB/MSB according to byte size boundaries up front
before searching for a viable "NewBW".
In the past we only tried to find a "ShAmt" that was a multiple of
"NewBW", but now we use a sliding window technique to scan for a viable
"ShAmt" that is a multiple of the byte size. This can help out finding
more opportunities for optimization (specially if the original type
isn't byte sized, and for big-endian targets when the original
load/store is aligned on the most significant bit).