`__stack_chk_fail` does not return, but `unreachable` was not generated
following `call __stack_chk_fail`. This had a possibility to generate an
invalid binary for functions with a return type, because
`__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can
be the last instruction in the function whose return type is non-void.
Generating `unreachable` after it makes sure CFGStackify's
`fixEndsAtEndOfFunction` handles it correctly.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D83277
This patch fixes all remaining warnings in:
llvm/test/CodeGen/AArch64/sve-trunc.ll
llvm/test/CodeGen/AArch64/sve-vector-splat.ll
I hit some warnings related to getCopyPartsToVector. I fixed two
issues:
1. In widenVectorToPartType() we assumed that we'd always be
using BUILD_VECTOR nodes to expand from one vector type to another,
which is incorrect for scalable vector types. I've fixed this for now
by simply bailing out immediately for scalable vectors.
2. In getCopyToPartsVector() I've changed the code to compare
the element counts of different types.
Differential Revision: https://reviews.llvm.org/D83028
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.
The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.
For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.
Differential Revision: https://reviews.llvm.org/D82602
Whilst trying to assemble the following test:
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_set2.c
I discovered we were hitting some warnings about possible invalid
calls to getVectorNumElements() in getCopyToPartsVector(). I've
tried to fix these by using ElementCount types where possible and
I've made the assumption that we don't support using a fixed width
vector to copy parts of a scalable vector, and vice versa. Looking
at how the copy is implemented I think that's the right thing for
now.
Differential Revision: https://reviews.llvm.org/D82744
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.
Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.
Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D79794
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.
Differential Revision: https://reviews.llvm.org/D82292
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
where these alignments are retrieved from alignment attributes in LLVM
IR. These tracked alignments could help DAG combining and lowering
generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
we only generate AssertAlign nodes on return values from intrinsic
calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
new (base + offset) patterns.
Reviewers: arsenm, bogner
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81711
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.
Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, as causing
arguments to be stored on the wrong bits on big-endian architectures
and incurring in missing overflow detections in the return of certain
functions.
This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.
Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer
Reviewed By: ostannard
Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75169
Summary:
The naked function attribute is meant to suppress all function
prologue/epilogue instructions.
On ARM, some are still emitted if an argument greater than 64 bytes in size
(the threshold for using the byval attribute in IR) is passed partially
in registers.
Perform the check for Attribute::Naked and early exit in
SelectionDAGISel::LowerArguments().
Checking in ARMFrameLowering::determineCalleeSaves() is too late.
A test case is included.
Reviewers: llvm-commits, olista01, danielkiss
Reviewed By: danielkiss
Subscribers: kristof.beyls, hiraditya, danielkiss
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80715
Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc
Now that all of the statepoint related routines have classes with isa support, let's cleanup.
I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
In the current statepoint design, we have four distinct groups of operands to the call: call args, gc transition args, deopt args, and gc args. This format prexisted the support in IR for operand bundles and was in fact one of the inspirations for the extension. However, we never went back and rearchitected statepoints to fully leverage bundles.
This change is the first in a small sequence to do so. All this does is extend the SelectionDAG lowering code to allow deopt and gc transition operands to be specified in either inline argument bundles or operand bundles.
Differential Revision: https://reviews.llvm.org/D8059
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.
Differential Revision: https://reviews.llvm.org/D75670
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.
I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.
I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.
Differential Revision: https://reviews.llvm.org/D80455
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
Now that load/store alignment is required, we no longer need most
of them. Also switch the getLoadStoreAlignment() helper to return
Align instead of MaybeAlign.
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
The fact that loads and stores can have the alignment missing is a
constant source of confusion: code that usually works can break down in
rare cases. So fix the LoadInst API so the alignment is never missing.
To reduce the number of changes required to make this work, IRBuilder
and certain LoadInst constructors will grab the module's datalayout and
compute the alignment automatically. This is the same alignment
instcombine would eventually apply anyway; we're just doing it earlier.
There's a minor risk that the way we're retrieving the datalayout
could break out-of-tree code, but I don't think that's likely.
This is the last in a series of patches, so most of the necessary
changes have already been merged.
Differential Revision: https://reviews.llvm.org/D77454
Summary:
This patch handles illegal scalable types when lowering IR operations,
addressing several places where the value of isScalableVector() is
ignored.
For types such as <vscale x 8 x i32>, this means splitting the
operations. In this example, we would split it into two
operations of type <vscale x 4 x i32> for the low and high halves.
In cases such as <vscale x 2 x i32>, the elements in the vector
will be promoted. In this case they will be promoted to
i64 (with a vector of type <vscale x 2 x i64>)
Reviewers: sdesmalen, efriedma, huntergr
Reviewed By: efriedma
Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78812
We allocated a suitably aligned frame index so we know that all the values
have ABI alignment.
For MIPS this avoids using pair of lwl + lwr instructions instead of a
single lw. I found this when compiling CHERI pure capability code where
we can't use the lwl/lwr unaligned loads/stores and and were to falling
back to a byte load + shift + or sequence.
This should save a few instructions for MIPS and possibly other backends
that don't have fast unaligned loads/stores.
It also improves code generation for CodeGen/X86/pr34653.ll and
CodeGen/WebAssembly/offset.ll since they can now use aligned loads.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78999
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Summary:
When generating code for the LLVM IR zeroinitialiser operation, if
the vector type is scalable we should be using SPLAT_VECTOR instead
of BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D78636
Using getValueType() is not correct for architectures extended with CHERI since
we need a pointer type and not the value that is loaded. While stack
protector is useless when you have CHERI (since CHERI provides much
stronger security guarantees), we still have a test to check that we can
generate correct code for checks. Merging b281138a1b67ca4405b77d774adc3de72742e7a2
into our tree broke this test. Fix by using TLI.getFrameIndexTy().
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D77785
Summary:
Remove asserting vector getters from Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: dexonsmith, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: cfe-commits, hiraditya, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D77278
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.
This simplfifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.
CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.
Differential Revision: https://reviews.llvm.org/D76947
The "Align" passed into getMachineMemOperand etc. is the alignment of
the MachinePointerInfo, not the alignment of the memory operation.
(getAlign() on a MachineMemOperand automatically reduces the alignment
to account for this.)
We were passing on wrong (overconservative) alignment in a bunch of
places. Fix a bunch of these, mostly in legalization. And while I'm
here, switch to the new Align APIs.
The test changes are all scheduling changes: the biggest effect of
preserving large alignments is that it improves alias analysis, so the
scheduler has more freedom.
(I was originally just trying to do a minor cleanup in
SelectionDAGBuilder, but I accidentally went deeper down the rabbit
hole.)
Differential Revision: https://reviews.llvm.org/D77687
Summary:
No error or warning is emitted when specific reserved registers are
written to in inline assembly. Therefore, writes to the program counter
or to the frame pointer, for instance, were permitted, which could have
led to undesirable behaviour.
Example:
int foo() {
register int a __asm__("r7"); // r7 = frame-pointer in M-class ARM
__asm__ __volatile__("mov %0, r1" : "=r"(a) : : );
return a;
}
In contrast, GCC issues an error in the same scenario.
This patch detects writes to specific reserved registers in inline
assembly for ARM and emits an error in such case. The detection works
for output and input operands. Clobber operands are not handled here:
they are already covered at a later point in
AsmPrinter::emitInlineAsm(const MachineInstr *MI). The registers
covered are: program counter, frame pointer and base pointer.
This is ARM only. Therefore the implementation of other targets'
counterparts remain open to do.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76848
I only left it at the interface to ParseConstraints since that
needs updates to other callers in different files. I'll do that
as a follow up.
Differential Revision: https://reviews.llvm.org/D77892
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: stoklund, sdesmalen, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77272
These should not be assuming address space 0. Calling getPointerTy is
generally the wrong thing to do, since you should already know the
type from the incoming IR.