This patch enables signed truncation check transforms for i8 on rv32 when XVT is i64 and Zbb is enabled.
It is a small improvement of D149977.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D150177
Emit FNMADD instead of FNEG(FMADD) for optimization levels
above Oz when fast-math flags (nsz+contract) permit it.
Differential Revision: https://reviews.llvm.org/D149260
The new test case showed that the NoPHIs flag needs to be cleared.
Original commit message:
[SystemZ] Bugfix in expansion of memmem operations.
Since NC, OC, and XC clobber CC, the EXRL_Pseudo targeting these must also be
marked to do so.
Original patch by uweigand.
Reviewed by: uweigand
Differential Revision: https://reviews.llvm.org/D150251
Fixes: https://github.com/llvm/llvm-project/issues/62572
Sorry - mir test fails with expensive checks on build bot.
Seems to relate to the fact that there are no PHIs in the .mir input, but after
they are created the verifyer reports "Found PHI instruction with NoPHIs property
set".
This reverts commit 00454a17f361d677d5423905c888daca1a80661a.
Reland D147506 after fixing the failure in bot
https://lab.llvm.org/buildbot/#/builders/247/builds/4125
Some debuggers like DBX on AIX assume the address in debug line
entries is always incremental. But clang generates two entries (entry
for file scope line and entry for prologue end) with same address if
prologue is empty
And if the prologue is empty, seems the first debug line entry for the
function is unnecessary(i.e. removing the first entry won't impact the
behavior in GDB on Linux), so I implement this for all debuggers.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D147506
KCFI machine function passes transform indirect calls with a
cfi-type attribute into architecture-specific type checks bundled
together with the calls. Instead of having a separate pass for each
architecture, add a generic machine function pass for KCFI and
move the architecture-specific code that emits the actual check to
TargetLowering. This avoids unnecessary duplication and makes it
easier to add KCFI support to other architectures.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D149915
The x86_intrcc calling convention will build two STACKALLOC_W_PROBING machine instructions if the function takes an error code. This is caused by an additional call to emitSPUpdate in llvm/lib/Target/X86/X86FrameLowering.cpp:1650. Previously only the first STACKALLOC_W_PROBING machine instruction was properly handled, the second one was simply ignored. This lead to miscompilations where the stack pointer wasn't properly updated (see https://github.com/rust-lang/rust/issues/109918). This patch fixes this by handling all STACKALLOC_W_PROBING machine instructions.
To be honest I don't quite understand why this didn't lead to more noticeable miscompilations previously.
This is my first time contributing to LLVM.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D150033
Instead of ad-hoc updating liveness, recompute it completely for the affected register.
This does not affect any existing test and fixes an edge case that
caused a "Non-empty but used interval" error in the register allocator
due to how the pass updated liveranges. It created a "isolated" live-through
segment.
Overall this change just seems to be a net positive with no side effect observed. There may be a compile time impact but it's expected to be minimal.
Fixes SWDEV-388279
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D150105
There's a target hook that's called in DAGCombiner that we stub here, I'll
implement the equivalent override for AArch64 in a subsequent patch since it's
used by different shift combine.
This change by itself has minor code size improvements on arm64 -Os CTMark:
Program size.__text
outputg181ppyy output8av1cxfn diff
consumer-typeset/consumer-typeset 410648.00 410648.00 0.0%
tramp3d-v4/tramp3d-v4 364176.00 364176.00 0.0%
kimwitu++/kc 449216.00 449212.00 -0.0%
7zip/7zip-benchmark 576128.00 576120.00 -0.0%
sqlite3/sqlite3 285108.00 285100.00 -0.0%
SPASS/SPASS 411720.00 411688.00 -0.0%
ClamAV/clamscan 379868.00 379764.00 -0.0%
Bullet/bullet 452064.00 451928.00 -0.0%
mafft/pairlocalalign 246184.00 246108.00 -0.0%
lencod/lencod 428524.00 428152.00 -0.1%
Geomean difference -0.0%
Differential Revision: https://reviews.llvm.org/D150086
This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.
The original commit causes a Chrome build assertion failure with
ThinLTO: https://crbug.com/1443635
This uses the same sequence we get from LegalizeDAG for i32 on RV32, but modified
to use W instructions.
When the RHS is constant one of the setccs simplifies to a constant and the xor will either
be an xori with 1 or get removed.
When the RHS is not a constant it was not an obvious improvement and it was a regression
when used with a branch. So I've restricted to the constant case.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D150135
This patch adds the additional step of looking through AND, OR, XOR
instructions when we check the number of leading zeros.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D149223
This helps avoid constant materialization for the patterns
InstCombine emits for something like INT_MIN <= X && x <= INT_MAX.
See top of changed test files for more detailed explanation.
I've enabled this for i16 when Zbb is enabled. sext.b did not seem
to be a benefit due to the constants folding into addi/sltiu.
This an alternative to https://reviews.llvm.org/D149814
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D149977
In the past, D71436 added writing the `offset` operator for some
legitimate cases. However, for memory references in Intel syntax, the
`offset` operator (`[offset sym]`) appears to be superfluous at best,
possibly wrong and contradictory at worst.
This patch bypasses writing the `offset` operator in
`X86AsmPrinter::PrintIntelMemReference` which affects exactly this
case. A similar code flow exists in `X86IntelInstPrinter.cpp` -
`X86IntelInstPrinter::printMemReference`.
The motivation for fixing this output is to allow us to reject the
confusing `call [offset fn_ref]` syntax in MC, as discussed in D149579.
Depends on D149579
Differential Revision: https://reviews.llvm.org/D150047
`(xor X, SignBit)` == `(add X, SignBit)`. For atomics whose result is
used, the `add` option is preferable because of the `xadd` instruction
which allows us to avoid either a CAS loop or a `btc; setcc; shl`.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149689
For constant BUILD_VECTORs the operands need to be legal types. This can mean
that when the number of sign bits is calculated it may look that the entire
constant and inefficiently produce less sign bits than it could. For example i8
vectors could use i32 elements, for which 0x000000ff would be incorrectly
limited to 1 sign bit as the original value has 24 sign bits. This makes it
look at the constant directly, truncated to the correct type for the element so
that it can correctly return 8.
Differential Revision: https://reviews.llvm.org/D149956
The logic in this function is a bit of a mess, but masking a vector constant should allow us to OR the zero-extended i8/i16 scalar value in place.
We can do more here - reusing the OR pattern if the relevant unused elements are known zero etc. but this is enough to address a regression from D127115.
Better KnownBits handling of the icmp and/or an upcoming USUBSAT fold would constant fold this test away and prevent us testing for a cleared overflow flag.
Give up on erasing an IMPLICIT_DEF if it might be live-in to a call
instruction in a basic block with EH pad successors. This fixes a
liveness bug that will be diagnosed by MachineVerifer when D149947
lands.
Differential Revision: https://reviews.llvm.org/D149954
Also add -verify-machineinstrs to make it easier to catch a
MachineVerifier failure introduced by D149947.
Differential Revision: https://reviews.llvm.org/D149953
Add basic computeOverflowForSignedAdd helper to recognise that sadd overflow can't occur if both operands have more that one sign bit.
Add computeOverflowForAdd wrapper that calls computeOverflowForSignedAdd/computeOverflowForUnsignedAdd depending on the IsSigned argument, and use this in DAGCombiner::visitADDO
In a `__asm` block, a symbol reference is usually a memory constraint
(indirect TargetLowering::C_Memory) [LOOP]. CALL and JUMP instructions are special
that `__asm call k` can be an address constraint, if `k` is a function.
Clang always gives us indirect TargetLowering::C_Memory and need to convert it
to direct TargetLowering::C_Address. D133914 implements this conversion, but
does not consider JMP or case-insensitive CALL. This patch implements the missing
cases, so that `__asm jmp k` (`jmp ${0:P}`) will correctly lower to `jmp _k`
instead of `jmp dword ptr [_k]`.
(`__asm call k` lowered to `call dword ptr ${0:P}` and is fixed by D149695 to
lower to `call ${0:P}` instead.)
[LOOP]: Some instructions like LOOP{,E,NE} and Jcc always use an address
constraint (`loop _k` instead of `loop dword ptr [_k]`).
After this patch and D149579, all the following cases will be correct.
```
int k(int);
int (*kptr)(int);
...
__asm call k; // correct without this patch
__asm CALL k; // correct, but needs this patch to be compatible with D149579
__asm jmp k; // correct, but needs this patch to be compatible with D149579
__asm call kptr; // will be fixed by D149579. "Broken case" in clang/test/CodeGen/ms-inline-asm-functions.c
__asm jmp kptr; // will be fixed by this patch and D149579
```
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D149920