Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
(See 65b13610a5226b84889b923bae884ba395ad084d for further reference.)
This patch makes the `VBROADCAST***X**` subvector broadcast instructions consistent - the `***X**` section represents the original subvector type/size, but we were not correctly using the AVX512 Z/Z256/Z128 suffix to consistently represent the destination width (or we missed it entirely).
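The naming convention can be sketched as a small helper that appends the destination-width suffix to the subvector stem. This is a hypothetical illustration: the helper name `broadcastOpcodeName` and the `rm` (register <- memory) form suffix are assumptions, and the real opcodes are defined in TableGen, not built from strings like this.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: Stem names the source subvector (e.g. "VBROADCASTF32X4"),
// DstBits is the destination register width, Form is the operand form suffix.
std::string broadcastOpcodeName(const std::string &Stem, unsigned DstBits,
                                const std::string &Form) {
  std::string Suffix;
  switch (DstBits) {
  case 512: Suffix = "Z"; break;    // ZMM destination
  case 256: Suffix = "Z256"; break; // YMM destination (AVX512VL)
  case 128: Suffix = "Z128"; break; // XMM destination (AVX512VL)
  default: return "";               // not an AVX512 destination width
  }
  return Stem + Suffix + Form;
}
```

With a consistent suffix, the subvector part (`F32X4`) and the destination width (`Z256`) can be read independently from the name.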
This caused the MCOperand to be returned in memory. An MCOperand is only
16 bytes and therefore can be returned in registers on x86-64 and
AArch64 (and others).
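The size argument can be checked with a stand-in type. The `Operand` struct below is hypothetical (a one-byte kind tag plus an 8-byte payload union) and is not LLVM's actual MCOperand definition. Under the x86-64 System V and AArch64 calling conventions, a 16-byte aggregate can be returned in a pair of general-purpose registers, but under the Itanium C++ ABI a class that is non-trivial for the purposes of calls (for example, one with a user-provided copy constructor or destructor) is always returned through a hidden pointer to caller-allocated memory.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical 16-byte operand; NOT LLVM's real MCOperand layout.
struct Operand {
  enum KindTy : uint8_t { Invalid, Register, Immediate, FPImmediate };
  KindTy Kind = Invalid;
  union {
    unsigned RegVal;
    int64_t ImmVal;
    double FPImmVal;
    const void *ExprVal;
  };

  static Operand createImm(int64_t Val) {
    Operand Op;
    Op.Kind = Immediate;
    Op.ImmVal = Val;
    return Op; // can travel in two registers if trivially copyable
  }
};

// Tag byte + padding + 8-byte union = 16 bytes on 64-bit targets.
static_assert(sizeof(Operand) == 16 || sizeof(void *) != 8,
              "expected a 16-byte operand on 64-bit targets");
// No user-provided copy/move constructor or destructor, so the ABI does not
// force an indirect (in-memory) return.
static_assert(std::is_trivially_copyable<Operand>::value,
              "operand should be trivially copyable");
```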
Based on feedback from #95403 - we use multiply by constant for various lowerings (shifts, division etc.), so it's very useful to print out the constants to help understand the transform involved.
vXi16 multiplies are the easiest to add for this initial commit, but we can add other arithmetic instructions as follow ups when the need arises (I intend to add PMADDUBSW handling for #95403 next).
I've done my best to update all test checks but there are bound to be ones that got missed that will only appear when the file is regenerated.
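A minimal sketch of the kind of comment this enables: the constant operand of a vXi16 multiply is printed element by element, so that, e.g., a per-element left shift by [0,1,2,3] lowered to a multiply shows up as a multiply by [1,2,4,8]. The helper name and the exact comment layout are assumptions for illustration, not the format LLVM emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: renders "Dst = Src * [C0,C1,...]" for a vXi16 multiply
// whose second operand was loaded from the constant pool.
std::string mulConstantComment(const std::string &Dst, const std::string &Src,
                               const std::vector<int16_t> &Elts) {
  std::string S = Dst + " = " + Src + " * [";
  for (std::size_t I = 0; I != Elts.size(); ++I) {
    if (I != 0)
      S += ",";
    S += std::to_string(Elts[I]); // int16_t promotes to int here
  }
  return S + "]";
}
```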
xray instruments tail call function exits by inserting a nop sled before
the tail call. When tracing is enabled, the nop sled is replaced with a
call to `__xray_FunctionTailExit()`. This currently does not work for
conditional tail calls, as the instrumentation assumes that the tail
call will be unconditional. This causes two issues:
- `__xray_FunctionTailExit()` is inappropriately called even when the
tail call is not taken.
- `__xray_FunctionTailExit()`'s prologue/epilogue adjusts the stack
pointer with add/sub instructions. This clobbers condition flags,
which can flip the condition used for the tail call, leading to
incorrect program behavior.
Fix this by rewriting conditional calls when lowering patchable tail
calls.
With this change, a conditional patchable tail call like:
```
je target
```
will be lowered to:
```
jne .fallthrough
.p2align 1, ..
.Lxray_sled_N:
SLED_CODE
jmp target
.fallthrough:
```
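The rewrite amounts to inverting the branch condition so that the sled and the unconditional `jmp` sit only on the taken path; the condition is resolved before the sled runs, so flags clobbered by the exit handler no longer matter. The sketch below models this on textual assembly with a hypothetical `lowerConditionalTailCall` helper and a small, assumed table of condition mnemonics; the alignment directive is left out, and a real lowering would use a unique temporary label rather than a fixed `.fallthrough`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical inverse-condition table for a few x86 conditional jumps.
const std::map<std::string, std::string> &inverseJcc() {
  static const std::map<std::string, std::string> Table = {
      {"je", "jne"}, {"jne", "je"}, {"jl", "jge"},
      {"jge", "jl"}, {"jb", "jae"}, {"jae", "jb"}};
  return Table;
}

// Lowers "<jcc> Target" into: inverted jump over the sled, sled label,
// sled placeholder, unconditional jump to Target, fallthrough label.
std::vector<std::string> lowerConditionalTailCall(const std::string &Jcc,
                                                  const std::string &Target,
                                                  unsigned SledId) {
  const std::string Fallthrough = ".fallthrough"; // would be unique in practice
  return {inverseJcc().at(Jcc) + " " + Fallthrough,
          ".Lxray_sled_" + std::to_string(SledId) + ":",
          "SLED_CODE",
          "jmp " + Target,
          Fallthrough + ":"};
}
```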
Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, moving the TRI parameter earlier in the parameter list and making it required. In addition, all call sites of these functions have been updated accordingly to avoid any unintended behavior change.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
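The shape of the refactor can be sketched with stand-in types. `RegInfo`, `Instr`, and `findDefIdx` are hypothetical, and the "before"/"after" signatures in the comments are illustrative rather than the exact LLVM declarations; they only show how moving the register-info parameter ahead of the defaulted ones and dropping its default forces each caller to pass either a real object or `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RegInfo {
  // Hypothetical alias query: registers in the same group of 4 overlap.
  bool regsOverlap(unsigned A, unsigned B) const { return A / 4 == B / 4; }
};

struct Instr {
  std::vector<unsigned> Defs;

  // Before (illustrative): int findDefIdx(unsigned Reg, bool Overlap = false,
  //                                       const RegInfo *TRI = nullptr) const;
  // After: TRI follows Reg directly and has no default value.
  int findDefIdx(unsigned Reg, const RegInfo *TRI, bool Overlap = false) const {
    for (std::size_t I = 0; I != Defs.size(); ++I) {
      // Overlap queries need TRI; without it only exact matches are found.
      bool Match = (Overlap && TRI) ? TRI->regsOverlap(Defs[I], Reg)
                                    : Defs[I] == Reg;
      if (Match)
        return static_cast<int>(I);
    }
    return -1;
  }
};
```

A call that silently omitted TRI under the old signature now fails to compile until the caller decides what to pass.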
This fixes an edge case where functions starting with inline assembly
would assert while trying to lower that inline asm instruction.
After this PR, for now we always add a no-op (xchgw in this case) without
considering the size of the next inline asm instruction. We might want
to revisit this in the future.
This fixes Unreal Engine 5.3.2 compilation with clang-cl and /HOTPATCH.
Should close https://github.com/llvm/llvm-project/issues/56234
The convention is for such MC-specific options to reside in
MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do
not follow the convention: `CompressDebugSections` is defined in both
TargetOptions and MCAsmInfo and there is forwarding complexity.
Move the options to MCTargetOptions and thereby simplify the code. Rename
the misleading RelaxELFRelocations to X86RelaxRelocations. llvm-mc
-relax-relocations and llc -x86-relax-relocations can now be unified.
Handle masked predicated movss/movsd in addConstantComments now that we can generically handle the destination + mask register.
This will more significantly help improve 'fixup constant' comments from #73509
Handle masked predicated load/broadcasts in addConstantComments now that we can generically handle the destination + mask register.
This will more significantly help improve 'fixup constant' comments from #73509
Remove handling from EmitAnyX86InstComments and handle all VPMOVSX/VPMOVZX comments in addConstantComments now that we can generically handle the destination + mask register and shuffle mask comment
This will allow us to easily use printDstRegisterName for other mask predicate destination registers, and printout shuffle masks from other instruction types.
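As a rough sketch of what a shared destination printer has to cover, the hypothetical helper below formats a destination register with an optional AVX512 mask register and zeroing flag. It does not reproduce the output of LLVM's `printDstRegisterName`; the `{%k1}` and `{z}` spellings are an assumption modelled on AT&T-style masking syntax.

```cpp
#include <cassert>
#include <string>

// Hypothetical: "Reg", "Reg {%Mask}", or "Reg {%Mask} {z}".
std::string printDst(const std::string &Reg, const std::string &MaskReg,
                     bool Zeroing) {
  std::string S = Reg;
  if (!MaskReg.empty()) {
    S += " {%" + MaskReg + "}";
    if (Zeroing) // zero-masking rather than merge-masking
      S += " {z}";
  }
  return S;
}
```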
Further develops the vsextload support added in #79815 / b5d35feacb7246573c6a4ab2bddc4919a4228ed5 - reduces the size of the vector constant by storing it in the constant pool in a truncated form, and zero-extending it as part of the load.
llvm-project/llvm/lib/Target/X86/X86MCInstLower.cpp:1588:48:
error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]
if (C && C->getType()->getScalarSizeInBits() == SrcEltBits) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
1 error generated.
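This class of `-Wsign-compare` error is usually fixed by giving both sides of the comparison the same signedness. A minimal sketch, assuming the local width variable can simply be declared `unsigned` to match the unsigned result of `getScalarSizeInBits()`; `scalarSizeInBits` and `widthMatches` are hypothetical stand-ins, not the actual fix.

```cpp
#include <cassert>

// Stand-in for Type::getScalarSizeInBits(), which returns unsigned.
unsigned scalarSizeInBits() { return 16; }

// Before (rejected under -Werror,-Wsign-compare):
//   int SrcEltBits = ...;
//   if (scalarSizeInBits() == SrcEltBits) ...
// After: the width is unsigned on both sides.
bool widthMatches(unsigned SrcEltBits) {
  return scalarSizeInBits() == SrcEltBits;
}
```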
Reduce the size of the vector constant by storing it in the constant pool in a truncated form, and sign-extend it as part of the load.
I've extended the existing FixupConstant functionality to support these sext constant rebuilds - we still select the smallest stored constant entry and prefer vzload/broadcast/vextload for same bitwidth to avoid domain flips.
I intend to add the matching load+zero-extend handling in a future PR, but that requires some alterations to the existing MC shuffle comments handling first.
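The core check behind these rebuilds can be sketched as: for each candidate narrower width, truncate every element and verify that extending it back (sign- or zero-extension, matching the load) reproduces the original value; the smallest width that round-trips wins. The helpers below are hypothetical, operate on plain integers (elements passed as signed values of their original width) rather than LLVM `Constant`s, and do not model the vzload/broadcast preference mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if Val survives truncation to Bits followed by re-extension.
bool roundTrips(int64_t Val, unsigned Bits, bool SignExt) {
  if (Bits >= 64)
    return true;
  uint64_t Mask = (uint64_t(1) << Bits) - 1;
  uint64_t Trunc = static_cast<uint64_t>(Val) & Mask;
  uint64_t Ext = Trunc;
  if (SignExt && ((Trunc >> (Bits - 1)) & 1))
    Ext |= ~Mask; // replicate the sign bit into the upper bits
  return Ext == static_cast<uint64_t>(Val);
}

// Hypothetical: smallest stored element width (>= 8 bits) from which every
// EltBits-wide element can be rebuilt by an extending load; EltBits if none.
unsigned smallestStoredWidth(const std::vector<int64_t> &Elts, unsigned EltBits,
                             bool SignExt) {
  for (unsigned Bits = 8; Bits < EltBits; Bits *= 2) {
    bool AllFit = true;
    for (int64_t V : Elts)
      AllFit = AllFit && roundTrips(V, Bits, SignExt);
    if (AllFit)
      return Bits;
  }
  return EltBits;
}
```

For example, the i32 elements {1,-1,100} fit in 8-bit storage with sign extension but need full width with zero extension, while {255,0,7} is the opposite case (8 bits with zero extension, 16 with sign extension).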
This is /almost/ NFC - the only annoyance is that for some reason we were using "<C1,C2,..>" for ConstantVector types unlike all other cases - these now use the same "[C1,C2,..]" format as the other constant printers.
Previously, tail jump pseudo-opcodes were skipped by the
`encodeInstruction()` call inside `X86AsmPrinter::LowerPATCHABLE_OP`.
This caused emission of a 2-byte NOP and dropping of the tail jump.
With this PR, we change `PATCHABLE_OP` to no longer wrap the first
`MachineInstr` but instead insert itself before it,
leaving the instruction unaltered. At lowering time in `X86AsmPrinter`,
we now "look ahead" for the next non-pseudo `MachineInstr` and
lower+encode it, to inspect its size. If the size is below what
`PATCHABLE_OP` expects, it inserts NOPs; otherwise it does nothing. That
way, now the first `MachineInstr` is always lowered as usual even if
`"patchable-function"="prologue-short-redirect"` is used.
Fixes https://github.com/llvm/llvm-project/issues/76879,
https://github.com/llvm/llvm-project/issues/76958 and
https://github.com/llvm/llvm-project/issues/59039
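The look-ahead step can be modelled as a size check: encode the first real instruction, compare its size with the minimum the patchable entry requires, and emit a NOP in front only when it falls short, leaving the instruction itself to be emitted unchanged afterwards. The sketch below is hypothetical (byte vectors instead of MCInsts and a streamer) and only models a 2-byte minimum, padded with the 2-byte NOP `66 90` (`xchg %ax, %ax`).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical: bytes to emit before the first real instruction so that the
// patchable region starts with a single instruction of at least MinSize bytes.
Bytes patchablePrefix(const Bytes &FirstInstEncoding, unsigned MinSize = 2) {
  assert(MinSize == 2 && "sketch only models the 2-byte case");
  if (FirstInstEncoding.size() >= MinSize)
    return {}; // already large enough; emit nothing extra
  return {0x66, 0x90}; // 2-byte NOP
}
```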
Allows cases where movss/movsd etc. are loading constant (ConstantDataSequential) sub-vectors, ensuring we pad with the correct number of zero upper elements by making repeated printConstant calls to print zeroes in a matching int/fp format.
This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).
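The padding rule can be sketched with a hypothetical printer: a scalar load such as movss writes the loaded element(s) into the low lanes and zeroes the rest of the register, so the comment lists the loaded elements followed by zeroes up to the destination's element count, with the zeroes spelled in the same int/fp style. The element strings are assumed to be pre-formatted, and the `0.0E+0` spelling is an assumption rather than LLVM's guaranteed output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: "[e0,e1,...]" for a load that fills the low lanes of a
// NumDstElts-wide register with LoadedElts and zeroes the upper lanes.
std::string paddedConstantComment(const std::vector<std::string> &LoadedElts,
                                  unsigned NumDstElts, bool IsFP) {
  const std::string Zero = IsFP ? "0.0E+0" : "0"; // match the element format
  std::string S = "[";
  for (unsigned I = 0; I != NumDstElts; ++I) {
    if (I != 0)
      S += ",";
    S += I < LoadedElts.size() ? LoadedElts[I] : Zero;
  }
  return S + "]";
}
```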
llvm-project/llvm/lib/Target/X86/X86MCInstLower.cpp:1867:20:
error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
if (SclWidth == C->getType()->getScalarSizeInBits()) {
~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
If we're loading a constant value, print the constant (and the zero upper elements) instead of just the shuffle mask.
This did require me to move the shuffle mask handling into addConstantComments as we can't handle this in the MC layer.
We were printing the entire Constant, which, when loading from a wider constant pool entry, cluttered the asm comment with upper bits that aren't actually part of the load result.
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
In cases where a broadcast op is loading from a constant entry wider than the broadcast element, we were incorrectly printing the entire entry and not just the lower bits referenced by the broadcast.
Noticed in D150143/D150526 - we currently create scalar Constant values using the broadcast instruction width, which might be wider than the original build vector width, making it tricky to recognise the original constant bits data.
If we have widened the broadcast value, it's much more useful for asm comments if we create a ConstantVector with the original element data, add that to the constant-pool and load that with the same (wider) broadcast instruction.
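To see why the original element data matters, consider the 32-bit pair [1,2] broadcast as one 64-bit scalar: printed as a single integer it reads 8589934593. The hypothetical helper below recovers the original elements from a widened scalar, lowest lane first since x86 is little-endian; it only illustrates the bit layout and is not how LLVM builds the replacement ConstantVector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: split a ScalarBits-wide broadcast value into EltBits-wide
// elements, lowest lane first.
std::vector<uint64_t> splitBroadcastScalar(uint64_t Scalar, unsigned ScalarBits,
                                           unsigned EltBits) {
  std::vector<uint64_t> Elts;
  uint64_t Mask = EltBits >= 64 ? ~uint64_t(0) : (uint64_t(1) << EltBits) - 1;
  for (unsigned Off = 0; Off < ScalarBits; Off += EltBits)
    Elts.push_back((Scalar >> Off) & Mask);
  return Elts;
}
```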
Without frame pointers, the locations of variables on the stack are emitted
relative to the stack pointer (via the stack pointer being the value of
DW_AT_frame_base on the subprogram). If a call modifies the stack pointer
this results in the locations being wrong and the debugger displaying the
wrong values for variables.
By using DW_OP_call_frame_cfa in these situations the emitted location for
the variable will automatically handle changes in the stack pointer
(provided LLVM is emitting the correct CFI directives elsewhere, of course).
The CFA needs to be adjusted for the size of the stack frame (including the
return address) to allow the variable locations themselves to remain
unchanged by this patch.
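The adjustment can be checked with a little arithmetic. Assuming x86-64 (an 8-byte return address slot) and a prologue that allocates StackSize bytes with no frame pointer, the stack pointer after the prologue is CFA - 8 - StackSize. A frame base of CFA - (StackSize + 8) therefore coincides with that SP, so existing SP-relative offsets stay valid, while the CFA, unlike SP, does not move when a call sequence pushes arguments. The functions below are a hypothetical model of this arithmetic, not the DWARF emission code.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t SlotSize = 8; // x86-64 return address size (assumption)

// Frame base expressed as DW_OP_call_frame_cfa minus (StackSize + SlotSize).
int64_t frameBaseFromCFA(int64_t CFA, int64_t StackSize) {
  return CFA - (StackSize + SlotSize);
}

// Address of a variable described as "frame base + Offset".
int64_t varAddress(int64_t FrameBase, int64_t Offset) {
  return FrameBase + Offset;
}
```

In the usage below, pushing 16 bytes of call arguments moves SP, so an SP-based frame base would point 16 bytes too low, whereas the CFA-based one still finds the variable.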
Certain LLDB features cannot cope with DW_OP_call_frame_cfa, so this change
is heuristically limited to the cases where it's necessary for correctness
to minimize the fallout there.
Reviewed By: #debug-info, scott.linder, jryans, jmorse
Differential Revision: https://reviews.llvm.org/D143463