PR #70014 fixes A32 to use GOT for dso_preemptable `__stack_chk_guard`
with static relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie
linking).
This patch fixes such `__stack_chk_guard` access for Thumb1 and Thumb2.
Note: `t2LDRLIT_ga_pcrel` is only for ELF. mingw needs
`.refptr.__stack_chk_guard` (https://reviews.llvm.org/D92738).
Fix#64999
Previously, tail jump pseudo-opcodes were skipped by the
`encodeInstruction()` call inside `X86AsmPrinter::LowerPATCHABLE_OP`.
This caused emission of a 2-byte NOP and dropping of the tail jump.
With this PR, we change `PATCHABLE_OP` to not wrap the first
`MachineInstr` anymore, but inserting itself before,
leaving the instruction unaltered. At lowering time in `X86AsmPrinter`,
we now "look ahead" for the next non-pseudo `MachineInstr` and
lower+encode it, to inspect its size. If the size is below what
`PATCHABLE_OP` expects, it inserts NOPs; otherwise it does nothing. That
way, now the first `MachineInstr` is always lowered as usual even if
`"patchable-function"="prologue-short-redirect"` is used.
Fixes https://github.com/llvm/llvm-project/issues/76879,
https://github.com/llvm/llvm-project/issues/76958 and
https://github.com/llvm/llvm-project/issues/59039
When there are debug intrinsics in-between groups of select
instructions, select-optimise sinks them into the "end" block. This
needs to be replicated for DPValues, the non-instruction variable
assignment object. Implement that and add a RUN line to a test that was
sensitive to this to ensure it gets tested.
(The exact range of instructions being transformed here is a little
fiddly, hence I've gone with a helper lambda).
This is a follow-up to 8c1b7fba1fb -- GlobalISel currently doesn't handle
RemoveDIs mode debug-info, but will (see #75228). Disable this runline
until then.
(This is a patch-landing ordering problem)
updateNewZAFunctions is extended to generate the following on entry to a
function with either the "aarch64_pstate_za_new" or "arm_new_zt0"
attribute:
- Private-ZA interface: commit any active lazy-saves & enable PSTATE.ZA.
- "aarch64_pstate_za_new": zero ZA.
- "arm_new_zt0": zero ZT0.
Additionally, PSTATE.ZA should disabled before returning if the function
has a private-ZA interface.
Allows cases where movss/movsd etc. are loading constant (ConstantDataSequential) sub-vectors, ensuring we pad with the correct number of zero upper elements by making repeated printConstant calls to print zeroes in a matching int/fp format.
This patch abstracts visitEntryValueDbgValue to deal with the substance
of variable locations (Value, Var, Expr, DebugLoc) rather than how
they're stored. That allows us to call it from handleDebugValue, which
is similarly abstracted. This allows the entry-value behaviour (see the
test) to be supported with non-instruction debug-info too!.
Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs.
Use llvm_anyvector_ty for both multi vector returns and operands,
therefore the return and operands can be specified in the intrinsic
call, e.g.
@llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32
This patch tweaks two AMDGPU passes to use iterators rather than
instruction pointers for expressing an insertion point. This is needed
to accurately support DPValues, the non-instruction storage object for
debug-info.
Two tests were sensitive to this change (variable assignments were being
put in the wrong place), and I've added extra run-lines with the "try
new debug-info..." flag. These get tested on our public buildbot to
ensure they continue to work accurately.
Legalize shl/lshr/ashr for smaller/larger vector widths with legal
element sizes
Smaller than legal vector types does not work at the moment as it relies
on G_ANYEXT to work with smaller than legal vector types
For inline asm with memory operands, we can merge the offset into
the second operand of memory constraint operands.
Differential Revision: https://reviews.llvm.org/D158062
SIMemoryLegalizer tests were slow, with most of them taking 4.5 to 5.3s
to complete and that's on a fast machine. I also recall seeing them in
the slowest tests list on build bots.
This removes the verify-machineinstrs option from these tests to speed
them up, bringing the slowest test down to +-2s.
Verifier still runs in EXPENSIVE_CHECKS builds.
In software pipelining, when searching for the Initiation Interval (II),
`MachinePipeliner` tries to reduce register pressure, but doesn't check
how many variables can actually be alive at the same time. As a result,
a lot of register spills/fills can be generated after register
allocation, which might cause performance degradation. To prevent such
cases, this patch adds a check phase that calculates the maximum
register pressure of the scheduled loop and reject it if the pressure is
too high. This can be enabled this by specifying
`pipeliner-register-pressure`. Additionally, an II search range is
currently fixed at 10, which is too small to find a schedule when the
above algorithm is applied. Therefore this patch also adds a new option
`pipeliner-ii-search-range` to specify the length of the range to
search. There is one more new option
`pipeliner-register-pressure-margin`, which can be used to estimate a
register pressure limit less than actual for conservative analysis.
Discourse thread:
https://discourse.llvm.org/t/considering-register-pressure-when-deciding-initiation-interval-in-machinepipeliner/74725
The enc/dec of promoted AMX-TILE instructions have been supported in
https://github.com/llvm/llvm-project/pull/76210.
This patch support lowering for promoted AMX-TILE instructions and
integrate test to existing tests.
https://github.com/llvm/llvm-project/pull/76125 supported the enc/dec
for CMPCCXADD instructions, this patch
1. Add lowering test for promoted CMPCCXADD
2. Update the representation of condition code for promoted CMPCCXADD to
align with the existing one
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.
This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
If a function has ZT0 state and calls a function which does not
preserve ZT0, the caller must save and restore ZT0 around the call.
If the caller shares ZT0 state and the callee is not shared ZA, we must
additionally call SMSTOP/SMSTART ZA around the call.
This patch adds new AArch64ISDNodes for spilling & filling ZT0.
Where requiresPreservingZT0 is true, ZT0 state will be preserved
across a call.
We want to know the upper 33 bits of the And Input are zero. SExt
only guarantees they are the same.
We originally checked for SExt or ZExt when we were using
isImpliedByDomCondition because a ZExt may have been changed to SExt
before we visited the And.
We are no longer using isImpliedByDomCondition so we can only look
for zext with the nneg flag.
While here, switch to PatternMatch to simplify the code.
Fixes#78783
We just need to check if the global is large or not.
In the kernel code model, globals are in the negative 2GB of the address
space, so globals can be a sign extended 32-bit immediate.
In other code models, small globals are in the low 2GB of the address
space, so sign extending them is equivalent to zero extending them.
That instruction is not supported on GFX12.
Added a testcase which previously crashed without this change.
Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>
If we're loading a constant value, print the constant (and the zero upper elements) instead of just the shuffle mask.
This did require me to move the shuffle mask handling into addConstantComments as we can't handle this in the MC layer.
The new Clang attributes no longer support the combination of having a
private-ZA function that preserves ZA. The use of __arm_preserves("za")
means that ZA is shared and preserved.
There wasn't that much benefit to the special handling of this, because
in practice it only meant that we'd avoid restoring the lazy-save
afterwards, but it still needed setting up a lazy-save (with the
possibility of using a 0-sized buffer).
Perhaps a new attribute will be added in the future to support this
case, at which point we can revert back some of the changes removed in
this patch. But for now removing this code simplifies things.
Fixes#77915
Previously I based the operand copying on expandCALL_RVMARKER but did
not understand it properly at the time. This lead to me dropping the
arguments of the function being branched to.
This fixes that by copying all operands from the BLR_BTI to the BL/BLR
without skipping anything.
I've updated the existing test by adding function arguments.
Add custom lowering for ctlz.i8 to avoid multiple add/sub operations.
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>