Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors.
Extend the operation to i32 and i64.
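The extension is cheap because of how GFNI reverses bits. The following is a scalar C++ emulation, not the X86ISelLowering code itself. GF2P8AFFINEQB with the commonly used matrix 0x8040201008040201 reverses the bits inside each byte. An i32 or i64 bitreverse therefore only also needs the bytes of each element reversed, for example with a byte shuffle. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Emulates one byte of GF2P8AFFINEQB: output bit i is the parity of
// (matrix byte [7 - i] AND x), XORed with bit i of the immediate.
uint8_t gf2p8AffineByte(uint64_t matrix, uint8_t x, uint8_t imm) {
  uint8_t result = 0;
  for (int i = 0; i < 8; ++i) {
    uint8_t row = static_cast<uint8_t>(matrix >> (8 * (7 - i)));
    uint8_t bits = row & x;
    int parity = 0;
    while (bits) {
      parity ^= bits & 1;
      bits >>= 1;
    }
    result |= static_cast<uint8_t>((parity ^ ((imm >> i) & 1)) << i);
  }
  return result;
}

// Matrix mapping bit i of every byte to bit 7 - i (used with imm = 0).
constexpr uint64_t kBitReverseMatrix = 0x8040201008040201ULL;

// Hypothetical helper for unsigned T: reverse the byte order of the element,
// and reverse the bits inside each byte with the affine transform.
template <typename T>
T bitreverseViaBytes(T v) {
  constexpr int kBytes = sizeof(T);
  T out = 0;
  for (int b = 0; b < kBytes; ++b) {
    uint8_t byte = static_cast<uint8_t>(v >> (8 * b));
    uint8_t rev = gf2p8AffineByte(kBitReverseMatrix, byte, 0);
    out |= static_cast<T>(rev) << (8 * (kBytes - 1 - b));
  }
  return out;
}
```

On the vector path, the per-byte step maps to one GF2P8AFFINEQB over the whole register. The byte-order step is a single shuffle, so i32 and i64 cost roughly one extra instruction compared with i8.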
---------
Co-authored-by: shami <shami_thoke@yahoo.com>
Allows us to match insert_subvector(insert_subvector(undef, insert_subvector(insert_subvector(undef, x, 0), y, 1), 0),
insert_subvector(insert_subvector(undef, z, 0), w, 1), 2) as a concatenation of x, y, z and w.
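The nested pattern can be matched recursively. The sketch below uses a hypothetical toy expression tree, not LLVM's SelectionDAG or the real `collectConcatOps`. Indices are counted in units of the leaf subvector, as in the expression above. A node of the form insert_subvector(insert_subvector(undef, Lo, 0), Hi, width(Lo)) is treated as concat(Lo, Hi), and Lo and Hi are flattened in turn.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node, not LLVM's SDNode. Insert indices are counted in
// units of the smallest (leaf) subvector, as in the pattern above.
struct Node {
  enum Kind { Undef, Leaf, Insert };
  Kind kind = Undef;
  std::string name;                 // Leaf only
  std::shared_ptr<Node> base, sub;  // Insert only
  unsigned idx = 0;                 // Insert only
};
using NodeRef = std::shared_ptr<Node>;

NodeRef undef() { return std::make_shared<Node>(); }
NodeRef leaf(std::string name) {
  auto n = std::make_shared<Node>();
  n->kind = Node::Leaf;
  n->name = std::move(name);
  return n;
}
NodeRef ins(NodeRef base, NodeRef sub, unsigned idx) {
  auto n = std::make_shared<Node>();
  n->kind = Node::Insert;
  n->base = std::move(base);
  n->sub = std::move(sub);
  n->idx = idx;
  return n;
}

// Matches insert_subvector(insert_subvector(undef, Lo, 0), Hi, width(Lo)),
// recursing into Lo and Hi, and appends the concatenated leaves to 'ops'.
bool collectConcatOps(const NodeRef &n, std::vector<std::string> &ops) {
  if (n->kind == Node::Leaf) {
    ops.push_back(n->name);
    return true;
  }
  if (n->kind != Node::Insert)
    return false;
  const NodeRef &inner = n->base;
  if (inner->kind != Node::Insert || inner->base->kind != Node::Undef ||
      inner->idx != 0)
    return false;
  std::vector<std::string> lo, hi;
  if (!collectConcatOps(inner->sub, lo) || !collectConcatOps(n->sub, hi))
    return false;
  // Hi must start right after Lo, and both halves must be the same width.
  if (n->idx != lo.size() || lo.size() != hi.size())
    return false;
  ops.insert(ops.end(), lo.begin(), lo.end());
  ops.insert(ops.end(), hi.begin(), hi.end());
  return true;
}
```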
This fixes an edge case where functions starting with inline assembly
would assert while trying to lower that inline asm instruction.
After this PR, for now we always add a no-op (xchgw in this case) without
considering the size of the next inline asm instruction. We might want
to revisit this in the future.
This fixes Unreal Engine 5.3.2 compilation with clang-cl and /HOTPATCH.
Should close https://github.com/llvm/llvm-project/issues/56234
Following #78348, we should treat functions with an explicit section as
small, unless the section name is (or has the prefix) ".ltext".
Clang emits global initializers into a ".text.startup" section on Linux.
If we mix small/medium code model object files with large code model
object files, we'll end up mixing sections with and without the large
section flag.
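The classification rule can be sketched as follows. This assumes "has the prefix" is a plain string-prefix test, so `.ltext` itself and names such as `.ltext.hot` both count as large. `isExplicitSectionLarge` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: a function placed in an explicit section is treated
// as large only if the section name starts with ".ltext"; every other
// explicit section keeps the function small.
bool isExplicitSectionLarge(std::string_view section) {
  constexpr std::string_view kLargeText = ".ltext";
  return section.substr(0, kLargeText.size()) == kLargeText;
}
```

Under this rule, `.text.startup`, where Clang places global initializers, stays small regardless of the code model.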
-fsanitize=function emits a signature and function hash before a
function. Similar to 7f6e2c9, these can be sheared off when
`.subsections_via_symbols` is used.
This change uses the same technique 7f6e2c9 introduced for prefixes:
emitting a symbol for the metadata, then marking the actual function
entry as an .alt_entry symbol.
Correct strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics
These tests needed the strictfp attribute added to some function
definitions. FP wait instructions now appear as a result.
Test changes verified with D146845.
In SplitCriticalEdge, the DebugLoc of the branch instruction in the newly
created MBB was set to empty. It should be set, and in most cases we can
find a proper DebugLoc for it. This patch sets it to the non-empty merged
DebugLoc of the current MBB's branches.
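The merge can be pictured with a toy model that uses a hypothetical location type with no scopes or inlining. Identical locations are kept, and empty ones are ignored. Differing lines in the same file collapse to line 0, loosely following how `DILocation::getMergedLocation` treats differing lines, although the real function also walks the scope and inlining chains.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified debug location: no scopes or inlining.
struct Loc {
  std::string file;
  unsigned line;
  bool operator==(const Loc &o) const { return file == o.file && line == o.line; }
};

// Pick a location for the branch in a newly created block from the
// locations of the branches that terminate the current block.
// Empty locations are skipped; identical ones are kept; differing lines
// in one file merge to line 0 there; different files give no location.
std::optional<Loc> mergedBranchLoc(const std::vector<std::optional<Loc>> &locs) {
  std::optional<Loc> result;
  bool seenAny = false;
  for (const auto &l : locs) {
    if (!l)
      continue;
    if (!seenAny) {
      result = l;
      seenAny = true;
    } else if (!result || *result == *l) {
      continue;  // already irreconcilable, or nothing new to merge
    } else if (result->file == l->file) {
      result = Loc{l->file, 0};
    } else {
      result = std::nullopt;
    }
  }
  return result;
}
```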
Align(8) is QWORD aligned, but the code was checking whether the alignment
was greater than Align(8), when it should have been checking whether it
was greater than OR EQUAL to Align(8).
This bug was introduced in
https://github.com/llvm/llvm-project/commit/6a6af30d433d7 during the
transition to the Align type.
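Here is a minimal illustration of the off-by-one. It uses plain byte counts instead of LLVM's `Align` type for brevity, and both function names are hypothetical. Exactly 8-byte alignment already satisfies QWORD alignment, so the strict comparison wrongly rejects it.

```cpp
#include <cassert>
#include <cstdint>

// Buggy form: rejects exactly 8-byte (QWORD) alignment.
bool isQwordAlignedBuggy(uint64_t alignInBytes) { return alignInBytes > 8; }

// Fixed form: any alignment of at least 8 bytes is QWORD aligned.
bool isQwordAligned(uint64_t alignInBytes) { return alignInBytes >= 8; }
```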
Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of
floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x
bfloat> on some targets and address spaces.
Note this only supports the proper floating-point operations; float
vector typed xchg is still not supported. cmpxchg still only supports
integers, so this inserts bitcasts for the loop expansion.
I have support for fp vector typed xchg, and vector of int/ptr
separately implemented but I don't have an immediate need for those
beyond feature consistency.
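The bitcast-based loop expansion has the following shape in portable C++. This is not AtomicExpand's code. It uses a `<2 x float>` payload packed into a 64-bit integer instead of `<2 x half>`, because standard C++ has no half type. Each iteration bitcasts the loaded integer to the float vector and performs the add. It then bitcasts the result back and retries an integer compare-exchange until it succeeds.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

struct Float2 { float x, y; };  // stands in for <2 x float>
static_assert(sizeof(Float2) == sizeof(uint64_t), "must bitcast to i64");

uint64_t bitcastToInt(Float2 v) { uint64_t i; std::memcpy(&i, &v, sizeof i); return i; }
Float2 bitcastToFloat2(uint64_t i) { Float2 v; std::memcpy(&v, &i, sizeof v); return v; }

// atomicrmw fadd on a float vector, expanded to an integer cmpxchg loop.
// Returns the previous value, as atomicrmw does.
Float2 atomicFAdd(std::atomic<uint64_t> &mem, Float2 operand) {
  uint64_t expected = mem.load(std::memory_order_relaxed);
  for (;;) {
    Float2 old = bitcastToFloat2(expected);
    Float2 sum{old.x + operand.x, old.y + operand.y};
    // On failure, compare_exchange_weak reloads 'expected' with the current value.
    if (mem.compare_exchange_weak(expected, bitcastToInt(sum)))
      return old;
  }
}
```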
Reverse the fold with handling inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds.
This exposed a number of regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve DemandedElts support to handle its undef upper elements.
There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads?
Fixes #86968
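The bounds reasoning can be sketched under one assumption: an out-of-range extract index yields undef/poison. Under that assumption, an extract only needs to be reported as possibly creating undef/poison when its index might itself be poison or is not known to be in bounds. `ExtractIndex` and `extractCanCreatePoison` are hypothetical stand-ins for the DAG query and its known-bits input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what is known about an extract index.
struct ExtractIndex {
  bool mayBePoison;   // the index itself could be undef/poison
  uint64_t maxValue;  // upper bound from known-bits style analysis
};

// An extract_vector_elt can only introduce poison if its index might be
// poison or might be out of bounds for the source vector.
bool extractCanCreatePoison(const ExtractIndex &idx, uint64_t numElts) {
  if (idx.mayBePoison)
    return true;
  return idx.maxValue >= numElts;
}
```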
Based off #85592 - our truncation -> PACKSS/PACKUS folds should be able to use the nsw/nuw flags to recognise when we don't need to mask/sext_inreg prior to the PACKSS/PACKUS nodes.
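A scalar sketch shows why the flags make the mask/sext_inreg redundant. It models the per-lane saturation of PACKSSWB/PACKUSWB (i16 to i8), not the combine itself, and the function names are hypothetical. `trunc nuw` guarantees the value fits in 8 unsigned bits, so PACKUS's unsigned saturation never triggers. Likewise, `trunc nsw` guarantees a signed i8 value, so PACKSS already matches the truncation.

```cpp
#include <cassert>
#include <cstdint>

// Per-lane behavior of PACKSSWB: signed i16 -> signed i8 with saturation.
int8_t packss16to8(int16_t v) {
  if (v > INT8_MAX) return INT8_MAX;
  if (v < INT8_MIN) return INT8_MIN;
  return static_cast<int8_t>(v);
}

// Per-lane behavior of PACKUSWB: signed i16 -> unsigned i8 with saturation.
uint8_t packus16to8(int16_t v) {
  if (v > UINT8_MAX) return UINT8_MAX;
  if (v < 0) return 0;
  return static_cast<uint8_t>(v);
}
```

Without the flags, an input such as 256 saturates to 255 instead of truncating to 0, which is why the mask was needed before.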
VPERMI (VPERMQ/PD) is nearly always lane-crossing and poorly merges with target shuffles (other than itself).
For now, I've restricted VPERMI to only merge with itself, constants, loads and splats.
We might be able to merge with a few other special cases (AND/ANDNP with constant?), which could help the shuffle-vs-trunc-256.ll AVX512VL regression, but since that now gives similar codegen to the other AVX512 variants, I'd prefer to improve the shuffle lowering for that properly.
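The following sketch counts how often a 256-bit VPERMQ/VPERMPD permute crosses lanes. It decodes the immediate as four 2-bit selectors, one per 64-bit element, and checks whether any element reads from the other 128-bit lane. Only 16 of the 256 immediates keep every element in its own lane. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a 256-bit VPERMQ/VPERMPD immediate moves any 64-bit
// element across the 128-bit lane boundary.
bool vpermiIsLaneCrossing(uint8_t imm) {
  for (unsigned i = 0; i < 4; ++i) {
    unsigned src = (imm >> (2 * i)) & 3;  // source element for result element i
    if (src / 2 != i / 2)                  // two qwords per 128-bit lane
      return true;
  }
  return false;
}

// Counts the immediates that keep every element within its lane.
unsigned countInLaneImmediates() {
  unsigned n = 0;
  for (unsigned imm = 0; imm < 256; ++imm)
    if (!vpermiIsLaneCrossing(static_cast<uint8_t>(imm)))
      ++n;
  return n;
}
```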
Previously we wouldn't remove dead copies from basic blocks with
successors. The comment said we didn't want to trust the live-in lists.
The comment is very old so I'm not sure if that's still a concern today.
This patch checks the live-in lists and removes copies from
MaybeDeadCopies if they are referenced by any live-ins in any
successors. We only do this if the tracksLiveness property is set. If
that property is not set, we retain the old behavior.
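The new filtering step can be sketched with plain register numbers and sets in place of the MachineCopyPropagation data structures. All names here are hypothetical, and sub/super-register aliasing is ignored. A candidate whose destination is live into any successor is kept. Blocks with successors are only touched when liveness is tracked.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Reg = unsigned;

struct Copy { Reg dst, src; };

// Hypothetical model of a successor block's live-in list.
struct Block { std::set<Reg> liveIns; };

// Returns the copies that may be deleted as dead. Copies whose destination
// is live into any successor are kept. Without tracked liveness, blocks
// with successors are left alone (the old behavior). Register aliasing
// (sub/super registers) is ignored in this sketch.
std::vector<Copy> deletableCopies(const std::vector<Copy> &maybeDead,
                                  const std::vector<Block> &successors,
                                  bool tracksLiveness) {
  if (!successors.empty() && !tracksLiveness)
    return {};
  std::vector<Copy> result;
  for (const Copy &c : maybeDead) {
    bool liveOut = false;
    for (const Block &succ : successors)
      if (succ.liveIns.count(c.dst)) {
        liveOut = true;
        break;
      }
    if (!liveOut)
      result.push_back(c);
  }
  return result;
}
```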
We try to clamp the index to be within the bounds of the stack object
we create, but if we don't freeze it, poison can propagate into the
clamp code. This can cause the access to leave the bounds of the
stack object.
We have other instances of this issue in type legalization and extract_elt/subvector,
but I'm posting this patch first as a direction check.
Fixes #86717
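Here is the clamp itself, assuming the common strategy of masking when the element count is a power of two and taking an unsigned minimum otherwise. C++ has no poison values, so the freeze cannot be modeled directly. The comments note the key point: the clamp only bounds the address if it operates on one fixed index value, which is what freezing guarantees in the DAG. `clampIndex` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp a dynamic index into [0, numElts) before it is used to address a
// stack temporary. In the DAG the index must be frozen first: if it were
// poison, the clamped result would be poison too and the bound would be lost.
uint64_t clampIndex(uint64_t idx, uint64_t numElts) {
  assert(numElts > 0);
  bool isPow2 = (numElts & (numElts - 1)) == 0;
  if (isPow2)
    return idx & (numElts - 1);       // cheap mask
  return std::min(idx, numElts - 1);  // unsigned minimum
}
```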
combineExtractFromVectorLoad no longer uses the vector we're extracting from to determine the pointer offset calculation, allowing us to extract from types that have been bitcast to work with specific target shuffles.
Fixes #85419
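The offset computation can be sketched as follows, assuming the byte offset comes from the extracted element's own type, that is `index * sizeof(element)`. Because the offset no longer depends on how the source vector is typed, an extract from a load that was bitcast (say from `<4 x i64>` to `<8 x i32>`) still addresses the right bytes. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte offset of element 'index' for an element type 'eltSizeInBytes'
// wide, independent of the type the loaded vector originally had.
uint64_t extractByteOffset(uint64_t index, uint64_t eltSizeInBytes) {
  return index * eltSizeInBytes;
}

// Scalar load of a 32-bit element straight from the vector's memory,
// standing in for the narrowed load the combine produces.
uint32_t loadI32Element(const void *vecPtr, uint64_t index) {
  uint32_t v;
  std::memcpy(&v,
              static_cast<const unsigned char *>(vecPtr) +
                  extractByteOffset(index, sizeof v),
              sizeof v);
  return v;
}
```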
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.
This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.
Fixes #48911
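A small illustration of the overflow, using a hypothetical frame layout and numbers: a local 3 GiB away from the stack pointer cannot be represented in a 32-bit `int`, while a 64-bit offset holds it exactly.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical layout: offset of a local from the stack pointer, computed
// in 64 bits so that frames larger than 2^31 bytes do not overflow.
int64_t localOffset(int64_t frameSize, int64_t localFromFrameTop) {
  return frameSize - localFromFrameTop;
}

// True if the offset would still have fit the old 32-bit 'int' representation.
bool fitsInInt32(int64_t offset) {
  return offset >= std::numeric_limits<int32_t>::min() &&
         offset <= std::numeric_limits<int32_t>::max();
}
```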
Adjust logic of 1cb9f37a17ab to match freebsd/freebsd-src@9a4d48a645.
D113443 is the original attempt to bring this FreeBSD patch to
llvm-project, but it never landed. This change is required to build
FreeBSD kernel modules with -fstack-protector using a standard LLVM
toolchain. The FreeBSD kernel loader does not handle
R_X86_64_REX_GOTPCRELX relocations.
Fixes #50932.
When folding (ashr (shl x, c1), c2) we need to treat c1 and c2
as unsigned to find out if the combined shift should be a left
or right shift.
Also do an early out during pre-legalization in case c1 and c2
have different types, as that would otherwise complicate the comparison
of c1 and c2 a bit.
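Here is a scalar model of the fold for i64, assuming c1 is 56, so that shl by c1 followed by ashr sign-extends the low byte. The sign-extended byte is shifted left by c1 - c2 when c1 > c2, and right by c2 - c1 otherwise. The direction test compares the amounts as unsigned values. `ashrShl` and `foldedAshrShl` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: (ashr (shl x, c1), c2) on i64, with c1, c2 < 64.
int64_t ashrShl(int64_t x, unsigned c1, unsigned c2) {
  uint64_t shl = static_cast<uint64_t>(x) << c1;
  return static_cast<int64_t>(shl) >> c2;  // arithmetic shift (guaranteed since C++20)
}

// Folded form for c1 == 56: sign-extend the low byte, then shift once.
// c1 and c2 are compared as unsigned amounts to pick the direction.
int64_t foldedAshrShl(int64_t x, unsigned c2) {
  const unsigned c1 = 56;
  int64_t sext = static_cast<int8_t>(static_cast<uint8_t>(x));
  if (c1 > c2)
    return static_cast<int64_t>(static_cast<uint64_t>(sext) << (c1 - c2));
  return sext >> (c2 - c1);
}
```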
It has been noticed that combineShiftRightArithmetic isn't dealing
properly with large shift amounts, as demonstrated by the test
case added in this commit.
I think the problem is partly related to X86 using i8 as the shift amount
type during ISel, so shift amounts larger than 127 may be treated
as negative shift amounts if we are not careful.
FADD/FSUB/FMUL are usually less port-bound than INSERT_SUBVECTOR, so only concatenate if it reduces the instruction count and doesn't introduce extra INSERT_SUBVECTOR nodes.
We don't want to concat fadd/fsub/fmul if both operands would need concatenating (as the fp op is usually cheaper than the concat), but if at least one operand is free to concat (i.e. constant or extracted from a wider vector), then we should try to concat the fp op.
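The following toy version of the heuristic uses hypothetical operand categories. The real combine weighs instruction counts in more detail. Concatenating the FP op pays off only when at least one operand can be concatenated for free, meaning it is a constant or extracted from a wider vector. Otherwise both operands would need their own INSERT_SUBVECTOR work to save a single FP op.

```cpp
#include <cassert>

// Hypothetical classification of an operand across the two halves being joined.
enum class OperandKind {
  Constant,          // constants can be rebuilt at the wider width for free
  ExtractFromWider,  // halves extracted from a wider vector: free to rejoin
  Other              // would need an explicit INSERT_SUBVECTOR
};

bool isFreeToConcat(OperandKind k) { return k != OperandKind::Other; }

// Decide whether to replace two narrow FADD/FSUB/FMUL nodes with one wide one.
bool shouldConcatFPOp(OperandKind lhs, OperandKind rhs) {
  // If neither operand is free, we would add two concats to save one FP op.
  return isFreeToConcat(lhs) || isFreeToConcat(rhs);
}
```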