I want to use this function for GISel too so Type * is a better common
interface. All of the callers already convert EVT to Type * as needed
by calling lowering anyway.
All of these immediates are signed, as the surrounding comments
indicate. This fixes an assertion failure in
CodeGen/Generic/dag-combine-ossfuzz-crash.ll when run with a
powerpc-aix triple.
Fix all the places I could find that did't do this. We were already
mostly correct for FP_ROUND after
9a976f36615dbe15e76c12b22f711b2e597a8e51, but not STRICT_FP_ROUND.
This is to prevent assertion failures when we disable implicit
truncation in getConstant().
getCanonicalConstSplat() works with a mix of unsigned and signed values,
so I explicitly truncate the APInt there.
This patch fixes the combines for vector_shuffles when either or both of
its left and right hand side inputs are scalar_to_vector nodes.
Previously, when both left and right side inputs are scalar_to_vector
nodes, the current combine could not handle this situation, as the shuffle
mask was updated incorrectly. To temporarily solve this solution, this combine
was simply disabled and not performed.
Now, not only does this patch aim to resolve the previous issue of the
incorrect shuffle mask adjustments respectively, but it also updates any test
cases that are affected by this change.
Patch migrated from https://reviews.llvm.org/D130487.
This patch utilizes getReservedRegs() to find asm clobberable registers.
And to make the result of getReservedRegs() accurate, this patch
implements the todo, which is to make r2 allocatable on AIX for some
leaf functions.
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.
This mirrors changes from #108942 but targeting PowerPC instead of
RISCV. Because both of these PRs modify the same driver functions, this
series is stack on top of the RISC-V one.
---------
Signed-off-by: Keith Packard <keithp@keithp.com>
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
There are no partial vector loads on pwr7 so current v4i8 codegen is an
int load then store to vector sized temp and re-load as vector. Try to
use lfiwax to load 32 bits into an FP reg and take advantage of VSX FP
and vector reg sharing to move the result to the right vector position.
Unfortunately expandIS_FPCLASS is called directly in SelectionDAGBuilder
depending on whether IS_FPCLASS is custom or not. This helps avoid ppc test
regressions in a future patch where the custom lowering would be bypassed.
If v2i64 scalar_to_vector is made custom, llc can crash in certain
legalization cases where v2i64 vectors are injected, even if they
weren't otherwise present. The code generated would be fine, but that
operation is not handled in ReplaceNodeResults. Add handling.
Try to make P7 code with scalar to vector operations that use store/re-load to run smoother on P10 by supplying enough store width to cover the load and allow hardware store forwarding.
In PowerPC ABI, a few initial arguments are passed through registers,
but their places in parameter save area are reserved, arguments passed
by memory goes after the reserved location.
For debugging purpose, we may want to save copy of the pass-by-reg
arguments into correct places on stack. The new option achieves by
adding new function level attribute and make argument lowering part
aware of it.
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.
I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
MachineFunction's probably should not include a backreference to
the owning MachineModuleInfo. Most of these references were used
just to query the MCContext, which MachineFunction already directly
stores. Other contexts are using it to query the LLVMContext, which
can already be accessed through the IR function reference.
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up
miscompiling.
This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.
rdar://131419786
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevent internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts-out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.
This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but does not want the overhead of
uncalled functions. (This adds like a second to the link time trivially)
Summary:
These Libcalls represent which functions are available to the backend.
If a runtime call is not available, the target sets the the name to
`nullptr`. Currently, this logic is spread around the various targets.
This patch pulls all of the locations that disable libcalls into the
intializer. This patch is effectively NFC.
The motivation behind this patch is that currently the LTO handling uses
the list of all runtime calls to determine which functions cannot be
internalized and must be extracted from static libraries. We do not want
this to happen for libcalls that are not emitted by the backend. A
follow-up patch will move out this logic so the LTO pass can know which
rtlib calls are actually used by the backend.
Implementation is NOT compatible with IBM XL C 16.1 and earlier but is
compatible with GCC.
It handles all ByVals with greater alignment then pointer width the same
way IBM XL C handles Byvals
that have vector members. For overaligned objects that do not contain
vectors IBM XL C does not align them properly if they are passed in the
GPR
argument registers.
This patch was originally written by Sean Fertile @mandlebug.
Previously on Phabricator https://reviews.llvm.org/D105659
The Prefix instruction is introduced on PowerPC ISA3_1.
In the PR,
1. The `FeaturePrefixInstrs` do not imply the `FeatureP8Vector`
,`FeatureP9Vector` .
2. `FeaturePrefixInstrs` implies only the FeatureISA3_1.
3. For the prefix instructions `paddi` and `pli` , they have `Predicates
= [PrefixInstrs] `
4. For the prefix instructions `plfs` and `plfd`, they have `Predicates
= [PrefixInstrs, HasFPU] `
5. For the prefix instructions "plxv` , "plxssp` and `plxsd` , they have
`Predicates = [PrefixInstrs, HasP10Vector]`
Fixes#62372
Perform bitcast lowering requires 64-bit to be native supported,
However this is not true on 32-bit targets. Explicitly require
64-bit target.
Fixes#92233
Under some circumstance (library loaded with the main program), TLS
initial-exec model can be applied to local-dynamic access(es). We
could use some simple heuristic to decide the update at function level:
* If there is equal or less than a number of TLS local-dynamic access(es)
in the function, use TLS initial-exec model. (the threshold which default to
1 is controlled by hidden option)
Following the aix-small-local-exec-tls target attribute, this patch adds
a target attribute for an AIX-specific option in llc that informs the
compiler that it can use a faster access sequence for the local-dynamic
TLS model (formally named aix-small-local-dynamic-tls) when TLS
variables are less than ~32KB in size.
The patch either produces an addi/la with a displacement off of module
handle (return value from .__tls_get_mod) when the address is
calculated, or it produces an addi/la followed by a load/store when the
address is calculated and used for further accesses.
---------
Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>
If the V2 of the vector_shuffle is undef, the two vector inputs are
expected to be the same when do the VECINSERT transformation. For now
the first operand of VECINSERT is set to undef which is not right.
This patch fixes this bug.