This patch adds a common lower action for `G_FABS`, which generates `and
x8, x8, #0x7fffffffffffffff` to reset the sign bit. The action does not
support vectors since `G_AND` does not support fp128.
This approach is different than what SDAG is doing. SDAG stores the
value onto stack, clears the sign bit in the most significant byte, and
loads the value back into register. This involves multiple memory ops
and sounds slower.
This patch introduces lowering of the partial add reduction intrinsic to
a udot or svdot for AArch64. This also involves adding a
`shouldExpandPartialReductionIntrinsic` target hook, which AArch64 will
return false from in the cases that it can be lowered.
It may be profitable to revert SCCP propagation of C++ static values,
if such constants are pointers, in order to avoid redundant pointer
computation, since the method returning the constant is non-removable.
These would implicitly cast the register to `unsigned`. Switch most of
them to use printReg will give a more readable output. Change some
others to use Register::id() so we can eventually remove the implicit
cast to `unsigned`.
Summary:
This patch handles the types(MVT) in `selectionDAG` for RISCV vector
tuples.
As described in previous patch handling llvm types, the MVTs also have
32 variants:
```
riscv_nxv1i8x2, riscv_nxv1i8x3, riscv_nxv1i8x4, riscv_nxv1i8x5, riscv_nxv1i8x6, riscv_nxv1i8x7, riscv_nxv1i8x8,
riscv_nxv2i8x2, riscv_nxv2i8x3, riscv_nxv2i8x4, riscv_nxv2i8x5, riscv_nxv2i8x6, riscv_nxv2i8x7, riscv_nxv2i8x8,
riscv_nxv4i8x2, riscv_nxv4i8x3, riscv_nxv4i8x4, riscv_nxv4i8x5, riscv_nxv4i8x6, riscv_nxv4i8x7, riscv_nxv4i8x8,
riscv_nxv8i8x2, riscv_nxv8i8x3, riscv_nxv8i8x4, riscv_nxv8i8x5, riscv_nxv8i8x6, riscv_nxv8i8x7, riscv_nxv8i8x8,
riscv_nxv16i8x2, riscv_nxv16i8x3, riscv_nxv16i8x4,
riscv_nxv32i8x2.
```
Detail:
An intuitive way to model vector tuple type is using nested scalable
vector, e.g. `nElts=NF, EltTy=nxv2i32`. However it's not compatible to
what we've done to handle scalable vector in TargetLowering, so it would
need more effort to change the code to handle this concept.
Another approach is encoding the `MinNumElts` info in `sz` of `MVT`,
e.g.
`nElts=NF, sz=(NF*MinNumElts*8)`, this makes it much easier to handle
and
changes less code.
This patch adopts the latter approach.
Stacked on https://github.com/llvm/llvm-project/pull/97992
Reverts llvm/llvm-project#103371
There is `heap-use-after-free`, commented on
206b5aff44a95754f6dd7a5696efa024e983ac59
Maybe `if (Next == E || BB != Next->getParent()) {` is enough,
but not sure, what was the intent there,
If a lowering changed control flow, resume the legalization
loop at the first newly inserted block.
This will allow incrementally legalizing atomicrmw and cmpxchg.
The AArch64 test might be a bugfix. Previously it would lower
the vector FP case as a cmpxchg loop, but cmpxchgs get lowered
but previously weren't. Maybe it shouldn't be reporting cmpxchg
for the expand type in the first place though.
When getting a neutral value, we can prefer using a positive zero over a
negative zero if nsz is set on the FADD (or reduction). A positive zero
should be cheaper to materialize on basically all targets.
Arguably, we should be doing this kind of canonicalization in
DAGCombine, but we don't do that for any of the other reduction
variants, so this seems like path of least resistance. This does mean
that we can only do this for "fast" reductions. Just nsz isn't enough,
as that goes through the SEQ_FADD path where the IR level start value
isn't folded away.
If folks think this is to RISCV specific, let me know. There's a trivial
RISCV specific implementation. I went with the generic one as I through
this might benefit other targets.
Adds support for wider-than-legal vector types for the histogram
intrinsic (llvm.experimental.vector.histogram.add) by splitting the
vector. Also adds integer promotion for the Inc operand.
This preserves the semantis of fneg and matches what we do in
LegalizeDAG.
I kept the legal FSUB check to force unrolling for some targets that
don't have FSUB but have XOR. On Aarch64, using xor broke some tests that
expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg
(extractvectorelt X)))))) pattern.
GCC supports code like "asm volatile ("" : "=r" (i) : "0" (f))" where i
is integer type and f is floating point type. Currently this code
produces an error with Clang. The change allows mixed scalar types
between input and output constraints.
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
These lists are quite static and several of the parameters are actually
constant across all users. Heavy use of macros is undesirable, and not
idiomatic in LLVM, so let's just use the naive switch cases.
I'll probably continue with removing the other property macros. These
two just happened to be the two I actually had to figure out for an
unrelated change.
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
Fixes the previous buildbot error by adding an explicit triple to the test,
ensuring that llc can produce a valid object file.
This reverts commit 926f0979af4f6172d4ed3dea5603aa97c800bef1.
Reverted (along with the NFC followup fix) due to buildbot failure:
https://lab.llvm.org/buildbot/#/builders/160/builds/4142
This reverts commit 3ef37e2f8f672393ee409fde8309198df0981735, and commit
616f7d3d4f6d9bea6f776e357c938847e522a681.
Fixes: https://github.com/llvm/llvm-project/issues/104695
This patch adds the is_stmt flag to line table entries for the first
instruction with a non-0 line location in each basic block, to ensure
that it will be used for stepping even if the last instruction in the
previous basic block had the same line number; this is important for
cases where the new BB is reachable from BBs other than the preceding
block.
For some reason, isOperationLegalOrCustom is not the same as
isOperationLegal || isOperationCustom. Unfortunately, it checks
if the type is legal which makes it uesless for custom lowering
on non-legal types (which is always ppcf128).
Really the DAG builder shouldn't be going to expand this in the
builder, it makes it difficult to work with. It's only here to work
around the DAG requiring legal integer types the same size as
the FP type after type legalization.
This patch is moving out stepvector intrinsic from the experimental
namespace.
This intrinsic exists in LLVM for several years now, and is widely used.
Tail duplication will generate the redundant move before return. It is
because the MachineCopyPropogation can't recognize COPY after post-RA
pseudoExpand.
This patch make MachineCopyPropogation recognize `%0 = ADDI %1, 0` as
COPY
LLVM often extends global names by adding suffixes to distinguish unique
identities. However, these suffixes are not always stable across
different runs and build environments. To address this issue, I
implemented `get_stable_name` to ignore such suffixes and obtain the
original name. This approach is not new, as PGO or Bolt already handle
this issue similarly. Using the stable name obtained from
`get_stable_name`, I implemented `stable_hash_name` while utilizing the
same underlying `xxh3_64bit` algorithm as before.
This patch prepares the NFC groundwork for global outlining using
CGData, which will follow
https://github.com/llvm/llvm-project/pull/90074.
- The `MinRepeats` parameter is now explicitly passed to the
`getOutliningCandidateInfo` function, rather than relying on a default
value of 2. For local outlining, the minimum number of repetitions is
typically 2, but for the global outlining (mentioned above), we will
optimistically create a single `Candidate` for each `OutlinedFunction`
if stable hashes match a specific code sequence. This parameter is
adjusted accordingly in global outlining scenarios.
- I have also implemented `unique_ptr` for `OutlinedFunction` to ensure
safe and efficient memory management within `FunctionList`, avoiding
unnecessary implicit copies.
This depends on https://github.com/llvm/llvm-project/pull/101461.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
Currently, `getStackAlignment` asserts if the stack alignment wasn't
specified. This makes it inconvenient to use and complicates testing.
This change also makes `exceedsNaturalStackAlignment` method redundant.
In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds.
Fixes#106202
Use hasPhys instead of MCRegister::isPhysicalRegister.
I think the MCRegister returned from getPhys can only contain a physical
register or 0. hasPhys checks that the register returned from getPhys is non-zero.
So I think they are equivalent in this usage.
When a pattern is matched in TableGen, a check is run called
isObviouslySafeToFold(). One of the condition that it checks for is
whether the instructions that are being matched are consecutive, so the
instruction's insertion point does not change.
This patch allows the movement of the insertion point of a load
instruction if none of the intervening instructions are stores or have
side-effects.
Backend:
- Caller and callee arguments no longer have to match, just to take up the same space, as they can be changed before the call
- Allowed tail calls if callee and callee both (or neither) use sret, wheras before it would be dissalowed if either used sret
- Allowed tail calls if byval args are used
- Added debug trace for IsEligibleForTailCallOptimisation
Frontend (clang):
- Do not generate extra alloca if sret is used with musttail, as the space for the sret is allocated already
Change-Id: Ic7f246a7eca43c06874922d642d7dc44bdfc98ec
The renamable flag is useful during MachineCopyPropagation but renamable
flag will be dropped after lowerCopy in some case.
This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.