We cannot query this attribute from a subtarget given a machine function.
At this point the attribute itself is already unavailable and can only be
obtained through MFI.
Differential Revision: https://reviews.llvm.org/D46781
llvm-svn: 332166
Summary:
We have no logic to promote alloca to vector for an AddrSpaceCast instruction.
Reviewer:
arsenm
Differential Revision:
https://reviews.llvm.org/D45993
llvm-svn: 332147
Remove a useless SwitchSection which also causes compilation failure
when IR contains comdat.
The SwitchSection is useless because the current section is already
the correct text section for the function, so there is no need to switch.
It causes compilation failure for comdat because functions with comdat
have a specific text section, not the default .text section.
Since HIP uses comdat, this bug caused failures for HIP.
Differential Revision: https://reviews.llvm.org/D46770
llvm-svn: 332137
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).
Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.
ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.
This was suggested by Adrian in https://reviews.llvm.org/D45995.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46216
llvm-svn: 332119
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.
The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46158
llvm-svn: 332118
These directives allow the 'C' (compressed) extension to be enabled/disabled
within a single file.
Differential Revision: https://reviews.llvm.org/D45864
Patch by Kito Cheng
llvm-svn: 332107
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction. The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:
- an assert when lowering the LD1LANE ISD node since it assumes a
constant operand
- an assert in isel if the lane index value depends on the
post-incremented base register
Both of these issues are avoided by simply checking that the lane index
is a constant.
Fixes bug 35822.
Reviewers: t.p.northover, javed.absar
Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46591
llvm-svn: 332103
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.
So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.
We may be able to drop some of the old patterns, but I leave that for a future patch.
llvm-svn: 332049
Summary: The final -wasm component has been the default for some time now.
Subscribers: jfb, dschuff, jgravelle-google, eraman, aheejin, JDevlieghere, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D46342
llvm-svn: 332007
With nnan, there's no need for the masked merge / blend
sequence (that probably costs much more than the min/max
instruction).
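
As a rough illustration of why the blend is only needed when NaNs are
possible, the sketch below models a compare-and-select style max (the
second operand is returned whenever the comparison is false, which includes
NaN inputs) and the extra NaN fixup needed to match fmax(). The names
selectStyleMax, maxWithNaNBlend and checkMaxModels are hypothetical and are
not taken from the patch; this is a model of the semantics, not the
backend's lowering.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of a compare-and-select max instruction: when the
// comparison is false (including when either input is NaN), the second
// operand is returned.
inline double selectStyleMax(double a, double b) { return a > b ? a : b; }

// Hypothetical model of the extra blend: if b is NaN, pick a instead, which
// matches the fmax() rule of returning the non-NaN operand.
inline double maxWithNaNBlend(double a, double b) {
  return std::isnan(b) ? a : selectStyleMax(a, b);
}

// Self-check comparing both models against std::fmax.
inline void checkMaxModels() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  // Without NaNs, the bare instruction already agrees with fmax().
  assert(selectStyleMax(1.0, 2.0) == std::fmax(1.0, 2.0));
  assert(selectStyleMax(2.0, 1.0) == std::fmax(2.0, 1.0));
  // With a NaN second operand, the bare instruction returns NaN.
  assert(std::isnan(selectStyleMax(1.0, nan)));
  // The blend restores fmax() behaviour for NaN inputs.
  assert(maxWithNaNBlend(1.0, nan) == std::fmax(1.0, nan));
  assert(maxWithNaNBlend(nan, 1.0) == std::fmax(nan, 1.0));
}
```

Calling checkMaxModels() exercises both cases. Signed zeros are left out on
purpose, since fmax() may return either zero.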
Somewhere between clang 5.0 and 6.0, we started producing
these intrinsics for fmax()/fmin() in C source instead of
libcalls or fcmp/select. The backend wasn't prepared for
that, so we regressed perf in those cases.
Note: it's possible that other targets have similar problems
as seen here.
Noticed while investigating PR37403 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=37403
The IR FMF propagation cases still don't work. There's
a proposal that might fix those cases in D46563.
llvm-svn: 331992
Clang 6.0 was updated to create these intrinsics rather than
libcalls or fcmp/select, but the backend wasn't prepared to
handle that optimally.
This bug is not the primary reason for PR37403:
https://bugs.llvm.org/show_bug.cgi?id=37403
...but it's probably more important for x86 perf.
llvm-svn: 331988
Summary:
The node produced by the combine in rebuildSetCC may itself be combined
into another node, leaving our references stale. Keep a handle on
it to avoid stale references.
Fixes PR36602.
Reviewers: dbabokin, RKSimon, eli.friedman, davide
Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D46404
llvm-svn: 331985
Const/local/shared address spaces are all < 4GB and we can always use
32-bit pointers to access them. This has a substantial performance impact
on kernels that use shared memory for intermediary results.
The feature is disabled by default.
Differential Revision: https://reviews.llvm.org/D46147
llvm-svn: 331941
The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its
value as a (small) unsigned integer, therefore it's incorrect to widen it
in any way but by zero extending it.
G_SHL was using G_ANYEXT and G_ASHR was using G_SEXT (which is correct for
their destination and first source operands, but not for the "number of
bits to shift" operand).
Generally, shifts aren't as similar to regular binary operations as it
might seem: for instance, they are neither commutative nor associative, and
the second source operand usually requires special treatment.
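
A minimal sketch of the problem, assuming a hypothetical 3-bit shift amount
widened to 32 bits (the width and all function names below are made up for
illustration and do not come from the patch): only zero extension keeps the
amount's unsigned value, while sign extension and any-extension can turn a
valid amount into a huge one.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend a hypothetical 3-bit value to 32 bits.
constexpr std::uint32_t zeroExtend3(std::uint32_t narrow) {
  return narrow & 0x7u;
}

// Sign-extend a hypothetical 3-bit value to 32 bits.
constexpr std::uint32_t signExtend3(std::uint32_t narrow) {
  return (narrow & 0x4u) ? ((narrow & 0x7u) | ~0x7u) : (narrow & 0x7u);
}

// Any-extend: the upper bits are unspecified, modelled here by
// caller-supplied garbage.
constexpr std::uint32_t anyExtend3(std::uint32_t narrow,
                                   std::uint32_t garbage) {
  return (narrow & 0x7u) | (garbage & ~0x7u);
}

// Self-check: the amount 5 (0b101) has its top bit set in 3 bits.
inline void checkShiftAmountWidening() {
  assert(zeroExtend3(5) == 5u);                        // still "shift by 5"
  assert(signExtend3(5) == 0xFFFFFFFDu);               // no longer 5
  assert(anyExtend3(5, 0xABCD0000u) == 0xABCD0005u);   // no longer 5
  assert((1u << zeroExtend3(5)) == 32u);               // correct shift
}
```

The sketch never shifts by the sign-extended or any-extended amounts,
because shifting a 32-bit value by 32 or more is undefined in C++.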
Reviewers: bogner, javed.absar, aivchenk, rovka
Reviewed By: bogner
Subscribers: igorb, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46413
llvm-svn: 331926
If a multiply is truncated, SimplifyDemandedBits
sometimes turns a zero_extend of the inputs into an
any_extend, which makes the known bits computation unhelpful.
Ignore these and compute known bits for the underlying value,
since we insert the correct extend type after.
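
The reasoning can be sketched as follows, assuming 16-bit inputs extended to
32 bits and a result truncated back to 16 bits (these widths and the helper
names are illustrative assumptions, not taken from the patch). The garbage
upper bits of an any-extended input hide the fact that the underlying value
is small, yet they never affect the truncated product.

```cpp
#include <cassert>
#include <cstdint>

// Model of any_extend from 16 to 32 bits: the upper half is unspecified,
// represented here by caller-supplied garbage.
constexpr std::uint32_t anyExtend16(std::uint16_t v, std::uint32_t garbage) {
  return static_cast<std::uint32_t>(v) | (garbage << 16);
}

// Low 16 bits of a 32-bit product (the "truncated multiply").
constexpr std::uint16_t truncMul(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint16_t>(a * b);
}

// Upper half of a 32-bit value; zero means "known to fit in 16 bits".
constexpr std::uint32_t upperHalf(std::uint32_t v) { return v >> 16; }

// Self-check of the two observations above.
inline void checkAnyExtendMul() {
  const std::uint32_t a = anyExtend16(0x1234, 0xBEEF);
  const std::uint32_t b = anyExtend16(0x00FF, 0xDEAD);
  // Looking at the extended values, the upper bits appear to be in use.
  assert(upperHalf(a) != 0 && upperHalf(b) != 0);
  // The truncated product only depends on the underlying 16-bit values.
  assert(truncMul(a, b) == truncMul(0x1234, 0x00FF));
}
```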
llvm-svn: 331919
This is an extension of an existing combine to reduce wider
shls if the result fits in the final result type. This
introduces the same combine, but reduces the shift to a middle
sized type to avoid the slow 64-bit shift.
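
A small sketch of the equivalence, assuming a 64-bit source, a 32-bit middle
type, and a 16-bit final result (the concrete widths and the function names
are assumptions for illustration only). For shift amounts below 32, the low
16 bits are the same whether the shift is done in 64 or in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Original form: shift in 64 bits, then truncate to 16 bits.
constexpr std::uint16_t shlWide(std::uint64_t x, unsigned amount) {
  return static_cast<std::uint16_t>(x << amount);
}

// Reduced form: truncate to 32 bits first, shift there, then truncate to
// 16 bits. Only valid for amount < 32.
constexpr std::uint16_t shlMiddle(std::uint64_t x, unsigned amount) {
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(x) << amount);
}

// Self-check over all shift amounts that are valid for the 32-bit shift.
inline void checkShlReduction() {
  const std::uint64_t x = 0x123456789ABCDEF0ULL;
  for (unsigned amount = 0; amount < 32; ++amount)
    assert(shlWide(x, amount) == shlMiddle(x, amount));
}
```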
llvm-svn: 331916
If the truncate is only accessing the first element of the vector,
we can use the original source value.
This helps with some combine ordering issues after operations are
lowered to integer operations between bitcasts of build_vector.
In particular it stops unnecessarily materializing the unused
top half of a vector in some cases.
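
As an illustration, assume a two-element vector of 16-bit values bitcast to
a 32-bit integer, with element 0 occupying the low bits (the element width,
lane order, and helper names are assumptions made for this sketch).
Truncating the bitcast result to 16 bits then yields element 0 unchanged, so
the other element never needs to be materialized.

```cpp
#include <cassert>
#include <cstdint>

// Model of (bitcast (build_vector e0, e1)) to a 32-bit integer, assuming
// element 0 lands in the low 16 bits.
constexpr std::uint32_t bitcastBuildVector(std::uint16_t e0,
                                           std::uint16_t e1) {
  return static_cast<std::uint32_t>(e0) |
         (static_cast<std::uint32_t>(e1) << 16);
}

// Model of truncating a 32-bit integer to 16 bits.
constexpr std::uint16_t truncTo16(std::uint32_t v) {
  return static_cast<std::uint16_t>(v);
}

// Self-check: the truncate only observes element 0.
inline void checkTruncOfBuildVector() {
  assert(truncTo16(bitcastBuildVector(0xBEEF, 0xDEAD)) == 0xBEEF);
  // Changing element 1 does not change the truncated result.
  assert(truncTo16(bitcastBuildVector(0xBEEF, 0x0000)) == 0xBEEF);
}
```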
llvm-svn: 331909
Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification.
This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations.
llvm-svn: 331896
MOVNTPD/MOVNTPS should be WriteFStore
Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default; its cost gets increased but is still nowhere near the real cost of that nasty instruction.
llvm-svn: 331864
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.
llvm-svn: 331846
In order to set breakpoints on labels and list source code around
labels, we need to collect debug information for labels, i.e., the label
name, the function the label belongs to, the line number in the file, and
the address at which the label is located. To keep this information in
LLVM IR and to allow the backend to generate debug information correctly,
we create a new kind of metadata for labels, DILabel. The format
of DILabel is
!DILabel(scope: !1, name: "foo", file: !2, line: 3)
We want to keep as much debug information as possible even when the
code is optimized. So, we create a new kind of intrinsic for label
metadata to prevent the metadata from being eliminated along with its
basic block. The intrinsic survives as long as it is not itself
optimized out. The format of the intrinsic is
llvm.dbg.label(metadata !1)
It has only one argument, the DILabel metadata. The intrinsic
immediately follows the label. The backend can get the
label metadata through the intrinsic's parameter.
We also create a DIBuilder API for labels to be used by frontends.
A frontend can use createLabel() to allocate DILabel objects and
insertLabel() to insert the llvm.dbg.label intrinsic in LLVM IR.
Differential Revision: https://reviews.llvm.org/D45024
Patch by Hsiangkai Wang.
llvm-svn: 331841
Refactoring the LegalizerHelper::widenScalar member function, reducing its
size by approximately a factor of 2 and (hopefully) making it more
straightforward and regular by introducing widenScalarSrc and
widenScalarDst helper methods.
The new widenScalar* methods mutate the instructions in place instead
of recreating them from scratch and removing the originals. The
compile time implications of this were measured on sqlite3
amalgamation, targeting AArch64 in -O0:
LegalizerHelper::widenScalar: > 25% faster
Legalizer::runOnMachineFunction: ~ 4.0 - 4.5% faster
Also adding MachineOperand::setCImm and refactoring out
MachineIRBuilder::recordInsertion methods to make the change possible.
Reviewers: aditya_nandakumar, bogner, javed.absar, t.p.northover, ab, dsanders, arsenm
Reviewed By: aditya_nandakumar
Subscribers: wdng, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46414
llvm-svn: 331819