The physical register in the asm has the wrong type for the declared
IR. It seems to work in the DAG by extracting the 4 elements that are
defined in the IR from the register, but that isn't handled here. This
doesn't seem to be a well-tested path, since other mismatched cases
crash the DAG asm handling.
A shift-left > 63 triggers a UBSAN failure. This patch kicks the can
down the road (to the consumer) by emitting a more compact
representation of the shift computation in DWARF expressions.
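For illustration (a C++ sketch of the underlying problem, not the patch
itself): shifting a 64-bit value by 64 or more bits is undefined
behavior, which is what eagerly evaluating such a DWARF shift would do;
emitting the shift operation into the expression defers evaluation to
the consumer.
```
#include <cstdint>

// A plain `Val << Amt` is UB for Amt >= 64; a producer that eagerly
// evaluates the shift would need a guard like this, hence deferring the
// computation to the DWARF consumer instead.
uint64_t safeShl64(uint64_t Val, unsigned Amt) {
  return Amt >= 64 ? 0 : Val << Amt;
}
```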
Differential Revision: https://reviews.llvm.org/D118183
DIStringType is used to encode the debug info of a character object
in Fortran. A Fortran deferred-length character object is typically
implemented as a pair of two pieces of information: the address of the
raw storage of the characters and the length of the object.
The stringLocationExp field contains the DIExpression to get to the
raw storage.
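As a rough picture, the object being described looks like the following
minimal sketch, assuming a typical Fortran runtime layout (the struct
and field names are illustrative, not taken from the patch):
```
#include <cstddef>

struct DeferredLenChar {
  char *Data;      // raw storage; stringLocationExp describes how a
                   // debugger reaches this address (DW_AT_data_location)
  std::size_t Len; // current length of the character object
};
```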
This patch also enables the emission of the DW_AT_data_location attribute
in a DW_TAG_string_type debug info entry based on stringLocationExp
in DIStringType.
A test is also added to ensure that the bitcode reader is backward
compatible with the old DIStringType format.
Differential Revision: https://reviews.llvm.org/D117586
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compatibility.
The loop below the changed line assumes that the element
width of the target constant is the same as the element
width of the loaded value, but that is not always true.
We could try harder to do some kind of min/max calc even
if the sizes don't match, but that can be another patch
if needed. This fixes #53401 (miscompile) and does not
change the motivating cases added when this analysis
was introduced:
ad298f86b7ad2a
Currently the not (xor_one_use) pattern is always selected to S_XNOR regardless of the node divergence.
This relies on a later custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 (V_XOR_B32)
on those targets which have no V_XNOR.
This change enables patterns which explicitly select the not (xor_one_use) to the appropriate form.
We assume that xor (not) has already been turned into not (xor) by the combiner.
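The identity relied on here, in scalar C++ form (illustrative):
```
#include <cstdint>

// s_xnor computes ~(a ^ b); on targets without V_XNOR the same value is
// produced as V_NOT_B32 of V_XOR_B32, i.e. the not (xor) form selected here.
uint32_t xnor32(uint32_t A, uint32_t B) {
  return ~(A ^ B);
}
```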
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D116270
This is the unsigned variant of D111976, where we convert a clamped
fptoui to a fptoui.sat. Because we are unsigned, the condition this time
is only a UMIN against UINT_MAX. As in D111976, it handles ISD::UMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes.
This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.
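In scalar terms, the conversion being formed behaves like this sketch of
fptoui.sat semantics (illustrative C++, not the DAG code itself):
```
#include <cstdint>

// Saturating float-to-unsigned: NaN and non-positive values go to 0,
// values too large go to UINT32_MAX, everything else converts normally.
// A fptoui clamped by a UMIN against UINT_MAX matches these semantics.
uint32_t fptouiSat32(double X) {
  if (!(X > 0.0))
    return 0; // covers NaN (comparison is false) and non-positive values
  if (X >= 4294967296.0) // 2^32
    return UINT32_MAX;
  return static_cast<uint32_t>(X);
}
```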
Differential Revision: https://reviews.llvm.org/D114964
When evicting interference, an assertion failure is triggered
since LiveIntervals::intervalIsInOneMBB assumes that its input
is not empty.
This patch fixes the bug mentioned in D118020.
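The shape of the fix, sketched (hedged; identifier names are
illustrative, not the verbatim patch):
```
// Skip empty live intervals before querying
// LiveIntervals::intervalIsInOneMBB, which asserts otherwise.
if (LI.empty())
  continue;
```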
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D118124
During fast-isel, calling 'markFunctionEnd' in the base class will call
tidyLandingPads. This can cause an issue where we have determined that
we need ehinfo and emitted a traceback table with the bits set to
indicate that we will be emitting the ehinfo, but the tidying deletes
all landing pads. In this case we end up emitting a reference to the
__ehinfo.N symbol but no definition of it, and the resulting file fails
to assemble.
Differential Revision: https://reviews.llvm.org/D117040
An sra is basically sign-extending a narrower value. Fold away the
shift by doing a sextload of a narrower value, when it is legal to
reduce the load width accordingly.
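A scalar illustration of the fold (little-endian layout assumed, and an
arithmetic right shift for negative values, which mainstream compilers
provide; this is a sketch, not the DAG code):
```
#include <cstdint>

// (sra (load i32 p), 24) only uses the top byte of the loaded word...
int32_t sraOfLoad(const int32_t *P) {
  return *P >> 24;
}

// ...so it can become a sign-extending load of just that byte, which on
// little-endian lives at offset 3.
int32_t narrowedSextLoad(const int32_t *P) {
  const int8_t *Bytes = reinterpret_cast<const int8_t *>(P);
  return static_cast<int32_t>(Bytes[3]); // sextload i8 -> i32
}
```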
Differential Revision: https://reviews.llvm.org/D116930
This patch adds widening support for ISD::VP_MERGE, which widens
identically to VP_SELECT and similarly to other select-like nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118030
This patch adds splitting support for ISD::VP_MERGE, which splits
identically to VP_SELECT and similarly to other select-like nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118032
Split these nodes in a similar way as their masked versions.
Reviewed By: frasercrmck, craig.topper
Differential Revision: https://reviews.llvm.org/D117760
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
DwarfCompileUnit::getOrCreateSourceID() is often called many times
in sequence with the same DIFile. This is currently very expensive,
because it involves creating a string from directory and file name
and looking it up in a string map. This patch remembers the last
DIFile and its ID and directly returns that.
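A minimal sketch of the one-entry cache (member and helper names are
illustrative, not the exact patch):
```
struct DIFile; // stand-in for llvm::DIFile
unsigned lookupSourceID(const DIFile *File); // assumed existing slow path

const DIFile *LastFile = nullptr;
unsigned LastFileID = 0;

// Repeated queries for the same DIFile skip the string construction and
// string-map lookup entirely.
unsigned getOrCreateSourceID(const DIFile *File) {
  if (File == LastFile)
    return LastFileID;
  LastFile = File;
  LastFileID = lookupSourceID(File);
  return LastFileID;
}
```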
This gives a geomean -1.3% compile-time improvement on CTMark O0-g.
Differential Revision: https://reviews.llvm.org/D118041
This matches the actual runtime function more closely.
I considered also renaming both RetainRV/UnsafeClaimRV to end with
"ARV", for AutoreleasedReturnValue, but there's less potential
for confusion there.
Over in the comments for D116821, some use-cases have cropped up where
there's a substantial increase in memory usage. A quick inspection
shows that a) it's a lot of memory and b) there are several things to
be done to reduce it. Reverting (via disabling this feature by default)
to avoid bothering people in the meantime.
Given that step_vector is practically a constant, doing this early
helps with DAGCombine folds that happen before type legalization.
There is currently no way to test that this happens earlier, although existing
tests for step_vector folds continue to protect the folds happening at all.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D117863
If the bitreverse gets expanded, it will introduce a new bswap. By
putting a bswap before the bitreverse, we can ensure it gets cancelled
out when this happens.
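The identity behind the cancellation, in illustrative C++: a full
bitreverse is a per-byte bit reversal followed by a bswap, so an
adjacent bswap cancels once the bitreverse is expanded.
```
#include <cstdint>

// Reverse the bits inside each byte, leaving byte positions unchanged.
uint32_t perByteBitreverse(uint32_t X) {
  uint32_t R = 0;
  for (int B = 0; B < 4; ++B) {
    uint8_t Byte = (X >> (8 * B)) & 0xff, Rev = 0;
    for (int I = 0; I < 8; ++I)
      Rev |= ((Byte >> I) & 1) << (7 - I);
    R |= uint32_t(Rev) << (8 * B);
  }
  return R;
}

// bitreverse(X) == bswap(perByteBitreverse(X)), so bswap(bitreverse(X))
// collapses to perByteBitreverse(X) when the bitreverse is expanded.
uint32_t bitreverse32(uint32_t X) {
  return __builtin_bswap32(perByteBitreverse(X));
}
```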
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D118012
This can show up when bitreverse is expanded to bswap and
swap of bits within a byte. If the input is already a bswap, we
should cancel them out before we further transform them in a way
that makes it harder to see the redundancy.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D118007
Only using that change in StringRef already decreases the number of
preprocessed lines from 7837621 to 7776151 for LLVMSupport.
Perhaps more interestingly, it shows that many files were relying on the
inclusion of StringRef.h to have the declaration from STLExtras.h. This
patch tries hard to patch the relevant parts of llvm-project impacted by
the removal of this hidden dependency.
Potential impact:
- "llvm/ADT/StringRef.h" no longer includes <memory>,
"llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h"
Related Discourse thread:
https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831
In code review for D117104 two slightly weird checks were found
in DAGCombiner::reduceLoadWidth. They were typically checking
if BitsA was a multiple of BitsB by looking at (BitsA & (BitsB - 1)),
but such a comparison actually only makes sense if BitsB is a power
of two.
The checks were related to the code that attempted to shrink a load
based on the fact that the loaded value would be right shifted.
Afaict the legality of the value types is checked later (typically in
isLegalNarrowLdSt), so the existing checks were both overly
conservative as well as being wrong whenever ExtVTBits wasn't a
power of two. The latter was a situation triggered by a number of
lit tests, so we could not just assert on ExtVTBits being a power of
two.
When attempting to simply remove the checks I found some problems
that seem to have been guarded by the checks (maybe just out of
luck). A typical example would be a pattern like this:
t1 = load i96* ptr
t2 = srl t1, 64
t3 = truncate t2 to i64
When DAGCombine is visiting the truncate, reduceLoadWidth is called
attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then
the SRL is detected and we set ShAmt to 64.
In the past we've bailed out due to i96 not being a multiple of 64.
If we simply remove that check then we would end up replacing the
load with a new load that would read 64 bits but with a base pointer
adjusted by 64 bits. So we would read 32 bits that weren't accessed by
the original load.
This patch will instead utilize the fact that the logical right shift
(srl) can be folded away by using a zextload. Thus, the pattern above
will now be combined into
t3 = load i32* ptr+offset, zext to i64
Another case is shown in the X86/shift-folding.ll test case:
t1 = load i32* ptr
t2 = srl i32 t1, 8
t3 = truncate t2 to i16
In the past we bailed out due to the shift count (8) not being a
multiple of 16. Now the narrowing kicks in and we get
t3 = load i16* ptr+offset
Differential Revision: https://reviews.llvm.org/D117406
EmitSchedule() shouldn't be touching instructions after the provided
insertion point. The change introduced in D83561 performs a scan to
the end of the block, and thus may move unrelated instructions. In
particular, this ends up moving instructions that have been produced
by FastISel and will later be deleted. Moving them means that more
instructions than intended are removed.
Fix this by stopping the iteration when the insertion point is
reached.
Fixes https://github.com/llvm/llvm-project/issues/53243.
Differential Revision: https://reviews.llvm.org/D117489
SelectionDAG::getNode() canonicalises constants to the RHS if the
operation is commutative, but it doesn't do so for constant splat
vectors. Doing this early helps make certain folds on vector types,
simplifying the code required for target DAGCombines that are enabled
before Type legalization.
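The shape of the change, sketched (hedged; the real code lives in
SelectionDAG::getNode and the helper names here are illustrative):
```
#include <utility>

struct SDValue { const void *Node = nullptr; }; // stand-in for llvm::SDValue
bool isCommutative(unsigned Opcode);            // assumed helper
bool isConstantSplat(const SDValue &V);         // assumed helper

// For a commutative opcode, move a constant-splat operand from the LHS
// to the RHS, mirroring the existing scalar-constant canonicalization.
void canonicalizeOperands(unsigned Opcode, SDValue &N1, SDValue &N2) {
  if (isCommutative(Opcode) && isConstantSplat(N1) && !isConstantSplat(N2))
    std::swap(N1, N2);
}
```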
Somewhat to my surprise, DAGCombine doesn't seem to traverse the
DAG in a post-order DFS, so at the time of doing some custom fold where
the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the
constant splat to the RHS.
This patch leads to a few improvements, but also a few minor regressions,
which I traced down to D46492. When I tried reverting this change to see
if the changes were still necessary, I ran into some segfaults. Not sure
if there is some latent bug there.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117794
This change folds (or (shl x, C0), (lshr y, C1)) to a funnel shift iff C0
and C1 are constants where C0 + C1 is the bit width of the shift
instructions.
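In scalar C++ the equivalence looks like this (illustrative; fshl shown
for a 32-bit width):
```
#include <cstdint>

// With C0 + C1 == 32, (x << C0) | (y >> C1) equals fshl(x, y, C0): shift
// the 64-bit concatenation x:y left by C0 and keep the high 32 bits.
uint32_t orOfShifts(uint32_t X, uint32_t Y, unsigned C0) {
  unsigned C1 = 32 - C0;        // the iff condition: C0 + C1 == bit width
  return (X << C0) | (Y >> C1); // requires 0 < C0 < 32
}
```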
Differential Revision: https://reviews.llvm.org/D116529
LLVM DebugInfo CodeGen synthesizes type declarations in type units when
referencing types that are not in type units. When those synthesized
types are templates and simplified template names (or mangled simplified
template names) are in use, the template arguments must be attached to
those declarations.
A deeper fix (with a CU or DICompositeType flag) that would also support
other uses of clang's -debug-forward-template-args (such as Sony's
platform) could/should be implemented to fix this more broadly.
Doing this causes a declaration of the internal linkage (anonymous
namespace) type to be emitted in the type unit, which would then be
ambiguous as to which internal linkage definition it refers to (since
the name is only valid internally).
It's possible these internal linkage types could be resolved relative to
the unit the TU is referred to from, but that doesn't seem ideal, and
there's no reason to put the type in a type unit since it can only be
defined in one CU anyway (otherwise it'd be an ODR violation); avoiding
the type unit should also yield a smaller DWARF encoding.
This also addresses an issue with Simplified Template Names where the
template parameter could not be rebuilt from the declaration emitted
into the TU (specifically for an enum non-type template parameter, where
looking up the enumerators is necessary to rebuild the full template
name).
Fixes a parity codegen issue: when we know all but the lowest bit is zero, we can replace the ICMPNE-with-0 comparison with an ext/trunc.
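A scalar illustration (hedged sketch, not the actual DAG code):
```
#include <cstdint>

// Parity via popcount: every bit of (popcount & 1) above bit 0 is known
// zero, so "(popcount & 1) != 0" carries no more information than bit 0
// itself and can be lowered as a truncation to i1 instead of a compare.
bool parity32(uint32_t X) {
  return static_cast<bool>(__builtin_popcount(X) & 1); // trunc, no icmp
}
```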
Differential Revision: https://reviews.llvm.org/D117983
Pulled out of D106237, this folds truncstore(extend(x)) back to store(x)
if the original store was legal. This can come up due to the order in
which we fold nodes. A fold from X86 needs to be adjusted to prevent
infinite loops, by having it pick the operand of a trunc more directly.
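In scalar terms (illustrative), the round trip being removed is:
```
#include <cstdint>

// truncstore(extend(x)) stores only the low bits of the extended value,
// which are just x, so it folds back to a plain store(x).
void storeRoundTrip(int16_t *P, int16_t V) {
  int32_t Ext = V;                // extend(x)
  *P = static_cast<int16_t>(Ext); // truncstore -> same as *P = V
}
```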
Differential Revision: https://reviews.llvm.org/D117901
Fix PR53163 by rounding the byte size of DW_TAG_base_type types up. Without
this fix we risk emitting types with a truncated size (including rounding
less-than-byte-sized types' sizes down to zero).
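A hedged sketch of the rounding (the patch's exact code may differ):
```
#include <cstdint>

// Round a bit size up to whole bytes; truncating division would give a
// 1-bit type byte size 0, and a 12-bit type byte size 1 instead of 2.
uint64_t byteSizeOf(uint64_t SizeInBits) {
  return (SizeInBits + 7) / 8;
}
```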
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D117124
This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019
The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass, which generates a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targeting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching).
Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate live-patching long jumps. Please see: d703b92296/lld/COFF/Writer.cpp (L1298)
The outcome is that we can finally use Live++ or Recode along with clang-cl.
NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default; if we did the same, we might generate sub-optimal code whenever the flag isn't actually needed. Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the MSVC command line "cl /c file.cpp" would have to be written with Clang as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result.
Depends on D43002, D80833 and D81301 for the full feature.
Differential Revision: https://reviews.llvm.org/D116511
The GlobalISel combiner currently uses sign extension when manipulating
the LHS constant when combining the following sequence of
machine instructions into a single constant:
```
%0:_(s32) = G_CONSTANT i32 <CONSTANT>
%1:_(p0) = G_INTTOPTR %0:_(s32)
%2:_(s64) = G_CONSTANT i64 <CONSTANT>
%3:_(p0) = G_PTR_ADD %1:_, %2:_(s64)
```
This causes an issue when the bit width of the first constant and the
target pointer size are different, as G_INTTOPTR has no sign extension
semantics.
This patch fixes this by capturing an arbitrary-precision integer when
matching the constant, allowing the matching function to correctly
zero-extend it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D116941
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run but where clang can
otherwise be built.
To simplify such scenarios given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case-by-case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.
This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.
The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.
Differential Revision: https://reviews.llvm.org/D117752
Instead of constructing DebugVariables and looking up the order
in the comparison function, compute the order upfront and then sort
a vector of (order, instr).
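Sketched in C++ (names illustrative; the comparator no longer
reconstructs DebugVariables, since the key is computed once per
instruction):
```
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct MachineInstr;                           // stand-in for the real class
unsigned computeOrder(const MachineInstr *MI); // assumed: the old per-compare work

// Precompute (order, instr) pairs once, then sort by the cached key
// instead of recomputing the order inside the comparator.
void sortByOrder(std::vector<const MachineInstr *> &Instrs) {
  std::vector<std::pair<unsigned, const MachineInstr *>> Keyed;
  Keyed.reserve(Instrs.size());
  for (const MachineInstr *MI : Instrs)
    Keyed.emplace_back(computeOrder(MI), MI);
  std::sort(Keyed.begin(), Keyed.end(),
            [](const auto &A, const auto &B) { return A.first < B.first; });
  for (std::size_t I = 0; I != Instrs.size(); ++I)
    Instrs[I] = Keyed[I].second;
}
```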
This improves compile-time by -0.4% geomean on CTMark ReleaseLTO-g.
Differential Revision: https://reviews.llvm.org/D117575