33746 Commits

Author SHA1 Message Date
Jun Ma
00eef4f7c3 [SelectionDAG] Fix mismatched truncate when combine BUILD_VECTOR with EXTRACT_SUBVECTOR
Just use correct type for truncation. Fixes PR59625

Differential Revision: https://reviews.llvm.org/D145757
2023-03-13 08:59:52 +08:00
Simon Pilgrim
82dc04befd [DAG] visitZERO_EXTEND - pull out the repeated SDLoc(N) variables 2023-03-12 15:18:46 +00:00
Simon Pilgrim
4d7da0e711 [DAG] Cleanup the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold. NFC.
Preliminary cleanup before adding some additional legality and value tracking handling.
2023-03-12 15:01:33 +00:00
Simon Pilgrim
b53ea2b9c5 [DAG] visitAND - fold (and (any_ext V), c) -> (zero_ext (and (trunc V), c)) if profitable.
Try to more aggressively narrow masks of extended values.

This is mainly for cases where the mask is trying to zero out any_extended upper bits, assuming we can zext/trunc the values for free.

This catches a few actual missed folds, as well as helps canonicalize a number of other cases which were being caught in isel etc.
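A scalar sketch of why the narrowed form can be equivalent. The bit widths (a 32-bit any-extended value, an 8-bit narrow type) and the function names are illustrative assumptions, not taken from the patch; the equivalence only holds when the mask fits in the narrow type.

```cpp
#include <cstdint>

// Wide form: (and (any_ext V), c). The upper 24 bits of `anyExtended`
// are unspecified garbage, as they would be after an any_extend.
uint32_t andOnWide(uint32_t anyExtended, uint32_t mask) {
  return anyExtended & mask;
}

// Narrow form: (zero_ext (and (trunc V), c)).
// Assumes `mask` fits in 8 bits, so no mask bit touches the garbage.
uint32_t andOnNarrow(uint32_t anyExtended, uint32_t mask) {
  uint8_t v = static_cast<uint8_t>(anyExtended);      // trunc
  uint8_t m = static_cast<uint8_t>(mask);             // narrowed constant
  uint8_t narrow = static_cast<uint8_t>(v & m);       // and in narrow type
  return static_cast<uint32_t>(narrow);               // zero_ext
}
```

For any mask no larger than 0xFF the two functions agree, whatever the upper bits of the input are.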

Differential Revision: https://reviews.llvm.org/D145866
2023-03-12 13:25:23 +00:00
Simon Pilgrim
fad852efe4 [DAG] combineShiftAnd1ToBitTest - improve support for peeking through truncations
Allows us to handle shift amounts that exceed the original bitwidth
2023-03-11 16:37:47 +00:00
Yuanfang Chen
9aae408d55 [NFC] fix typo funciton -> function
credits to @jmagee
2023-03-10 18:05:25 -08:00
Tim Northover
5c18444289 MachO: support custom section names on global variables
These attributes have been accepted in ELF for a while, and are generated by
Clang in some places, so it makes sense to support them on MachO too.

https://reviews.llvm.org/D143173
2023-03-10 18:23:25 +00:00
Sameer Sahasrabuddhe
fd98416d37 [llvm][Uniformity] consistently handle always-uniform instructions
An instruction that is "always uniform" is so even if it occurs in an
irreducible cycle. The output produced by such an instruction may depend on the
implementation defined cycle hierarchy, but that does not affect the uniformity
of the output. In other words, an "always uniform" instruction is uniform even
if it is not m-converged.

Reviewed By: ruiling, ronlieb

Differential Revision: https://reviews.llvm.org/D145572
2023-03-10 14:23:40 +05:30
Rong Xu
ebe09e2a95 [FSAFDO] Improve FS discriminator encoding
This change improves FS discriminators in the following ways:
(1) use call-stack debug information to generate
discriminators: the same (src/line) DILs can now have same
discriminator value if they come from different call-stacks.
This effectively increases the usable discriminator values
for each round of FS discriminator pass.
(2) don't generate the FS discriminator for meta instructions
(i.e. instructions not emitted). This reduces the number of
discriminator conflicts (for the case we run out of discriminator
bits for that pass).
(3) use less expensive hashing of xxHash64.

These improvements should bring better performance for FSAFDO
and they should be used by default. But this change creates
incompatible FS discriminators. For the iterative profile users,
they might see a performance drop in the first release with
this change (due to the fact that the profiles have the old
discriminators and the compiler uses the new discriminator).
We have measured that this is not more than 1.5% on several
benchmarks. Note the degradation should be gone in the second
release and one should expect a performance gain over the binary
without this change.

One possible solution to the iterative profile issue would be
separating discriminators for profile-use and the ones emitted to
the binary. This would require a mechanism to allow two sets of
discriminators to be maintained and then phasing out the first
approach. This is too much churn in the compiler and the
performance implications do not seem to be worth the effort.

Instead, we put the changes under an option so iterative profile
users can do a gradual rollout of this change. We will make the
option default value to true in a later patch and eventually
purge this option from the code base.

Differential Revision: https://reviews.llvm.org/D145171
2023-03-09 23:18:48 -08:00
Yeting Kuo
b2c48559c8 [IR][DAG][RISCV] Allow scalable vector ISD::STRICT_FP_EXTEND and RISC-V supports for vector ISD::STRICT_FP_EXTEND.
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower
strict_fpextend to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D145548
2023-03-09 17:37:59 +08:00
Felipe de Azevedo Piovezan
c0967995d2 [CodeGen] Prevent nullptr deref in genAlternativeCodeSequence
A pointer dereference was added (D141302) above an assert that checks
whether the pointer is null. This commit moves the assert above the
dereference and transforms it into an llvm_unreachable to better express
the intent that certain switch cases should never be reached.

Differential Revision: https://reviews.llvm.org/D145599
2023-03-08 13:41:32 -05:00
Juneyoung Lee
a66bc1c4a3 [DAGCombiner] Avoid converting (x or/xor const) + y to (x + y) + const if benefit is unclear
This patch resolves suboptimal code generation reported by https://github.com/llvm/llvm-project/issues/60571 .

DAGCombiner currently converts `(x or/xor const) + y` to `(x + y) + const` if this is valid.
However, if `.. + const` is broken down into a sequence of adds with carries, the benefit is not clear: in the reported issue it introduces two more add(-with-carry) ops (6 in total), whereas the optimal sequence needs only 4 add(-with-carry)s.

This patch resolves this issue by allowing this conversion only when (1) `.. + const` is legal or promotable, or (2) `const` is a sign bit because it does not introduce more adds.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D144116
2023-03-08 18:13:57 +00:00
Paul Walker
adbdf273ef [CodeGenPrepare] Stop llvm.vscale() -> getelementptr(null, 1) transformation.
I've pulled this change from D145404 to land in isolation because
I'm concerned the code might be more important than the test
coverage might suggest (NOTE: the code has no test coverage).
2023-03-08 15:47:03 +00:00
Xiang1 Zhang
eed31bbb37 [NFC] Remove dead code in ExtAddrMode::print checked by Coverity tool 2023-03-08 15:01:28 +08:00
Chen Zheng
fc26ab36a2 [DAGCombiner] don't use the pointer info for widen store
The merged store touches memory for other underlying objects, so mapping
the merged store to the first underlying object is not correct. For example
in https://github.com/llvm/llvm-project/issues/60744, the merged store is
not correctly analyzed as dependent with memory operations which are also
part of the merged store.

Fixes #60744

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144711
2023-03-07 20:31:09 -05:00
Nikita Popov
ffe8f47d72 [IR] Add operator<< overload for CmpInst::Predicate (NFC)
I regularly try and fail to use this while debugging.
2023-03-07 15:10:56 +01:00
Jay Foad
0265dd9925 Fix "compatiable" typos 2023-03-07 12:57:39 +00:00
Noah Goldstein
c1ecd0a3f4 [DAGCombiner] Add fold for ~x + x -> -1
This is generally done by the InstCombine, but can be emitted as an
intermediate step and is cheap to handle.
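The identity behind this fold can be checked on plain integers. This is a minimal sketch with a hypothetical function name; it shows the arithmetic fact only, not the DAGCombiner code.

```cpp
#include <cstdint>

// Every bit position holds exactly one set bit between x and ~x,
// so the sum has all bits set and no carries occur: the result is -1.
uint32_t notPlusSelf(uint32_t x) {
  return ~x + x;
}
```

The result is the all-ones value for every input, which is -1 when read as a signed integer.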

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D145177
2023-03-06 20:30:27 -06:00
Noah Goldstein
d4b24b4a55 [DAGCombiner] Add fold for ~x & x -> 0
This is generally done by the InstCombine, but can be emitted as an
intermediate step and is cheap to handle.
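As a quick illustration of the identity being folded (the function name is hypothetical, and this is plain integer arithmetic rather than the combiner code):

```cpp
#include <cstdint>

// A bit cannot be set in both x and its complement,
// so the bitwise AND is zero for every input.
uint32_t notAndSelf(uint32_t x) {
  return ~x & x;
}
```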

Differential Revision: https://reviews.llvm.org/D145143
2023-03-06 20:30:20 -06:00
Marco Elver
bdb4353ae0 [SelectionDAG] Optimize copyExtraInfo deep copy
It turns out that there are relatively trivial, albeit rare, cases that
require a MaxDepth of more than 16 (see added test). However, we want to
avoid having to rely on a large fixed MaxDepth.

Since these cases are relatively rare, apply the following strategy:

  1. Start with a low MaxDepth of 16 - if the entry node was not
     reached, we can return (the common case).

  2. If the entry node was reached, exponentially increase MaxDepth up
     to some large limit that should cover all cases and guard against
     stack exhaustion.

This retains the better performance with a low MaxDepth in the common
case, and in complex cases backs off and retries. On the whole, this is
preferable vs. starting with a large MaxDepth which would unnecessarily
penalize the common case where a low MaxDepth is sufficient.
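The back-off strategy described above can be sketched generically. Everything here is an assumption for illustration: `retryWithGrowingDepth`, the `attempt` callback, the doubling factor, and the upper limit of 4096 are hypothetical; only the starting depth of 16 comes from the message.

```cpp
#include <cstddef>
#include <functional>

// Runs `attempt` with a small depth bound first and doubles the bound
// until the attempt reports success or the limit is exceeded.
// `attempt(maxDepth)` returns true when the bounded traversal sufficed.
// Returns the depth that worked, or 0 if even `limit` was not enough.
std::size_t retryWithGrowingDepth(
    const std::function<bool(std::size_t)>& attempt,
    std::size_t start = 16, std::size_t limit = 4096) {
  for (std::size_t depth = start; depth <= limit; depth *= 2) {
    if (attempt(depth))
      return depth;  // common case: the first, cheap bound is enough
  }
  return 0;          // gave up: the limit guards against stack exhaustion
}
```

The common case pays only for the cheap first attempt, while rare deep graphs cost a few extra retries.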

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D145386
2023-03-06 17:29:53 +01:00
Caroline Concatto
204800ad0a [IR][Legalization] Promote illegal deinterleave and interleave vectors
To make legalization easier, the operands and outputs have the same size for
these ISD Nodes. When legalizing the results in PromoteIntegerResult the operands
are legalized to the same size as the outputs.
The ISD Node has two output/results, therefore the legalizing functions update
both results/outputs.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D144846
2023-03-03 10:54:52 +00:00
Craig Topper
01487f384a [TypePromotion] Dereference pointer before printing it in a debug message.
Without dereferencing it just prints the value of the pointer which
isn't meaningful. Dereferencing prints the operand.
2023-03-02 23:43:36 -08:00
Marco Elver
7ecd2a23f5 [SelectionDAG] Fix missing lambda capture
Move MaxDepth into the lambda, since it is not needed outside. This
fixes some compilers that complain about missing capture:

  error C3493: 'MaxDepth' cannot be implicitly captured because no
  default capture mode has been specified

Fixes: f693932fbea7 ("[SelectionDAG] Transitively copy NodeExtraInfo on RAUW")
2023-03-02 23:47:36 +01:00
Aditya Nandakumar
00e55531df [GISel][CSE][NFC]: Handle mutual recursion when inserting node
GISel's CSE mechanism lazily inserts instructions into the CSE List
to improve on efficiency as well as efficacy of CSE
(for allowing partially built instructions to be fully built).

There's unfortunately a mutual recursion via
 `handleRecordedInsts -> handleRecordedInst -> insertNode-> handleRecordedInsts`.

So this change simply records that we're already draining this list so we can just bail out on the recursion.

No changes to codegen are expected as we're still draining/handling the temporary
list via pop_back and we should get the same sequence of instructions
whether we call pop_back in a loop at the top level or recursively.
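A reentrancy guard of the kind described might look like the sketch below. The class and member names are hypothetical and instructions are modelled as plain integers; this is not the GISel CSE code.

```cpp
#include <vector>

// Toy model: recorded items are drained lazily, and inserting an item
// triggers a drain, which would recurse without the `draining_` flag.
class LazyInserter {
  std::vector<int> pending_;
  std::vector<int> inserted_;
  bool draining_ = false;

public:
  void record(int inst) { pending_.push_back(inst); }

  void insertNode(int inst) {
    drainPending();  // bails out immediately if a drain is in progress
    inserted_.push_back(inst);
  }

  void drainPending() {
    if (draining_)
      return;        // already draining further up the call stack
    draining_ = true;
    while (!pending_.empty()) {
      int inst = pending_.back();
      pending_.pop_back();
      insertNode(inst);
    }
    draining_ = false;
  }

  const std::vector<int>& inserted() const { return inserted_; }
};
```

With the flag set, the nested call returns at once and the top-level loop keeps popping from the back of the list.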

https://reviews.llvm.org/D145006

reviewed by: dsanders
2023-03-02 14:42:38 -08:00
Marco Elver
f693932fbe [SelectionDAG] Transitively copy NodeExtraInfo on RAUW
During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.

Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.

This fixes the remaining pcsections-atomics.ll tests on X86.

v2: Optimize copyExtraInfo() deep copy. For now we assume that only
NodeExtraInfo that have PCSections set require deep copy. Furthermore,
limit the depth of graph search while pre-populating the visited set,
assuming the to-be-replaced subgraph 'From' has limited complexity. An
assertion catches if the maximum depth needs to be increased.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D144677
2023-03-02 23:07:19 +01:00
Craig Topper
06c6b787b2 [SelectionDAG][AArch64] Constant fold in SelectionDAG::getVScale if VScaleMin==VScaleMax.
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D145113
2023-03-02 12:02:38 -08:00
Craig Topper
c546f13f1f [DAGCombiner] Replace LegalOperations check in visitSIGN_EXTEND with LegalTypes.
This is guarding a check for isTypeLegal so it should check
LegalTypes.

Fixes PR61111.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D145139
2023-03-02 07:52:53 -08:00
Sander de Smalen
170e7a0ec2 [AArch64][SME2] Add CodeGen support for target("aarch64.svcount").
This patch adds AArch64 CodeGen support such that the type can be passed
and returned to/from functions, and also adds support to use this type in
load/store operations and PHI nodes.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D136862
2023-03-02 12:07:41 +00:00
J. Ryan Stinnett
22b8e82c12 [DebugInfo] Remove dbg.addr from CodeGen
As part of this work, removing `SDDbgValue::clearIsEmitted` originally added for
`dbg.addr` in 045c67769d7fe577fc38cccb6fb40fd814437447 was attempted, but it
appears some tests for `DBG_INSTR_REF` now depend on that behaviour as well, so
it was kept and comments were updated instead.

Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898

Differential Revision: https://reviews.llvm.org/D144800
2023-03-02 09:29:43 +00:00
J. Ryan Stinnett
f5b85c02e9 [DebugInfo][NFC] Remove FuncArgumentDbgValueKind::Addr from SelectionDAG
This removes the unused `FuncArgumentDbgValueKind::Addr` value originally added
by e24f5348798605a799c63ff09169d177d262cd37. The intent was to signal the
original intrinsic that marked a function argument, but the `Addr` part was
never used.

Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898

Differential Revision: https://reviews.llvm.org/D144794
2023-03-02 09:29:42 +00:00
Marco Elver
e0bc779000 Revert "[SelectionDAG] Transitively copy NodeExtraInfo on RAUW"
This reverts commit 7f635b90e7bdf1378fd9a65fc62b99e8e07d4aaf.

The current implementation causes pathological slowdowns in certain
cases: https://github.com/llvm/llvm-project/issues/61108
2023-03-02 09:39:44 +01:00
Yashwant Singh
5230f6c1c2 [llvm][GenericUniformity] Prevent assert while calculating temporal divergence
analyzeTemporalDivergence() was missing the check for always-uniform before
evaluating whether an instruction depends on a value defined in the cycle.
Fix for #60638
https://github.com/llvm/llvm-project/issues/60638

Reviewed By: sameerds, foad, #amdgpu

Differential Revision: https://reviews.llvm.org/D144070
2023-03-02 12:42:35 +05:30
Nick Desaulniers
9cec2b246e [RegAllocFast] insert additional spills along indirect edges of INLINEASM_BR
When generating spills (stores) for values produced by INLINEASM_BR
instructions, make sure to insert one spill per indirect target.
Otherwise the reload generated may load from a stack slot that has not
yet been stored to (resulting in a load of an uninitialized stack slot).

Link: https://github.com/llvm/llvm-project/issues/53562
Fixes: https://github.com/llvm/llvm-project/issues/60855

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D144907
2023-03-01 15:21:11 -08:00
Simon Pilgrim
73cdccad55 [DAG] expandIntMINMAX - attempt to match existing SETCC node
As noticed on D144789, when we have pairs of min/max nodes we often end up with multiple comparisons which we could reuse with commuted select ops, so check to see if a suitable SETCC already exists. This also allowed us to remove a similar X86 peephole.

There are other getSETCC cases where we could safely reuse other CondCodes as well - I've been trying to think of how we could reuse this logic in SelectionDAG but haven't found anything that always works well.

An alternative would be to have a TLI callback that returns a preferred CondCode from a list of options, I've noticed this helped fpclamptosat tests on some other targets (MVE + WebAssembly), but other tests suffered.

Differential Revision: https://reviews.llvm.org/D145065
2023-03-01 19:04:03 +00:00
David Green
337215ddf9 [DAG] ABD is not reassociative
I'm not sure how I missed this in the testing, but as far as I understand
whilst ABDS and ABDU are commutative they are not associative. This patch
disables reassociateOps from visitABD, fixing the problems found in #61069.
ABDU: https://alive2.llvm.org/ce/z/eiT5QG
ABDS: https://alive2.llvm.org/ce/z/HzE29l
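A small counterexample makes the point concrete. The helper below is a hypothetical 8-bit model of unsigned absolute difference, not the LLVM node itself.

```cpp
#include <cstdint>

// Unsigned absolute difference on 8-bit values.
uint8_t abdu(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(a > b ? a - b : b - a);
}
```

Swapping the operands never changes the result, but regrouping does: `abdu(abdu(1, 2), 4)` is 3, while `abdu(1, abdu(2, 4))` is 1.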

Differential Revision: https://reviews.llvm.org/D145064
2023-03-01 16:22:13 +00:00
Nikita Popov
ddccc5ba44 [CodeGen] Always expand division larger than i128
Default MaxDivRemBitWidthSupported to 128, so that divisions larger
than 128 bits are always expanded, without requiring additional
configuration from the target.

Note that this may still emit calls to __udivti3 on 32-bit targets,
which likely don't have an implementation of that builtin. However,
I believe this is sufficient to fix
https://github.com/llvm/llvm-project/issues/60531, because Zig must
already be defining those builtins.

Differential Revision: https://reviews.llvm.org/D144871
2023-03-01 15:33:45 +01:00
Ben Shi
0d25418273 [NFC] Fix incorrect comment in VLIW packetizer
Reviewed By: bcain

Differential Revision: https://reviews.llvm.org/D145050
2023-03-01 21:19:06 +08:00
Caroline Concatto
cb96eba27c [IR][Legalization] Split illegal deinterleave and interleave vectors
To make legalization easier, the operands and outputs have the same size for
these ISD Nodes. When legalizing the results in SplitVectorResult the operands
are legalized to the same size as the outputs.
The ISD Node has two output/results, therefore the legalizing functions update
both results/outputs.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144744
2023-03-01 08:30:16 +00:00
Wei Xiao
3fd533fd33 [COFF][X86_64] Put jump table in .rdata for Windows
Put jump table in .rdata for Windows to align with that for Linux.
It can avoid loading the same code page into I$ and D$ simultaneously
and thus favor performance.

Differential Revision: https://reviews.llvm.org/D144701
2023-03-01 10:35:38 +08:00
Craig Topper
bf9e0ed1e6 [CodeGen] Use LLVM_ATTRIBUTE_UNUSED instead of LLVM_DUMP_METHOD on a raw_ostream operator<<.
LLVM_DUMP_METHOD includes ATTRIBUTE_NOINLINE. operator<< isn't
what we normally consider a dump method so it should be ok to inline.

This fixes a warning from gcc that some other declaration for some
other class was inline but this one is noinline. Seems like a bogus
warning from gcc really.
2023-02-27 18:12:18 -08:00
Vladislav Dzhidzhoev
3a51eed948 [AArch64][GlobalISel] Legalize G_SHUFFLE_VECTOR with smaller dest size
Legalize G_SHUFFLE_VECTOR having destination vector length smaller than
source vector length by reshaping destination vector.

Differential Revision: https://reviews.llvm.org/D144670
2023-02-27 23:46:44 +01:00
Michal Paszkowski
5ac69674bf [SPIR-V] Support TargetExtType for SPIR-V builtin types
This patch adds support for TargetExtType/target(...) representing
SPIR-V builtin types. After D135202, target(...) is the preferred way
for representing SPIR-V builtin types in LLVM IR and the only one that works
in opaque pointer mode.

In order to maintain compatibility with LLVM IR generated by older
versions of Clang and LLVM/SPIR-V Translator, pointers-to-opaque-structs
denoting SPIR-V/OpenCL builtin types will be translated to equivalent
SPIR-V target extension types. This translation is only available in the
typed pointer mode (-opaque-pointers=0).

The relevant LIT tests with SPIR-V builtins were converted to use the
new target(...) notation.

Differential Revision: https://reviews.llvm.org/D144494
2023-02-27 21:39:25 +01:00
David Green
06daa515b2 [AArch64] Don't remove free sext_inreg(vector_extract(x)) if it leads to multiple extracts
If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG
will try to remove the sext_inreg, using vector_extract(x) directly. This can
lead to multiple uses of both sext_inreg(vector_extract(x)) and
vector_extract(x), leading to the generation of both umov and smov extracts.
This adds a target hook to prevent that under AArch64 where the sext_inreg can
be considered free if there are multiple uses of the sext and no uses of the
vector_extract. This helps fix a small regression from D144550.

Differential Revision: https://reviews.llvm.org/D144850
2023-02-27 19:20:10 +00:00
Marco Elver
7f635b90e7 [SelectionDAG] Transitively copy NodeExtraInfo on RAUW
During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.

Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.

This fixes the remaining pcsections-atomics.ll tests on X86.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D144677
2023-02-27 12:16:14 +01:00
Amara Emerson
4bc6434624 [GlobalISel] Fix an assertion failure in matchHoistLogicOpWithSameOpcodeHands().
We use this combine in the AArch64 postlegalizer combiner, which causes this
function to query the legalizer rules for the action for an invalid opcode/type
combination (G_AND and p0). Moving the legalizer query until after the validity
check in matchHoistLogicOpWithSameOpcodeHands() fixes this.
2023-02-26 15:42:57 -08:00
Noah Goldstein
e981e6d10e Add transform for (and/or (icmp eq/ne A,-1),(icmp eq/ne A,-1+C))->(and/or (icmp eq/ne (and ~A,-1+C),0))
This works if `-1+C` is a negative power of 2.

This can be more useful than the `AddAnd` case as `~A` does not
necessarily require materializing a constant. This makes the transform
worth it for X86 vector types.

Alive2 Links:
EQ: https://alive2.llvm.org/ce/z/P6u8cq
NE: https://alive2.llvm.org/ce/z/_Kkqp1

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D144284
2023-02-24 15:22:09 -06:00
Noah Goldstein
8c74c5402f Make (and/or (icmp eq/ne A,C0), (icmp eq/ne A,C1)) where IsPow(dif(C0,C1)) work for more patterns.
`(and/or (icmp eq/ne A,C0), (icmp eq/ne A,C1))` can be lowered to
`(icmp eq/ne (and (sub A, (smin C0, C1)), (not (sub (smax C0, C1), (smin C0, C1)))), 0)`
generically if `(sub (smax C0, C1), (smin C0,C1))` is a power of 2.

This covers the existing case of `(and/or (icmp eq/ne A, C_Pow2),(icmp eq/ne A, -C_Pow2))`
as well as other cases.
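The equivalence for the `or`/`eq` form can be checked exhaustively on 8-bit values. This is a sketch under stated assumptions: the function names are hypothetical, the arithmetic wraps modulo 256, and the `and`/`ne` form follows by negating both sides.

```cpp
#include <cstdint>

// True when exactly one bit of v is set.
bool isPowerOf2(uint8_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Original form: (or (icmp eq A, C0), (icmp eq A, C1)).
bool twoCompares(uint8_t a, uint8_t c0, uint8_t c1) {
  return a == c0 || a == c1;
}

// Wrapping difference (smax C0, C1) - (smin C0, C1); `lo` receives the smin.
uint8_t signedDiff(uint8_t c0, uint8_t c1, uint8_t& lo) {
  bool c0Smaller = static_cast<int8_t>(c0) < static_cast<int8_t>(c1);
  lo = c0Smaller ? c0 : c1;
  uint8_t hi = c0Smaller ? c1 : c0;
  return static_cast<uint8_t>(hi - lo);
}

// Folded form: (icmp eq (and (sub A, smin), (not diff)), 0).
bool singleCompare(uint8_t a, uint8_t c0, uint8_t c1) {
  uint8_t lo;
  uint8_t diff = signedDiff(c0, c1, lo);
  uint8_t masked = static_cast<uint8_t>(
      static_cast<uint8_t>(a - lo) & static_cast<uint8_t>(~diff));
  return masked == 0;
}

// Compares both forms for every A and every constant pair whose
// difference is a power of 2.
bool foldHoldsForAllI8() {
  for (int c0 = 0; c0 < 256; ++c0)
    for (int c1 = 0; c1 < 256; ++c1) {
      uint8_t lo;
      uint8_t diff = signedDiff(static_cast<uint8_t>(c0),
                                static_cast<uint8_t>(c1), lo);
      if (!isPowerOf2(diff))
        continue;
      for (int a = 0; a < 256; ++a)
        if (twoCompares(a, c0, c1) != singleCompare(a, c0, c1))
          return false;
    }
  return true;
}
```

The mask clears the single differing bit, so `A - smin` passes the test only when it is 0 or exactly the difference, that is, when `A` equals one of the two constants.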

Alive2 Links:
EQ: https://alive2.llvm.org/ce/z/mLJiUW
NE: https://alive2.llvm.org/ce/z/TKnzUr

Differential Revision: https://reviews.llvm.org/D144283
2023-02-24 15:22:09 -06:00
Steve Merritt
750a6870eb [Codeview] Fix incorrect size determination for complex types.
In Codeview, the basic type of a complex represents the size
of an individual component rather than the sum of the real
and imaginary components.

Differential Revision: https://reviews.llvm.org/D143760
2023-02-24 09:20:52 -05:00
Serge Pavlov
7f81dd4dd6 [NFC] Make FPClassTest a bitmask enumeration
This is a recommit of 2e416cdd52, fixed to be acceptable to GCC.
The original commit message is below.

With this change bitwise operations are allowed for the FPClassTest
enumeration, which should simplify using this type. Also, some functions
were changed to take an argument of type FPClassTest instead of unsigned.

Differential Revision: https://reviews.llvm.org/D144241
2023-02-24 15:12:16 +07:00
Jez Ng
865c2b0d15 [MC][nfc] Don't use a value after it has been std::move()'d
Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D144662
2023-02-23 15:15:24 -05:00