33876 Commits

Author SHA1 Message Date
Nikita Popov
fe4dbbb467 [DAGCombiner] Fold add (mul x, C), x to mul x, C+1
While this is normally non-canonical IR, this pattern can appear
during SDAG lowering if the add is actually a getelementptr, as
illustrated in `@test_ptr`. This pattern comes up when doing
provenance-aware high-bit pointer tagging.

Proof: https://alive2.llvm.org/ce/z/DLoEcs

Fixes https://github.com/llvm/llvm-project/issues/62093.

Differential Revision: https://reviews.llvm.org/D148341
2023-04-17 12:33:46 +02:00
Akshay Khadse
8bf7f86d79 Fix uninitialized pointer members in CodeGen
This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D148303
2023-04-17 16:32:46 +08:00
Wang, Xin10
9fa721c7c6 remove useless call in MIRSampleProfile.cpp
This call getSummary returns a value but nobody take
it.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D148305
2023-04-17 04:14:28 -04:00
NAKAMURA Takumi
077a2a4bcd [CMake] Cleanup deps 2023-04-17 00:38:49 +09:00
Kazu Hirata
972983539b [llvm] Apply fixes from readability-redundant-control-flow (NFC) 2023-04-16 00:13:46 -07:00
Sergei Barannikov
38d84e3d76 [GISel] Legalize G_FSUB to G_FADD + G_FNEG even if G_FNEG is illegal
`G_FNEG` used to be legalized to `G_FSUB -0, x` causing infinite loop.
This is no longer the case after D84287.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D148187
2023-04-15 08:11:49 +03:00
Daniel Hoekwater
6b62166b4c Account for PATCHABLE instrs in Branch Relaxation
PATCHABLE_* instructions expand to up to 36-byte
sleds. Updating the size of PATCHABLE instructions
causes them to be outlined, so we need to add a
check to prevent the outliner from considering
basic blocks that contain PATCHABLE instructions.

Differential Revision: https://reviews.llvm.org/D147982
2023-04-14 16:14:50 -07:00
Bjorn Pettersson
40c60c025c [Passes] Remove the legacy DemandedBitsWrapperPass
Last user of DemandedBitsWrapperPass was the BDCE pass. Since
the legacy PM version of BDCE was removed in an earlier commit, this
patch removes the now unused DemandedBitsWrapperPass.

Differential Revision: https://reviews.llvm.org/D148336
2023-04-14 18:56:20 +02:00
Nikita Popov
62ef97e063 [llvm-c] Remove PassRegistry and initialization APIs
Remove C APIs for interacting with PassRegistry and pass
initialization. These are legacy PM concepts, and are no longer
relevant for the new pass manager.

Calls to these initialization functions can simply be dropped.

Differential Revision: https://reviews.llvm.org/D145043
2023-04-14 12:12:48 +02:00
Aiden Grossman
35714e3a9c [MLGO] Change MBB Profile Dump from using MBB numbers to MBB IDs
Currenty, setting the -mbb-profile-dump dumps a CSV file with blocks
inside an individual function identified by their MBB numbers. This
patch changes the MBBs to be identified by their ID which is set at MBB
creation and not changed afterwards, making it inherently stable
throughout the backend. This alleviates concerns with the MBB IDs
changing between the profile dump and what ends up in the final object
file. The MBBs inside the SHT_LLVM_BB_ADDR_MAP sections are also
identified using their MBB ID rather than number, so if we want to match
them up we need to identify the MBBs here by number.

Reviewed By: mtrofin, rahmanl

Differential Revision: https://reviews.llvm.org/D147366
2023-04-14 07:04:07 +00:00
Nick Desaulniers
fc4494dffa [StackProtector] don't check stack protector before calling nounwind functions
https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821
introduced a additional checks before calling noreturn functions in
response to this security paper related to Catch Handler Oriented
Programming (CHOP):
https://download.vusec.net/papers/chop_ndss23.pdf
See also:
https://bugs.chromium.org/p/llvm/issues/detail?id=30

This causes stack canaries to be inserted in C code which was
unexpected; we noticed certain Linux kernel trees stopped booting after
this (in functions trying to initialize the stack canary itself).
https://github.com/ClangBuiltLinux/linux/issues/1815

There is no point checking the stack canary like this when exceptions
are disabled (-fno-exceptions or function is marked noexcept) or for C
code.  The GCC patch for this issue does something similar:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a25982ada523689c8745d7fb4b1b93c8f5dab2e7

Android measured a 2% regression in RSS as a result of d656ae280957 and
undid it globally:
https://android-review.googlesource.com/c/platform/build/soong/+/2524336

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D147975
2023-04-13 09:37:06 -07:00
sgokhale
bb5befefc6 Revert "[CodeGen][ShrinkWrap] Split restore point"
This reverts commit 5f0bccc3d1a74111458c71f009817c9995f4bf83.

An issue has been reported here: https://github.com/ClangBuiltLinux/linux/issues/1833
2023-04-13 10:52:28 +05:30
Anshil Gandhi
6530bd3030 [BranchRelaxation] Correct JumpToFT value
Toggle true/false values of the JumpToFallThrough
parameter to simplify code and make it consistent
with the documentation for the `getFallThrough(..)`
method.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D148139
2023-04-12 23:21:20 -06:00
Amara Emerson
719024a0d0 [GlobalISel][NFC] Add MachineInstr::getFirst[N]{Regs,LLTs}() helpers to extract regs & types.
These reduce the typing and clutter from:
Register Dst = MI.getOperand(0).getReg();
Register Src1 = MI.getOperand(1).getReg();
Register Src2 = MI.getOperand(2).getReg();
Register Src3 = MI.getOperand(3).getReg();
LLT DstTy = MRI.getType(Dst);
... etc etc

To just:
auto [Dst, Src1, Src2, Src3] = MI.getFirst4Regs();
auto [DstTy, Src1Ty, Src2Ty, Src3Ty] = MI.getFirst4LLTs();

Or even more concise:
auto [Dst, DstTy, Src1, Src1Ty, Src2, Src2Ty, Src3, Src3Ty] =
     MI.getFirst4RegLLTs();

Differential Revision: https://reviews.llvm.org/D144687
2023-04-12 16:43:14 -07:00
Amara Emerson
29c851f4e2 [GlobalISel] Move the truncstore_merge combine to the LoadStoreOpt pass and add support for an extra case.
If we have set of mergeable stores of shifts, but the original source value being shifted
is wider than the merged size, we should still be able to merge if we truncate first. To do this
however we need to search for stores speculatively up the block, without knowing exactly how
many stores we should see before we stop. The old algorithm has to match an exact number of
stores to fit the wide type, or it dies. The new one will try to set the wide type to however
many stores we found in the upwards block traversal and use later checks to verify if they're
a valid mergeable set.

The reason I need to move this to LoadStoreOpt is because the combiner works going top down
inside a block, which means that we end up doing partial merges because we haven't seen all
the possible stores before we mutate the MIR. In LoadStoreOpt we can go bottom up.

As a side effect of this change, we also end up doing better on an existing test case (missing_store)
since we manage to do a partial merge there.
2023-04-12 16:43:14 -07:00
Archibald Elliott
17cd511007 [DAGCombiner] Fix (shl (ctlz x) n) for non-power-of-two Data
This DAGCombine is not valid for some combinations of the known bits
of x and non-power-of-two widths of x. As shown in the bug:
- The bitwidth of x is 35 (n=5)
- The unknown bits of x is only the least significant bit
- This gives the result of the ctlz two possible values: 34 or 35, both
  of which will give 1 when left-shifted 5 bits.
- So the `eor x, 1` that this optimisation would give is not correct.

A similar instcombine optimisation is only applied when the width of x is
a power-of-two. GlobalISel does not have this bug, as shown by the testcase.

Fixes #61549

Differential Revision: https://reviews.llvm.org/D147518
2023-04-12 17:38:39 +01:00
Wang, Xin10
b00fc5ac99 Fix Mem leak in LLVMTargetMachine.cpp
If we go to line 302, with one of MCE or MAB is not nullptr, then we could
leak mem here.
Use unique_ptr to maintain these 2 pointer can avoid it.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D148003
2023-04-12 05:27:04 -04:00
Ellis Hoag
244be0b0de [InstrProf] Temporal Profiling
As described in [0], this extends IRPGO to support //Temporal Profiling//.

When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.

Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.

In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.

Special thanks to Julian Mestre for helping with reservoir sampling.

[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D147287
2023-04-11 08:30:52 -07:00
Amaury Séchet
91105df3df [DAG] Peek through zext/trunc when matching (or (and X, (not Y)), Y).
This shows up in the wild, notably as a regression in D127115 .

Depends on D147821

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D147827
2023-04-11 13:48:22 +00:00
Felipe de Azevedo Piovezan
b2020fe3aa [DbgHistoryCalculator] Improve debug messages
I've found that a frequent source of debug information loss in optimized
code is due to DEBUG_VALUE intrinsics in a position of the instruction
stream that is outside the scope of the variable it describes.

Tracking these is pretty difficult with the existing debug messages of
the history calculator; this patch addresses the issue by making it
obvious when this event happens.

Differential Revision: https://reviews.llvm.org/D147718
2023-04-11 08:41:29 -04:00
Amaury Séchet
9041e1fa29 [DAG] Peek through zext/trunc in haveNoCommonBitsSet.
This limitation was discovered thanks to some regression in D127115 .

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D147821
2023-04-11 11:44:15 +00:00
Momchil Velikov
4ac6f99ae0 [LiveInterval] Fix live range overlap check
Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D145707
2023-04-11 11:11:30 +01:00
Alexis Engelke
8e59fe2d8e [FastISel] Correctly report prototype on miss
The type of a function is nowadays just an opaque pointer, which is not
helpful when analyzing FastISel misses. Instead print the actual
function type of the function.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D147716
2023-04-11 11:49:08 +02:00
sgokhale
5f0bccc3d1 [CodeGen][ShrinkWrap] Split restore point
This patch splits a restore point to allow it to only post-dominate blocks reachable by use
or def of CSRs(Callee Saved Registers)/FI(Frame Index).

Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change
for others.

Co-authored-by: junbuml

Differential Revision: https://reviews.llvm.org/D42600
2023-04-11 11:58:50 +05:30
Kazu Hirata
63c4967352 Use APInt::getOneBitSet (NFC) 2023-04-10 18:19:17 -07:00
Bing1 Yu
87c1ed5385 Change dyn_cast to cast
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D147923
2023-04-11 00:14:39 +08:00
wangpc
267708f9d5 [MachineOutliner] Add IsOutlined to MachineFunction
We add a field `IsOutlined` to indicate whether a MachineFunction
is outlined and set it true for outlined functions in MachineOutliner.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D146191
2023-04-10 10:57:29 +08:00
Kazu Hirata
c121f3a9fb [CodeGen] Use range-based for loops (NFC) 2023-04-08 16:22:39 -07:00
Nathan Lanza
87c0f67739 [Outliner] Add an option to only enable outlining of patterns above a certain threshold
Outlining isn't always a win when the saved instruction count is >= 1.
The overhead of representing a new function in the binary depends on
exception metadata and alignment. So parameterize this for local tuning.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D136774
2023-04-08 02:12:40 -04:00
Luo, Yuanke
9db75b23bd [Coverity] Initialize pointer memeber. 2023-04-06 17:29:53 +08:00
Valery Pykhtin
e09b33feec [CodeGen] Speedup stack slot sharing during stack coloring (interval overlapping test).
AMDGPU code with enabled address sanitizer generates tons of stack objects (> 200000 in my testcase) and
takes forever to compile due to the time spent on stack slot sharing.

While LiveRange::overlaps method has logarithmic complexity on the number of segments in the involved
liveranges the problem is that when a new interval is assigned to a used color it's tested against
overlapping every other assigned interval for that color.

Instead I decided to join all assigned intervals for a color into a single interval and this allows to
have logarithmic complexity on the number of segments for the joined interval.

This patch reduced time spent on stack slot coloring pass from 628 to 3 seconds on my testcase.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D146057
2023-04-06 07:23:45 +02:00
Craig Topper
f1924d965a [SelectionDAG] Expand VP SDNodes by default.
Differential Revision: https://reviews.llvm.org/D147643
2023-04-05 18:52:28 -07:00
Matt Arsenault
7907fd4961 RegAllocFast: Fix dropping subreg indexes on unassigned subreg defs
This was assuming all register operands were assigned to physical registers.
This should ignore the operands which weren't assigned in this run.

Fixes #61134
2023-04-05 18:25:51 -04:00
Felipe de Azevedo Piovezan
79a1e32915 [GlobalISel] Improve stack slot tracking in dbg.values
For IR like:

```
%alloca = alloca ...
dbg.value(%alloca, !myvar, OP_deref(<other_ops>))
```

GlobalISel lowers it to MIR:

```
%some_reg = G_FRAME_INDEX <stack_slot>
DBG_VALUE %some_reg, !myvar, OP_deref(<other_ops>)
```

In other words, if the value of `!myvar` can be obtained by
dereferencing an alloca, in MIR we say that the _location_ of a variable
is obtained by dereferencing register %some_reg (plus some
`<other_ops>`).

We can instead remove the use of `%some_reg`: the location of `!myvar`
_is_ `<stack_slot>` (plus some `<other_ops>`). This patch implements
this transformation, which improves debug information handling in O0, as
these registers hardly ever survive register allocation.

A note about testing: similar to what was done in D76934
(f24e2e9eebde4b7a1d), this patch exposed a bug in the Builder class when
using `-debug`, where we tried to print an incomplete instruction. The
changes in `MachineIRBuilder.cpp` address that.

Differential Revision: https://reviews.llvm.org/D147536
2023-04-05 08:21:00 -04:00
Sven van Haastregt
5af5ac4e3e Update mentions of reduction intrinsics; NFC
The intrinsics have been out of experimental since 322d0afd875d
("[llvm][mlir] Promote the experimental reduction intrinsics to be
first class intrinsics.", 2020-10-07); update some places that still
referred to them as experimental.
2023-04-05 11:49:41 +01:00
Dinar Temirbulatov
7f05bdf4ee [AArch64][SME] Fix an infinite loop in DAGCombine related to adding -force-streaming-compatible-sve flag.
Compiler hits infinite loop in DAGCombine. For force-streaming-compatible-sve
mode we have custom lowering for 128-bit vector splats and later in
DAGCombiner::SimplifyVCastOp() we scalarized SPLAT because we have custom
lowering for SME. Later, we restored SPLAT opertion via performMulCombine().
2023-04-05 10:10:55 +00:00
OCHyams
93c194fc9f [Assignment Tracking] Ignore zero-sized fragments
Such dbg.assigns will occur if you write zero-sized memcpys (see
https://reviews.llvm.org/D146987#4240016).

Handle this in AssignmentTrackingAnalysis (back end) rather than
AssignmentTrackingPass (declare-to-assign) in case it is possible to reproduce
this as a result of optimisations.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D147435
2023-04-05 09:31:23 +01:00
Hongtao Yu
5b461d5ec1 [FS-AFDO] Assign discriminators to pseudo probes
This is the first change for FS-AFDO integration with CSSPGO. There are more patches coming.

With pseudo probes, we do not assign FS discriminators to any other instructions since we will be using only probes for profile correlation.

Also call instructions are excluded since their dwarf discriminators are used for other purposes, i.e, storing probe ids. Since they are not getting a FS discriminator, they will also be excluded from MIR profile loading. The corresponding changes will be in the subsequent patches.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147286
2023-04-04 17:04:37 -07:00
Hans Wennborg
91beab69cd Revert "Recommit DwarfEHPrepare: insert extra unwind paths for stack protector to instrument"
This broke Objective-C autorelease / retainAutoreleasedReturnValue, see
comments on the code review.

> This is a mitigation patch for
> https://bugs.chromium.org/p/llvm/issues/detail?id=30, where existing stack
> protection is skipped if a function is returned through by an unwinder rather
> than the normal call/return path. The recent patch D139254 added the ability to
> instrument a visible unwind path, at least in the IR case (I'm working on the
> SelectionDAG instrumentation too) but there are still invisible unwinds it
> can't reach.
>
> So this patch adds logic to DwarfEHPrepare that goes through a function,
> converting any call that might throw into an invoke to a simple resume cleanup,
> and adding cleanup clauses to existing landingpads that lack them. Obviously we
> don't really want to do this if it's wasted effort, so I also exposed
> requiresStackProtector from the actual StackProtector code to skip the extra
> paths if they won't be used.
>
> Changes:
>   * Move test to AArch64 directory as it relies on target presence.
>   * Re-add Dominator-tree maintenance. Accidentally cherry-picked wrong patch.
>   * Skip adding paths on Windows EH functions.
>
> https://reviews.llvm.org/D143637

This reverts commit 2d690684f66fabc9ac6a2c70fcff3b31c9520794.
2023-04-04 18:09:26 +02:00
Jay Foad
5509a18b5a [MachineVerifier] Try harder to verify SlotIndexes
Verify the SlotIndexes analysis after a pass that claims to preserve it,
even if there are no further passes (apart from the verifier itself)
that would use the analysis.

Differential Revision: https://reviews.llvm.org/D129201
2023-04-04 15:23:36 +01:00
Simon Pilgrim
00e3ae4471 [CodeGen] ExpandReductions - add reduce_and/or(<X x i1> V) -> icmp(iX bitcast(<X x i1> V)) canonicalization
This already exists in InstCombine but was missing from the late stage ExpandReductions pass

Fixes #53419
Fixes #61923

Differential Revision: https://reviews.llvm.org/D147452
2023-04-04 11:19:35 +01:00
Craig Topper
65f3794111 [SelectionDAG] Use MemVT for FoldingSetNodeID in SelectionDAG::getLoadVP.
Return types and operands are put in the ID by AddNodeIDNode. I'm
pretty sure this was supposed to be the memory VT.
2023-04-03 15:15:48 -07:00
Craig Topper
de92a20131 [SelectionDAG] Move variable declaration to its first assignment. NFC
We declared this variable and assigned it to true, but then overwrote
it before its first use.
2023-04-03 14:03:05 -07:00
Craig Topper
bb64fd571b [SelectionDAGBuilder] Use SmallVectorImpl& for function arguments. NFC
Make the reference const since we aren't modifying the vectors.
2023-04-03 14:03:05 -07:00
Jun Zhang
7657e50fef
[DAGCombiner] Fold avg(x, x) --> x
Signed-off-by: Jun Zhang <jun@junz.org>

Differential Revision: https://reviews.llvm.org/D147404
2023-04-03 16:57:50 +08:00
Craig Topper
b5f207e5b2 [SelectionDAG] Rename Flag->Glue. NFC 2023-04-02 19:46:51 -07:00
Simon Pilgrim
2434c8fcf9 [DAG] canCreateUndefOrPoison - add ISD::INSERT_VECTOR_ELT handling
If the inserted element index is guaranteed to be inbounds then a ISD::INSERT_VECTOR_ELT will not create poison/undef.
2023-04-02 16:28:26 +01:00
Simon Pilgrim
8153b92d9b [DAG] Add SelectionDAG::SplitScalar helper
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.

Differential Revision: https://reviews.llvm.org/D147264
2023-03-31 18:35:40 +01:00
David Green
7b6fae42f7 [InterleaveAccess] Check that binop shuffles have an undef second operand
It is expected that shuffles that we hoist through binops only have a single
vector operand, the other being undef/poison. The checks for
isDeInterleaveMaskOfFactor check that all the elements come from inside the
first vector, but with non-canonical shuffles the second operand could still
have a value. Add a quick check to make sure it is UndefValue as expected, to
make sure we don't run into problems with BinOpShuffles not using BinOps.

Fixes #61749

Differential Revision: https://reviews.llvm.org/D147306
2023-03-31 15:38:27 +01:00
Qiongsi Wu
f624372ccb [AIX][CodeGen] Renaming mroptr to xcoff-mroptr
This patch renames the `mroptr` option to `mxcoff-roptr` to indicate in the option itself that it is xcoff specific.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D147161
2023-03-31 10:09:48 -04:00