37021 Commits

Author SHA1 Message Date
Matt Arsenault
f4598194b5
DAG: Fold bitcast of scalar_to_vector to anyext (#122660)
scalar_to_vector is difficult to make appear and test,
but I found one case where this makes an observable difference.
It fires more often than this in the test suite, but most of them
have no net result in the final code. This helps reduce regressions
in a future commit.
2025-01-13 19:38:58 +07:00
Oliver Stannard
e2a071ece5
[MachineCP] Correctly handle register masks and sub-registers (#122472)
When passing an instruction with a register mask, the machine copy
propagation pass was dropping the information about some copy
instructions which define a register which is preserved by the mask,
because that register overlaps a register which is partially clobbered
by it. This resulted in a miscompilation for AArch64, because this
caused a live copy to be considered dead.

The fix is to clobber register masks by finding the set of reg units
which is preserved by the mask, and clobbering all units not in that
set.
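The fix can be pictured with a small standalone model. This is only a sketch, not the pass's actual code: registers are modelled as sets of register units in a `std::bitset`, and the names `preservedUnits`, `clobberedUnits` and `copySurvives` are hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Toy model: 8 register units; a register is the set of units it covers.
constexpr std::size_t NumUnits = 8;
using UnitSet = std::bitset<NumUnits>;

// Union of the units of every register the mask preserves.
UnitSet preservedUnits(const std::vector<UnitSet> &PreservedRegs) {
  UnitSet Preserved;
  for (const UnitSet &Reg : PreservedRegs)
    Preserved |= Reg;
  return Preserved;
}

// Every unit outside the preserved set is treated as clobbered.
UnitSet clobberedUnits(const std::vector<UnitSet> &PreservedRegs) {
  return ~preservedUnits(PreservedRegs);
}

// A tracked copy stays valid only if none of the units it defines is clobbered.
bool copySurvives(UnitSet CopyDefUnits,
                  const std::vector<UnitSet> &PreservedRegs) {
  return (CopyDefUnits & clobberedUnits(PreservedRegs)).none();
}
```

In this model, a copy into a preserved sub-register (unit 0 only) survives even though the wider register covering units 0 and 1 is partially clobbered.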
2025-01-13 09:55:08 +00:00
Akshat Oke
4f96fb5fb3
Reapply "Spiller: Detach legacy pass and supply analyses instead (#119181)" (#122665)
Makes Inline Spiller amenable to the new PM.

This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted
because of two unused private members reported on sanitizer bots.
2025-01-13 14:14:13 +05:30
Daniel Paoliello
d997a722c1
Fix build break in MIRPrinter (#122630) 2025-01-11 21:56:59 -08:00
Daniel Paoliello
5ee0a71df9
[aarch64][win] Add support for import call optimization (equivalent to MSVC /d2ImportCallOptimization) (#121516)
This change implements import call optimization for AArch64 Windows
(equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).

Import call optimization adds additional data to the binary which can be
used by the Windows kernel loader to rewrite indirect calls to imported
functions as direct calls. It uses the same [Dynamic Value Relocation
Table mechanism that was leveraged on x64 to implement
`/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).

The change to the obj file is to add a new `.impcall` section with the
following layout:
```cpp
  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.
```

NOTE: If the import call optimization feature is enabled, then the
`.impcall` section must be emitted, even if there are no calls to
imported functions.
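As an illustration of the layout above, the following sketch serializes one per-section block as 32-bit words. It is not the COFF writer's code: `ImportCall` and `appendSectionBlock` are hypothetical names, the magic header is left out, and the sketch assumes that `SectionSize` counts the two header words plus three words per call.

```cpp
#include <cstdint>
#include <vector>

// One record per call to an imported function (fields follow the layout above).
struct ImportCall {
  std::uint32_t Kind;
  std::uint32_t BranchOffset;
  std::uint32_t TargetSymbolId;
};

// Appends one per-section block to Out as a sequence of 32-bit words.
// ASSUMPTION: SectionSize covers the two header words plus three per call.
void appendSectionBlock(std::vector<std::uint32_t> &Out,
                        std::uint32_t SectionNumber,
                        const std::vector<ImportCall> &Calls) {
  const std::uint32_t SizeInBytes = static_cast<std::uint32_t>(
      sizeof(std::uint32_t) * (2 + 3 * Calls.size()));
  Out.push_back(SizeInBytes);
  Out.push_back(SectionNumber);
  for (const ImportCall &C : Calls) {
    Out.push_back(C.Kind);
    Out.push_back(C.BranchOffset);
    Out.push_back(C.TargetSymbolId);
  }
}
```

Under that assumption, a section with two calls produces eight words, the first of which is 32.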

The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call
to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for
each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature
is enabled:
- A (new) `.impcall` directive is emitted for each call to an imported
function.
- The `.impcall` section is emitted with its magic header (but is not
filled in).
* During COFF object writing, the `.impcall` section is filled in based
on each `.impcall` directive that was encountered.

The `.impcall` section can only be filled in when we are writing the
COFF object as it requires the actual section numbers, which are only
assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction
selection and instead implement this either purely during asm printing
or in a `MachineFunctionPass` (as suggested [on the
forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3))
but this was not possible due to how loading and calling an imported
function works on AArch64. Specifically, they are emitted as `ADRP` +
`LDR` (to load the symbol) then a `BR` (to do the call), so at the point
when we have machine instructions, we would have to work backwards
through the instructions to discover what is being called. An initial
prototype did work by inspecting instructions; however, it didn't
correctly handle the case where the same function was called twice in a
row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the
previously loaded address. Worse than that, sometimes for the
double-call case LLVM decided to spill the loaded address to the stack
and then reload it before making the second call. So, instead of trying
to implement logic to discover where the value in a register came from,
I instead recorded the symbol being called at the last place where it
was easy to do: instruction selection.
2025-01-11 21:30:17 -08:00
Austin Kerbow
657fb4433e
[AMDGPU] Add target hook to isGlobalMemoryObject (#112781)
We want special handling for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
2025-01-11 09:57:57 -08:00
David Green
ab9a80a3ad
[DAG] Allow AssertZExt to scalarize. (#122463)
With range and undef metadata on a call we can have vector AssertZExt
generated on a target with no vector operations. The AssertZExt needs to
scalarize to a normal `AssertZext tin, ValueType`. I have added
AssertSext too, although I do not have a test case.

Fixes #110374
2025-01-11 16:29:06 +00:00
Sergei Barannikov
a475ae05fb
Revert "[ADT] Fix specialization of ValueIsPresent for PointerUnion" (#122557)
Reverts llvm/llvm-project#121847

Causes compile time regressions and allegedly miscompilation.
2025-01-11 03:36:34 +03:00
Sergei Barannikov
7b05367943
[ADT] Fix specialization of ValueIsPresent for PointerUnion (#121847)
Two instances of `PointerUnion` with different active members and null
value compare unequal. Currently, this results in counterintuitive
behavior when using functions from `Casting.h`, e.g.:

```C++
  PointerUnion<int *, float *> U;
  // U = (int *)nullptr;
  dyn_cast<int *>(U); // Aborts
  dyn_cast<float *>(U); // Aborts
  U = (float *)nullptr;
  dyn_cast<int *>(U); // OK
  dyn_cast<float *>(U); // OK
```

`dyn_cast` should abort in all cases because the argument is null.
Currently, it aborts only if the first member is active. This happens
because the partial template specialization of `ValueIsPresent` for
nullable types compares the union with a union constructed from nullptr,
and the two unions compare equal only if their active members are the
same.

This patch changes the specialization of `ValueIsPresent` for nullable
types to make `isPresent()` return false for all possible null values of
a PointerUnion, and fixes two places where the old behavior was
exploited.
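The difference between the two checks can be shown with a toy tagged pointer. This is a sketch only: `ToyUnion` is a hypothetical stand-in, not `llvm::PointerUnion`, and the two `isPresent...` helpers are illustrative names rather than the code in `Casting.h`.

```cpp
// Toy stand-in for a two-member pointer union: a pointer plus a tag saying
// which member is active (0 = first member, 1 = second member).
struct ToyUnion {
  void *Ptr;
  int ActiveMember;

  bool operator==(const ToyUnion &Other) const {
    return Ptr == Other.Ptr && ActiveMember == Other.ActiveMember;
  }
  bool isNull() const { return Ptr == nullptr; }
};

// Old-style check: compare against a union constructed from nullptr, which
// has the first member active. A null second member compares unequal.
bool isPresentByComparison(const ToyUnion &U) {
  return !(U == ToyUnion{nullptr, 0});
}

// Fixed-style check: any null pointer counts as absent, whatever the tag.
bool isPresentByNullCheck(const ToyUnion &U) { return !U.isNull(); }
```

With the comparison-based check a null `float *` member is reported as present, which is the counterintuitive behavior described above; the null check treats both null values the same way.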

Pull Request: https://github.com/llvm/llvm-project/pull/121847
2025-01-10 16:43:19 +03:00
Simon Pilgrim
9b49da2b31
Revert 86b1b0671cafd "MachineVerifier: Check stack protector is top-most in frame" (#122444)
Reverts llvm/llvm-project#121481

This is causing build failures on EXPENSIVE_CHECKS builds:
https://lab.llvm.org/buildbot/#/builders/187/builds/3653
https://lab.llvm.org/buildbot/#/builders/16/builds/11758
2025-01-10 12:10:45 +00:00
Nikita Popov
e9e7b2adcf
[SDAG] Set IsPostTypeLegalization flag in LegalizeDAG (#122278)
This runs after type legalization and as such should set
IsPostTypeLegalization when creating libcalls. I don't think this makes
any observable difference right now, but I ran into this issue in an
upcoming patch.
2025-01-10 12:25:36 +01:00
Guy David
86b1b0671c
MachineVerifier: Check stack protector is top-most in frame (#121481)
Somewhat paranoid, but mitigates potential bugs in the future that might
place it elsewhere and render the mechanism useless.
2025-01-10 10:33:02 +02:00
Nikita Popov
eeac0ffaf4 Revert "[MachineLICM] Use RegisterClassInfo::getRegPressureSetLimit (#119826)"
This reverts commit b4e17d4a314ed87ff6b40b4b05397d4b25b6636a.

This causes a large compile-time regression.
2025-01-10 09:05:06 +01:00
Akshat Oke
089555095b
Revert "Spiller: Detach legacy pass and supply analyses instead (#119… (#122426)
…181)"

This reverts commit a531800344dc54e9c197a13b22e013f919f3f5e1.
2025-01-10 12:23:07 +05:30
Akshat Oke
a531800344
Spiller: Detach legacy pass and supply analyses instead (#119181)
Makes Inline Spiller amenable to the new PM.
2025-01-10 11:46:56 +05:30
Mingming Liu
a6aa9365f7
[NFC][AsmPrinter] Pass MJTI by const reference instead of const pointer (#122365)
The caller `AsmPrinter::emitJumpTableInfo` checks [1] `MJTI` is not a
null pointer before calling `emitJumpTableEntry` or
`emitJumpTableSizesSection`.

This patch updates the callees' signatures to accept a const
reference, so it is explicit that `MJTI` won't be nullptr inside the
callees.

[1]
9d5299eb61/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (L2857)
2025-01-09 15:57:32 -08:00
Craig Topper
e2449f1bce
[SelectionDAG] Use SDNode::op_iterator instead of SDNodeIterator. NFC (#122147)
I think SDNodeIterator primarily exists because GraphTraits requires an
iterator that dereferences to SDNode*. op_iterator dereferences to
SDUse* which is implicitly convertible to SDValue.

This piece of code can use SDValue instead of SDNode* so we should
prefer to use the more common op_iterator.
2025-01-09 09:09:55 -08:00
Pengcheng Wang
b4e17d4a31
[MachineLICM] Use RegisterClassInfo::getRegPressureSetLimit (#119826)
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logic to
adjust the limit by removing reserved registers.

It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, as the comment "This limit must be adjusted
dynamically for reserved registers" says.

Separate from https://github.com/llvm/llvm-project/pull/118787
2025-01-09 21:05:52 +08:00
Nicholas Guy
1b2943534f
[llvm] Fix crash caused by reprocessing complex reductions (#122077)
If a complex pattern had the shape of both a complex->complex reduction
and a complex->single reduction, the matching would recognise both and
deem the graph a valid transformation. Preventing this reprocessing
results in only one of these matching, meaning that in the case of an
invalid graph, we don't try to transform it anyway.
2025-01-09 08:31:57 +00:00
Hubert Tong
e438513f2e
[AIX][AsmPrinter] Fix unsigned subtraction wrap-around (#122214)
Unsigned subtraction wrap-around occurs in `emitGlobalConstantImpl` on
an AIX-specific code path from 8e4423eb0888 when a structure type has
zero elements.

With assertions enabled, this manifests as:
```
TypeSize llvm::StructLayout::getElementOffset(unsigned int) const: Assertion `Idx < NumElements && "Invalid element idx!"' failed.
```
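The commit message does not show the exact expression involved, so the following is only a generic sketch of this class of bug: computing a "last element" index with unsigned arithmetic when the element count is zero. The helper names are hypothetical.

```cpp
#include <cstddef>

// Naive version: with NumElements == 0 the unsigned subtraction wraps around
// to the largest representable value instead of going negative.
std::size_t lastIndexNaive(std::size_t NumElements) { return NumElements - 1; }

// Guarded version: reports that there is no last element instead of wrapping.
// Index is left untouched when the function returns false.
bool lastIndexChecked(std::size_t NumElements, std::size_t &Index) {
  if (NumElements == 0)
    return false;
  Index = NumElements - 1;
  return true;
}
```

Passing the wrapped value to an accessor that asserts `Idx < NumElements` would trip an assertion of the kind quoted above.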
2025-01-09 00:07:57 -04:00
Alexander Yermolovich
fce0314c38
[LLVM][DWARF] Create debug names entry for non-tu top level DIE (#121856)
When creating a Type Unit (TU), LLVM attempts to do so optimistically.
However, if this fails, it discards the TU state and creates the TU
within the Compilation Unit (CU). In such cases, an entry for the
top-level DIE is not created in the debug names table.
This can cause issues when running llvm-dwarfdump --debug-names
--verify, as the missing entry will result in verification failure.
To address this issue, this patch adds a call to the
updateAcceleratorTables when TU creation fails. This ensures that the
debug names table is updated correctly, even in cases where TU creation
fails.
2025-01-08 17:08:35 -08:00
Mikhail Gudim
f37bee1d92
[ReachingDefAnalysis][NFC] Rename PhysReg to Reg. (#122112)
This is in order to prepare for future MR where we will extend
`ReachingDefAnalysis` to stack slots.
2025-01-08 10:00:41 -05:00
Ryan Mansfield
67efbd0bf1
[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955) 2025-01-08 11:07:23 +01:00
abhishek-kaushik22
366e62a0cb
[X86] Combine uitofp <v x i32> to <v x half> (#121809)
Closes #121793
2025-01-08 16:49:29 +08:00
Sander de Smalen
82ec2d6aa4
[Coalescer] Consider NewMI's subreg index when updating lanemask. (#121780)
The code added in #116191 that updated the lanemasks for rematerialized
values checked if `DefMI`'s destination register had a subreg index.
This seems to have missed the following case:

```
  %0:gpr32 = MOVi32imm 1
  %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32
```

which during rematerialization would have the following variables set:

```
  DefMI = %0:gpr32 = MOVi32imm 1

  NewMI = %3.sub_32:gpr64 = MOVi32imm 1   (rematerialized value)
```

When checking whether the lanemasks need to be generated, considering
whether DefMI's destination has a subreg index is insufficient; we
should look at NewMI's subreg index instead.

The added tests are a bit more involved, because I was not able to
reconstruct the issue without having some control flow in the test.
These tests come from actual reproducers.
2025-01-07 15:06:00 +00:00
Simon Pilgrim
1332db36ee [DAG] TransformFPLoadStorePair - early out if we're not loading a simple type
It's never going to transform into a legal integer type, so just bail - noticed while triaging the assertion reported in #121784
2025-01-07 13:37:23 +00:00
Sander de Smalen
5514865147
[Coalescer] Move code added in #116191 (#121779)
By moving the code a bit later, we can factor out some of the conditions
as those are now already tested.
This will also be useful when adding another fix on top that uses
`NewMI`'s subreg index (to follow as a separate PR).

The change is intended to be NFC.
2025-01-07 09:57:18 +00:00
Matt Arsenault
8c0483bba2
RegisterCoalescer: Fix assert on remat to copy-to-physreg with subregs (#121734)
Do not try to rematerialize a super-register def used by a subregister
extract copy into a copy to a physical register if the other pieces of
the full physreg are live at the rematerialization point. It would insert
the super-register def at the rematerialization point, and assert since
the other half of the register was already live.

This is analogous to the undef subregister def handling above,
which handled the virtual register case.

Fixes #120970
2025-01-07 12:22:23 +07:00
Simon Pilgrim
923675193b [DAG] VectorLegalizer::ExpandUINT_TO_FLOAT- pull out repeated getValueType calls. NFC. 2025-01-06 18:49:51 +00:00
Simon Pilgrim
112793a90e [DAG] expandUINT_TO_FP - use getShiftAmountConstant helper. NFC.
Don't bother with separate getShiftAmountTy/getConstant calls.
2025-01-06 18:49:50 +00:00
Amara Emerson
2d53eaff4a
[AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores.
This case differs from the <8 x i1> case handled earlier because it triggers
a legalization failure in lowerStore() that's intended for scalar code.

It also was triggering incorrect bitcast actions in the AArch64 rules that weren't
expecting truncating stores.

With these two fixed, more cases are handled. The code is still bad, including
some missing load promotion in our combiners that result in dead stores hanging
around at the end of codegen. Again, we can fix these in separate changes.

Reviewers: davemgreen, madhur13490, topperc, arsenm

Reviewed By: davemgreen

Pull Request: https://github.com/llvm/llvm-project/pull/121185
2025-01-06 10:22:48 -08:00
Amara Emerson
6b0807fe2b
[AArch64][GlobalISel] Add support for lowering trunc stores of vector bools.
This is essentially a port of TargetLowering::scalarizeVectorStore(), which
is used for the case where we have something like a store of <8 x s8> truncating
to <8 x s1> in memory. The naive lowering is a sequence of extracts to compute
a scalar value to store.

AArch64's DAG implementation has some more smarts to improve this further which
we can do later.

Reviewers: topperc, davemgreen

Pull Request: https://github.com/llvm/llvm-project/pull/121169
2025-01-06 10:21:42 -08:00
Matt Arsenault
93220e7e06
RegAllocGreedy: Fix use after free during last chance recoloring (#120697)
Last chance recoloring can delete the current fixed interval
during recursive assignment of interfering live intervals. Check
if the virtual register value was assigned before attempting the
unassignment, as is done in other scenarios. This relies on the fact
that we do not recycle virtual register numbers.

I have only seen this occur in error situations where the allocation
will fail, but I think this can theoretically happen in working
allocations.

This feels very brute force, but I've spent over a week debugging
this and this is what works without any lit regressions. The surprising
piece to me was that unspillable live ranges may be spilled, and
a number of tests rely on optimizations occurring on them. My other
attempts to fix this mostly revolved around not identifying unspillable
live ranges as snippet copies. I've also discovered we're making some
unproductive live range splits with subranges. If we avoid such splits,
some of the unspillable copies disappear but mandating that be precise
to fix a use after free doesn't sound right.
2025-01-06 23:12:55 +07:00
Phoebe Wang
1547382033
[X86] Support lowering of FMINIMUMNUM/FMAXIMUMNUM (#121464) 2025-01-06 21:28:58 +08:00
Nicholas Guy
8e1b49c38e
Complex deinterleaving/single reductions build fix Reapply "Add support for single reductions in ComplexDeinterleavingPass (#112875)" (#120441)
This reverts commit 76714be5fd4ace66dd9e19ce706c2e2149dd5716, fixing the
build failure that caused the revert.

The failure stemmed from the complex deinterleaving pass identifying a
series of add operations as a "complex to single reduction", so when it
tried to transform this erroneously identified pattern, it faulted. The
fix applied is to ensure that complex numbers (or patterns that match
them) are used throughout, by checking if there is a deinterleave node
amidst the graph.
2025-01-06 09:59:32 +00:00
Amara Emerson
41ebbed280
[AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack.
Reviewers: davemgreen, topperc, arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/121171
2025-01-05 21:32:27 -08:00
Amara Emerson
7e3180a2c2
[AArch64][GlobalISel] Add support for widening vector store elements to s8.
Reviewers: topperc, arsenm, davemgreen

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/121170
2025-01-05 21:31:34 -08:00
Matt Arsenault
d34f7ead88
DAG: Fix assuming f16 is the only 16-bit fp type in concat vector combine (#121637)
This would see if there are mixed integer and FP types and pick an
equivalently sized FP type to use as the vector element type, and only
cast if there were mixed integers. We need to insert a cast if the types
are mixed, which may include different FP types.

Fixes #121601
2025-01-06 10:38:54 +07:00
Craig Topper
e32afded92
[LegalizeVectorOps] Use getBoolConstant instead of getAllOnesConstant in VectorLegalizer::UnrollVSETCC. (#121526)
This code should follow the target preference for boolean contents of a
vector type. We shouldn't assume that true is negative one.
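As a rough sketch of why the distinction matters, the value a "true" lane should hold depends on the boolean-contents convention in use. `BoolContents` and `boolLaneValue` below are hypothetical names for illustration, not the SelectionDAG API.

```cpp
#include <cstdint>

// Hypothetical mirror of the ways a target may represent vector booleans.
enum class BoolContents { Undefined, ZeroOrOne, ZeroOrNegativeOne };

// Value a 32-bit lane should hold for a given truth value. Only the
// ZeroOrNegativeOne convention uses all-ones for true.
std::int32_t boolLaneValue(bool V, BoolContents Contents) {
  if (!V)
    return 0;
  return Contents == BoolContents::ZeroOrNegativeOne ? -1 : 1;
}
```

Hard-coding all-ones corresponds to always taking the `ZeroOrNegativeOne` branch, which gives the wrong lane value on a target that uses zero-or-one booleans.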
2025-01-03 10:46:37 -08:00
Craig Topper
a4e47586b9
[ExpandMemCmp] Recognize canonical form of (icmp sle/sge X, 0) in getMemCmpOneBlock. (#121540)
This code recognizes special cases where the result of memcmp is
compared with 0. If the compare is sle/sge, then InstCombine
canonicalizes to (icmp slt X, 1) or (icmp sgt X, -1). We should
recognize those patterns too.
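The equivalence between the original and canonical predicates can be checked directly on signed integers. This is a small sketch with hypothetical helper names, written in plain C++ rather than IR.

```cpp
#include <cstdint>

// Original predicates: icmp sle X, 0 and icmp sge X, 0.
bool sle0(std::int32_t X) { return X <= 0; }
bool sge0(std::int32_t X) { return X >= 0; }

// Canonical forms: icmp slt X, 1 and icmp sgt X, -1.
bool slt1(std::int32_t X) { return X < 1; }
bool sgtMinus1(std::int32_t X) { return X > -1; }

// True when each canonical form agrees with its original predicate for X.
bool formsAgree(std::int32_t X) {
  return sle0(X) == slt1(X) && sge0(X) == sgtMinus1(X);
}
```

Because the forms agree for every input, a matcher that only looks for a comparison against 0 misses the canonicalized compares against 1 and -1.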
2025-01-03 10:23:13 -08:00
Craig Topper
715dcb2310
[ExpandMemCmp] Use m_SpecificInt to simplify code. NFC (#121532) 2025-01-03 09:19:54 -08:00
Craig Topper
4dfea22e77
[ExpandMemCmp][AArch64][PowerPC][RISCV][X86] Use llvm.ucmp instead of (sub (zext (icmp ugt)), (zext (icmp ult))). (#121530)
AArch64 and PowerPC look like improvements.
RISC-V is neutral.
X86 trades a dependency breaking xor before a seta for a movsx after a
sbbb. Depending on how the result is used, this movsx might go away.
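For reference, the two formulations compute the same three-way result. The sketch below models them in plain C++; `compareBySub` and `ucmp` are illustrative names, with `ucmp` standing in for the -1/0/1 semantics of the `llvm.ucmp` intrinsic.

```cpp
#include <cstdint>

// Old expansion: (zext (icmp ugt A, B)) - (zext (icmp ult A, B)).
std::int32_t compareBySub(std::uint64_t A, std::uint64_t B) {
  return static_cast<std::int32_t>(A > B) - static_cast<std::int32_t>(A < B);
}

// Three-way unsigned compare: -1 if A < B, 0 if equal, 1 if A > B.
std::int32_t ucmp(std::uint64_t A, std::uint64_t B) {
  if (A < B)
    return -1;
  return A > B ? 1 : 0;
}
```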
2025-01-03 09:19:32 -08:00
Acim Maravic
9d6527bc12
[CodeGen] Add MOTargetFlag4 to MachineMemOperand Flags (#120136) 2025-01-03 15:45:52 +01:00
Min-Yih Hsu
3cac26f541
[GISel] Combine (neg (min/max x, (neg x))) into (max/min x, (neg x)) (#120998)
This is the GISel version of #120666. It also supports both the unsigned
and signed versions of min & max.
2025-01-02 16:29:34 -08:00
Min-Yih Hsu
2291d0aba9
[DAGCombiner] Turn (neg (max x, (neg x))) into (min x, (neg x)) (#120666)
This pattern was originally spotted in 429.mcf by @topperc.

We already have a DAGCombiner pattern to turn `(neg (abs x))` into `(min
x, (neg x))`. But in some cases `(neg (max x, (neg x)))` is formed by an
expanded `abs` followed by a `neg` that is generated only after the
`abs` expansion. This patch adds a separate pattern to match cases like
this, as well as its inverse pattern: `(neg (min X, (neg X))) --> (max
X, (neg X))`.

This pattern is applicable to both signed and unsigned min/max.
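Both folds can be checked exhaustively on 8-bit values. The sketch below is an independent check, not DAGCombiner code: it models wrapping two's-complement negation with unsigned arithmetic, all helper names are hypothetical, and it assumes the usual two's-complement conversion from `uint8_t` to `int8_t`.

```cpp
#include <algorithm>
#include <cstdint>

// Two's-complement negate on 8-bit values; unsigned arithmetic wraps.
std::uint8_t neg(std::uint8_t X) { return static_cast<std::uint8_t>(0u - X); }

std::uint8_t umax(std::uint8_t A, std::uint8_t B) { return std::max(A, B); }
std::uint8_t umin(std::uint8_t A, std::uint8_t B) { return std::min(A, B); }

// Signed min/max: compare the same bits reinterpreted as int8_t.
std::uint8_t smax(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::int8_t>(A) > static_cast<std::int8_t>(B) ? A : B;
}
std::uint8_t smin(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::int8_t>(A) < static_cast<std::int8_t>(B) ? A : B;
}

// True if neg(max(x, -x)) == min(x, -x) and neg(min(x, -x)) == max(x, -x)
// hold for every 8-bit x, for both the signed and the unsigned variants.
bool foldsHoldForAll8BitValues() {
  for (unsigned I = 0; I < 256; ++I) {
    const std::uint8_t X = static_cast<std::uint8_t>(I);
    const std::uint8_t N = neg(X);
    if (neg(smax(X, N)) != smin(X, N) || neg(smin(X, N)) != smax(X, N))
      return false;
    if (neg(umax(X, N)) != umin(X, N) || neg(umin(X, N)) != umax(X, N))
      return false;
  }
  return true;
}
```

The intuition is that negation swaps the two operands `x` and `-x`, so negating whichever one is the max yields the other one, which is the min.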
2025-01-02 16:28:55 -08:00
Jay Foad
1849244685
[CodeGen] Remove atEnd method from defusechain iterators (#120610)
This was not used much and there are better ways of writing it.
2025-01-02 17:29:55 +00:00
Matt Arsenault
11e482c4a3
RegAllocGreedy: Add dummy priority advisor for writing MIR tests (#121207)
I regularly struggle reproducing failures in greedy due to changes
in priority when resuming the allocation from MIR vs. a complete
compilation starting at IR. That is, the fix in
e0919b189bf2df4f97f22ba40260ab5153988b14 did not really fix the
problem of the instruction distance mattering.

Add a way to bypass all of the priority heuristics for MIR tests,
by prioritizing only by virtual register number. Could also
give this a more specific name, like PrioritizeLowVirtRegNumber
2025-01-02 23:04:44 +07:00
Akshat Oke
50054ba2f4
[CodeGen] LiveRegMatrix: Use allocator through a unique_ptr (#120556)
`LIU::Matrix` holds on to a pointer to the allocator in LiveRegMatrix and is left hanging when the allocator moves with the LiveRegMatrix.

This extends the lifetime of the allocator so that it does not get destroyed when moving a LiveRegMatrix object.
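The underlying C++ behavior can be reproduced in isolation. In this sketch (hypothetical types, unrelated to the real `LiveRegMatrix` classes), an owner that stores its allocator by value gives it a new address when moved, while an owner that stores it through `std::unique_ptr` keeps the allocator at the same address.

```cpp
#include <memory>
#include <utility>

struct Allocator {
  int Dummy = 0;
};

// Owns the allocator by value: moving the owner relocates the allocator.
struct InlineOwner {
  Allocator Alloc;
  const Allocator *address() const { return &Alloc; }
};

// Owns the allocator through unique_ptr: moving the owner only transfers
// the pointer, so the allocator object itself does not move.
struct HeapOwner {
  std::unique_ptr<Allocator> Alloc = std::make_unique<Allocator>();
  const Allocator *address() const { return Alloc.get(); }
};

// Returns true if a pointer taken before the move still refers to the
// allocator owned by the moved-to object.
template <typename Owner> bool allocatorAddressSurvivesMove() {
  Owner A;
  const Allocator *Before = A.address();
  Owner B = std::move(A);
  return B.address() == Before;
}
```

Any object that cached `Before`, as `LIU::Matrix` is described as doing above, remains valid only in the `HeapOwner` case.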
2025-01-01 14:54:08 +05:30
Vikash Gupta
283806695a
[GlobalIsel] Add combine for select with constants (#121088)
The SelectionDAG Isel supports both versions of the combine shown
below:
```
select Cond, Pow2, 0 --> (zext Cond)  << log2(Pow2) 
select Cond, 0, Pow2 --> (zext !Cond) << log2(Pow2)
```
The GlobalIsel for now only supports the first one, defined in its
generic combinerHelper.cpp. This patch adds the missing second one.
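Both rewrites can be modelled on plain integers. The following is a sketch with hypothetical helper names, not the combiner's code; it only shows that the shifted zero-extended condition equals the original select.

```cpp
#include <cassert>
#include <cstdint>

// log2 of a power of two (hypothetical helper).
unsigned log2Pow2(std::uint32_t Pow2) {
  assert(Pow2 != 0 && (Pow2 & (Pow2 - 1)) == 0 && "expected a power of two");
  unsigned Shift = 0;
  while ((Pow2 >> Shift) != 1)
    ++Shift;
  return Shift;
}

// select Cond, Pow2, 0 --> (zext Cond) << log2(Pow2)
std::uint32_t selectPow2OrZero(bool Cond, std::uint32_t Pow2) {
  return static_cast<std::uint32_t>(Cond) << log2Pow2(Pow2);
}

// select Cond, 0, Pow2 --> (zext !Cond) << log2(Pow2)
std::uint32_t selectZeroOrPow2(bool Cond, std::uint32_t Pow2) {
  return static_cast<std::uint32_t>(!Cond) << log2Pow2(Pow2);
}
```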
2025-01-01 11:14:53 +05:30
Simon Pilgrim
b3a7ab6f1f [DAG] Don't allow implicit truncation in extract_element(bitcast(scalar_to_vector(X))) -> trunc(srl(X,C)) fold
Limits #117900 to only fold when scalar_to_vector doesn't perform implicit truncation, as the scaled shift calculation doesn't currently account for this - this can be addressed in a future update.

Fixes #121306
2024-12-30 16:08:35 +00:00