llvm-project

Author	SHA1	Message	Date
Matt Arsenault	f4598194b5	DAG: Fold bitcast of scalar_to_vector to anyext (#122660 ) scalar_to_vector is difficult to make appear and test, but I found one case where this makes an observable difference. It fires more often than this in the test suite, but most of them have no net result in the final code. This helps reduce regressions in a future commit.	2025-01-13 19:38:58 +07:00
Oliver Stannard	e2a071ece5	[MachineCP] Correctly handle register masks and sub-registers (#122472 ) When passing an instruction with a register mask, the machine copy propagation pass was dropping the information about some copy instructions which define a register which is preserved by the mask, because that register overlaps a register which is partially clobbered by it. This resulted in a miscompilation for AArch64, because this caused a live copy to be considered dead. The fix is to clobber register masks by finding the set of reg units which is preserved by the mask, and clobbering all units not in that set.	2025-01-13 09:55:08 +00:00
Akshat Oke	4f96fb5fb3	Reapply "Spiller: Detach legacy pass and supply analyses instead (#119181 )" (#122665 ) Makes Inline Spiller amenable to the new PM. This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted because of two unused private members reported on sanitizer bots.	2025-01-13 14:14:13 +05:30
Daniel Paoliello	d997a722c1	Fix build break in MIRPrinter (#122630 )	2025-01-11 21:56:59 -08:00
Daniel Paoliello	5ee0a71df9	[aarch64][win] Add support for import call optimization (equivalent to MSVC /d2ImportCallOptimization) (#121516 ) This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag). Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same [Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement `/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618). The change to the obj file is to add a new `.impcall` section with the following layout: ```cpp // Per section that contains calls to imported functions: // uint32_t SectionSize: Size in bytes for information in this section. // uint32_t Section Number // Per call to imported function in section: // uint32_t Kind: the kind of imported function. // uint32_t BranchOffset: the offset of the branch instruction in its // parent section. // uint32_t TargetSymbolId: the symbol id of the called function. ``` NOTE: If the import call optimization feature is enabled, then the `.impcall` section must be emitted, even if there are no calls to imported functions. The implementation is split across a few parts of LLVM: * During AArch64 instruction selection, the `GlobalValue` for each call to a global is recorded into the Extra Information for that node. * During lowering to machine instructions, the called global value for each call is noted in its containing `MachineFunction`. * During AArch64 asm printing, if the import call optimization feature is enabled: - A (new) `.impcall` directive is emitted for each call to an imported function. - The `.impcall` section is emitted with its magic header (but is not filled in). * During COFF object writing, the `.impcall` section is filled in based on each `.impcall` directive that was encountered. The `.impcall` section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing). I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a `MachineFunctionPass` (as suggested [on the forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3)) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as `ADRP` + `LDR` (to load the symbol) then a `BR` (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.	2025-01-11 21:30:17 -08:00
Austin Kerbow	657fb4433e	[AMDGPU] Add target hook to isGlobalMemoryObject (#112781 ) We want special handling for IGLP instructions in the scheduler but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.	2025-01-11 09:57:57 -08:00
David Green	ab9a80a3ad	[DAG] Allow AssertZExt to scalarize. (#122463 ) With range and undef metadata on a call we can have vector AssertZExt generated on a target with no vector operations. The AssertZExt needs to scalarize to a normal `AssertZext tin, ValueType`. I have added AssertSext too, although I do not have a test case. Fixes #110374	2025-01-11 16:29:06 +00:00
Sergei Barannikov	a475ae05fb	Revert "[ADT] Fix specialization of ValueIsPresent for PointerUnion" (#122557 ) Reverts llvm/llvm-project#121847 Causes compile time regressions and allegedly miscompilation.	2025-01-11 03:36:34 +03:00
Sergei Barannikov	7b05367943	[ADT] Fix specialization of ValueIsPresent for PointerUnion (#121847 ) Two instances of `PointerUnion` with different active members and null value compare unequal. Currently, this results in counterintuitive behavior when using functions from `Casting.h`, e.g.: ```C++ PointerUnion<int *, float *> U; // U = (int *)nullptr; dyn_cast<int *>(U); // Aborts dyn_cast<float *>(U); // Aborts U = (float *)nullptr; dyn_cast<int *>(U); // OK dyn_cast<float *>(U); // OK ``` `dyn_cast` should abort in all cases because the argument is null. Currently, it aborts only if the first member is active. This happens because the partial template specialization of `ValueIsPresent` for nullable types compares the union with a union constructed from nullptr, and the two unions compare equal only if their active members are the same. This patch changes the specialization of `ValueIsPresent` for nullable types to make `isPresent()` return false for all possible null values of a PointerUnion, and fixes two places where the old behavior was exploited. Pull Request: https://github.com/llvm/llvm-project/pull/121847	2025-01-10 16:43:19 +03:00
Simon Pilgrim	9b49da2b31	Revert 86b1b0671cafd "MachineVerifier: Check stack protector is top-most in frame" (#122444 ) Reverts llvm/llvm-project#121481 This is causing build failures on EXPENSIVE_CHECKS builds: https://lab.llvm.org/buildbot/#/builders/187/builds/3653 https://lab.llvm.org/buildbot/#/builders/16/builds/11758	2025-01-10 12:10:45 +00:00
Nikita Popov	e9e7b2adcf	[SDAG] Set IsPostTypeLegalization flag in LegalizeDAG (#122278 ) This runs after type legalization and as such should set IsPostTypeLegalization when creating libcalls. I don't think this makes any observable difference right now, but I ran into this issue in an upcoming patch.	2025-01-10 12:25:36 +01:00
Guy David	86b1b0671c	MachineVerifier: Check stack protector is top-most in frame (#121481 ) Somewhat paranoid, but mitigates potential bugs in the future that might place it elsewhere and render the mechanism useless.	2025-01-10 10:33:02 +02:00
Nikita Popov	eeac0ffaf4	Revert "[MachineLICM] Use `RegisterClassInfo::getRegPressureSetLimit` (#119826 )" This reverts commit b4e17d4a314ed87ff6b40b4b05397d4b25b6636a. This causes a large compile-time regression.	2025-01-10 09:05:06 +01:00
Akshat Oke	089555095b	Revert "Spiller: Detach legacy pass and supply analyses instead (#119… (#122426 ) …181)" This reverts commit a531800344dc54e9c197a13b22e013f919f3f5e1.	2025-01-10 12:23:07 +05:30
Akshat Oke	a531800344	Spiller: Detach legacy pass and supply analyses instead (#119181 ) Makes Inline Spiller amenable to the new PM.	2025-01-10 11:46:56 +05:30
Mingming Liu	a6aa9365f7	[NFC][AsmPrinter] Pass MJTI by const reference instead of const pointer (#122365 ) The caller `AsmPrinter::emitJumpTableInfo` checks [1] `MJTI` is not a null pointer before calling `emitJumpTableEntry` or `emitJumpTableSizesSection`. This patch updates callee function's signature to accept const reference, this way it's explicit `MJTI` won't be nullptr inside the callee. [1] `9d5299eb61/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (L2857)`	2025-01-09 15:57:32 -08:00
Craig Topper	e2449f1bce	[SelectionDAG] Use SDNode::op_iterator instead of SDNodeIterator. NFC (#122147 ) I think SDNodeIterator primarily exists because GraphTraits requires an iterator that dereferences to SDNode. op_iterator dereferences to SDUse which is implicitly convertible to SDValue. This piece of code can use SDValue instead of SDNode* so we should prefer to use the more common op_iterator.	2025-01-09 09:09:55 -08:00
Pengcheng Wang	b4e17d4a31	[MachineLICM] Use `RegisterClassInfo::getRegPressureSetLimit` (#119826 ) `RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logic to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just as the comment "This limit must be adjusted dynamically for reserved registers" says. Separate from https://github.com/llvm/llvm-project/pull/118787	2025-01-09 21:05:52 +08:00
Nicholas Guy	1b2943534f	[llvm] Fix crash caused by reprocessing complex reductions (#122077 ) If a complex pattern had the shape of both a complex->complex reduction and a complex->single reduction, the matching would recognise both and deem the graph a valid transformation. Preventing this reprocessing results in only one of these matching, meaning that in the case of an invalid graph, we don't try to transform it anyway.	2025-01-09 08:31:57 +00:00
Hubert Tong	e438513f2e	[AIX][AsmPrinter] Fix unsigned subtraction wrap-around (#122214 ) Unsigned subtraction wrap-around occurs in `emitGlobalConstantImpl` on an AIX-specific code path from 8e4423eb0888 when a structure type has zero elements. With assertions enabled, this manifests as: ``` TypeSize llvm::StructLayout::getElementOffset(unsigned int) const: Assertion `Idx < NumElements && "Invalid element idx!"' failed. ```	2025-01-09 00:07:57 -04:00
Alexander Yermolovich	fce0314c38	[LLVM][DWARF] Create debug names entry for non-tu top level DIE (#121856 ) When creating a Type Unit (TU), LLVM attempts to do so optimistically. However, if this fails, it discards the TU state and creates the TU within the Compilation Unit (CU). In such cases, an entry for the top-level DIE is not created in the debug names table. This can cause issues when running llvm-dwarfdump --debug-names --verify, as the missing entry will result in verification failure. To address this issue, this patch adds a call to the updateAcceleratorTables when TU creation fails. This ensures that the debug names table is updated correctly, even in cases where TU creation fails.	2025-01-08 17:08:35 -08:00
Mikhail Gudim	f37bee1d92	[ReachingDefAnalysis][NFC] Rename `PhysReg` to `Reg`. (#122112 ) This is in order to prepare for future MR where we will extend `ReachingDefAnalysis` to stack slots.	2025-01-08 10:00:41 -05:00
Ryan Mansfield	67efbd0bf1	[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955 )	2025-01-08 11:07:23 +01:00
abhishek-kaushik22	366e62a0cb	[X86] Combine `uitofp <v x i32> to <v x half>` (#121809 ) Closes #121793	2025-01-08 16:49:29 +08:00
Sander de Smalen	82ec2d6aa4	[Coalescer] Consider NewMI's subreg index when updating lanemask. (#121780 ) The code added in #116191 that updated the lanemasks for rematerialized values checked if `DefMI`'s destination register had a subreg index. This seems to have missed the following case: ``` %0:gpr32 = MOVi32imm 1 %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32 ``` which during rematerialization would have the following variables set: ``` DefMI = %0:gpr32 = MOVi32imm 1 NewMI = %3.sub_32:gpr64 = MOVi32imm 1 (rematerialized value) ``` When checking whether the lanemasks need to be generated, considering whether DefMI's destination has a subreg index is insufficient; we should look at NewMI's subreg index instead. The added tests are a bit more involved, because I was not able to reconstruct the issue without having some control flow in the test. These tests come from actual reproducers.	2025-01-07 15:06:00 +00:00
Simon Pilgrim	1332db36ee	[DAG] TransformFPLoadStorePair - early out if we're not loading a simple type It's never going to transform into a legal integer type, so just bail - noticed while triaging the assertion reported in #121784	2025-01-07 13:37:23 +00:00
Sander de Smalen	5514865147	[Coalescer] Move code added in #116191 (#121779 ) By moving the code a bit later, we can factor out some of the conditions as those are now already tested. This will also be useful when adding another fix on top that uses `NewMI`'s subreg index (to follow as a separate PR). The change is intended to be NFC.	2025-01-07 09:57:18 +00:00
Matt Arsenault	8c0483bba2	RegisterCoalescer: Fix assert on remat to copy-to-physreg with subregs (#121734 ) Do not try to rematerialize a super-register def used by a subregister extract copy into a copy to a physical register if the other pieces of the full physreg are live at the rematerialization point. It would insert the super-register def at the rematerialization point, and assert since the other half of the register was already live. This is analogous to the undef subregister def handling above, which handled the virtual register case. Fixes #120970	2025-01-07 12:22:23 +07:00
Simon Pilgrim	923675193b	[DAG] VectorLegalizer::ExpandUINT_TO_FLOAT- pull out repeated getValueType calls. NFC.	2025-01-06 18:49:51 +00:00
Simon Pilgrim	112793a90e	[DAG] expandUINT_TO_FP - use getShiftAmountConstant helper. NFC. Don't bother with separate getShiftAmountTy/getConstant calls.	2025-01-06 18:49:50 +00:00
Amara Emerson	2d53eaff4a	[AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. This case is different from the earlier <8 x i1> case handled because it triggers a legalization failure in lowerStore() that's intended for scalar code. It also was triggering incorrect bitcast actions in the AArch64 rules that weren't expecting truncating stores. With these two fixed, more cases are handled. The code is still bad, including some missing load promotion in our combiners that result in dead stores hanging around at the end of codegen. Again, we can fix these in separate changes. Reviewers: davemgreen, madhur13490, topperc, arsenm Reviewed By: davemgreen Pull Request: https://github.com/llvm/llvm-project/pull/121185	2025-01-06 10:22:48 -08:00
Amara Emerson	6b0807fe2b	[AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. This is essentially a port of TargetLowering::scalarizeVectorStore(), which is used for the case where we have something like a store of <8 x s8> truncating to <8 x s1> in memory. The naive lowering is a sequence of extracts to compute a scalar value to store. AArch64's DAG implementation has some more smarts to improve this further which we can do later. Reviewers: topperc, davemgreen Pull Request: https://github.com/llvm/llvm-project/pull/121169	2025-01-06 10:21:42 -08:00
Matt Arsenault	93220e7e06	RegAllocGreedy: Fix use after free during last chance recoloring (#120697 ) Last chance recoloring can delete the current fixed interval during recursive assignment of interfering live intervals. Check if the virtual register value was assigned before attempting the unassignment, as is done in other scenarios. This relies on the fact that we do not recycle virtual register numbers. I have only seen this occur in error situations where the allocation will fail, but I think this can theoretically happen in working allocations. This feels very brute force, but I've spent over a week debugging this and this is what works without any lit regressions. The surprising piece to me was that unspillable live ranges may be spilled, and a number of tests rely on optimizations occurring on them. My other attempts to fix this mostly revolved around not identifying unspillable live ranges as snippet copies. I've also discovered we're making some unproductive live range splits with subranges. If we avoid such splits, some of the unspillable copies disappear but mandating that be precise to fix a use after free doesn't sound right.	2025-01-06 23:12:55 +07:00
Phoebe Wang	1547382033	[X86] Support lowering of FMINIMUMNUM/FMAXIMUMNUM (#121464 )	2025-01-06 21:28:58 +08:00
Nicholas Guy	8e1b49c38e	Complex deinterleaving/single reductions build fix Reapply "Add support for single reductions in ComplexDeinterleavingPass (#112875 )" (#120441 ) This reverts commit 76714be5fd4ace66dd9e19ce706c2e2149dd5716, fixing the build failure that caused the revert. The failure stemmed from the complex deinterleaving pass identifying a series of add operations as a "complex to single reduction", so when it tried to transform this erroneously identified pattern, it faulted. The fix applied is to ensure that complex numbers (or patterns that match them) are used throughout, by checking if there is a deinterleave node amidst the graph.	2025-01-06 09:59:32 +00:00
Amara Emerson	41ebbed280	[AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. Reviewers: davemgreen, topperc, arsenm Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/121171	2025-01-05 21:32:27 -08:00
Amara Emerson	7e3180a2c2	[AArch64][GlobalISel] Add support for widening vector store elements to s8. Reviewers: topperc, arsenm, davemgreen Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/121170	2025-01-05 21:31:34 -08:00
Matt Arsenault	d34f7ead88	DAG: Fix assuming f16 is the only 16-bit fp type in concat vector combine (#121637 ) This would see if there are mixed integer and FP types and pick an equivalently sized FP type to use as the vector element type, and only cast if there were mixed integers. We need to insert a cast if the types are mixed, which may include different FP types. Fixes #121601	2025-01-06 10:38:54 +07:00
Craig Topper	e32afded92	[LegalizeVectorOps] Use getBoolConstant instead of getAllOnesConstant in VectorLegalizer::UnrollVSETCC. (#121526 ) This code should follow the target preference for boolean contents of a vector type. We shouldn't assume that true is negative one.	2025-01-03 10:46:37 -08:00
Craig Topper	a4e47586b9	[ExpandMemCmp] Recognize canonical form of (icmp sle/sge X, 0) in getMemCmpOneBlock. (#121540 ) This code recognizes special cases where the result of memcmp is compared with 0. If the compare is sle/sge, then InstCombine canonicalizes to (icmp slt X, 1) or (icmp sgt X, -1). We should recognize those patterns too.	2025-01-03 10:23:13 -08:00
Craig Topper	715dcb2310	[ExpandMemCmp] Use m_SpecificInt to simplify code. NFC (#121532 )	2025-01-03 09:19:54 -08:00
Craig Topper	4dfea22e77	[ExpandMemCmp][AArch64][PowerPC][RISCV][X86] Use llvm.ucmp instead of (sub (zext (icmp ugt)), (zext (icmp ult))). (#121530 ) AArch64 and PowerPC look like improvements. RISC-V is neutral. X86 trades a dependency breaking xor before a seta for a movsx after a sbbb. Depending on how the result is used, this movsx might go away.	2025-01-03 09:19:32 -08:00
Acim Maravic	9d6527bc12	[CodeGen] Add MOTargetFlag4 to MachineMemOperand Flags (#120136 )	2025-01-03 15:45:52 +01:00
Min-Yih Hsu	3cac26f541	[GISel] Combine `(neg (min/max x, (neg x)))` into `(max/min x, (neg x))` (#120998 ) This is the GISel version of #120666. Also supports both unsigned and signed versions of min & max.	2025-01-02 16:29:34 -08:00
Min-Yih Hsu	2291d0aba9	[DAGCombiner] Turn `(neg (max x, (neg x)))` into `(min x, (neg x))` (#120666 ) This pattern was originally spotted in 429.mcf by @topperc. We already have a DAGCombiner pattern to turn `(neg (abs x))` into `(min x, (neg x))`. But in some cases `(neg (max x, (neg x)))` is formed by an expanded `abs` followed by a `neg` that is generated only after the `abs` expansion. This patch adds a separate pattern to match cases like this, as well as its inverse pattern: `(neg (min X, (neg X))) --> (max X, (neg X))`. This pattern is applicable to both signed and unsigned min/max.	2025-01-02 16:28:55 -08:00
Jay Foad	1849244685	[CodeGen] Remove atEnd method from defusechain iterators (#120610 ) This was not used much and there are better ways of writing it.	2025-01-02 17:29:55 +00:00
Matt Arsenault	11e482c4a3	RegAllocGreedy: Add dummy priority advisor for writing MIR tests (#121207 ) I regularly struggle reproducing failures in greedy due to changes in priority when resuming the allocation from MIR vs. a complete compilation starting at IR. That is, the fix in e0919b189bf2df4f97f22ba40260ab5153988b14 did not really fix the problem of the instruction distance mattering. Add a way to bypass all of the priority heuristics for MIR tests, by prioritizing only by virtual register number. Could also give this a more specific name, like PrioritizeLowVirtRegNumber	2025-01-02 23:04:44 +07:00
Akshat Oke	50054ba2f4	[CodeGen] LiveRegMatrix: Use allocator through a unique_ptr (#120556 ) `LIU::Matrix` holds on to a pointer to the allocator in LiveRegMatrix and is left hanging when the allocator moves with the LiveRegMatrix. This extends the lifetime of the allocator so that it does not get destroyed when moving a LiveRegMatrix object.	2025-01-01 14:54:08 +05:30
Vikash Gupta	283806695a	[GlobalIsel] Add combine for select with constants (#121088 ) The SelectionDAG Isel supports both versions of the combines mentioned below: ``` select Cond, Pow2, 0 --> (zext Cond) << log2(Pow2) select Cond, 0, Pow2 --> (zext !Cond) << log2(Pow2) ``` The GlobalIsel for now only supports the first one, defined in its generic combinerHelper.cpp. This patch adds the missing second one.	2025-01-01 11:14:53 +05:30
Simon Pilgrim	b3a7ab6f1f	[DAG] Don't allow implicit truncation in extract_element(bitcast(scalar_to_vector(X))) -> trunc(srl(X,C)) fold Limits #117900 to only fold when scalar_to_vector doesn't perform implicit truncation, as the scaled shift calculation doesn't currently account for this - this can be addressed in a future update. Fixes #121306	2024-12-30 16:08:35 +00:00

37021 Commits