Just commute with (V)BLENDPD/S like all other BLEND instructions
This is now handled more generally by the X86FixupInstTuningPass (OptSize fold occurs even without a scheduler model).
First step towards #142972
This is the x64 equivalent of #121516
Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618),
the section and constant names relating to this feature all mention
"retpoline", and we need to mark indirect calls, control-flow guard
calls, and jumps for jump tables in the section alongside calls to
imported functions.
As with the AArch64 feature, this emits a new section into the object
file that the MSVC linker uses to generate the Dynamic Value Relocation
Table; the section itself does not appear in the final binary.
The Windows Loader requires that a specific sequence of instructions be
emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.
1. An ADD64rm_ND instruction may be emitted with a GOTPCREL relocation.
It is now handled in the "Suppress APX for relocation" pass and
transformed into ADD64rm, with its register operand in a non-rex2
register class. The relocation type R_X86_64_CODE_6_GOTPCRELX will be
added later, once APX is enabled for relocations.
2. The register class of an operand in an instruction with a relocation
is narrowed to a non-rex2 class in the "Suppress APX for relocation"
pass, but it may later be recomputed to a larger register class (e.g.
GR64_NOREX2RegClass to GR64RegClass). Fixed by not updating the register
class if it is a non-rex2 register class and APX support for relocations
is disabled (see the sketch after this list).
3. After "Suppress APX for relocation" pass, the instruction with
relocation may be folded with add NDD instruction to a add NDD
instruction with relocation. The later will be emitted to instruction
with APX relocation type which breaks backward compatibility. Fixed by
not folding instruction with GOTPCREL relocation with NDD instruction.
4. If the register in operand 0 of an instruction with a relocation is
used in a PHI instruction, it may be replaced with operand 0 of the PHI
(possibly an EGPR) after PHI elimination and the Machine Copy
Propagation pass. Fixed by suppressing EGPRs in operand 0 of PHI
instructions so that no APX relocation types are emitted.
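A minimal, self-contained model of the guard added for fix (2); the enum and all names are illustrative, not the actual LLVM code:

```
// Hypothetical model: the real code operates on TargetRegisterClass
// pointers, with GR64_NOREX2 excluding the APX extended GPRs (r16-r31).
enum RegClass { GR64_NOREX2, GR64 };

RegClass recomputeRegClass(RegClass Cur, bool APXRelocEnabled) {
  // Never widen a class that the "Suppress APX for relocation" pass
  // narrowed: keeping GR64_NOREX2 guarantees no EGPR is allocated for
  // the relocated operand, so no APX relocation type is emitted.
  if (!APXRelocEnabled && Cur == GR64_NOREX2)
    return Cur;
  return GR64; // otherwise the larger, recomputed class is fine
}
```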
Suppress EGPR/NDD instructions for relocations so that no APX relocation
types are emitted. This keeps backward compatibility with old linkers
that lack APX support. The use case is trying APX features with LLVM
plus the old built-in linker on RHEL 9, which is expected to be EOL in
2032.
If APX relocation types are present, old linkers raise an "unsupported
relocation type" error. Example:
```
$ llvm-mc -filetype=obj -o got.o -triple=x86_64-unknown-linux got.s
$ ld got.o -o got.exe
ld: got.o: unsupported relocation type 0x2b
...
$ cat got.s
...
movq foo@GOTPCREL(%rip), %r16
$ llvm-objdump -dr got.o
...
1: d5 48 8b 05 00 00 00 00 movq (%rip), %r16
0000000000000005: R_X86_64_CODE_4_GOTPCRELX foo-0x4
```
This reverts commit 7ae75851b2e1570662261c97c13cfc65357c283d.
There is a problem with the peephole optimization for the CCMP
instruction. See the example below:
C source code:
```
if (a > 2 || (b && (a == 2))) { … }
```
MIR before peephole optimization:
```
TEST8rr %21:gr8, %21:gr8, implicit-def $eflags // b
CCMP32ri %30:gr32, 2, 0, 5, implicit-def $eflags, implicit $eflags // a == 2
CCMP32ri %30:gr32, 3, 0, 5, implicit-def $eflags, implicit $eflags // a > 2 (transformed to a < 3)
JCC_1 %bb.6, 2, implicit $eflags
JMP_1 %bb.3
```
Inputs:
```
a = 1, b = 0.
```
With these inputs, the expected behavior is to jump to %bb.6.
After the TEST8rr instruction executes with b (%21) == 0, ZF is set to 1
in EFLAGS, so EFLAGS does not satisfy the SCC condition of the following
CCMP32ri (the a == 2 check), which therefore skips comparing a (%30)
with 2 and instead sets the flags from its payload to 0x202 (ZF = 0).
EFLAGS then satisfies the SCC condition of the second CCMP32ri, which
compares a (%30) with 3; it sets CF to 1 in EFLAGS and the JCC
instruction jumps to %bb.6.
But after adding CCMP support, the peephole optimization eliminates the
second CCMP32ri and updates the condition of the JCC instruction from
"B" to "BE". With the same inputs, the JCC instruction now falls through
to the next instruction. This is not expected; the second CCMP32ri
should not be eliminated:
```
TEST8rr %21:gr8, %21:gr8, implicit-def $eflags // b
CCMP32ri %30:gr32, 2, 0, 5, implicit-def $eflags, implicit $eflags // a == 2
JCC_1 %bb.6, 6, implicit $eflags
JMP_1 %bb.3
```
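The miscompile is visible at the source level; a minimal standalone check (the function name is illustrative):

```
#include <cassert>

// The original condition: with a = 1, b = 0 it must evaluate to false,
// matching the expected jump to %bb.6 above.
bool cond(int a, int b) { return a > 2 || (b && a == 2); }

int main() {
  assert(!cond(1, 0)); // fails if the second CCMP32ri is wrongly removed
  return 0;
}
```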
This change removes the uint64_t constructor on LocationSize,
preventing implicit conversion, and fixes up the APIs that use it to
adapt to the change. Note that I'm adding a couple of explicit
conversion points on routines where passing in a fixed offset as an
integer seems likely to have well-understood semantics.
We had an unfortunate case that arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize, which
works just fine for fixed values but loses information and fails
assertions if the TypeSize was scalable. This change breaks the first
link in that implicit conversion chain, since that seemed to be the
easier one.
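To make the failure mode concrete, here is a self-contained model of the old conversion chain (simplified stand-ins for the LLVM types, not the actual definitions):

```
#include <cassert>
#include <cstdint>

// Simplified stand-in for llvm::TypeSize.
struct TypeSize {
  uint64_t Quantity;
  bool Scalable;
  operator uint64_t() const { // implicit TypeSize -> uint64_t
    assert(!Scalable && "a scalable size has no fixed value");
    return Quantity;
  }
};

// Simplified stand-in for llvm::LocationSize.
struct LocationSize {
  uint64_t Value;
  LocationSize(uint64_t V) : Value(V) {} // the constructor being removed
};

void analyze(LocationSize) {}

int main() {
  TypeSize Fixed{16, false};
  uint64_t Raw = Fixed; // fine for fixed sizes
  analyze(Raw);         // implicit uint64_t -> LocationSize: a compile
                        // error once the constructor is removed
  TypeSize Scalable{16, true};
  uint64_t Lossy = Scalable; // asserts at run time: scalability is lost
  analyze(Lossy);
}
```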
This moves the x86 implementation into generic code, since it appears
to be suitable for any target. The heart of this transform is inside
foldMemoryOperand, so other targets won't actually kick in until they
implement that API. This removes one piece to implement in the process
of enabling foldMemoryOperand.
APX NDD instructions may be compressed when the result is also a
source. For 8/16-bit instructions, this may create partial-register
write hazards if a previous super-register def is within the partial
register update clearance, or incorrect code if the super-register is
not dead.
This change prevents compression when the super-register is marked as an
implicit define, which the virtual rewriter already adds in the case
where a subregister is defined but the super-register is not dead.
The BreakFalseDeps interface is also updated to add implicit
super-register defs for NDD instructions that would incur partial-write
stalls if compressed to legacy ops.
The motivation is supporting scalable spills and reloads, e.g. in
https://github.com/llvm/llvm-project/pull/120524.
Looking at this API, I'm suspicious that the access size should just be
coming from the memory operand on the load or store, but we don't appear
to be consistently setting that up. That's a larger change so I may or
may not bother pursuing that.
The last one may be an implicit use, e.g.,
`IDIV32r %4:gr32, implicit-def dead $eax, implicit-def $edx,
implicit-def dead $eflags, implicit $eax, implicit $edx`
https://godbolt.org/z/KPKzj5c8K
This extends `optimizeCompareInstr` to reuse a previous CCMP result if
the previous comparison was with an immediate that was 1 bigger or
smaller. Example:
```
CCMP x, 13, 2, 5
...
CCMP x, 12, 2, 5 ; can be removed if we change the SETg
SETg ...         ; x > 12 changed to SETge (x >= 13), removing the 2nd CCMP
```
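The rewrite relies on a simple integer identity; a trivial standalone check (not LLVM code):

```
#include <cassert>

int main() {
  // For integers, x > 12 is exactly x >= 13, so the SETg consuming the
  // second compare can become a SETge consuming the first one's flags.
  for (int x = -100; x <= 100; ++x)
    assert((x > 12) == (x >= 13));
  return 0;
}
```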
NVPTX, SPIRV, and WebAssembly pass virtual registers to this function
since they don't perform register allocation. We need to use Register to
avoid a virtual register being converted to MCRegister by the caller.
This avoids dozens of regressions in a future patch. These
primarily manifested as assertions where we had copies of 64-bit
registers to 32-bit registers.
This is testable in principle with hand-written MIR, but that's
a bit too much x86 for me.
Intel docs have been updated to be similar to AMD's and now describe
BSF/BSR as not changing the destination register if the input value was
zero, which allows us to support the CTTZ/CTLZ zero-input cases by
pre-setting the destination to produce a NumBits result (BSR is a bit
messy, as the destination has to be XOR'd to create a CTLZ result).
VIA/Zhaoxin x86_64 CPUs have also been confirmed to match this
behaviour.
This patch adjusts the X86ISD::BSF/BSR nodes to take a "pass through"
argument for zero-input cases, by default this is set to UNDEF to match
existing behaviour, but it can be set to a suitable value if supported.
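As a concrete model of the pass-through and the XOR trick, here is a small standalone sketch (the helper and its parameters are illustrative, not the actual lowering code):

```
#include <cstdint>

// Models BSR leaving the destination unchanged on zero input: the
// destination is pre-loaded with the pass-through value, pre-XOR'd by 31
// so the final XOR (which turns a BSR bit index into a CTLZ count)
// restores it.
uint32_t ctlzViaBsr(uint32_t X, uint32_t ZeroInputResult) {
  uint32_t Dst = ZeroInputResult ^ 31; // pass-through, pre-XOR'd
  if (X != 0)
    Dst = 31 - __builtin_clz(X); // BSR: index of the highest set bit
  return Dst ^ 31;               // idx ^ 31 == 31 - idx == CTLZ of X
}
```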
There are still some limits to this - it's only supported for
x86_64-capable processors (and I've only enabled it for x86_64 codegen),
and Intel CPUs sometimes zero the upper 32 bits of a pass-through
register when used for BSR32/BSF32 with a zero source value (i.e. the
whole 64 bits may not get passed through).
Fixes #122004
This patch is in preparation for enabling the MachineInstr::MIFlag
flags, i.e. FrameSetup/FrameDestroy, to be set on callee-saved register
spill/reload instructions in the prologue/epilogue. This eventually
helps in setting the prologue_end and epilogue_begin markers more
accurately.
The DWARF Spec, in "6.4 Call Frame Information", says:

> The code that allocates space on the call frame stack and performs the
> save operation is called the subroutine’s prologue, and the code that
> performs the restore operation and deallocates the frame is called its
> epilogue.
This means the callee-saved register spills and reloads are part of the
prologue (a.k.a. frame setup) and epilogue (a.k.a. frame destruction),
respectively. And, IIUC, the LLVM backend uses the
FrameSetup/FrameDestroy flags to identify instructions that are part of
call frame setup and destruction.
In trunk, while most targets consistently set FrameSetup/FrameDestroy
on the save/restore call frame information (CFI) instructions for
callee-saved registers, they do not consistently set those flags on the
actual callee-saved register spill/reload instructions.
I believe this patch provides a clean mechanism to set the
FrameSetup/FrameDestroy flags on the actual callee-saved register
spill/reload instructions as needed. And, because Flags has a default
argument of MachineInstr::NoFlags, this patch is an NFC.
With this patch, targets just have to pass the FrameSetup/FrameDestroy
flag to the storeRegToStackSlot/loadRegFromStackSlot calls in their
derived spillCalleeSavedRegisters and restoreCalleeSavedRegisters to set
those flags on callee-saved register spill/reload instructions, as
sketched below.
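For example, a target override might look roughly like this (a hedged sketch: "Foo" is a stand-in target and the boilerplate is abridged; the point is the trailing MachineInstr::FrameSetup argument):

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"

using namespace llvm;

bool FooFrameLowering::spillCalleeSavedRegisters(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
    ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
  const TargetInstrInfo &TII = *MBB.getParent()->getSubtarget().getInstrInfo();
  for (const CalleeSavedInfo &CS : CSI) {
    Register Reg = CS.getReg();
    const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);
    // The new trailing argument marks the spill as part of the prologue.
    TII.storeRegToStackSlot(MBB, MI, Reg, /*isKill=*/true, CS.getFrameIdx(),
                            RC, TRI, Register(), MachineInstr::FrameSetup);
  }
  return true;
}
```

restoreCalleeSavedRegisters would do the same with loadRegFromStackSlot and MachineInstr::FrameDestroy.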
Also, this patch makes it very easy to set the source line information
on callee-saved register spill/reload instructions, which the
DwarfDebug.cpp implementation needs in order to set the prologue_end and
epilogue_begin markers more accurately.
As per the DwarfDebug.cpp implementation:

> prologue_end is the first known non-DBG_VALUE and non-FrameSetup
> location that marks the beginning of the function body.
>
> epilogue_begin is the first FrameDestroy location that has been seen
> in the epilogue basic block.
With this patch, targets just have to do the following to set the
source line information on callee-saved register spill/reload
instructions, without hampering LLVM's efforts to avoid attaching source
line information to artificial code generated by the compiler:
```
<Foo>InstrInfo::storeRegToStackSlot() {
  ...
  DebugLoc DL =
      Flags & MachineInstr::FrameSetup ? DebugLoc() : MBB.findDebugLoc(I);
  ...
}

<Foo>InstrInfo::loadRegFromStackSlot() {
  ...
  DebugLoc DL =
      Flags & MachineInstr::FrameDestroy ? MBB.findDebugLoc(I) : DebugLoc();
  ...
}
```
While I understand this patch would break out-of-tree backend builds, I
think it is a step in the right direction.
One immediate use case that benefits from this patch: fixing #120553
becomes simpler.
Use uppercase in the subvector description ("32x2" -> "32X2" etc.) - this matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (with lowercase just for the arg description suffix).
FPCLASS is a unary instruction with an immediate operand - update the naming to match similar instructions (e.g. VPSHUFD) by using only the source reg/mem and immediate in the instruction name.
This code assumed only PUSHes would appear in call sequences. However,
if calls require frame-pointer/base-pointer spills, only the PUSH
operations inserted by spillFPBP would be recognized, and the
adjustments to frame object offsets in prologepilog would be incorrect.
This change correctly reports the SP adjustment for POPs and for ADD/SUB
to rsp, and adds an assertion for unrecognized instructions that modify
rsp.
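A sketch of the shape of that reporting, in the spirit of X86InstrInfo::getSPAdjust (the opcode set, operand indices, and surrounding code are illustrative, not the actual pass code):

```
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/Support/ErrorHandling.h"
// Also requires the X86 target's generated opcode definitions.

// Report how each instruction a call sequence may now contain moves rsp,
// and assert on anything unrecognized that modifies it.
static int64_t getCallSequenceSPDelta(const llvm::MachineInstr &MI) {
  switch (MI.getOpcode()) {
  case llvm::X86::PUSH64r: // push: rsp -= 8
    return -8;
  case llvm::X86::POP64r: // pop: rsp += 8
    return 8;
  case llvm::X86::ADD64ri32: // add $imm, %rsp
    return MI.getOperand(2).getImm();
  case llvm::X86::SUB64ri32: // sub $imm, %rsp
    return -MI.getOperand(2).getImm();
  default:
    llvm_unreachable("unrecognized instruction modifying rsp");
  }
}
```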
This patch makes the `VBROADCAST***X**` subvector broadcast instructions consistent - the `***X**` section represents the original subvector type/size, but we were not correctly using the AVX512 Z/Z256/Z128 suffix to consistently represent the destination width (or we missed it entirely).
This patch prepares the NFC groundwork for global outlining using
CGData, which will follow
https://github.com/llvm/llvm-project/pull/90074.
- The `MinRepeats` parameter is now explicitly passed to the
`getOutliningCandidateInfo` function rather than relying on a default
value of 2 (see the signature sketch after this list). For local
outlining, the minimum number of repetitions is typically 2, but for
global outlining (mentioned above) we will optimistically create a
single `Candidate` for each `OutlinedFunction` if stable hashes match a
specific code sequence. This parameter is adjusted accordingly in global
outlining scenarios.
- I have also adopted `std::unique_ptr` for `OutlinedFunction` to
ensure safe and efficient memory management within `FunctionList`,
avoiding unnecessary implicit copies.
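The updated hook then looks roughly like this (a sketch from the description above; the parameter list is abridged and names may differ from the actual declaration):

```
// MinRepeats is now an explicit parameter (previously an implicit minimum
// of 2), and the OutlinedFunction is owned via std::unique_ptr so entries
// in FunctionList are never implicitly copied.
virtual std::optional<std::unique_ptr<outliner::OutlinedFunction>>
getOutliningCandidateInfo(
    std::vector<outliner::Candidate> &RepeatedSequenceLocs,
    unsigned MinRepeats) const;
```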
This depends on https://github.com/llvm/llvm-project/pull/101461.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
The renamable flag is useful during MachineCopyPropagation, but it is
dropped after lowerCopy in some cases. This patch introduces extra
arguments to pass the renamable flag through to copyPhysReg, as sketched
below.
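A sketch of the resulting hook (types and parameter names abridged; the two trailing booleans are the new arguments and default to false so existing call sites keep compiling):

```
virtual void copyPhysReg(MachineBasicBlock &MBB,
                         MachineBasicBlock::iterator MI, const DebugLoc &DL,
                         Register DestReg, Register SrcReg, bool KillSrc,
                         bool RenamableDest = false,
                         bool RenamableSrc = false) const;
```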