llvm-project

Author	SHA1	Message	Date
AtariDreams	0a54b36d5e	[X86] Resolve FIXME: Create cld only when needed (#82415) Only use cld when we also have rep instructions, are calling a function, or contain inline asm.	2024-02-28 12:32:58 +00:00
Alex Lorenz	dd70aef05a	[x86_64][windows][swift] do not use Swift async extended frame for windows x86_64 targets that use windows 64 prologue (#80468) Windows x86_64 stack frame layout is currently not compatible with Swift's async extended frame, which reserves the slot right below RBP (RBP-8) for the async context pointer, as it doesn't account for the fact that a stack object in a win64 frame can be allocated at the same location. This can cause issues at runtime. For instance, Swift's TCA test code has functions that fail because of this issue: they spill a value to that stack slot, which then gets overwritten by a store into the address returned by the @llvm.swift.async.context.addr() intrinsic (which ends up being RBP - 8), leading to an incorrect value being used later when that stack slot is read again. This change drops the use of the async extended frame for windows x86_64 subtargets and instead uses the x32-based approach of allocating a separate stack slot for the stored async context pointer. Additionally, LLDB, which is the primary consumer of the extended frame, makes assumptions like checking for a saved previous frame pointer at the current frame pointer address, which is also incompatible with the windows x86_64 frame layout, as the previous frame pointer is not guaranteed to be stored at the current frame pointer address. Therefore the extended frame layout can be turned off to fix the current miscompile without introducing a regression into LLDB for windows x86_64, as it already doesn't work correctly. I am still investigating what changes LLDB needs to support storing the async frame context in an allocated stack slot instead of at RBP - 8 on windows.	2024-02-05 10:19:26 -08:00
Philip Reames	3ff7caea33	[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339 )	2024-02-01 17:52:35 -08:00
Oskar Wirga	ff4636a4ab	Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940) This is a fix for the regression seen in https://github.com/llvm/llvm-project/pull/79498 > Currently, the way that recomputeLiveIns works is that it will recompute the livein registers for that MachineBasicBlock but it matters what order you call recomputeLiveIn which can result in incorrect register allocations down the line. Now we do not recompute the entire CFG but we do ensure that the newly added MBBs do reach convergence.	2024-01-30 19:33:04 -08:00
Nikita Popov	07a1925b8b	Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)" This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b. Introduces a major compile-time regression.	2024-01-26 22:33:17 +01:00
Oskar Wirga	59bf60519f	Refactor recomputeLiveIns to operate on whole CFG (#79498) Currently, the way that recomputeLiveIns works is that it will recompute the livein registers for that MachineBasicBlock but it matters what order you call recomputeLiveIn which can result in incorrect register allocations down the line. This PR fixes that by simply recomputing the liveins for the entire CFG until convergence is achieved. This makes it harder to introduce subtle bugs which alter liveness.	2024-01-26 11:25:36 -08:00
Shengchen Kan	cb112eb16c	[X86][CodeGen] Teach frame lowering to spill/reload registers w/ PUSHP/POPP, PUSH2[P]/POP2[P] (#73292) #73092 supported the encoding/decoding for PUSHP/POPP #73233 supported the encoding/decoding for PUSH2[P]/POP2[P] In this patch, we teach frame lowering to spill/reload registers w/ these instructions. 1. Use PPX for balanced spill/reload 2. Use PUSH2/POP2 for continuous spills/reloads 3. PUSH2/POP2 must be 16B-aligned on the stack, so pad when necessary	2023-11-27 21:37:07 +08:00
Shengchen Kan	5169100ecd	[NFC][X86] Clang-format X86FrameLowering.cpp (#73287)	2023-11-24 14:12:20 +08:00
Kazu Hirata	01702c3f7f	[llvm] Stop including llvm/ADT/SmallSet.h (NFC) Identified with clangd.	2023-11-11 12:32:15 -08:00
Bill Wendling	9e41c284e0	[NFC][CodeGen] Create method to clear registers (#66958) Place the architecture-specific logic to clear registers in a single place and call it via a TargetInstrInfo method. This will allow one to add instructions to clear registers holding the stack protector guard value before return, but do it in non-architecture-specific code.	2023-09-21 15:57:35 -07:00
Antonio Abbatangelo	b7e110fcfe	[X86] Align stack to 16-bytes on 32-bit with X86_INTR call convention Adds a dynamic stack alignment to functions under the interrupt call convention on x86-32. This fixes the issue where the stack can be misaligned on entry, since x86-32 makes no guarantees about the stack pointer position when the interrupt service routine is called. The alignment is done by overriding X86RegisterInfo::shouldRealignStack, and by setting the correct alignment in X86FrameLowering::calculateMaxStackAlign. This forces the interrupt handler to be dynamically aligned, generating the appropriate `and` instruction in the prologue and `lea` in the epilogue. The `no-realign-stack` attribute can be used as an opt-out. Fixes #26851 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D151400	2023-06-01 17:00:34 +08:00
Kyle Huey	3be667ae5a	[X86] Use the CFA when appropriate for better variable locations around calls. Without frame pointers, the locations of variables on the stack are emitted relative to the stack pointer (via the stack pointer being the value of DW_AT_frame_base on the subprogram). If a call modifies the stack pointer this results in the locations being wrong and the debugger displaying the wrong values for variables. By using DW_OP_call_frame_cfa in these situations the emitted location for the variable will automatically handle changes in the stack pointer (provided LLVM is emitting the correct CFI directives elsewhere, of course). The CFA needs to be adjusted for the size of the stack frame (including the return address) to allow the variable locations themselves to remain unchanged by this patch. Certain LLDB features cannot cope with DW_OP_call_frame_cfa, so this change is heuristically limited to the cases where it's necessary for correctness to minimize the fallout there. Reviewed By: #debug-info, scott.linder, jryans, jmorse Differential Revision: https://reviews.llvm.org/D143463	2023-05-23 20:24:55 +00:00
Shengchen Kan	f603809637	[X86] Move encoding optimization for PUSH32i, PUSH64i to MC lowering, NFCI	2023-05-20 17:59:43 +08:00
Shengchen Kan	89ca4eb002	[X86][NFC] Correct the instruction names for PUSH16i, PUSH32i Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D151012	2023-05-20 17:33:42 +08:00
Shengchen Kan	c81a121f3f	Revert "Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI"" This reverts commit cb16b33a03aff70b2499c3452f2f817f3f92d20d. In fact, the test https://bugs.chromium.org/p/chromium/issues/detail?id=1446973#c2 already passed after 5586bc539acb26cb94e461438de01a5080513401	2023-05-19 22:21:56 +08:00
Hans Wennborg	cb16b33a03	Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI" This caused compiler assertions, see comment on https://reviews.llvm.org/D150107. This also reverts the dependent follow-up change: > [X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI > > This is follow-up of D150107. > > In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be > shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp. > > Differential Revision: https://reviews.llvm.org/D150949 This reverts commit 2ef8ae134828876ab3ebda4a81bb2df7b095d030 and 5586bc539acb26cb94e461438de01a5080513401.	2023-05-19 14:43:33 +02:00
Shengchen Kan	5586bc539a	[X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI This is follow-up of D150107. In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp. Differential Revision: https://reviews.llvm.org/D150949	2023-05-19 18:22:30 +08:00
J. Ryan Stinnett	d6e4c4f8c1	Revert "[X86] Use the CFA as the DWARF frame base for better variable locations around calls." This reverts commit d421f5226048e4a5d88aab157d0f4d434c43f208. LLDB tests are failing as shown in https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/55133/testReport/	2023-05-15 16:53:52 +01:00
Kyle Huey	d421f52260	[X86] Use the CFA as the DWARF frame base for better variable locations around calls. Prior to this patch, for the DWARF frame base LLVM uses the frame pointer register if available, otherwise the stack pointer register. If the stack pointer register is being used and a call or other code modifies the stack pointer during the body of the function this results in the locations being wrong and the debugger displaying the wrong values for variables. By using DW_OP_call_frame_cfa in these situations the emitted location for the variable will automatically handle changes in the stack pointer. The CFA needs to be adjusted for the offset between the frame pointer/stack pointer to allow the variable locations themselves to remain unchanged by this patch. Reviewed By: #debug-info, scott.linder, jryans Differential Revision: https://reviews.llvm.org/D143463	2023-05-15 15:10:02 +01:00
Tom Dohrmann	f6154364f6	fix stack probe lowering for x86_intrcc The x86_intrcc calling convention will build two STACKALLOC_W_PROBING machine instructions if the function takes an error code. This is caused by an additional call to emitSPUpdate in llvm/lib/Target/X86/X86FrameLowering.cpp:1650. Previously only the first STACKALLOC_W_PROBING machine instruction was properly handled; the second one was simply ignored. This led to miscompilations where the stack pointer wasn't properly updated (see https://github.com/rust-lang/rust/issues/109918). This patch fixes this by handling all STACKALLOC_W_PROBING machine instructions. To be honest I don't quite understand why this didn't lead to more noticeable miscompilations previously. This is my first time contributing to LLVM. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D150033	2023-05-09 16:31:42 +08:00
Luo, Yuanke	30b141a3b9	[X86] Fix the incorrect displacement for prolog/epilog The bug is introduced in rGe4ceb5a7bb9b which set the wrong offset from the stack base. This patch is to fix the bug. Differential Revision: https://reviews.llvm.org/D146862	2023-03-25 12:24:37 +08:00
Luo, Yuanke	e4ceb5a7bb	[X86] Create extra prolog/epilog for stack realignment Fix some bugs and reland e4c1dfed38370b4 and 614c63bec6d67c. 1. Run argument stack rebase pass before the reserved physical register is finalized. 2. Add LEA pseudo instruction to prevent the instruction being eliminated. 3. Don't support X32.	2023-03-22 22:20:27 +08:00
Luo, Yuanke	da8260a9b1	Revert "[X86] Create extra prolog/epilog for stack realignment" This reverts commit e4c1dfed38370b4933f05c8e24b1d77df56b526c.	2023-03-21 20:30:29 +08:00
Luo, Yuanke	e4c1dfed38	[X86] Create extra prolog/epilog for stack realignment The base pointer register is reserved by the compiler when there is a dynamic size alloca and stack realignment in a function. However, the base pointer register is not defined in the X86 ABI, so users can use this register in inline assembly. The inline assembly would clobber the base pointer register without the user being aware of it. This patch creates an extra prolog that saves the stack pointer to a scratch register and uses this register to reference arguments on the stack. For some calling conventions (e.g. regcall), there may be few scratch registers. Below is example code for such a case. ``` extern int bar(void *p); long long foo(size_t size, char c, int id) { __attribute__((__aligned__(64))) int a; char *p = (char *)alloca(size); asm volatile ("nop"::"S"(405):); asm volatile ("movl %0, %1"::"r"(id), "m"(a):); p[2] = 8; memset(p, c, size); return bar(p); } ``` The following prolog/epilog will be emitted for this case. ``` leal 4(%esp), %ebx .cfi_def_cfa %ebx, 0 andl $-128, %esp pushl -4(%ebx) ... leal 4(%ebx), %esp .cfi_def_cfa %esp, 4 ``` Differential Revision: https://reviews.llvm.org/D145650	2023-03-21 08:09:56 +08:00
Theodoros Kasampalis	2e0940c6a0	[X86] Fix for offsets of CFA directives `emitPrologue` may insert stack pointer adjustment in tail call optimized functions where the callee argument stack size is bigger than the caller's. In such a case, the adjustment must be taken into account when generating CFA directives. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D143618	2023-02-28 10:30:06 +08:00
Luo, Yuanke	ba8a520512	[X86] Fix the base pointer save/restore bug Previously the stack slot for spilling the base pointer register was allocated after the stack realignment. When the stack is naturally aligned, the stack slot is shared with other data allocated from the stack, which causes data corruption. Another issue is that the stack slot used to save/restore the callee-saved register is not fixed for each function; it depends on the register usage in each function. This patch recalculates the offset of the stack slot for the base pointer register during the prolog/epilog insertion pass, and allocates the stack slot when spilling callee-saved registers. Differential Revision: https://reviews.llvm.org/D144625	2023-02-25 20:52:13 +08:00
Shengchen Kan	011e4abb49	[X86][MC][bugfix] Report error for mismatched modifier in inline asm and remove function getX86SubSuperRegisterOrZero ``` MCRegister getX86SubSuperRegister(MCRegister Reg, unsigned Size, bool High = false); ``` A strange behavior of the functions `getX86SubSuperRegister` was introduced by llvm-svn:145579: the returned register may not match the parameters when an 8-bit high register is required. llvm-svn:175762 then refined the code and dropped the comments, so the code no longer showed what was happening there. These two functions are only called with `Size=8` and `High=true` in two places. One is in `X86FixupBWInsts.cpp` for liveness of registers and the other is in `X86AsmPrinter.cpp` for inline asm. For the first one, we provide an alternative in this patch. For the second one, the strange behaviour caused a bug where an error was not reported for a mismatched modifier. ``` void f() { char x; asm volatile ("mov %%ah, %h0" :"=r"(x)::"%eax", "%ebx", "%ecx", "%edx", "edi", "esi"); } ``` ``` $ gcc -S test.c error: extended registers have no high halves ``` ``` $ clang -S test.c no error ``` so we fix the bug in this patch. `getX86SubSuperRegister` is just a wrapper of `getX86SubSuperRegisterOrZero` with an `assert`. I believe we should remove the latter. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D142834	2023-02-02 10:08:56 +08:00
Stefan Gränitz	3b387d1070	Lift EHPersonalities from Analysis to IR (NFC) Computing EH-related information was only relevant for analysis passes so far. Lifting it to IR will allow the IR Verifier to calculate EH funclet coloring and validate funclet operand bundles in a follow-up step. Reviewed By: rnk, compnerd Differential Revision: https://reviews.llvm.org/D138122	2023-01-27 18:05:13 +01:00
Christudasan Devadasan	b5efec4b27	[CodeGen] Additional Register argument to storeRegToStackSlot/loadRegFromStackSlot With D134950, targets get notified when a virtual register is created and/or cloned. Targets can do the needful with the delegate callback. AMDGPU propagates the virtual register flags maintained in the target file itself. They are useful to identify a certain type of machine operands while inserting spill stores and reloads. Since RegAllocFast spills the physical register itself, there is no way its virtual register can be mapped back to retrieve the flags. It can be solved by passing the virtual register as an additional argument. This argument has no use when the spill interfaces are called during the greedy allocator or even the PrologEpilogInserter and can pass a null register in such cases. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138656	2022-12-17 11:55:34 +05:30
Josh Stone	9b8fcd04ef	[X86] Fix cmp order in probing BuildStackAlignAND Due to reversed arguments, the loop start was almost always skipping the whole loop, since FinalStackProbed is probably less than StackPtr for large alignments. The intent was to skip the loop if the first sub on StackPtr made it less than FinalStackProbed already, so flip it. Reviewed By: serge-sans-paille Differential Revision: https://reviews.llvm.org/D139756	2022-12-13 12:10:39 -08:00
Fangrui Song	b0df70403d	[Target] llvm::Optional => std::optional The updated functions are mostly internal with a few exceptions (virtual functions in TargetInstrInfo.h, TargetRegisterInfo.h). To minimize changes to LLVMCodeGen, GlobalISel files are skipped. https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 22:43:14 +00:00
Tim Northover	b32280baf9	X86: relax EFLAGS liveness check when generating stack probes. The probes are all inserted at the iterator passed into the functions, so that's where any EFLAGS clobbering will happen and where we need it to be dead. Fixes: https://github.com/llvm/llvm-project/issues/59121	2022-11-30 11:44:39 +00:00
Josh Stone	cb46ffdbf4	[X86] Use BuildStackAdjustment in stack probes This has the advantage of dealing with live EFLAGS, using LEA instead of SUB if needed to avoid clobbering. That also respects feature "lea-sp". We could allow unrolled stack probing from blocks with live-EFLAGS, if canUseAsEpilogue learns when emitStackProbeInlineGeneric will be used. Differential Revision: https://reviews.llvm.org/D134495	2022-09-23 09:30:32 -07:00
Josh Stone	26c37b461a	[X86] Don't allow prologue stack probing with live EFLAGS Fixes https://github.com/llvm/llvm-project/issues/49509 Differential Revision: https://reviews.llvm.org/D134494	2022-09-23 09:30:32 -07:00
Eli Friedman	cfd2c5ce58	Untangle the mess which is MachineBasicBlock::hasAddressTaken(). There are two different senses in which a block can be "address-taken". There can be a BlockAddress involved, which means we need to map the IR-level value to some specific block of machine code. Or there can be constructs inside a function which involve using the address of a basic block to implement certain kinds of control flow. Mixing these together causes a problem: if target-specific passes are marking random blocks "address-taken", if we have a BlockAddress, we can't actually tell which MachineBasicBlock corresponds to the BlockAddress. So split this into two separate bits: one for BlockAddress, and one for the machine-specific bits. Discovered while trying to sort out related stuff on D102817. Differential Revision: https://reviews.llvm.org/D124697	2022-08-16 16:15:44 -07:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Kazu Hirata	ba0407ba86	[llvm] Use range-based for loops (NFC)	2022-08-07 00:16:21 -07:00
Bill Wendling	8f2ec974d1	[X86] Move target-generic code into CodeGen [NFC] This code is the same for all platforms. Differential Revision: https://reviews.llvm.org/D124566	2022-04-27 15:37:28 -07:00
Matt Arsenault	3659780d58	MachineModuleInfo: Remove UsesMorestackAddr This is x86 specific, and adds statefulness to MachineModuleInfo. Instead of explicitly tracking this, infer if we need to declare the symbol based on the reference previously inserted. This produces a small change in the output due to the move from AsmPrinter::doFinalization to X86's emitEndOfAsmFile. This will now be moved relative to other end of file fields, which I'm assuming doesn't matter (e.g. the __morestack_addr declaration is now after the .note.GNU-split-stack part) This also produces another small change in code if the module happened to define/declare __morestack_addr, but I assume that's invalid and doesn't really matter.	2022-04-20 11:10:20 -04:00
Matt Arsenault	d7938b1a81	MachineModuleInfo: Move HasSplitStack handling to AsmPrinter This is used to emit one field in doFinalization for the module. We can accumulate this when emitting all individual functions directly in the AsmPrinter, rather than accumulating additional state in MachineModuleInfo. Move the special case behavior predicate into MachineFrameInfo to share it. This now promotes it to generic behavior. I'm assuming this is fine because no other target implements adjustForSegmentedStacks, or has tests using the split-stack attribute.	2022-04-20 10:54:29 -04:00
Fangrui Song	ac6878b330	[X86] Set frame-setup/frame-destroy on prologue/epilogue CFI instructions This approach is used by AArch64/RISCV to make frame-setup/frame-destroy instructions contiguous instead of being interleaved by CFI instructions. Code checking `MBBI->getFlag(MachineInstr::FrameSetup) || MBBI->isCFIInstruction()` can be simplified to just check FrameSetup. This helps locate all CFI instructions in the prologue, which can be handy to use .cfi_remember_state/.cfi_restore_state to decrease unwind table size (D114545). Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122541	2022-03-31 23:04:50 -07:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
Bill Wendling	74aa44a887	[X86] Zero out the 32-bit GPRs explicitly This should ensure that only the 32-bit xors are emitted, and not the 64-bit xors. Differential Revision: https://reviews.llvm.org/D119523	2022-02-10 23:09:00 -08:00
Reid Kleckner	f3481f43bb	[X86] Only force FP usage in the presence of pushf/popf on Win64 This ensures that the Windows unwinder will work at every instruction boundary, and allows other targets to read and write flags without setting up a frame pointer. Fixes GH-46875 Differential Revision: https://reviews.llvm.org/D119391	2022-02-09 18:23:16 -08:00
Bill Wendling	d295a53a92	[X86] Specify Undef for the registers we xor Fixes expensive check failures from D110869.	2022-02-09 02:06:12 -08:00
Bill Wendling	deaf22bc0e	[X86] Implement -fzero-call-used-regs option The "-fzero-call-used-regs" option tells the compiler to zero out certain registers before the function returns. It's also available as a function attribute: zero_call_used_regs. The two upper categories are: - "used": Zero out used registers. - "all": Zero out all registers, whether used or not. The individual options are: - "skip": Don't zero out any registers. This is the default. - "used": Zero out all used registers. - "used-arg": Zero out used registers that are used for arguments. - "used-gpr": Zero out used registers that are GPRs. - "used-gpr-arg": Zero out used GPRs that are used as arguments. - "all": Zero out all registers. - "all-arg": Zero out all registers used for arguments. - "all-gpr": Zero out all GPRs. - "all-gpr-arg": Zero out all GPRs used for arguments. This is used to help mitigate Return-Oriented Programming exploits. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110869	2022-02-08 17:42:54 -08:00
Jim Lin	d6b0734837	[NFC] Use Register instead of unsigned	2022-01-19 20:17:04 +08:00
Alexander Shaposhnikov	22430ede7e	[CodeGen] Rename emitCalleeSavedFrameMoves This diff renames emitCalleeSavedFrameMoves to avoid conflicts with non-virtual methods of derived classes having the same name but different semantics. E.g. the class AArch64FrameLowering used to have (non-virtual) "emitCalleeSavedFrameMoves" but it started to override TargetFrameLowering::emitCalleeSavedFrameMoves after https://github.com/llvm/llvm-project/commit/c3e6555616 though its usage and semantics didn't change. P.S. for x86 there was no conflict because the signature of non-virtual X86FrameLowering::emitCalleeSavedFrameMoves is different Test plan: make check-all Differential revision: https://reviews.llvm.org/D114140	2022-01-10 01:33:04 +00:00


551 Commits