llvm-project

Author	SHA1	Message	Date
Fangrui Song	eaef645a58	[test] Update stack_guard_remat.ll	2024-01-22 14:18:23 -08:00
Philip Reames	8675952583	[RISCV] Add coverage for shuffles splitable using exact VLEN Test coverage for an upcoming transform.	2024-01-22 14:00:51 -08:00
Douglas Yung	cc2c8ab21f	Require asserts for llvm/test/CodeGen/PowerPC/sms-regpress.mir.	2024-01-22 13:51:03 -08:00
Fangrui Song	4cb90ca8f8	[Thumb,ELF] Fix access to dso_preemptable __stack_chk_guard with static relocation model (#78950 ) PR #70014 fixes A32 to use GOT for dso_preemptable `__stack_chk_guard` with static relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie linking). This patch fixes such `__stack_chk_guard` access for Thumb1 and Thumb2. Note: `t2LDRLIT_ga_pcrel` is only for ELF. mingw needs `.refptr.__stack_chk_guard` (https://reviews.llvm.org/D92738). Fix #64999	2024-01-22 13:16:31 -08:00
gulfemsavrun	b00aa1c77b	Revert "Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass … (#79053 ) …#78606" This reverts commit 76160718df7c1f31ff50a4964d749c2b9d83f9cf because it caused an assertion failure in emitDbgValue function in Codegen in Clang Linux toolchain builders for Fuchsia. https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8758181086086431185/+/u/clang/build/stdout	2024-01-22 12:44:46 -08:00
Alexandre Ganea	bb28442c0b	[CodeGen][X86] Fix lowering of tailcalls when `-ms-hotpatch` is used (#77245 ) Previously, tail jump pseudo-opcodes were skipped by the `encodeInstruction()` call inside `X86AsmPrinter::LowerPATCHABLE_OP`. This caused emission of a 2-byte NOP and dropping of the tail jump. With this PR, we change `PATCHABLE_OP` to not wrap the first `MachineInstr` anymore, but inserting itself before, leaving the instruction unaltered. At lowering time in `X86AsmPrinter`, we now "look ahead" for the next non-pseudo `MachineInstr` and lower+encode it, to inspect its size. If the size is below what `PATCHABLE_OP` expects, it inserts NOPs; otherwise it does nothing. That way, now the first `MachineInstr` is always lowered as usual even if `"patchable-function"="prologue-short-redirect"` is used. Fixes https://github.com/llvm/llvm-project/issues/76879, https://github.com/llvm/llvm-project/issues/76958 and https://github.com/llvm/llvm-project/issues/59039	2024-01-22 14:19:08 -05:00
Jeremy Morse	d7fb9eb818	[DebugInfo][RemoveDIs] Handle DPValues in SelectOptimize (#79005 ) When there are debug intrinsics in-between groups of select instructions, select-optimise sinks them into the "end" block. This needs to be replicated for DPValues, the non-instruction variable assignment object. Implement that and add a RUN line to a test that was sensitive to this to ensure it gets tested. (The exact range of instructions being transformed here is a little fiddly, hence I've gone with a helper lambda).	2024-01-22 18:12:24 +00:00
OCHyams	76160718df	Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass #78606 llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to update the second expression. Fixes #76545	2024-01-22 17:07:44 +00:00
Jeremy Morse	f188f4589c	[DebugInfo] Disable a test runline temporarily This is a follow-up to 8c1b7fba1fb -- GlobalISel currently doesn't handle RemoveDIs mode debug-info, but will (see #75228). Disable this runline until then. (This is a patch-landing ordering problem)	2024-01-22 16:32:32 +00:00
Kerry McLaughlin	d4d81acb52	[AArch64][SME2] Extend SMEABIPass to handle functions with new ZT0 state (#78848 ) updateNewZAFunctions is extended to generate the following on entry to a function with either the "aarch64_pstate_za_new" or "arm_new_zt0" attribute: - Private-ZA interface: commit any active lazy-saves & enable PSTATE.ZA. - "aarch64_pstate_za_new": zero ZA. - "arm_new_zt0": zero ZT0. Additionally, PSTATE.ZA should be disabled before returning if the function has a private-ZA interface.	2024-01-22 16:30:43 +00:00
Emma Pilkington	4897b9888f	[AMDGPU] Make a few more tests default COV agnostic (#78926 )	2024-01-22 11:22:57 -05:00
Simon Pilgrim	27eb8d53ae	[X86] printConstant - add ConstantVector handling	2024-01-22 15:59:55 +00:00
Simon Pilgrim	74ab7958bd	[X86] printZeroUpperMove - add support for constant vectors. Allows cases where movss/movsd etc. are loading constant (ConstantDataSequential) sub-vectors, ensuring we pad with the correct number of zero upper elements by making repeated printConstant calls to print zeroes in a matching int/fp format.	2024-01-22 15:40:46 +00:00
Jeremy Morse	8c1b7fba1f	[SelectionDAG][DebugInfo][RemoveDIs] Handle entry value variables in DPValues too (#78726 ) This patch abstracts visitEntryValueDbgValue to deal with the substance of variable locations (Value, Var, Expr, DebugLoc) rather than how they're stored. That allows us to call it from handleDebugValue, which is similarly abstracted. This allows the entry-value behaviour (see the test) to be supported with non-instruction debug-info too!	2024-01-22 15:39:35 +00:00
Matthew Devereau	6ba62f4f25	[AArch64][SME2] Refine fcvtu/fcvts/scvtf/ucvtf (#77947 ) Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs. Use llvm_anyvector_ty for both multi vector returns and operands, therefore the return and operands can be specified in the intrinsic call, e.g. @llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32	2024-01-22 15:11:49 +00:00
Florian Hahn	ff1cde5ba2	[AArch64] Add vec3 load/store tests with GEPs with const offsets. Extra tests for https://github.com/llvm/llvm-project/pull/78637 https://github.com/llvm/llvm-project/pull/78632	2024-01-22 15:02:41 +00:00
Jeremy Morse	52a8bed426	[DebugInfo][RemoveDIs] Adjust AMDGPU passes to work with DPValues (#78736 ) This patch tweaks two AMDGPU passes to use iterators rather than instruction pointers for expressing an insertion point. This is needed to accurately support DPValues, the non-instruction storage object for debug-info. Two tests were sensitive to this change (variable assignments were being put in the wrong place), and I've added extra run-lines with the "try new debug-info..." flag. These get tested on our public buildbot to ensure they continue to work accurately.	2024-01-22 14:25:08 +00:00
chuongg3	bfef161a80	[AArch64][GlobalISel] Legalize Shifts for Smaller/Larger Vectors (#78750 ) Legalize shl/lshr/ashr for smaller/larger vector widths with legal element sizes. Smaller-than-legal vector types do not work at the moment, as this relies on G_ANYEXT working with smaller-than-legal vector types.	2024-01-22 14:08:26 +00:00
Orlando Cazalet-Hyams	5266c1285b	Revert "[hwasan] Update dbg.assign intrinsics in HWAsan pass" (#78971 ) Reverts llvm/llvm-project#78606 https://lab.llvm.org/buildbot/#/builders/77/builds/33963	2024-01-22 13:30:50 +00:00
Tuan Chuong Goh	ab1b4991cf	[AArch64] Adding tests for shifts	2024-01-22 13:13:56 +00:00
Rin Dobrescu	365aa1574a	[AArch64] Convert UADDV(add(zext, zext)) into UADDLV(concat). (#78301 ) We can convert a UADDV(add(zext(64-bit source), zext(64-bit source))) into UADDLV(concat), where the concat represents the 64-bit zext sources.	2024-01-22 11:59:40 +00:00
Orlando Cazalet-Hyams	a590f2315f	[hwasan] Update dbg.assign intrinsics in HWAsan pass (#78606 ) llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to update the second expression. Fixes #76545	2024-01-22 11:38:00 +00:00
chuongg3	50df08cd43	[GlobalISel][AArch64] Combine Vector Reduction Add Long (#76241 ) ADDLV(ADDLP) => ADDLV Removes unnecessary ADDLP instruction Already exists for SDAG, adding for GlobalISel	2024-01-22 10:05:37 +00:00
Wang Pengcheng	5cd8d53cac	[RISCV] Teach RISCVMergeBaseOffset to handle inline asm (#78945 ) For inline asm with memory operands, we can merge the offset into the second operand of memory constraint operands. Differential Revision: https://reviews.llvm.org/D158062	2024-01-22 17:36:32 +08:00
Pierre van Houtryve	ac296b696c	[AMDGPU] Drop verify from SIMemoryLegalizer tests (#78697 ) SIMemoryLegalizer tests were slow, with most of them taking 4.5 to 5.3s to complete and that's on a fast machine. I also recall seeing them in the slowest tests list on build bots. This removes the verify-machineinstrs option from these tests to speed them up, bringing the slowest test down to +-2s. Verifier still runs in EXPENSIVE_CHECKS builds.	2024-01-22 10:31:37 +01:00
Fangrui Song	3b943c0203	[Thumb,test] Improve __stack_chk_guard test	2024-01-22 00:20:40 -08:00
Ryotaro KASUGA	7556626dcf	[CodeGen][MachinePipeliner] Limit register pressure when scheduling (#74807 ) In software pipelining, when searching for the Initiation Interval (II), `MachinePipeliner` tries to reduce register pressure, but doesn't check how many variables can actually be alive at the same time. As a result, a lot of register spills/fills can be generated after register allocation, which might cause performance degradation. To prevent such cases, this patch adds a check phase that calculates the maximum register pressure of the scheduled loop and rejects it if the pressure is too high. This can be enabled by specifying `pipeliner-register-pressure`. Additionally, an II search range is currently fixed at 10, which is too small to find a schedule when the above algorithm is applied. Therefore this patch also adds a new option `pipeliner-ii-search-range` to specify the length of the range to search. There is one more new option `pipeliner-register-pressure-margin`, which can be used to estimate a register pressure limit less than actual for conservative analysis. Discourse thread: https://discourse.llvm.org/t/considering-register-pressure-when-deciding-initiation-interval-in-machinepipeliner/74725	2024-01-22 17:06:37 +09:00
XinWang10	dd6fec5d4f	[X86][APX]Support lowering for APX promoted AMX-TILE instructions (#78689 ) The enc/dec of promoted AMX-TILE instructions have been supported in https://github.com/llvm/llvm-project/pull/76210. This patch supports lowering for promoted AMX-TILE instructions and integrates the tests into existing tests.	2024-01-22 11:33:23 +08:00
XinWang10	d3cd1ce6ab	[X86] Add lowering tests for promoted CMPCCXADD and update CC representation (#78685 ) https://github.com/llvm/llvm-project/pull/76125 supported the enc/dec for CMPCCXADD instructions, this patch 1. Add lowering test for promoted CMPCCXADD 2. Update the representation of condition code for promoted CMPCCXADD to align with the existing one	2024-01-22 11:32:03 +08:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267 ) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Fangrui Song	d0230446d2	[AArch64] Remove non-sensible define nonlazybind test nonlazybind is for declarations, not for definitions. We could test the behavior, but the output would be misleading.	2024-01-20 23:39:07 -08:00
Fangrui Song	f9614b328a	[AArch64] Improve nonlazybind test Prepare for -fno-plt implementation.	2024-01-20 22:16:29 -08:00
Kerry McLaughlin	a8a3711e74	[AArch64][SME2] Preserve ZT0 state around function calls (#78321 ) If a function has ZT0 state and calls a function which does not preserve ZT0, the caller must save and restore ZT0 around the call. If the caller shares ZT0 state and the callee is not shared ZA, we must additionally call SMSTOP/SMSTART ZA around the call. This patch adds new AArch64ISDNodes for spilling & filling ZT0. Where requiresPreservingZT0 is true, ZT0 state will be preserved across a call.	2024-01-20 12:06:00 +00:00
Jay Foad	63d7ca924f	[AMDGPU] Add GFX12 llvm.amdgcn.s.wait.*cnt intrinsics (#78723 )	2024-01-20 11:44:42 +00:00
Craig Topper	9396891271	[RISCV] Don't look for sext in RISCVCodeGenPrepare::visitAnd. We want to know the upper 33 bits of the And Input are zero. SExt only guarantees they are the same. We originally checked for SExt or ZExt when we were using isImpliedByDomCondition because a ZExt may have been changed to SExt before we visited the And. We are no longer using isImpliedByDomCondition so we can only look for zext with the nneg flag. While here, switch to PatternMatch to simplify the code. Fixes #78783	2024-01-19 14:44:47 -08:00
Craig Topper	66cea7143a	[RISCV] Add test case for #78783 . NFC	2024-01-19 14:44:47 -08:00
Arthur Eubanks	86eaf6083b	[X86] Refine X86DAGToDAGISel::isSExtAbsoluteSymbolRef() (#76191 ) We just need to check if the global is large or not. In the kernel code model, globals are in the negative 2GB of the address space, so globals can be a sign extended 32-bit immediate. In other code models, small globals are in the low 2GB of the address space, so sign extending them is equivalent to zero extending them.	2024-01-19 14:11:18 -08:00
Craig Topper	9ae28fb9d3	[RISCV] Prevent RISCVMergeBaseOffsetOpt from calling getVRegDef on a physical register. (#78762 ) Fixes #78679.	2024-01-19 12:15:08 -08:00
Min-Yih Hsu	5330daad41	[RISCV] Add support for Smepmp 1.0 (#78489 ) Smepmp is a supervisor extension that prevents privileged processes from accessing unprivileged program and data. Spec: https://github.com/riscv/riscv-tee/blob/main/Smepmp/Smepmp.pdf	2024-01-19 11:09:35 -08:00
Durgadoss R	43531e7196	[LLVM][NVPTX] Add cp.async.bulk.commit/wait intrinsics (#78698 ) This patch adds NVVM intrinsics and NVPTX codegen for the bulk variants of the async-copy commit/wait instructions. lit tests are added to verify the generated PTX. PTX Doc link: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-commit-group Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-01-19 10:42:33 -08:00
Jay Foad	89226ecbb9	[AMDGPU] Do not widen scalar loads on GFX12 (#78724 ) GFX12 has subword scalar loads so there is no need to do this.	2024-01-19 15:30:07 +00:00
Jay Foad	ed12388082	[AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (#78709 ) That instruction is not supported on GFX12. Added a testcase which previously crashed without this change. Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>	2024-01-19 14:36:27 +00:00
Simon Pilgrim	a2a0089ac3	[X86] movsd/movss/movd/movq - add support for constant comments (#78601 ) If we're loading a constant value, print the constant (and the zero upper elements) instead of just the shuffle mask. This did require me to move the shuffle mask handling into addConstantComments as we can't handle this in the MC layer.	2024-01-19 14:21:26 +00:00
Sander de Smalen	340054e561	[AArch64][SME] Remove combination of private-ZA and preserves_za. (#78563 ) The new Clang attributes no longer support the combination of having a private-ZA function that preserves ZA. The use of __arm_preserves("za") means that ZA is shared and preserved. There wasn't that much benefit to the special handling of this, because in practice it only meant that we'd avoid restoring the lazy-save afterwards, but it still needed setting up a lazy-save (with the possibility of using a 0-sized buffer). Perhaps a new attribute will be added in the future to support this case, at which point we can revert back some of the changes removed in this patch. But for now removing this code simplifies things.	2024-01-19 13:48:44 +00:00
Danila Malyutin	9ad7d8f0e4	[Statepoint] Optimize Location structure size (#78600 ) Reduce its size from 24 to 12 bytes. Improves memory consumption when dealing with statepoint-heavy code.	2024-01-19 17:15:36 +04:00
David Spickett	955417ade2	Revert "[llvm][AArch64] Copy all operands when expanding BLR_BTI bundle (#78267 )" This reverts commit 228aecbcf106a50c30b1f8f1915d61850860cbcd. Failing expensive checks: https://lab.llvm.org/buildbot/#/builders/16/builds/59798	2024-01-19 12:06:30 +00:00
Jay Foad	879cbe06ed	[AMDGPU] Fix predicates for BUFFER_ATOMIC_CSUB pattern (#78701 ) Use OtherPredicates to avoid interfering with other uses of SubtargetPredicate for GFX12.	2024-01-19 12:01:31 +00:00
David Spickett	228aecbcf1	[llvm][AArch64] Copy all operands when expanding BLR_BTI bundle (#78267 ) Fixes #77915 Previously I based the operand copying on expandCALL_RVMARKER but did not understand it properly at the time. This led to me dropping the arguments of the function being branched to. This fixes that by copying all operands from the BLR_BTI to the BL/BLR without skipping anything. I've updated the existing test by adding function arguments.	2024-01-19 11:22:43 +00:00
Mirko Brkušanin	0185c76456	[AMDGPU] Fix test for expensive-checks build (#78687 )	2024-01-19 11:32:02 +01:00
Leon Clark	2759cfa0c3	[AMDGPU] Remove unnecessary add instructions in ctlz.i8 (#77615 ) Add custom lowering for ctlz.i8 to avoid multiple add/sub operations. --------- Co-authored-by: Leon Clark <leoclark@amd.com> Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2024-01-19 10:16:46 +00:00
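
Two of the messages above (86eaf6083b and 9396891271) rest on the difference between sign extension and zero extension of a 32-bit value to 64 bits. The sketch below illustrates that arithmetic only; it is not code from LLVM, and the helper names (`sext32`, `zext32`, `fits_sext_imm32`, `upper_33_bits_zero`) are hypothetical names chosen for this illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def zext32(value: int) -> int:
    """Zero-extend the low 32 bits of value to a 64-bit pattern."""
    return value & MASK32


def sext32(value: int) -> int:
    """Sign-extend the low 32 bits of value to a 64-bit pattern."""
    low = value & MASK32
    if low & (1 << 31):
        # Bit 31 is set, so the upper 32 bits are filled with ones.
        return low | (MASK64 ^ MASK32)
    return low


def fits_sext_imm32(address: int) -> bool:
    """True if a 64-bit address equals the sign extension of its low 32 bits."""
    address &= MASK64
    return sext32(address) == address


def upper_33_bits_zero(value: int) -> bool:
    """True if bits 31..63 of a 64-bit pattern are all zero."""
    return (value & MASK64) >> 31 == 0


# Low 2GB: sign extension and zero extension agree.
print(sext32(0x1234) == zext32(0x1234))
# Top (negative) 2GB of the address space: representable as a sign-extended imm32.
print(fits_sext_imm32(0xFFFF_FFFF_8000_0000))
# Sign extension of a value with bit 31 set leaves the upper 33 bits as ones, not zeros.
print(upper_33_bits_zero(sext32(0x8000_0000)))
```

Under these assumptions, an address in the low 2GB or in the top 2GB passes `fits_sext_imm32`, while a sign-extended value only guarantees that the upper 33 bits match each other, which is why a zero extension of a non-negative value is the case that guarantees they are zero.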


52796 Commits