llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	b2c5e9b9bf	[ARM] iabs.ll - regenerate test checks	2024-05-14 16:36:39 +01:00
David Green	8fc9e3d577	[DAG] Lower frem of power-2 using div/trunc/mul+sub (#91148 ) If we are lowering a frem and the divisor is known to be an integer power-2, we can use the formula 'frem = x - trunc(x / d) * d'. This avoids the more expensive call to fmod. The results are identical to fmod so long as d is a power-2 (so the mul does not round incorrectly), and the sign of the return is either always positive or not important for zeroes (nsz). Unfortunately Alive2 does not handle this well at the moment. I was using exhaustive checking to test this: (https://gist.github.com/davemgreen/6078015f30d3bacd1e9572f8db5d4b64). I found this in CPython's implementation of float_pow. I currently added it as a DAG combine for frem with power-2 fp constants.	2024-05-10 14:58:48 +01:00
David Green	9e1a49cba7	[AArch64][ARM] Add tests for frem power2 lowering. NFC	2024-05-09 17:15:35 +01:00
Chris Copeland	651bdb96b1	[ARM] Armv8-R does not require fp64 or neon. (#88287 ) This was [addressed for AArch64 here](https://github.com/llvm/llvm-project/pull/79004), but the same applies to ARM. Move the enablement of neon+fp64 to `-mcpu=cortex-r52`, which optionally supports these features.	2024-05-07 11:48:30 +01:00
Eleanor Bonnici	c12bc57e23	Do not use R12 for indirect tail calls with PACBTI (#82661 ) When compiling for thumbv8.1m with +pacbti and making an indirect tail call, the compiler was free to put the function pointer into R12. This is incorrect because R12 is restored to contain authentication code for the caller's return address. This patch excludes R12 from the set of registers the compiler can put the function pointer in. Fixes https://github.com/llvm/llvm-project/issues/75998	2024-04-30 15:29:07 +01:00
Qiu Chaofan	4a8f2f2e1a	[Legalizer] Expand fmaximum and fminimum (#67301 ) According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN. Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.	2024-04-29 15:09:54 +08:00
Yingwei Zheng	4d28d3f93b	[SDAG] Turn umin into smin if the saturation pattern is broken (#88505 ) As we canonicalize smin with non-negative operands into umin in the middle-end, the saturation pattern will be broken. This patch reverts the transform in DAGCombine to fix the regression on ARM. Fixes https://github.com/llvm/llvm-project/issues/85706.	2024-04-16 01:28:28 +08:00
Victor Campos	45137766ca	[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries (#87982 ) Following https://github.com/llvm/llvm-project/pull/68313 this patch extends the idea to M-profile PACBTI. The Machine Scheduler can reorder instructions within a scheduling region depending on the scheduling policy set. If a BTI-clearing instruction happens to partake in one such region, it might be moved around, therefore ending up where it shouldn't. The solution is to mark all BTI-clearing instructions as scheduling region boundaries. This essentially means that they must not be part of any scheduling region, and as a consequence never get moved: - PAC - PACBTI - BTI - SG Note that PAC isn't BTI-clearing, but it's replaced by PACBTI late in the compilation pipeline. As far as I know, currently it isn't possible to organically obtain code that's susceptible to the bug: - Instructions that write to SP are region boundaries. PAC seems to always be followed by the pushing of r12 to the stack, so essentially PAC is always by itself in a scheduling region. - CALL_BTI is expanded into a machine instruction bundle. Bundles are unpacked only after the last machine scheduler run. Thus setjmp and BTI can be separated only if someone deliberately runs the scheduler once more. - The BTI insertion pass is run late in the pipeline, only after the last machine scheduling has run. So once again it can be reordered only if someone deliberately runs the scheduler again. Nevertheless, one can reasonably argue that we should prevent the bug in spite of the compiler not being able to produce the required conditions for it. If things change, the compiler will be robust against this issue. The tests written for this are contrived: bogus MIR instructions have been added adjacent to the BTI-clearing instructions in order to have them inside non-trivial scheduling regions.	2024-04-15 10:58:30 +01:00
Björn Pettersson	33e6b488be	[SelectionDAG] Fix and improve TargetLowering::SimplifySetCC (#87646 ) The load narrowing part of TargetLowering::SimplifySetCC is updated according to this: 1) The offset calculation (for big endian) did not work properly for non byte-sized types. This is basically solved by an early exit if the memory type isn't byte-sized. But the code is also corrected to use the store size when calculating the offset. 2) To still allow some optimizations for non-byte-sized types the TargetLowering::isPaddedAtMostSignificantBitsWhenStored hook is added. By default it assumes that scalar integer types are padded starting at the most significant bits, if the type needs padding when being stored to memory. 3) Allow optimizing when isPaddedAtMostSignificantBitsWhenStored is true, as that hook makes it possible for TargetLowering to know how the non byte-sized value is aligned in memory. 4) Update the algorithm to always search for a narrowed load with a power-of-2 byte-sized type. In the past the algorithm started with the width of the original load, and then divided it by two for each iteration. But for a type such as i48 that would just end up trying to narrow the load into an i24 or i12 load, and then we would fail sooner or later due to not finding a newVT that fulfilled newVT.isRound(). With this new approach we can narrow the i48 load into either an i8, i16 or i32 load. By checking if such a load is allowed (e.g. alignment wise) for any "multiple of 8 offset", then we can find more opportunities for the optimization to trigger. So even for a byte-sized type such as i32 we may now end up narrowing the load into loading the 16 bits starting at offset 8 (if that is allowed by the target). The old algorithm did not even consider that case. 5) Also start using getObjectPtrOffset instead of getMemBasePlusOffset when creating the new ptr. This way we get "nsw" on the add.	2024-04-12 16:18:12 +02:00
Bjorn Pettersson	bcf047a4ed	[ARM][PowerPC] Add regression tests for narrowing load in TargetLowering::SimplifySetCC These test cases show some miscompiles for big-endian when dealing with non byte-sized loads. One part of the problem is that LLVM IR isn't really telling where the padding goes for non byte-sized loads/stores. So currently TargetLowering::SimplifySetCC can't assume anything about it. But the implementation also does not consider that the TypeStoreSize could be larger than the TypeSize, resulting in the offset calculation being wrong for big-endian. Pre-commit for https://github.com/llvm/llvm-project/pull/87646	2024-04-12 16:14:39 +02:00
Arthur Eubanks	5d6d8dcd29	[clang][llvm] Remove "implicit-section-name" attribute (#87906 ) D33412/D33413 introduced this to support a clang pragma to set section names for a symbol depending on if it would be placed in bss/data/rodata/text, which may not be known until the backend. However, for text we know that only functions will go there, so just directly set the section in clang instead of going through a completely separate attribute. Autoupgrade the "implicit-section-name" attribute to directly setting the section on a Function.	2024-04-11 12:29:29 -07:00
AtariDreams	c5d000b1a8	[Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' (#81908 ) Consider the following: ldr r0, [r4] ldr r7, [r0, #4] cmp r7, r3 bhi .LBB0_6 cmp r0, r2 push {r0} pop {r4} bne .LBB0_3 movs r0, r6 pop {r4, r5, r6, r7} pop {r1} bx r1 This is a snippet of the Thumb1 code that clang currently generates for the K&R malloc function. push {r0} ends up being popped by pop {r4}. movs r4, r0 would destroy the flags set by the cmp right above. The only alternative in this case is to transfer the value through a high register. However, LLVM does not seem to consider this a valid approach, even though clobbering a high register is free. This patch addresses the FIXME so the compiler can do that, when possible, via r10, r11, or r12.	2024-04-05 10:18:22 +01:00
Victor Campos	74373c1bef	Revert "[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries" (#87699 ) Reverts llvm/llvm-project#79173 The testcase fails in non-asserts builds.	2024-04-04 21:29:21 +01:00
Victor Campos	5ad320abe3	[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries (#79173 ) Following https://github.com/llvm/llvm-project/pull/68313 this patch extends the idea to M-profile PACBTI. The Machine Scheduler can reorder instructions within a scheduling region depending on the scheduling policy set. If a BTI-clearing instruction happens to partake in one such region, it might be moved around, therefore ending up where it shouldn't. The solution is to mark all BTI-clearing instructions as scheduling region boundaries. This essentially means that they must not be part of any scheduling region, and as a consequence never get moved: - PAC - PACBTI - BTI - SG Note that PAC isn't BTI-clearing, but it's replaced by PACBTI late in the compilation pipeline. As far as I know, currently it isn't possible to organically obtain code that's susceptible to the bug: - Instructions that write to SP are region boundaries. PAC seems to always be followed by the pushing of r12 to the stack, so essentially PAC is always by itself in a scheduling region. - CALL_BTI is expanded into a machine instruction bundle. Bundles are unpacked only after the last machine scheduler run. Thus setjmp and BTI can be separated only if someone deliberately runs the scheduler once more. - The BTI insertion pass is run late in the pipeline, only after the last machine scheduling has run. So once again it can be reordered only if someone deliberately runs the scheduler again. Nevertheless, one can reasonably argue that we should prevent the bug in spite of the compiler not being able to produce the required conditions for it. If things change, the compiler will be robust against this issue. The tests written for this are contrived: bogus MIR instructions have been added adjacent to the BTI-clearing instructions in order to have them inside non-trivial scheduling regions.	2024-04-04 12:44:32 +01:00
Jonas Paulsson	7564566779	Reapply "Move assertion for AdjustsStack from PEI to MachineVerifier (#85698 )" - The check is now actually done in both PEI and the MachineVerifier. - More .mir tests trivially updated with "adjustsStack: true" as needed.	2024-03-21 20:24:57 -04:00
David Green	686f4599cf	[ARM] Regenerate some check lines. NFC	2024-03-21 13:45:44 +00:00
Jonas Paulsson	9ebd329ad8	Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698 )" This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92. Reverting due to verifier complaints with expensive checks on build-bot.	2024-03-20 11:48:30 -04:00
Jonas Paulsson	05bde30585	Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698 ) Have the verifier report a missing AdjustsStack flag rather than waiting until PEI asserts.	2024-03-20 10:29:12 -04:00
Sivan Shani	5e688f0dbd	[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83. T1 allows for an optional register list; the register list must be {d0-d15}. T2 defines a mandatory register list; the register list must be {d0-d31}. The requirements for T1/T2 are as follows: Require: T1 v8-M.Main, secure state; T2 v8.1-M.Main, secure state. 16 D Regs: T1 valid; T2 valid. 32 D Regs: T1 UNDEFINED; T2 valid. No D Regs: T1 NOP; T2 NOP.	2024-03-11 14:27:28 +00:00
Jay Foad	fd3eaf76ba	[GISel] Enforce G_PTR_ADD RHS type matching index size for addr space (#84352 )	2024-03-09 09:07:22 +00:00
David Green	5f058398ab	[ARM] Mark AESD and AESE instructions as commutative. Similar to #83390, this marks AESD and AESE as commutative, as the logic of the instructions starts as a XOR between the two operands.	2024-03-03 16:56:21 +00:00
Fangrui Song	d89b771ef5	[ARM] Add alias tests for ROPI/RWPI https://reviews.llvm.org/D23195 does not test aliases.	2024-03-02 10:33:57 -08:00
Tomas Matheson	03420f570e	Revert "[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116 )" This reverts commit 634b0243b8f7acc85af4f16b70e91d86ded4dc83. Failing EXPENSIVE_CHECKS builds with "undefined physical register".	2024-02-29 09:48:29 +00:00
SivanShani-Arm	634b0243b8	[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116 ) T1 allows for an optional register list; the register list must be {d0-d15}. T2 defines a mandatory register list; the register list must be {d0-d31}. The requirements for T1/T2 are as follows: Require: T1 v8-M.Main, secure state; T2 v8.1-M.Main, secure state. 16 D Regs: T1 valid; T2 valid. 32 D Regs: T1 UNDEFINED; T2 valid. No D Regs: T1 NOP; T2 NOP.	2024-02-28 17:02:51 +00:00
ostannard	749384c08e	[ARM] Update IsRestored for LR based on all returns (#82745 ) PR #75527 fixed ARMFrameLowering to set the IsRestored flag for LR based on all of the return instructions in the function, not just one. However, there is also code in ARMLoadStoreOptimizer which changes return instructions, but it set IsRestored based on the one instruction it changed, not the whole function. The fix is to factor out the code added in #75527, and also call it from ARMLoadStoreOptimizer if it made a change to return instructions. Fixes #80287.	2024-02-26 12:23:25 +00:00
Oliver Stannard	8779cf68e8	Pre-commit test showing bug #80287 This test shows the bug where LR is used as a general-purpose register on a code path where it is not spilled to the stack.	2024-02-26 12:21:13 +00:00
Jack Styles	28233408a2	[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770 ) When using Greedy Register Allocation, there are times where early-clobber values are ignored, and assigned the same register. This is illegal behaviour for these instructions. To get around this, using Pseudo instructions for early-clobber registers gives them a definition and allows Greedy to assign them to a different register. This then meets the ARM Architecture Reference Manual and matches the defined behaviour. This patch takes the existing RISC-V patch and makes it target independent, then adds support for the ARM Architecture. Doing this will ensure early-clobber constraints are followed when using the ARM Architecture. Making the pass target independent will also open up the possibility that support for other architectures can be added in the future.	2024-02-26 12:12:31 +00:00
Serge Pavlov	213b0ae497	[GlobalISel][ARM] legalize G_FPENV_RESET for soft-float mode (#81456 )	2024-02-12 17:46:59 +07:00
Serge Pavlov	b0785cd1cb	[GlobalISel][ARM] Support missing case for G_CONSTANT (#80555 ) Global Instruction Selector could not select the code: %0:gprb(s32) = G_CONSTANT i32 -1 In DAG selector the similar code is selected to the instruction MVNi using custom operand `mod_imm_not`. Changing its definition from `PatLeaf` to `ImmLeaf` and providing counterpart for `imm_not_XFORM` make the relevant rule available for GlobalISel too.	2024-02-07 12:53:20 +07:00
Nikita Popov	b31fffbc7f	[ARM] Convert tests to opaque pointers (NFC)	2024-02-05 13:56:59 +01:00
Craig Topper	6590d0fed5	[DAGCombiner][ARM] Teach reduceLoadWidth to handle (and (srl (load), C, ShiftedMask)) (#80342 ) If we have a shifted mask, we may be able to reduce the load width to the width of the non-zero part of the mask and use an offset to the base address to remove the srl. The offset is given by C+trailingzeros(ShiftedMask). Then we add a final shl to restore the trailing zero bits. I've used the ARM test because that's where the existing (and (srl (load))) tests were. The X86 test was modified to keep the H register.	2024-02-04 16:05:51 -08:00
Serge Pavlov	b4eb7a10c0	[GlobalISel][ARM] Legalize set_fpenv and get_fpenv (#79852 ) Implement handling of get/set floating point environment for ARM in Global Instruction Selector. Lowering of these intrinsics to operations on FPSCR was previously implemented in DAG selector; in GlobalISel it is reused.	2024-02-04 12:30:33 +07:00
Harald van Dijk	52864d9c7b	[ARM] Switch to soft promoting half types. (#80440 ) The traditional promotion is known to generate wrong code. Fixes #73805.	2024-02-02 21:40:40 +00:00
Quentin Dian	112fba974c	[MIRPrinter] Don't print line break when there is no instructions (NFC) (#80147 ) Per #80143, we can remove the extra line break when there is no instruction.	2024-02-01 22:10:52 +08:00
Simon Pilgrim	ea2984287d	[ARM] Add ctpop codegen tests	2024-02-01 11:42:18 +00:00
Quentin Dian	b7738e275d	[MIRPrinter] Don't print space when there is no successor (#80143 ) Extra space causes the checks generated by update_mir_test_checks to be unavailable. ``` # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4 # RUN: llc -mtriple=x86_64-- -o - %s -run-pass=none -verify-machineinstrs -simplify-mir \| FileCheck %s --- name: foo body: \| ; CHECK-LABEL: name: foo ; CHECK: bb.0: ; CHECK-NEXT: successors: ; CHECK-NEXT: {{ $}} ; CHECK-NEXT: {{ $}} ; CHECK-NEXT: bb.1: ; CHECK-NEXT: RET 0, $eax bb.0: successors: bb.1: RET 0, $eax ... ``` The failure log is as follows: ``` llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir:9:16: error: CHECK-NEXT: is on the same line as previous match ; CHECK-NEXT: {{ $}} ^ <stdin>:21:13: note: 'next' match was here successors: ^ <stdin>:21:13: note: previous match ended here successors: ```	2024-01-31 22:35:41 +08:00
Alfie Richards	de75e5079a	[ARM][NEON] Add constraint to vld2 Odd/Even Pseudo instructions. (#79287 ) This ensures the odd/even pseudo instructions are allocated to the same register range. This fixes #71763	2024-01-31 14:08:02 +00:00
Fangrui Song	4cb90ca8f8	[Thumb,ELF] Fix access to dso_preemptable __stack_chk_guard with static relocation model (#78950 ) PR #70014 fixes A32 to use GOT for dso_preemptable `__stack_chk_guard` with static relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie linking). This patch fixes such `__stack_chk_guard` access for Thumb1 and Thumb2. Note: `t2LDRLIT_ga_pcrel` is only for ELF. mingw needs `.refptr.__stack_chk_guard` (https://reviews.llvm.org/D92738). Fix #64999	2024-01-22 13:16:31 -08:00
Fangrui Song	3b943c0203	[Thumb,test] Improve __stack_chk_guard test	2024-01-22 00:20:40 -08:00
Simon Pilgrim	d92ce344bf	Revert faecc736e2ac3cd8c77 #74443 [DAG] isSplatValue - node is a splat if all demanded elts have the same whole constant value (#74443 ) Relying on ComputeKnownBits to find a splat is causing miscompilations where a shift of zero is being assumed to give zero, but further simplification leads to a shift of zero by undef, resulting in an unexpected undef value. Fixes #78109	2024-01-17 15:59:33 +00:00
Alfie Richards	60c775769b	[ARM] Add missing earlyclobber to sqrshr and uqrshl instructions. (#77782 ) This avoids possible undefined behavior using the same register for Rm and Rda. Additionally adds a check in MC to produce an error upon parsing this case.	2024-01-16 10:30:16 +00:00
Nick Anderson	f1ec0d12bb	Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182 ) Port CodeGenPrepare to new pass manager and dependency BasicBlockSectionsProfileReader Fixes: #75380 Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>	2024-01-09 13:32:59 +07:00
Orlando Cazalet-Hyams	10b03e6662	[RemoveDIs] Handle DPValues in FastISel (#76952 ) The change is fairly mechanical: 1. Factor code from `FastISel::selectIntrinsicCall`, which converts debug intrinsics into debug instructions, into functions (NFC). 2. Call those functions for DPValues attached to instructions too. The test updates look the same as other RemoveDIs changes: re-run the tests with `--try-experimental-debuginfo-iterators`, which checks the output is identical using the new debug info format (if it has been enabled in the cmake configuration). Depends on #76941 (otherwise some modified tests spuriously fail).	2024-01-05 15:11:47 +00:00
Simon Pilgrim	7648371c25	Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054 )" Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)" Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104	2024-01-05 12:28:10 +00:00
Nick Anderson	e0c554ad87	Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380 ) Port CodeGenPrepare to new pass manager and dependency BasicBlockSectionsProfileReader Fixes: #64560 Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>	2024-01-05 13:47:56 +07:00
Serge Pavlov	2f81788067	[ARM][FPEnv] Lowering of fpmode intrinsics (#74054 ) LLVM intrinsics `get_fpmode`, `set_fpmode` and `reset_fpmode` operate on control modes, the bits of FP environment that affect FP operations. On ARM these bits are in FPSCR together with the status bits. The implementation of these intrinsics produces code close to that of functions `fegetmode` and `fesetmode` from GLIBC. Pull request: https://github.com/llvm/llvm-project/pull/74054	2023-12-18 18:57:36 +07:00
ostannard	4888218d03	[ARM] Do not emit unwind tables when saving LR around outlined call (#69611 ) In some cases, the machine outliner needs to preserve LR across an outlined call by pushing it onto the stack. Previously, this also generated unwind table instructions, which is incorrect because EHABI unwind tables cannot represent different stack frames at different points in the function, so the extra unwind info applied to the entire function. The outliner code already avoided generating CFI instructions, but EHABI unwind data is generated later from the actual instructions, so we need to avoid using the FrameSetup and FrameDestroy flags to prevent unwind data being generated.	2023-12-14 14:46:13 +00:00
Shih-Po Hung	b97c5a9554	[VPlan] Add a test for testing unused interleave recipes (#75026 ) - Precommit of tests from #71360. - Replace `undef` pointer operands and add stores to avoid the loads being optimized away.	2023-12-14 21:16:11 +08:00
Simon Pilgrim	b7fc78255e	Revert rG2047ab00eaf0a17e71ce5e8a5b27a8c90f034c3d "[VPlan] Add a test for testing unused interleave recipes (#75026 )" vplan-unused-interleave-group.ll is causing buildbot failures	2023-12-14 10:25:41 +00:00
Shih-Po Hung	2047ab00ea	[VPlan] Add a test for testing unused interleave recipes (#75026 ) - Precommit of tests from #71360. - Replace `undef` pointer operands and add stores to avoid the loads being optimized away.	2023-12-14 17:36:58 +08:00

4878 Commits
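
The semantics named in commit 4a8f2f2e1a above (llvm.maximum/minimum treat -0.0 as less than +0.0 and propagate NaN) can be sketched in C as follows. This is a hand-written illustration, not the legalizer's expansion: the names `fmaximum_sketch` and `fminimum_sketch` are hypothetical, and `isnan` and `signbit` stand in for the NaN check and the is_fpclass zero-sign check that the commit message mentions.

```c
#include <math.h>

/* Hypothetical sketch of llvm.maximum semantics: NaN propagates,
 * and -0.0 is ordered below +0.0. */
double fmaximum_sketch(double a, double b) {
    if (isnan(a)) return a;          /* propagate NaN */
    if (isnan(b)) return b;
    if (a == 0.0 && b == 0.0)        /* both are zeros: compare signs */
        return signbit(a) ? b : a;   /* a is -0.0, so b is at least as large */
    return a > b ? a : b;
}

/* Hypothetical sketch of llvm.minimum semantics, mirroring the above. */
double fminimum_sketch(double a, double b) {
    if (isnan(a)) return a;          /* propagate NaN */
    if (isnan(b)) return b;
    if (a == 0.0 && b == 0.0)        /* both are zeros: compare signs */
        return signbit(a) ? a : b;   /* a is -0.0, so a is at most b */
    return a < b ? a : b;
}
```

The explicit zero case is needed because `a > b` is false for both orderings of -0.0 and +0.0, so an ordinary comparison alone cannot tell them apart.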