llvm-project

Author	SHA1	Message	Date
David Green	637aa61732	[ARM] Fix VBICimm and VORRimm generation under Big endian. (#107813 ) This is a smaller follow on to #105519 that fixes VBICimm and VORRimm too. The logic behind lowering vector immediates under big endian Neon/MVE is to treat them in natural lane ordering (same as little endian), and VECTOR_REG_CAST them to the correct type (as opposed to creating the constants in big endian form and bitcasting them). This makes sure that is done when creating VORRIMM and VBICIMM.	2024-09-13 10:59:57 +01:00
David Green	11eae671b7	[ARM] Add and extend big-endian testing for vorrimm and vbicimm. NFC	2024-09-07 15:36:54 +01:00
Austin	3242e77841	[ARM][Codegen] Fix vector data miscompilation in arm32be (#105519 ) Fix #102418, resolving the issue of generating incorrect vrev during vectorization in big-endian scenarios	2024-09-07 14:09:29 +08:00
Craig Topper	e6e857cdf9	[GISel] Use Function::getFunctionType() instead of getType() in some remarks. (#107651 ) getType() on a Function is always 'ptr'. We should use getFunctionType() so we get the function signature.	2024-09-06 19:59:44 -07:00
anjenner	4af249fe6e	Add usub_cond and usub_sat operations to atomicrmw (#105568 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2024-09-06 16:19:20 +01:00
Nikita Popov	a7697c8655	[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984 ) These intrinsics currently assume natural alignment. Instead, respect the alignment attribute on the intrinsic. Teach InstCombine to improve that alignment. If desired I could also adjust the clang frontend to add alignment annotations equivalent to the previous behavior, but I don't see any indication that such an assumption is correct in the ARM intrinsics docs. Fixes https://github.com/llvm/llvm-project/issues/59081.	2024-09-05 09:26:53 +02:00
Nikita Popov	224112f833	[ARM] Regenerate test checks (NFC)	2024-09-02 14:15:03 +02:00
Oliver Stannard	9cf68679c4	[ARM] Fix failure to register-allocate CMP_SWAP_64 pseudo-inst (#106721 ) This test case was failing to compile with a "ran out of registers during register allocation" error at -O0. This was because CMP_SWAP_64 has 3 operands which must be an even-odd register pair, and two other GPR operands. All of the def operands are also early-clobber, so registers can't be shared between uses and defs. Because the function has an over-aligned alloca it needs frame and base pointers, so r6 and r11 are both reserved. That leaves r0/r1, r2/r3, r4/r5 and r8/r9 as the only valid register pairs, and if the two individual GPR operands happen to get allocated to registers in different pairs then only 2 pairs will be available for the three GPRPair operands. To fix this, I've merged the two GPR operands into a single GPRPair operand. This means that the instruction now has 4 GPRPair operands, which can always be allocated without relying on luck. This does constrain register allocation a bit more, but this pseudo instruction is only used at -O0, so I don't think that's a problem.	2024-09-02 08:54:10 +01:00
Stephen Tozer	3d08ade7bd	[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149 ) This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for this pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850 This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA). Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>	2024-08-29 17:53:32 +01:00
Kiran	c50d11e6d9	Revert "[ARM] musttail fixes" committed by accident, see #104795 This reverts commit a2088a24dad31ebe44c93751db17307fdbe1f0e2.	2024-08-27 11:17:17 +01:00
Kiran	ad468da038	Revert "Seperate frontend changes, add debug directives, remove redundant stuff from tests" This reverts commit 1a908c6be3317bbbac73e6a6fc52cabefbdebf7d.	2024-08-27 10:46:18 +01:00
Kiran	1a908c6be3	Seperate frontend changes, add debug directives, remove redundant stuff from tests	2024-08-27 10:44:06 +01:00
Kiran	a2088a24da	[ARM] musttail fixes Backend: - Caller and callee arguments no longer have to match, just to take up the same space, as they can be changed before the call - Allowed tail calls if caller and callee both (or neither) use sret, whereas before it would be disallowed if either used sret - Allowed tail calls if byval args are used - Added debug trace for IsEligibleForTailCallOptimisation Frontend (clang): - Do not generate extra alloca if sret is used with musttail, as the space for the sret is allocated already Change-Id: Ic7f246a7eca43c06874922d642d7dc44bdfc98ec	2024-08-27 10:44:06 +01:00
David Green	9f82f6daa5	[ARM] Add a number of extra vmovimm tests for BE. NFC	2024-08-24 20:20:23 +01:00
David Green	05d17a1c70	[GlobalISel] Bail out early for big-endian (#103310 ) If we continue through the function we can currently hit crashes. We can bail out early and fall back to SDAG. Fixes #103032	2024-08-19 18:50:47 +01:00
Craig Topper	abc1acf8df	[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751 ) If the upper bits of the shr aren't demanded. This helps with cases where the outer srl was originally an sra and was converted to a srl by SimplifyDemandedBits before it had a chance to combine with the inner sra. This can occur when the inner sra was part of a sign_extend_inreg expansion. There are some regressions in ARM and Thumb2.	2024-08-14 08:44:57 -07:00
Pierre van Houtryve	7389545d0d	Reapply "[AMDGPU] Always lower s/udiv64 by constant to MUL" (#101942 ) Reland #100723, fixing the ARM issue at the cost of a small loss of optimization in `test/CodeGen/AMDGPU/fshr.ll` Solves #100383	2024-08-12 09:00:22 +02:00
Peter Rong	74e4694b8c	[LTO] enable `ObjCARCContractPass` only on optimized build (#101114 ) #92331 tried to enable `ObjCARCContractPass` by default, but it caused a regression on O0 builds and was reverted. This patch tries to bring that back by: 1. reverting the [revert](`1579e9ca9c`). 2. calling `createObjCARCContractPass` only on optimized builds. Tests are updated to reflect the changes. Specifically, all `O0` tests should not include `ObjCARCContractPass` Signed-off-by: Peter Rong <PeterRong@meta.com>	2024-08-09 13:04:25 -07:00
David Green	dad1cb9cf9	[ARM] Regenerate big-endian-vmov.ll. NFC	2024-08-09 15:24:54 +01:00
Simon Pilgrim	13d04fa560	[DAG] Add legalization handling for ABDS/ABDU (#92576 ) (REAPPLIED) Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization	2024-08-08 11:39:05 +01:00
Sergei Barannikov	34157f694c	[ARM] Fix operand order of tBLXr in a test (NFC) (#102312 ) The $noreg should be a part of `pred` complex operand.	2024-08-08 01:12:45 +03:00
Simon Pilgrim	e4e96b3e26	Revert b1234ddbe2652aa7948242a57107ca7ab12fd2f8. "[DAG] Add legalization handling for ABDS/ABDU (#92576 )" Reverting #92576 while we identify a reported regression	2024-08-07 17:11:25 +01:00
Oliver Stannard	d06303ffc1	[ARM] t2CALL_BTI pseudo-inst clobbers LR (#102117 ) The t2CALL_BTI pseudo-instruction expands to a tBL instruction, so needs the same implicit uses and defs as it.	2024-08-07 10:24:17 +01:00
Simon Pilgrim	b1234ddbe2	[DAG] Add legalization handling for ABDS/ABDU (#92576 ) Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv	2024-08-06 10:18:06 +01:00
Alexis Engelke	fa92d51f9e	[VP] Merge ExpandVP pass into PreISelIntrinsicLowering (#101652 ) Similar to #97727; avoid an extra pass over the entire IR by performing the lowering as part of the pre-isel-intrinsic-lowering pass.	2024-08-06 09:27:59 +02:00
Martin Storsjö	8dd065d5bc	[ARM] [Windows] Use IMAGE_SYM_CLASS_STATIC for private functions (#101828 ) For functions with private linkage, pick IMAGE_SYM_CLASS_STATIC rather than IMAGE_SYM_CLASS_EXTERNAL; GlobalValue::isInternalLinkage() only checks for InternalLinkage, while GlobalValue::isLocalLinkage() checks for both InternalLinkage and PrivateLinkage. This matches what the AArch64 target does, since commit 3406934e4db4bf95c230db072608ed062c13ad5b. This activates a preexisting fix for the AArch64 target from 1e7f592a890aad860605cf5220530b3744e107ba, for the ARM target as well. When a relocation points at a symbol, one usually can convey an offset to the symbol by encoding it as an immediate in the instruction. However, for the ARM and AArch64 branch instructions, the immediate stored in the instruction is ignored by MS link.exe (and lld-link matches this aspect). (It would be simple to extend lld-link to support it - but such object files would be incompatible with MS link.exe.) This was worked around by 1e7f592a890aad860605cf5220530b3744e107ba by emitting symbols into the object file symbol table, for temporary symbols that otherwise would have been omitted, if they have the class IMAGE_SYM_CLASS_STATIC, in order to avoid needing an offset in the relocated instruction. This change gives the symbols generated from functions with the IR level "private" linkage the right class, to activate that workaround. This fixes https://github.com/llvm/llvm-project/issues/100101, fixing code generation for coroutines for Windows on ARM. After the change in f78688134026686288a8d310b493d9327753a022, coroutines generate a function with private linkage, and calls to this function were previously broken for this target.	2024-08-04 23:20:45 +03:00
Sergei Barannikov	411d31ad69	Partially revert 92e18ffd803365c64910760ba20278f875d93681 (#101673 ) It is likely to cause stage2 build failures: https://lab.llvm.org/buildbot/#/builders/122/builds/389 https://lab.llvm.org/buildbot/#/builders/79/builds/552 I don't have an ARM machine to investigate, so I'm just reverting ARM changes to see if it helps make the bots green again.	2024-08-02 16:38:31 +03:00
Sergei Barannikov	92e18ffd80	[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#99752 ) The main change is adding CTPOP to `RuntimeLibcalls.def` to allow targets to use LibCall action for CTPOP. DAG legalizers are changed accordingly.	2024-08-02 12:29:39 +03:00
Alexis Engelke	b5fc083dc3	[CodeGen] Merge lowerConstantIntrinsics into pre-isel lowering (#97727 ) Currently, the LowerConstantIntrinsics pass does an RPO traversal of every function... only to find that many functions don't have constant intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is already a pre-isel intrinsic lowering pass, which iterates over intrinsic declarations and lowers all users. Call lowerConstantIntrinsics from this pass to avoid the extra iteration over the entire IR and the RPO traversal.	2024-08-01 17:44:32 +02:00
Simon Pilgrim	1b4be6a474	[ARM] Regenerate vselect_imax.ll	2024-07-29 15:53:42 +01:00
John Brawn	f0bd705c9b	[CodeGen] Restore MachineBlockPlacement block ordering (#99351 ) PR #91843 changed the algorithm used to find the next unplaced block so that it iterates through the blocks in BlockFilter instead of iterating through the blocks in the function and checking if they are in the block filter. Unfortunately this sometimes results in a different block ordering being chosen, as the order of blocks in BlockFilter comes from the order in MachineLoopInfo, and in some cases this differs from the order they are in the function. This can also give an end result that has worse performance. Fix this by making collectLoopBlockSet place blocks in its output in the order that they are in the function.	2024-07-24 10:49:50 +01:00
Simon Pilgrim	5bd38a98d7	[DAG] ComputeNumSignBits - subo_carry(x,x,c) -> bitwidth 'allsignbits' (#99935 ) Handle cases where the subo_carry is subtracting the same operand (=zero) - so only the subtraction of the 0/1 carry bit is affecting the result, giving a 0/-1 allsignbits value. Noticed while improving ABDS/ABDU expansion.	2024-07-23 11:49:12 +01:00
Volodymyr Vasylkun	e094abde42	[SelectionDAG] Expand [US]CMP using arithmetic on boolean values instead of selects (#98774 ) The previous expansion of [US]CMP was done using two selects and two compares. It produced decent code, but on many platforms it is better to implement [US]CMP nodes by performing the following operation: ``` [us]cmp(x, y) = (x [us]> y) - (x [us]< y) ``` This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.	2024-07-16 20:56:18 +01:00
Florian Hahn	d0d05aec3b	[Darwin] Fix availability of exp10 for watchOS, tvOS, xROS. (#98542 ) Update availability information added in 1eb7f055d9a. exp10 is available on iOS >= 7.0 and macOS >= 10.9. On all other platforms, it is available on any version. Also drop the x86 check, as the availability only depends on the OS version, not the target platform. PR: https://github.com/llvm/llvm-project/pull/98542	2024-07-11 22:57:34 +01:00
Daniel Kiss	1782810b84	[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819 ) So far branch protection, sign return address, guarded control stack attributes are only emitted as module flags to indicate the functions need to be generated with those features. The problem is that in an LTO build the module flags are merged with the `min` rule, which means that if one of the modules is not built with sign return address then the features will be turned off for all functions, because the functions take the branch-protection and sign-return-address features from the module flags. The sign-return-address is a function-level option, therefore functions from files that are compiled with -mbranch-protection=pac-ret are expected to be protected. The inliner might inline functions with different sets of flags as it doesn't consider the module flags. This patch adds the attributes to all functions and drops the checking of the module flags for the code generation. The module flag is still used for generating the ELF markers. Also drops the "true"/"false" values from the branch-protection-enforcement, branch-protection-pauth-lr, guarded-control-stack attributes, as presence of the attribute means it is on, absence means off, and there is no other option. Relanded with test fixes.	2024-07-10 11:32:41 +02:00
Daniel Kiss	4b2daeccc7	Revert "[Clang][ARM][AArch64] Alway emit protection attributes for functions." (#98284 ) Reverts llvm/llvm-project#82819	2024-07-10 10:22:38 +02:00
Daniel Kiss	e15d67cfc2	[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819 ) So far branch protection, sign return address, guarded control stack attributes are only emitted as module flags to indicate the functions need to be generated with those features. The problem is that in an LTO build the module flags are merged with the `min` rule, which means that if one of the modules is not built with sign return address then the features will be turned off for all functions, because the functions take the branch-protection and sign-return-address features from the module flags. The sign-return-address is a function-level option, therefore functions from files that are compiled with -mbranch-protection=pac-ret are expected to be protected. The inliner might inline functions with different sets of flags as it doesn't consider the module flags. This patch adds the attributes to all functions and drops the checking of the module flags for the code generation. The module flag is still used for generating the ELF markers. Also drops the "true"/"false" values from the branch-protection-enforcement, branch-protection-pauth-lr, guarded-control-stack attributes, as presence of the attribute means it is on, absence means off, and there is no other option.	2024-07-10 10:06:14 +02:00
Manish Kausik H	69192e0193	[LegalizeDAG] Optimize CodeGen for `ISD::CTLZ_ZERO_UNDEF` (#83039 ) Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case. The details of the optimization are outlined in #82075 Fixes #82075 Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>	2024-07-08 14:01:32 +01:00
hstk30-hw	ef465bf8b1	[ARM] Fix arm32be softfp mode miscompilation for neon sdiv (#97883 ) Related issue: https://github.com/llvm/llvm-project/issues/97782	2024-07-08 14:18:38 +08:00
Yingwei Zheng	4997af98a0	[SimplifyCFG] Simplify nested branches (#97067 ) This patch folds the following pattern (I don't know what to call this): ``` bb0: br i1 %cond1, label %bb1, label %bb2 bb1: br i1 %cond2, label %bb3, label %bb4 bb2: br i1 %cond2, label %bb4, label %bb3 bb3: ... bb4: ... ``` into ``` bb0: %cond = xor i1 %cond1, %cond2 br i1 %cond, label %bb4, label %bb3 bb3: ... bb4: ... ``` Alive2: https://alive2.llvm.org/ce/z/5iOJEL Closes https://github.com/llvm/llvm-project/issues/97022. Closes https://github.com/llvm/llvm-project/issues/83417. I found this pattern in some verilator-generated code, which is widely used in RTL simulation. This fold reduces branches and improves the performance of the CPU frontend. To my surprise, this pattern is also common in C/C++ code bases. Affected libraries/applications: cmake/cvc5/freetype/git/gromacs/jq/linux/openblas/openmpi/openssl/php/postgres/ruby/sqlite/wireshark/z3/...	2024-07-01 03:35:39 +08:00
Nikita Popov	00ae6bb6c2	[ARM] Regenerate MIR test (NFC)	2024-06-26 15:40:10 +02:00
Serge Pavlov	4c9b71dd91	[GlobalISel][ARM] Legalze set_fpmode and get_fpmode (#96467 ) Implement handling of get/set floating point control modes for ARM in Global Instruction Selector.	2024-06-26 19:41:44 +07:00
Eli Friedman	39a0aa5876	[SelectionDAG] Lower llvm.ldexp.f32 to ldexp() on Windows. (#95301 ) This reduces codesize. As discussed in #92707.	2024-06-25 10:25:48 -07:00
Lucas Duarte Prates	78ff617d3f	[ARM] CMSE security mitigation on function arguments and returned values (#89944 ) The ABI mandates two things related to function calls: - Function arguments must be sign- or zero-extended to the register size by the caller. - Return values must be sign- or zero-extended to the register size by the callee. As a consequence, callees can assume that function arguments have been extended and so can callers with regard to return values. Here lies the problem: Nonsecure code might deliberately ignore this mandate with the intent of attempting an exploit. It might try to pass values that lie outside the expected type's value range in order to trigger undefined behaviour, e.g. out of bounds access. With the mitigation implemented, Secure code always performs extension of values passed by Nonsecure code. This addresses the vulnerability described in CVE-2024-0151. Patches by Victor Campos. --------- Co-authored-by: Victor Campos <victor.campos@arm.com>	2024-06-20 10:22:01 +01:00
Farzon Lotfi	7ad12a7c04	[ARM] Add tan intrinsic lowering (#95439 ) - `ARMISelLowering.cpp` - Add f16 type and neon and mve vector support for tan	2024-06-14 10:35:50 -04:00
Pierre van Houtryve	ab0d01a5f0	[MC] Cache MCRegAliasIterator (#93510 ) AMDGPU has a lot of registers, almost 9000. Many of those registers have aliases. For instance, SGPR0 has a ton of aliases due to the presence of register tuples. It's even worse if you query the aliases of a register tuple itself. A large register tuple can have hundreds of aliases because it may include 16 registers, and each of those registers has its own tuples as well. The current implementation of MCRegAliasIterator is not good at this. In some extreme cases it can iterate 7000 more times than necessary, just giving duplicates over and over again and using a lot of expensive iterators. This patch implements a cache system for MCRegAliasIterator. It does the expensive part only once and then saves it for us so the next iterations on that register's aliases are just a map lookup. Furthermore, the cached data is uniqued (and sorted). Thus, this speeds up code by both speeding up the iterator itself, but also by minimizing the number of loop iterations users of the iterator do.	2024-06-14 11:20:45 +02:00
David Green	706e197540	[CodeGen] Remove target SubRegLiveness flags (#95437 ) This removes the uses of target flags to disable subreg liveness, relying on the `-enable-subreg-liveness` flag instead. The `-enable-subreg-liveness` flag has been changed to take precedence over the subtarget if set, and one use of `Subtarget->enableSubRegLiveness()` has been changed to `MRI->subRegLivenessEnabled()` to make sure the option properly applies.	2024-06-14 08:51:56 +01:00
Nikita Popov	db08b0999d	[ARM][AArch64] Bail out if CandidatesWithoutStackFixups is empty (#95410 ) The following code assumes that RepeatedSequenceLocs is non-empty. Bail out if there are less than 2 candidates left, as no outlining is possible in that case. The same check is already present in all the other places where elements from RepeatedSequenceLocs may be dropped. This fixes the issue reported at: https://github.com/llvm/llvm-project/pull/93965#issuecomment-2151989716	2024-06-14 09:29:21 +02:00
Paul T Robinson	32add2435f	Fix test to have correct requirements (#95106 )	2024-06-11 06:04:09 -07:00
Paul T Robinson	3f88311124	[Driver] Rearrange some Apple version testing (#94514 ) There were four tests in Driver that actually tested bits of Driver and bits of CodeGen, and therefore had target restrictions. Rework those four tests into one Driver test (with no target restrictions) and two target-specific CodeGen tests.	2024-06-11 07:51:21 -04:00
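
Commit 4af249fe6e describes `usub_cond` and `usub_sat` as conditional subtractions that yield the minuend and zero respectively when the difference would be negative. The following is a minimal sketch of just that arithmetic on unsigned integers. The function names are hypothetical Python helpers, not LLVM APIs, and the atomicity of `atomicrmw` is not modelled.

```python
# Hypothetical model of the value stored by the two operations.
# `old` is the value currently in memory (the minuend), `val` the operand.

def usub_cond(old: int, val: int) -> int:
    """Subtract only if the result would not be negative; otherwise keep old."""
    return old - val if old >= val else old


def usub_sat(old: int, val: int) -> int:
    """Subtract, clamping to zero if the result would be negative."""
    return old - val if old >= val else 0


if __name__ == "__main__":
    print(usub_cond(10, 3), usub_cond(3, 10))
    print(usub_sat(10, 3), usub_sat(3, 10))
```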
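
The ABDU expansion quoted in commits b1234ddbe2 and 13d04fa560, `sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))`, can be checked by brute force. This sketch assumes 8-bit values and assumes the overflow flag is widened to an all-ones mask when set (0 otherwise); both are illustrative choices, not taken from the commit text.

```python
BITS = 8                    # assumed width, small enough to test exhaustively
MASK = (1 << BITS) - 1


def abdu_expanded(lhs: int, rhs: int) -> int:
    """Unsigned absolute difference via the sub/xor/sub pattern."""
    diff = (lhs - rhs) & MASK            # wrapping subtraction
    ov = MASK if lhs < rhs else 0        # assumption: overflow flag as 0 / all-ones
    # diff ^ ov is ~diff on overflow; subtracting ov (-1) then adds 1,
    # giving the two's-complement negation of diff.
    return ((diff ^ ov) - ov) & MASK


def matches_reference() -> bool:
    """Compare against abs(lhs - rhs) for every pair of 8-bit inputs."""
    return all(
        abdu_expanded(a, b) == abs(a - b)
        for a in range(1 << BITS)
        for b in range(1 << BITS)
    )


if __name__ == "__main__":
    print(matches_reference())
```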
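
Commit e094abde42 replaces the select-based expansion of `[US]CMP` with `(x > y) - (x < y)`. A small sketch comparing the two forms on 8-bit values follows. The helper names and the exact shape of the select-based version are assumptions made for illustration.

```python
BITS = 8  # assumed width for the exhaustive comparison


def to_signed(v: int, bits: int = BITS) -> int:
    """Reinterpret an unsigned bit pattern as a two's-complement value."""
    return v - (1 << bits) if v & (1 << (bits - 1)) else v


def cmp_arith(x: int, y: int) -> int:
    """Arithmetic form: difference of two boolean comparison results."""
    return int(x > y) - int(x < y)


def cmp_select(x: int, y: int) -> int:
    """Select-based form: two compares feeding two selects (assumed shape)."""
    return -1 if x < y else (1 if x > y else 0)


def ucmp(x: int, y: int) -> int:
    return cmp_arith(x, y)


def scmp(x: int, y: int) -> int:
    return cmp_arith(to_signed(x), to_signed(y))


def forms_agree() -> bool:
    """Both forms give the same answer for unsigned and signed readings."""
    vals = range(1 << BITS)
    return all(
        cmp_arith(a, b) == cmp_select(a, b)
        and scmp(a, b) == cmp_select(to_signed(a), to_signed(b))
        for a in vals
        for b in vals
    )


if __name__ == "__main__":
    print(ucmp(200, 100), scmp(200, 100), forms_agree())
```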
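
The nested-branch fold in commit 4997af98a0 rests on the observation that the original control flow reaches `bb4` exactly when `%cond1` and `%cond2` differ. A truth-table check of that claim, with the basic blocks modelled as hypothetical string labels:

```python
from itertools import product


def original(cond1: bool, cond2: bool) -> str:
    """Control flow before the fold: bb0 branches to bb1 or bb2."""
    if cond1:                                # bb1
        return "bb3" if cond2 else "bb4"
    return "bb4" if cond2 else "bb3"         # bb2


def folded(cond1: bool, cond2: bool) -> str:
    """Control flow after the fold: a single branch on the xor."""
    cond = cond1 ^ cond2
    return "bb4" if cond else "bb3"


def fold_is_equivalent() -> bool:
    """Check all four combinations of the two conditions."""
    return all(
        original(c1, c2) == folded(c1, c2)
        for c1, c2 in product([False, True], repeat=2)
    )


if __name__ == "__main__":
    print(fold_is_equivalent())
```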
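
Commit abc1acf8df combines `(srl (sra X, C1), ShAmt)` into `sra(X, C1+ShAmt)` when the upper bits of the result are not demanded. The sketch below checks that the two expressions agree on the low `BITS - ShAmt` bits for every 8-bit input. The width, the helper names, and the restriction `C1 + ShAmt < BITS` are assumptions made for the illustration.

```python
BITS = 8                    # assumed width for the exhaustive check
MASK = (1 << BITS) - 1


def to_signed(v: int) -> int:
    """Reinterpret an unsigned bit pattern as a two's-complement value."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def sra(x: int, n: int) -> int:
    """Arithmetic shift right; Python's >> on negative ints sign-extends."""
    return (to_signed(x) >> n) & MASK


def srl(x: int, n: int) -> int:
    """Logical shift right on the unsigned bit pattern."""
    return (x & MASK) >> n


def combine_is_sound() -> bool:
    for x in range(1 << BITS):
        for c1 in range(BITS):
            for sh in range(BITS - c1):      # keep c1 + sh < BITS
                demanded = MASK >> sh        # upper `sh` bits are not demanded
                before = srl(sra(x, c1), sh) & demanded
                after = sra(x, c1 + sh) & demanded
                if before != after:
                    return False
    return True


if __name__ == "__main__":
    print(combine_is_sound())
```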

4955 Commits