llvm-project

Author	SHA1	Message	Date
Yingwei Zheng	38a44bdc93	[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572 ) In commit `2b582440c1`, we canonicalize the isInf/isNanOrInf idiom into fabs+fcmp for better analysis/codegen (See also the discussion in https://github.com/llvm/llvm-project/pull/76338). This patch reverses the fabs+fcmp to `is.fpclass`. If the `is.fpclass` is not supported by the target, it will be expanded by TLI. Fixes the regression introduced by `2b582440c1` and https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.	2024-03-18 18:27:45 +08:00
Stephen Tozer	d128448efd	Revert "Reapply "[RemoveDIs] Print non-intrinsic debug info in textual IR output (#79281 )"" Reverted due to some test failures on some buildbots. https://lab.llvm.org/buildbot/#/builders/67/builds/14669 This reverts commit aa436493ab7ad4cf323b0189c15c59ac9dc293c7.	2024-02-27 10:17:24 +00:00
Stephen Tozer	aa436493ab	Reapply "[RemoveDIs] Print non-intrinsic debug info in textual IR output (#79281 )" Fixes the prior issue in which the symbol for a cl-arg was unavailable to some binaries. This reverts commit dc06d75ab27b4dcae2940fc386fadd06f70faffe.	2024-02-27 09:59:08 +00:00
Stephen Tozer	dc06d75ab2	Revert "[RemoveDIs] Print non-intrinsic debug info in textual IR output (#79281 )" Reverted due to failures on buildbots, where a new cl flag was placed in the wrong file, resulting in link errors. https://lab.llvm.org/buildbot/#/builders/198/builds/8548 This reverts commit 0b398256b3f72204ad1f7c625efe4990204e898a.	2024-02-26 18:49:18 +00:00
Stephen Tozer	0b398256b3	[RemoveDIs] Print non-intrinsic debug info in textual IR output (#79281 ) This patch adds support for printing the proposed non-instruction debug info ("RemoveDIs") out to textual IR. This patch does not add any bitcode support, parsing support, or documentation. Printing of the new format is controlled by a flag added in this patch, `--write-experimental-debuginfo`, which defaults to false. The new format will be printed iff this flag is true, so whether we use the IR format is completely independent of whether we use non-instruction debug info during LLVM passes (which is controlled by the `--try-experimental-debuginfo-iterators` flag). Even with the flag disabled, some existing tests need to be updated, as this patch causes debug intrinsic declarations to be changed in a round trip, such that they always appear at the end of a module and have no attributes (this has no functional change on the module). The design of this new IR format was proposed previously on Discourse, and any further discussion about the design can still be contributed there: https://discourse.llvm.org/t/rfc-debuginfo-proposed-changes-to-the-textual-ir-representation-for-debug-values/73491	2024-02-26 18:22:05 +00:00
Nikita Popov	2d69827c5c	[Transforms] Convert tests to opaque pointers (NFC)	2024-02-05 11:57:34 +01:00
Nick Anderson	f1ec0d12bb	Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182 ) Port CodeGenPrepare to new pass manager and dependency BasicBlockSectionsProfileReader Fixes: #75380 Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>	2024-01-09 13:32:59 +07:00
Simon Pilgrim	7648371c25	Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054 )" Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)" Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104	2024-01-05 12:28:10 +00:00
Nick Anderson	e0c554ad87	Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380 ) Port CodeGenPrepare to new pass manager and dependency BasicBlockSectionsProfileReader Fixes: #64560 Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>	2024-01-05 13:47:56 +07:00
Jeremy Morse	f85a38e21c	Follow up to d0858bffa11, add missing REQUIRES x86	2023-12-06 17:48:50 +00:00
Jeremy Morse	d0858bffa1	[DebugInfo][RemoveDIs] Maintain DPValues on skipped instrs in CGP (#74602 ) It turns out that CodeGenPrepare will skip over consecutive select instructions as it knows it can optimise them all at the same time. This is unfortunate for the RemoveDIs project to remove intrinsic-based debug-info, because that means debug-info attached to those skipped instructions doesn't get seen by optimizeInst and so updated. Add code to handle debug-info on those skipped instructions manually. This code will also have been slower when it had dbg.values stuffed in between instructions, but with RemoveDIs it'll go faster because the dbg.values won't break up the select sequence.	2023-12-06 17:25:33 +00:00
zhongyunde 00443407	d6f4d5209f	[CGP][AArch64] Rebase the common base offset for better ISel When all the large const offsets masked with the same value from bit-12 to bit-23. Fold add x8, x0, #2031, lsl #12 add x8, x8, #960 ldr x9, [x8, x8] ldr x8, [x8, #2056] into add x8, x0, #2031, lsl #12 ldr x9, [x8, #960] ldr x8, [x8, #3016]	2023-12-05 09:01:41 +08:00
Jeremy Morse	3ef98bcd46	[DebugInfo][RemoveDIs] Support maintaining DPValues in CodeGenPrepare (#73660 ) CodeGenPrepare needs to support the maintenence of DPValues, the non-instruction replacement for dbg.value intrinsics. This means there are a few functions we need to duplicate or replicate the functionality of: * fixupDbgValue for setting users of sunk addr GEPs, * The remains of placeDbgValues needs a DPValue implementation for sinking * Rollback of RAUWs needs to update DPValues * Rollback of instruction removal needs supporting (see github #73350) * A few places where we have to use iterators rather than instructions. There are three places where we have to use the setHeadBit call on iterators to indicate which portion of debug-info records we're about to splice around. This is because CodeGenPrepare, unlike other optimisation passes, is very much concerned with which block an operation occurs in and where in the block instructions are because it's preparing things to be in a format that's good for SelectionDAG. There isn't a large amount of test coverage for debuginfo behaviours in this pass, hence I've added some more.	2023-11-30 15:29:05 +00:00
Nikita Popov	c9832da350	[CGP] Drop nneg flag when moving zext past instruction (#72103 ) Fix the issue by not reusing the zext at all. The code already handles creation of new zexts if more than one is needed. Always use that code-path instead of trying to reuse the old zext in some case. (Alternatively we could also drop poison-generating flags on the old zext, but it seems cleaner to not reuse it at all, especially if it's not always possible anyway.) Fixes https://github.com/llvm/llvm-project/issues/72046.	2023-11-14 09:03:06 +01:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible to differentiate from a missing datalayout `target datalayout = ""` in the IR file since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Paul Walker	6383785bad	[SVE][CodeGenPrepare] Sink address calculations that match SVE gather/scatter addressing modes. (#66996 ) SVE supports scalar+vector and scalar+extw(vector) addressing modes. However, the masked gather/scatter intrinsics take a vector of addresses, which means address computations can be hoisted out of loops. The is especially true for things like offsets where the true size of offsets is lost by the time you get to code generation. This is problematic because it forces the code generator to legalise towards `<vscale x 2 x ty>` vectors that will not maximise bandwidth if the main block datatypes is in fact i32 or smaller. This patch sinks GEPs and extends for cases where one of the above addressing modes can be used. NOTE: There are cases where it would be better to split the extend in two with one half hoisted out of a loop and the other within the loop. Whilst true I think this switch of default is still better than before because the extra extends are an improvement over being forced to split a gather/scatter.	2023-10-11 13:20:08 +01:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038 )" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038 ) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Alex Richardson	83c4227ab7	Auto-generate test checks for tests affected by D141060 These files had manual CHECK lines which make the diff from D141060 very difficult to review.	2023-10-04 10:51:35 -07:00
Jay Foad	e0919b189b	[CodeGen] Renumber slot indexes before register allocation (#66334 ) RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps. This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.	2023-09-19 11:18:12 +01:00
Serguei Katkov	a701b7e368	[CGP] Remove dead PHI nodes before elimination of mostly empty blocks Before elimination of mostly empty block it makes sense to remove dead PHI nodes. It open more opportunity for elimination plus eliminates dead code itself. It appeared that change results in failing many unit tests and some of them I've updated and for another one I disable this optimization. The pattern I observed in the tests is that there is a infinite loop without side effects. As a result after elimination of dead phi node all other related instruction are also removed and tests stops to check what it is expected. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D158503	2023-08-29 04:35:06 +00:00
Harvin Iriawan	db158c7c83	[AArch64] Update generic sched model to A510 Refresh of the generic scheduling model to use A510 instead of A55. Main benefits are to the little core, and introducing SVE scheduling information. Changes tested on various OoO cores, no performance degradation is seen. Differential Revision: https://reviews.llvm.org/D156799	2023-08-21 12:25:15 +01:00
Jordan Rupprecht	f5b5a30858	Revert "[CodeGenPrepare][NFC] Update the dominator tree instead of rebuilding it" This reverts commit 0b1d1cdb89322c277baf5221218a830195fef9d4. It causes a clang crash. Details will be posted to D153638.	2023-08-01 23:08:55 -07:00
Momchil Velikov	0b1d1cdb89	[CodeGenPrepare][NFC] Update the dominator tree instead of rebuilding it Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153638	2023-08-01 18:07:03 +01:00
Matthias Braun	02ba5b8c6b	Ignore load/store until stack address computation No longer conservatively assume a load/store accesses the stack when we can prove that we did not compute any stack-relative address up to this point in the program. We do this in a cheap not-quite-a-dataflow-analysis: Assume `NoStackAddressUsed` when all predecessors of a block already guarantee it. Process blocks in reverse post order to guarantee that except for loop headers we have processed all predecessors of a block before processing the block itself. For loops we accept the conservative answer as they are unlikely to be shrink-wrappable anyway. Differential Revision: https://reviews.llvm.org/D152213	2023-06-26 13:50:36 -07:00
Nikita Popov	b7bd3a734c	[CGP] Fix infinite loop in icmp operand swapping Don't swap the operands if they're the same. Fixes the issue reported at https://reviews.llvm.org/D152541#4427017.	2023-06-16 15:50:12 +02:00
Serguei Katkov	d119c386cd	[CGP] Additional tests for removing operand of assume. NFC.	2023-06-16 11:52:46 +07:00
Serguei Katkov	d57ed844fe	[CGP] Add test to show the missed case in remove llvm.assume	2023-06-07 17:20:57 +07:00
Joshua Cranmer	3ac1cef866	[CodeGen] Fix crash in CodeGenPrepare::optimizeGatherScatterInst. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151141	2023-05-23 15:02:03 -04:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Jake Egan	08533f8b86	Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation" These commits are causing a test-suite build failure on AIX. Revert for now for time to investigate. https://lab.llvm.org/buildbot/#/builders/214/builds/5779/steps/9/logs/stdio This reverts commit bd87a2449da0c82e63cebdf9c131c54a5472e3a7 and 4c72266830ffa332ebb7cf1d3bbd6c56d001fa0f.	2023-02-14 15:20:06 -05:00
Alex Richardson	bd87a2449d	[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation This function was added for ARM targets, but aligning global/stack pointer arguments passed to memcpy/memmove/memset can improve code size and performance for all targets that don't have fast unaligned accesses. This adds a generic implementation that adjusts the alignment to pointer size if unaligned accesses are slow. Review D134168 suggests that this significantly improves performance on synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134282	2023-02-09 10:11:40 +00:00
Fangrui Song	ca65969c3e	[AArch64] Unconditionally use DW_EH_PE_indirect\|DW_EH_PE_pcrel personality/lsda/ttype encodings For -fno-pic, without DW_EH_PE_indirect, the personality routine pointer in a CIE needs an R_AARCH64_ABS64 relocation. In common configurations that `__gcc_personality_v0` is defined in a shared object, this will lead to a discouraged canonical PLT entry, or, if `ld.lld -z notext` (betwen D122459 and D143136), a dynamic R_AARCH64_ABS64 relocation with an incorrect offset: https://github.com/llvm/llvm-project/issues/60392 Since GCC uses DW_EH_PE_indirect for -fno-pic code (the behavior hasn't changed since the initial port in 2012), let's follow suit by simplifying the code. ( For tiny and small code models, we use DW_EH_PE_sdata8 instead of GCC's DW_EH_PE_sdata4. This is a deliberate choice to support personality-.eh_frame offset > 2GiB. This is unneeded for small code model since "Max text segment size < 2GiB" but making `-fno-pic -mcmodel={tiny,small}` different seems unnecessary: the scenarios that uses both -fno-pic and C++ exceptions have been increasingly rare now, so there is little advantage optimizing for the little size saving with code complexity. ) --- Two clang/test/Interpreter tests would fail without 6747fc07d1aa94e22622e278e5a02ba70675ac9b ([ORC] Use JITLink as the default linker for LLJIT on Linux/arm64.) Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D143039	2023-02-05 10:46:43 -08:00
NAKAMURA Takumi	5b2549b0d2	Revert "[AArch64] Unconditionally use DW_EH_PE_indirect\|DW_EH_PE_pcrel personality/lsda/ttype encodings" It causes failurs in clang-interpreter. This reverts commit 565a1fb1334b8cf510af1338cae3f50815a99f90, aka llvmorg-17-init-1048-g565a1fb1334b	2023-02-04 20:49:39 +09:00
Fangrui Song	565a1fb133	[AArch64] Unconditionally use DW_EH_PE_indirect\|DW_EH_PE_pcrel personality/lsda/ttype encodings For -fno-pic, without DW_EH_PE_indirect, the personality routine pointer in a CIE needs an R_AARCH64_ABS64 relocation. In common configurations that `__gcc_personality_v0` is defined in a shared object, this will lead to a discouraged canonical PLT entry, or, if `ld.lld -z notext` (betwen D122459 and D143136), a dynamic R_AARCH64_ABS64 relocation with an incorrect offset: https://github.com/llvm/llvm-project/issues/60392 Since GCC uses DW_EH_PE_indirect for -fno-pic code (the behavior hasn't changed since the initial port in 2012), let's follow suit by simplifying the code. ( For tiny and small code models, we use DW_EH_PE_sdata8 instead of GCC's DW_EH_PE_sdata4. This is a deliberate choice to support personality-.eh_frame offset > 2GiB. This is necessary for small code model since "Max text segment size < 2GiB" but it is unnecessary to make `-fno-pic -mcmodel={tiny,small}` different: The scenarios that uses both -fno-pic and C++ exceptions have been increasingly rare now, so there is little advantage optimizing for the little size saving with code complexity. ) Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D143039	2023-02-03 10:41:04 -08:00
David Green	28de5f99bb	[AArch64] Sink to umull if we know tops bits are zero. This is an extension to the code for sinking splats to multiplies, where if we can detect that the top bits are known-zero we can treat the instruction like a zext. The existing code was also adjusted in the process to make it more precise about only sinking if both operands are zext or sext. Differential Revision: https://reviews.llvm.org/D141275	2023-01-16 10:44:38 +00:00
Paul Walker	eae26b6640	[IRBuilder] Use canonical i64 type for insertelement index used by vector splats. Instcombine prefers this canonical form (see getPreferredVectorIndex), as does IRBuilder when passing the index as an integer so we may as well use the prefered form from creation. NOTE: All test changes are mechanical with nothing else expected beyond a change of index type from i32 to i64. Differential Revision: https://reviews.llvm.org/D140983	2023-01-11 14:08:06 +00:00
Nikita Popov	33a3139f80	[CGP] Avoid branch on poison UB in test (NFC)	2023-01-03 14:52:48 +01:00
Nikita Popov	02b02cd050	[CodeGenPrepare] Avoid branch on undef UB in tests (NFC)	2023-01-03 13:51:00 +01:00
Matt Arsenault	d9e51e7552	CodeGenPrepare: Convert most tests to opaque pointers NVPTX/dont-introduce-addrspacecast.ll required manually removing a check for a bitcast. AArch64/combine-address-mode.ll required rerunning update_test_checks Mips required some manual updates due to a CHECK-NEXT coming after a deleted bitcast. ARM/sink-addrmode.ll needed one small manual fix. Excludes one X86 function which needs more attention.	2022-11-28 09:21:59 -05:00
Matt Arsenault	e446d1d93f	CodeGenPrepare: Don't use anonymous values some in tests These are always an obstacle to test updates, and often break after running opaquify scripts on them.	2022-11-27 10:30:37 -05:00
Matt Arsenault	8824318512	X86: Make test check more precise This is really checking an i8*, not an i8.	2022-11-27 10:17:38 -05:00
Matt Arsenault	ffb20958cd	CodeGenPrepare: Don't use undef base pointers in addressing mode test This broke after the opaquify script.	2022-11-27 10:15:31 -05:00
Alex Richardson	16f9c5577d	[SimplifyLibCalls] Retain attributes added by Builder.CreateMem* This currently does not make much of a difference (only one tests is affected), but it is helpful e.g. for the out-of-tree CHERI target where Builder.CreateMemCpy() can add attributes other than parameter alignment. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D135075	2022-10-04 13:11:34 +00:00
Florian Hahn	b2c195da6d	[CGP] Update failing test missed in 81a11da762577.	2022-09-15 19:35:25 +01:00
Florian Hahn	81a11da762	[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops using a wide shuffle creating a v64i8 vector, selecting groups of 3 zero elements and an element from the input. This is profitable on AArch64 where such shuffles can be lowered to tbl instructions, but only in loops, because it requires materializing 4 masks, which can be done in the loop preheader. This is the only reason the transform is part of CGP. If there's a better alternative I missed, please let me know. The same goes for the shouldReplaceZExtWithShuffle hook which guards this. I am not sure if this transform will be beneficial on other targets, but it seems like there is no way other convenient way. This improves the generated code for loops like the one below in combination with D96522. int foo(uint8_t p, int N) { unsigned long long sum = 0; for (int i = 0; i < N ; i++, p++) { unsigned int v = p; sum += (v < 127) ? v : 256 - v; } return sum; } https://clang.godbolt.org/z/Wco866MjY Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D120571	2022-09-15 19:18:13 +01:00
Alex Bradbury	c44c1e9d3e	[RISCV] Implement isMaskAndCmp0FoldingBeneficial hook This hook is currently only used by CodeGenPrepare, which will sink and duplicate an 'and' into a block that has an 'icmp 0' user of it if the hook returns true. This hook is less useful for RISC-V than for targets like AArch64 that have a TBZ (test bit and branch if zero instruction), but may still be profitable if Zbs is available and a BEXTI can be selected. Conservatively, we return false even if Zbs is enabled for any masks that fit in the ANDI immediate because it's possible the only use is a branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable transformation. Differential Revision: https://reviews.llvm.org/D131492	2022-09-13 18:54:00 +01:00
Xiang1 Zhang	16743c9534	[CodeGen] Limit building time in CodeGenPrepare for huge function Details: Currently CodeGenPrepare is very time consuming in handling big functions. Old Algorithm : It iterate each BB in function, and go on handle very instructions in BB. Due to some instruction optimizations may affect the BBs' dominate tree. The old logic will re-iterate and try optimize for each BB. Suppose we have a big function with 20000 BBs, If we handled the last BB with fine tuning the dominate tree. We need totally re-iterate and try optimize the 20000 BBs from the beginning. The Complex is near N! And we really encounter somes big tests (> 20000 BBs) that cost more than 30 mins in this pass. (Debug version compiler will cost 2 hours here) What this patch do for huge function ? It mainly changes the iteration way for optimization. 1 We do optimizeBlock for each BB (that is same with old way). And, in the meaning time, If BB is changed/updated in the optimization, it will be put into FreshBBs (try do optimizeBlock again). The new created BB at previous iteration will also put into FreshBBs. 2 For the BBs which not updated at previous iteration, we directly skip it. Strictly speaking, here may miss some opportunity, but the probability is very small. 3 For Instructions in single BB, we do optimizeInst for each instruction. If optimizeInst change the instruction dominator in this BB, rather than break and go back to optimize the first BB (the old way), we directly iterate instructions (to do optimizeInst) in this updated BB again (the new way). What this patch do for small/normal (not huge) function ? It is same with the Old Algorithm. (NFC) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D129352	2022-09-07 10:05:40 +08:00
Arthur Eubanks	19d4f5e649	[test] Add missing REQUIRES: arm-registered-target	2022-07-20 10:59:07 -07:00

1 2 3 4 5 ...

405 Commits