llvm-project

Author	SHA1	Message	Date
Craig Topper	2a21260ea8	[SelectionDAG] Use getVectorElementPointer in DAGCombiner::replaceStoreOfInsertLoad. (#74249) This ensures we clip the index to be in bounds of the vector we are inserting into. If the index is out of bounds the result of the insert element is poison. If we don't clip the index we can write memory that was not part of the original store. Fixes #74248 #75557.	2023-12-14 20:25:16 -08:00
paperchalice	9bd32d78a9	[CodeGen] Update DwarfEHPreparePass references in `CodeGenPassBuilder.h` (#74068) Forgot to update the counterpart in `CodeGenPassBuilder.h`. Also rename `dwarfehprepare` -> `dwarf-eh-prepare`.	2023-12-11 09:26:01 +08:00
Simon Pilgrim	22df0886a1	[DAG] Don't split f64 constant stores if the fp imm is legal (#74622) If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores. Another cleanup step for #74304	2023-12-07 10:33:03 +00:00
Vitaly Buka	7e3aeee3bf	[NFC][asan] Replace AsanInited/ENSURE_ASAN_INITED with TryAsanInitFromRtl (#74172)	2023-12-04 14:56:21 -08:00
Craig Topper	5bc391a7c9	[SelectionDAG] Use getVectorElementPointer in DAGCombiner::replaceStoreOfInsertLoad. (#74249) This ensures we clip the index to be in bounds of the vector we are inserting into. If the index is out of bounds the result of the insert element is poison. If we don't clip the index we can write memory that was not part of the original store. Fixes #74248.	2023-12-04 11:11:37 -08:00
Craig Topper	755c28a940	[GISel][Mips] Infer alignment when creating memory operand for G_VASTART. (#74004)	2023-11-30 19:55:23 -08:00
Igor Kirillov	63917e1975	[MachineLICM] Allow hoisting loads from invariant address (#70796) Sometimes, loads can appear in a loop after the LICM pass is executed the final time. For example, the ExpandMemCmp pass creates loads in a loop, and one of the operands may be an invariant address. This patch extends the pre-regalloc stage of MachineLICM to hoist invariant loads from loops that don't have any stores or calls, and allows load reordering.	2023-11-16 11:12:10 +00:00
Craig Topper	8d24d3900e	[Mips] In LowerShift*Parts, xor with bits-1 instead of -1. (#71149) If we start with an i128 shift, the initial shift amount would usually have zeros in bit 8 and above. xoring the shift amount with -1 will set those upper bits to 1. If DAGCombiner is able to prove those bits are now 1, then the shift that uses the xor will be replaced with undef, which we don't want. Reduce the xor constant to VT.bits-1 where VT is half the size of the larger shift type. This avoids toggling the upper bits. The hardware shift instruction only uses the lower bits of the shift amount. I assume the code used NOT because the hardware doesn't use the upper bits, but that isn't compatible with the LLVM poison semantics. Fixes #71142.	2023-11-03 10:08:00 -07:00
Tobias Stadler	373c343a77	Reland: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND Reland 3686a0b after fixing an exposed miscompile in #68840 Differential Revision: https://reviews.llvm.org/D159140	2023-11-02 00:18:19 +01:00
Craig Topper	7fde4ffbd3	[Mips][GISel] Fix a couple issues with passing f64 in 32-bit GPRs. (#69131) MipsIncomingValueHandler::assignCustomValue should return 1 instead of 2. The return value is the number of additional ArgLocs being consumed. It's assumed that at least 1 is consumed. Correct the LocVT used for the spill when there are no registers left. It should be f64 instead of i32. This allows a workaround to be removed in the SelectionDAG path.	2023-10-25 11:28:22 -07:00
Matthias Braun	e3cf80c5c1	BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64-bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that: * Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiple frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room. * Spread the difference between hottest/coldest block as much as possible to increase precision. * If the hot/cold spread cannot be represented lose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.	2023-10-24 20:27:39 -07:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Tobias Stadler	305fbc1b32	Revert "[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND" This reverts commit 3686a0b611c65f0d7190345b8e3e73cdca9fa657. This seems to have broken some sanitizer tests: https://lab.llvm.org/buildbot/#/builders/184/builds/7721	2023-09-29 03:35:40 +02:00
Tobias Stadler	3686a0b611	[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND The legalizer currently generates lots of G_AND artifacts. For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary. Currently these artifacts have to be removed using post-legalize combines. Omitting these artifacts at their source in the artifact combiner has a few advantages: - We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it. - The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register). - This cleans up a lot of legalizer output and even improves compilation-times. AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count). Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth. This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D159140	2023-09-29 02:11:57 +02:00
Mirko Brkusanin	72e3713009	[IRTranslator] Set NUW flag for inbounds gep and load/store offsets Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D159515	2023-09-22 16:16:28 +02:00
Yingwei Zheng	b423e1f05d	[SDAG][RISCV] Avoid neg instructions when lowering atomic_load_sub with a constant rhs This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs. Comparison with GCC: https://godbolt.org/z/c5zPdP7j4 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158673	2023-09-16 17:09:41 +08:00
Fangrui Song	cfc1a87878	[test] Change llc -march= to -mtriple= & llvm-mc -arch= to -triple= Similar to 806761a7629df268c8aed49657aeccffa6bca449	2023-09-11 15:11:01 -07:00
Fangrui Song	806761a762	[test] Change llc -march= to -mtriple= The issue is uncovered by #47698: for IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. riscv64-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outright.	2023-09-11 14:42:37 -07:00
Mikael Holmen	d1e685df45	[test] Add -verify-coalescing to testcase and fix problems Apparently the testcase coalesce-partial-redundant-reguse-terminator.mir was broken in a way that -verify-coalescing detected. Update the testcase so -verify-coalescing doesn't complain and so that it still exposes the problem originally fixed in 6c062b7641623. Differential Revision: https://reviews.llvm.org/D158397	2023-08-22 07:20:53 +02:00
Nikita Popov	69bd66b3ce	[Tests] Remove some and/or constant expressions in tests (NFC) In preparation for their removal in D158081.	2023-08-21 12:05:32 +02:00
Craig Topper	c6dee6982f	[GlobalISel][Mips] Sync G_UADDE and G_USUBE legalization with LegalizeDAG. This modifies the G_UADDE legalization to a version that looks shorter on Mips and RISC-V when feeding the equivalent IR to SelectionDAG. This also removes the boolean select from G_USUBE. Comments taken from LegalizeDAG and tweaked. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D158232	2023-08-17 20:36:55 -07:00
Craig Topper	ebb2e5ebb2	[GlobalISel][Mips] Correct corner case in G_UADDE legalization. If carryin was 1, and RHS is 0xffffffff we were not giving a carry out. In that case Res would be equal to LHS, so Res <u LHS would be false. But there should be a carry out since carryin+RHS wraps around to 0. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157943	2023-08-17 15:06:16 -07:00
Fangrui Song	4c89277095	[Mips][MC] AttemptToFoldSymbolOffsetDifference: revert isMicroMips special case D52985/D57677 added a .gcc_except_table workaround, but the new behavior doesn't match GNU assembler. ``` void foo(); int bar() { foo(); try { throw 1; } catch (int) { return 1; } return 0; } clang --target=mipsel-linux-gnu -mmicromips -S a.cc mipsel-linux-gnu-gcc -mmicromips -c a.s -o gnu.o .uleb128 ($cst_end0)-($cst_begin0) // bit 0 is not forced to 1 .uleb128 ($func_begin0)-($func_begin0) // bit 0 is not forced to 1 ``` I have inspected `.gcc_except_table` output by `mipsel-linux-gnu-gcc -mmicromips -c a.cc`. The `.uleb128` values are not forced to set the least significant bit. In addition, D57677's adjustment (even->odd) to CodeGen/Mips/micromips-b-range.ll is wrong. PC-relative `.long func - .` values will differ from GNU assembler as well. The original intention of D52985 seems unclear to me. I think whatever goal it wants to achieve should be moved to an upper layer. This isMicroMips special case has caused problems to fix MCAssembler::relaxLEB to use evaluateAsAbsolute instead of evaluateKnownAbsolute, which is needed to properly support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128. Differential Revision: https://reviews.llvm.org/D157655	2023-08-16 23:11:59 -07:00
pvanhout	c3cfbbc416	[GlobalISel] Add dead flags to implicit defs in ISel Checks for implicit defs that are unused within a pattern and mark them as dead. This is done directly at the TableGen level for efficiency. The instructions are directly created with the "dead" operand and no further analysis is needed later. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157273	2023-08-09 14:20:51 +02:00
Matt Arsenault	4d42e8b5d1	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.	2023-07-31 20:15:45 -04:00
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Nikita Popov	e49103b279	[Mips] Fix argument lowering for illegal vector types (PR63608) The Mips MSA ABI requires that legal vector types are passed in scalar registers in packed representation. E.g. a type like v16i8 would be passed as two i64 registers. The implementation attempts to do the same for illegal vectors with non-power-of-two element counts or non-power-of-two element types. However, the SDAG argument lowering code doesn't support this, and it is not easy to extend it to support this (we would have to deal with situations like passing v7i18 as two i64 values). This patch instead opts to restrict the special argument lowering to only vectors with power-of-two elements and round element types. Everything else is lowered naively, that is by passing each element in promoted registers. Fixes https://github.com/llvm/llvm-project/issues/63608. Differential Revision: https://reviews.llvm.org/D154445	2023-07-24 12:07:09 +02:00
Simon Pilgrim	3ad4f92f83	[DAG] More aggressively (extract_vector_elt (build_vector x, y), c) iff element is zero constant We currently don't extract vector elements from multi-use build vectors unless TLI.aggressivelyPreferBuildVectorSources accepts them, which seems a little extreme for constant build vectors (especially as under some cases ComputeKnownBits will indirectly extract the data for us). This is causing a few regressions in some upcoming SimplifyDemandedBits work I'm looking at, all of which just need to know that the element is zero, so I've tweaked the fold to accept zero elements as well, which will typically fold very easily. Differential Revision: https://reviews.llvm.org/D155582	2023-07-18 17:31:34 +01:00
Brad Smith	7973d51965	[Mips] Set setMaxAtomicSizeInBitsSupported Set setMaxAtomicSizeInBitsSupported for Mips. Set the value as appropriate for 64-bit MIPS vs 32-bit. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D141189	2023-07-15 17:29:25 -04:00
Yashwant Singh	b7836d8562	[CodeGen]Allow targets to use target specific COPY instructions for live range splitting Replacing D143754. Right now the LiveRangeSplitting during register allocation uses TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a problem as we have both vector and scalar copies. Vector copies perform a copy over a vector register but only on the lanes (threads) that are active. This is mostly sufficient however we do run into cases when we have to copy the entire vector register and not just active lane data. One major place where we need that is live range splitting. Allowing targets to use their own copy instructions (if defined) will provide a lot of flexibility and ease to lower these pseudo instructions to correct MIR. - Introduce getTargetCopyOpcode() virtual function and use it to generate copy in Live range splitting. - Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline. Reviewed By: arsenm, cdevadas, kparzysz Differential Revision: https://reviews.llvm.org/D150388	2023-07-07 22:29:50 +05:30
Luke Lau	742fb8b5c7	[DAGCombine] Fold (store (insert_elt (load p)) x p) -> (store x) If we have a store of a load with no other uses in between it, it's considered dead and is removed. So sometimes when legalizing a fixed length vector store of an insert, we end up producing better code through scalarization than without. An example follows: %a = load <4 x i64>, ptr %x %b = insertelement <4 x i64> %a, i64 %y, i32 2 store <4 x i64> %b, ptr %x If this is scalarized, then DAGCombine successfully removes 3 of the 4 stores which are considered dead, and on RISC-V we get: sd a1, 16(a0) However if we make the vector type legal (-mattr=+v), then we lose the optimisation because we don't scalarize it. This patch attempts to recover the optimisation for vectors by identifying patterns where we store a load with a single insert in between, replacing it with a scalar store of the inserted element. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D152276	2023-06-28 22:45:04 +01:00
Matt Arsenault	80e2c26dfd	RegisterCoalescer: Fix name of pass I finally snapped and fixed this inconsistency.	2023-06-21 10:30:43 -04:00
Fangrui Song	49b61ead47	[XRay][test] Make tests less sensitive to .Ltmp/Ltmp label changes	2023-06-18 13:32:40 -07:00
Amaury Séchet	e879fded2a	[NFC] Autogenerate several Mips test.	2023-06-14 22:27:15 +00:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
AdityaK	805f51f9fe	Remove Android-mips related tests Split from: https://reviews.llvm.org/D146565, already reviewed there.	2023-03-23 14:06:50 -07:00
Chen Zheng	4f0ed16a46	Reland rGf35a09daebd0a90daa536432e62a2476f708150d and rG63854f91d3ee1056796a5ef27753648396cac6ec [DAGCombiner] handle more store value forwarding When lowering calls on target like PPC, some stack loads will be generated for by value parameters. Node CALLSEQ_START prevents such loads from being combined. Suggested by @RolandF, this patch removes the unnecessary loads for the byval parameter by extending ForwardStoreValueToDirectLoad Reviewed By: nemanjai, RolandF Differential Revision: https://reviews.llvm.org/D138899	2023-03-12 21:59:18 -04:00
Nikita Popov	ddccc5ba44	[CodeGen] Always expand division larger than i128 Default MaxDivRemBitWidthSupported to 128, so that divisions larger than 128 bits are always expanded, without requiring additional configuration from the target. Note that this may still emit calls to __udivti3 on 32-bit targets, which likely don't have an implementation of that builtin. However, I believe this is sufficient to fix https://github.com/llvm/llvm-project/issues/60531, because Zig must already be defining those builtins. Differential Revision: https://reviews.llvm.org/D144871	2023-03-01 15:33:45 +01:00
Arthur Eubanks	7c6b46e87e	Revert "[DAGCombiner] handle more store value forwarding" This reverts commit f35a09daebd0a90daa536432e62a2476f708150d. Causes miscompiles, see D138899	2023-02-13 19:07:28 -08:00
Andrew Savonichev	c65b4d64d4	[SelectionDAG] Do not second-guess alignment for alloca Alignment of an alloca in IR can be lower than the preferred alignment on purpose, but this override essentially treats the preferred alignment as the minimum alignment. The patch changes this behavior to always use the specified alignment. If alignment is not set explicitly in LLVM IR, it is set to DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign. Tests are changed as well: explicit alignment is increased to match the preferred alignment if it changes output, or omitted when it is hard to determine the right value (e.g. for pointers, some structs, or weird types). Differential Revision: https://reviews.llvm.org/D135462	2023-02-09 18:45:20 +03:00
YunQiang Su	b4b95dee31	MIPS: fix build from IR files, nan2008 and FpAbi When we use llc or lld to compile IR files, the features +nan2008 and +fpxx/+fp64 are not used. Thus wrong format files are produced. In IR files, the attributes are only set for functions, not for the whole compile unit. So we extract the attributes from the first function and use them for the whole unit. isFPXXDefault: for o32, the FPXX should always be the default, regardless of the vendor. Of course some distributions with FP64 default enabled should be listed explicitly. Let's add them in future if we know about one. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D140270	2023-02-06 20:36:11 -08:00
Chen Zheng	f35a09daeb	[DAGCombiner] handle more store value forwarding When lowering calls on target like PPC, some stack loads will be generated for by value parameters. Node CALLSEQ_START prevents such loads from being combined. Suggested by @RolandF, this patch removes the unnecessary loads for the byval parameter by extending ForwardStoreValueToDirectLoad Reviewed By: nemanjai, RolandF Differential Revision: https://reviews.llvm.org/D138899	2023-02-01 21:06:17 -05:00
Roman Lebedev	cc39c3b17f	[Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts: shift through stack https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas that have variably-indexed loads. That does bring up questions of cost model, since that requires creating wide shifts. Indeed, our legalization for them is not optimal. We either split it into parts, or lower it into a libcall. But if the shift amount is by a multiple of CHAR_BIT, we can also legalize it through the stack. The basic idea is very simple: 1. Get a stack slot 2x the width of the shift type 2. store the value we are shifting into one half of the slot 3. pad the other half of the slot. for logical shifts, with zero, for arithmetic shift with signbit 4. index into the slot (starting from the base half into which we spilled, either upwards or downwards) 5. load 6. split loaded integer This works for both little-endian and big-endian machines: https://alive2.llvm.org/ce/z/YNVwd5 And better yet, if the original shift amount was not a multiple of CHAR_BIT, we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K I think, if we are going to perform shift->shift-by-parts expansion more than once, we should instead go through the stack, which is what this patch does. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D140638	2023-01-14 19:12:18 +03:00
Nikita Popov	8c45d1ad06	[Mips] Convert some tests to opaque pointers (NFC) I'm not sure why, but the absence of bitcasts / no-op GEPs causes the branch delay slot to be used. Differential Revision: https://reviews.llvm.org/D141593	2023-01-13 10:34:27 +01:00
Nikita Popov	840ecce6be	[Mips] Convert some tests to opaque pointers (NFC) Dropped bitcasts result in dropped COPYs in MIR.	2023-01-12 12:40:04 +01:00
Nikita Popov	4f40923103	[Mips] Regenerate test checks (NFC)	2023-01-12 12:24:20 +01:00
Nikita Popov	60442f0d44	[CodeGen] Convert some tests to opaque pointers (NFC) These are mostly MIR tests, which I did not handle during previous conversions.	2023-01-05 13:21:20 +01:00
Fangrui Song	c348abce68	Revert D138179 "MIPS: fix build from IR files, nan2008 and FpAbi" This reverts commit 9739bb81aed490bfcbcbbac6970da8fb7232fd34. It causes `.module is not permitted after generating code` for Linux kernel's `ARCH=mips 32r1_defconfig` clang+GNU as build. It's confirmed as a defect, but the proper fix needs time to sort out.	2022-12-22 11:48:55 -08:00

1736 Commits