1736 Commits

Author SHA1 Message Date
Craig Topper
2a21260ea8 [SelectionDAG] Use getVectorElementPointer in DAGCombiner::replaceStoreOfInsertLoad. (#74249)
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds the results of the insert
element is poison. If we don't clip the index we can write memory that
was not part of the original store.

Fixes #74248 #75557.
2023-12-14 20:25:16 -08:00
paperchalice
9bd32d78a9
[CodeGen] Update DwarfEHPreparePass references in CodeGenPassBuilder.h (#74068)
Forgot to update the counterpart in `CodeGenPassBuilder.h`. Also Rename `dwarfehprepare` -> `dwarf-eh-prepare`.
2023-12-11 09:26:01 +08:00
Simon Pilgrim
22df0886a1
[DAG] Don't split f64 constant stores if the fp imm is legal (#74622)
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores

Another cleanup step for #74304
2023-12-07 10:33:03 +00:00
Vitaly Buka
7e3aeee3bf
[NFC][asan] Replace AsanInited/ENSURE_ASAN_INITED with TryAsanInitFromRtl (#74172) 2023-12-04 14:56:21 -08:00
Craig Topper
5bc391a7c9
[SelectionDAG] Use getVectorElementPointer in DAGCombiner::replaceStoreOfInsertLoad. (#74249)
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds the results of the insert
element is poison. If we don't clip the index we can write memory that
was not part of the original store.

Fixes #74248.
2023-12-04 11:11:37 -08:00
Craig Topper
755c28a940
[GISel][Mips] Infer alignment when creating memory operand for G_VASTART. (#74004) 2023-11-30 19:55:23 -08:00
Igor Kirillov
63917e1975
[MachineLICM] Allow hoisting loads from invariant address (#70796)
Sometimes, loads can appear in a loop after the LICM pass is executed
the final time. For example, ExpandMemCmp pass creates loads in a loop,
and one of the operands may be an invariant address.
This patch extends the pre-regalloc stage MachineLICM by allowing to
hoist invariant loads from loops that don't have any stores or calls
and allows load reorderings.
2023-11-16 11:12:10 +00:00
Craig Topper
8d24d3900e
[Mips] In LowerShift*Parts, xor with bits-1 instead of -1. (#71149)
If we start with an i128 shift, the initial shift amount would usually
have zeros in bit 8 and above. xoring the shift amount with -1 will set
those upper bits to 1. If DAGCombiner is able to prove those bits are
now 1, then the shift that uses the xor will be replaced with undef.
Which we don't want.

Reduce the xor constant to VT.bits-1 where VT is half the size of the
larger shift type. This avoids toggling the upper bits. The hardware
shift instruction only uses the lower bits of the shift amount. I assume
the code used NOT because the hardware doesn't use the upper bits, but
that isn't compatible with the LLVM poison semantics.

Fixes #71142.
2023-11-03 10:08:00 -07:00
Tobias Stadler
373c343a77 Reland: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND
Reland 3686a0b after fixing an exposed miscompile in #68840

Differential Revision: https://reviews.llvm.org/D159140
2023-11-02 00:18:19 +01:00
Craig Topper
7fde4ffbd3
[Mips][GISel] Fix a couple issues with passing f64 in 32-bit GPRs. (#69131)
MipsIncomingValueHandler::assignCustomValue should return 1 instead of
2. The return value is the number of additional ArgLocs being consumed.
It's assumed that at least 1 is consumed.

Correct the LocVT used for the spill when there are no registers left.
It should be f64 instead of i32. This allows a workaround to be removed
in the SelectionDAG path.
2023-10-25 11:28:22 -07:00
Matthias Braun
e3cf80c5c1
BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:

* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
2023-10-24 20:27:39 -07:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Tobias Stadler
305fbc1b32 Revert "[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND"
This reverts commit 3686a0b611c65f0d7190345b8e3e73cdca9fa657.
This seems to have broken some sanitizer tests:
https://lab.llvm.org/buildbot/#/builders/184/builds/7721
2023-09-29 03:35:40 +02:00
Tobias Stadler
3686a0b611 [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND
The legalizer currently generates lots of G_AND artifacts.
For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary.
Currently these artifacts have to be removed using post-legalize combines.
Omitting these artifacts at their source in the artifact combiner has a few advantages:
- We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it.
- The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register).
- This cleans up a lot of legalizer output and even improves compilation-times.
AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count).

Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth.
This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D159140
2023-09-29 02:11:57 +02:00
Mirko Brkusanin
72e3713009 [IRTranslator] Set NUW flag for inbounds gep and load/store offsets
Patch by: Acim Maravic

Differential Revision: https://reviews.llvm.org/D159515
2023-09-22 16:16:28 +02:00
Yingwei Zheng
b423e1f05d
[SDAG][RISCV] Avoid neg instructions when lowering atomic_load_sub with a constant rhs
This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs.
Comparison with GCC: https://godbolt.org/z/c5zPdP7j4

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158673
2023-09-16 17:09:41 +08:00
Fangrui Song
cfc1a87878 [test] Change llc -march= to -mtriple= & llvm-mc -arch= to -triple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449
2023-09-11 15:11:01 -07:00
Fangrui Song
806761a762 [test] Change llc -march= to -mtriple=
The issue is uncovered by #47698: for IR files without a target triple,
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense, e.g. riscv64-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead
of rejecting it outrightly.
2023-09-11 14:42:37 -07:00
Mikael Holmen
d1e685df45 [test] Add -verify-coalescing to testcase and fix problems
Apparently the testcase
 coalesce-partial-redundant-reguse-terminator.mir
was broken in a way that -verify-coalescing detected.

Update the testcase so -verify-coalescing doesn't complain and so
that it still exposes the problem originally fixed in 6c062b7641623.

Differential Revision: https://reviews.llvm.org/D158397
2023-08-22 07:20:53 +02:00
Nikita Popov
69bd66b3ce [Tests] Remove some and/or constant expressions in tests (NFC)
In preparation for their removal in D158081.
2023-08-21 12:05:32 +02:00
Craig Topper
c6dee6982f [GlobalISel][Mips] Sync G_UADDE and G_USUBE legalization with LegalizeDAG.
This modifies the G_UADDE legalizaton to a version that looks shorter
on Mips and RISC-V when feeding the equivalent IR to SelectionDAG.
This also removes the boolean select from G_USUBE.

Comments taken from LegalizeDAG and tweaked.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158232
2023-08-17 20:36:55 -07:00
Craig Topper
ebb2e5ebb2 [GlobalISel][Mips] Correct corner case in G_UADDE legalization.
If carryin was 1, and RHS is 0xffffffff we were not giving a carry
out.

In that case Res would be equal to LHS, so Res <u LHS would be false.
But there should be a carry out since carryin+RHS wraps around to 0.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157943
2023-08-17 15:06:16 -07:00
Fangrui Song
4c89277095 [Mips][MC] AttemptToFoldSymbolOffsetDifference: revert isMicroMips special case
D52985/D57677 added a .gcc_except_table workaround, but the new behavior
doesn't match GNU assembler.
```
void foo();
int bar() {
  foo();
  try { throw 1; }
  catch (int) { return 1; }
  return 0;
}

clang --target=mipsel-linux-gnu -mmicromips -S a.cc
mipsel-linux-gnu-gcc -mmicromips -c a.s -o gnu.o

.uleb128 ($cst_end0)-($cst_begin0)     // bit 0 is not forced to 1
.uleb128 ($func_begin0)-($func_begin0) // bit 0 is not forced to 1
```

I have inspected `.gcc_except_table` output by `mipsel-linux-gnu-gcc -mmicromips -c a.cc`.
The `.uleb128` values are not forced to set the least significant bit.

In addition, D57677's adjustment (even->odd) to CodeGen/Mips/micromips-b-range.ll is wrong.
PC-relative `.long func - .` values will differ from GNU assembler as well.

The original intention of D52985 seems unclear to me. I think whatever
goal it wants to achieve should be moved to an upper layer.

This isMicroMips special case has caused problems to fix MCAssembler::relaxLEB to use evaluateAsAbsolute instead of evaluateKnownAbsolute,
which is needed to proper support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128.

Differential Revision: https://reviews.llvm.org/D157655
2023-08-16 23:11:59 -07:00
pvanhout
c3cfbbc416 [GlobalISel] Add dead flags to implicit defs in ISel
Checks for implicit defs that are unused within a pattern and mark them as dead.

This is done directly at the TableGen level forr efficiency.
The instructions are directly created with the "dead" operand and no further analysis is needed later.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157273
2023-08-09 14:20:51 +02:00
Matt Arsenault
4d42e8b5d1 Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
2023-07-31 20:15:45 -04:00
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Nikita Popov
e49103b279 [Mips] Fix argument lowering for illegal vector types (PR63608)
The Mips MSA ABI requires that legal vector types are passed in
scalar registers in packed representation. E.g. a type like v16i8
would be passed as two i64 registers.

The implementation attempts to do the same for illegal vectors with
non-power-of-two element counts or non-power-of-two element types.
However, the SDAG argument lowering code doesn't support this, and
it is not easy to extend it to support this (we would have to deal
with situations like passing v7i18 as two i64 values).

This patch instead opts to restrict the special argument lowering
to only vectors with power-of-two elements and round element types.
Everything else is lowered naively, that is by passing each element
in promoted registers.

Fixes https://github.com/llvm/llvm-project/issues/63608.

Differential Revision: https://reviews.llvm.org/D154445
2023-07-24 12:07:09 +02:00
Simon Pilgrim
3ad4f92f83 [DAG] More aggressively (extract_vector_elt (build_vector x, y), c) iff element is zero constant
We currently don't extract vector elements from multi-use build vectors unless TLI.aggressivelyPreferBuildVectorSources accepts them, which seems a little extreme for constant build vectors (especially as under some cases ComputeKnownBits will indirectly extract the data for us).

This is causing a few regressions in some upcoming SimplifyDemandedBits work I'm looking at, all of which just need to know that the element is zero, so I've tweaked the fold to accept zero elements as well, which will typically fold very easily.

Differential Revision: https://reviews.llvm.org/D155582
2023-07-18 17:31:34 +01:00
Brad Smith
7973d51965 [Mips] Set setMaxAtomicSizeInBitsSupported
Set setMaxAtomicSizeInBitsSupported for Mips. Set the value as appropriate for 64-bit MIPS vs 32-bit.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D141189
2023-07-15 17:29:25 -04:00
Yashwant Singh
b7836d8562 [CodeGen]Allow targets to use target specific COPY instructions for live range splitting
Replacing D143754. Right now the LiveRangeSplitting during register allocation uses
TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a
problem as we have both vector and scalar copies. Vector copies perform a copy over
a vector register but only on the lanes(threads) that are active. This is mostly sufficient
however we do run into cases when we have to copy the entire vector register and
not just active lane data. One major place where we need that is live range splitting.

Allowing targets to use their own copy instructions(if defined) will provide a lot of
flexibility and ease to lower these pseudo instructions to correct MIR.

- Introduce getTargetCopyOpcode() virtual function and use if to generate copy in Live range
 splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline.

Reviewed By: arsenm, cdevadas, kparzysz

Differential Revision: https://reviews.llvm.org/D150388
2023-07-07 22:29:50 +05:30
Luke Lau
742fb8b5c7 [DAGCombine] Fold (store (insert_elt (load p)) x p) -> (store x)
If we have a store of a load with no other uses in between it, it's
considered dead and is removed. So sometimes when legalizing a fixed
length vector store of an insert, we end up producing better code
through scalarization than without.
An example is the follow below:

  %a = load <4 x i64>, ptr %x
  %b = insertelement <4 x i64> %a, i64 %y, i32 2
  store <4 x i64> %b, ptr %x

If this is scalarized, then DAGCombine successfully removes 3 of the 4
stores which are considered dead, and on RISC-V we get:

  sd a1, 16(a0)

However if we make the vector type legal (-mattr=+v), then we lose the
optimisation because we don't scalarize it.

This patch attempts to recover the optimisation for vectors by
identifying patterns where we store a load with a single insert
inbetween, replacing it with a scalar store of the inserted element.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152276
2023-06-28 22:45:04 +01:00
Matt Arsenault
80e2c26dfd RegisterCoalescer: Fix name of pass
I finally snapped and fixed this inconsistency.
2023-06-21 10:30:43 -04:00
Fangrui Song
49b61ead47 [XRay][test] Make tests less sensitive to .Ltmp/Ltmp label changes 2023-06-18 13:32:40 -07:00
Amaury Séchet
e879fded2a [NFC] Autogenerate several Mips test. 2023-06-14 22:27:15 +00:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
AdityaK
805f51f9fe Remove Android-mips related tests
Split from: https://reviews.llvm.org/D146565, already reviewed there.
2023-03-23 14:06:50 -07:00
Chen Zheng
4f0ed16a46 Reland rGf35a09daebd0a90daa536432e62a2476f708150d and rG63854f91d3ee1056796a5ef27753648396cac6ec
[DAGCombiner] handle more store value forwarding

When lowering calls on target like PPC, some stack loads
will be generated for by value parameters. Node CALLSEQ_START
prevents such loads from being combined.

Suggested by @RolandF, this patch removes the unnecessary
loads for the byval parameter by extending ForwardStoreValueToDirectLoad

Reviewed By: nemanjai, RolandF

Differential Revision: https://reviews.llvm.org/D138899
2023-03-12 21:59:18 -04:00
Nikita Popov
ddccc5ba44 [CodeGen] Always expand division larger than i128
Default MaxDivRemBitWidthSupported to 128, so that divisions larger
than 128 bits are always expanded, without requiring additional
configuration from the target.

Note that this may still emit calls to __udivti3 on 32-bit targets,
which likely don't have an implementation of that builtin. However,
I believe this is sufficient to fix
https://github.com/llvm/llvm-project/issues/60531, because Zig must
already be defining those builtins.

Differential Revision: https://reviews.llvm.org/D144871
2023-03-01 15:33:45 +01:00
Arthur Eubanks
7c6b46e87e Revert "[DAGCombiner] handle more store value forwarding"
This reverts commit f35a09daebd0a90daa536432e62a2476f708150d.

Causes miscompiles, see D138899
2023-02-13 19:07:28 -08:00
Andrew Savonichev
c65b4d64d4 [SelectionDAG] Do not second-guess alignment for alloca
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.

The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.

Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).

Differential Revision: https://reviews.llvm.org/D135462
2023-02-09 18:45:20 +03:00
YunQiang Su
b4b95dee31 MIPS: fix build from IR files, nan2008 and FpAbi
When we use llc or lld to compiler IR files, the features +nan2008 and +fpxx/+fp64 are not used.
Thus wrong format files are produced.

In IR files, the attributes are only set for function while not the whole compile units.
So we extract the attributes from the first function and use it for the whole unit.

isFPXXDefault: for o32, the FPXX should always be the default, no matter about the vendors.
Of course some distributions with FP64 default enabled should be listed explicit.
Let's add them in future if we know about one.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140270
2023-02-06 20:36:11 -08:00
Chen Zheng
f35a09daeb [DAGCombiner] handle more store value forwarding
When lowering calls on target like PPC, some stack loads
will be generated for by value parameters. Node CALLSEQ_START
prevents such loads from being combined.

Suggested by @RolandF, this patch removes the unnecessary
loads for the byval parameter by extending ForwardStoreValueToDirectLoad

Reviewed By: nemanjai, RolandF

Differential Revision: https://reviews.llvm.org/D138899
2023-02-01 21:06:17 -05:00
Roman Lebedev
cc39c3b17f
[Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts: shift through stack
https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas
that have variably-indexed loads. That does bring up questions of cost model,
since that requires creating wide shifts.

Indeed, our legalization for them is not optimal.
We either split it into parts, or lower it into a libcall.
But if the shift amount is by a multiple of CHAR_BIT,
we can also legalize it throught stack.

The basic idea is very simple:
1. Get a stack slot 2x the width of the shift type
2. store the value we are shifting into one half of the slot
3. pad the other half of the slot. for logical shifts, with zero, for arithmetic shift with signbit
4. index into the slot (starting from the base half into which we spilled, either upwards or downwards)
5. load
6. split loaded integer

This works for both little-endian and big-endian machines:
https://alive2.llvm.org/ce/z/YNVwd5

And better yet, if the original shift amount was not a multiple of CHAR_BIT,
we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K

I think, if we are going perform shift->shift-by-parts expansion more than once,
we should instead go through stack, which is what this patch does.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D140638
2023-01-14 19:12:18 +03:00
Nikita Popov
8c45d1ad06 [Mips] Convert some tests to opaque pointers (NFC)
I'm not sure why, but the absence of bitcasts / no-op GEPs causes
the branch delay slot to be used.

Differential Revision: https://reviews.llvm.org/D141593
2023-01-13 10:34:27 +01:00
Nikita Popov
840ecce6be [Mips] Convert some tests to opaque pointers (NFC)
Dropped bitcasts result in dropped COPYs in MIR.
2023-01-12 12:40:04 +01:00
Nikita Popov
4f40923103 [Mips] Regenerate test checks (NFC) 2023-01-12 12:24:20 +01:00
Nikita Popov
60442f0d44 [CodeGen] Convert some tests to opaque pointers (NFC)
These are mostly MIR tests, which I did not handle during previous
conversions.
2023-01-05 13:21:20 +01:00
Fangrui Song
c348abce68 Revert D138179 "MIPS: fix build from IR files, nan2008 and FpAbi"
This reverts commit 9739bb81aed490bfcbcbbac6970da8fb7232fd34.
It causes `.module is not permitted after generating code`
for Linux kernel's `ARCH=mips 32r1_defconfig` clang+GNU as build.
It's confirmed as a defect, but the proper fix needs time to sort out.
2022-12-22 11:48:55 -08:00