6194 Commits

Author SHA1 Message Date
Tim Northover
6b98824a58 AArch64: emit fcmp ord %a, zeroinitializer as a single fcmeq.
Most "ord" checks need two real-world compares to implement, but this is the
canonical form of a "!isnan" check, which is equivalent to comparing the input
for equality against itself.
2022-12-07 19:17:30 +00:00
Amara Emerson
53445f5b1c [GlobalISel] Add a new G_INVOKE_REGION_START instruction to fix an EH bug.
We currently have a bug where the legalizer, when dealing with phi operands,
may create instructions in the phi's incoming blocks at points which are effectively
dead due to a possible exception throw.

Say we have:

throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  B returnbb

bb:
  %v = phi i1 %true, throwbb, %false....

When legalizing we may need to widen the i1 %true value, and to do that we need
to create new extension instructions in the incoming block. Our insertion point
currently is the MBB::getFirstTerminator() which puts the IP before the unconditional
branch terminator in throwbb. These extensions may never be executed if the call
throws, and therefore we need to emit them before the call (but not too early, since
our new instruction may need values defined within throwbb as well).

throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  %true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws!
  B returnbb

bb:
  %v = phi i32 %true, throwbb, %false....

To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START
is a terminator, which tries to model the fact that in the IR, the original invoke inst
is actually a terminator as well. By using that as the new insertion point, we
make sure to place new instructions on always executing paths.

Unfortunately we still need to make the legalizer use a new insertion point API
that I've added, since the existing `getFirstTerminator()` method does a reverse
walk up the block, and any non-terminator instructions cause it to bail out. To
avoid impacting compile time for all `getFirstTerminator()` uses, I've added a new
method that does a forward walk instead.

Differential Revision: https://reviews.llvm.org/D137905
2022-12-07 10:28:51 -08:00
Anton Sidorenko
f8ed709345 [MachineCombiner] Extend reassociation logic to handle inverse instructions
Machine combiner supports generic reassociation only of associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A), where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.

Differential Revision: https://reviews.llvm.org/D136754
2022-12-07 13:50:28 +03:00
David Sherwood
bfb6f47e9e [SVE] Change some bfloat lane intrinsics to use i32 immediates
Almost all of the other SVE LLVM IR intrinsics take i32 values
for lane indices or other immediates. We should bring the bfloat
intrinsics in line with that. It will also make it easier to
add support for the SVE2.1 float intrinsics in future, since
they reuse the same underlying instruction classes.

I've maintained backwards compatibility with the old i64 variants
and used the autoupgrade mechanism.

Differential Revision: https://reviews.llvm.org/D138788
2022-12-07 09:19:54 +00:00
Sanjay Patel
adc7c589c3 [SDAG] try to convert bit set/clear to signbit test when trunc is free
(X & Pow2MaskC) == 0 --> (trunc X) >= 0
(X & Pow2MaskC) != 0 --> (trunc X) <  0

This was noted as a regression in the post-commit feedback for D112634
(where we canonicalized IR differently).

For x86, this saves a few instruction bytes. AArch64 seems neutral.

Differential Revision: https://reviews.llvm.org/D139363
2022-12-06 11:34:48 -05:00
Sanjay Patel
772c2f461b [AArch64][RISCV][x86] add tests for masked val equality with 0; NFC 2022-12-06 11:34:48 -05:00
Sander de Smalen
5922a04dbd [AArch64][SVE2p1] Make use of REVD instruction.
Reversing double-words within a quard-word is possible using the REVD instruction
when SVE2p1 is enabled.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D139119
2022-12-06 15:42:32 +00:00
chenglin.bi
5cd900ce3c [AArch64] Transform shift+and to shift+shift to select more shifted register
and (shl/srl/sra, x, c), mask --> shl (srl/sra, x, c1), c2

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D138904
2022-12-06 23:29:56 +08:00
bipmis
bda1f0b96c Add tests which can be matched to umull 2022-12-06 12:49:05 +00:00
Archibald Elliott
83b3304dd2 [AArch64] Implement __arm_rsr128/__arm_wsr128
This only contains the SelectionDAG implementation. GlobalISel to
follow.

The broad approach is:
- Introduce new builtins for 128-bit wide instructions.
- Lower these to @llvm.read_register.i128/@llvm.write_register.i128
- Introduce target-specific ISD nodes which have legal operands (two
  i64s rather than an i128). These are named AArch64::{MRRS, MSRR} to
  match the instructions they are for. These are a little complex as
  they need to match the "shape" of what they're replacing or the
  legaliser complains.
- Select these using the existing tryReadRegister/tryWriteRegister to
  share the MDString parsing code, and introduce additional code to
  ensure these are selected into the right MRRS/MSRR instructions. What
  makes this hard is ensuring that the two i64s end up in an XSeqPair
  register pair, because SelectionDAG doesn't care that much about
  register classes if it can avoid doing so.

The main change to existing code is the reorganisation of
tryReadRegister and tryWriteRegister to try to keep the string parsing
code separate from the instruction creating code.

This also includes the changes to clang to define and use the ACLE
feature macro named `__ARM_FEATURE_SYSREG128`.

Contributors:
  Sam Elliott
  Lucas Prates

Differential Revision: https://reviews.llvm.org/D139086
2022-12-06 11:39:05 +00:00
Ties Stuij
94e7e58fa4 [AArch64] implement GPR (U/S)(MIN/MAX) instruction SDag support
Using SelectionDag, lower umin, umax, smin, smax intrinsics to corresponding
UMIN, UMAX, SMIN, SMAX instructions when feat CSSC is available.

See specs for corresponding immediate and register versions in:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138813
2022-12-06 10:57:49 +00:00
Ties Stuij
eaea4608e6 [AArch64] lower abs intrinsic to new ABS instruction in SelDag
When feature CSSC is available, the SelectionDag abs intrinsic should map to the
new scalar ABS instruction.

Additionally, the SIMDTwoScalarD tablegen defm includes a pattern match for
scalar i64, which we don't want to use when CSSC is enabled.

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/ABS--Absolute-value-

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138812
2022-12-06 10:48:21 +00:00
Ties Stuij
2f778e60c9 [AArch64] SelectionDag codegen for gpr CTZ instruction
When feature CSSC is available we should use instruction CTZ in SelectionDag
where applicable:

- CTTZ intrinsics are lowered to using the gpr CTZ instruction
- BITREVERSE -> CTLZ instruction pattern gets replaced by CTZ

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CTZ--Count-Trailing-Zeros-

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138811
2022-12-06 10:42:07 +00:00
Kristina Bessonova
4e958b4d7c [llvm-objdump] Avoid using mapping symbols as branch target labels
The main motivation for this change is to avoid ambiguity because
mapping symbol names may not be unique across a binary and do not allow uniquely
identifying target address. So that mapping symbols used as branch target
labels make llvm-objdump output less readable.

Another point is that mapping symbols sometimes appear in
non-allocatable sections, like debug info sections which make objdump
output even more confusing.

For example, a small AArch64 executable may contain plenty of `$d[.*]`
symbols and none of them would be useful as a label for resolving
a branch or a memory operand target address:

```
  0000000000000254 l       .note.ABI-tag	0000000000000000 $d
  00000000000008d4 l       .eh_frame            0000000000000000 $d
  0000000000000868 l       .rodata              0000000000000000 $d
  0000000000011028 l       .data                0000000000000000 $d
  0000000000010db8 l       .fini_array          0000000000000000 $d
  0000000000010db0 l       .init_array          0000000000000000 $d
  00000000000008e8 l       .eh_frame            0000000000000000 $d
  0000000000011034 l       .bss                 0000000000000000 $d
```

Note that GNU objdump doesn't use mapping symbols as branch target
labels for all targets that support such symbols (ARM, AArch64, CSKY).

Differential Revision: https://reviews.llvm.org/D139131
2022-12-06 12:19:12 +02:00
Leonard Chan
d629db535f Reland "[llvm] Teach FastISel for AArch64 about tagged globals"
This reverts commit aacf17aa0ca8a67efc0ad2d4cfd90e551b5d6a7f.

Fixed by using the right register class for the movk.
2022-12-05 23:27:36 +00:00
Leonard Chan
aacf17aa0c Revert "[llvm] Teach FastISel for AArch64 about tagged globals"
This reverts commit 7358c29a42714eb8d7d7bcdb58688d20430689e4.

This broke an upstream builder:
https://lab.llvm.org/buildbot/#/builders/16/builds/39356
2022-12-05 22:45:04 +00:00
Leonard Chan
7358c29a42 [llvm] Teach FastISel for AArch64 about tagged globals
This addresses https://github.com/llvm/llvm-project/issues/57750. For
some globals, the tag wasn't propagated correctly because the necessary
movk wasn't emitted sometimes.

Differential Revision: https://reviews.llvm.org/D138615
2022-12-05 22:16:55 +00:00
Philip Reames
186c192261 [SDAG] Allow scalable vectors in SimplifyDemanded routines
This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

Differential Revision: https://reviews.llvm.org/D137190
2022-12-05 12:42:16 -08:00
Jonas Paulsson
5ecd363295 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
This reverts commit 122efef8ee9be57055d204d52c38700fe933c033.

- Patch fixed to not reuse definitions from predecessors in EH landing pads.
- Late review suggestions (by MaskRay) have been addressed.
- M68k/pipeline.ll test updated.
- Init captures added in processBlock() to avoid capturing structured bindings.
- RISCV has this disabled for now.

Original commit message:

A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-05 12:53:50 -06:00
Philip Reames
7969ab85e0 [SDAG] Allow scalable vectors in ComputeKnownBits (try 2)
This was previously reverted due to a hang on a Hexagon bot.  This turned out to be a bug in the Hexagon backend around how splat_vectors are legalized (which they're using for fixed length vectors!).  I adjusted this patch to remove the implicit truncate support.  This hides the hexagon bug for now, and unblocks the rest of the change.

Original commit message:

This is the SelectionDAG equivalent of D136470, and is thus an alternate patch to D128159.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

This patch also includes an implementation for SPLAT_VECTOR as without it, the lane wise reasoning has no base case. The original patch which inspired this (D128159), also included STEP_VECTOR. I plan to do that as a separate patch.

Differential Revision: https://reviews.llvm.org/D137140
2022-12-05 08:52:37 -08:00
bipmis
081b7f6b03 [AAch64] Optimize muls with operands having enough sign bits.
Muls with 64bit operands where each of the operand is having more than 32 sign bits, we can generate a single smull instruction on a 32bit operand.

Differential Revision: https://reviews.llvm.org/D138817
2022-12-05 15:08:31 +00:00
Dmitry Vyukov
dbe8c2c316 Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-12-05 14:40:31 +01:00
Vladislav Dzhidzhoev
f32cafedf0 [GlobalISel][DebugInfo] Propagate debug location for localized constants
After IRTranslator pass, constants are deduplicated and translated into instructions at entry block, having debug locations lost.
Localization of constants may cause emission of extra zero lines in debug_line section, like here https://godbolt.org/z/ecvsxxfKn. In this example, constant gets placed as
a first instruction in entry block, and despite it has no debug location, AsmPrinter emits zero line for it.

If a localized constant has the only user, we can assume that it has the same debug location as its user, since they are placed consequently.

Differential Revision: https://reviews.llvm.org/D128192
2022-12-05 16:38:24 +03:00
Sander de Smalen
4d2f0f723a [AArch64][SME] Avoid going through memory for streaming-compatible splats
Reviewed By: david-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D139111
2022-12-05 13:04:30 +00:00
Tiehu Zhang
7927722a74 [AArch64][SVE2] Add patterns for eor3
Add patterns for:
    eor x, (eor y, z) -> eor3 x, y, z

Reviewed By: dmgreen, sdesmalen
Differential Revision: https://reviews.llvm.org/D138793
2022-12-05 18:16:55 +08:00
Jonas Paulsson
122efef8ee Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions.""
This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3.

Some more bots got broken - need to investigate.
2022-12-05 00:52:00 +01:00
Jonas Paulsson
17db0de330 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
Init captures added in processBlock() to avoid capturing structured bindings,
which caused the build problems (with clang).

RISCV has this disabled for now until problems relating to post RA pseudo
expansions are resolved.
2022-12-03 14:15:15 -06:00
David Green
16a72a0f87 [AArch64] Enable the select optimize pass for AArch64
This enabled the select optimize patch for ARM Out of order AArch64
cores. It is trying to solve a problem that is difficult for the
compiler to fix. The criteria for when a csel is better or worse than a
branch depends heavily on whether the branch is well predicted and the
amount of ILP in the loop (as well as other criteria like the core in
question and the relative performance of the branch predictor).  The
pass seems to do a decent job though, with the inner loop heuristics
being well implemented and doing a better job than I had expected in
general, even without PGO information.

I've been doing quite a bit of benchmarking. The headline numbers are
these for SPEC2017 on a Neoverse N1:
  500.perlbench_r   -0.12%
  502.gcc_r         0.02%
  505.mcf_r         6.02%
  520.omnetpp_r     0.32%
  523.xalancbmk_r   0.20%
  525.x264_r        0.02%
  531.deepsjeng_r   0.00%
  541.leela_r       -0.09%
  548.exchange2_r   0.00%
  557.xz_r          -0.20%

Running benchmarks with a combination of the llvm-test-suite plus
several versions of SPEC gave between a 0.2% and 0.4% geomean
improvement depending on the core/run. The instruction count went down
by 0.1% too, which is a good sign, but the results can be a little
noisy.  Some issues from other benchmarks I had ran were improved in
rGca78b5601466f8515f5f958ef8e63d787d9d812e. In summary well predicted
branches will see in improvement, badly predicted branches may get
worse, and on average performance seems to be a little better overall.

This patch enables the pass for AArch64 under -O3 for cores that will
benefit for it. i.e. not in-order cores that do not fit into the "Assume
infinite resources that allow to fully exploit the available
instruction-level parallelism" cost model. It uses a subtarget feature
for specifying when the pass will be enabled, which I have enabled under
cpu=generic as the performance increases for out of order cores seems
larger than any decreases for inorder, which were minor.

Differential Revision: https://reviews.llvm.org/D138990
2022-12-03 16:08:58 +00:00
Matt Arsenault
a74c5707be Fix some test files with executable permissions 2022-12-02 17:12:03 -05:00
Matt Arsenault
46584de02c AArch64/GlobalISel: Convert tests to opaque pointers
inttoptr_add.ll had a mangled bitcast constantexpr.
translate-gep.ll: Restored a 0 GEP
2022-12-02 16:19:38 -05:00
Matt Arsenault
9585500fea GlobalISel: Replace bitcast test pointer usage
This won't be meaningful with opaque pointers (I guess we could leave
a ptr to ptr bitcast, or allow same sized address space bitcasts).
2022-12-02 16:19:38 -05:00
Matt Arsenault
8c04c78cfa AArch64/GlobalISel: Regenerate test checks
Try to shrink the diff in the opaque pointer conversion. Had to work
around some update_mir_test_checks bugs. It seems to struggle when the
successor list is empty around the blank line checks it inserts.
2022-12-02 16:19:38 -05:00
Ties Stuij
82a5f1c62b [AArch64] use CNT for ISD::popcnt and ISD::parity if available
These are the two places where we explicitly want to use cnt in
SelectionDAG when feature CSSC is available: ISD::popcnt and ISD::parity

For both, we need to make sure we're emitting optimized code for i32 (and
lower), i64 and i128. The most optimal way is of course using the GPR CNT
instruction. If we don't have CSSC, but we do have neon, we'll use floating
point CNT. If all fails, we'll fall back on the general GPR popcnt and parity
implementations.

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CNT--Count-bits-

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138808
2022-12-02 11:27:14 +00:00
chenglin.bi
e63f64bd14 [AArch64] Precommit test for D138904; NFC
shift + and -> shift + shift to select more shfited registers.
2022-12-02 10:59:03 +08:00
Jonas Paulsson
8ef4632681 Revert "[CodeGen] Add new pass for late cleanup of redundant definitions."
Temporarily revert and fix buildbot failure.

This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.
2022-12-01 13:29:24 -05:00
Jonas Paulsson
6d12599fd4 [CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-01 13:21:35 -05:00
Sander de Smalen
d32c9e8384 Reland "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca."
Phabricator review for this patch was D138791
2022-12-01 14:48:30 +00:00
Sander de Smalen
ea11f4ff0a Reland "[AArch64][SME]: Add precursory tests for D138791"
This reverts commit 06846596eb1768eea06778a5b6da31145e84e461.
2022-12-01 14:48:30 +00:00
David Sherwood
06846596eb Revert "[AArch64][SME]: Add precursory tests for D138791"
This reverts commit 45adca0f52af346a131163d1cc3e4a08baf7f0f1.
2022-12-01 11:14:01 +00:00
David Sherwood
4a5ccf4e93 Revert "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca."
This reverts commit 279c0a83aa22cd35d4b7c7c52b85d2a86f2528a7.
2022-12-01 10:22:21 +00:00
Freddy Ye
89f36dd8f3 [X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp convertions with more
than 128 bits. The expanded nodes are referring from the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
and etc.

Corner cases:
1. For fp16: as there is no related builtins added in compliler-rt. So I
mainly utilized the fp32 <-> fp16 lib calls to implement.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can
help in this problem. I recommend users to deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as clang FE currently doesn't support bf16 algorithm operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc are
ignoring "returns 0 for negative input" spec. This pass follows this old way
to ignore unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241
2022-12-01 13:47:43 +08:00
Hassnaa Hamdi
2bda5a6287 [AArch64][SME][NFC]: Enable lowering truncate for enhancement.
Enable lowering truncate to enhance the generated code.
2022-12-01 03:54:28 +00:00
Hassnaa Hamdi
279c0a83aa [AArch64][SME]: Generate streaming-compatible code for ld2-alloca.
To generate code compatible to streaming mode:
 - disable lowering interleaved load to avoid generating invalid NEON intrinsics.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D138791
2022-12-01 02:31:01 +00:00
Hassnaa Hamdi
45adca0f52 [AArch64][SME]: Add precursory tests for D138791
Testing files:
 - ld2-alloca.ll
2022-12-01 02:31:01 +00:00
Hassnaa Hamdi
1dee88fac1 [AArch64][SME]: Add streaming-compatible testing files.
Testing files:
 - int-compares.ll
 - int-immediates.ll
 - log-reduce.ll

Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D138717
2022-12-01 01:37:36 +00:00
Hassnaa Hamdi
43eef96833 [AArch64][SME]: Add streaming-compatible testing files.
Testing files:
 - limit-duplane.ll
 - optimize-ptrue.ll
 - ptest.ll

Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D138768
2022-12-01 01:15:44 +00:00
Hassnaa Hamdi
de7222b2dd [AArch64][SME]: Add streaming-compatible testing files.
Testing files:
 - subvector.ll
 - permute-rev.ll
 - permute-zip-uzp-trn.ll
 - vector-shuffle.ll

Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D138683
2022-12-01 00:52:08 +00:00
Marco Elver
b95646fe70 Revert "Use-after-return sanitizer binary metadata"
This reverts commit d3c851d3fc8b69dda70bf5f999c5b39dc314dd73.

Some bots broke:

- https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview
- https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio
2022-11-30 23:35:50 +01:00
Dmitry Vyukov
d3c851d3fc Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-30 14:50:22 +01:00
Philip Reames
fc0efb7e78 [SDAG] Allow scalable vectors in ComputeNumSignBits (try 2)
I had reverted this before the holiday week because a problem was reported with a related change (D137140 - scalable vector known bits in DAG).  I had initially confused the two patches, and then decided to leave this reverted out an abundance of caution.  Now that we're through the holiday week, reapplying.

I also roled in fixes for several post commit review comments that hadn't landed with the original change.

Original commit message

This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

Differential Revision: https://reviews.llvm.org/D137141
2022-11-29 08:25:05 -08:00