82080 Commits

Author SHA1 Message Date
Akshat Oke
73b0e8a191
[AMDGPU][NewPM] Port AMDGPUOpenCLEnqueuedBlockLowering to NPM (#122434) 2025-01-13 17:52:30 +05:30
Sander de Smalen
3efe83291f [AArch64] Fix chain for calls from agnostic-ZA functions.
The lowering code was using the wrong chain value, which meant that
the 'smstart' after the call from streaming agnostic-ZA functions ->
non-streaming private-ZA functions was incorrectly removed from the DAG.
2025-01-13 12:06:50 +00:00
Sam Tebbs
795e35a653
Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744)
This relands the reverted #120721 with a fix for cases where neither
reduction operand are the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2025-01-13 11:20:35 +00:00
quic_hchandel
171d3edd05
[RISCV] Add Qualcomm uC Xqciint (Interrupts) extension (#122256)
This extension adds eleven instructions to accelerate interrupt
servicing.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest

This patch adds assembler only support.

---------

Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>
2025-01-13 16:36:05 +05:30
Durgadoss R
7e2eb0f83e
[NVPTX] Add float to tf32 conversion intrinsics (#121507)
This patch adds the missing variants of float to tf32 conversion
intrinsics, with their corresponding lit tests.

PTX Spec link:

https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-01-13 16:17:42 +05:30
Akshat Oke
f431f93a77
[CodeGen][NewPM] Use proper NPM AtomicExpandPass in AMDGPU (#122086)
`PassRegistry.def` already has this entry, but the dummy definition was
being pulled instead.

I couldn't reproduce the build failures that FIXME referenced, maybe the
Dummy pass getting in the way was part of the cause.
2025-01-13 10:38:24 +05:30
Akshat Oke
7bf1cb702b
[AMDGPU][NewPM] Port AMDGPURemoveIncompatibleFunctions to NPM (#122261) 2025-01-13 10:11:40 +05:30
Shilei Tian
f15da5fb78
[AMDGPU] Fix an invalid cast in AMDGPULateCodeGenPrepare::visitLoadInst (#122494)
Fixes: SWDEV-507695
2025-01-12 23:40:25 -05:00
Sameer Sahasrabuddhe
77e6f434ec
[SPIRV] convergence anchor intrinsic does not have a parent token (#122230) 2025-01-13 09:54:57 +05:30
Justin Bogner
0e51b54b7a
[DirectX] Implement the resource.store.rawbuffer intrinsic (#121282)
This introduces `@llvm.dx.resource.store.rawbuffer` and generalizes the
buffer store docs under DirectX/DXILResources.

Fixes #106188
2025-01-12 18:52:20 -07:00
Simon Pilgrim
be6c752e15
[X86] X86FixupVectorConstantsPass - use VPMOVSX/ZX extensions for PS/PD domain moves (#122601)
For targets with free domain moves, or AVX512 support, allow the use of VPMOVSX/ZX extension loads to reduce the load sizes.

I've limited this to extension to i32/i64 types as we're mostly interested in shuffle mask loading here, but we could include i16 types as well just as easily.

Inspired by a regression on #122485
2025-01-12 15:59:05 +00:00
Daniel Paoliello
5ee0a71df9
[aarch64][win] Add support for import call optimization (equivalent to MSVC /d2ImportCallOptimization) (#121516)
This change implements import call optimization for AArch64 Windows
(equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).

Import call optimization adds additional data to the binary which can be
used by the Windows kernel loader to rewrite indirect calls to imported
functions as direct calls. It uses the same [Dynamic Value Relocation
Table mechanism that was leveraged on x64 to implement
`/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).

The change to the obj file is to add a new `.impcall` section with the
following layout:
```cpp
  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.
```

NOTE: If the import call optimization feature is enabled, then the
`.impcall` section must be emitted, even if there are no calls to
imported functions.

The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call
to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for
each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature
is enabled:
- A (new) `.impcall` directive is emitted for each call to an imported
function.
- The `.impcall` section is emitted with its magic header (but is not
filled in).
* During COFF object writing, the `.impcall` section is filled in based
on each `.impcall` directive that were encountered.

The `.impcall` section can only be filled in when we are writing the
COFF object as it requires the actual section numbers, which are only
assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction
selection and instead implement this either purely during asm printing
or in a `MachineFunctionPass` (as suggested in [on the
forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3))
but this was not possible due to how loading and calling an imported
function works on AArch64. Specifically, they are emitted as `ADRP` +
`LDR` (to load the symbol) then a `BR` (to do the call), so at the point
when we have machine instructions, we would have to work backwards
through the instructions to discover what is being called. An initial
prototype did work by inspecting instructions; however, it didn't
correctly handle the case where the same function was called twice in a
row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the
previously loaded address. Worse than that, sometimes for the
double-call case LLVM decided to spill the loaded address to the stack
and then reload it before making the second call. So, instead of trying
to implement logic to discover where the value in a register came from,
I instead recorded the symbol being called at the last place where it
was easy to do: instruction selection.
2025-01-11 21:30:17 -08:00
Kazu Hirata
bfe93aedcc [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private
  field 'DAG' is not used [-Werror,-Wunused-private-field]
2025-01-11 13:06:37 -08:00
Austin Kerbow
657fb4433e
[AMDGPU] Add target hook to isGlobalMemoryObject (#112781)
We want special handing for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
2025-01-11 09:57:57 -08:00
Marius Kamp
1eed46960c
[AArch64] Eliminate Common Subexpression of CSEL by Reassociation (#121350)
If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and a the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).

Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgR

Fixes #119606.
2025-01-11 16:26:11 +00:00
Simon Pilgrim
b622cc67d0 [X86] LowerCTPOP - check if the operand is a constant when collecting KnownBits
Under certain circumstances, lowering of other instructions can result in computeKnownBits being able to detect a constant that it couldn't previously.

Fixes #122580
2025-01-11 13:41:50 +00:00
Michael Clark
212cba0ef3
[X86] Correct the cdisp8 encoding for VSCATTER/VGATHER prefetch (#122051)
during differential fuzzing, I found 8 more instructions with disp8
offset multiplier differences to binutils. somewhat sure there is a bug
in the X86 LLVM disp8 offset multipliers for this subset of vector
scatter and gather prefetch instructions. please check and refer to the
previous pull request: https://github.com/llvm/llvm-project/pull/120340

these vector scatter and gather prefetch instructions also have an
unusual k mask operand position but I have not addressed this with this
patch as I am unsure how to change the Intel format in the tablegen
file.

```
hex:	62 f2 fd 49 c6 4c 51 01
llvm:	vgatherpf0dpd	{k1}, zmmword ptr [rcx + 2*ymm2 + 4]
ours:	vgatherpf0dpd	qword ptr * 8 [rcx + 2*ymm2 + 8] {k1}
gnu:	vgatherpf0dpd 	QWORD PTR [rcx+ymm2*2+0x8]{k1}

hex:	62 f2 7d 49 c7 4c 51 01
llvm:	vgatherpf0qps	{k1}, ymmword ptr [rcx + 2*zmm2 + 8]
ours:	vgatherpf0qps	dword ptr * 8 [rcx + 2*zmm2 + 4] {k1}
gnu:	vgatherpf0qps	DWORD PTR [rcx+zmm2*2+0x4]{k1}

hex:	62 f2 fd 49 c6 54 51 01
llvm:	vgatherpf1dpd	{k1}, zmmword ptr [rcx + 2*ymm2 + 4]
ours:	vgatherpf1dpd	qword ptr * 8 [rcx + 2*ymm2 + 8] {k1}
gnu:	vgatherpf1dpd	QWORD PTR [rcx+ymm2*2+0x8]{k1}

hex:	62 f2 7d 49 c7 54 51 01
llvm:	vgatherpf1qps	{k1}, ymmword ptr [rcx + 2*zmm2 + 8]
ours:	vgatherpf1qps	dword ptr * 8 [rcx + 2*zmm2 + 4] {k1}
gnu:	vgatherpf1qps	DWORD PTR [rcx+zmm2*2+0x4]{k1}

hex:	62 f2 fd 49 c6 6c 51 01
llvm:	vscatterpf0dpd	{k1}, zmmword ptr [rcx + 2*ymm2 + 4]
ours:	vscatterpf0dpd	qword ptr * 8 [rcx + 2*ymm2 + 8] {k1}
gnu:	vscatterpf0dpd	QWORD PTR [rcx+ymm2*2+0x8]{k1}

hex:	62 f2 7d 49 c7 6c 51 01
llvm:	vscatterpf0qps	{k1}, ymmword ptr [rcx + 2*zmm2 + 8]
ours:	vscatterpf0qps	dword ptr * 8 [rcx + 2*zmm2 + 4] {k1}
gnu:	vscatterpf0qps	DWORD PTR [rcx+zmm2*2+0x4]{k1}

hex:	62 f2 fd 49 c6 74 51 01
llvm:	vscatterpf1dpd	{k1}, zmmword ptr [rcx + 2*ymm2 + 4]
ours:	vscatterpf1dpd	qword ptr * 8 [rcx + 2*ymm2 + 8] {k1}
gnu:	vscatterpf1dpd QWORD PTR [rcx+ymm2*2+0x8]{k1}

hex:	62 f2 7d 49 c7 74 51 01
llvm:	vscatterpf1qps	{k1}, ymmword ptr [rcx + 2*zmm2 + 8]
ours:	vscatterpf1qps	dword ptr * 8 [rcx + 2*zmm2 + 4] {k1}
gnu:	vscatterpf1qps DWORD PTR [rcx+zmm2*2+0x4]{k1}
```
2025-01-11 16:51:20 +08:00
Craig Topper
0384069d6c [AArch64] Use GenericTable PrimaryKey to remove some SearchIndexes. NFC 2025-01-10 23:04:28 -08:00
Craig Topper
a418eb1c0d [ARM] Use GenericTable PrimaryKey to remove one of the SearchIndexes for BankedRegsList. NFC 2025-01-10 22:04:42 -08:00
Kevin McAfee
515946b290
[NFC][NVPTX] Small style cleanup for NVPTXISelDAGToDAG.* (#122538) 2025-01-10 17:06:04 -08:00
Sergei Barannikov
a475ae05fb
Revert "[ADT] Fix specialization of ValueIsPresent for PointerUnion" (#122557)
Reverts llvm/llvm-project#121847

Causes compile time regressions and allegedly miscompilation.
2025-01-11 03:36:34 +03:00
Craig Topper
7979e1ba29 [RISCV] Add a default assignment of Inst{12-7} to RVInst16CSS. NFC
Some bits need to be overwritten by child classes, but at
least a few of the upper bits are common to all child classes.
2025-01-10 14:28:54 -08:00
Austin Kerbow
2e5c298281
[AMDGPU] Add backward compatibility layer for kernarg preloading (#119167)
Add a prologue to the kernel entry to handle cases where code designed
for kernarg preloading is executed on hardware equipped with
incompatible firmware. If hardware has compatible firmware the 256 bytes
at the start of the kernel entry will be skipped. This skipping is done
automatically by hardware that supports the feature.

A pass is added which is intended to be run at the very end of the
pipeline to avoid any optimizations that would assume the prologue is a
real predecessor block to the actual code start. In reality we have two
possible entry points for the function. 1. The optimized path that
supports kernarg preloading which begins at an offset of 256 bytes. 2.
The backwards compatible entry point which starts at offset 0.
2025-01-10 11:39:02 -08:00
Farzon Lotfi
b900379e26
[HLSL] Reapply Move length support out of the DirectX Backend (#121611) (#122337)
## Changes
- Delete DirectX length intrinsic
- Delete HLSL length lang builtin
- Implement length algorithm entirely in the header.

## History
- In the past if an HLSL intrinsic lowered to either a spirv op code or
a DXIL opcode we represented it with intrinsics

## Why we are moving away?
- To make HLSL apis more portable the team decided that it makes sense
for some intrinsics to be defined only in the header.
- Since there tends to be more SPIRV opcodes than DXIL opcodes the plan
is to support SPIRV opcodes either with target specific builtins or via
pattern matching.
2025-01-10 14:16:27 -05:00
Raphael Moreira Zinsly
6f53886a9a
[RISCV] Add stack clash vector support (#119458)
Use the probe loop structure to allocate vector code in the stack as
well. We add the pseudo instruction RISCV::PROBED_STACKALLOC_RVV to
differentiate from the normal loop.
2025-01-10 09:48:21 -08:00
Durgadoss R
372044ee09
[NVPTX] Add TMA Bulk Copy intrinsics (#122344)
PR #96083 added intrinsics for async copy of 'tensor' data
using TMA. Following a similar design, this PR adds intrinsics
for async copy of bulk data (non-tensor variants) through TMA.

* These intrinsics optionally support multicast and cache_hints,
   as indicated by the boolean arguments at the end of the intrinsics.
* The backend looks through these flag arguments and lowers to the
   appropriate PTX instructions.
* Lit tests are added for all combinations of these intrinsics in
   cp-async-bulk.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.

PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-01-10 22:31:53 +05:30
Santanu Das
9d7df23f4d
[Hexagon] Add missing pattern for v8i1 type (#120703)
HexagonISD::PFALSE and PTRUE patterns do not form independently in
general as they are treated like operands of all 0s or all 1s. Eg: i32 =
transfer HEXAGONISD::PFALSE.
In this case, v8i1 = HEXAGONISD::PFALSE is formed independently without
accompanying opcode.

This patch adds a pattern to transfer all 0s or all 1s to a scalar
register and then use that register and this PFALSE/PTRUE opcode to
transfer to a predicate register like v8i1.
2025-01-10 09:54:02 -06:00
Paul Bowen-Huggett
bbb53d1a8c
[NFC] Make AMDGPUCombinerHelper methods const (#121903)
(This replaces #121740. Sorry for wasting your time.)

This is a follow-up to a previous commit (ee7ca0d) which eliminated
several "TODO: make CombinerHelper methods const" remarks. As promised
in that ealier commit, this change completes the set by also making the
methods of AMDGPUCombinerHelper const so that the Helper member of
AMDGPUPreLegalizerCombinerImpl can be const rather than explicitly
mutable.
2025-01-10 22:43:14 +07:00
Simon Pilgrim
35a392553d
[X86] widenSubVector - widen from smaller build vector if the upper elements are already the same padding elements (#122445)
Further simplifies some shuffle masks to help additional combines
2025-01-10 15:13:53 +00:00
Philip Reames
24bb180e8a
[RISCV] Attempt to widen SEW before generic shuffle lowering (#122311)
This takes inspiration from AArch64 which does the same thing to assist
with zip/trn/etc.. Doing this recursion unconditionally when the mask
allows is slightly questionable, but seems to work out okay in practice.

As a bit of context, it's helpful to realize that we have existing logic
in both DAGCombine and InstCombine which mutates the element width of in
an analogous manner. However, that code has two restriction which
prevent it from handling the motivating cases here. First, it only
triggers if there is a bitcast involving a different element type.
Second, the matcher used considers a partially undef wide element to be
a non-match. I considered trying to relax those assumptions, but the
information loss for undef in mid-level opt seemed more likely to open a
can of worms than I wanted.
2025-01-10 07:12:24 -08:00
Jakub Chlanda
c575a7d1e9
[AMDGPU] Provide default value in get intrinsic param type (#122448)
Make sure that a default value (nullptr) is returned from
`getIntrinsicParamType`, also validate uses of this helper function.
2025-01-10 15:29:42 +01:00
Sergei Barannikov
7b05367943
[ADT] Fix specialization of ValueIsPresent for PointerUnion (#121847)
Two instances of `PointerUnion` with different active members and null
value compare unequal. Currently, this results in counterintuitive
behavior when using functions from `Casting.h`, e.g.:

```C++
  PointerUnion<int *, float *> U;
  // U = (int *)nullptr;
  dyn_cast<int *>(U); // Aborts
  dyn_cast<float *>(U); // Aborts
  U = (float *)nullptr;
  dyn_cast<int *>(U); // OK
  dyn_cast<float *>(U); // OK
```

`dyn_cast` should abort in all cases because the argument is null.
Currently, it aborts only if the first member is active. This happens
because the partial template specialization of `ValueIsPresent` for
nullable types compares the union with a union constructed from nullptr,
and the two unions compare equal only if their active members are the
same.

This patch changed the specialization of `ValueIsPresent` for nullable
types to make `isPresent()` return false for all possible null values of
a PointerUnion, and fixes two places where the old behavior was
exploited.

Pull Request: https://github.com/llvm/llvm-project/pull/121847
2025-01-10 16:43:19 +03:00
David Green
5a069eac5f
[AArch64] Don't try to sink and(load) (#122274)
If we sink the and in and(load), CGP can hoist is back again to the
load, getting into an infinite loop. This prevents sinking the and in
this case.

Fixes #122074
2025-01-10 11:54:46 +00:00
Mirko Brkušanin
3def49cb64
[AMDGPU] Remove s_wakeup_barrier instruction (#122277) 2025-01-10 11:30:22 +01:00
Usha Gupta
4c853be667
[AArch64] Replace uaddlv with addv for popcount operation (#121934)
Replace `uaddlv` with `addv` for popcount operation as it is simpler
operation.

On certain platforms like Cortex-A510, `addv` has a latency of 3 cycles
whereas `uaddlv` has a latency of 4 cycles

GCC generates `addv` as well:
https://godbolt.org/z/MnYG9jcEo
2025-01-10 09:47:50 +00:00
LiqinWeng
98e5962b7c
[RISCV][CostModel] Add cost for fabs/fsqrt of type bf16/f16 (#118608) 2025-01-10 17:22:51 +08:00
Matt Arsenault
46ca6dfb5f
AMDGPU: Add disjoint to or produced from lowering vector ops (#122424) 2025-01-10 16:21:53 +07:00
Jay Foad
fd922c4b4f
[CodeGen] Add const to getAddrModeArguments argument. NFC. (#122335) 2025-01-10 09:19:25 +00:00
Jakub Chlanda
01a7d4e26b
[AMDGPU] Allow selection of BITOP3 for some 2 opcodes and B32 cases (#122267)
This came up in downstream static analysis - as a dead code.

Admittedly, it depends on what the intention was when checking for [`if
(NumOpcodes == 2 &&
IsB32)`](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp#L3792C3-L3792C32)
and I took a guess that for certain cases the selection should take
place.

If that's incorrect, that whole if statement can be removed, as it is
after a check for: [`if (NumOpcodes <
4)`](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp#L3788)
2025-01-10 07:49:11 +01:00
Heejin Ahn
a8e1135baa
[WebAssembly] Add -wasm-use-legacy-eh option (#122158)
This replaces the existing `-wasm-enable-exnref` with
`-wasm-use-legacy-eh` option, in an effort to make the new standardized
exnref proposal the 'default' state and the legacy proposal needs to be
separately enabled an option. But given that most users haven't switched
to the new proposal and major web browsers haven't turned it on by
default, this `-wasm-use-legacy-eh` is turned on by default, so nothing
will change for now for the functionality perspective.

This also removes the restriction that `-wasm-enable-exnref` be only
used with `-wasm-enable-eh` because this option is enabled by default.
This option does not have any effect when `-wasm-enable-eh` is not used.
2025-01-09 22:36:10 -08:00
Craig Topper
6829f30883
[RISCV] Add a default common assignment of Inst{6-2} to the RVInst16CI base class. NFC (#122377)
Many instructions assign all or a subset of Inst{6-2} to Imm{4-0}. Make
this the default. Subsets of Inst{6-2} can be overridden as needed by
derived classes/records which we already do with Inst{12} in a few
places.
2025-01-09 22:11:04 -08:00
ssijaric-nv
a4472c7dac
[AArch64] Fix the size passed to __trampoline_setup (#118234)
The trampoline size is 36 bytes on AArch64. The runtime function
__trampoline_setup aborts as it expects the trampoline size of at least 36 
bytes, and the size passed is 20 bytes. Fix the inconsistency in
AArch64TargetLowering::LowerINIT_TRAMPOLINE.
2025-01-09 22:09:08 -08:00
Chinmay Deshpande
211bcf67aa
[AMDGPU] Implement IR variant of isFMAFasterThanFMulAndFAdd (#121465) 2025-01-10 09:05:41 +05:30
Shao-Ce SUN
369c61744a
[RISCV] Fix the cost of llvm.vector.reduce.and (#119160)
I added some CodeGen test cases related to reduce. To maintain
consistency, I also added cases for instructions like
`vector.reduce.or`.

For cases where `v1i1` type generates `VFIRST`, please refer to:
https://reviews.llvm.org/D139512.
2025-01-10 10:10:42 +08:00
Craig Topper
41e4018f9c
[RISCV][VLOPT] Simplify code by removing extra temporary variables. NFC (#122333)
Just do the conditional operator in the return statement.
2025-01-09 18:05:41 -08:00
Craig Topper
b11fe33aea
[RISCV] Correct the cost model for the i1 reduce.add and reduce.or. (#122349)
reduce.add uses the same sequence as reduce.xor. reduce.or should use
vmor not vmxor.
2025-01-09 18:05:22 -08:00
Michael Maitland
d0373dbe7c
[RISCV][VLOPT] Add vadc to isSupportedInstr (#122345) 2025-01-09 19:44:40 -05:00
Michael Maitland
04e54cc19f
[RISCV][VLOPT] Add Vector Single-Width Averaging Add and Subtract to isSupportedInstr (#122351) 2025-01-09 19:39:12 -05:00
Craig Topper
5d88a84ecd [RISCV] Simplify some RISCVInstrInfoC classes by removing arguments that never change. NFC 2025-01-09 16:21:55 -08:00
Thurston Dang
4f42e16516
[hwasan] Omit tag check for null pointers (#122206)
If the pointer to be checked is statically known to be zero, the tag
check will always pass since:
1) the tag is zero
2) shadow memory for address 0 is initialized to 0 and never updated.
We can therefore elide the tag check.

We perform the elision in two places:
1) the HWASan pass
2) when lowering the CHECK_MEMACCESS intrinsic. Conceivably, the HWASan
pass may encounter a "cannot currently statically prove to be null"
pointer (and is therefore unable to omit the intrinsic) that later
optimization passes convert into a statically known-null pointer. As a
last line of defense, we perform elision here too.

This also updates the tests from
https://github.com/llvm/llvm-project/pull/122186
2025-01-09 13:48:26 -08:00