190428 Commits

Author SHA1 Message Date
Matt Arsenault
f4598194b5
DAG: Fold bitcast of scalar_to_vector to anyext (#122660)
scalar_to_vector is difficult to make appear and test,
but I found one case where this makes an observable difference.
It fires more often than this in the test suite, but most of them
have no net result in the final code. This helps reduce regressions
in a future commit.
2025-01-13 19:38:58 +07:00
Akshat Oke
73b0e8a191
[AMDGPU][NewPM] Port AMDGPUOpenCLEnqueuedBlockLowering to NPM (#122434) 2025-01-13 17:52:30 +05:30
Sander de Smalen
3efe83291f [AArch64] Fix chain for calls from agnostic-ZA functions.
The lowering code was using the wrong chain value, which meant that
the 'smstart' after the call from streaming agnostic-ZA functions ->
non-streaming private-ZA functions was incorrectly removed from the DAG.
2025-01-13 12:06:50 +00:00
Sam Tebbs
795e35a653
Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744)
This relands the reverted #120721 with a fix for cases where neither
reduction operand are the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2025-01-13 11:20:35 +00:00
quic_hchandel
171d3edd05
[RISCV] Add Qualcomm uC Xqciint (Interrupts) extension (#122256)
This extension adds eleven instructions to accelerate interrupt
servicing.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest

This patch adds assembler only support.

---------

Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>
2025-01-13 16:36:05 +05:30
Haojian Wu
d2ba364440 Fix an unused-variable warning in release build. 2025-01-13 12:03:35 +01:00
Durgadoss R
7e2eb0f83e
[NVPTX] Add float to tf32 conversion intrinsics (#121507)
This patch adds the missing variants of float to tf32 conversion
intrinsics, with their corresponding lit tests.

PTX Spec link:

https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-01-13 16:17:42 +05:30
Jay Foad
a3b3c26048
[TableGen] Use assert instead of PrintFatalError in TGLexer. NFC. (#122303)
Do not use the PrintFatalError diagnostic machinery for conditions that
can never happen with any input.
2025-01-13 10:30:55 +00:00
Oliver Stannard
e2a071ece5
[MachineCP] Correctly handle register masks and sub-registers (#122472)
When passing an instruction with a register mask, the machine copy
propagation pass was dropping the information about some copy
instructions which define a register which is preserved by the mask,
because that register overlaps a register which is partially clobbered
by it. This resulted in a miscompilation for AArch64, because this
caused a live copy to be considered dead.

The fix is to clobber register masks by finding the set of reg units
which is preserved by the mask, and clobbering all units not in that
set.
2025-01-13 09:55:08 +00:00
Akshat Oke
4f96fb5fb3
Reapply "Spiller: Detach legacy pass and supply analyses instead (#119181)" (#122665)
Makes Inline Spiller amenable to the new PM.

This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted
because of two unused private members reported on sanitizer bots.
2025-01-13 14:14:13 +05:30
Mel Chen
56a37a3c76
[SLPVectorizer] Refactor HorizontalReduction::createOp (NFC) (#121549)
This patch simplifies select-based integer min/max reductions by
utilizing `llvm::getMinMaxReductionPredicate`, and generates
intrinsic-based min/max reductions by utilizing
`llvm::getMinMaxReductionIntrinsicOp`.
2025-01-13 16:11:31 +08:00
Akshat Oke
f431f93a77
[CodeGen][NewPM] Use proper NPM AtomicExpandPass in AMDGPU (#122086)
`PassRegistry.def` already has this entry, but the dummy definition was
being pulled instead.

I couldn't reproduce the build failures that FIXME referenced, maybe the
Dummy pass getting in the way was part of the cause.
2025-01-13 10:38:24 +05:30
Akshat Oke
7bf1cb702b
[AMDGPU][NewPM] Port AMDGPURemoveIncompatibleFunctions to NPM (#122261) 2025-01-13 10:11:40 +05:30
Shilei Tian
f15da5fb78
[AMDGPU] Fix an invalid cast in AMDGPULateCodeGenPrepare::visitLoadInst (#122494)
Fixes: SWDEV-507695
2025-01-12 23:40:25 -05:00
Sameer Sahasrabuddhe
77e6f434ec
[SPIRV] convergence anchor intrinsic does not have a parent token (#122230) 2025-01-13 09:54:57 +05:30
Justin Bogner
0e51b54b7a
[DirectX] Implement the resource.store.rawbuffer intrinsic (#121282)
This introduces `@llvm.dx.resource.store.rawbuffer` and generalizes the
buffer store docs under DirectX/DXILResources.

Fixes #106188
2025-01-12 18:52:20 -07:00
Florian Hahn
8df64ed777 [LV] Don't consider IV increments uniform if exit value is used outside.
In some cases, there might be a chain of uniform instructions producing
the exit value. To generate correct code in all cases, consider the IV
increment not uniform, if there are users outside the loop.

Instead, let VPlan narrow the IV, if possible using the logic from
3ff1d01985752.

Test case from #122602 verified with Alive2:
    https://alive2.llvm.org/ce/z/bA4EGj

Fixes https://github.com/llvm/llvm-project/issues/122496.
Fixes https://github.com/llvm/llvm-project/issues/122602.
2025-01-12 22:03:21 +00:00
Florian Hahn
3ff1d01985 Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67.

Re-applies commit with typos fixed.
2025-01-12 20:10:28 +00:00
Florian Hahn
0ebb3ac7c9 Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac.

Typo breaking the build
2025-01-12 19:37:45 +00:00
Florian Hahn
1afba19913 [VPlan] Try to narrow wide and replicating recipes to uniform recipes.
Use the existing VPlan-based analysis to identify recipes that only have
their first lane demanded and transform them to uniform recpliate
recipes. This simplifies the generated code in some places and prepares
for fixing https://github.com/llvm/llvm-project/issues/122496.
2025-01-12 19:32:01 +00:00
Kazu Hirata
43fdd6e81d
[memprof] Migrate away from PointerUnion::is (NFC) (#122622)
Note that PointerUnion::is have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

In this patch, I'm calling call().getBase() for an instance of
PointerUnion.  call() alone would return an instance of IndexCall,
which wraps PointerUnion.  Note that isa<> cannot directly accept an
instance of IndexCall, at least without defining CastInfo.

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
2025-01-12 11:06:42 -08:00
Simon Pilgrim
be6c752e15
[X86] X86FixupVectorConstantsPass - use VPMOVSX/ZX extensions for PS/PD domain moves (#122601)
For targets with free domain moves, or AVX512 support, allow the use of VPMOVSX/ZX extension loads to reduce the load sizes.

I've limited this to extension to i32/i64 types as we're mostly interested in shuffle mask loading here, but we could include i16 types as well just as easily.

Inspired by a regression on #122485
2025-01-12 15:59:05 +00:00
Ruhung
4f7dc1b55a
[InstCombine] Fold (add (add A, 1), (sext (icmp ne A, 0))) to call umax(A, 1) (#122491)
Transform (add (add A, 1), (sext (icmp ne A, 0))) into call umax(A, 1).

Fixes #121853.

Alive2: https://alive2.llvm.org/ce/z/TweTan
2025-01-12 16:51:58 +01:00
Ramkumar Ramachandra
66badf224a
VT: teach a special-case optz about samesign (#122590)
There is a narrow special-case in isImpliedCondICmps that can benefit
from being taught about samesign. Since it costs us nothing to implement
it, teach it about samesign, for completeness. This patch marks the
completion of the effort to teach ValueTracking about samesign.
2025-01-12 15:19:29 +00:00
Kareem Ergawy
42da12063f
[flang][OpenMP] Extend delayed privatization for omp.simd (#122156)
Adds support for delayed privatization for `simd` directives. This PR
includes PFT down to LLVM IR lowering.
2025-01-12 07:46:58 +01:00
Daniel Paoliello
d997a722c1
Fix build break in MIRPrinter (#122630) 2025-01-11 21:56:59 -08:00
Daniel Paoliello
5ee0a71df9
[aarch64][win] Add support for import call optimization (equivalent to MSVC /d2ImportCallOptimization) (#121516)
This change implements import call optimization for AArch64 Windows
(equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).

Import call optimization adds additional data to the binary which can be
used by the Windows kernel loader to rewrite indirect calls to imported
functions as direct calls. It uses the same [Dynamic Value Relocation
Table mechanism that was leveraged on x64 to implement
`/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).

The change to the obj file is to add a new `.impcall` section with the
following layout:
```cpp
  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.
```

NOTE: If the import call optimization feature is enabled, then the
`.impcall` section must be emitted, even if there are no calls to
imported functions.

The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call
to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for
each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature
is enabled:
- A (new) `.impcall` directive is emitted for each call to an imported
function.
- The `.impcall` section is emitted with its magic header (but is not
filled in).
* During COFF object writing, the `.impcall` section is filled in based
on each `.impcall` directive that were encountered.

The `.impcall` section can only be filled in when we are writing the
COFF object as it requires the actual section numbers, which are only
assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction
selection and instead implement this either purely during asm printing
or in a `MachineFunctionPass` (as suggested in [on the
forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3))
but this was not possible due to how loading and calling an imported
function works on AArch64. Specifically, they are emitted as `ADRP` +
`LDR` (to load the symbol) then a `BR` (to do the call), so at the point
when we have machine instructions, we would have to work backwards
through the instructions to discover what is being called. An initial
prototype did work by inspecting instructions; however, it didn't
correctly handle the case where the same function was called twice in a
row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the
previously loaded address. Worse than that, sometimes for the
double-call case LLVM decided to spill the loaded address to the stack
and then reload it before making the second call. So, instead of trying
to implement logic to discover where the value in a register came from,
I instead recorded the symbol being called at the last place where it
was easy to do: instruction selection.
2025-01-11 21:30:17 -08:00
goldsteinn
17ef436e3d
[ValueTracking] Take into account whether zero is poison when computing CR for ct{t,l}z (#122548) 2025-01-11 15:11:11 -06:00
goldsteinn
cc995ad064
[InstSimpify] Simplifying (xor (sub C_Mask, X), C_Mask) -> X (#122552)
- **[InstSimpify] Add tests for simplifying `(xor (sub C_Mask, X),
C_Mask)`; NFC**
- **[InstSimpify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X`**

Helps address regressions with folding `clz(Pow2)`.

Proof: https://alive2.llvm.org/ce/z/zGwUBp
2025-01-11 15:10:42 -06:00
Kazu Hirata
bfe93aedcc [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private
  field 'DAG' is not used [-Werror,-Wunused-private-field]
2025-01-11 13:06:37 -08:00
Florian Hahn
7f59b4e998
[VPlan] Skip non-induction phi recipes in legalizeAndOptimizeInductions.
The body of the loop only applies to wide induction recipes, skip any other
header phi recipes up-frond
2025-01-11 20:33:02 +00:00
Austin Kerbow
657fb4433e
[AMDGPU] Add target hook to isGlobalMemoryObject (#112781)
We want special handing for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
2025-01-11 09:57:57 -08:00
David Green
ab9a80a3ad
[DAG] Allow AssertZExt to scalarize. (#122463)
With range and undef metadata on a call we can have vector AssertZExt
generated on a target with no vector operations. The AssertZExt needs to
scalarize to a normal `AssertZext tin, ValueType`. I have added
AssertSext too, although I do not have a test case.

Fixes #110374
2025-01-11 16:29:06 +00:00
Marius Kamp
1eed46960c
[AArch64] Eliminate Common Subexpression of CSEL by Reassociation (#121350)
If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and a the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).

Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgR

Fixes #119606.
2025-01-11 16:26:11 +00:00
Simon Pilgrim
b622cc67d0 [X86] LowerCTPOP - check if the operand is a constant when collecting KnownBits
Under certain circumstances, lowering of other instructions can result in computeKnownBits being able to detect a constant that it couldn't previously.

Fixes #122580
2025-01-11 13:41:50 +00:00
Mingjie Xu
876fa60f08
[TySan] Skip instrumentation for function declarations (#122488)
Skip function declarations for instrumentation.

Fixes https://github.com/llvm/llvm-project/issues/122467
2025-01-11 20:15:21 +08:00
Amr Hesham
642e493d4d
[InstCombine] Convert fshl(x, 0, y) to shl(x, and(y, BitWidth - 1)) when BitWidth is pow2 (#122362)
Convert `fshl(x, 0, y)` to `shl(x, and(y, BitWidth - 1))` when BitWidth
is pow2

Alive2 proof: https://alive2.llvm.org/ce/z/3oTEop
Fixes: #122235
2025-01-11 11:48:05 +01:00
Ramkumar Ramachandra
f38c40bff3
VT: teach isImpliedCondMatchingOperands about samesign (#122474)
Move isImplied{True,False}ByMatchingCmp from CmpInst to ICmpInst, so
that it can operate on CmpPredicate instead of CmpInst::Predicate, and
teach it about samesign. There are two callers of this function, and we
choose to migrate the one in ValueTracking, namely
isImpliedCondMatchingOperands to CmpPredicate, hence teaching it about
samesign, with visible test impact.
2025-01-11 09:08:57 +00:00
Michael Clark
212cba0ef3
[X86] Correct the cdisp8 encoding for VSCATTER/VGATHER prefetch (#122051)
during differential fuzzing, I found 8 more instructions with disp8
offset multiplier differences to binutils. somewhat sure there is a bug
in the X86 LLVM disp8 offset multipliers for this subset of vector
scatter and gather prefetch instructions. please check and refer to the
previous pull request: https://github.com/llvm/llvm-project/pull/120340

these vector scatter and gather prefetch instructions also have an
unusual k mask operand position but I have not addressed this with this
patch as I am unsure how to change the Intel format in the tablegen
file.

```
hex:	62 f2 fd 49 c6 4c 51 01
llvm:	vgatherpf0dpd	{k1}, zmmword ptr [rcx + 2*ymm2 + 4]
ours:	vgatherpf0dpd	qword ptr * 8 [rcx + 2*ymm2 + 8] {k1}
gnu:	vgatherpf0dpd 	QWORD PTR [rcx+ymm2*2+0x8]{k1}

hex:	62 f2 7d 49 c7 4c 51 01
llvm:	vgatherpf0qps	{k1}, ymmword ptr [rcx + 2*zmm2 + 8]
ours:	vgatherpf0qps	dword ptr * 8 [rcx + 2*zmm2 + 4] {k1}
gnu:	vgatherpf0qps	DWORD PTR [rcx+zmm2*2+0x4]{k1}

hex:	62 f2 fd 49 c6 54 51 01
llvm:	vgatherpf1dpd	{k1}, zmmword ptr [rcx + 2*ymm2 + 4]
ours:	vgatherpf1dpd	qword ptr * 8 [rcx + 2*ymm2 + 8] {k1}
gnu:	vgatherpf1dpd	QWORD PTR [rcx+ymm2*2+0x8]{k1}

hex:	62 f2 7d 49 c7 54 51 01
llvm:	vgatherpf1qps	{k1}, ymmword ptr [rcx + 2*zmm2 + 8]
ours:	vgatherpf1qps	dword ptr * 8 [rcx + 2*zmm2 + 4] {k1}
gnu:	vgatherpf1qps	DWORD PTR [rcx+zmm2*2+0x4]{k1}

hex:	62 f2 fd 49 c6 6c 51 01
llvm:	vscatterpf0dpd	{k1}, zmmword ptr [rcx + 2*ymm2 + 4]
ours:	vscatterpf0dpd	qword ptr * 8 [rcx + 2*ymm2 + 8] {k1}
gnu:	vscatterpf0dpd	QWORD PTR [rcx+ymm2*2+0x8]{k1}

hex:	62 f2 7d 49 c7 6c 51 01
llvm:	vscatterpf0qps	{k1}, ymmword ptr [rcx + 2*zmm2 + 8]
ours:	vscatterpf0qps	dword ptr * 8 [rcx + 2*zmm2 + 4] {k1}
gnu:	vscatterpf0qps	DWORD PTR [rcx+zmm2*2+0x4]{k1}

hex:	62 f2 fd 49 c6 74 51 01
llvm:	vscatterpf1dpd	{k1}, zmmword ptr [rcx + 2*ymm2 + 4]
ours:	vscatterpf1dpd	qword ptr * 8 [rcx + 2*ymm2 + 8] {k1}
gnu:	vscatterpf1dpd QWORD PTR [rcx+ymm2*2+0x8]{k1}

hex:	62 f2 7d 49 c7 74 51 01
llvm:	vscatterpf1qps	{k1}, ymmword ptr [rcx + 2*zmm2 + 8]
ours:	vscatterpf1qps	dword ptr * 8 [rcx + 2*zmm2 + 4] {k1}
gnu:	vscatterpf1qps DWORD PTR [rcx+zmm2*2+0x4]{k1}
```
2025-01-11 16:51:20 +08:00
Craig Topper
0384069d6c [AArch64] Use GenericTable PrimaryKey to remove some SearchIndexes. NFC 2025-01-10 23:04:28 -08:00
Fangrui Song
7c886d5d92 PassTimingInfo: test TheTimeInfo first. NFC
TheTimeInfo is a member variable and is often non-null, allowing the
caller `getPassTimer` to skip one check.
2025-01-10 22:12:47 -08:00
Craig Topper
a418eb1c0d [ARM] Use GenericTable PrimaryKey to remove one of the SearchIndexes for BankedRegsList. NFC 2025-01-10 22:04:42 -08:00
Veera
2d5f07c828
[InstCombine] Fold X udiv Y to X lshr cttz(Y) if Y is a power of 2 (#121386)
Fixes #115767

This PR folds `X udiv Y` to `X lshr cttz(Y)` if Y is a power of two
since bitwise operations are faster than division.

Proof: https://alive2.llvm.org/ce/z/qHmLta
2025-01-11 13:56:13 +08:00
Fangrui Song
24bd9bc0b5 [Timer] Remove signpots overhead on unsupported systems
startTimer/stopTimer are frequently called. It's important to reduce
overhead.
2025-01-10 20:46:57 -08:00
Teresa Johnson
799955eb17
[ThinLTO] Skip opt pipeline and summary wrapper pass on empty modules (#120143)
Follow up to PR118508, to avoid unnecessary compile time for an empty
combind regular LTO module if all modules end up being ThinLTO only.

This required minor changes to a few tests to ensure they weren't empty.
2025-01-10 19:33:20 -08:00
Fangrui Song
0de18e72c6
-ftime-report: reorganize timers
The code generation time is unclear in the -ftime-report output:

* The two clang timers "Code Generation Time" and "LLVM IR Generation
  Time" are in the default group "Miscellaneous Ungrouped Timers".
* There is also a "Clang front-end time" group, which actually includes
  code generation time.

```
===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0611 (  1.7%)   0.0099 (  4.4%)   0.0710 (  1.9%)   0.0713 (  1.9%)  LLVM IR Generation Time
   3.5140 ( 98.3%)   0.2165 ( 95.6%)   3.7306 ( 98.1%)   3.7342 ( 98.1%)  Code Generation Time
   3.5751 (100.0%)   0.2265 (100.0%)   3.8016 (100.0%)   3.8055 (100.0%)  Total
...
===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 3.9108 seconds (3.9146 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   3.6802 (100.0%)   0.2306 (100.0%)   3.9108 (100.0%)   3.9146 (100.0%)  Clang front-end timer
   3.6802 (100.0%)   0.2306 (100.0%)   3.9108 (100.0%)   3.9146 (100.0%)  Total
```

This patch

* renames "Clang front-end time report" (FrontendAction time) to "Clang
  time report",
* renames "Clang front-end" to "Front end",
* moves "LLVM IR Generation" into the group,
* replaces "Code Generation time" with "Optimizer" (middle end) and
  "Machine code generation" (back end).

```
% clang -c sqlite3.i -w -ftime-report -mllvm -sort-timers=0
...
===-------------------------------------------------------------------------===
                               Clang time report
===-------------------------------------------------------------------------===
  Total Execution Time: 1.5922 seconds (1.5972 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.5107 ( 35.9%)   0.0105 (  6.2%)   0.5211 ( 32.7%)   0.5222 ( 32.7%)  Front end
   0.2464 ( 17.3%)   0.0340 ( 20.0%)   0.2804 ( 17.6%)   0.2814 ( 17.6%)  LLVM IR generation
   0.6240 ( 43.9%)   0.1235 ( 72.7%)   0.7475 ( 47.0%)   0.7503 ( 47.0%)  Machine code generation
   0.0413 (  2.9%)   0.0018 (  1.0%)   0.0431 (  2.7%)   0.0433 (  2.7%)  Optimizer
   1.4224 (100.0%)   0.1698 (100.0%)   1.5922 (100.0%)   1.5972 (100.0%)  Total
```

Pull Request: https://github.com/llvm/llvm-project/pull/122225
2025-01-10 19:25:18 -08:00
GeorgeHuyubo
9b528ed380
Debuginfod cache use index cache settings and include real file name (#120814)
This PR include two changes:
1. Change debuginfod cache file name to include origin file name, the
new file name would be something like:

llvmcache-13267c5f5d2e3df472c133c8efa45fb3331ef1ea-liblzma.so.5.2.2.debuginfo.dwp
So it will provide more information in image list instead of a plain
llvmcache-123
2. Switch debuginfod cache to use lldb index cache settings. Currently
we don't have proper settings for setting the cache path or the cache
expiration time for debuginfod cache. We want to use the lldb index
cache settings, as they make sense to be in the same place and have the
same TTL.

---------

Co-authored-by: George Hu <georgehuyubo@gmail.com>
2025-01-10 18:13:46 -08:00
Vitaly Buka
8af4d206e0
[NFCI][BoundsChecking] Apply nosanitize on local-bounds instrumentation (#122416)
Should be NFCI as we run sanitizer, like msan, before local-bounds.
2025-01-10 18:11:19 -08:00
Mircea Trofin
6329355860
[ctxprof] Move test serialization to yaml (#122545)
We have a textual representation of contextual profiles for test scenarios, mainly. This patch moves that to YAML instead of JSON. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parse-able by the YAML reader.

A subsequent patch will address deserialization.

(thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)
2025-01-10 18:04:25 -08:00
Fangrui Song
af4d76d909
[Support] Reduce globaal variable overhead after #121663
* Construct frequently-accessed TimerLock/DefaultTimerGroup early to
  reduce overhead.
* Rename `aquireDefaultGroup` to `acquireTimerGlobals` and restore
  ManagedStatic::claim. https://reviews.llvm.org/D76099

* Drop mtg::. We use internal linkage, so mtg:: is unneeded and might
  mislead users. In addition, llvm/ code almost never introduces a named
  namespace not in llvm::. Drop mtg::.
* Replace some unique_ptr with optional to reduce overhead.
* Switch to `functionName()`.
* Simplify `llvm::initTimerOptions` and `TimerGroup::constructForStatistics()`

Pull Request: https://github.com/llvm/llvm-project/pull/122429
2025-01-10 17:59:28 -08:00