llvm-project

Leandro Lacerda 75bf739208

[libc][gpu] Disable loop unrolling in the throughput benchmark loop (#153971)

This patch makes GPU throughput benchmark results more comparable across
targets by disabling loop unrolling in the benchmark loop.

Motivation:
* PTX (post-LTO) evidence on NVPTX: for libc `sin`, the generated PTX
shows the `throughput` loop unrolled 8x at `N=128` (one iteration
advances the input pointer by 64 bytes = 8 doubles), interleaving eight
independent chains before the back-edge. This hides latency and
significantly reduces cycles/call as the batch size `N` grows.
* Observed scaling (NVPTX measurements): with unrolling enabled, `sin`
dropped from ~3,100 cycles/call at `N=1` to ~360 at `N=128`. With
`#pragma clang loop unroll(disable)` enforced, results were far more
stable (~3,100 cycles/call at `N=1` vs. ~2,700 at `N=128`).
* libdevice contrast: the libdevice `sin` path did not exhibit a similar
drop in our measurements, and the PTX appears as compact internal calls
rather than a long FMA chain, leaving less ILP for the outer loop to
extract.
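
The arithmetic behind these bullets can be checked with a short sketch. The helper names below are hypothetical and are not part of the patch; the cycle counts are the approximate figures quoted above, and an 8-byte `double` is assumed.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements one loop iteration covers,
// given how far the input pointer advances per iteration (in bytes).
constexpr std::size_t elements_per_iteration(std::size_t stride_bytes,
                                             std::size_t element_size) {
  return stride_bytes / element_size;
}

// Hypothetical helper: how much cheaper one call becomes at batch size N
// compared with N=1 (a ratio of cycles/call figures).
constexpr double batch_speedup(double cycles_at_n1, double cycles_at_n) {
  return cycles_at_n1 / cycles_at_n;
}

// 64 bytes per iteration with 8-byte doubles means 8 elements, i.e. 8x unrolling.
static_assert(elements_per_iteration(64, 8) == 8, "64 bytes = 8 doubles");
```

With the quoted figures, `batch_speedup(3100.0, 360.0)` is roughly 8.6, in line with the 8x unroll factor, while `batch_speedup(3100.0, 2700.0)` is roughly 1.15 once unrolling is disabled.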

What this change does:
* Applies `#pragma clang loop unroll(disable)` to the GPU `throughput()`
loop in both NVPTX and AMDGPU backends.
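
A minimal sketch of how the pragma attaches to a loop, for illustration only. The `work` and `run_batch` names are hypothetical stand-ins and do not reproduce the actual `throughput()` code; the `__clang__` guard is an addition here so the snippet also builds with other compilers.

```cpp
#include <cstddef>

// Hypothetical stand-in for the function being benchmarked (not libc `sin`).
static double work(double x) { return x * 2.0 + 1.0; }

// Hypothetical batch loop: one call to `work` per iteration over n inputs.
double run_batch(const double *inputs, std::size_t n) {
  double sink = 0.0;
  // The pragma must directly precede the loop it applies to. It asks Clang
  // not to unroll this loop, so each iteration performs exactly one call.
#if defined(__clang__)
#pragma clang loop unroll(disable)
#endif
  for (std::size_t i = 0; i < n; ++i)
    sink += work(inputs[i]);
  return sink;
}
```

The pragma changes only how the loop is compiled, not what it computes, so results are identical with or without it.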

Leaving unrolling entirely to the optimizer skews comparisons that are
meant to be apples-to-apples (e.g., libc vs. vendor), since one
implementation may benefit from unrolling far more than another.
Disabling unrolling yields fairer, more consistent numbers.

2025-08-16 20:14:26 +00:00

AOR_v20.02

…

benchmarks

[libc][gpu] Disable loop unrolling in the throughput benchmark loop (#153971)

2025-08-16 20:14:26 +00:00

cmake

Revert "[libc] Add -Wextra for libc tests" (#153169)

2025-08-12 11:40:14 +00:00

config

[libc][math][c++23] Add bf16fma{,f,l,f128} math functions (#153231)

2025-08-13 23:26:15 +05:30

docs

[libc][math][docs] Add documentation for BFloat16 type (#153475)

2025-08-15 20:07:33 +05:30

examples

[libc] Fix broken links in libc (#145199)

2025-06-23 15:51:43 -07:00

fuzzing

[libc] Fuzz tests for fsqrt, f16sqrt, and hypot (#150489)

2025-07-25 17:15:26 +00:00

hdr

[libc] Add struct_sched_param proxy header (#151722)

2025-08-01 11:34:06 -07:00

include

[libc] Fix typo and amend restrict qualifier (#152410)

2025-08-07 16:45:14 -07:00

lib

[libc] Fix building bitcode library for GPU (#100491)

2024-07-26 13:17:17 -05:00

shared

[libc][math] Refactor coshf implementation to header-only in src/__support/math folder. (#153427)

2025-08-14 17:19:47 +03:00

src

[libc][math] Refactor coshf implementation to header-only in src/__support/math folder. (#153427)

2025-08-14 17:19:47 +03:00

startup

[libc] Add startup code for ARM v7-A, ARM v7-R variants (#153576)

2025-08-15 09:17:50 +00:00

test

[libc] Fix mbrtowc test (#153721)

2025-08-15 11:44:33 -03:00

utils

[libc][math][c++23] Add bf16fma{,f,l,f128} math functions (#153231)

2025-08-13 23:26:15 +05:30

.clang-tidy

[libc] fix readability-identifier-naming.ConstexprFunctionCase (#83345)

2024-02-28 14:52:02 -08:00

.gitignore

…

CMakeLists.txt

[libc] Add hooks for extra options in running hermetic tests (#147931)

2025-07-15 11:43:51 +01:00

LICENSE.TXT

…

Maintainers.rst

[libc] Add myself as maintainer for Public Headers / hdrgen (#135209)

2025-04-11 11:33:52 -07:00

README.txt

…

README.txt

LLVM libc
=========

This directory and its subdirectories contain source code for llvm-libc,
a retargetable implementation of the C standard library.

LLVM is open source software. You may freely distribute it under the terms of
the license agreement found in LICENSE.TXT.