We currently generate code like this on x86 for a jump table with 5 elements,
assuming the call target is in rbx:
lea global_addr(%rip), %rax # initialize temporary rax with base address
mov %rbx, %rcx # initialize another temporary rcx for index (rbx will be used for the call, so it is still live)
sub %rax, %rcx # compute `address - base`
ror $0x3, %rcx # compute `(address - base) ror 3` i.e. index
cmp $0x4, %rcx # check index <= 4
ja .Ltrap
[...]
.Ltrap:
ud1
A more efficient instruction sequence, that only needs one temporary
register and one fewer instruction, is possible by subtracting the
address we are testing from the fixed address instead of vice versa:
lea (global_addr + 4*8)(%rip), %rax # initialize temporary rax with address of last element
sub %rbx, %rax # compute `last element - address`
ror $0x3, %rax # compute `(last element - address) ror 3` i.e. 4 - index
cmp $0x4, %rax # check 4 - index <= 4 (same as above)
ja .Ltrap
[...]
.Ltrap:
ud1
Change LowerTypeTests to generate that sequence. As a consequence, the
order of bits in the bitsets is reversed. Because it doesn't matter how we
do the subtraction on other architectures (to the best of my knowledge),
do so unconditionally.
Reviewers: fmayer, vitalybuka
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/142887