51 Commits

Author SHA1 Message Date
Brox Chen
2e58f6024a
[AMDGPU][True16] t16 pseudo for mubuffer d16 load/store (#178822)
Create t16 pseudos for mubuffer d16 load/store with vgpr16 in vdst/vdata
and use these t16 pseudos in the isel patterns. Lower them back to d16
machine instructions at the MC level.
2026-02-04 10:54:11 -05:00
macurtis-amd
5d018e93fe
AMDGPU: Perform zero/any extend combine into permute (#177370)
Increases opportunities to generate permutes.
Motivated by sub-optimal code generation for a CK kernel.
2026-01-28 10:47:22 -06:00
Matt Arsenault
056e5a32c8
AMDGPU: Change ABI of 16-bit scalar values for gfx6/gfx7 (#175795)
Keep bf16/f16 values encoded as the low half of a 32-bit register,
instead of promoting to float. This avoids unwanted FP effects
from the fpext/fptrunc which should not be implied by just
passing an argument. This also fixes ABI divergence between
SelectionDAG and GlobalISel.

I've wanted to make this change for ages, and failed the last
few times. The main complication was the hack to return
shader integer types in SGPRs, which now needs to inspect
the underlying IR type.
2026-01-22 18:34:06 +00:00
Matt Arsenault
a97f5ec95f
AMDGPU: Change ABI of 16-bit element vectors on gfx6/7 (#175781)
Fix ABI on old subtargets to match new subtargets, packing
16-bit element subvectors into 32-bit registers. Previously
this would be scalarized and promoted to i32/float.

Note this only changes the vector cases. Scalar i16/half are
still promoted to i32/float for now. I've unsuccessfully tried
to make that switch in the past, so leave that for later.

This will help with removal of softPromoteHalfType.
2026-01-22 17:24:29 +01:00
Carl Ritson
1d0a85a78b
[AMDGPU][True16][CodeGen] Add patterns to reduce intermediates (#162047)
Add patterns which reduce or operations to register sequences when
combining i16 values to i32. This removes many intermediate VGPRs and
reduces register pressure.
2025-10-13 13:37:26 +09:00
Brox Chen
c50ed05cad
[AMDGPU][True16][CodeGen] use vgpr16 for zext patterns (reopen #153894) (#154211)
Recreate this patch from
https://github.com/llvm/llvm-project/pull/153894

It seems ISel silently ignores the `i64 = zext i16` with a chained
`reg_sequence` pattern, which causes a selection failure in a
HIP test. Recreate the patch with an alternative pattern and add an
.ll test, global-extload-gfx11plus.ll.
2025-08-20 10:26:49 -04:00
Brox Chen
d49aab10bd
Revert "[AMDGPU][True16][CodeGen] use vgpr16 for zext patterns (#1538… (#154163)
This reverts commit 7c53c6162bd43d952546a3ef7d019babd5244c29.

This patch hit an issue in a HIP test. Revert it now; it will be reopened later.
2025-08-18 14:01:19 -04:00
Brox Chen
7c53c6162b
[AMDGPU][True16][CodeGen] use vgpr16 for zext patterns (#153894)
Update true16 mode with zext patterns using vgpr16 for 16-bit data types.
This stops isel from inserting an invalid "vgpr32 = copy vgpr16".
2025-08-18 11:01:57 -04:00
Shilei Tian
fc0653f31c
[RFC][NFC][AMDGPU] Remove -verify-machineinstrs from llvm/test/CodeGen/AMDGPU/*.ll (#150024)
Recent upstream trends have moved away from explicitly using `-verify-machineinstrs`, as it's already covered by the expensive checks. This PR removes almost all `-verify-machineinstrs` from tests in `llvm/test/CodeGen/AMDGPU/*.ll`, leaving only those tests where its removal currently causes failures.
2025-07-23 13:42:46 -04:00
Brox Chen
5138b61a25
[AMDGPU][True16][Codegen] remove packed build_vector pattern from true16 (#148715)
Some of the packed build_vector patterns use vgpr_32 for i16/f16/bf16.

In gfx11, bf16 arithmetic gets promoted to f32, and this is done via a v2i16
pack. In true16 mode this v2i16 pack is selected to a
build_vector/v_lshlrev pattern which only accepts VGPR32. This causes
isel to insert an illegal copy "vgpr32 = copy vgpr16" between def and
use. In the end this illegal copy confuses the CSE pass and triggers
incorrect code elimination.

Remove the packed build_vector pattern from true16. After removal, ISel
will use vgpr16 build_vector patterns instead.
2025-07-18 12:55:11 -04:00
Ruiling, Song
0487db1f13
MachineScheduler: Improve instruction clustering (#137784)
The existing way of managing clustered nodes was done through adding
weak edges between the neighbouring cluster nodes, which is a sort of
ordered queue. And this will be later recorded as `NextClusterPred` or
`NextClusterSucc` in `ScheduleDAGMI`.

But actually the instruction may be picked not in the exact order of the
queue. For example, we have a queue of cluster nodes A B C. But during
scheduling, node B might be picked first, then it will be very likely
that we only cluster B and C for Top-Down scheduling (leaving A alone).

Another issue is:
```
   if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum)
      std::swap(SUa, SUb);
   if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster)))
```
may break the cluster queue.

For example, we want to cluster nodes (order as in `MemOpRecords`): 1 3 2.
1(SUa) will be pred of 3(SUb) normally. But when it comes to (3, 2),
as 3(SUa) > 2(SUb), we would reorder the two nodes, which makes 2 the
pred of 3. This makes both 1 and 2 become preds of 3, but there is no
edge between 1 and 2. Thus we get a broken cluster chain.
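The broken chain in the 1 3 2 example can be reproduced with a small standalone sketch. This is not the scheduler's code: `buildClusterEdges` and `isChain` are hypothetical helpers, and the sketch only models the swap-then-add-edge step for neighbouring pairs, assuming `ReorderWhileClustering` is false.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Edge = std::pair<unsigned, unsigned>; // (pred, succ)

// Hypothetical model: link each neighbouring pair of the cluster order,
// swapping so that the lower node number becomes the predecessor.
std::vector<Edge> buildClusterEdges(const std::vector<unsigned> &Order) {
  std::vector<Edge> Edges;
  for (std::size_t I = 0; I + 1 < Order.size(); ++I) {
    unsigned A = Order[I], B = Order[I + 1];
    assert(A != B && "a node cannot be clustered with itself");
    if (A > B)
      std::swap(A, B);
    Edges.push_back({A, B});
  }
  return Edges;
}

// Necessary condition for a single chain: no node has more than one
// cluster predecessor or more than one cluster successor.
bool isChain(const std::vector<Edge> &Edges) {
  std::map<unsigned, unsigned> NumPreds, NumSuccs;
  for (const Edge &E : Edges) {
    if (++NumSuccs[E.first] > 1 || ++NumPreds[E.second] > 1)
      return false;
  }
  return true;
}
```

For the order 1 3 2 the sketch produces the edges (1, 3) and (2, 3), so node 3 ends up with two predecessors and the chain check fails, while the order 1 2 3 passes.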

To fix both issues, we introduce an unordered set in the change. This
could help improve clustering in some hard cases.

One key reason the change causes so many test check changes is: As the
cluster candidates are not ordered now, the candidates might be picked
in different order from before.

The most affected targets are: AMDGPU, AArch64, RISCV.

For RISCV, most changes seem to be minor instruction reordering, with no
obvious regressions.

For AArch64, some combining of ldr into ldp is affected, with two cases
regressed and two improved. The deeper reason is that the machine
scheduler cannot cluster them well either before or after the change,
and the later load combine algorithm is also not smart enough.

For AMDGPU, some cases use more v_dual instructions while some
regress. This seems less critical. The test `v_vselect_v32bf16` appears
to get more buffer_load instructions claused.
2025-06-05 15:28:04 +08:00
Brox Chen
6dbc01e801
[AMDGPU][True16][CodeGen] update GFX11Plus codegen test with true16 flag (#135078)
This is an NFC patch.

This patch runs a bulk update on CodeGen tests that are impacted by the
true16 features. This patch:
1. duplicates GFX11plus runlines and applies them with
"-mattr=+real-true16" and "-mattr=-real-true16"
2. updates the tests with the update script

For some GISEL runlines, the current CodeGen does not fully support the
true16 version. Still update the runlines, but comment out the failing
ones, and add a "FIXME-TRUE16" comment to those tests for easier
tracking. These tests will be fixed in the following patches.

This is a transition state in which we support both
"+real-true16" and "-real-true16" in our code base. We plan to move to
"+real-true16" as the default, and finally remove the "-real-true16" mode and
its test lines.
2025-04-23 13:06:52 -04:00
Ana Mihajlovic
52a3247196
[AMDGPU] Select (xor i1 (divergent trunc:i32 x), -1) -> cmp_neq x, 1 (#133698)
2025-04-11 13:03:01 +02:00
Matt Arsenault
331250c6fa
AMDGPU: Replace ptr addrspace(3) undef in tests with poison (#131049)
2025-03-13 13:28:55 +07:00
Matt Arsenault
6705d812b8
AMDGPU: Replace ptr addrspace(1) undefs with poison (#130900)
Many tests use store to undef as a placeholder use, so just replace
all of these with poison.
2025-03-13 08:25:02 +07:00
Matt Arsenault
1bb43068f1
PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052)
This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle
composing subregister extracts through reg_sequence.
2025-02-22 09:16:14 +07:00
Lucas Ramirez
6206f5444f
[AMDGPU] Occupancy w.r.t. workgroup size range is also a range (#123748)
Occupancy (i.e., the number of waves per EU) depends, in addition to
register usage, on per-workgroup LDS usage as well as on the range of
possible workgroup sizes. Mirroring the latter, occupancy should
therefore be expressed as a range since different group sizes generally
yield different achievable occupancies.

`getOccupancyWithLocalMemSize` currently returns a scalar occupancy
based on the maximum workgroup size and LDS usage. With respect to the
workgroup size range, this scalar can be the minimum, the maximum, or
neither of the two of the range of achievable occupancies. This commit
fixes the function by making it compute and return the range of
achievable occupancies w.r.t. workgroup size and LDS usage; it also
renames it to `getOccupancyWithWorkGroupSizes` since it is the range of
workgroup sizes that produces the range of achievable occupancies.

Computing the achievable occupancy range is surprisingly involved.
Minimum/maximum workgroup sizes do not necessarily yield maximum/minimum
occupancies i.e., sometimes workgroup sizes inside the range yield the
occupancy bounds. The implementation finds these sizes in constant time;
heavy documentation explains the rationale behind the sometimes
relatively obscure calculations.

As a justifying example, consider a target with 10 waves / EU, 4 EUs/CU,
64-wide waves. Also consider a function with no LDS usage and a flat
workgroup size range of [513,1024].

- A group of 513 items requires 9 waves per group. Only 4 groups made up
of 9 waves each can fit fully on a CU at any given time, for a total of
36 waves on the CU, or 9 per EU. However, filling as much as possible
the remaining 40-36=4 wave slots without decreasing the number of groups
reveals that a larger group of 640 items yields 40 waves on the CU, or
10 per EU.
- Similarly, a group of 1024 items requires 16 waves per group. Only 2
groups made up of 16 waves each can fit fully on a CU at any given time,
for a total of 32 waves on the CU, or 8 per EU. However, removing as
many waves as possible from the groups without being able to fit another
equal-sized group on the CU reveals that a smaller group of 896 items
yields 28 waves on the CU, or 7 per EU.

Therefore the achievable occupancy range for this function is not [8,9]
as the group size bounds directly yield, but [7,10].
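The arithmetic in this example can be checked with a brute-force sketch. It is not the constant-time computation the commit implements: it simply tries every workgroup size in the range. The names `occupancyForGroupSize` and `occupancyRange` are hypothetical, the target parameters are the ones from the example (10 waves/EU, 4 EUs/CU, 64-wide waves), LDS usage is ignored, and rounding the per-EU figure up is an assumption that does not affect the four sizes discussed above.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical target parameters taken from the example in the message.
constexpr unsigned WaveSize = 64, WavesPerEU = 10, EUsPerCU = 4;

// Waves per EU reached when a CU is filled with whole groups of
// GroupSize work items (no LDS limit modelled).
unsigned occupancyForGroupSize(unsigned GroupSize) {
  assert(GroupSize >= 1 && GroupSize <= WaveSize * WavesPerEU * EUsPerCU);
  unsigned WavesPerGroup = (GroupSize + WaveSize - 1) / WaveSize;
  unsigned GroupsPerCU = (WavesPerEU * EUsPerCU) / WavesPerGroup;
  unsigned WavesPerCU = GroupsPerCU * WavesPerGroup;
  // Assumption: a partially filled EU still counts, so round up.
  return (WavesPerCU + EUsPerCU - 1) / EUsPerCU;
}

// Brute force: scan every group size in [MinSize, MaxSize].
std::pair<unsigned, unsigned> occupancyRange(unsigned MinSize,
                                             unsigned MaxSize) {
  unsigned Lo = WavesPerEU, Hi = 0;
  for (unsigned S = MinSize; S <= MaxSize; ++S) {
    unsigned Occ = occupancyForGroupSize(S);
    Lo = std::min(Lo, Occ);
    Hi = std::max(Hi, Occ);
  }
  return {Lo, Hi};
}
```

Under these assumptions, sizes 513 and 1024 give 9 and 8 waves per EU, sizes 640 and 896 give 10 and 7, and scanning [513,1024] yields the range [7,10].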

Naturally this change causes a lot of test churn as instruction
scheduling is driven by achievable occupancy estimates. In most unit
tests the flat workgroup size range is the default [1,1024] which,
ignoring potential LDS limitations, would previously produce a scalar
occupancy of 8 (derived from 1024) on a lot of targets, whereas we now
consider the maximum occupancy to be 10 in such cases. Most tests are
updated automatically and checked manually for sanity. I also manually
changed some non-automatically generated assertions when necessary.

Fixes #118220.
2025-01-23 16:07:57 +01:00
Jay Foad
3cf539fb04
[AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539)
Call generateWaitcnt unconditionally at the end of
SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to
generate a new waitcnt instruction it has the effect of combining or
removing redundant waitcnts that were already present. Tests show
various small improvements in waitcnt placement.
2024-04-04 10:14:16 +01:00
David Majnemer
cc13f3ba45
Correctly round FP -> BF16 when SDAG expands such nodes (#82399)
We did something pretty naive:
- round FP64 -> BF16 by first rounding to FP32
- skip FP32 -> BF16 rounding entirely
- taking the top 16 bits of a FP32 which will turn some NaNs into
infinities

Let's do this in a more principled way by rounding types with more
precision than FP32 to FP32 using round-inexact-to-odd which will negate
double rounding issues.
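The NaN problem with taking the top 16 bits, and one conventional way to round FP32 to BF16, can be sketched as follows. This is an illustrative round-to-nearest-even on raw bit patterns, not the SelectionDAG expansion from this commit, and it does not cover the FP64 round-to-odd step. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Naive conversion: keep only the top 16 bits of the FP32 encoding.
uint16_t bf16Truncate(uint32_t F32Bits) {
  return static_cast<uint16_t>(F32Bits >> 16);
}

// NaN: exponent all ones and a non-zero mantissa.
bool isNaNBits(uint32_t F32Bits) {
  return (F32Bits & 0x7F800000u) == 0x7F800000u &&
         (F32Bits & 0x007FFFFFu) != 0;
}

// Round to nearest, ties to even, keeping NaNs as NaNs.
uint16_t bf16RoundNearestEven(uint32_t F32Bits) {
  if (isNaNBits(F32Bits)) // set the quiet bit so the payload cannot vanish
    return static_cast<uint16_t>((F32Bits >> 16) | 0x0040u);
  uint32_t Lsb = (F32Bits >> 16) & 1u;        // lowest bit that survives
  uint32_t Rounded = F32Bits + 0x7FFFu + Lsb; // bias selects ties-to-even
  assert(Rounded >= F32Bits && "no wraparound for non-NaN inputs");
  return static_cast<uint16_t>(Rounded >> 16);
}
```

A NaN whose payload lives only in the low 16 bits, such as 0x7F800001, truncates to 0x7F80 (infinity), whereas the rounding version returns 0x7FC0, which is still a NaN.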
2024-02-21 12:37:02 -05:00
Krzysztof Drewniak
b497234146
[AMDGPU] Make maximum hard clause size a subtarget feature (#81287)
gfx11 chips may, in some conditions, behave incorrectly with S_CLAUSE
instructions (hard clauses) containing more than 32 operations (that is,
whose arguments exceed 0x1f). However, gfx10 targets will work
successfully with clauses of up to length 63.

Therefore, define the MaxHardClauseLength property on GCNSubtarget and
make it a subtarget feature via tablegen, thus allowing us to specify,
both now and in the future, the maximum viable size of clauses on
various hardware from the tablegen definition. If MaxHardClauseLength is
0, which is the default, the hardware does not support hard clauses.
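As a rough illustration of how such a limit could be applied, the sketch below splits a run of clausable instructions into pieces no longer than the limit. It is not the pass that inserts S_CLAUSE: `splitIntoClauses` and `clauseImmediate` are hypothetical helpers, restrictions on which instructions may share a clause are ignored, and the "length minus one" immediate is an assumption inferred from the pairing of 32 operations with 0x1f above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Split NumInsts consecutive instructions into clause lengths that do
// not exceed MaxHardClauseLength. A limit of 0 means hard clauses are
// unsupported, so no clauses are formed.
std::vector<unsigned> splitIntoClauses(unsigned NumInsts,
                                       unsigned MaxHardClauseLength) {
  std::vector<unsigned> Lengths;
  if (MaxHardClauseLength == 0)
    return Lengths;
  while (NumInsts > 0) {
    unsigned Len = std::min(NumInsts, MaxHardClauseLength);
    Lengths.push_back(Len);
    NumInsts -= Len;
  }
  return Lengths;
}

// Assumption: the S_CLAUSE argument encodes the clause length minus one.
unsigned clauseImmediate(unsigned Length) {
  assert(Length >= 1 && "an empty clause has no encoding");
  return Length - 1;
}
```

With a limit of 32, a run of 70 instructions becomes clauses of 32, 32, and 6, and the largest argument is 0x1f. With a limit of 63 the same run becomes 63 and 7.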
2024-02-15 13:58:31 -06:00
Fangrui Song
9e9907f1cf
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
  LLVM :: CodeGen/AMDGPU/fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fabs.ll
  LLVM :: CodeGen/AMDGPU/floor.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
  LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
  LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
2024-01-16 21:54:58 -08:00
Matt Arsenault
47685633a7
AMDGPU: Make v4bf16 a legal type (#76217)
Gets a few code quality improvements. A few cases are worse
from losing load narrowing.
Depends #76213 #76214 #76215
2024-01-05 08:35:07 +07:00
Matt Arsenault
460ffcddd9
AMDGPU: Make bf16/v2bf16 legal types (#76215)
Some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
larger vectors for a later change.

Depends #76213 #76214
2024-01-04 22:31:18 +07:00
Matt Arsenault
c7952d8860 AMDGPU: Add a few more bfloat codegen tests 2023-12-22 12:31:42 +07:00
Matt Arsenault
b01adc6bed AMDGPU: Strengthen some bfloat tests
Fix bitcast test, which was splitting apart phis intended to force
bitcasts that survive all the way to selection.

Disable the amdgpu-codegenprepare phi splitting, which defeats the technique
of using a phi to ensure a bitcast reaches all the way to selection. Also
add a variety of bfloat tests. These probably need revisiting to avoid the
cast folding into argument loads. Also round out set of bfloat bitcast and
ABI tests.

Add codegen tests for more bf16 operations. The promotion of these works
contrary to the comment.
2023-12-20 19:33:45 +07:00
Jeffrey Byrnes
372115fadd [AMDGPU] Precommit test for i8 vector CopyToReg handling patch
Adds test to show impact on cross block CopyToReg & CopyFromReg handling for n x i8, and shows NFC on CC

Differential Revision: https://reviews.llvm.org/D159303

Change-Id: Ib6d9802dbebe8e3245e4ccfd4a6f23357de8c480
2023-09-13 11:27:15 -07:00
Jay Foad
f2c164c815 [AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data
dependencies. The GFX10+ vscnt (VMEM store count) counter is never used
in this way. It is only used to resolve memory dependencies, and that is
handled by SIMemoryLegalizer. Hence there is no need to conservatively
wait for vscnt to be 0 on function entry and before returns.

Differential Revision: https://reviews.llvm.org/D153537
2023-07-04 12:22:38 +01:00
Ivan Kosarev
813f6a495b [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 12.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152905
2023-06-23 13:33:06 +01:00
Matt Arsenault
d85e849ff4 AMDGPU: Convert some assorted tests to opaque pointers 2022-12-01 21:40:30 -05:00
Jay Foad
f510045d82 [CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.
Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].

Differential Revision: https://reviews.llvm.org/D117298
2022-02-18 16:10:56 +00:00
Matt Arsenault
729bf9b26b AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on uses. This resulted in more argument shuffling code
anyway.

Also have the option stop implying all inputs need to be passed. This
will now rely on the amdgpu-no-* attributes to avoid passing
unnecessary values.
2021-12-04 10:49:18 -05:00
Tony
2f499b9aff [AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant caches will be bypassed.

A volatile atomic is not changed and still only bypasses caches up to
the level specified by the SyncScope operand.

Differential Revision: https://reviews.llvm.org/D94214
2021-01-09 00:52:33 +00:00
Mircea Trofin
cdfd4c5c1a [NFC] Removed unused prefixes in test/CodeGen/AMDGPU
More patches to follow. This covers the pertinent tests starting with e,
f, and g.

Differential Revision: https://reviews.llvm.org/D94124
2021-01-05 19:18:30 -08:00
Matt Arsenault
06c192d454 OpaquePtr: Bulk update tests to use typed byval
Upgrade of the IR text tests should be the only thing blocking making
typed byval mandatory. Partially done through regex and partially
manual.
2020-11-20 14:00:46 -05:00
Sebastian Neubauer
a343b9b032 Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57d8ad3ec3bcc2cff2abab7b1b933af6.

According to michel.daenzer,
> This completely broke the Mesa radeonsi driver on Navi 14. Xorg +
> xterm come up with major corruption & psychedelic colours.
2020-09-23 17:16:39 +02:00
Sebastian Neubauer
ca907bfb57 [AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the
caller or the callee can insert a waitcnt to ensure that all reads are
finished.
Calls need some time to be executed, so if the callee inserts the
waitcnt, filling the instruction buffer and waiting for memory will be
interleaved, hiding some latency. This comes at the cost of having a
waitcnt inside functions that may not be needed as no memory operations
are outstanding.

For function calls, this is already implemented. The same principle
applies to returns: If the caller inserts a waitcnt after the call, the
callee does not have to wait and the return and memory operation can be
run in parallel.

This commit implements waiting in the caller after returning from a
function call.

Differential Revision: https://reviews.llvm.org/D87674
2020-09-23 12:17:59 +02:00
Matt Arsenault
71131db689 AMDGPU: Improve <2 x i24> arguments and return value handling
This was asserting for GlobalISel. For SelectionDAG, this was
passing this on the stack. Instead, scalarize this as if it were a
32-bit vector.
2020-09-16 11:21:56 -04:00
Jonathan Roelofs
7c5d2bec76 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
Matt Arsenault
34c8b835b1 AMDGPU: Don't fix emergency stack slot at offset 0
This forced the caller to be aware of this, which is an ugly ABI
feature.

Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.

Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.

Restrict to callable functions for now to split this into 2 steps to
limit the number of test updates and in case anything breaks.

llvm-svn: 362665
2019-06-05 22:37:50 +00:00
Matt Arsenault
b812b7a45e AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.

Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.

The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.

Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.

Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.

llvm-svn: 362661
2019-06-05 22:20:47 +00:00
Tim Renouf
361b5b2193 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
2019-03-21 12:01:21 +00:00
Matt Arsenault
57b5966dad DAG: Handle odd vector sizes in calling conv splitting
This already worked if only one register piece was used,
but didn't if a type was split into multiple, unequal
sized pieces.

Fixes not splitting v3i16/v3f16 into two registers for
AMDGPU.

This will also allow fixing the ABI for 16-bit vectors
in a future commit so that it's the same for all subtargets.

llvm-svn: 341801
2018-09-10 11:49:23 +00:00
Matt Arsenault
8f9dde94b7 AMDGPU: Stop wasting argument registers with v3i32/v3f32
SelectionDAGBuilder widens v3i32/v3f32 arguments
to v4i32/v4f32 which consume an additional register.
In addition to wasting argument space, this produces extra
instructions since now it appears the 4th vector component has
a meaningful value to most combines.

llvm-svn: 338197
2018-07-28 14:11:34 +00:00
Matt Arsenault
02dc7e19e2 AMDGPU: Make v4i16/v4f16 legal
Some image loads return these, and it's awkward working
around them not being legal.

llvm-svn: 334835
2018-06-15 15:15:46 +00:00
Yaxun Liu
2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Matt Arsenault
84445dd13c AMDGPU: Use gfx9 carry-less add/sub instructions
llvm-svn: 319491
2017-11-30 22:51:26 +00:00
Matt Arsenault
9a7e29ae91 AMDGPU: Use stricter regexes for add instructions
Match the entire _co as one optional piece rather than
a set of characters to match multiple times.

llvm-svn: 319275
2017-11-29 02:25:14 +00:00
Dmitry Preobrazhensky
a0342dc9eb [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765

Reviewers: tamazov, SamWot, arsenm, vpykhtin

Differential Revision: https://reviews.llvm.org/D40088

llvm-svn: 318675
2017-11-20 18:24:21 +00:00
Matt Arsenault
d1867c0345 AMDGPU: Don't place arguments in emergency stack slot
When finding the fixed offsets for function arguments,
this needs to skip over the 4 bytes reserved for the
emergency stack slot.

llvm-svn: 309776
2017-08-02 00:59:51 +00:00
Matt Arsenault
b34635550a AMDGPU: Return correct type during argument lowering
The type needs to be cast back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.

Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.

t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
2017-07-15 05:52:59 +00:00