The pivot used to fix divisibility in Smith normal form is stale. This does not affect correctness, but it can reduce efficiency because the outer loop runs more iterations than necessary. Thanks to @benquike for discovering this.
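
For readers unfamiliar with the divisibility step, here is a minimal Python sketch of a textbook Smith normal form reduction over the integers. It is not the patched code: the function name `smith_normal_form` and the overall structure are assumptions made for illustration. The point it shows is that the pivot is read from the matrix after elimination has finished, immediately before the divisibility check, instead of being cached from before the row and column operations.

```python
def smith_normal_form(mat):
    """Return the Smith normal form of an integer matrix (list of lists).

    Illustrative sketch only; it does not track the transformation matrices.
    """
    a = [row[:] for row in mat]
    m = len(a)
    n = len(a[0]) if m else 0
    for t in range(min(m, n)):
        while True:
            # Pick the nonzero entry of smallest absolute value in a[t:, t:].
            best = None
            for i in range(t, m):
                for j in range(t, n):
                    if a[i][j] != 0 and (
                        best is None or abs(a[i][j]) < abs(a[best[0]][best[1]])
                    ):
                        best = (i, j)
            if best is None:
                return a  # the remaining submatrix is all zeros
            bi, bj = best
            a[t], a[bi] = a[bi], a[t]            # move it to row t
            for row in a:                        # and to column t
                row[t], row[bj] = row[bj], row[t]

            dirty = False
            # Clear column t below the pivot with row operations.
            for i in range(t + 1, m):
                q = a[i][t] // a[t][t]
                if q:
                    for k in range(t, n):
                        a[i][k] -= q * a[t][k]
                if a[i][t] != 0:
                    dirty = True
            # Clear row t right of the pivot with column operations.
            for j in range(t + 1, n):
                q = a[t][j] // a[t][t]
                if q:
                    for i in range(t, m):
                        a[i][j] -= q * a[i][t]
                if a[t][j] != 0:
                    dirty = True
            if dirty:
                continue  # a smaller remainder exists; pick a new pivot

            # Read the pivot now, after elimination. A value cached before
            # the operations above could be stale.
            pivot = a[t][t]
            bad_row = None
            for i in range(t + 1, m):
                if any(a[i][j] % pivot != 0 for j in range(t + 1, n)):
                    bad_row = i
                    break
            if bad_row is None:
                break  # pivot divides every remaining entry
            # Fold the offending row into row t; the next pass shrinks the pivot.
            for k in range(t, n):
                a[t][k] += a[bad_row][k]
        if a[t][t] < 0:
            a[t] = [-x for x in a[t]]  # normalize the diagonal sign
    return a


if __name__ == "__main__":
    print(smith_normal_form([[2, 0], [0, 3]]))
```

In this sketch a stale pivot would not produce a wrong result, since any failed divisibility check only triggers another pass of the loop, but it could trigger passes that are not needed, which matches the efficiency concern described above.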
has_trait
mgpumoduleLoadJIT