If a `do concurrent` loop is offloaded then there should be no CUDA data
transfer in it. Update the semantic and lowering to take that into
account.
`AssignmentChecker` has to be put into a separate pass because the
checkers in `SemanticsVisitor` cannot have the same `Enter/Leave`
functions. The `DoForallChecker` already has `Eneter/Leave` functions
for the `DoConstruct`.
In section 3.4.2, some example of illegal data transfer using expression
are given. One of it is when multiple device objects are part of an
expression in the rhs. Current implementation allow a single device
object in such case. This patch adds a similar restriction.