The options might make some difference on an older GPU: limiting each kernel to 80 registers per thread might allow more active threads on each SM, but whatever no longer fits in registers then gets spilled to slower local memory. As C0der says, try both.
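To put rough numbers on that trade-off, here is a small back-of-the-envelope sketch. The helper name `max_resident_threads` is made up for illustration, and the defaults (65536 registers and 2048 resident threads per SM, warps of 32) are assumptions that hold for many NVIDIA GPUs but not all of them, so check the figures for your card. It also ignores register allocation granularity, shared memory and block-size limits, so treat the result as an upper bound only.

```python
def max_resident_threads(regs_per_thread,
                         regs_per_sm=65536,        # assumed register file size per SM
                         max_threads_per_sm=2048,  # assumed hardware thread limit per SM
                         warp_size=32):
    """Upper bound on resident threads per SM when registers are the only limit."""
    # Registers are handed out to whole warps, so count how many warps fit.
    warps_by_regs = regs_per_sm // (regs_per_thread * warp_size)
    warps_by_hw = max_threads_per_sm // warp_size
    return min(warps_by_regs, warps_by_hw) * warp_size


if __name__ == "__main__":
    for regs in (80, 128, 255):
        threads = max_resident_threads(regs)
        print(f"{regs:3d} regs/thread -> at most {threads} threads per SM")
```

Under these assumptions a cap of 80 allows at most 800 threads per SM versus 512 at 128 registers. The sketch says nothing about what the spilled values cost, which is why measuring both settings is still the only real answer.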
On the other hand, the GPU workload for v2's shorter chain length is very light: speed for C8->CW lookup will be determined mostly by disk search anyway, so I guess you will not notice much (if any) difference overall.