VISUAL always extends the selection and clears it on a mode switch, so it doesn't fit. It's a bit challenging to explain in a brief comment, so maybe check the "Improving on the editing model" section: https://kakoune.org/why-kakoune/why-kakoune.html#_why_kakoun...