There's already software that does this: https://github.com/master-of-zen/Av1an
Encoding this way should indeed improve quality slightly. Whether that is actually noticeable/measurable... I'm not sure.
I've messed around with Av1an. Keep in mind that L-SMASH, the software used for scene chunking, is only documented in Japanese [1]. It does the trick pretty well, though, as long as you're not working with huge frame sizes like HD VR, where the video dimensions can do things like crash QuickTime on a Mac.
ffmpeg and x265 allow you to do this too. frame-threads=1 will use 1 thread per frame, addressing the issue OP mentioned without a big perf penalty, in contrast to the 'pools' switch, which sets the threads to be used for encoding.
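For anyone who wants to try it, here is a rough sketch of how those two settings could be passed through ffmpeg's -x265-params option. The helper name, the file names, and the example pool size are made up for illustration, and it assumes an ffmpeg build with libx265. As far as I understand the x265 docs, frame-threads is the number of frames encoded concurrently and pools sizes the worker thread pool, so the speed impact is worth measuring on your own content.

```python
import subprocess

def x265_ffmpeg_args(src, dst, frame_threads=1, pools=None):
    """Build an ffmpeg command line that forwards settings to libx265.

    Hypothetical helper, not part of ffmpeg or x265. x265 options go in
    one -x265-params string as key=value pairs separated by ':'.
    """
    params = [f"frame-threads={frame_threads}"]
    if pools is not None:
        # Only add pools when asked, so x265 keeps its default otherwise.
        params.append(f"pools={pools}")
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx265",
        "-x265-params", ":".join(params),
        dst,
    ]

if __name__ == "__main__":
    # Placeholder file names; swap in real paths before running.
    cmd = x265_ffmpeg_args("input.mkv", "output.mkv")
    print(" ".join(cmd))
    # Uncomment to actually run the encode (needs ffmpeg with libx265):
    # subprocess.run(cmd, check=True)
```

Calling x265_ffmpeg_args("input.mkv", "output.mkv", pools=4) would add pools=4 to the same parameter string, which makes it easy to time both variants against each other.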
[1] http://l-smash.github.io/l-smash/