Sure you can. You're just adding two (probably normal-ish) distributions: the distribution of human response times for processing the chess move, and the distribution of human response times for parsing the instructions.
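A quick simulation makes the "adding two distributions" point concrete. This sketch assumes the two components are independent and log-normally distributed. The parameter values and the helper name `simulate_total_times` are made up for illustration. For independent components the means add and the variances add, whatever shape the individual distributions have.

```python
import random
import statistics


def simulate_total_times(n, parse_params, chess_params, seed=0):
    """Return (parse, chess, total) samples of response times in seconds.

    Assumption: parse time and chess-move time are independent and each is
    log-normal; parse_params and chess_params are hypothetical (mu, sigma)
    pairs for the underlying normal distribution.
    """
    rng = random.Random(seed)
    parse = [rng.lognormvariate(*parse_params) for _ in range(n)]
    chess = [rng.lognormvariate(*chess_params) for _ in range(n)]
    total = [p + c for p, c in zip(parse, chess)]
    return parse, chess, total


parse, chess, total = simulate_total_times(20_000, (1.0, 0.3), (2.5, 0.8))

# Means add exactly (linearity of the sample mean).
print(statistics.fmean(total), statistics.fmean(parse) + statistics.fmean(chess))
# Variances add up to the sampling noise in the (near-zero) covariance term.
print(statistics.variance(total), statistics.variance(parse) + statistics.variance(chess))
```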
You are right, of course, and I knew I was going to regret using even the toned-down term "probably normal-ish". The question is whether one can meaningfully ask which text is faster to parse, despite the "noise" that comes from the time it takes to actually make the chess move. I suspect the answer will be 'yes'.
I think the answer will be 'no': solving a chess problem is a learned skill, far less innate than reading and understanding text, so the difference in comprehension time between two similar text passages is << the spread in chess-solving time between low-skill and high-skill users.
Why would the distribution of user skill differ across test groups? There's no reason it should. The expected time to solve the puzzle itself would be the same across both groups.
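The random-assignment argument can be checked with a simulation. This is a sketch under assumed, illustrative parameters (all `(mu, sigma)` values and the helper names are hypothetical): both groups draw chess times from the same skill distribution, and only the parse-time distribution differs. With these numbers the chess component's standard deviation is about 25 times the true parsing difference. That inflates the standard error but does not bias the difference in means.

```python
import math
import random
import statistics


def lognormal_mean(mu, sigma):
    # Mean of a log-normal distribution whose underlying normal is N(mu, sigma^2).
    return math.exp(mu + sigma**2 / 2)


def run_experiment(n_per_group, parse_a, parse_b, chess, seed=1):
    """Simulate total times for two randomly assigned groups.

    Every participant's chess time comes from the SAME hypothetical skill
    distribution `chess` in both groups (this is what random assignment buys);
    only the instruction text, and hence the parse-time distribution, differs.
    Returns (difference in mean total time, its standard error).
    """
    rng = random.Random(seed)

    def group(parse_params):
        return [rng.lognormvariate(*parse_params) + rng.lognormvariate(*chess)
                for _ in range(n_per_group)]

    a, b = group(parse_a), group(parse_b)
    diff = statistics.fmean(b) - statistics.fmean(a)
    # Standard error of a difference in means (Welch-style, unpooled variances).
    se = math.sqrt(statistics.variance(a) / n_per_group
                   + statistics.variance(b) / n_per_group)
    return diff, se


parse_a, parse_b, chess = (1.0, 0.3), (1.2, 0.3), (2.5, 0.8)
true_diff = lognormal_mean(*parse_b) - lognormal_mean(*parse_a)
diff, se = run_experiment(100_000, parse_a, parse_b, chess)
print(f"true parse-time difference: {true_diff:.3f} s")
print(f"estimated difference: {diff:.3f} s (SE {se:.3f})")
```

The catch is sample size rather than bias. The standard error scales with the chess-time variance, so the number of participants needed to see a small parsing difference grows with how spread out chess skill is.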
In any case, they are typically long-tailed and weird-looking. Not normal, at least in my experience of working with them.
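Given long tails like these, one option that does not lean on normality is a permutation test. Below is a minimal sketch. It compares groups on median total time, a choice made here because medians are less sensitive to the long right tail. The function name, defaults and example parameters are hypothetical. The null distribution is built by repeatedly shuffling the group labels.

```python
import random
import statistics


def permutation_test_median(a, b, n_perm=2000, seed=2):
    """Two-sided permutation test for a difference in medians between a and b.

    No normality assumption: the null distribution comes from reshuffling the
    pooled data into groups of the original sizes. Returns (observed, p_value).
    """
    rng = random.Random(seed)
    observed = statistics.median(b) - statistics.median(a)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.median(pooled[n_a:]) - statistics.median(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Usage on hypothetical long-tailed (log-normal) response times, in seconds.
rng = random.Random(3)
a = [rng.lognormvariate(2.5, 0.8) for _ in range(300)]
b = [rng.lognormvariate(2.7, 0.8) for _ in range(300)]
obs, p = permutation_test_median(a, b)
print(f"median difference {obs:.2f} s, permutation p = {p:.3f}")
```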