"This image comes from the POV of the person who is asking you questions."
Instead of "a person holding something", "you are holding xyz" would be better.