Although it is more difficult in 3D, it is still a very solvable problem. It won't always work, but it should in the vast majority of cases, as long as the camera angle is not very low.
Indeed, the checkerboard pattern itself gives valuable information about the board's pose; it is even used for camera calibration in multi-view geometry: https://en.wikipedia.org/wiki/Chessboard_detection
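Here is a minimal plain-Python sketch of how the board's four outer corners pin down a perspective mapping (a homography) from the photo to an ideal 8x8 grid. It assumes the corners have already been located in the image, and the function names are hypothetical. OpenCV's `cv2.getPerspectiveTransform` performs the same four-point computation. A homography only models the flat board plane, so the tops of tall pieces still project onto neighboring squares. That is one reason very low camera angles are harder.

```python
# Minimal sketch (assumption: the four board corners are already detected).
# Estimates the 3x3 homography H that maps image points onto board
# coordinates, where the board spans (0, 0) to (8, 8) in units of squares.

def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Direct linear transform from exactly 4 point pairs, with h33 fixed to 1."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31*x + h32*y + 1) = h11*x + h12*y + h13, same for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def to_board(h, x, y):
    """Map an image point (x, y) into board coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For example, with made-up corners `[(120, 80), (520, 90), (600, 400), (40, 390)]` mapped to `[(0, 0), (8, 0), (8, 8), (0, 8)]`, taking `math.floor` of the coordinates returned by `to_board` gives the column and row index of the square under any image point on the board.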
I wonder how much easier it would be if you narrowed down the likely set of pieces for each square. Some placements are obviously invalid (a bishop on the wrong colored square), but there are probably many others that are so uncommon they could be discounted.
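One way to narrow things down is to multiply a per-square classifier's scores by a prior over which pieces can plausibly stand there, then renormalize. The sketch below assumes a hypothetical classifier that outputs a dict of label probabilities per square. It only hard-zeros pawns on the first and eighth ranks, which are never legal. A bishop on either color can arise after promotion, so it is not ruled out. Soft weights for "rare but legal" placements would need to come from real game data and are left as a hook.

```python
# Minimal sketch: reweight classifier output with a placement prior.
# Labels use FEN letters (uppercase = white, lowercase = black) plus "empty".

def prior(label, rank):
    """Hypothetical prior weight for `label` on a square in `rank` (1..8).

    Only one hard rule is encoded here. Soft weights below 1.0 for rare
    placements could be added, but would have to be estimated from data.
    """
    if label in ("P", "p") and rank in (1, 8):
        return 0.0  # pawns can never stand on the first or last rank
    return 1.0


def rescore(scores, rank):
    """scores: dict mapping label -> classifier probability for one square."""
    weighted = {lab: s * prior(lab, rank) for lab, s in scores.items()}
    total = sum(weighted.values())
    if total == 0:
        return dict(scores)  # prior ruled out every candidate; keep raw scores
    return {lab: w / total for lab, w in weighted.items()}
```

For a square on rank 8 scored `{"P": 0.6, "B": 0.3, "empty": 0.1}`, the pawn drops to 0 and the remaining mass is split 0.75 / 0.25 between bishop and empty.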
What would make it easier, IMHO, is restricting it to a top-down view only. Take a photo from directly above; the program then splits the board into 8x8 squares and feeds each square into a classifier that you train on a set of hand-labeled images. Fine-tune the model as you gather more data.
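Here is a minimal plain-Python sketch of that top-down pipeline: split an already-rectified square image into 8x8 tiles and classify each tile. It assumes the image is a 2D list of grayscale values (0 to 255) whose side length is divisible by 8, with row 0 at the top of the photo. `classify` is a hypothetical placeholder, a brightness threshold standing in for the model you would train on hand-labeled tiles.

```python
# Minimal sketch: tile a top-down board image and classify each tile.

def split_board(image, n=8):
    """Return an n x n grid of tiles from a square image (list of pixel rows)."""
    step = len(image) // n
    tiles = []
    for r in range(n):
        row = []
        for c in range(n):
            tile = [line[c * step:(c + 1) * step]
                    for line in image[r * step:(r + 1) * step]]
            row.append(tile)
        tiles.append(row)
    return tiles


def classify(tile):
    """Hypothetical placeholder: 'occupied' if mean brightness is below 128."""
    pixels = [p for line in tile for p in line]
    return "occupied" if sum(pixels) / len(pixels) < 128 else "empty"


def read_board(image):
    """Label all 64 squares; row 0 is the top edge of the photo."""
    return [[classify(t) for t in row] for row in split_board(image)]
```

Swapping the placeholder for a real model leaves `split_board` unchanged. That is the appeal of the top-down restriction: every tile has the same size and viewpoint.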
Probably not much. How many placements are actually impossible once you factor in pawn promotion, though? The only ones that come to mind immediately are pawns on the first or last rank (a pawn that reaches the last rank must promote) and, for the board as a whole, anything other than exactly one king per side.
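Those hard constraints are easy to check mechanically. Here is a minimal sketch that runs them on the piece-placement field of a FEN string, where ranks are listed from 8 down to 1 and digits count empty squares. The function name is hypothetical. It only checks three things: no pawns on the first or eighth rank, exactly one king per side, and a well-formed 8x8 layout. It is a necessary condition, not a full legality check, so a recognizer could use it only to reject impossible readings.

```python
# Minimal sketch: reject board readings that violate the hard constraints.

PIECES = "PNBRQKpnbrqk"  # FEN letters: uppercase = white, lowercase = black


def hard_constraints_ok(placement):
    ranks = placement.split("/")
    if len(ranks) != 8:
        return False
    kings = {"K": 0, "k": 0}
    for i, rank in enumerate(ranks):
        rank_no = 8 - i  # first FEN rank is rank 8
        width = 0
        for ch in rank:
            if ch.isdigit():
                width += int(ch)  # run of empty squares
                continue
            if ch not in PIECES:
                return False
            width += 1
            if ch in kings:
                kings[ch] += 1
            if ch in "Pp" and rank_no in (1, 8):
                return False  # pawn on a back rank is impossible
        if width != 8:
            return False
    return kings["K"] == 1 and kings["k"] == 1
```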
Compared to all the other feats of machine learning that have blown my mind, parsing a photo of a handful of piece types in two colors, located on a grid (six types times two colors plus "empty" is just 13 classes per square), doesn't seem too difficult.