The systems they tested against the LLMs are mostly used as one part of a larger system. A fairer comparison would use something like MinerU [1] and a proper benchmark such as OHR Bench [2] and Reducto's table bench [3]. This paper is really bad...