Sometimes the PDF of a book is big because the book is packed with important illustrations and charts, as in a textbook or a journal paper.
Other times a PDF of a book is big because someone scanned it and didn't have trustworthy OCR, so they figured distributing images of text at 1.5 MB per page was better than risking OCR errors.
You clean up the data after you acquire it, not before.