I wouldn't say it's gentle but it certainly is a great book. Great exercise problems. Some of the proofs are so elegantly done, especially the way calculus of variations is avoided.
David MacKay's book hand-holds a little more than Cover and Thomas, although its remit is broader than just information theory.
Sorry, just saw this. The first 50 or so pages are gold, even though they aren't under a chapter proper. Then most of part IV, which gives non-physicists exposure to the statistical mechanician's worldview, especially chapters 27-33. I'm not an expert on parts I-III, so I can't make confident claims about their relative value. But every time I did look into topics covered in those chapters, I found clarity of explanation and the "how" of things by coming back to the book. The entire neural networks section contains tons of nuggets that proved both prescient and (mostly) timeless.