If you have Parquet on S3, an engine like Dremio (or any engine built on Arrow) can give you impressive performance. Key open-source innovations in data analytics and data lakes:
Arrow - columnar in-memory format;
Gandiva - LLVM-based execution kernel;
Arrow Flight - wire protocol based on Arrow;
Project Nessie - a Git-like workflow for data lakes
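
To make the "columnar in-memory format" point concrete, here is a toy sketch in plain Python. It does not use Arrow's API; the sample data and the names `rows`, `columns`, `total_amount_rows`, and `total_amount_columns` are made up for illustration. It only shows the general idea that an aggregate over one column can scan a single contiguous typed buffer instead of touching every field of every record.

```python
from array import array

# Hypothetical sample data: (user_id, amount) records.
rows = [(1, 9.5), (2, 20.0), (3, 0.5), (4, 12.0)]


def total_amount_rows(records):
    # Row-oriented layout: every record is visited in full,
    # even though only the second field is needed.
    return sum(amount for _user_id, amount in records)


# Column-oriented layout: one contiguous, typed buffer per column.
columns = {
    "user_id": array("q", (r[0] for r in rows)),  # 64-bit signed ints
    "amount": array("d", (r[1] for r in rows)),   # 64-bit floats
}


def total_amount_columns(cols):
    # Only the "amount" buffer is scanned; "user_id" is never read.
    return sum(cols["amount"])


total_rows = total_amount_rows(rows)
total_cols = total_amount_columns(columns)
print(total_rows, total_cols)
```

Both functions return the same total; the difference is in how much memory each one has to walk through. Real engines get further gains from this layout (vectorized execution, compression, zero-copy sharing between processes), which this sketch does not attempt to show.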