Hacker News
by johnnunn 1365 days ago
I have a use case where my company's application logs will be shipped to S3 in a directory structure such as application/timestamp(one_hour)_logs.parquet. We want to build a simple developer-focused UI where we can query a given application for a time range, retrieve the S3 blobs in that range, and brute-force search them for the desired string. I see that roapi offers a REST interface for a fixed set of files, but I would like to dynamically glob newer files. Are there any alternatives that can be used too? Thanks
3 comments

If you're already using parquet, it might be worth looking at the concept of datasets e.g. https://arrow.apache.org/docs/python/generated/pyarrow.parqu...
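A rough sketch of how the dataset approach could look for the layout in the question, assuming file names like application/20240101T13_logs.parquet (the timestamp format is a guess) and a string column called "message" (also a guess). The helper names files_in_range and search_logs are made up for illustration. The dataset is rediscovered on every call, so newer files get picked up without a fixed file list:

```python
from datetime import datetime
from pathlib import PurePosixPath

# Assumed file name layout: <application>/<YYYYmmddTHH>_logs.parquet
HOUR_FORMAT = "%Y%m%dT%H"


def files_in_range(paths, start, end):
    """Keep the paths whose hourly timestamp falls in [start, end).

    `start` should be floored to the hour, since each file covers one hour.
    """
    selected = []
    for path in paths:
        stamp = PurePosixPath(path).name.split("_", 1)[0]
        try:
            hour = datetime.strptime(stamp, HOUR_FORMAT)
        except ValueError:
            continue  # skip files that do not follow the assumed naming
        if start <= hour < end:
            selected.append(path)
    return sorted(selected)


def search_logs(root, start, end, needle, column="message", filesystem=None):
    """Brute-force substring search over the Parquet files in the time range."""
    # Imported here so the path filtering above works without pyarrow installed.
    import pyarrow.compute as pc
    import pyarrow.dataset as ds

    # Discovery runs on every call, so newly uploaded files are picked up.
    discovered = ds.dataset(root, format="parquet", filesystem=filesystem).files
    wanted = files_in_range(discovered, start, end)
    if not wanted:
        return None
    # Loads the selected files fully into memory, then filters row by row.
    table = ds.dataset(wanted, format="parquet", filesystem=filesystem).to_table()
    return table.filter(pc.match_substring(table[column], needle))
```

For S3 you would pass something like filesystem=pyarrow.fs.S3FileSystem() and a root of the form "bucket/application" (bucket name is yours to fill in). Listing the prefix on every query is the simple option; if that gets slow, the candidate keys could be generated directly from the time range instead, since the layout is deterministic.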
Amazon Athena + AWS Glue for schema discovery can do this.
Trino can do this.