by jillesvangurp
2757 days ago
If you have JSON Lines formatted data (or CSV) and an AWS account, you can do some nice things with Athena and SQL. We have a few simple backoffice tools that I've implemented around simple SQL queries on data dumped from various systems that we have in JSON format. Awesome if you want to do some quick selects, joins, etc. If you are going to process this amount of data yourself, don't load it all into memory; process it line by line instead. Also do that concurrently if you have more than one CPU core available. I've done this with Ruby, Python, Java, and misc shell tools like jq. Use what you are comfortable with and what gets results quickly. One neat trick with jq is to use it to convert JSON objects to CSV and then pipe that into csvkit for some quick and dirty SQL querying. That generally gets tedious beyond a few hundred MB. I recommend switching to Athena or something similar if that becomes a regular thing for you.
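
To make the "line by line, and concurrently" advice concrete, here is a minimal Python sketch, not a definitive implementation. It assumes a JSON Lines file with one object per line and counts the values of one field. The function names `count_range` and `count_file`, the `status` field and the `events.jsonl` file name are all hypothetical. Each worker process opens the file itself and streams only its own byte range, so no process ever holds more than one line at a time.

```python
import json
import os
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def count_range(path, start, end, field):
    """Count values of `field` for the lines that begin in bytes [start, end).

    Assumes the field holds hashable values (strings, numbers, booleans, null).
    """
    counts = Counter()
    with open(path, "rb") as handle:
        if start > 0:
            # Step back one byte and read to the next newline, so a line that
            # straddles the boundary is left to the worker for the previous range.
            handle.seek(start - 1)
            handle.readline()
        while handle.tell() < end:
            line = handle.readline()  # only one line is in memory at a time
            if not line:
                break
            if line.strip():
                counts[json.loads(line).get(field)] += 1
    return counts


def count_file(path, field, workers=None):
    """Split the file into byte ranges and count each range in its own process."""
    workers = workers or os.cpu_count() or 1
    size = os.path.getsize(path)
    step = max(1, -(-size // workers))  # ceiling division
    ranges = [(start, min(start + step, size)) for start in range(0, size, step)]
    total = Counter()
    if workers == 1:
        # Serial fallback: same streaming logic, no extra processes.
        for start, end in ranges:
            total.update(count_range(path, start, end, field))
        return total
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_range, path, s, e, field) for s, e in ranges]
        for future in futures:
            total.update(future.result())
    return total
```

Calling `count_file("events.jsonl", "status")` from a script (under an `if __name__ == "__main__":` guard, which process pools need on some platforms) would return a `Counter` of the values seen. Splitting by byte range rather than handing batches of lines to workers keeps the parent process from buffering the file on their behalf.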