| HN Mirror



	by nieksand 4914 days ago
	I don't see your Python code defining a distribution style or sort key for Redshift, which is an important design consideration. (For my own use case of log analysis, I sort on datetime and use an "even" distribution.) It also doesn't look like you ran "vacuum" or "analyze" after loading into Redshift, so the query optimizer has no statistics to drive its decisions. And as others have pointed out, your 30 GB data set is pretty tiny. You could look at some of the in-memory DB options out there if you need to speed things up.
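To make the point above concrete, here is a minimal sketch of what the missing pieces might look like when generated from Python. The table name `access_logs`, its columns, and both helper functions are hypothetical, not taken from the benchmark being discussed. The sketch only builds SQL strings; running them needs a real connection, and Redshift does not allow VACUUM inside a transaction block, so the connection would need autocommit enabled.

```python
def redshift_log_table_ddl(table, columns, sort_key):
    """Build a CREATE TABLE statement with EVEN distribution and a sort key.

    `columns` is a list of (name, sql_type) pairs. All names used here are
    hypothetical examples, not the schema from the original benchmark.
    """
    if sort_key not in {name for name, _ in columns}:
        raise ValueError(f"sort key {sort_key!r} is not a column of {table}")
    column_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n"
        f"    {column_sql}\n"
        f")\n"
        f"DISTSTYLE EVEN\n"
        f"SORTKEY ({sort_key});"
    )


def post_load_maintenance(table):
    """Statements to run after a bulk load.

    VACUUM re-sorts rows and reclaims space; ANALYZE refreshes the
    statistics the query planner relies on.
    """
    return [f"VACUUM {table};", f"ANALYZE {table};"]


if __name__ == "__main__":
    # Assumed log-analysis schema, sorted on the event timestamp.
    log_columns = [
        ("event_time", "TIMESTAMP"),
        ("host", "VARCHAR(255)"),
        ("status", "SMALLINT"),
    ]
    print(redshift_log_table_ddl("access_logs", log_columns, "event_time"))
    for statement in post_load_maintenance("access_logs"):
        print(statement)
```

Sorting on the timestamp suits queries that filter on a time range, and EVEN distribution spreads rows across slices without depending on any join column, which matches the log-analysis use case mentioned above. Other workloads may be better served by a KEY distribution on a join column.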