Hacker News new | ask | show | jobs
by SOLAR_FIELDS 3475 days ago
Hopefully that is a dump as well. At least with Hadoop you have a little more control over resources with the right configuration. When we run Hive on our main HDFS cluster without limiting mappers and reducers it will happily bog down the entire cluster to give you what you need - but limiting the mappers, while slowing your throughput - eliminates the risk of resource hog for noncritical queries. Even then, copy of the data with dedicated resources is still preferred because without a lot of mappers and reducers your Hive queries are going to slow to a crawl.
1 comments

Solar_Fields and Bigger_Cheese, my apologies for not specifying the db. In this case I'm talking about a production level data warehouse specifically designed to perform complex queries and not a prod app db. Yes, I agree- best practice is to separate the two. Glad you found the article interesting.