| Speaking from experience with DPI: we did a POC project on analysing DPI data with Hadoop, Spark and other big data tech. You are right about the volumes, but wrong about it being impractical.
The volumes with a relatively small opco:
- roughly 7m subs
- 250 GB just for the protocol classification *
- then you also have URL logs etc.
Key factors that reduce the costs and investment:
- commodity hardware (with Hadoop etc.)
- distributed
- query patterns
- you do not need to store every single record. The older the data is, the more it can be aggregated: up to hourly, then daily, then monthly
This is what we did: data was aggregated, which significantly reduced storage.
Tested various options: Hive, HBase, Druid
Edit:
* per day |
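To make the age-based aggregation above concrete, here is a rough sketch in plain Python rather than Hive, HBase or Druid. The 7-day and 90-day cut-offs, the `(timestamp, protocol, byte_count)` record shape and the function names are all assumptions for illustration, not what the POC actually used.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical retention tiers: these thresholds are assumptions, not from the POC.
HOURLY_MAX_AGE = timedelta(days=7)
DAILY_MAX_AGE = timedelta(days=90)


def bucket_start(ts, now):
    """Truncate ts to the hour, day, or month depending on how old it is."""
    age = now - ts
    if age <= HOURLY_MAX_AGE:
        return "hour", ts.replace(minute=0, second=0, microsecond=0)
    if age <= DAILY_MAX_AGE:
        return "day", ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return "month", ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def roll_up(records, now):
    """Sum bytes per (granularity, bucket start, protocol).

    records is an iterable of (timestamp, protocol, byte_count) tuples.
    """
    totals = defaultdict(int)
    for ts, protocol, byte_count in records:
        granularity, start = bucket_start(ts, now)
        totals[(granularity, start, protocol)] += byte_count
    return dict(totals)
```

The point is only that many raw records collapse into one row per bucket, and the buckets get coarser as the data ages, which is where the storage saving comes from.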