by skwirl 4737 days ago
According to the PRISM slides, the program costs $20 million per year. This article doesn't mention that, although it should, because it is telling. If the author believes that this program could be implemented at a minimum of $187 million per year, then that $20 million claim is problematic.

Either the $20 million claim is wrong, and then all the information on the slides is suspect, or it is correct, and the scope of PRISM is much smaller than is widely believed and is believed by the author of this article. Or the author understands the scope correctly and has made an error in his calculation.

7 comments

PRISM is probably just a small part of the all-encompassing spying program.
Right, PRISM is just the project that gives the NSA access to data stored on other (Google, Facebook et al.) servers. The active eavesdropping and other systems aren't included.
The data storage, processing, and development costs for the bulk of the programs that intercept/store the raw data are likely not included, either.

Based on the new slides[1] from WaPo, PRISM collection spans many other non-PRISM programs, such as the now-known MAINWAY (internet metadata), MARINA (internet content), and NUCLEON (voice content).

[1] http://www.washingtonpost.com/wp-srv/special/politics/prism-...

One of the fundamental issues with these discussions is that things such as what you're saying get thrown into the mix when we're talking about things we've actually seen evidence for.

For a very long time we've said things like "this is probably happening". That is in no way whatsoever a novel idea. What is novel, and why these discussions are happening so frequently now, is that we have evidence that a $20 million/year program is actually happening.

So when we're talking about things we have evidence for, let's please avoid throwing in conjecture.

Either the $20 million claim is wrong, and then all the information on the slides is suspect, or it is correct, and the scope of PRISM is much smaller than is widely believed and is believed by the author of this article.

Multiple replies below have already questioned this either-or choice you present. From what I know about government agency presentations to higher-level authorities who set budgets, the likely claim on the slide is that the marginal cost of PRISM-as-such in an environment in which NSA already has other programs and the facilities to run them is just an insubstantial $20 million. And on the more extravagant assumptions of the submitted article, that might very well be a true claim for a PRISM program that gathers and analyzes quite a lot of data. That's especially likely if NSA has low-cost in-house software development capabilities, as it surely does.

It's funny, when I first saw the slide I assumed that $20mm was the annual fee for other intelligence organisations to gain access to PRISM.
If you assume that the servers only retain data for one month, then the server costs are cut by a factor of 12 and you end up with €168M/12 = €14M (roughly $18M), for a total cost of $22M.
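
The one-month retention arithmetic can be checked with a short Python sketch. The €168M, $18M and $22M figures come from the comment above; the function name, the linear scaling with retention time, and the derived exchange rate and non-storage remainder are assumptions made here for illustration.

```python
# Figures quoted in the comment; everything derived from them is illustrative.
YEARLY_SERVER_COST_EUR = 168e6  # article's cost for servers holding one year of data
SERVER_COST_USD = 18e6          # "roughly $18M" from the comment
TOTAL_COST_USD = 22e6           # total quoted in the comment


def server_cost_eur(yearly_cost_eur: float, retention_months: int) -> float:
    """Assume storage cost scales linearly with the retention window."""
    return yearly_cost_eur * retention_months / 12


cost_eur = server_cost_eur(YEARLY_SERVER_COST_EUR, retention_months=1)
implied_usd_per_eur = SERVER_COST_USD / cost_eur      # rate implied by the comment
other_costs_usd = TOTAL_COST_USD - SERVER_COST_USD    # implied non-storage share

print(f"server cost: {cost_eur / 1e6:.0f}M EUR")
print(f"implied exchange rate: {implied_usd_per_eur:.2f} USD/EUR")
print(f"implied non-storage costs: {other_costs_usd / 1e6:.0f}M USD")
```

Changing `retention_months` shows how quickly the estimate moves: under the same linear assumption, a six-month window gives €84M.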

Additionally, the posting assumes that all the data is stored, and that is a lot of cat videos. With decent preprocessing you can probably cut the data rate by a rather large factor (I would assume at least 100, since you do not need to store warez or the NYT homepage). To do the opposite estimate, assume instead that the system is CPU bound: one then needs hardware to process 120 GB/s. With roughly $10M you can buy a few thousand machines, and your PRISM software needs to handle something like ~50 MB/s per machine (which may or may not be a reasonable data rate, depending on the sophistication of the algorithms and how much can be discarded very easily).
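
The CPU-bound estimate can be sketched the same way. The 120 GB/s, ~50 MB/s and $10M figures are the ones given above; decimal units (1 GB = 1000 MB) and the helper name are assumptions made here, and the per-machine budget is simply what those numbers imply.

```python
import math

TOTAL_RATE_MB_S = 120 * 1000   # 120 GB/s, assuming decimal units
PER_MACHINE_MB_S = 50          # throughput each machine is assumed to handle
HARDWARE_BUDGET_USD = 10e6     # "roughly $10M"


def machines_needed(total_mb_s: float, per_machine_mb_s: float) -> int:
    """Number of machines required to keep up with the total data rate."""
    return math.ceil(total_mb_s / per_machine_mb_s)


machines = machines_needed(TOTAL_RATE_MB_S, PER_MACHINE_MB_S)
budget_per_machine = HARDWARE_BUDGET_USD / machines

print(f"machines needed: {machines}")
print(f"budget per machine: ${budget_per_machine:,.0f}")
```

With these inputs the "few thousand machines" works out to 2400, or a little over $4,000 of budget per machine.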

Big Data is just data before grep is applied.
Right - I'm off to write my "email stored steganographically in cat videos" service...
> This is a worst case scenario that does not include potential discounts due to renting such a high volume of hardware and traffic or acquiring the aforementioned hardware (which incurs a higher initial investment but lower recurring costs).

"worst case scenario" is emphasized in the article.

The author counts the storage on a yearly basis (servers to store a year of data). If you allow an expiration date for the records (let's say 4 years), after that period you can spend less on hardware, as you can free space from the old records. Then you only need to spend money on the traffic difference (as the traffic would increase in 4 years).

As the storage boxes in the article also have a nice CPU, the collected data can be indexed and then compressed, saving a lot of space.

Given that the Internet grows exponentially year by year, while the cost to store a bit of information drops in a similar fashion, I doubt there is any money to be saved by deleting old data. The savings from deleting are likely to be handily outweighed by the extra system complexity, not to mention that it isn't worth one iota of frustration and bad reviews if data an analyst wants is not available.
$20M doesn't necessarily include infrastructure (which may already be covered, e.g. in Utah); it could just be the program's running costs.
The author says it is a worst-case upper bound.
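
A rough way to see how little old data weighs under exponential growth is to compare what would have expired with everything stored so far. This is only a sketch: the doubling rate, the 10-year horizon and the function name are hypothetical, the 4-year window is borrowed from the retention suggestion earlier in the thread, and the falling cost per bit is not modelled at all.

```python
def fraction_older_than(growth: float, retention_years: int, total_years: int) -> float:
    """Share of all stored data that is older than the retention window.

    Yearly intake is assumed to grow by a constant factor `growth`,
    starting from 1 unit in year 0 (hypothetical model).
    """
    intake = [growth ** year for year in range(total_years)]
    expired_years = max(0, total_years - retention_years)
    return sum(intake[:expired_years]) / sum(intake)


# Hypothetical inputs: intake doubles every year, 10 years of collection.
share = fraction_older_than(growth=2.0, retention_years=4, total_years=10)
print(f"{share:.1%} of stored data is older than 4 years")
```

Under these made-up inputs, expiring everything older than four years frees only a few percent of the total, which is the intuition behind doubting that deletion saves much.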