by mirimir 2260 days ago
I did "data science" for about a decade, consulting with plaintiffs' firms and state AGs on antitrust and fraud cases. For each case, the workflow was roughly this:

-- write discovery requests

-- review production, and check out data and documentation

-- write supplementary discovery requests

-- review production, and check out data and documentation

[repeat as needed]

-- analyze data, and write deposition questions

-- help attorneys wring answers from deponents

[repeat as needed]

-- analyze data, and produce required output

-- write parts of briefs and expert reports

I generally did that in consultation with testimonial experts and their data analysts. Sometimes that didn't happen until we'd documented the case enough to know that it was worth it. And occasionally small cases settled with just me as the "expert".

It's a small industry, and not easy to get into, unless you know key players at key firms. But the money's pretty good, and the work can be exciting. I loved being that guy in depositions whispering questions to the attorneys :)

This all involved pretty simple calculation of damages, through comparing what actually happened vs what would have happened but for the illegal behavior. But-for models were typically based on benchmarks.
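To make the actual vs. but-for comparison concrete, here is a minimal Python sketch. It is only an illustration, not the method used in any particular case: the `Sale` record, the `overcharge_damages` function, and the idea of a per-period benchmark price lookup are all hypothetical names and assumptions, and real but-for models would be built from whatever benchmark the experts settled on.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    period: str          # e.g. "2001Q1" (hypothetical period label)
    units: float         # quantity sold in this record
    actual_price: float  # price actually charged per unit


def overcharge_damages(sales, but_for_price):
    """Sum (actual price - but-for price) * units over all sales.

    but_for_price maps a period label to the benchmark (but-for) unit
    price for that period. This mapping is an assumption of the sketch.
    """
    total = 0.0
    for sale in sales:
        overcharge = sale.actual_price - but_for_price[sale.period]
        total += overcharge * sale.units
    return total


sales = [
    Sale("2001Q1", 100, 12.0),
    Sale("2001Q2", 50, 11.0),
]
benchmark = {"2001Q1": 10.0, "2001Q2": 10.0}
damages = overcharge_damages(sales, benchmark)
```

With these made-up numbers, the overcharge is 2.00 per unit on 100 units plus 1.00 per unit on 50 units.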

After data cleanup in UltraEdit, I did most of the analysis in SQL Server. I used Excel for charting and final calculations.

1 comments

I would expect "data science" is doing some form of numerical analysis. Otherwise it's just record keeping... with computers.
The hardest part of what I did was getting enough documentation to understand the data. Sometimes we got fixed-width text files, with no information about column definitions. Or column names. Or what values in descriptive columns meant. Stuff like "class of trade".
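For a sense of what working with an undocumented fixed-width file looks like, here is a rough Python sketch. Everything in it is hypothetical: the function names, the sample layout, and the column names are made up for illustration. The first function guesses candidate column boundaries by finding character positions that are blank in every sampled line; the second slices a line once you have (or have guessed) a column spec.

```python
def guess_column_breaks(lines):
    """Return character positions that are blank in every line.

    These are only candidate column boundaries. Embedded spaces inside
    a field can show up too, so the result still needs checking against
    real documentation.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    return [i for i in range(width) if all(p[i] == " " for p in padded)]


def parse_fixed_width(line, spec):
    """Slice one line using (name, start, end) triples.

    Offsets are 0-based and half-open, like Python slices.
    """
    return {name: line[start:end].strip() for name, start, end in spec}


# Hypothetical layout: customer (0-10), units (10-15), class of trade (15-18)
spec = [("customer", 0, 10), ("units", 10, 15), ("class_of_trade", 15, 18)]
sample = ["ACME      0012 WHL", "BOBS MKT  0100 RTL"]
breaks = guess_column_breaks(sample)
first = parse_fixed_width(sample[0], spec)
```

Note that the guess also flags position 4, which is just the space inside "BOBS MKT". That is the kind of ambiguity that only documentation or a deposition answer resolves, and it says nothing about what a code like "WHL" means.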

But generally you're right. It was just simple calculations using sales records. But lots of records, at least several gigabytes, and sometimes several hundred gigabytes.

Record keeping is 90% of data projects.

The second 90% is basic math at high speeds.

Right, record keeping. But when it's not your data, things get complicated. Imagine trying to understand how another firm's data systems work. You can talk with managers, who know how the business uses data. But they have no clue how the data are stored or managed. And you can talk with IT people, who know how data are stored or managed. But they have no clue how the data are used.

And yes, speed. Aggregating hundreds of gigabytes was nontrivial to do quickly. I started with Access, and then learned to manage and use SQL Server. And eventually a multi-Xeon server with lots of RAM and SAS-attached storage.
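The aggregation itself is conceptually just a GROUP BY over sales records. The sketch below uses Python's built-in `sqlite3` module purely as a small stand-in for SQL Server, with an in-memory database; the table name, column names, and the `monthly_totals` function are assumptions for illustration, and nothing here reflects the tuning needed at hundreds of gigabytes.

```python
import sqlite3


def monthly_totals(rows):
    """Load (product, month, units, dollars) rows and total them per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (product TEXT, month TEXT, units REAL, dollars REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    # An index on the grouping columns, built after the bulk load
    conn.execute("CREATE INDEX idx_sales_group ON sales (product, month)")
    result = conn.execute(
        "SELECT product, month, SUM(units), SUM(dollars) "
        "FROM sales GROUP BY product, month ORDER BY product, month"
    ).fetchall()
    conn.close()
    return result


rows = [
    ("A", "2001-01", 10, 100.0),
    ("A", "2001-01", 5, 50.0),
    ("B", "2001-02", 1, 20.0),
]
totals = monthly_totals(rows)
```

At real scale the query looks about the same; the hard part is the hardware, indexing, and load strategy around it.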