by bigger_cheese 3305 days ago
">Are there any good frameworks that allow for processing, caching, data visualization (layout -> data population -> rendering), then exporting to some format (PNG/PDF/TeX)?"

I use SAS for this in my day job. It's not a free program, but it's powerful for this type of stuff.

I typically use SQL queries (via SAS's PROC SQL) to manipulate and process my data, but you can also manipulate your data sets programmatically using SAS's "datastep" language.

SAS supports macro expansion, which makes some of your examples (like manipulating 10 sensors at once) pretty trivial. But this is getting into programming-language territory, and I would not expect someone new to or unfamiliar with programming to grasp all of it intuitively.

edit: Here's some code I have in production that counts how many (of 8) sensors are reading high (above 160) in a given time frame.

  array aads (*) TP_AD1_TOP_STACK_TC1 -- TP_AD1_TOP_STACK_TC8;
  NO_AD1_TEMPERATURES_HIGH = 0;
  do j = 1 to dim(aads);
    if aads(j) gt 160 then NO_AD1_TEMPERATURES_HIGH = NO_AD1_TEMPERATURES_HIGH + 1;
  end;
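For readers who don't know SAS, the array loop above can be sketched in plain Python. This is a minimal sketch, assuming each observation is a dict keyed by column name and that every reading is present (SAS treats a missing value as smaller than any number, whereas Python would raise on `None > 160`). `SENSOR_COLUMNS` and `count_high` are hypothetical names; the column names mirror the SAS variable list and the 160 threshold comes from the SAS code.

```python
from typing import Mapping

# Hypothetical column names mirroring the SAS variable list
# TP_AD1_TOP_STACK_TC1 -- TP_AD1_TOP_STACK_TC8.
SENSOR_COLUMNS = [f"TP_AD1_TOP_STACK_TC{i}" for i in range(1, 9)]


def count_high(row: Mapping[str, float], threshold: float = 160) -> int:
    """Count how many sensor readings in one observation exceed `threshold`.

    Mirrors the SAS DO loop over the `aads` array: start at 0 and add 1
    for every reading strictly greater than ("gt") the threshold.
    Assumes no readings are missing.
    """
    return sum(1 for col in SENSOR_COLUMNS if row[col] > threshold)


# Usage: one observation with three readings above 160 (160 itself does not count).
readings = [150, 161, 170, 160, 155, 200, 140, 159]
row = dict(zip(SENSOR_COLUMNS, readings))
print(count_high(row))  # → 3
```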

The downside is that SAS is a commercial package and is not free. I have heard a lot of good things about R, which is supposedly quite similar, but I have not had the opportunity to use it myself.


As someone who has used SAS for many, many different projects: it is terrible, vastly inferior to Pandas or R, and the only reason to ever use it is when you're forced to. Even simple stuff like functions that operate on data has to be hacked on with macros.

Case in point, your production SAS code could be replaced with this Pandas code (and the R code would look very similar):

  (temperatures[TEMPERATURE_COLUMNS] > 160).sum(axis=1)
or, if your data is in proper long form:

  data.temperature.gt(160).groupby(data.time).sum()
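The wide-form and long-form approaches above can be tried on a small made-up table. This is a sketch under assumptions: the data, the `time` column, and the sensor columns `tc1`, `tc2`, `tc3` are hypothetical. The wide-form count compares the whole block of sensor columns at once and sums the booleans per row; `melt` then reshapes the same table into long form, where the readings are flagged first and grouped by timestamp second.

```python
import pandas as pd

# Hypothetical wide-form data: one row per timestamp, one column per sensor.
TEMPERATURE_COLUMNS = ["tc1", "tc2", "tc3"]
temperatures = pd.DataFrame(
    {
        "time": ["08:00", "08:05", "08:10"],
        "tc1": [150, 165, 170],
        "tc2": [161, 159, 162],
        "tc3": [140, 160, 175],
    }
)

# Wide form: compare every sensor column to 160, count True values per row.
high_wide = (temperatures[TEMPERATURE_COLUMNS] > 160).sum(axis=1)

# Reshape to long form: one row per (time, sensor) reading.
data = temperatures.melt(
    id_vars="time",
    value_vars=TEMPERATURE_COLUMNS,
    var_name="sensor",
    value_name="temperature",
)

# Long form: flag readings above 160, then count flags per timestamp.
high_long = data.temperature.gt(160).groupby(data.time).sum()

print(high_wide.tolist())   # → [1, 1, 3]
print(high_long.to_dict())  # → {'08:00': 1, '08:05': 1, '08:10': 3}
```

In long form, adding another sensor only adds rows, so the grouping line does not need a new column name.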

I'd like to get my analysis systems as "inclusive" as possible. If I didn't care about sharing my work, I'd just use my internal SQL server and fall back on Python for my processing.

SAS looks good though. I've looked at it many times and it is a clean solution if you really are in the "big games".
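The "SQL server plus Python" workflow mentioned in the comment above can be sketched with only the standard library. This is a minimal sketch, not anyone's actual setup: an in-memory `sqlite3` database stands in for the internal SQL server, and the `readings` table, its columns, and the 160 threshold (borrowed from the SAS example) are assumptions. SQL does the aggregation; Python works on the small result.

```python
import sqlite3

# In-memory SQLite database standing in for the internal SQL server (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical long-form table: one row per (timestamp, sensor) reading.
conn.execute("CREATE TABLE readings (ts TEXT, sensor TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("08:00", "tc1", 150), ("08:00", "tc2", 161),
        ("08:05", "tc1", 165), ("08:05", "tc2", 170),
    ],
)

# SQL side: count readings above the threshold per timestamp.
# In SQLite a comparison evaluates to 0 or 1, so SUM() counts the matches.
query = """
    SELECT ts, SUM(temperature > ?) AS n_high
    FROM readings
    GROUP BY ts
    ORDER BY ts
"""
counts = dict(conn.execute(query, (160,)).fetchall())
conn.close()

# Python side: further processing on the aggregated result,
# e.g. timestamps where at least two sensors read high.
flagged = [ts for ts, n in counts.items() if n >= 2]

print(counts)   # → {'08:00': 1, '08:05': 2}
print(flagged)  # → ['08:05']
```

If you would rather get a DataFrame back, `pandas.read_sql_query` also accepts a `sqlite3` connection.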

Yeah, that is a good point about separating analysis from the database.

My work is going in the opposite direction, unfortunately. We are starting to use Hadoop, which makes it quite difficult to do things "outside of the database"; there is just too much data to work with locally.

SQL plus R is a good combo.

Funny you talk about SAS that way.

In my former team, we used SAS for a while and once I introduced the team to Pandas, they happily ditched SAS.