by AutoEngineer 2967 days ago
I work for one of the big four German car OEMs. We need the following for measurement data and we simply have no solution (except MATLAB, which is not good enough).

We want to plot big data (up to terabytes). Columns should be selectable and nameable through a GUI. The data should then be added to a database with an ID. Everything should be usable without a scripting language.

Right now the terabytes of data have to be loaded into RAM just to see the first few lines and determine what the columns stand for. Now I know that there are editors that can load data partially, but these have to be reinstalled, which requires admin rights etc. This is a huge burden in a big company! The process of simply plotting, selecting and storing data takes a huge amount of time. The solution should be web based because no admin rights are available.
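To illustrate the "first few lines" point: a minimal Python sketch, assuming the data is a plain delimited text file. The name `peek_csv` and its parameters are made up for this example. It reads only the header and a handful of rows from the start of the file, so the file size does not matter.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=5, delimiter=","):
    """Return (header, first n_rows data rows) without reading the whole file.

    Hypothetical helper for illustration; assumes the first line is a header.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, [])            # first line: column names
        rows = list(islice(reader, n_rows))  # stop after n_rows data rows
    return header, rows
```

Calling `peek_csv("measurement.csv", n_rows=10)` (the file name is a placeholder) would return the column names and ten rows, which is enough to decide what the columns stand for before loading anything else.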

Often I am impressed by how many tools and hacks exist simply to get one thing done: visualize measurement data. Excel is not enough because even the import of dot vs comma vs tab etc. takes too much time and has to be relearned every time. Engineers sometimes have to plot data only every few months, and by then there is a new Excel version that autocorrects measurement data to dates or whatever.
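The dot vs comma vs tab problem can be handled mechanically. Below is a rough Python sketch, assuming the header line contains only column names (no decimal numbers), so the most frequent candidate character in it is the column separator. `guess_delimiter` and `parse_number` are hypothetical names, not part of any library.

```python
def guess_delimiter(header_line, candidates=(";", "\t", ",")):
    """Pick the candidate that occurs most often in the header line.

    Assumption: the header holds column names only, so a decimal comma
    cannot be confused with a column separator here.
    Ties are resolved in the order given by `candidates`.
    """
    return max(candidates, key=header_line.count)


def parse_number(text, decimal=","):
    """Convert a numeric string with the given decimal mark to float."""
    return float(text.strip().replace(decimal, "."))
```

For a German-style export with the header `time;speed;rpm`, `guess_delimiter` picks `;`, and `parse_number("12,5")` yields `12.5`, so values stay numbers and are never reinterpreted as dates.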

In my opinion this would save an obscene amount of work. Right now every engineer is hacking together scripts that are extremely inflexible, even though only CSV-type data has to be handled.

Edit: this also applies to smaller amounts of data, in the range of megabytes. How can we plot them more robustly than in Excel and then select the x and y axes? I am pretty sure that we would love to buy a product that solves these issues.

3 comments

Thanks for the input, but do you mind being a bit more specific about the following:

- would you actually be interested in buying this service?

- what sort of visualizations do you actually make? Do they need to be interactive? SVG? Size? How do you use them?

- what exactly do you do with the data before it's plotted, apart from selecting columns? Is there aggregation or any kind of processing?

- how often is this actually used? You say 'sometimes every few months'; does that mean it's like a quarterly report?

- what other well-established tools have you used other than Excel?

- how big is your largest data set? Size, rows, columns

- if this also applies to small amounts of data (megabytes), is there a reason besides simplicity why you can't use PivotChart in Excel? Or Excel in general? Or R/Python to generate it?

I am a data scientist who regularly plots quite large data sets, and I like speed :) It's totally doable to build a service that you can run locally: load a CSV, read like 1% of the data, play with it... and when you have what you want, load the rest, wait a bit and get the visualization you want.

But depending on the visualization requirements there may be many possible solutions.
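One way to get the "read like 1% of the data" step is sketched below in Python. This is an assumption about how it could be done, not a description of any existing service: the file is streamed once and each row is kept with a fixed probability, so memory use scales with the sample rather than with the file. `sample_rows` is a hypothetical name.

```python
import csv
import random


def sample_rows(path, fraction=0.01, seed=0, delimiter=","):
    """Stream a delimited file and keep each data row with probability `fraction`.

    Hypothetical helper: the whole file is still read from disk once,
    but only the sampled rows are held in memory.
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, [])
        rows = [row for row in reader if rng.random() < fraction]
    return header, rows
```

The sample is spread over the whole file, which matters for time series where the start of a measurement looks different from the end. Reading only the first 1% of bytes would be faster but would miss that.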

You should try using approximate algorithms. They don't load all the data, but are able to give approximate (near perfect) statistical results while consuming orders of magnitude less data.

Count sketches, reservoir sampling and similar methods come to mind.
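For reference, reservoir sampling fits in a few lines. The Python sketch below is the classic single-pass version (often called Algorithm R); the function name and the `seed` parameter are just for illustration. It keeps a uniform random sample of `k` items from a stream of unknown length while holding only `k` items in memory.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of any length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Passing an open file object as `stream` samples `k` lines from it without ever loading the file, so a plot of a few thousand sampled points costs one sequential read.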

If I am understanding your problem correctly, I did that for a large American automotive electronics supplier back in the 1990s - though back then 30-40 megabytes of data was pretty big. We trained a bunch of American and Japanese engineers on how to do that, but I don't remember any Europeans.

I think I have an email address in my profile; feel free to send me something. I am fairly certain that your needs can be satisfied with existing Unix tools. Then again, the reason I worked on the problem in the 1990s was to free up engineer time so they could do more valuable things. A GUI and other tools could be worth paying for if the bosses have that mindset.

Thanks for your answer. Currently engineers can:

a) try to plot their data alone and spend time hacking the stuff together. This takes time as the guys doing it aren't accustomed to doing it daily. This happens across all kinds of divisions.

b) ask another team (with data scientists) for their support. Maybe the engineer has to write a ticket, or the person who should be doing it has other tasks, is on vacation, is not willing, is not replying to the request etc.

Either way hours are easily spent on solving this seemingly simple task. The amount of time spent is simply staggering.

Unix would also be my personal choice. But getting permission to put a Unix machine into the network for a single user is extremely difficult. Windows, Internet Explorer and temporary admin rights are the work environment that almost everyone has to use. That's why I think a web-based solution is the only viable option.

I work in a similar space and one of the tools that might solve your problem is exploratory.io.
I'm located in Germany as well and would love chatting with you.

Is there a way I can reach out to you?

Thanks!