Startup Pivot: Small (data) Is The New Big (data) (insights.qunb.com)
37 points by Mercutionario 4648 days ago
7 comments

A point that a lot of nominal Big Data startups miss is that genuinely large-scale data management and analytics are not driven by visualizations at all, nor do they fit in a web-driven, SaaS-like environment. The purpose is to answer a complex question from unimaginably large volumes of data, not to draw charts and graphs. The work is often too I/O intensive for virtualized clouds, and the visualization component is almost superfluous to the purpose. Most of the problems that need to be solved in Big Data are low level, down at the computer science and infrastructure level. Many of the use cases are intrinsically poorly suited to a web-based, SaaS-type offering.

To make matters worse, many high-value Big Data analytical problems are (literally) not meaningfully visualizable except for marketing purposes. It is rather tricky to visualize an analytic product when there are a hundred critical values that need to be rendered in some fashion for every pixel your monitor can display. A lot of high-value analytics have this characteristic but most of the nominal Big Data visualization tools ignore this case even though it is arguably the most important one.

Consequently, while labeling your startup "Big Data" is trendy and fashionable, there are very few genuine Big Data startups. Adding value in this market requires a combination of serious theoretical computer science chops plus very creative interface design. Few startups are actually addressing the needs of this market and are instead assuming the market wants the web app they have the skills to produce.

Agree! My personal experience: our customers at Keen IO constantly demand better visualizations and many expect a traditional analytics frontend like GA. However... our highest value customers pay for our API capabilities, not our line charts. Storage and querying at massive scale is the hard part and that is worth thousands of dollars a month, not tens or hundreds.

Stunning visualizations and a better web experience are definitely something we want to do when timing allows, but so far our true customer value is in the backend and our APIs.

This is presented in a much harsher light, but you're telling the story exactly the way it happened for us.

We got completely blindsided by the idea of a "big data product" and we quite naively replaced "product" with "trendy webapp with shiny charts" in our minds. There is definitely a place for a technological product addressing those needs, but those are deep tech products paired with services, because they require setup, maintenance, etc. For instance, our friends at infochimps do it really, really well, and they're on a good path to a big data product.

I hear you on visualization. A lot of big data is just, "I have a billion row, hundred column table that I want to join with 5 or 6 other tables, and do some time series math." It's computationally intense linking of tables, but not necessarily visualization.
There's a lot of money to be made in the smaller verticals. I think for a startup, there's a lot of opportunity to allow people to just manipulate and sanitize data in simpler spreadsheets.

For example, one thing I'm being forced to implement myself has been a lot of string manipulation operations to sanitize different kinds of data I'm playing with in spreadsheets.

Even just having something misimported wastes a lot of time.
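To make the kind of string cleanup I mean concrete, here is a minimal sketch. The helper names (sanitizeCell, parseAmount) and the specific rules are my own assumptions for illustration, not a complete solution:

```javascript
// Hypothetical helper: normalize one spreadsheet cell that was imported as text.
function sanitizeCell(raw) {
  if (raw === null || raw === undefined) return "";
  return String(raw)
    .replace(/\uFEFF/g, "")   // strip byte-order marks left by some exports
    .replace(/\u00A0/g, " ")  // non-breaking spaces become plain spaces
    .replace(/\s+/g, " ")     // collapse tabs, newlines, and repeated spaces
    .trim();
}

// Hypothetical helper: turn "1,234.50" or " $12 " into a number, else null.
function parseAmount(raw) {
  const cleaned = sanitizeCell(raw).replace(/[$,\s]/g, "");
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) return null; // not a plain decimal
  return Number(cleaned);
}

// Clean every cell in a row (an array of raw cell values).
function sanitizeRow(row) {
  return row.map(sanitizeCell);
}
```

Returning null for anything that is not a plain decimal means a misimported cell shows up as a gap you can find, instead of silently becoming NaN or 0.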

OpenRefine isn't bad, but can only get you so far. That being said, if I can come up with a complete solution myself, I wouldn't mind just adding it to the suite of tools I'm already offering :)

I'm also wondering about different kinds of tools already out there though.

I once made a simple program that would create Excel spreadsheets from the input of a barcode scanner. Took me about a day. It made a big impact on the company. Before, they would spend 2 days creating the spreadsheet. My program had it done in minutes. They were drop shippers, and went from about 25 pallets a week to 50 (and kept growing), all in less than one month. The program was less than 200 LOC.
Exactly! It's a simple concept for a programmer to think about. (A 2d array? How hard could this be?) but really it's crazy how annoying it is to do certain things. Excel has math down pretty well, but encoding other kinds of data or even compiling the data to a spreadsheet is a significant problem for most people.
The biggest problem that company had was fetching the image for each product (it fed from one API). They had to manually insert the image for each one. Imagine a pallet with 200 different products in it! All I did was create a web GUI where they would upload their orders, then the program would fetch all the data, insert the images in the correct row, and generate the file for them. The file would then be sent off to a workstation that was operated as a print server/queue station. After that, the spreadsheet was automatically printed and the shipping department would get the document in a nicely formatted manner. It's weird, but I get excited talking about it. It's the little wins that really count.
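For anyone curious what the core of something like that looks like, here is a rough sketch, not the original program. It assumes an in-memory catalog standing in for the product API, placeholder example.com image URLs, and CSV output instead of a real Excel file:

```javascript
// Stand-in for the product API; real data would come over HTTP.
const catalog = new Map([
  ["0001", { name: "Widget", imageUrl: "https://example.com/img/0001.png" }],
  ["0002", { name: "Gadget, large", imageUrl: "https://example.com/img/0002.png" }],
]);

// Quote a CSV field when it contains a comma, quote, or line break.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Count repeated scans, look up each product, and emit one CSV row per barcode.
function scansToCsv(scans, lookup) {
  const counts = new Map();
  for (const code of scans) {
    const key = code.trim();
    if (key) counts.set(key, (counts.get(key) || 0) + 1);
  }
  const lines = ["barcode,name,quantity,image_url"];
  for (const [code, quantity] of counts) {
    // Unknown barcodes are kept and flagged rather than dropped.
    const product = lookup.get(code) || { name: "UNKNOWN", imageUrl: "" };
    lines.push(
      [code, product.name, quantity, product.imageUrl].map(csvField).join(",")
    );
  }
  return lines.join("\n");
}
```

Calling scansToCsv(["0001", "0002", "0001"], catalog) gives a header plus one row per distinct barcode, with the quantity column counting the repeated scans.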
Excel is good enough for most problems that small and medium sized firms have. Big Data tools and techniques are only worth the effort when the problems to solve are big. That's not small business.
Exactly my point, I'd like to post something on Excel as the real and ultimate data tool soon.
It's amazing how much of the world financial system is built on the back of Excel too. Probably the other extreme there - a little too much VBA where bigger and better tools would work.
I think there is definitely an opportunity to help companies understand their GA data, but our company Keen IO is proof that Big Data is not too fat for startups. We're finding a big opportunity in helping developers build custom analytics backends and white-labeled analytics. We've found the big bucks are made supporting customers with truly big data challenges. We can provide the scalability and reliability that would be arduous and expensive for them to build in-house.
Let's say I have a data analysis startup. My market is dentists with three or fewer offices. How would your Big Data solutions work for me?

(Real question. I am developing this business as we speak).

That's awesome! This isn't really a "Big Data" problem - dental transactions don't happen in the scale of billions per month. However, the transactions themselves are very valuable for the dentists --- that's great for you! You can provide insight without having to crunch a zillion data points.

This is an example of an industry vertical analytics solution. Our hypothesis is that analytics products will be created to support all kinds of use cases like this (insurance analytics, manufacturing analytics, distribution analytics, etc). We want Keen IO to be the platform on which people build those solutions.

The way we can help you is to first make it easy for you to reliably collect data from all your dentists (with client libraries). Then we expose all that data, and all of our analytics capabilities (e.g. queries), by REST API. You can log into Keen IO and create a line chart, then copy and paste the JavaScript right into your site. Now you can build a completely white-labeled website for your dentists, while we take care of storing and querying your event data. Our scoped API keys will allow you to secure the data so a given dentist can only see data for her offices. You can also create an internal dashboard for your team so you'll be able to do analysis across all the dentists. Hopefully you'll discover industry trends and benchmarks that you can use in marketing reports or to resell to the dental industry (assuming your dentists allow this!). I bet they would like to know how they compare to other dentists...
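To illustrate the scoped-key idea in general terms, here is a toy sketch. It is not Keen IO's actual API: the event shape, key names, and sumAmount function are all made up for the example. The point is only that the scope is applied on the server before any filter the caller supplies:

```javascript
// Hypothetical event store: each event records which office produced it.
const events = [
  { office: "dentist-a-main", type: "visit", amount: 120 },
  { office: "dentist-a-east", type: "visit", amount: 80 },
  { office: "dentist-b-main", type: "visit", amount: 300 },
];

// A "scoped key" here is just a token mapped to the offices it may read.
const scopedKeys = new Map([
  ["key-dentist-a", { offices: ["dentist-a-main", "dentist-a-east"] }],
  ["key-dentist-b", { offices: ["dentist-b-main"] }],
]);

// Sum amounts, always applying the key's scope before the caller's filter.
function sumAmount(apiKey, filter = () => true) {
  const scope = scopedKeys.get(apiKey);
  if (!scope) throw new Error("unknown API key");
  return events
    .filter((e) => scope.offices.includes(e.office)) // enforced scope
    .filter(filter)                                  // caller's own filter
    .reduce((total, e) => total + e.amount, 0);
}
```

With this setup, dentist A's key can never return dentist B's rows, even if the caller's filter asks for them.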

Anyway, this is too much fun. Let me know if you want to brainstorm more!

You mention collecting data, which has been a big issue in this market. The local dentists (this is an offline startup) use a closed-source program that does not export data. The only way to retrieve any data is to use automation scripts that "read" from the screen and dump it into CSV files (they use Excel a lot). Once that is done, the script calls another one that sanitizes the data and inserts it into the database (PostgreSQL, in this case).
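Roughly, the sanitize-and-insert step looks like the sketch below. This is a simplified illustration rather than my production script: the table and column names are hypothetical, and it only builds a parameterized statement (PostgreSQL-style $1, $2 placeholders) that would then be handed to a database driver:

```javascript
// Split one CSV line, honoring double-quoted fields with "" as an escaped quote.
function parseCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { current += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else current += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { fields.push(current); current = ""; }
    else current += ch;
  }
  fields.push(current);
  return fields;
}

// Build a parameterized INSERT. Table and column names are hypothetical and
// must come from trusted code, never from the CSV itself.
function buildInsert(table, columns, csvLine) {
  const values = parseCsvLine(csvLine).map((v) => {
    const cleaned = v.replace(/\s+/g, " ").trim();
    return cleaned === "" ? null : cleaned; // empty cells become NULL
  });
  if (values.length !== columns.length) {
    throw new Error(`expected ${columns.length} fields, got ${values.length}`);
  }
  const placeholders = columns.map((_, i) => `$${i + 1}`).join(", ");
  return {
    text: `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders})`,
    values,
  };
}
```

Rejecting rows with the wrong field count matters a lot with screen-scraped data, since a misread screen tends to shift columns rather than fail outright.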

I currently have a flexible API that allows me to make complicated queries by sending commands through an HTTP call. Say "GET /clients/where/__last-visit/20132009" and so on.
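As a simplified sketch of how a path like that can map to a query: the whitelist, the column names, and the choice of an equality comparison are assumptions for the example, and the value is passed through as an opaque parameter rather than parsed as a date:

```javascript
// Whitelist of queryable tables and fields; both names are assumptions.
const ALLOWED = new Map([
  ["clients", new Map([["last-visit", "last_visit"], ["name", "name"]])],
]);

// Parse "/<table>/where/__<field>/<value>" into a parameterized SELECT.
function pathToQuery(path) {
  const match = /^\/([a-z]+)\/where\/__([a-z-]+)\/([^/]+)$/.exec(path);
  if (!match) throw new Error("unsupported path: " + path);
  const [, table, field, rawValue] = match;
  const columns = ALLOWED.get(table);
  const column = columns && columns.get(field);
  if (!column) throw new Error("unknown table or field");
  return {
    // Only whitelisted identifiers reach the SQL text; the value stays a parameter.
    text: `SELECT * FROM ${table} WHERE ${column} = $1`,
    values: [decodeURIComponent(rawValue)],
  };
}
```

Keeping the field-to-column mapping in a whitelist means nothing from the URL is ever interpolated into the SQL except names I chose myself.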

On the front end, I have a fairly custom dashboard for each dentist (they all have specific needs due to different business sub-specialties). The dashboard calls the API and creates the reports automatically. They are updated n times a day, depending on the metric being observed.

Given this information, and the one you provided me, can you give me your insight into how different your offering would be?

My aim is to reduce complexity and business costs without eroding the service or client experience. I'm interested in how my "small data system" would compare to yours in those terms. It may sound like I'm being pedantic, but I am genuinely interested in learning more. You could be the solution that saves me hundreds of hours of development time.

Hmm... that does make your data collection tricky. Ideally you have something that captures events as they happen, in a nice format, rather than waiting on summary report views.

Looks like you already built an API similar to ours, but if you ever reach scale/maintenance/availability bottlenecks then check us out.

Most of our customers use charts that query in realtime when the page loads or the user adds a filter or whatever, but batch reporting like you describe is a way to optimize the page load time.

Generally we want to help developers before they have already built the solution themselves. I think your homegrown solution is probably exactly what you need right now to validate your product in the market. When you want to invest more in your architecture for improved performance, scalability, or reliability, then you can consider building v2.0 on Keen IO :)

I appreciate the response. The system is still fairly new, and in development. My options are still pretty much open. Just waiting on more paying customers (but aren't we all?).
I can certainly relate to the challenges of working on a big data platform intended to immediately satisfy varied customers...

But mainly I want to say I'm super impressed with the Google Analytics 'storification'. I can imagine the difficulties in bringing that level of quality to myriad data sources, but I'm excited to see you succeed!

Thanks, that's awesome feedback!
That's encouraging, thank you!
Heads up: when JavaScript is enabled but analytics is not (e.g. Ghostery blocking HubSpot), the site falls apart badly.
Crap! Thanks for the heads up, do you have any idea how we could fix that?
Did not see your interest in fixing this until today.

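Just verify the expected variables/methods exist prior to using them (lines 99, 197, 365 right now). Use a typeof check, since a bare if(hbspt) still throws a ReferenceError when the script was blocked and the global was never defined:

   if (typeof hbspt !== "undefined") hbspt.cta.load(
   ^------------ added ------------^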
Maybe a few more buzz words and I may have clicked the link.
Touché. Instantly new big data FREE for YOUR pivot might have done the trick, though.