Hacker News
Ask HN: Open-source sentiment analysis toolkit, a valid small business idea?
3 points by heshiming 1482 days ago
I worked for years as a freelance contractor in NLP. Clients and agencies generally consider sentiment analysis a simple matter. But the reality is that off-the-shelf packages, like TextBlob, are rule-based. When you put together an app, like a chatbot for customer service, they rarely work well. Improving them costs nothing short of building a new system from scratch, which most clients wouldn't accept. Plus, the consensus is that people prefer self-hosted, in-house solutions they can improve. The reasons to avoid sending text to an external API are (1) privacy and (2) per-call pricing.

So I have this idea: I create an open-source toolkit for sentiment analysis. It will be data-driven, not rule-based, so it can fit most use cases well. It'll be a full package: a GUI for labeling and training, plus an API server to self-host the model.
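
To make "data-driven, not rule-based" concrete, here is a minimal sketch in plain Python. It assumes a tiny multinomial Naive Bayes classifier with add-one smoothing; the class name `NaiveBayesSentiment`, the whitespace tokenizer, and the four labeled lines are all made up for illustration and are not part of any existing toolkit. The point is only that the behavior comes from labeled examples rather than from a hand-written word list.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real toolkit would do more.
    return text.lower().split()


class NaiveBayesSentiment:
    """Hypothetical data-driven classifier: everything it knows comes from labeled text."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up customer-service lines standing in for a labeled dataset.
labeled = [
    ("thanks the agent was very helpful", "positive"),
    ("great support problem solved quickly", "positive"),
    ("still waiting nobody answered my ticket", "negative"),
    ("terrible service the refund never arrived", "negative"),
]

model = NaiveBayesSentiment()
model.train(labeled)
```

Calling `model.predict("the agent solved my problem quickly")` then classifies a sentence the model has never seen, using only word statistics learned from the labeled lines. Changing the behavior means changing the data, not editing rules.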

I keep a private repo of data to generate a good-enough model for a niche: customer service chatbots. Part of the data is curated (scraped), and part is labeled by me. I sell a subscription to the model, and during the subscription period I gather and label more data to improve it. I can design the system to support partial training. That is, the customer can improve on my model using just hundreds of lines of their own data.
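
The partial-training idea can be sketched roughly as follows, assuming a bag-of-words perceptron whose weights are simply a dictionary. The function names (`partial_fit`, `predict`), the example sentences, and the choice of a perceptron are assumptions for illustration only; a real system would likely use a stronger model, but the shape is the same: start from the base weights and keep training on the customer's lines.

```python
def tokenize(text):
    # Naive whitespace tokenizer, for illustration only.
    return text.lower().split()


def score(weights, text):
    # Sum of per-word weights; unknown words contribute nothing.
    return sum(weights.get(word, 0.0) for word in tokenize(text))


def predict(weights, text):
    return "positive" if score(weights, text) > 0 else "negative"


def partial_fit(weights, examples, epochs=5, lr=1.0):
    """Continue perceptron training from existing weights.

    Returns a new dict, so the base model stays untouched.
    """
    updated = dict(weights)
    for _ in range(epochs):
        for text, label in examples:
            target = 1 if label == "positive" else -1
            # Update only when the current model gets the example wrong.
            if target * score(updated, text) <= 0:
                for word in tokenize(text):
                    updated[word] = updated.get(word, 0.0) + lr * target
    return updated


# Hypothetical base model, trained on the vendor's private data.
base_examples = [
    ("great helpful support", "positive"),
    ("terrible slow support", "negative"),
]
base = partial_fit({}, base_examples)

# Hypothetical customer data with domain-specific wording.
customer_examples = [
    ("this update is sick", "positive"),
    ("ticket escalated again", "negative"),
]
tuned = partial_fit(base, customer_examples)
```

Here `base` has never seen the customer's vocabulary, while `tuned` picks it up from two lines and still carries the weights learned from the base data. Whether a few hundred lines are enough in practice depends on the model and the domain.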

I figure that if I price it the right way (10-20% of the cost of hiring someone to create one), I can sell it to dev shops and agencies. I can approach customers through LinkedIn or Upwork.

Does this sound like a viable idea for a one-man shop?

3 comments

Some thoughts against it: Chatbots are already a crowded market. Chatbots that deliver value for a company are very product-specific, so a lot of training is needed. As the customers seem not to understand the process of building a chatbot, it will be very difficult to find the 10-20% of whatever. Selling cheap is no less tedious than selling high-end.

To prove me wrong, just try to sell the idea as a not-yet-existent or fake product. Build a landing page where people can subscribe to get further information, lead a few prospects to this page, and count the subscriptions.

Hands down, the fake landing page idea is genius!

I'm not familiar with the market, but it sounds viable.

You seem to have missed one trick though: offering a 10-20% discount in return for adding their (labelled) data to the common training corpus.

Good idea, although most likely I'd be serving a dev shop in the middle, not the end user, so I doubt they would contribute.

Yeah, you would need to cut the middleman in on the deal somehow, or sideline them in an unobtrusive way (for example, set up or join industry-specific consortiums that will own the combined training data sets rather than you, but you help define the standards for that data).

You might find these interesting:

https://blog.gardeviance.org/2013/02/attack-defend-and-dark-...

https://blog.gardeviance.org/2015/12/open-source-as-weapon.h...

Interesting read, thanks. Although in my life as a developer, I tend to avoid relying on seemingly open-source projects that have a well-funded company behind them. The "open source as a weapon" idea is probably more suitable for big corporations. And few people consider controlling an intermediary (like data) as "control". If I owned the data, I'd rather choose a "real open-source project" to consume it, not something that could end up charging me a fortune in the future. Without the foundational software work, data is nothing. It can't be easy to get people on board.

Its viability is directly proportional to the emphasis on sales instead of technology.

Business is not based on making good stuff.

Business is based on selling stuff.

Or to put it another way, technical development is the inverse of a con.

Good luck.

Yes, I'm aware of that. In my experience, the companies I worked for that functioned properly typically dedicated more than 60% of their effort to sales: business development and partnering. But to me it's still a chicken-and-egg problem, because I'm also seeing tremendous obstacles in sales when the underlying product isn't a good one.

I'm sympathetic to seeing it as a chicken-egg problem because I do the same myself.

But I know that sales is both the chicken and the egg.

A company can sell other people's products and be a good business.

The only problem with that is that I don't get to make stuff.