by Buneme 2210 days ago
Hello,

Over the past few months I've been working on a neural-network-based web scraper for e-commerce websites. The aim is to be able to scrape product data from any product page (so far it extracts the name, price, main image URL and technical specification of the product).

I've developed a working prototype of the API along with a demo page (with rate limits, so please use it within reason!) in order to get some feedback before I carry on with the project.

Because the API is only a prototype, there are some features which are currently missing but will be added later on. For example:
• Only English sites that use GBP, EUR or USD are supported.
• I haven't finished integrating my computer vision algorithms, which means that in some specific situations the API might not detect a strikethrough and will therefore mix up the "current price" of the product with the "old price".
• The service is running on a hobby Heroku server, so the API takes a few seconds longer than it otherwise would.

I would appreciate any feedback on the API, in particular:
• Is there any other product data that you would like to see (e.g. product ID, delivery costs, etc.)?
• What sort of applications would you use this API for once it's fully developed?
• Apart from e-commerce sites, what other types of websites would you like to see an API for (e.g. news websites, real estate listings, etc.)?

2 comments

We really need a service like this.

We are a furniture e-commerce company. Our vendors don't provide detailed product feeds, so we have to rely on scraping.

The most difficult part of scraping the data is that we need to scrape all the product options (material, color, size, ...).

Each option is a different SKU. See https://www.article.com/product/11833/sven-charme-tan-sofa
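
To make the SKU problem above concrete, here is a minimal sketch in Python, assuming that every combination of option values maps to its own SKU. The option names and values are made up for illustration and are not taken from any real product feed; `expand_variants` is a hypothetical helper, not part of any API discussed in this thread.

```python
from itertools import product


def expand_variants(options):
    """Return one dict per combination of option values."""
    names = list(options)
    value_lists = (options[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical options for a single sofa listing.
options = {
    "material": ["leather", "fabric"],
    "color": ["tan", "grey", "blue"],
    "size": ["2-seater", "3-seater"],
}

variants = expand_variants(options)
print(len(variants))  # 2 * 3 * 2 combinations
```

The variant count is the product of the option counts, which is why a scraper that only reads the default page state misses most SKUs.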

We also need to build NLP models to understand product dimensions and weight (useful when estimating shipping fees).
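
As a rough baseline for the dimension and weight problem, the sketch below uses plain regular expressions rather than an NLP model, so it only handles the simple "W x D x H unit" and "number unit" patterns. The function names, the supported units, and the sample text are assumptions made for illustration.

```python
import re

# Conversion factors to centimetres and kilograms.
LENGTH_TO_CM = {"cm": 1.0, "mm": 0.1, "m": 100.0, "in": 2.54, '"': 2.54}
WEIGHT_TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237, "lbs": 0.45359237}

NUM = r"(\d+(?:\.\d+)?)"
# Three numbers separated by "x" or "×", followed by a single length unit.
DIM_RE = re.compile(
    NUM + r"\s*[x×]\s*" + NUM + r"\s*[x×]\s*" + NUM + r'\s*(cm|mm|m|in|")',
    re.IGNORECASE,
)
# A number followed by a weight unit.
WEIGHT_RE = re.compile(NUM + r"\s*(kg|g|lbs?)\b", re.IGNORECASE)


def parse_dimensions_cm(text):
    """Return the first 'A x B x C unit' triple in centimetres, or None."""
    m = DIM_RE.search(text)
    if m is None:
        return None
    factor = LENGTH_TO_CM[m.group(4).lower()]
    return tuple(round(float(m.group(i)) * factor, 2) for i in (1, 2, 3))


def parse_weight_kg(text):
    """Return the first weight found in the text in kilograms, or None."""
    m = WEIGHT_RE.search(text)
    if m is None:
        return None
    return round(float(m.group(1)) * WEIGHT_TO_KG[m.group(2).lower()], 3)


sample = "Dimensions: 88 x 38 x 34 in, weight 120 lbs"  # made-up spec text
dims = parse_dimensions_cm(sample)
weight = parse_weight_kg(sample)
print(dims, weight)
```

Real spec sheets mix units, label axes ("W", "D", "H") and give ranges, which is where a learned model would likely be needed instead of patterns like these.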

Hey, if this is true, my company already has a pretty good solution for getting product info in a standard format from tens of thousands of websites. My company also has to gather, format, and estimate dimensions and weight because we only do international shipping. Try out a random product URL on zipx.com for an example.

Should we talk? My email is in my profile. I think there might be several ways your company and mine can help each other, actually...

Thanks for getting in touch, I've just sent over an email
I've heard of the SKU problem that e-commerce stores have to face with their vendors. Would you mind if I contacted you to learn more about it?
No problem, you can send an email to me@buneme.com :)
That's really useful feedback, especially the part about NLP models, thanks!!
Cool project, but I have no idea what I could use it for. Is it something you would use to build a price tracker?
Yes, a universal price tracker is a great example! Another potential use case would be competitor price analysis, so that you can react in real time to changes in your competitors' prices.
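
For the price tracker and competitor analysis use cases, the core step is comparing two snapshots of scraped prices. The sketch below is a minimal version of that comparison, assuming prices have already been extracted and keyed by product URL; the `price_changes` function, the 1% threshold, and the example.com URLs are all hypothetical.

```python
from decimal import Decimal


def price_changes(previous, current, threshold_pct=Decimal("1")):
    """Return (url, old, new, pct_change) for prices that moved by at least threshold_pct."""
    changes = []
    for url, new in current.items():
        old = previous.get(url)
        if old is None or old == 0:
            continue  # new product or unusable baseline: nothing to compare against
        pct = (new - old) / old * 100
        if abs(pct) >= threshold_pct:
            changes.append((url, old, new, pct))
    return changes


# Made-up snapshots from two scraping runs.
previous = {
    "https://example.com/a": Decimal("100.00"),
    "https://example.com/b": Decimal("50.00"),
}
current = {
    "https://example.com/a": Decimal("90.00"),  # 10% drop
    "https://example.com/b": Decimal("50.20"),  # 0.4% rise, below the threshold
    "https://example.com/c": Decimal("10.00"),  # not seen before
}

changes = price_changes(previous, current)
for url, old, new, pct in changes:
    print(url, old, new, pct)
```

`Decimal` is used instead of `float` so that currency amounts compare exactly.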