Over the past few months I've been working on a neural-network based web scraper for e-commerce websites. The aim is to be able to scrape product data from any product page (so far it extracts the name, price, main image URL and technical specification of the product).
I've developed a working prototype of the API along with a demo page (with rate limits, so please use it within reason!) in order to get some feedback before I carry on with the project.
Because the API is only a prototype, there are some features which are currently missing but will be added later on - for example:
• Only English sites that use GBP, EUR or USD are supported
• I haven't finished integrating my computer vision algorithms, which means that in some specific situations the API might not detect a strikethrough and will therefore mix up the "current price" of the product with the "old price"
• The service is running on a hobby Heroku server, so the API takes a few seconds longer than it otherwise would.
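To make the strikethrough limitation concrete, here is a minimal sketch of the failure mode. It is not how the API works internally. The `PriceCandidate` structure, the `struck_through` flag and the "lowest price wins" fallback are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class PriceCandidate:
    """A price found on a product page (hypothetical structure, illustration only)."""
    amount: Decimal
    # True/False when the styling was detected; None when it could not be determined.
    struck_through: Optional[bool]


def pick_current_price(candidates: List[PriceCandidate]) -> Optional[PriceCandidate]:
    """Return the candidate most likely to be the current price, or None."""
    # Prefer prices that are known NOT to be struck through.
    live = [c for c in candidates if c.struck_through is False]
    if live:
        return min(live, key=lambda c: c.amount)

    # Fallback heuristic (an assumption): if the styling is unknown, the old
    # price is usually the higher one, so take the lowest remaining price.
    unknown = [c for c in candidates if c.struck_through is None]
    return min(unknown, key=lambda c: c.amount) if unknown else None
```

When the strikethrough is detected, the struck price is simply ignored. When it is not, only a heuristic like the one above can separate the two prices, and that is where a mix-up can happen.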
I would appreciate any feedback on the API, in particular:
• Is there any other product data that you would like to see (e.g. product ID, delivery costs, etc.)?
• What sort of applications would you use this API for once it's fully developed?
• Apart from e-commerce sites, what other types of websites would you like to see an API for (e.g. news websites, real estate listings, etc.)?
Hey - if this is true, my company already has a pretty good solution for getting product info in a standard format from tens of thousands of websites. My company also has to gather, format, and estimate dimensions and weight because we only do international shipping. Try out a random product URL on zipx.com for an example.
Should we talk? My email is in my profile. I think there might be several ways your company and mine can help each other, actually...
Yes, a universal price tracker is a great example! Another potential use case would be competitor price analysis, so that you can react in real time to changes in your competitors' prices.
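As a rough sketch of what the competitor-analysis side could look like: assume you already have two snapshots of scraped prices keyed by product URL. The function name, the dictionary shape and the 1% threshold are all hypothetical and not part of the API.

```python
from decimal import Decimal
from typing import Dict, List, Tuple


def price_changes(
    previous: Dict[str, Decimal],
    current: Dict[str, Decimal],
    threshold_pct: Decimal = Decimal("1"),
) -> List[Tuple[str, Decimal, Decimal, Decimal]]:
    """Return (url, old_price, new_price, pct_change) for prices that moved
    by at least threshold_pct percent between two snapshots."""
    changes = []
    for url, new_price in current.items():
        old_price = previous.get(url)
        # Skip products with no earlier price to compare against.
        if old_price is None or old_price == 0:
            continue
        pct = (new_price - old_price) / old_price * 100
        if abs(pct) >= threshold_pct:
            changes.append((url, old_price, new_price, pct))
    return changes
```

Running this on each new snapshot and alerting on the returned list would be enough for a basic tracker. Reacting in real time then mostly depends on how often the pages are re-scraped.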
Well done for getting it set up! How are you training your model? Feeding in scraped product pages alongside metadata from an API to train it? And what are your training sources?
Very nice idea, looking forward to seeing how it develops!
It's currently a prototype, so I haven't finished integrating all the computer vision algorithms, which means that it may miss some data in certain situations. The purpose at this stage was just to see if this is something people would genuinely use and to get general feedback to help me decide what features to prioritise going forward.
I have a side project I've been meaning to finish, where I would definitely use something like this (basically identifying price arbitrage opportunities across luxury fashion retailers).
I will bookmark and keep an eye out for your progress.