| HN Mirror

by callmeed 3156 days ago

Having done a ton of scraping in the past (especially around ecommerce and products), this looks pretty cool.

A couple comments in general:

1. Personally, I think it's better to be great at extracting one kind of data than average at many. It makes sales and growth efforts easier. Pick one of those things (products, recipes, social, etc.), focus on it, and get great at it.

2. I don't think you need the credit <-> request abstraction. Anyone using an API knows what a request is (I hope).

Now, a few comments regarding products specifically:

1. I got 500 errors on a couple random product URLs.

2. On an Amazon product that's on sale, I got back the original price but not the sale price.

3. If you truly want to be GREAT at scraping products, the 2 things most people in this space can't do are: (a) extract ALL high-res images for a product, and (b) extract a product's options and variant data (colors, sizes, etc. and availability for each combination)
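To make point 3(b) concrete, here is a rough sketch of what "options and variant data" could look like once extracted. All names here (`Variant`, `ProductRecord`, `missing_combinations`) are made up for illustration and are not from the API being discussed. The idea is simply to store availability per combination and to flag combinations the scraper never saw.

```python
from dataclasses import dataclass, field
from itertools import product as cartesian


@dataclass
class Variant:
    """One concrete combination of options (hypothetical shape)."""
    options: dict                 # e.g. {"color": "red", "size": "M"}
    available: bool               # availability for this exact combination
    image_urls: list = field(default_factory=list)  # high-res images, if found


@dataclass
class ProductRecord:
    """A scraped product with its option axes and variants (hypothetical shape)."""
    title: str
    option_values: dict           # option name -> list of possible values
    variants: list                # list of Variant

    def missing_combinations(self):
        """Return combinations implied by option_values that no variant covers."""
        names = sorted(self.option_values)
        seen = {tuple(v.options.get(n) for n in names) for v in self.variants}
        expected = cartesian(*(self.option_values[n] for n in names))
        return [dict(zip(names, combo)) for combo in expected if combo not in seen]
```

A non-empty result from `missing_combinations()` would suggest the scraper did not capture every color/size pairing, which is exactly the gap described in (b).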

Personally I think there are a ton of opportunities in this space. This is a good start and I wish you the best.

4 comments

RussianCow 3156 days ago

I think the credit concept was created solely for this reason (from the page):

"There is only one exception if the page should be rendered with a full browser (not headless). In this case, 5 credits get charged."

linkfish 3156 days ago

Thanks a lot for all the feedback!

1. Yes, will think about it. For the API alone it would definitely make sense. However, the same technology currently also powers the bookmarking service, which has to support as many kinds of data as possible, so the API does too.

2. Exactly what RussianCow said. Honestly, I'm not a big fan of it either, but that was the best I could come up with to accommodate that.

About the product:

1. It logs all requests that had issues, along with a more descriptive cause. I always go through all of them and fix the issues. The more people use it, the more stuff breaks and the more the product can be improved. So I guess it will get way better in the next few days ;-)

2. Will also check and fix.

3. Will definitely look into that!

If you run into more issues or have more comments, I would love to hear them here or at api@link.fish. Thanks again!

matt_wulfeck 3156 days ago

> Personally I think there are a ton of opportunities in this space.

Totally agree. The problem I see is that when you become big enough to be noticeable, websites start throwing ban hammers at you for flagrant disregard of their TOS. Working around that stuff becomes an art in and of itself.

sharpshadow 3155 days ago

One solid way would be to offer the service as a browser add-on. That way you can avoid any blocking, because the users themselves are making the requests.

tomascot 3156 days ago

What could cause the issue you mention in point 2?

Are they "caching" responses, or is that offer tailored to your user/cookie?

callmeed 3156 days ago

My guess is either (a) they're pulling the original price from a DOM element and not checking whether there's also a sale price (most sites with sale prices will show both the original and the new price), or (b) they're looking for schema.org product data and not looking at the correct item [0].

[0] http://schema.org/Product
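To illustrate guess (b), here is a minimal sketch of reading every offer price from schema.org `Product` data embedded as JSON-LD, rather than stopping at the first one. It uses only the Python standard library. The function name `offer_prices` is made up for this example, and it assumes the page exposes its data in `<script type="application/ld+json">` blocks with plain `offers`/`price` fields, which many sites do but not all. This is not how the service in question works, just one way to avoid the "original price only" result.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def offer_prices(html):
    """Return all offer prices found on Product items, sorted ascending."""
    collector = JsonLdCollector()
    collector.feed(html)
    prices = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offers = item.get("offers", [])
            if isinstance(offers, dict):
                offers = [offers]  # a single offer is often not wrapped in a list
            for offer in offers:
                try:
                    prices.append(float(offer["price"]))
                except (KeyError, TypeError, ValueError):
                    continue
    return sorted(prices)
```

If the list has more than one entry, the lowest value is a reasonable candidate for the sale price and the highest for the original price, though that is a heuristic and would need checking per site.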
