Hacker News new | ask | show | jobs
Instantly Turn Any Webpage into an API
6 points by zeeb0t 682 days ago
I've been working on a tool called InstantAPI.ai, designed to help developers quickly transform any webpage into a working API. It's particularly useful when you need to extract data from a web page or automate interactions with websites that don't offer their own APIs.

Here’s how it works:

1. HTML Fetching with ScrapingBee: We use ScrapingBee to fetch the HTML content from the target webpage. The tool handles dynamic content by rendering JavaScript, which ensures we get a fully loaded page.

2. Custom HTML Cleaning for Efficiency: After retrieving the HTML, InstantAPI.ai cleans the content internally to remove non-essential elements like scripts, styles, and metadata. This cleaning process not only helps to focus on relevant data but also increases both speed and cost/token efficiency when processing the content further.

3. OpenAI API for Structuring Data: The tool leverages OpenAI to interpret the cleaned HTML content and format it into a structured API response. It identifies the implied API method, processes any relevant parameters, and ensures the output aligns with the user's requirements.

4. Customizable Parameters: Users can specify details like API method names, response structure, and even the country code for web requests, offering a high degree of flexibility.

5. Built-in Error Handling: We've included error handling mechanisms to ensure reliable operation, making it robust for various use cases.

6. I also built a no-code solution which integrates the API through Google Sheets. This will be released over the next week.

The tool is built using a combination of custom functions and libraries to handle everything from sanitizing inputs to managing complex API interactions. One of the challenges we focused on was optimizing the processing speed and reducing costs, which we managed by refining our internal HTML cleaning process.

For those interested in seeing it in action, I've created a demo video: https://www.loom.com/share/9691221bb05347a298cc36b1bd8e0c8b?sid=2e5d70e6-ddbb-4f5f-9b84-3fab6963e885

I’d love to get feedback from the HN community, especially on how we can improve the tool’s accuracy and functionality.

3 comments

I've created an open source crm / crm creator (iceburg.ca). I can create a crm from any existing database, from text using AI or custom parameters or use a number of premade templates.

I was looking at adding creating a crm from any website/page.

I'm interested in discussing steps 1 - 3. Using scrappingbee is expensive. I've brought down the price of a crm creation through AI to about 1 1/2 cents (+3 cents if you want a custom cover image). Have you looked at running something locally. I'm using laravel and was considering using Dusk to retrieve the contents.

What are you using for cleaning? Regex removals of tags?

OpenAI API for Structuring Data is new. What are your experiences? How does pricing compare to gpt3.5?

Any plans on open sourcing any part of your stack. What does your sass look like?

I'm not sure what you mean re: ScrapingBee being expensive. I am scraping some pretty heavy pages and it's costing about a half of a cent for a page. Less again with OpenAI. All-in-all, maybe sub 1c per page. Maybe there is more you are doing with this CRM creation process you have.

Right now in terms of open source or SaaS ... I think it is likely at some point I will replace ScrapingBee, maybe even OpenAI, with my own version of these and then take it down to a single API key and bill users myself. However I am leaving it free (albeit with BYO keys) for the moment as I am still building things out. I will likely continue to offer free to those who were already using it before I switch to a different model.

As mentioned, this is just the beginning of this project. The next features include logging in to accounts, giving instructions, so that ideally you can say things like 'book me a uber for 3pm to take me home from work', and the API gets it done and returns you the confirmation and whatever else you asked to be returned.

Reminds me of Kimono Labs [0] and their scraper from way back in the day, or Roborabbit [1], now with AI.

[0] https://news.ycombinator.com/item?id=7066479

[1] http://roborabbit.com

RoboRabbit is interesting and aligns with what I've seen others doing in this space also. This product, and others I am building for, are part of a larger objective I have - which is to allow AI to both understand and fully use websites as if a human viewer themselves. Ideally you'll be able to ask it to book you an uber home, and it'll get the job done - even if uber didn't have API's etc. to make it possible.

It's why my approach here is 'all of the web, and whatever you want' rather than being restricted in any way.

Sounds like what Rabbit AI is doing too, with agents. I'm not sure if they're very successful though, since their product flopped.
It's... not what they are doing. It's what they said they were doing. But also, this isn't in a gimmicky (imo) hardware device. It's a general service so anyone can build on top of it.
Yes, I'm just talking about their supposed software, since it seemed pretty similar to what OP said.
cool idea, but I don't like how i need to use two separate api keys to connect to this api. can you make it simpler?
I'm intentionally not storing any details of the request, and certainly did not want to enter into grounds of storing folks API keys for obvious reasons. That being said, I could have it generated a salt encrypted key for you which could be decrypted by my service, this way you are giving it one key, which is really both of your keys to be used. Is that the kind of thing you had in mind?
I meant more like removing the bee dependency and using something that doesn't even need an API (like building it yourself or an open source scraper). if I could just plug my OpenAI key straight in and get started it would be a more seamless experience.
Hey there.. I just wanted to update you. I took yours and others feedback seriously, and I have now implemented changes so that the services is now paid. At $0.02 per AI scraped web page, it includes all the proxy, JavaScript rendering, and AI to get the job, all on a single bill :) Thanks for your feedback!
Ok, well that is definitely in the works! In case you use Apify, I've also now published it there and it uses my own keys for OpenAI etc. https://apify.com/zeeb0t/instantapi-ai