by w_s_l 2536 days ago
The word "index" suddenly gave me an idea: the entire internet transformed into tabular data, searchable and open to the public.

Even if we got Google's data, there's a whole lot of scraping needed to transform irregular and disparate pages into structured data. Typically you would have to google a keyword, look through the search results, visit different websites (research mode), and then consolidate separate sources of truth to build your own understanding. Instead, imagine building a scraper for each website in the search results and displaying the website's complete data as a table. Why click through hundreds of pages of profiles or data when you can have it all in one view? Why bother with HTML when all a data-centric individual wants is the data? Getting to the data is a long, tedious journey: scrape the website, clean the data, make it available for consumption, then schedule and consolidate updates. For instance, a hedge fund might scrape a certain group of websites and use the results to execute market orders in an automated trading system.
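
The scrape, clean, and consolidate steps above can be sketched with Python's standard-library `html.parser`. This is a minimal, offline illustration under stated assumptions: the two sites, their markup, the `name`/`city` schema, and the helpers `scrape`, `clean`, and `consolidate` are all hypothetical. Each page layout needs its own scraper class, which is exactly the per-website effort described above.

```python
from html.parser import HTMLParser

# Hypothetical pages from two sites listing the same kind of record
# (name, city) with different markup. Inlined so the sketch runs offline;
# a real crawler would fetch them over HTTP.
SITE_A = (
    '<ul><li class="profile"><span class="name"> Alice </span><span class="city">Berlin</span></li>'
    '<li class="profile"><span class="name">Bob</span><span class="city">Paris </span></li></ul>'
)
SITE_B = (
    '<table><tr><td>bob</td><td>Paris</td></tr>'
    '<tr><td>Carol</td><td>Lima</td></tr></table>'
)


class ListScraper(HTMLParser):
    """Site-specific scraper for SITE_A: one <li> per record, fields in <span class=...>."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field whose text we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li":
            self.rows.append({})
        elif tag == "span" and cls in ("name", "city"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self.rows:
            row = self.rows[-1]
            row[self._field] = row.get(self._field, "") + data


class TableScraper(HTMLParser):
    """Site-specific scraper for SITE_B: one <tr> per record, cells are (name, city)."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None  # cells of the <tr> currently being read
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells is not None:
            name, city = self._cells[:2]
            self.rows.append({"name": name, "city": city})
            self._cells = None

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data


def scrape(scraper_cls, html, source):
    """Run one site's scraper and tag every row with where it came from."""
    scraper = scraper_cls()
    scraper.feed(html)
    scraper.close()
    return [dict(row, source=source) for row in scraper.rows]


def clean(row):
    """Normalize whitespace and capitalization so rows from different sites compare equal."""
    return {
        "name": row.get("name", "").strip().title(),
        "city": row.get("city", "").strip(),
        "source": row["source"],
    }


def consolidate(rows):
    """Merge all sites into one table, keeping the first row seen per (name, city)."""
    table = {}
    for row in map(clean, rows):
        table.setdefault((row["name"], row["city"]), row)
    return sorted(table.values(), key=lambda r: r["name"])


raw = scrape(ListScraper, SITE_A, "site_a") + scrape(TableScraper, SITE_B, "site_b")
table = consolidate(raw)
for row in table:
    print(row["name"], row["city"], row["source"], sep=" | ")
```

Running it prints three rows (Alice, Bob, Carol): the two "Bob" records from different sites collapse into one after cleaning. Deduplicating on (name, city) is a simplifying assumption; real consolidation would need a sturdier record key.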

A tabular-data-focused search engine would return tabular data: no HTML medium, just straight-up raw data. For instance, instead of seeing the comments rendered in a normal browser, imagine a table listing every username, post time, and comment, minus the hierarchy.
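
The comments example can be sketched in a few lines of Python: walk a nested comment tree depth-first, emit one flat row per comment, and write the rows out as CSV. The `thread` structure, its field names, and the sample comments are hypothetical stand-ins for a scraped HN page; the sketch uses only the standard-library `csv` and `io` modules.

```python
import csv
import io

# Hypothetical nested thread: each comment has an author, a post time,
# its text, and a list of replies (which are comments themselves).
thread = [
    {"user": "alice", "time": "2019-01-01 10:00", "text": "Neat idea.", "replies": [
        {"user": "bob", "time": "2019-01-01 10:05",
         "text": "Who maintains the scrapers, though?", "replies": [
             {"user": "alice", "time": "2019-01-01 10:09", "text": "Good question.", "replies": []},
         ]},
        {"user": "carol", "time": "2019-01-01 10:12", "text": "What about robots.txt?", "replies": []},
    ]},
    {"user": "dave", "time": "2019-01-01 11:00", "text": "I'd pay for clean data.", "replies": []},
]


def flatten(comments):
    """Pre-order depth-first walk that emits one (user, time, text) row per comment,
    discarding the parent/child nesting."""
    rows = []
    stack = list(reversed(comments))
    while stack:
        comment = stack.pop()
        rows.append((comment["user"], comment["time"], comment["text"]))
        # Push replies reversed so they are popped, and emitted, in their original order.
        stack.extend(reversed(comment.get("replies", [])))
    return rows


def to_csv(rows):
    """Serialize the flat rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["username", "post_time", "comment"])
    writer.writerows(rows)
    return buf.getvalue()


rows = flatten(thread)
print(to_csv(rows), end="")
```

Replies are emitted right after their parent, so the thread's reading order survives even though the nesting itself is dropped.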

To build a focused crawler quickly, I came up with Web Scraping Language (https://scrapeit.netlify.com). Essentially, what I want to do is hire people to write WSL to scrape the web and then sell subscriptions to data-centric customers.

What do ya say, HN?