Show HN: AI driven online database of company profiles (aihitdata.com)
14 points by jo_kruger 2752 days ago
3 comments

what component of this has anything to do with AI?
It is like an iceberg: the website is just the tip. The website is an interface to a database of company profiles. Each of these profiles was built automatically from the corresponding corporate website. The process includes website crawling, page categorization, language detection, entity extraction, structured data composition, fuzzy comparison, etc. The key feature of the platform is that the process is 100% automated, which lets us repeat the whole cycle for every profile on a regular basis, compare the structured profiles, and see all changes.
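To make the "fuzzy comparison" step concrete, here is a minimal sketch of how two extracted values might be compared so that cosmetic edits are not reported as changes. It is not aiHitdata's actual method: the helper names `normalize` and `is_same_value` and the 0.9 threshold are assumptions for illustration, and the similarity measure is the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so cosmetic edits are ignored."""
    return " ".join(value.lower().split())


def is_same_value(old: str, new: str, threshold: float = 0.9) -> bool:
    """Return True when two extracted strings are close enough to count as unchanged.

    The threshold is a hypothetical tuning knob, not a value from the source.
    """
    a, b = normalize(old), normalize(new)
    if a == b:
        return True
    # ratio() is 2 * matches / total length, a float in [0, 1].
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

With this sketch, `is_same_value("Acme  Corp.", "acme corp.")` is treated as unchanged, while two unrelated names fall below the threshold and would be reported as a change.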
For those who can't RTF About Page

What is aiHitdata? aiHitdata is a massive, artificial intelligence/machine learning, automated system that has been trained to build and update company information from the web.

What makes the data different? aiHitdata not only extracts data. It can monitor and understand the changes that occur on company websites; afterwards, it records these changes as time series transactions. This information is incredibly powerful.

What are the benefits? Most company information databases might tell you the name of a company’s CEO. aiHitdata will tell you when the CEO changed and show the resulting transaction, date and change details.

Most query engines and company databases simply tell you what a company currently says on its website. aiHitdata shows you how the company has changed over time.

Most company databases quickly become out of date. aiHitdata is constantly being updated.

This enables you to understand what is happening to companies and to perform queries such as…

“show me all the engineering companies in California with a new CTO appointed in the last 9 months.”

Hence, you can find and list companies by what they are and see what is happening to them. This makes aiHitdata an extremely powerful tool.
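As a rough illustration of what such a query could look like over time series transactions, here is a minimal sketch. The `Transaction` record, its field names, and the `companies_with_recent_change` helper are all hypothetical; the About page does not describe aiHitdata's schema or query interface.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Transaction:
    """One recorded change event (hypothetical shape, not aiHitdata's schema)."""
    company: str
    industry: str
    state: str
    field: str
    old_value: str
    new_value: str
    recorded_on: date


def companies_with_recent_change(transactions, industry, state, field, since):
    """Return sorted names of companies with a change to `field` on or after `since`."""
    names = {
        t.company
        for t in transactions
        if t.industry == industry
        and t.state == state
        and t.field == field
        and t.recorded_on >= since
    }
    return sorted(names)


# Usage: "engineering companies in California with a new CTO in the last 9 months",
# approximating 9 months as 270 days.
today = date(2024, 1, 1)
log = [
    Transaction("Alpha Eng", "engineering", "CA", "cto", "A. Old", "B. New", date(2023, 10, 1)),
    Transaction("Beta Eng", "engineering", "CA", "cto", "C. Old", "D. New", date(2022, 1, 1)),
    Transaction("Gamma Eng", "engineering", "NY", "cto", "E. Old", "F. New", date(2023, 11, 1)),
]
recent = companies_with_recent_change(
    log, "engineering", "CA", "cto", since=today - timedelta(days=270)
)
```

Because each change is stored as its own dated record, the query is just a filter on the change log rather than a comparison of full profiles at query time.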


What else is different about aiHitdata? It is up to date, and we mean really up to date. There are no humans involved in compiling its data. It is all fully automated. aiHitdata’s servers scour the Internet continually, 24/7, monitoring and updating company data.

How does it work? This is complex, but to give an idea…

aiHitdata is continually building and refining its own URL map of the web.
Its intelligent crawlers then extract the companies from this map (aiHitdata finds approximately 30,000 new companies each week).
It identifies, categorises and extracts all of the Key Fields (see below) it finds on each company site.
It then checks/quality assures and stores them.
Next, it re-crawls each company site periodically, noting changes.
When it spots a change, it records this as a time series transaction in its database.

The above has resulted in the database that today:

has more than 15m companies (with the number of records growing each day);
has in excess of 500m historical company change events (transactions);
is adding more than 13m new company change events per month.

How accurate is aiHitdata? Part of aiHitdata’s AI/ML system is an automated QA management array. This is one of the most complex parts of the system. aiHitdata monitors both completeness and accuracy of its data. It is focused on keeping transaction data quality at a very high level. This enables it to be used for predictive analytics.

aiHitdata collects and monitors more than 100 different fields of data. For key fields it is achieving and maintaining an accuracy rate in excess of 90%.

How does it work?
We have a platform running on more than 50 servers, which includes crawlers, website categorizers, entity extractors, etc. It scans about 30 million websites (categorized as corporate websites with content in English) on a regular basis. The goal is to build a structured profile for every known company website, and to repeat this process every month in order to compare historical profiles and generate "transactions" like "company X changed CEO", etc.
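The compare-and-generate step can be sketched roughly as follows. This is only an illustration under stated assumptions: profiles are modelled as flat dictionaries of field name to value, and `diff_profiles` and the keys of the records it returns are hypothetical names, not the platform's real data model.

```python
from datetime import date


def diff_profiles(company, old, new, recorded_on):
    """Compare two snapshots of one company profile and return change records.

    `old` and `new` are flat dicts of field name to extracted value
    (an assumed representation). A field present in only one snapshot
    is reported with None on the other side.
    """
    changes = []
    for field in sorted(old.keys() | new.keys()):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes.append({
                "company": company,
                "field": field,
                "old": before,
                "new": after,
                "date": recorded_on.isoformat(),
            })
    return changes


# Usage: last month's profile versus this month's re-crawl.
previous = {"ceo": "A. Example", "phone": "555-0100"}
current = {"ceo": "B. Sample", "phone": "555-0100", "cto": "C. Person"}
transactions = diff_profiles("Company X", previous, current, date(2024, 1, 1))
```

Here the CEO change and the newly seen CTO each become one transaction, while the unchanged phone number produces nothing.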