by rog211 2498 days ago
Do you use only public data, like census and BLS data, or do you have paid data as well?

We do not use census data because it is extremely outdated. That's part of the reason we can make accurate predictions before other companies can, since they do rely on census data.

We use alternative data, which has recently become popular in the finance industry. For example, if you ask executives at a big company about their profit outlook, they will always be optimistic; otherwise, their stock might decline and they could panic the market.

If you wait until the quarterly announcement, you find out at the same time as everyone else, and the information is already delayed.

However, some people have found that you can more accurately predict a company's quarterly performance by monitoring job boards and seeing how many open positions the company is hiring for. This lets them gain insight and act before the rest of the market catches on.
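As a rough illustration of the job-board idea, the sketch below turns repeated scrapes of a company's open positions into a single "hiring momentum" number: the relative change in open roles between the earliest and latest snapshot. The `PostingSnapshot` type, the function name, and the counts are all hypothetical; nobody in the thread describes a specific formula, and a real signal would need far more care (seasonality, reposted listings, role mix).

```python
from dataclasses import dataclass


@dataclass
class PostingSnapshot:
    date: str        # ISO date the job board was scraped (hypothetical field)
    open_roles: int  # number of open positions listed for the company


def hiring_momentum(snapshots: list[PostingSnapshot]) -> float:
    """Relative change in open roles from the earliest to the latest snapshot."""
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    # ISO dates sort correctly as strings.
    ordered = sorted(snapshots, key=lambda s: s.date)
    first, last = ordered[0].open_roles, ordered[-1].open_roles
    if first == 0:
        raise ValueError("baseline snapshot has no open roles")
    return (last - first) / first


# Toy numbers, not real data.
snaps = [
    PostingSnapshot("2019-01-01", 120),
    PostingSnapshot("2019-02-01", 135),
    PostingSnapshot("2019-03-01", 150),
]
print(hiring_momentum(snaps))  # 0.25
```

A positive value means the company is posting more roles than before, which the comment treats as an early hint of an optimistic outlook.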

We use the same approach, but for real estate. For example, if you monitor the number of French Bulldogs in a neighborhood, you can accurately predict median income values for that neighborhood before any official statistics come out. This is because those dogs are very expensive, so people willing and able to spend a few thousand dollars on a pet tend to have a higher economic background.
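The comment doesn't say how dog counts are mapped to an income figure. One simple way to picture it, assuming you have a handful of neighborhoods whose median income is already known, is an ordinary least-squares line fitted from "French Bulldogs per 1,000 households" to median income, then applied to a neighborhood with no recent statistics. The variable names and the numbers below are hypothetical toy values, and a single-feature linear fit is a stand-in for whatever model is actually used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy calibration set (hypothetical): French Bulldogs seen per 1,000 households
# versus a known median income for the same neighborhoods.
bulldog_density = [1.0, 2.0, 3.0, 4.0]
known_income = [50_000, 60_000, 70_000, 80_000]

slope, intercept = fit_line(bulldog_density, known_income)
new_density = 5.0  # neighborhood without recent official statistics
print(round(slope * new_density + intercept))  # 90000
```

Real data would be noisy and would combine many signals, so this only shows the shape of the calibration step, not its accuracy.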

We do use some paid sources, such as satellite imagery, and some data sources, like our weather data vendor, require you to pay for API access.

Thx for the response. We are trying to use data to help understand price movements in my own properties, and I was trying to use ACS5 data, but once you get into obtaining access to tax record data, it can get expensive quickly. Are you guys using any tax record data to obtain housing prices, or ACS5 to get median income?

Yeah, data is always a big headache haha, so I feel your pain.

We use MLS data to obtain historical housing prices. We don't use government data for median income; instead, we estimate it directly by tracking public social media posts from the neighborhood.

We then run image analysis on them to detect features like the types of dogs people have, types of cars, and other stuff. It's not 100% accurate for sure, but it's given us a pretty good understanding of the median income for neighborhoods, and the data refreshes in real time too :)
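To make the image-analysis step more concrete, here is a minimal sketch of what could happen after detection: each detected object is recorded as a `(neighborhood, category, label)` tuple, and the detections are rolled up into per-neighborhood shares, such as the fraction of detected dogs that belong to an expensive breed and the fraction of detected cars that are luxury models. The tuple format, the label names, and the `EXPENSIVE_DOGS` / `LUXURY_CARS` sets are assumptions for illustration only; the thread does not describe the actual features or detector output.

```python
from collections import defaultdict

# Hypothetical label sets; the thread does not say which breeds or makes count.
EXPENSIVE_DOGS = {"french_bulldog"}
LUXURY_CARS = {"tesla", "porsche"}


def neighborhood_features(detections):
    """Aggregate (neighborhood, category, label) detections into per-neighborhood shares."""
    counts = defaultdict(lambda: {"dog": 0, "dog_expensive": 0, "car": 0, "car_luxury": 0})
    for hood, category, label in detections:
        c = counts[hood]
        if category == "dog":
            c["dog"] += 1
            if label in EXPENSIVE_DOGS:
                c["dog_expensive"] += 1
        elif category == "car":
            c["car"] += 1
            if label in LUXURY_CARS:
                c["car_luxury"] += 1
    features = {}
    for hood, c in counts.items():
        # Guard against neighborhoods with no detections in a category.
        features[hood] = {
            "expensive_dog_share": c["dog_expensive"] / c["dog"] if c["dog"] else 0.0,
            "luxury_car_share": c["car_luxury"] / c["car"] if c["car"] else 0.0,
        }
    return features


detections = [
    ("A", "dog", "french_bulldog"),
    ("A", "dog", "labrador"),
    ("A", "car", "tesla"),
    ("B", "dog", "beagle"),
    ("B", "car", "honda"),
]
print(neighborhood_features(detections))
```

Because new posts keep arriving, these shares can be recomputed continuously, which matches the "refreshes in real time" point; they would still need to be mapped to an income estimate with a model calibrated on some labeled baseline.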

Very cool, thanks for the insights! To run your model, you must have some baseline data to train against for median income. How do you get that?