by lotyrin 3151 days ago
Reminds me of my time spent being a customer to a real estate MLS. When industries traditionally rented access to information asymmetry, even if those rents have since diminished, there's a lingering cultural momentum not to "show your hand" by playing nice with information (in addition to these not being organizations with the insight and incentives aligned to be attractive or conducive to IT competence).
3 comments

I deal with MLSs regularly as part of my current job and this was the first thing that popped into my head when I read the parent comment. They are hell to deal with and we would pay a good amount of money to anyone who could standardize all of them and feed them to us in a consistent format.

Half of the time I spend debugging is on figuring out what is going wrong with different MLSs.

I also deal with MLS data and, on our end, it's in pretty good shape.

Of course, there is a whole department devoted to keeping it clean and consistent, so I don't have to deal with issues that you do.

Now that you mention it, perhaps we should provide an API for MLS data access.

I've worked with MLS data before. It was terrible. I am currently working with ACH, DDF and I am astounded that our financial system even operates. Because if there is one thing you want in payment processing, it's wildly inaccurate and inconsistent data that is lacking documentation.
Fuck MLS. It's supposedly a "standard", but every single MLS API endpoint is completely different. As in, there are very few recognizable similarities between any two feeds. The data structures are not actually standardized whatsoever; one MLS will have all listings in a single "table", while another will have 12 different tables that you have to figure out how to parse and/or join. There is some lingo that is fairly standard because of the industry, but the way the data is organized and represented is wholly unique per MLS. And there is no documentation other than an (optional, usually 2-3 word!) description on each field providing little insight.
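To make the "every feed is different" problem concrete, here is a minimal sketch of the per-MLS field mapping a client ends up maintaining. The feed names (`mls_a`, `mls_b`) and every field name are made up for illustration; real feeds will differ, and multi-table feeds would need a join step before this one.

```python
from typing import Dict

# Hypothetical per-feed mappings from native field names to one common schema.
# None of these names come from a real MLS; they only illustrate the shape.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "mls_a": {"ListingID": "listing_id", "ListPrice": "price", "BedroomsTotal": "bedrooms"},
    "mls_b": {"MLSNUM": "listing_id", "LP": "price", "BR": "bedrooms"},
}

def normalize(source: str, raw: Dict[str, object]) -> Dict[str, object]:
    """Rename the fields of one raw record into the common schema.

    Fields missing from the raw record are skipped rather than invented.
    """
    mapping = FIELD_MAPS[source]
    return {common: raw[native] for native, common in mapping.items() if native in raw}

a = normalize("mls_a", {"ListingID": "A1", "ListPrice": 350000, "BedroomsTotal": 3})
b = normalize("mls_b", {"MLSNUM": "B7", "LP": 410000})
```

The mapping table is the easy part; the real cost is discovering what each native field means when the only documentation is a 2-3 word description.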

The protocol used by MLS servers is kind of standardized... except not really. There's no single way, for example, to accomplish a full download of an initial dataset. Some MLSes let you order by id with an offset, so you can paginate properly. With others you must use date ranges - but you're assuming that a create or update timestamp is actually meaningful, and surprise - it is not. Some let you run multiple concurrent connections so you can increase data throughput, while others only permit one slow connection that makes it take 6+ hours to download their entire database.
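For the friendlier case (order by id, page with an offset), the initial download loop can be sketched roughly like this. `fetch_page` is a hypothetical callable standing in for whatever query a given MLS server accepts, and the in-memory list is only a stand-in so the sketch runs on its own.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]

def download_all(
    fetch_page: Callable[[int, int], List[Record]],
    page_size: int = 500,
) -> Iterator[Record]:
    """Yield every record by walking id-ordered pages until a short page appears."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means the end of the dataset was reached.
            break
        offset += len(page)

# In-memory stand-in for a server that sorts by id and honors offset/limit.
FAKE_LISTINGS: List[Record] = [{"id": i} for i in range(1, 1201)]

def fake_fetch_page(offset: int, limit: int) -> List[Record]:
    return FAKE_LISTINGS[offset:offset + limit]

records = list(download_all(fake_fetch_page, page_size=500))
```

This only works when the server gives a stable sort order. With date-range-only feeds there is no equivalent loop you can trust, because it depends on the timestamps being meaningful.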

Finally: good luck figuring out when a record is flat out deleted. The MLS may have a separate table where deleted records are supposed to go, but that table is always empty. The MLS nukes data rows with no way for clients to detect the change other than by downloading the entire database from scratch to find the gaps. This requires running hours of queue jobs PER MLS every 24 hours. It's so ridiculously inefficient.
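The gap-finding step itself is trivial; a minimal sketch, assuming listing ids are stable strings and that the remote id list comes from a fresh full download:

```python
from typing import Iterable, Set

def find_deleted_ids(local_ids: Iterable[str], remote_ids: Iterable[str]) -> Set[str]:
    """Return ids we still hold locally that no longer appear in the remote feed."""
    return set(local_ids) - set(remote_ids)

# Illustrative ids only.
local = {"A100", "A101", "A102", "A103"}
remote = ["A100", "A102", "A103", "A104"]  # A101 was silently removed upstream

deleted = find_deleted_ids(local, remote)
```

The expensive part is producing `remote_ids` at all, since that is what forces the full re-download per MLS.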

Yeah... fuck MLS. End rant.

I'm convinced we work at the same company because you just expressed all of the same pain points that I currently have with MLSs at my job. We have to do so many hacky things to make sure that we follow each individual MLS's rules to get them to actually work.

They are absolutely terrible to work with.

I took the liberty of scanning very quickly through your post history. We live and work quite far apart. Seems MLS pain is universal. ;)
And for ACH, it's even worse: you assume success only in the absence of failure. It's shockingly bad.
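In code, "success only in the absence of failure" looks roughly like the sketch below. The status names and the `return_window_days` default are illustrative assumptions, not values taken from the ACH rules, and the window is counted in calendar days for simplicity.

```python
from datetime import date, timedelta
from typing import Optional

def payment_status(
    submitted: date,
    today: date,
    return_received: Optional[date],
    return_window_days: int = 5,  # illustrative value, not from the ACH rules
) -> str:
    """Classify a payment when the network only ever reports failures."""
    if return_received is not None:
        return "failed"
    if today - submitted >= timedelta(days=return_window_days):
        # No positive confirmation ever arrives; silence is treated as success.
        return "presumed_settled"
    return "pending"

status_quiet = payment_status(date(2024, 1, 2), date(2024, 1, 10), None)
status_recent = payment_status(date(2024, 1, 2), date(2024, 1, 3), None)
status_returned = payment_status(date(2024, 1, 2), date(2024, 1, 10), date(2024, 1, 4))
```

Note that there is no "confirmed" state anywhere in that function, which is the whole complaint.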
Not coincidentally these are also industries with the most middlemen.
Mortgages, too...