I used to work at ITA. I did not work on the QPX product (which is the bit you're referring to), so take everything I say with a grain of salt. I can't talk about ITA-specific info, but I can talk in general terms about knowledge that is public in the airline industry. Generally speaking, if you wanted to do what ITA does, you'd have to get three types of data: (1) schedules, (2) fares and fare rules, and (3) availability data.
Getting (1) and (2) is relatively simple, since there are special clearinghouses where airlines publish their schedule and fare data. ATPCO does fares, and I've forgotten the name of the organization that handles SSIMs for schedules, but in either case you pay for a subscription and they make the data available to you. Simple. Except that the protocols used to transmit the data are baroque and painful, the data itself has all kinds of quality issues, and airlines are really, really dumb when it comes to thinking about the semantics of what they're trying to convey.
Specifically, airline standards groups typically go nuts specifying syntax while completely ignoring semantics. The result is that lots of carriers are "standards compliant" but functionally unable to communicate. There are a lot of N-squared implementations in the industry (i.e., every carrier that needs to talk to other carriers ends up writing custom code for each one it talks to).
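To put rough numbers on the N-squared point (a back-of-the-envelope illustration, not industry data, and assuming every carrier has to talk to every other carrier): if each of N carriers writes its own custom adapter for each of the other N-1, the total count grows quadratically, whereas a standard that actually pinned down semantics would need only one implementation per carrier.

```latex
\text{custom adapters} = N(N-1) = O(N^2), \qquad \text{one shared standard} = N
\qquad\text{e.g. } N = 100:\quad 100 \times 99 = 9{,}900 \ \text{versus}\ 100
```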
The nice thing about (1) and (2) is that we're talking about relatively static data that is updated infrequently; many systems process fare data on a daily basis, for example. You may wonder: how can this be, since airline prices vary rapidly even within the same day? The answer boils down to availability data (3), which changes extremely rapidly. A rule of thumb is that you might see one availability change per flight segment per second. Getting access to (3) is a good deal harder than (1) or (2). Typically, it involves talking directly with carriers, or with the more sophisticated carrier alliances that are smart enough to pool infrastructure for their members. Alternatively, you can buy access from one of the global distribution systems (Sabre, Galileo, Amadeus, etc.), but this can be a very, very expensive proposition. Plus, if you're buying availability data from a GDS, odds are good that you're competing with them, which doesn't give them much incentive to play nice.
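Here's a deliberately simplified sketch of why static fares plus fast-moving availability yield fast-moving prices. It assumes each fare is sold in exactly one booking class and that availability is just a count of open seats per class; the class codes, prices, and function name are hypothetical, and real availability logic (nesting, married segments, origin-and-destination controls) is far messier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fare:
    booking_class: str  # hypothetical booking-class code this fare is sold in
    amount: int         # price in whole currency units (hypothetical)


# Hypothetical fares for one flight: filed in advance, updated rarely (data types 1 and 2).
FARES = [Fare("Y", 900), Fare("M", 450), Fare("Q", 250)]


def cheapest_available(fares, availability):
    """Return the cheapest fare whose booking class still has seats open, or None.

    `availability` maps booking class -> open seats (data type 3, changes constantly).
    """
    open_fares = [f for f in fares if availability.get(f.booking_class, 0) > 0]
    return min(open_fares, key=lambda f: f.amount, default=None)


# Same fare table, two availability snapshots taken a few hours apart.
morning = {"Y": 9, "M": 4, "Q": 2}
afternoon = {"Y": 9, "M": 4, "Q": 0}  # the cheap class has sold out or been closed
print(cheapest_available(FARES, morning).amount)    # 250
print(cheapest_available(FARES, afternoon).amount)  # 450
```

Nothing in the fare table changed between the two calls; only the availability snapshot did, and the quoted price nearly doubled.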
There are all sorts of problems you need to solve to do what ITA does, but for a host of reasons I won't go into, availability is the hardest.
As for when they were just starting, I wasn't there, but based on later conversations, I can say that Carl and Jeremy (ITA founders) were just incredibly gutsy. And shockingly smart. And really lucky. They threw something together that was complete crap but managed to impress some industry people just enough to offer them a static data file. They had no idea how to read the damn thing, so they made some educated guesses and used it to demo a search engine to the industry folks, who compared the demo results to their own internal system. Obviously, they got lots of stuff wrong, but they got enough right to really impress some of the staff, who then started giving them data dictionaries, etc. The rest is history.
I forgot to add: the buggy data issue is exacerbated by poor engineering practices common in the airline world. Imagine you were designing a format to describe flight schedule data. How do you represent a flight segment temporally? I'd probably use two numbers: a GMT departure datetime and a duration. Instead, airlines specify a local datetime for departure and another for arrival. So now, in addition to all the fun that comes from handling local times, you've also introduced a whole new category of data errors: I've actually seen carriers file schedules for flights that arrived before they departed (i.e., when you convert the departure and arrival times to GMT, the departure comes after the arrival).
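A minimal sketch of the sanity check this forces on you, assuming each airport's IANA time-zone name is already known (the function names are hypothetical, and it relies on the system tz database that Python's `zoneinfo` reads; on some platforms that means installing the `tzdata` package). It converts both filed local times to UTC and rejects segments whose arrival isn't after departure. It does not handle ambiguous or nonexistent local times around DST transitions, which is part of the "fun" mentioned above.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def to_utc(local_naive: datetime, tz_name: str) -> datetime:
    """Attach the airport's IANA zone to a naive local time and convert to UTC.

    Ambiguous or nonexistent DST-transition times are not treated specially here.
    """
    return local_naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)


def check_segment(dep_local, dep_tz, arr_local, arr_tz) -> timedelta:
    """Return the segment's elapsed time, or raise if arrival is not after departure."""
    dep_utc = to_utc(dep_local, dep_tz)
    arr_utc = to_utc(arr_local, arr_tz)
    elapsed = arr_utc - dep_utc
    if elapsed <= timedelta(0):
        raise ValueError(
            f"arrival {arr_utc:%Y-%m-%d %H:%M}Z is not after departure {dep_utc:%Y-%m-%d %H:%M}Z"
        )
    return elapsed


# New York 18:00 local -> London 06:00 local the next day (winter offsets).
print(check_segment(datetime(2024, 1, 15, 18, 0), "America/New_York",
                    datetime(2024, 1, 16, 6, 0), "Europe/London"))  # 7:00:00

# Same flight filed with the arrival date left unchanged: it "lands" before it leaves.
try:
    check_segment(datetime(2024, 1, 15, 18, 0), "America/New_York",
                  datetime(2024, 1, 15, 22, 0), "Europe/London")
except ValueError as err:
    print("rejected:", err)
```

With the GMT-departure-plus-duration representation described above, this whole class of error disappears: a positive duration can never produce an arrival before departure.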
To be clear, I'm pretty sure no one here will answer correctly with that much specificity. The people who know the answer are under NDA and will not talk here.
Well, there are only so many places you can get that kind of data, and none of them are secrets. SABRE is by far the most comprehensive for flights originating in the US.
Right, so you don't have actual knowledge, you're just speculating.
As someone who worked in the industry, I would seriously question your claim about Sabre. In any event, note that ITA deals with global flights, so flights originating in the US are only a portion of what they need.