About seven years ago, I was at a sushi bar and struck up a conversation with an older gentleman sitting next to me. He told me he was a developer who built systems for USPS. I am always fascinated by the technology behind large-scale systems, so I picked his brain for a good hour. From what I recall, he said that at the key distribution centers, USPS scans every single piece of mail (in standard envelope sizes) and, in under a second, runs OCR on the destination address. The OCR results are matched against the address database, and if the match is close enough, the piece is automatically diverted to the correct queue. Now here's the fun part: if OCR fails or the print/handwriting is unreadable, a photograph is immediately sent to one of hundreds of humans waiting to decipher the address and type it in (think Amazon Mechanical Turk). The humans have under 10 seconds to read, decipher, type, and submit the correct address. During this time, the letter is held in a waiting buffer, and the moment the correct address is available, it is diverted to the correct queue. I asked him if that meant USPS took a photo of every single piece of mail, and he said yes, they had to; otherwise nobody would ever get any mail, given the sheer volume they had to manage. I asked if the photos of envelopes were saved forever, and he said, "Well, I'm pretty sure they are, but I'm not allowed to publicly admit that." I know it's a personal anecdote, but that was seven years ago. I can't even imagine what they're doing now.
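To make the routing decision in that anecdote concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the similarity measure (`difflib.SequenceMatcher`), the 0.85 `MATCH_THRESHOLD`, and the `ask_human` callback that reports the keyer's answer plus how long they took are all hypothetical stand-ins, not anything USPS is known to use. The only figure taken from the story is the 10-second human deadline.

```python
import difflib
from typing import Callable, List, Optional, Tuple

MATCH_THRESHOLD = 0.85   # hypothetical cutoff for a "close enough" OCR match
HUMAN_DEADLINE_S = 10.0  # the anecdote's "under 10 seconds" for human keyers


def best_match(ocr_text: str, address_db: List[str]) -> Tuple[Optional[str], float]:
    """Return the closest known address and its similarity score in [0, 1]."""
    best, best_score = None, 0.0
    for addr in address_db:
        score = difflib.SequenceMatcher(None, ocr_text.lower(), addr.lower()).ratio()
        if score > best_score:
            best, best_score = addr, score
    return best, best_score


def route(ocr_text: Optional[str], address_db: List[str],
          ask_human: Callable[[], Tuple[Optional[str], float]]) -> Tuple[str, Optional[str]]:
    """Return ("auto", addr), ("human", addr) or ("reject", None) for one piece."""
    if ocr_text:
        addr, score = best_match(ocr_text, address_db)
        if score >= MATCH_THRESHOLD:
            return "auto", addr
    # OCR failed or wasn't confident: the piece sits in the buffer while a
    # human keys the address from the photo (hypothetical callback).
    answer, seconds_taken = ask_human()
    if answer is not None and seconds_taken <= HUMAN_DEADLINE_S:
        return "human", answer
    return "reject", None
```

A slightly misspelled but otherwise clean OCR read clears the threshold and never reaches a human; a failed read only gets sorted if the keyer answers within the deadline.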
I worked on the OCR systems. Fun fact: at one time, the USPS was the world's biggest user of Linux in a production setting. Their OCR boxes ran on Linux (until they were replaced with SGI O2 boxes at a massive cost... but I digress).
Here's the path the mail takes: carriers pick it up from the mailboxes, then dump trucks bring it to the P&DCs (Processing and Distribution Centers). There are about 1,000 P&DCs in the country, I think. There, the mail is dumped onto a massive conveyor belt, where the first machine (AFCS, or Automated Facer Canceller System) makes sure each piece is facing the right way and is upright. Various heuristics are used for this. Here the mail is stacked neatly and vertically into flat boxes.
Postal workers then feed these boxes into the MLOCR (Multi-Line OCR) machines. These machines scan pieces at a rate of 13 per second. After being scanned, each letter travels a long loop before coming back to the beginning. This loop, which takes about 3 seconds (I'm not sure about that figure), is the latency budget: the reading machine has that much time to decode the address. Also at this point, a fluorescent barcode is sprayed on the back of the piece, giving it a unique ID. If the OCR machine can read the address, the piece is sent to a bin indexed by the first 2 digits (or so) of the ZIP code (assuming it's not local).
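The loop is effectively a fixed-size pipeline, and a quick Python sketch shows what those numbers imply. It takes the 13 pieces/second rate and the (admittedly uncertain) 3-second loop from the comment; the 2-digit prefix length, the `local_prefixes` set, and the bin names are assumptions for illustration only.

```python
import math
from typing import Optional, Set

# Figures from the comment; the loop time is the commenter's own unsure estimate.
SCAN_RATE_PER_S = 13
LOOP_LATENCY_S = 3.0
# "First 2 digits (or so)": the exact prefix length is an assumption.
ZIP_PREFIX_DIGITS = 2


def pieces_in_flight(rate: float = SCAN_RATE_PER_S,
                     latency: float = LOOP_LATENCY_S) -> int:
    """Pieces on the loop at once, each still waiting for its address."""
    return math.ceil(rate * latency)


def bin_for(zip_code: Optional[str], local_prefixes: Set[str]) -> str:
    """Choose an output bin once a piece comes back around the loop."""
    if zip_code is None or len(zip_code) < 5 or not zip_code[:5].isdigit():
        return "REJECT"  # unreadable: goes to the separate pile
    prefix = zip_code[:ZIP_PREFIX_DIGITS]
    if prefix in local_prefixes:
        return "LOCAL"  # local mail gets finer sorting, not modeled here
    return f"ZIP-{prefix}"
```

At 13 pieces/second with a 3-second loop, roughly 39 pieces are in flight at any moment, and a new one arrives about every 77 ms. So the reader needs throughput, not just a 3-second answer per piece: it has to work on dozens of images concurrently.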
If the OCR can't read the address, the piece is sent to a separate pile. Then a program called RCR (Remote Computer Reader) kicks in: a person sitting at some remote location gets the image and enters enough information to decode the address, and the results are collected, tagged by the ID from the fluorescent barcode. After a few hours, this separate pile is run through the sorting machine again. This time, the fluorescent barcode ID is used to look up the human's result, a real barcode is sprayed on the front, and the piece is sorted as before.
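The key trick in the RCR flow is that the physical piece and the human's answer travel separately and are rejoined by the barcode ID. A toy Python model of that join is below. The sequential integer IDs and the class and method names (`RejectFlow`, `reject`, `key_in`, `rerun`) are hypothetical; the comment doesn't describe the real ID format or software.

```python
import itertools
from typing import Dict, List, Tuple


class RejectFlow:
    """Toy model of the reject pile and the deferred second pass."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)  # hypothetical: stands in for the sprayed ID
        self.pile: List[int] = []       # physical pieces waiting for the rerun
        self.keyed: Dict[int, str] = {} # remote keyers' answers, by barcode ID

    def reject(self) -> int:
        """First pass: spray an ID on an unreadable piece and set it aside."""
        piece_id = next(self._ids)
        self.pile.append(piece_id)
        return piece_id

    def key_in(self, piece_id: int, zip_code: str) -> None:
        """A remote keyer's result, tagged with the piece's barcode ID."""
        self.keyed[piece_id] = zip_code

    def rerun(self) -> Tuple[Dict[int, str], List[int]]:
        """Second pass: match each piece to its keyed result, if one exists."""
        resolved: Dict[int, str] = {}
        still_unread: List[int] = []
        for piece_id in self.pile:
            if piece_id in self.keyed:
                resolved[piece_id] = self.keyed.pop(piece_id)
            else:
                still_unread.append(piece_id)
        self.pile = still_unread
        return resolved, still_unread
```

Because the lookup happens hours later, the keyers aren't on the clock the way the MLOCR loop is; the piece just waits in the pile until its answer exists.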
Now, there are variations on the above, but this is the gist of it.
Fun facts: the USPS aims to handle a piece at most 7 times. And when a piece gets jammed in the machine and is torn, it gets put in a "body bag" with an apologetic note.