by sugarfactory
3632 days ago
Extracting machine-understandable meaning from web pages is closely analogous to extracting text from images. Fortunately, we usually don't need fancy yet far-from-accurate algorithms to get machine-readable text out of web pages. Why? Because we agreed to use character encodings to represent letters, and most of the time text is stored in one of them, so there is no need to OCR pictures of handwritten letters just to process the text of a web page programmatically. Programs that try to recover meaning from page structure wouldn't be needed either if the same thing had happened for page structure, that is, if HTTP included page semantics.