by nsonha
656 days ago
Most of the discussion I've found on this topic is about how to extract information. Is there any technique for extracting interactive elements? I reckon listing all the inputs/controls would not be hard, but finding the corresponding labels/articles might be tricky. Another thing I wonder about, regarding text extraction: would it be a crazy idea to just snapshot the page and ask it to OCR the image and generate a bare-minimum HTML table layout? That way both the content and the spatial relationships between elements are maintained (not sure how useful that is, but I'd like to keep it anyway).
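For the control/label pairing part, here is a minimal sketch of one possible approach, using only Python's standard `html.parser`. It assumes static HTML and only handles three common cases: an `aria-label` attribute, a `<label for="...">` pointing at a control's `id`, and a `<label>` wrapping the control. The names `ControlLabelParser` and `extract_controls` are made up for illustration, and things like `aria-labelledby`, placeholder text, or labels that are just nearby text are not covered.

```python
from html.parser import HTMLParser

# Tags treated as interactive controls in this sketch (an assumption, not a complete list).
CONTROL_TAGS = {"input", "select", "textarea", "button"}


class ControlLabelParser(HTMLParser):
    """Collects form controls and pairs them with <label> text where possible."""

    def __init__(self):
        super().__init__()
        self.controls = []       # one dict per control, in document order
        self.labels_by_for = {}  # value of a label's "for" attribute -> label text
        self._open_labels = []   # stack of <label> elements currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._open_labels.append(
                {"for": attrs.get("for"), "text": [], "controls": []}
            )
        elif tag in CONTROL_TAGS:
            control = {
                "tag": tag,
                "id": attrs.get("id"),
                "name": attrs.get("name"),
                "type": attrs.get("type"),
                "label": attrs.get("aria-label"),  # may be None
            }
            self.controls.append(control)
            if self._open_labels:
                # The control is nested inside a <label>: remember it for later.
                self._open_labels[-1]["controls"].append(control)

    def handle_data(self, data):
        # Text inside an open <label> counts as label text. Note this also
        # picks up text of elements nested in the label (e.g. <option>).
        if self._open_labels:
            self._open_labels[-1]["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "label" and self._open_labels:
            label = self._open_labels.pop()
            text = " ".join("".join(label["text"]).split())
            if label["for"]:
                self.labels_by_for[label["for"]] = text
            for control in label["controls"]:
                control["label"] = control["label"] or text


def extract_controls(html):
    """Return a list of controls, each with its best-guess label (or None)."""
    parser = ControlLabelParser()
    parser.feed(html)
    parser.close()
    # Resolve <label for="..."> references once the whole document is parsed,
    # since a label may appear before or after its control.
    for control in parser.controls:
        if control["label"] is None and control["id"] in parser.labels_by_for:
            control["label"] = parser.labels_by_for[control["id"]]
    return parser.controls


if __name__ == "__main__":
    sample = """
    <form>
      <label for="email">Email address</label>
      <input id="email" name="email" type="email">
      <label>Password <input name="pw" type="password"></label>
      <button type="submit">Go</button>
    </form>
    """
    for control in extract_controls(sample):
        print(control)
```

Controls that come back with `label` set to `None` are the tricky ones mentioned above; those would need some other heuristic (nearby text, rendered position, etc.), which this sketch does not attempt.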