* Image -> text
* Image -> structure/shapes
* Infer the hierarchical structure and the graph nodes/edges from the extracted shapes and text
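
The last step can be sketched as follows. This is only a minimal illustration, assuming the first two steps already produced axis-aligned bounding boxes for shapes, text snippets with their own boxes, and connector lines given as pairs of endpoints. All names here (`Box`, `smallest_enclosing`, `shape_at`, `build_graph`) and the `tol` pixel tolerance are hypothetical, not part of any existing tool. The heuristic used is plain geometric containment: a text belongs to the smallest shape that encloses it, a shape's parent is the smallest strictly larger shape that encloses it, and a connector becomes an edge between the shapes its endpoints touch.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical output format of the extraction steps)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def contains(self, other: Box) -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)

    def contains_point(self, x: float, y: float, tol: float = 0.0) -> bool:
        return (self.x0 - tol <= x <= self.x1 + tol
                and self.y0 - tol <= y <= self.y1 + tol)


def smallest_enclosing(target: Box, shapes: dict[str, Box],
                       skip: Optional[str] = None) -> Optional[str]:
    """Name of the smallest shape enclosing `target`, or None.

    When `skip` is given (hierarchy lookup), the shape itself is excluded and
    only strictly larger shapes qualify, so no shape can become its own ancestor.
    """
    hits = [(box.area(), name) for name, box in shapes.items()
            if box.contains(target)
            and (skip is None or (name != skip and box.area() > target.area()))]
    return min(hits)[1] if hits else None


def shape_at(x: float, y: float, shapes: dict[str, Box], tol: float) -> Optional[str]:
    """Smallest shape whose box, grown by `tol`, contains the point."""
    hits = [(box.area(), name) for name, box in shapes.items()
            if box.contains_point(x, y, tol)]
    return min(hits)[1] if hits else None


def build_graph(shapes, texts, connectors, tol=5.0):
    # Attach each text to the smallest shape that encloses it.
    labels = {name: [] for name in shapes}
    for text, box in texts:
        owner = smallest_enclosing(box, shapes)
        if owner is not None:
            labels[owner].append(text)

    # Hierarchy: parent = smallest strictly larger enclosing shape.
    parents = {name: smallest_enclosing(box, shapes, skip=name)
               for name, box in shapes.items()}

    # Edges: a connector links the shapes found at its two endpoints.
    edges = []
    for (ax, ay), (bx, by) in connectors:
        src, dst = shape_at(ax, ay, shapes, tol), shape_at(bx, by, shapes, tol)
        if src is not None and dst is not None and src != dst:
            edges.append((src, dst))
    return labels, parents, edges


# Made-up example: a group "G" containing two nodes "A" and "B" joined by one line.
shapes = {
    "G": Box(0, 0, 200, 100),
    "A": Box(10, 10, 60, 50),
    "B": Box(120, 10, 180, 50),
}
texts = [
    ("Start", Box(20, 20, 50, 40)),
    ("End", Box(130, 20, 170, 40)),
    ("Pipeline", Box(80, 70, 120, 90)),
]
connectors = [((60, 30), (120, 30))]

labels, parents, edges = build_graph(shapes, texts, connectors)
```

In this made-up example, "Start" and "End" land on nodes `A` and `B`, "Pipeline" labels the group `G`, both nodes get `G` as their parent, and the single connector yields the edge `A -> B`. Containment is a deliberately simple choice; real diagrams with overlapping shapes, arrowheads, or curved connectors would likely need something more robust.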