by abrichr
980 days ago

Thank you for the release! What can you tell us about this:

> Our internal models (based on Fuyu) have extra capabilities related to our product. In particular,
> 1. They can reliably perform OCR on high-resolution images
> 2. They can do fine-grained localization of text and UI elements within those images
> 3. They can answer questions about images of UIs

Is this just a matter of additional fine-tuning, or are there architectural differences?