by blindstitch 925 days ago

It is an interesting article, and some of the results are unexpected, but the layout is way too long and could be greatly condensed. At first glance, and speaking from how I like to lay things out in a paper:

- You do not really need to show the prompt interface. You can show it once and thereafter use a bulleted list format, or simply show the input image if it responded correctly.
- Your figures should be about half of their size; they don't need to fit the width of the body.
- For comparative results against other models you can use a table with colored cells, with the test name on the row and model names on the columns.
- For the dog, show a side-by-side figure with the raw image on the left and the box on the right, and include the coordinates it gave you in the body.
- In your conclusion show the full matrix table of comparative results and summarize the relative strengths of the model against the others.
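To make the table suggestion concrete, here is a rough sketch of how pass/fail records could be turned into a test-by-model matrix. The function name `results_matrix` and the test and model names in the usage note are placeholders of mine, not anything from the article, and colored cells are left out since this only produces plain text.

```python
def results_matrix(results):
    """Render (test, model, passed) records as a plain-text table.

    Tests go on the rows and models on the columns.
    """
    # dict.fromkeys keeps first-seen order while removing duplicates.
    tests = list(dict.fromkeys(t for t, _, _ in results))
    models = list(dict.fromkeys(m for _, m, _ in results))
    outcome = {(t, m): passed for t, m, passed in results}

    labels = tests + models + ["test", "pass", "fail", "n/a"]
    width = max(len(s) for s in labels) + 2

    def row(cells):
        return "".join(c.ljust(width) for c in cells).rstrip()

    lines = [row(["test"] + models)]
    for t in tests:
        cells = [t]
        for m in models:
            passed = outcome.get((t, m))
            # A missing record means that model was not run on that test.
            cells.append("n/a" if passed is None else "pass" if passed else "fail")
        lines.append(row(cells))
    return "\n".join(lines)
```

Calling `print(results_matrix([("ocr", "model_a", True), ("ocr", "model_b", False)]))` with made-up records like these gives one header row and one row per test, which is the shape the conclusion table could take.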

In terms of the writing and methods, your conclusion says little and your tests do not go into significant depth. For example, with the tire image you could show that it succeeds when cropped but, as the photo gets wider, begins to fail to correctly identify the text in the image's center. For a model of this kind of methodology and presentation, see this article: https://dynomight.net/ducks/

Also, the OCR test is too simple; even a 20-year-old OCR algorithm would probably recognize that text. Experimenting with progressive degradation of the image could show the model's strengths, and an analysis could report its accuracy at each level of degradation.
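A minimal sketch of what I mean by a degradation sweep, under some assumptions: the image is a plain grid of grayscale values (0-255), the degradation is Gaussian noise (blur or JPEG compression would work just as well), and `ocr` is a stand-in for whatever call sends the image to the model and returns its text. All function names here are mine, and the accuracy measure is just a string-similarity ratio from the standard library, not a proper character error rate.

```python
import random
from difflib import SequenceMatcher


def add_noise(pixels, sigma, seed=0):
    """Return a noisy copy of a grayscale image given as rows of 0-255 ints."""
    rng = random.Random(seed)  # fixed seed so each run is repeatable
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in pixels
    ]


def char_accuracy(predicted, truth):
    """String similarity in [0, 1]; 1.0 means an exact match."""
    return SequenceMatcher(None, predicted, truth).ratio()


def degradation_sweep(pixels, truth, ocr, sigmas):
    """Run `ocr` on increasingly noisy copies and record accuracy per level.

    `ocr` is a hypothetical callable: image in, recognized text out.
    """
    return [
        (sigma, char_accuracy(ocr(add_noise(pixels, sigma)), truth))
        for sigma in sigmas
    ]
```

Plotting the resulting (noise level, accuracy) pairs for each model would show where each one starts to break down, which says much more than a single pass on a clean image.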