by cung 627 days ago
It took a bit over five years, but now checking if it’s a photo of a bird is the easier task.
6 comments

Is it? I assume that you are thinking of using a 3rd-party API endpoint to which you upload the image so that the service decides for you if it is a bird and which kind of bird it is. Or you use something like Firebase.

Because if that is the way you'd solve this problem, then just sending lat/lon to a service to determine if it is in a national park is even easier, as it's just a GET request.

I'm still unsure about what would be harder to set up locally.

https://tech.amikelive.com/node-718/what-object-categories-l...

Bird is in the default COCO dataset. I haven't looked for birds in images, but for people, yolov10 takes fewer lines of code to detect whether someone is in the frame than it takes to set up a Flask server for the API calls.
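For reference, a minimal sketch of what that local check might look like in Python. It assumes the third-party `ultralytics` package and its pretrained COCO weights file `yolov8n.pt`. The thread names yolov10 and yolov8 but not a specific package, so the package, the weights file, and the helper names `detect_labels` and `contains_label` are all assumptions made for illustration.

```python
def detect_labels(image_path, weights="yolov8n.pt"):
    """Run a pretrained COCO detector and return (label, confidence) pairs.

    Assumes the third-party `ultralytics` package is installed. It is
    imported lazily so that `contains_label` below works without it.
    """
    from ultralytics import YOLO

    model = YOLO(weights)
    result = model(image_path)[0]   # one image in, one result out
    names = result.names            # maps class id -> label string
    return [
        (names[int(cls)], float(conf))
        for cls, conf in zip(result.boxes.cls, result.boxes.conf)
    ]


def contains_label(detections, label="bird", min_conf=0.5):
    """True if any detection matches `label` with at least `min_conf`."""
    return any(name == label and conf >= min_conf for name, conf in detections)
```

Something like `contains_label(detect_labels("photo.jpg"))` would then answer the "is there a bird" question, with the 0.5 threshold being an arbitrary starting point rather than a tuned value.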

Using a service for this would kind of suck if you're _in_ the national park, due to cell service. The ML portion of this is probably still a bit harder than the GPS bit.

ML:

    Grab yolov8

    (Optional?) Fine tune it on bird pictures

    Convert it to CoreML, do whatever iOS stuff is needed in Xcode to run it

GPS:

    Get https://www.nps.gov/lib/npmap.js/4.0.0/examples/data/national-parks.geojson (TIL this exists! thanks federal govt!)

    Stuff it into the app somehow

    Get the coordinates from the OS

    Use your favorite library for point+polygon intersection to decide if you're in a national park

    Bonus: use distance from polygon instead to account for GPS inaccuracy, keeping in mind lat and long have different scales.
...actually the ML one might be easier, nowadays. Now I kind of want to try this.
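A rough sketch of the "bonus" step in plain Python, for anyone who wants to try it. It assumes the GeoJSON has already been parsed (for example with `json.load`) and that the file holds Polygon or MultiPolygon features, which I have not verified for the NPS file. The function names are made up for illustration, and the flat-plane projection is an approximation that should only be trusted over short distances.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def outer_rings(feature_collection):
    """Yield the outer ring of every Polygon/MultiPolygon feature."""
    for feature in feature_collection["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            yield geom["coordinates"][0]
        elif geom["type"] == "MultiPolygon":
            for polygon in geom["coordinates"]:
                yield polygon[0]


def to_local_metres(lon, lat, lon0, lat0):
    """Project lon/lat to metres on a flat plane centred at (lon0, lat0).

    A degree of longitude shrinks by cos(latitude); this is the
    "different scales" caveat for lat and long.
    """
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point P to segment AB, all in the same planar units."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of P onto AB so it stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_ring_m(lon, lat, ring):
    """Metres from (lon, lat) to the nearest edge of a closed GeoJSON ring."""
    pts = [to_local_metres(p[0], p[1], lon, lat) for p in ring]
    return min(
        point_segment_distance(0.0, 0.0, ax, ay, bx, by)
        for (ax, ay), (bx, by) in zip(pts, pts[1:])
    )
```

A fix could then count as "in the park" if it is inside the polygon or if `distance_to_ring_m` is below some buffer, such as the accuracy radius the OS reports alongside the coordinates.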
> Use your favorite library for point+polygon intersection to decide if you're in a national park

It's like 50ish lines of code in C, just iterating over the points (with the polygon represented by arrays of points). The algorithm is linear in the number of points.
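For illustration, here is the usual ray-casting (even-odd) version of that loop, sketched in Python rather than C. It treats coordinates as planar, ignores holes, and makes no promise about points lying exactly on an edge. Whether this matches the exact C code the comment has in mind is an assumption.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting (even-odd rule): True if (x, y) is inside `polygon`.

    `polygon` is a list of (x, y) vertices; the closing edge back to the
    first vertex is implied. One pass over the edges, so O(n).
    """
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does the edge between vertices j and i straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x coordinate where that edge crosses the line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside
```

Each edge that crosses a horizontal ray cast to the right of the point flips the `inside` flag, so an odd number of crossings means the point is inside.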

True!
"easier tasks" is arguable and arguably wrong

"task about which you will find more easy-looking tutorials hiding the complexity under a blanket of 3rd party code and services" is better

You just described all computational tasks that don’t open with ‘first, dig up some sand and some copper ore’.
checking whether coordinates fall inside a national park is an exercise in computational geometry, surveying (the earth is not flat, what are coordinates on a sphere), databases, access to government data.

detecting birds is an exercise in gathering properly labeled training sets, neural networks and their topologies, matrix multiplication performance and/or orchestration of rented GPUs.

both of which cover interesting tasks, worthy things to learn, and are by no means easy.

"easy" is the bird recognition where you do an API call to a totem pole of third party services.

Except all the stuff millions of software engineers work on for billions of hours per year.
Easier for a person to set up, but definitely not in terms of how many CPU cycles are burned.
> but now checking if it’s a photo of a bird is the easier task.

That depends on whether you care about getting the answer right. If you don't, it was always the easier task.

If you do, Seek by iNaturalist still can't do this job, and that's the only thing Seek is supposed to be able to do.

What confidence level are we talking about? With these oversimplified questions (as in the xkcd), my guess would be that the asker assumes 100%.
You're not going to get 100% confidence with either problem. The GPS one might be easier to get high confidence with, but even here you have to worry about 1) the accuracy of the GPS coordinates from your camera/phone, which isn't that good, and 2) calculating the exact boundaries of the park from the public data. So you could probably calculate with nearly 100% confidence that you're, for example, within 5km of a park, but if you take the photo from a location close to the park's boundary, the confidence will go way down. If you're a meter or two from the boundary, forget it.
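To put a rough number on how confidence falls off near the boundary, here is a back-of-the-envelope sketch. It assumes the boundary is locally a straight line and that the GPS error perpendicular to it is Gaussian with standard deviation `sigma_m`. Both are simplifying assumptions, and the function name is made up for illustration.

```python
import math


def prob_inside(signed_distance_m, sigma_m):
    """P(true position is inside) given how far inside the fix appears to be.

    `signed_distance_m` is positive when the fix is inside the boundary and
    negative when it is outside. Uses the standard normal CDF via erf.
    """
    z = signed_distance_m / (sigma_m * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))
```

Under these assumptions a fix 1 m inside the line with `sigma_m=2` gives roughly 0.69, a fix exactly on the line gives 0.5, and a fix several kilometres inside is as close to 1 as floating point allows.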
How could the GPS position and the park boundary not be exact? Phone GPS gives 2-meter accuracy, and a park boundary is a well-defined, hard-line polygon.

Being close to the border changes nothing; I can just add a buffer outward from the park polygon to account for that. Asking because I'm afraid I may be missing something here due to this being something I already worked on.

>Phones GPS give a 2 meters accuracy

Well I already pointed out that if you're within a couple meters of the boundary, you won't have good confidence because of this fact.

>and a park boundary is a well defined hard line polygon.

Is it? I'm no expert on parks, but surely some of them have borders along rivers. Many US states have such borders.

>Being close to the border changes nothing, I can just add a buffer outwards the park polygon to account for that.

That doesn't account for the 2m accuracy. What if I'm standing exactly 1m from the boundary when I take the photo? You have no idea if I'm really in the park or not from the GPS data.

I also have serious doubts about your 2m accuracy claim, based on personal experience. Maybe if you're standing in a wide-open desert with nothing around you, but anywhere else, the accuracy isn't that great, especially around buildings. GPS accuracy is terrible in cities.

> Is it? I'm no expert on parks, but surely some of them have borders along rivers. Many US states have such borders.

It depends on the country, but Australia has some [1]. I still think that there is a set of polygons that can be used to describe this border.

Not to argue against your point (I rarely get less than 4m of accuracy), but luckily

> but anywhere else, the accuracy isn't that great, especially around buildings. GPS accuracy is terrible in cities.

cities are (almost?) never in national parks.

[1]: https://www.nationalparks.nsw.gov.au/-/media/npws/maps/pdfs/...

>cities are (almost?) never in national parks.

Sometimes they are. See Washington, DC.

Anyway, the requirement is for determining if a photo was taken in a park or not. The resolution wasn't stated, however: just how accurate do we need to be? If I'm in a canoe in a river that borders a park, but the river isn't part of the park, but the shoreline a few meters away is, our algorithm might claim I'm in the park, when I'm really not. The requirement wasn't "somewhere near a park", but "in a park". Rivers change their courses over time, so some polygons aren't going to accurately describe this border.

Let's be real: GPS is much more accurate than whatever boundary someone might come up with for the national park. Where the park starts is really ambiguous unless there's a physical man-made divider like a fence.
If you need to know 100% that the bird is in the park at that precise moment it can be tricky. If you need to identify a Bird-of-prey in the Alpha quadrant you can understand the Klingon proverb a sharp knife is nothing without a sharp eye.
What does "YOU are in the park" even mean? What is YOU? If you are standing on the boundary, are YOU in or out? Details matter :D
Exactly! What if you're standing on the boundary, with one foot inside and the other outside? These specifications are far too vague.
Your phone already does both automatically, so I’d call it a draw.