Hacker News new | ask | show | jobs
by iamsalman 4224 days ago
Three components:

1. Object recognition (there's dog and frisbee in the photo) 2. Object localization (Dog and frisbee's ROI in the photo) 3. Relation estimation (Based on X factors, the dog might be chasing the frisbee).

Not sure what you meant by spatial relations (localization?) but recognizing (what) and localizing (where) would be key to drawing relationships between objects.

Really impressive work but definitely not a leap.