| HN Mirror



	by ankeshanand 2970 days ago
	One thing to note is that the camera viewpoint (its position, roll, pitch, and yaw) is fed in along with the images during training. Requiring access to this ground truth makes the method quite restrictive to use in practice.

1 comments

ehsankia 2970 days ago

What kind of use cases are you thinking of where this would be constraining? Don't many computer vision algorithms also require something specifying the parameters of the camera, such as the fundamental matrix for stereo imaging?

As humans, when we look at a scene, then move a few feet and look at it again, we have a pretty good idea what the delta between the two views was, so why is providing the same info here any different?
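
To make "the delta between the two views" concrete, here is a rough sketch, not anything taken from the method under discussion. The `Viewpoint` class, its field names, and `view_delta` are all hypothetical, and subtracting Euler angles per axis is a simplification that only approximates the true relative rotation for small or single-axis motions.

```python
import math
from dataclasses import dataclass


@dataclass
class Viewpoint:
    # Hypothetical container: field names are assumptions for illustration.
    x: float
    y: float
    z: float
    roll: float   # radians
    pitch: float  # radians
    yaw: float    # radians


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def view_delta(a: Viewpoint, b: Viewpoint) -> tuple:
    """Naive delta from view a to view b.

    Returns the world-frame translation followed by per-axis angle
    differences (roll, pitch, yaw), each wrapped to [-pi, pi).
    """
    return (
        b.x - a.x,
        b.y - a.y,
        b.z - a.z,
        wrap_angle(b.roll - a.roll),
        wrap_angle(b.pitch - a.pitch),
        wrap_angle(b.yaw - a.yaw),
    )
```

For example, moving from a yaw of 170 degrees to a yaw of -170 degrees comes out as a 20 degree turn rather than a 340 degree one, which is the kind of "small delta" a person would report.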


boxy310 2970 days ago

I would add that humans also integrate gyroscopic & acceleration information from the inner ear to understand relative balance. Multiple sources of sensor data are a net benefit, not a drawback.
