Edit: I'm pretty sure I misclicked on which comment I wanted to reply to and now I can't find it... It said something along the lines of the ML approach being more developer-time efficient.
You can definitely do it in 20 minutes if you know how (which is also applicable to the ML version):
import cv2
import numpy as np
red_duck = cv2.imread("red_duck.png", cv2.IMREAD_GRAYSCALE)
# boilerplate etc. (img_screen has to be grayscale as well, to match the template), up to the point where you want to match your ducks on screen:
res = cv2.matchTemplate(img_screen, red_duck, cv2.TM_CCOEFF_NORMED)
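# res is smaller than img_screen: its shape is (H - h + 1, W - w + 1), so build the mask from res
positions = np.zeros_like(res, dtype=np.uint8)
positions[res > 0.7] = 1 # we found a duck (each hit is the top-left corner of a match)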
# now do whatever you want with each position
# on real-life images you may need some additional post-processing; on 8-bit rendered images you probably won't
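If you do end up needing that post-processing: one common problem is that a single duck lights up a small cluster of neighbouring pixels above the threshold. Here is a rough sketch of how I'd collapse each cluster into one position (greedy non-maximum suppression). `pick_matches` is a name I made up for illustration, it is not part of OpenCV, and the 0.7 default is just the threshold from the snippet above.

```python
import numpy as np

def pick_matches(res, template_shape, threshold=0.7):
    """Greedy non-maximum suppression on a matchTemplate-style score map.

    Hypothetical helper, not an OpenCV function.
    Returns a list of (x, y) top-left corners, best score first.
    """
    scores = res.astype(np.float64)  # astype copies, so res itself is untouched
    h, w = template_shape
    found = []
    while True:
        # Best remaining score and its (row, col) position
        y, x = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[y, x] <= threshold:
            break
        found.append((int(x), int(y)))
        # Knock out every position whose template window would overlap this hit
        y0, y1 = max(0, y - h + 1), y + h
        x0, x1 = max(0, x - w + 1), x + w
        scores[y0:y1, x0:x1] = -np.inf
    return found
```

With the snippet above you would call it as `pick_matches(res, red_duck.shape)` and loop over the returned `(x, y)` pairs. It assumes ducks don't overlap on screen; if they can, shrink the suppressed window.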