Hacker News
by Liveanimalcams 2620 days ago
I'm not a professional, but I built a pipeline for Makers Part List. It ingests a video URL, converts the video into images, and stores the images in Google Cloud Storage. Once they're stored, I trigger the model to classify each image. The images are then displayed to annotators, who verify or relabel them. Once I have enough new images, the system creates a .csv and uploads it to Google's AutoML, which retrains my model.
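The "enough new images, then export a .csv" step can be sketched roughly as below. This assumes AutoML Vision's single-label image classification import, which as I understand it takes CSV rows of the form `gs://bucket/path.jpg,label` (the exact format may differ in current Vertex AI docs). The function name and the threshold value are hypothetical; the post doesn't say how many images trigger a retrain.

```python
import csv
import io
from typing import Iterable, Optional, Tuple

# Hypothetical value: the post does not state the actual retrain threshold.
NEW_IMAGE_THRESHOLD = 500


def build_automl_csv(
    records: Iterable[Tuple[str, str]],
    threshold: int = NEW_IMAGE_THRESHOLD,
) -> Optional[str]:
    """Return CSV text with one `gs_uri,label` row per verified image,
    or None if fewer than `threshold` new images have been verified."""
    rows = list(records)
    if len(rows) < threshold:
        return None  # not enough new labels yet, so skip retraining
    buf = io.StringIO()
    # Use "\n" rather than csv's default "\r\n" line ending.
    writer = csv.writer(buf, lineterminator="\n")
    for gcs_uri, label in rows:
        writer.writerow([gcs_uri, label])
    return buf.getvalue()
```

Keeping this as a pure function (records in, text out) means the part that actually uploads the file to a bucket and kicks off training can live in whatever trigger you already use.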

My bottleneck now is splitting the videos into images, since it's a very CPU-intensive process. I think implementing a queue here is my best option.
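A minimal sketch of that queue idea, assuming the goal is to cap how many CPU-heavy extractions run at once so new submissions wait their turn instead of competing for the droplet's cores. `process_videos` and the `extract` callable are hypothetical names; `extract` stands in for whatever actually splits a video into frames. This version drains a fixed batch; a long-running service would instead block on `jobs.get()` and feed URLs in as they arrive.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple


def process_videos(
    urls: Iterable[str],
    extract: Callable[[str], None],
    workers: int = 2,
) -> Tuple[List[str], List[str]]:
    """Run `extract` on each URL with at most `workers` jobs in flight.

    Returns (succeeded, failed) lists of URLs.
    """
    jobs: "queue.Queue[str]" = queue.Queue()
    for url in urls:
        jobs.put(url)

    done: List[str] = []
    failed: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits
            try:
                extract(url)
                ok = True
            except Exception:
                ok = False  # record the failure instead of killing the worker
            with lock:
                (done if ok else failed).append(url)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, failed
```

Threads are enough here because the heavy lifting happens in a separate ffmpeg process; `workers` is the knob that bounds CPU usage, and something like the number of cores is a reasonable starting point.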


Interesting. Could you maybe expand on the tools that you utilize inside the pipeline for ETL, model creation, annotation and testing?
I'm running the front end and back end of the consumer site on Heroku. The meat of the pipeline is hosted on a DigitalOcean High CPU Droplet. I use ffmpeg to extract images from the provided videos. I store everything in Google Cloud Storage and create a reference to each photo in Firestore. Firebase powers the image verification/labeling app I built. It's a simple app that presents the viewer with an image and the label it was given; if the label is wrong, they enter the correct one. I use a Cloud Function to move the images into an exportable format for AutoML once a new-image threshold has been hit. Testing is just me using the system and checking whether it correctly identifies the objects.
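The ffmpeg step could look something like the sketch below. The sampling rate, output naming, and function names are assumptions for illustration, not the poster's actual settings. `fps=N` is ffmpeg's standard video filter for sampling N frames per second, and `-q:v 2` requests high JPEG quality (lower is better).

```python
import os
import subprocess
from typing import List


def frame_command(video: str, out_dir: str, fps: int = 1) -> List[str]:
    """Build an ffmpeg argv that writes `fps` JPEG frames per second of video."""
    pattern = os.path.join(out_dir, "frame_%05d.jpg")  # frame_00001.jpg, ...
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", "-q:v", "2", pattern]


def extract_frames(video: str, out_dir: str, fps: int = 1) -> List[str]:
    """Run ffmpeg and return the sorted paths of the frames it wrote."""
    os.makedirs(out_dir, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(frame_command(video, out_dir, fps), check=True, capture_output=True)
    return sorted(
        os.path.join(out_dir, name)
        for name in os.listdir(out_dir)
        if name.startswith("frame_") and name.endswith(".jpg")
    )
```

Sampling at a low frame rate rather than dumping every frame is also the cheapest way to cut CPU time, since consecutive frames of the same part are usually near-duplicates for labeling purposes.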