by mksystem 218 days ago
Is it possible to prompt this model with two or more texts for each image and get masks for each prompt? Something like this: inputs = processor(images=images, text=["cat", "dog"], return_tensors="pt").to(device)?