Hacker News
by sebzim4500 856 days ago
How do you get Gemini Ultra to generate images? It just tells me that it can't do that yet.
4 comments

Try telling instead of asking. When I tried it the other day, "Can you create a picture" got the response "no try DALL-E instead". Then I noticed one of the example prompts was "Generate an image with an Elephant ....". It worked, as did some other random stuff I tried, as long as I told it to do it rather than asking it to.

I just tried asking it again and asking seems to work now too.

Most European countries are excluded:

> Image generation in Gemini Apps is available in most countries, except in the European Economic Area (EEA), Switzerland, and the UK. It’s only available for English prompts.

(https://support.google.com/gemini/answer/14286560?hl=en)

"Can you draw a photo of an avocado-shaped chair with a pineapple-man sitting in it?"

Came up with 4 images.

You raise a good point, though... I've also asked it to use its "web search" capability for tasks and it says it doesn't have that capability, but when I ask it by implying it should do a web search, it goes ahead and does it. Weird!

Yes, Gemini, and Bard before it, is often confused about its own capabilities. I use it to translate Chinese text in AliExpress product listings by taking screenshots. It’s perfectly capable and quite helpful in translating the text from those screenshots, but I think, depending on how you phrase the question while uploading the photo, it will sometimes say “I’m only a language model I can’t help with that” or even “I can’t help with images”. Once it says that, I think it poisons the chat history, so I start a new session to try to get it to work. I’ve not translated many images, but so far this error happens maybe 20% of the time. It’s very strange.

I have another issue: when I paste a C++ code file into the web interface, I get an error from the web interface and Gemini never even sees the code. The web interface refuses to accept my code file. I opened AI Studio instead of the normal Gemini window and that seems to work, but I’d rather just use the normal chat window.

It’s all statistics. In the training set, there were probably questions asking about its capabilities, and it was trained to say it has fewer capabilities than it does. (Or it’s a bad system prompt.)

There’s no internal understanding of itself or its capabilities.

The model has no understanding with which to answer a question about its capabilities, but the point is that it has the capability and the prompt is failing to trigger it. This is separate from "knowing" or not. Think ChatGPT functions that don't work.

Knowing how these models are trained and how these chat systems are built, I wouldn’t expect the question

“Can you search the internet?”

to actually cause an internet search.

People generally know things about themselves, and that human trait is likely reflected in the Q&A training data and thus in the model’s outputs.

I live in the UK and have the same problem - it doesn’t work here.