Hacker News new | ask | show | jobs
by _laiq 1116 days ago
Incredible how people are underestimating ChatGPT or overestimating open-source models.

A basic question: How can i join with SQL a column to a string separated with comma

GPT-4: PostgreSQL: SELECT STRING_AGG(columnName, ', ') FROM tableName;

Guanaco: Here is an example of how you could use the CONCAT function in MySQL to concatenate a string to a column value in a single-line query: SELECT CONCAT('The total price for ', product_name, ’ is ', SUM(price)) AS total FROM products; This will result in output like “The total price for Chocolate Bar is 10” or similar depending on your data.

Yeah, no...

11 comments

> How can i join with SQL a column to a string separated with comma

I think I might give this one to Guanaco. I’m moderately familiar with SQL, and I can’t really understand your question. You’re faking the reader out by using “join” to mean something which is not JOIN.

So do you mean to concatenate a string in a column (coming from a single row) to a “string separated with a comma”? If so (and assuming “separated” means starting or ending with), then Guanaco nailed it. If you meant to join (in the Python sense) the values from a given column in multiple rows, then GPT-4 is doing better.

But GPT-4 gave no explanation, and my general experience with it is that it’s happy to write code that does something vaguely related to the prompt. As the prompt gets more complex or unusual, the degree to which the code doesn’t actually do something useful increases. And I have had quite poor results getting ChatGPT to generate code along with some explanation such that the code actually matches the explanation.

> You’re faking the reader out by using “join” to mean something which is not JOIN.

Yeah and not having to do precise prompt engineering is a major advantage of GPT-4 vs other models.

I don’t think that “prompt engineering” is quite the same thing as clearly specifying the problem. If I’m using a magic AI code-writing companion or asking an actual human to write code for me, I need to define the problem well enough if I want any expectation of getting _correct_ code.

A skilled human will distinguish themselves from a less skilled human by also thinking about the overall problem and helping make sure the specification is correct. This far, I have not seen an AI meet that bar unless the problem is something well-covered already on the web.

I have seen ChatGPT (3.5 and 4) output code that is syntactically correct, executes without errors, and does the wrong things in ways that are not entirely obvious to me until I read the docs myself.

> helping make sure the specification is correct. This far, I have not seen an AI meet that bar

I've only used Bard so far, but indeed, it has never replied to my question with a clarifying question. Only with a statement. This seems like an obvious evolution of the chatbot concept...

To be fair, I had to read the postgresql docs and interpret the answer to figure out what the _laiq was actually asking.

edit: I guess GPT-4 has better reading comprension than me!

I wish there was a model out there that asked me "what do you mean by 'join'" but I guess we're somewhat far from there :)
I think you’ve just quantified the difference between a text generator (LLM) and AGI.
I have a hard time understanding the question as you phrased it. Tested with guanaco-33B and a cleaner question, looks like a much better response:

> How can I join a column with SQL in Postgres to a string separated with a comma?

You can use the `string_agg` function in PostgreSQL to concatenate columns into a string. For example, to concatenate all values from a column named "column1" into a single string separated by commas, you can run the following query:

```sql

SELECT string_agg(column1, ',') FROM table;

```

The `string_agg` function takes two arguments: the first argument is the expression to concatenate (in this case column1), and the second argument is the separator between each value (in this case a comma). The query above will return a single string containing all values from column1 separated by commas.

I generally find WizardLM 1.0, Guanaco and Manticore to be good enough for most GPT cases.

> join with SQL a column to a string

As a human, I wouldn't know what to give you for this either. Not surprised the models give unexpected results. "Guanaco" gives something more in line with what I understand you to be asking for, but clearly you're asking for something else that I don't understand.

GPT4All-J-v1.3 Groovy [1] gave me the following answer (no idea if this is good or not, but keep in mind that the model comes in a 3.8Gb file and is released under an Apache 2 license, freely available for use and distribution):

    To join a column with SQL in Postgres to a string separated by a comma, you can use the STRING_AGG function. Here's an example query that demonstrates this:
    SELECT STRING_AGG(column_name, ', ') WITHIN GROUP (ORDER BY id) AS joined_string FROM table;
    In this query, we're selecting the joined_string column from a table called table. The STRING_AGG function is used to concatenate all values in the column_name column into a single string separated by commas. We use the WITHIN GROUP (ORDER BY id) clause to specify that we want to group the results by an integer value called id. This ensures that each row of data is treated as a separate entity, even if it has the same column_name.
    The resulting output will be a single column with all values from the joined_string column concatenated into a string separated by commas. You can then use this query in your application to join multiple columns together and store them as a single string value.
[1] https://gpt4all.io/reports/GPT4All_Technical_Report_3.pdf
Your question is a bit confusing and ambiguous. I think there are multiple valid answers. Here's a local test of a less ambiguous programming question with "Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin" on llama.cpp on an M1 Max laptop with 64GiB of RAM.

This model seems roughly on par with GPT-3, maybe GPT-3.5 in some cases.

> Write a function to compute the maximum, minimum, and average of an array of doubles in C.

Here is an example function in C that computes the maximum, minimum, and average of an array of doubles:

```c void compute_statistics(double arr[], int n) { double max = arr[0]; double min = arr[0]; double sum = 0.0; for (int i = 1; i < n; i++) { if (arr[i] > max) { max = arr[i]; } else if (arr[i] < min) { min = arr[i]; } sum += arr[i]; } double avg = sum / n; printf("Maximum: %.2f", max); printf("Minimum: %.2f", min); printf("Average: %.2f", avg); } ```

To use this function, simply call it with the array of doubles and its length as arguments. For example:

```c double arr[] = {1.0, 2.5, 3.0, 4.0, 5.0}; int n = sizeof(arr) / sizeof(arr[0]); compute_statistics(arr, n); ```

This will output:

``` Maximum: 5.00 Minimum: 1.00 Average: 3.00 ```

Here are some simple logic questions:

> Compute the sum of an array of 512 elements with each element having the value 2.

The sum of an array of 512 elements with each element having the value 2 is 1024.

> Compute the sum of an array of 512 elements with each element having the value 3.

The sum of an array of 512 elements with each element having the value 3 is 1536.

Not bad.

My test for open models is surprisingly simple.

I just ask "What is the capital of France?" and I haven't had a correct answer yet in any model I tried. They often get Paris right at least, but most other details are wrong.

Guanaco says:

> The current capital of France is Paris. It has been so since 1982 when it replaced the previous one which was Vichy.

What model size? 7B? 13B? 33B? When people say "Guanaco" or "Alpaca" or "Manticore" they are referring to a whole family of models. The really small ones like 7B often are very weird.
I just got "The capital of France is Paris." from vicuna-v1-7B running entirely on my iPhone (using the MLC Chat app).
>They often get Paris right at least, but most other details are wrong.
What other detail in that statement is wrong?
Guanaco 7B gives the same exact answer.
I also prefer the first but your point would be stronger if the example from Guanaco was actually wrong or internally inconsistent
One of the issues that i've observed with OpenAI, is that things the model got wrong a week ago it suddenly starts getting right but other examples that are not exactly the same question but very similar or another example of the same thing it gets wrong, then a few weeks passes and it gets this right.

My conclusion is that they are using the data from users to improve or some other prompt injection tricks.

So the comparision is not even fair.

What's the result for: How can I aggregate with SQL a column to a string separated with comma ?
> GPT-4: PostgreSQL: SELECT STRING_AGG(columnName, ', ') FROM tableName;

Well, you asked for sql and ansi sql function is listagg(), so technically this is not the right answer - and definitely a completely useless one for mysql or oracle users.

"Thanks ChatGPT. Could you please write that for MySQL and additionally for Oracle?"

MySQL: SELECT GROUP_CONCAT(columnName SEPARATOR ', ') FROM tableName;

Oracle: SELECT LISTAGG(columnName, ', ') WITHIN GROUP (ORDER BY columnName) FROM tableName;

I mean, it's not a big deal.. you just give have to give it a little nudge.

I have chatGPT with GPT-4 model. It never gives such concise answers. Is it because you used the API?
The default is rather wordy. You could instruct chatGPT4 to be less wordy or in the API put instructions to be less wordy in the system prompt or limit answers to a sentence.

I would assume the poster just edited this though.