Hacker News new | ask | show | jobs
by cribwi 1915 days ago
If you're running PostgreSQL, you can use the built-in generate_series (1) function like:

  SELECT id, random()
  FROM generate_series(0, 1000000) g (id);
There seems to be an equivalent in SQLite: https://sqlite.org/series.html

[1] https://www.postgresql.org/docs/current/functions-srf.html

5 comments

Here's how to do something similar in Q (KDB)

    ([]x?x:1000000)
Which gives a table of a million rows like:

    x
    ------
    877095
    265141
    935540
    49015
    ...
Nothing to do with the article, SQL, or a DB, but I can't help wanting to add even just a bit when I see array languaes mentioned. So, in J it's just:

    ?~1000000
or, if you don't need elements to be unique:

    ?$~1000000
Though it's just an array, not a persistent table - I don't know much about Jd :(
An R vector would be `1:1000000`
Nope, that's - as far as I can see - just a sequence of increasing integers. Both K and J examples give an array of random integers in 0-1000000 range.

For reference, in J such sequence of integers can be generated with:

        1+i.1000000
    1 2 3 4 5 6 7 8 9 10 ....
or:

        1+i.1000000 1
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    ....
To explain the previous examples (and let's use smaller integer for less typing...), the `$` verb is called "shape"/"reshape", and it takes a (list of) values on the right side, and a list of dimensions on the left:

       5 2 $ 10
    10 10
    10 10
    10 10
    10 10
    10 10
if there's not enough values on the right, they are cycled:

       5 2 $ 10 11 12
    10 11
    12 10
    11 12
    10 11
    12 10
which degenerates to repetition if there's only one value on the right. The `~` adjective (called "reflex") modifies a verb to its left in the following way:

       V~ x NB. same as x V x
so `$~10` is the same as `10$10`, which is a list of ten tens. That list is passed to `?`, which is a verb called "roll", which gives a random integer in the 0-(y-1) range when written as `? y`. `y` here can be a scalar, or a list, in which case the roll is performed for each element of the list:

       $~ 10
    10 10 10 10 10 10 10 10 10 10
       ? $~ 10
    1 6 9 4 6 8 8 7 9 4
The dyadic case, ie. `x ? y` is called "deal", which selects `x` elements from `i. y` list at random, without repetitions. `?~ y`, then, effectively shuffles the `i. y` list:

       ?~10
    6 9 7 3 5 1 8 0 4 2
"deal" can be used to shuffle any list, not only the `i. y` sequence, by using the shuffled list as indexes of another list (using `{` verb, called "from"):

       2*i.10
    0 2 4 6 8 10 12 14 16 18
       (?~10){2*i.10
    14 10 18 6 2 12 8 0 16 4
...I know, I know, it is strange. But it's so interestingly mind-bending that I'd be really happy if I had a valid excuse to pour hundreds of hours into learning J properly. Sadly, I don't have anything like that, so I only spread the strangeness from time to time in comments, like I do right now :)

All the primitives (words) are described here: https://code.jsoftware.com/wiki/NuVoc

This can generally fall under the category of “Generate Test Data in SQL” with both generic techniques like Recursive CTEs, as in the OP, and SQL dialect or tool specific options. Search is your friend. The OP also provides a table lookup technique for readable names but doesn’t address other data types such as timestamps or other functions such as MD5(). Other data distribution techniques other than random may also be needed. This post scratches the surface of testing with generated data.
If you're running PostgreSQL you can also just

  xxd -ps -c 16 -l $bignum < /dev/urandom | (echo '\copy random_data from stdin'; cat) | psql
Depending on how you want your random keys formatted.
If you're importing from command line tools, you might as well use the specialized `shuf` tool:

  shuf -i 1-$bignum
or for random numbers with replacement

  shuf -i 1-$bignum -r -n $bignum
These give you a sample of random integers. `shuf` is part of GNU coreutils, so present in most Linux installs (but not on macOS).
It's worth installing GNU coreutils on macOs All the command names are prepended with g, so `gshuf`
Nice yeah that's a good way. Generally the number 1 million is so small I see no reason to do this in any manner other than shell commands.
you have to use it since PostgreSQL does not support limit within a with expression.
You can use a with-expression. It's true that you can't use `limit` to limit it, but you can use a `where` condition. This is equivalent to cribwi's example:

  WITH RECURSIVE
    t AS (
      SELECT 0 id
      UNION ALL
      SELECT id + 1
        FROM t
        WHERE id < 1000000
    )
    SELECT id, random() FROM t;
It returns 1000001 rows, but so does cribwi's.
use a sequence or uuid