| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by cribwi 1915 days ago

If you're running PostgreSQL, you can use the built-in generate_series (1) function like:

  SELECT id, random()
  FROM generate_series(0, 1000000) g (id);

There seems to be an equivalent in SQLite: https://sqlite.org/series.html

[1] https://www.postgresql.org/docs/current/functions-srf.html

5 comments

dintech 1915 days ago

Here's how to do something similar in Q (KDB)

    ([]x?x:1000000)

Which gives a table of a million rows like:

link

klibertp 1915 days ago

Nothing to do with the article, SQL, or a DB, but I can't help wanting to add even just a bit when I see array languaes mentioned. So, in J it's just:

    ?~1000000

or, if you don't need elements to be unique:

    ?$~1000000

Though it's just an array, not a persistent table - I don't know much about Jd :(

link

bigfudge 1915 days ago

An R vector would be `1:1000000`

link

klibertp 1915 days ago

Nope, that's - as far as I can see - just a sequence of increasing integers. Both K and J examples give an array of random integers in 0-1000000 range.

For reference, in J such sequence of integers can be generated with:

        1+i.1000000
    1 2 3 4 5 6 7 8 9 10 ....

or:

        1+i.1000000 1
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    ....

To explain the previous examples (and let's use smaller integer for less typing...), the `$` verb is called "shape"/"reshape", and it takes a (list of) values on the right side, and a list of dimensions on the left:

if there's not enough values on the right, they are cycled:

       5 2 $ 10 11 12
    10 11
    12 10
    11 12
    10 11
    12 10

which degenerates to repetition if there's only one value on the right. The `~` adjective (called "reflex") modifies a verb to its left in the following way:

       V~ x NB. same as x V x

so `$~10` is the same as `10$10`, which is a list of ten tens. That list is passed to `?`, which is a verb called "roll", which gives a random integer in the 0-(y-1) range when written as `? y`. `y` here can be a scalar, or a list, in which case the roll is performed for each element of the list:

       $~ 10
    10 10 10 10 10 10 10 10 10 10
       ? $~ 10
    1 6 9 4 6 8 8 7 9 4

The dyadic case, ie. `x ? y` is called "deal", which selects `x` elements from `i. y` list at random, without repetitions. `?~ y`, then, effectively shuffles the `i. y` list:

       ?~10
    6 9 7 3 5 1 8 0 4 2

"deal" can be used to shuffle any list, not only the `i. y` sequence, by using the shuffled list as indexes of another list (using `{` verb, called "from"):

       2*i.10
    0 2 4 6 8 10 12 14 16 18
       (?~10){2*i.10
    14 10 18 6 2 12 8 0 16 4

...I know, I know, it is strange. But it's so interestingly mind-bending that I'd be really happy if I had a valid excuse to pour hundreds of hours into learning J properly. Sadly, I don't have anything like that, so I only spread the strangeness from time to time in comments, like I do right now :)

All the primitives (words) are described here: https://code.jsoftware.com/wiki/NuVoc

link

sradman 1915 days ago

This can generally fall under the category of “Generate Test Data in SQL” with both generic techniques like Recursive CTEs, as in the OP, and SQL dialect or tool specific options. Search is your friend. The OP also provides a table lookup technique for readable names but doesn’t address other data types such as timestamps or other functions such as MD5(). Other data distribution techniques other than random may also be needed. This post scratches the surface of testing with generated data.

link

jeffbee 1915 days ago

If you're running PostgreSQL you can also just

  xxd -ps -c 16 -l $bignum < /dev/urandom | (echo '\copy random_data from stdin'; cat) | psql

Depending on how you want your random keys formatted.

link

em500 1915 days ago

If you're importing from command line tools, you might as well use the specialized `shuf` tool:

  shuf -i 1-$bignum

or for random numbers with replacement

  shuf -i 1-$bignum -r -n $bignum

These give you a sample of random integers. `shuf` is part of GNU coreutils, so present in most Linux installs (but not on macOS).

link

fouc 1915 days ago

It's worth installing GNU coreutils on macOs All the command names are prepended with g, so `gshuf`

link

jeffbee 1915 days ago

Nice yeah that's a good way. Generally the number 1 million is so small I see no reason to do this in any manner other than shell commands.

link

schoetbi 1915 days ago

you have to use it since PostgreSQL does not support limit within a with expression.

link

jolmg 1915 days ago

You can use a with-expression. It's true that you can't use `limit` to limit it, but you can use a `where` condition. This is equivalent to cribwi's example:

  WITH RECURSIVE
    t AS (
      SELECT 0 id
      UNION ALL
      SELECT id + 1
        FROM t
        WHERE id < 1000000
    )
    SELECT id, random() FROM t;

It returns 1000001 rows, but so does cribwi's.

link

deepstack 1915 days ago

use a sequence or uuid

link