Nope, that's - as far as I can see - just a sequence of increasing integers. Both K and J examples give an array of random integers in 0-1000000 range.
For reference, in J such sequence of integers can be generated with:
1+i.1000000
1 2 3 4 5 6 7 8 9 10 ....
or:
1+i.1000000 1
1
2
3
4
5
6
7
8
9
10
....
To explain the previous examples (and let's use smaller integer for less typing...), the `$` verb is called "shape"/"reshape", and it takes a (list of) values on the right side, and a list of dimensions on the left:
5 2 $ 10
10 10
10 10
10 10
10 10
10 10
if there's not enough values on the right, they are cycled:
5 2 $ 10 11 12
10 11
12 10
11 12
10 11
12 10
which degenerates to repetition if there's only one value on the right. The `~` adjective (called "reflex") modifies a verb to its left in the following way:
V~ x NB. same as x V x
so `$~10` is the same as `10$10`, which is a list of ten tens. That list is passed to `?`, which is a verb called "roll", which gives a random integer in the 0-(y-1) range when written as `? y`. `y` here can be a scalar, or a list, in which case the roll is performed for each element of the list:
The dyadic case, ie. `x ? y` is called "deal", which selects `x` elements from `i. y` list at random, without repetitions. `?~ y`, then, effectively shuffles the `i. y` list:
?~10
6 9 7 3 5 1 8 0 4 2
"deal" can be used to shuffle any list, not only the `i. y` sequence, by using the shuffled list as indexes of another list (using `{` verb, called "from"):
...I know, I know, it is strange. But it's so interestingly mind-bending that I'd be really happy if I had a valid excuse to pour hundreds of hours into learning J properly. Sadly, I don't have anything like that, so I only spread the strangeness from time to time in comments, like I do right now :)
This can generally fall under the category of “Generate Test Data in SQL” with both generic techniques like Recursive CTEs, as in the OP, and SQL dialect or tool specific options. Search is your friend. The OP also provides a table lookup technique for readable names but doesn’t address other data types such as timestamps or other functions such as MD5(). Other data distribution techniques other than random may also be needed. This post scratches the surface of testing with generated data.
You can use a with-expression. It's true that you can't use `limit` to limit it, but you can use a `where` condition. This is equivalent to cribwi's example:
WITH RECURSIVE
t AS (
SELECT 0 id
UNION ALL
SELECT id + 1
FROM t
WHERE id < 1000000
)
SELECT id, random() FROM t;
If your DB doesn't support this technique, you might be able to use this rather disgusting technique, which builds an exponentially growing tower of rows by using nested queries with JOIN's. https://stackoverflow.com/a/61169467
You can make it neater still by making the sub-table with a CTE:
WITH Digits AS (SELECT n FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS d(n))
SELECT seq = units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000
FROM Digits AS units
CROSS JOIN Digits AS tens
CROSS JOIN Digits AS hundreds
CROSS JOIN Digits AS thousands
WHERE units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000 BETWEEN 1 AND 1337
or
WITH Digits AS (SELECT n FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS d(n))
, Thousands AS (SELECT seq = units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000
FROM Digits AS units
CROSS JOIN Digits AS tens
CROSS JOIN Digits AS hundreds
CROSS JOIN Digits AS thousands
)
SELECT seq FROM thousands ORDER BY seq
WHERE units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000 BETWEEN 1 AND 1337
though at that point you probably have the recursive option so you'd need to test to see which is more efficient. As a further hack you can combine this and the recursive CTE to extend the number of rows you can return when recursion limits would otherwise be a problem.
You can also use simple window functions to make a sequence from a system table that you know will have a few thousand rows:
SELECT n=ROW_NUMBER() OVER (ORDER BY object_id) FROM sys.all_columns
Or if you just need a few thousand rows to hang randomly generated data from you don't even need the function.
All a bit hacky compared to some DBs that have sequence support, but useful for generating test data or numbers tables where that isn't supported.
This looks convenient (and performant). But how does it scale as queries join across tables?
If you need to create test data with complex business logic, referential integrity and constraints we've been working on declarative data generator that is build exactly for this: https://github.com/openquery-io/synth.
If you need another repeatable way to create random data that you can export as SQL (or CSV/Excel files). You may find a tool we built and use at work useful: https://github.com/creditdatamw/zefaker
Needs a little Groovy but very convenient for generating random (or non-random) data.
i generally end up writing a data generator using the language and apis in the application. I often want control over various aspects of the data and built the generator as such. Quite often I just generate queries and then run them which works quickly.
While this looks like a good way to generate simple data, practical applications are more involved.
You want to run a transaction in production and you're unsure about the impact because it might take a lock and block other transactions for some time. In that case you may want to spin up a separate database with fake data and test it there first.
You could create a new database and restore a production backup instead of using fake data but that might not be allowed or require some kind of approval due to rules protecting the privacy of customers or employees.
Generating test data batches like this is a common use.
The other is for populating "numbers tables" which can be very useful in reporting contexts and/or dealing with certain gaps/islands type problems.
You can even use the techniques dynamically rather than a stored numbers table, if you need it in a DB that doesn't have one and you don't have schema permissions to create one, though this is generally less efficient.
usually there is a system table with a ton of metadata that will for sure contain few thousand rows, so generating a million rows for SQL Server is simply:
select top 1000000 ROW_NUMBER()
from sys.objects a, sys.objects b
or you can use INFORMATION_SCHEMA which is more portable across different RDBMS engines
Also one of the only ways to get sequences in joins in Redshift. Unfortunately, only Redshift master nodes support 'generate_series'. If your query contains join that are spread across multiple worker nodes, Redshift will report an error saying 'generate_series' no supported.
WITH RECURSIVE is supported in MariaDB 10.2+, MySQL 8.0+, PostgreSQL 8.4+ and SQLite 3.8+. Oracle 11.2+ and SQL Server 2005 support recursive queries, but without the RECURSIVE keyword.
Before recent versions, in the Postgres implementation CTEs were a wall to predicate push-down which can make them much less efficient than in other DBMSs. Worth noting if you need to support older versions for any reason.
[1] https://www.postgresql.org/docs/current/functions-srf.html