| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by nprescott 1178 days ago

This reminded me of a prior discussion[0] on bulk data generation in SQLite with Rust (vs Python vs PyPy) which previously led me to trying out two different techniques using just SQLite[1]. The approach here is so similar I tried my prior solution on the slowest VPS I have access to (1 vCPU core 2.4GHz, 512Mi, $2.50/month from vultr):

    sqlite> create table users (id blob primary key not null, created_at text not null, username text not null);
    sqlite> create unique index idx_users_on_id on users(id);
    sqlite> pragma journal_mode=wal;
    sqlite> .load '/tmp/uuid.c.so'
    sqlite> .timer on
    sqlite> insert into users(id, created_at, username)
            select uuid(), strftime('%Y-%m-%dT%H:%M:%fZ'), 'hello'
            from generate_series limit 100000;
    Run Time: real 1.159 user 0.572631 sys 0.442133

where the UUID extension comes from the SQLite authors[2] and generate_series is compiled into the SQLite CLI. It is possible further pragma-tweaking might eke out further performance but I feel like this representative of the no-optimization scenario I typically find myself in.

In the interest of finding where the bulk of the time is spent and on a hunch I tried swapping the UUID for plain auto-incrementing primary keys as well:

    sqlite> insert into users(created_at, username) select strftime('%Y-%m-%dT%H:%M:%fZ'), 'hello' from generate_series limit 100000;
    Run Time: real 0.142 user 0.068090 sys 0.025507

Clearly UUIDs are not free!

[0]: https://news.ycombinator.com/item?id=27872575

[1]: https://idle.nprescott.com/2021/bulk-data-generation-in-sqli...

[2]: https://sqlite.org/src/file/ext/misc/uuid.c