|
Sure. Suppose that we have a trivial key-value table mapping integer keys to arbitrary jsonb values: example=> CREATE TABLE tab(k int PRIMARY KEY, data jsonb NOT NULL);
CREATE TABLE
We can fill this with heterogeneous values: example=> INSERT INTO tab(k, data) SELECT i, format('{"mod":%s, "v%s":true}', i % 1000, i)::jsonb FROM generate_series(1,10000) q(i);
INSERT 0 10000
example=> INSERT INTO tab(k, data) SELECT i, '{"different":"abc"}'::jsonb FROM generate_series(10001,20000) q(i);
INSERT 0 10000
Now, keys in the range 1–10000 correspond to values with a JSON key "mod". We can create an index on that property of the JSON object: example=> CREATE INDEX idx ON tab((data->'mod'));
CREATE INDEX
Then, we can query over it: example=> SELECT k, data FROM tab WHERE data->'mod' = '7';
k | data
------+---------------------------
7 | {"v7": true, "mod": 7}
1007 | {"mod": 7, "v1007": true}
2007 | {"mod": 7, "v2007": true}
3007 | {"mod": 7, "v3007": true}
4007 | {"mod": 7, "v4007": true}
5007 | {"mod": 7, "v5007": true}
6007 | {"mod": 7, "v6007": true}
7007 | {"mod": 7, "v7007": true}
8007 | {"mod": 7, "v8007": true}
9007 | {"mod": 7, "v9007": true}
(10 rows)
And we can check that the query is indexed, and only ever reads 10 rows: example=> EXPLAIN ANALYZE SELECT k, data FROM tab WHERE data->'mod' = '7';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tab (cost=5.06..157.71 rows=100 width=40) (actual time=0.035..0.052 rows=10 loops=1)
Recheck Cond: ((data -> 'mod'::text) = '7'::jsonb)
Heap Blocks: exact=10
-> Bitmap Index Scan on idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.026..0.027 rows=10 loops=1)
Index Cond: ((data -> 'mod'::text) = '7'::jsonb)
Planning Time: 0.086 ms
Execution Time: 0.078 ms
If we did not have an index, the query would be slower: example=> DROP INDEX idx;
DROP INDEX
example=> EXPLAIN ANALYZE SELECT k, data FROM tab WHERE data->'mod' = '7';
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on tab (cost=0.00..467.00 rows=100 width=34) (actual time=0.019..9.968 rows=10 loops=1)
Filter: ((data -> 'mod'::text) = '7'::jsonb)
Rows Removed by Filter: 19990
Planning Time: 0.157 ms
Execution Time: 9.989 ms
Hence, "arbitrary indices on derived functions of your JSONB data". So the query is fast, and there's no problem with the JSON shapes of `data` being different for different rows.See docs for expression indices: https://www.postgresql.org/docs/16/indexes-expressional.html |