I'm maintaining a large database where each table is a document store shaped like this:
id uuid
created_at timestamptz
updated_at timestamptz
document jsonb compression=lz4
Some documents are small (several kilobytes), and some are really huge (about 40K lines when pretty-printed). Queries are mostly get-by-id, but we also run a lot of queries that filter documents with JSON Path expressions (e.g. where document @@ '$.path.to.status == "active"').
The following idea came to mind recently. By default, the jsonb type uses EXTENDED storage, meaning it allows both compression and out-of-line (TOAST) storage. But what if I prevent TOASTing? A compressed document would then be stored in the primary table, and reading and writing it should be faster.
I know this leads to bloated rows and more I/O when scanning pages, but I ran an experiment. I prepared two tables with id and doc fields (uuid and jsonb). The first table is standard, and the second one relies on custom settings:
create table app1 (
    id uuid primary key,
    doc jsonb compression lz4 not null,
    created_at timestamptz not null default current_timestamp,
    updated_at timestamptz
);
create table app2 (
    id uuid primary key,
    doc jsonb compression lz4 not null,
    created_at timestamptz not null default current_timestamp,
    updated_at timestamptz
);
alter table app2 alter column doc set storage main;
alter table app2 set (toast_tuple_target = 8000);
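To double-check that the settings actually took effect, a catalog query along these lines can help. This is only a sketch: it assumes PostgreSQL 14 or later (where attcompression exists) and that app1 and app2 are on the search path.

```sql
-- Storage strategy and compression method of the doc column in both tables.
select a.attrelid::regclass as table_name,
       a.attname,
       a.attstorage,      -- 'x' = extended (jsonb default), 'm' = main
       a.attcompression   -- 'l' = lz4, 'p' = pglz, empty = server default
from pg_attribute a
where a.attrelid in ('app1'::regclass, 'app2'::regclass)
  and a.attname = 'doc';

-- Table-level options; app2 should list toast_tuple_target=8000.
select relname, reloptions
from pg_class
where relname in ('app1', 'app2');
```

I would expect 'x' for app1 and 'm' for app2 in the attstorage column.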
I inserted 1M huge random JSON values into each table. Then I used the following code to measure reads:
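DO $$
DECLARE
    app jsonb;
BEGIN
    -- Fetch every document once by primary key.
    -- Run the block once per table: app1 here, then again with app2.
    -- gen_uuid is a custom helper (not shown here), not a built-in.
    FOR i IN 1..1000000 LOOP
        SELECT doc INTO app FROM app1 WHERE id = gen_uuid(i);
    END LOOP;
END $$;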
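As far as I understand, MAIN storage still moves a value out of line as a last resort when it does not fit in the page even after compression, so it seems worth checking how much data really ended up inline. A rough sketch of such a check, assuming PostgreSQL 14 or later for pg_column_compression:

```sql
-- Heap size versus TOAST table size for both tables.
select c.relname,
       pg_size_pretty(pg_relation_size(c.oid::regclass)) as heap_size,
       pg_size_pretty(pg_relation_size(c.reltoastrelid::regclass)) as toast_size
from pg_class c
where c.relname in ('app1', 'app2')
  and c.reltoastrelid <> 0;

-- Stored (possibly compressed) size versus text size for a few documents.
select pg_column_compression(doc) as method,
       pg_column_size(doc)        as stored_bytes,
       octet_length(doc::text)    as text_bytes
from app2
limit 5;
```

If the TOAST table of app2 is far from empty, part of the documents are still being stored out of line despite the settings.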
For the table app1, it took about 9.7 seconds (9719.504 ms (00:09.720)). For the table app2, which is set up to avoid TOAST, it took about 4.1 seconds (4116.775 ms (00:04.117)). More than twice as fast! I also ran some other queries, such as select ... where with custom JSON filtering, and the second table gave better results there too.
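For the filtering case, a comparison roughly like the one below shows buffer usage as well as timing. It is a sketch only: the JSON path and value are the placeholder example from the top of the question, not a real field in my test data.

```sql
-- Run once against each table and compare "Buffers:" and execution time.
explain (analyze, buffers)
select count(*)
from app2   -- repeat with app1
where doc @@ '$.path.to.status == "active"';
```

Since the filter has to read doc for every row, the buffer counts should make the cost of fetching TOASTed values visible in the app1 plan.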
Now I'm wondering: what could I be overlooking? Is it worth shipping these changes to production? How can I test this better? Does anyone have experience with such a setup?
Here is the link which gave me the idea (the author is experimenting with a custom page size): https://dev.to/franckpachot/postgresql-jsonb-size-limits-to-prevent-toast-slicing-9e8
Thanks! (that's my first question on DBA stack, please let me know if anything needs correction)