
I'm maintaining a large database where each table acts as a document store like this:

id         uuid
created_at timestamptz
updated_at timestamptz
document   jsonb compression=lz4

Some documents are small (a few kilobytes), and some are really huge (up to 40K lines when pretty-printed). Queries are mostly get-by-id, but we also run a lot of queries filtering documents with JSON Path expressions (e.g. where document @@ '$.path.to.status == "active"').
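For context, filters of that shape can be served by a GIN index with the `jsonb_path_ops` operator class, which supports the `@@` (jsonpath) operator. A sketch (table and index names are illustrative, not my actual schema):

```sql
-- jsonb_path_ops produces a smaller, faster index than the default
-- jsonb_ops, but only supports containment and jsonpath operators.
create index documents_doc_path_idx
    on documents using gin (document jsonb_path_ops);

-- A filter that can use the index:
select id
from documents
where document @@ '$.path.to.status == "active"';
```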

The following idea came to my mind recently. By default, the jsonb type uses EXTENDED storage, meaning it allows both compression and out-of-line TOAST storage. But what if we prevent TOAST-ing? A compressed document would be stored in the primary table, and reading and writing it should be faster.
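To check which storage strategy and compression method a column currently uses, `pg_attribute` can be queried directly; a sketch:

```sql
-- attstorage: p = PLAIN, e = EXTERNAL, m = MAIN, x = EXTENDED
-- attcompression: l = lz4, p = pglz, '' = server default
select attname, attstorage, attcompression
from pg_attribute
where attrelid = 'app1'::regclass
  and attnum > 0
  and not attisdropped;
```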

I know this leads to bloated records and more intensive reads when scanning pages. But I ran an experiment. I prepared two tables with the same id and doc fields (uuid and jsonb). The first table uses the standard settings, and the second one relies on custom ones:

create table app1 (
    id uuid primary key,
    doc jsonb compression lz4 not null,
    created_at timestamptz not null default current_timestamp,
    updated_at timestamptz
);

create table app2 (
    id uuid primary key,
    doc jsonb compression lz4 not null,
    created_at timestamptz not null default current_timestamp,
    updated_at timestamptz
);

alter table app2 alter column doc set storage main;
alter table app2 set (toast_tuple_target = 8000);
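To confirm that app2 really keeps its documents in the main heap, the heap size can be compared with the size of the table's TOAST relation; a sketch (if the MAIN setting works as intended, app2's TOAST relation should stay close to empty):

```sql
-- reltoastrelid points at the table's TOAST relation (0 if none).
select c.relname,
       pg_size_pretty(pg_relation_size(c.oid))                 as heap_size,
       pg_size_pretty(pg_total_relation_size(c.reltoastrelid)) as toast_size
from pg_class c
where c.relname in ('app1', 'app2');
```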

I inserted 1M huge random JSON values into each table. Then I used the following code to measure read performance:

DO $$
DECLARE
    app jsonb;
BEGIN
    FOR i IN 1..1000000 LOOP
        -- gen_uuid(i) is my helper that maps i to a stored uuid;
        -- the block was run once against app1 and once against app2.
        select doc into app from app1 where id = gen_uuid(i);
    END LOOP;
END $$;

For the table app1, it took about 9 seconds (9719.504 ms (00:09.720)). But for the table app2, which doesn't use TOAST, it took about 4 seconds (4116.775 ms (00:04.117)): a roughly 2× speedup! I also ran some other queries (select ... where with custom JSON filtering), and the second table gave better results there as well.
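One way to see where the time actually goes in such comparisons is `EXPLAIN` with the `BUFFERS` option, which breaks out buffer hits and reads per plan node (TOAST access shows up as extra buffer traffic on the app1 side); a sketch with a made-up id:

```sql
explain (analyze, buffers)
select doc
from app1
where id = '00000000-0000-0000-0000-000000000001';
```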

Now I'm wondering: what could I have overlooked? Is it worth shipping these changes to production? How can I test this better? Does anyone have experience with such a case?

Here is the link that gave me the idea (the author experiments with a custom page size): https://dev.to/franckpachot/postgresql-jsonb-size-limits-to-prevent-toast-slicing-9e8

Thanks! (This is my first question on the DBA stack; please let me know if anything needs correction.)


1 Answer


The main problem with your approach is that you will get an error if you try to store a row that won't fit into a single page even after TOAST compression. Therefore, I would recommend that you set toast_tuple_target to 8000 to avoid compression where possible, but keep EXTENDED storage, so that really large values can still be stored out of line rather than causing an error.
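In other words, something like this (keeping the default EXTENDED storage and only raising the TOAST threshold):

```sql
-- EXTENDED is already the default for jsonb; setting it explicitly here
-- only to contrast with the MAIN setting from the question. Oversized
-- rows can then still be moved out of line instead of failing.
alter table app2 alter column doc set storage extended;
alter table app2 set (toast_tuple_target = 8000);
```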

Your setting may perform better if CPU cost is the bottleneck for your workload. You didn't test with a large amount of data, so your working set probably fit into RAM. Things may look different once the working set exceeds RAM and disk I/O becomes the dominant cost; then it might be advantageous to compress the data with the normal TOAST settings. Test well!
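To see how much of the I/O actually goes to the TOAST table in such a test, the cumulative per-table I/O statistics can be inspected; a sketch (consider resetting statistics before the run for a clean measurement):

```sql
-- toast_blks_read vs. heap_blks_read shows the share of block reads
-- that hit the TOAST relation rather than the main heap.
select relname,
       heap_blks_read, heap_blks_hit,
       toast_blks_read, toast_blks_hit
from pg_statio_user_tables
where relname in ('app1', 'app2');
```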

  • Thank you, Laurenz, for your reply! I'm a bit unsure about the assumption that storing a document which exceeds 8000 bytes compressed would cause an error. The TOAST docs describe the MAIN storage type as follows: "Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough to fit on a page."
  • I'm going to add more data to my test table (up to 50M rows) and test random access again.
