Hacker News Clone

zamalekJan 20, 2026, 10:33 PM

> The index is 214 MB! That's almost half the size of the entire table. So the analysts are happy, but you? Not so much...

This is part of a broader choice: write amplification. You'd want to, of course, have the most precise index possible - but no matter how you cut it, you are incurring extra I/O for writes - one for the tuple, one per index. How you index things is heavily influenced by the mix of reads and writes, and this is why we have data warehouses/read replicas in the first place: it allows us to avoid write amplification in the write path, while having fast filtered reads (that are slightly delayed).

If you're dealing with <ridiculous number of users>, there is a good chance that you don't want to be putting BI/OLAP indices on your OLTP database. You probably don't have enough users to worry about this - but - if you ever find that your writes are becoming an issue this is something to consider.

SigmundAJan 20, 2026, 11:16 PM

Would be nice if PG supported clustered indexes (Index Organized Tables in Oracle speak) as an option if you have a table thats accessed mostly the same way you can get a index without the write amplification because the table is the index.

giovannibonettiJan 20, 2026, 11:27 PM

Clustered indexes only save up to 2x write amplification in the very rare case where you're indexing the entire table (e.g. if it has very few columns).

However, that is usually the least of your concerns with write amplification. If you don't batch your writes, you can easily get 100x write amplification. For any primary key or any other index not strongly correlated with your INSERTs, you can get perhaps another 100x write amplification even if you batch you writes.

SigmundAJan 20, 2026, 11:51 PM

>in the very rare case where you're indexing the entire table (e.g. if it has very few columns).

Not sure I follow most tables are accessed primarily in one way (primary key) while maybe sometimes in others for analysis. Having the PK written twice because it's almost always indexed is normally a waste and good candidate for a clustered index. So much so that many DB's like SQLite and MySql always do clustered indexes on primary key because their storage engine is built such that tables are a b-tree anyway vs PG that has separate b-tree indexes and heap tables. MSSQL and Oracle give you a choice whether the table is a index structure or a heap.

If you have very specific use case tables they can typically have a clustered index and no secondary indexes, you can still scan them for ad-hoc analysis but you get better insert performance and space usage because you aren't double writing to the heap and a PK index like you would in PG.

As far as batch writes that is a separate issue and has to due with whether that even makes sense for durability, if you need to commit a single random row due to something occurring you can't batch that up and maintain consistency, if your bulk loading data sure and is common practice to do commit batches there, clustered indexes could still be a 100 vs 200x write amplification if you have to insert both an index row and heap row vs just a single clustered index row.

branko_dJan 21, 2026, 7:28 AM

For inserts, you cannot escape writing into the base table and all indexes. However, my understanding is that for updates PostgreSQL has a write amplification problem due to the fact that each time a row is updated this creates a new row (to implement MVCC), and a new physical location in the heap, so all indexes need to be updated to point to the new location, even those not containing the updated columns.

OTOH, with a heap-less (aka. clustered, aka. index organized) table, you would only have to update the indexes containing the columns that are actually being updated. You don't need to touch any other index. Furthermore, only if you are updating a key column would you physically "move" the entry into a different part of the B-tree. If you update an included column (PK columns are automatically "included" in all secondary indexes, even if not explicitly mentioned in the index definition), you can do that in-place, without moving the entry.

Here is how this works in SQL Server - consider the following example:

    CREATE TABLE T (

        ID int,
        NAME nvarchar(255) NOT NULL,
        AMOUNT int NOT NULL,

        CONSTRAINT T_PK PRIMARY KEY (ID)

    );

    GO

    CREATE INDEX T_I1 ON T (NAME);

    GO

    CREATE INDEX T_I2 ON T (AMOUNT);

Now, doing this...

    UPDATE T SET AMOUNT = 42 WHERE ID = 100;

...will only write to T_PK and T_I2, but not T_I1. Furthermore T_PK's entry will not need to be moved to a different place in the B-tree. SQL Server uses row versioning similar to PostgreSQL, so it's conceivable that PostgreSQL could behave similarly to SQL Server if it supported clustered (index-organized) tables.

TostinoJan 21, 2026, 3:30 AM

Clustered indexes aren't just about write amplification. They also reduce the reads needed to get the data. Sometimes by quite a bit.

branko_dJan 21, 2026, 7:40 AM

That's true for seeks into the clustered (primary) index because that index includes all fields, so you don't need to "jump" to the heap to get them.

However, seeking into a secondary index, and then reading a column not included in that index incurs an additional index seek (into the clustered index), which may be somewhat slower than what would happen in a heap-based table.

So there are pros and cons, as usual...

SigmundAJan 21, 2026, 1:10 PM

I have found very minimal penalty on secondary index reads in practice such that it has never made a difference.

Remember some databases always use clustered index internally (SQLite, MySql) such that even if you have no primary key they will create a hidden one instead for use with the index.

https://www.sqlite.org/rowidtable.html

It is nice to have the choice which way to go and would be nice if PG implemented this. It can have significant space savings on narrow table with one primary index and performance advantages.

ComputerGuruJan 21, 2026, 12:22 AM

Another option would be a good way of placing indexes on a different physical disk. You could use fast, ephemeral storage like you can for a WAL without amplifying the writes to the same device that is your expensive bottleneck. You could rebuild on data loss.

But it would add complexity to detect out-of-sync indexes and tables.

SigmundAJan 21, 2026, 12:41 AM

You mean tablespaces: https://www.postgresql.org/docs/current/manage-ag-tablespace... ?

ComputerGuruJan 21, 2026, 12:50 AM

Maybe? I wasn’t under the impression these could be reliably lost or out of sync without risking data loss?

SigmundAJan 21, 2026, 1:13 AM

Wasn't aware you could put a WAL on a unreliable storage system either without risking data loss?

Would be interesting for indexes say put them on ram drive and rebuild them on restart if they aren't there just fallback to table scans.

MSSQL has memory optimized tables that do this sort of thing: https://learn.microsoft.com/en-us/sql/relational-databases/i...

ComputerGuruJan 21, 2026, 1:23 AM

If you lose the WAL you lose the data since the last merge but there’s no risk of corruption. The WAL handles missed syncs fine, too, missing losing just that window of data.

I don’t know if or how Postgres records the transaction number in the index to be able to notice if it’s out of date. If it does, I don’t know of any solution to “catch up” the index besides recreating it, which would be ok if that’s the only issue but from my experience with out-of-date indexes (libc or icu updates, where Postgres doesn’t know if anything IS broken and just reports that it could be), there’s no guarantee you’d even notice and your app could be running completely broken until you rebuild.

SigmundAJan 21, 2026, 1:48 AM

>If you lose the WAL you lose the data since the last merge but there’s no risk of corruption.

That is not my understanding:

https://www.postgresql.org/docs/current/app-pgresetwal.html

>After running this command on a data directory with corrupted WAL or a corrupted control file, it should be possible to start the server, but bear in mind that the database might contain inconsistent data due to partially-committed transactions. You should immediately dump your data, run initdb, and restore. After restore, check for inconsistencies and repair as needed.

ComputerGuruJan 21, 2026, 1:57 AM

I guess that depends on the definition of corrupted. It just describes what to do when the WAL is corrupt, but doesn’t say what conditions are considered corrupt. (We use ZFS so I haven’t run into torn writes.)

wreathJan 21, 2026, 7:41 AM

It does support index organized tables with the CLUSTER command, or you meant something else?

fuyJan 21, 2026, 8:22 AM

CLUSTER command is not the same as index organized tables, it's a one-time "physical sort" operation. New data is not organized until you run CLUSTER again. Index organized tables are maintained automatically by Oracle/SQL Server.

SigmundAJan 21, 2026, 1:00 PM

Not just maintained automatically, clustered indexes have no heap at all, the table is an index.

The CLUSTER command in PG just moves rows around in the heap so they match the still separate index order which can help a little bit with range operations because rows are close on disk, but otherwise doesn't do much.

So they are completely separate things that just happen to use the same term.

wreathJan 21, 2026, 10:33 AM

Interesting, i didnt know that. Thanks!

msdriggJan 20, 2026, 7:04 PM

The most interesting thing for me in this article was the mention of `MERGE` almost in passing at the end.

> I'm not a big fan of using the constraint names in SQL, so to overcome both limitations I'd use MERGE instead:

``` db=# MERGE INTO urls t USING (VALUES (1000004, 'https://hakibenita.com')) AS s(id, url) ON t.url = s.url WHEN MATCHED THEN UPDATE SET id = s.id WHEN NOT MATCHED THEN INSERT (id, url) VALUES (s.id, s.url); MERGE 1 ```

I use `insert ... on conflict do update ...` all the time to handle upserts, but it seems like merge may be more powerful and able to work in more scenarios. I hadn't heard of it before.

gshulegaardJan 20, 2026, 7:22 PM

IIRC `MERGE` has been part of SQL for a while, but Postgres opted against adding it for many years because it's syntax is inherently non-atomic within Postgres's MVCC model.

https://pganalyze.com/blog/5mins-postgres-15-merge-vs-insert...

This is somewhat a personal preference, but I would just use `INSERT ... ON CONFLICT` and design my data model around it as much as I can. If I absolutely need the more general features of `MERGE` and _can't_ design an alternative using `INSERT ... ON CONLFICT` then I would take a bit of extra time to ensure I handle `MERGE` edge cases (failures) gracefully.

kbolinoJan 20, 2026, 10:04 PM

It's kinda hard to handle MERGE failures gracefully. You generally expect the whole thing to succeed, and the syntax deceptively makes it seem like you can handle all the cases. But because of MVCC, you get these TOCTOU-style spurious constraint violations, yet there's no way to address them on a per-row basis, leading to the entire statement rolling back even for the rows that had no issues. If you are designing for concurrent OLTP workloads against the table, you should probably just avoid MERGE altogether. It's more useful for one-off manual fixups.

da_chickenJan 21, 2026, 5:18 AM

I'm not sure why you'd expect partial updates of a single statement in the first place. I mean, if I run `UPDATE Account SET Status = 'Closed' WHERE LastAccess < NOW() - INTERVAL '90 days';`, I'm not going to be happy if there's 50 records that match, the DB updates 30 successfully, and then error on 20. Atomic isn't just about rows. Do all the work or none of it.

If you're experiencing things that smell like TOCTOU, first you need to be sure you don't have oddball many-to-one issues going on (i.e., a cardinality violation error), and then you're going to have to increase your transaction isolation level to eliminate non-repeatable reads and phantom reads.

Like, the alternative to a MERGE is writing a few UPDATE statements followed by an INSERT and wrapping the entire batch in a transaction. And you should likely still wrap the whole thing in a transaction. If it breaks, you just replay the whole thing. Re-run the whole job.

kbolinoJan 21, 2026, 6:25 PM

I don't want partial updates, I want full, conflict-free upserts.

At read committed (default) isolation level, INSERT ... ON CONFLICT handles concurrent, conflicting inserts just fine, while MERGE ... WHEN NOT MATCHED (e.g.) does not. This is surprising behavior from the syntax alone, one would assume the two statements, when written with the same intent, would have the same behavior. Of course, this difference is documented, see the last paragraph of the Notes section on MERGE: https://www.postgresql.org/docs/18/sql-merge.html#id-1.9.3.1...

I don't know this for sure, but I believe that the effect of raising the transaction isolation level will just be to turn the constraint violation into a serialization error. That's not any easier to handle gracefully.

awesome_dudeJan 20, 2026, 10:04 PM

That reference - my initial gut feeling was that `MERGE` felt more readable, but then I read this paragraph

> If you want the generality of MERGE, you have to accept the fact that you might get unique constraint violations, when there are concurrent inserts, versus with INSERT ON CONFLICT, the way it's designed with its speculative insertions, guarantees that you either get an INSERT or an UPDATE and that is true even if there are concurrent inserts. You might want to choose INSERT ON CONFLICT if you need the guarantee.

Basically, `MERGE` is susceptible to a concurrent process also writing `INSERT` where that `INSERT` and `MERGE` are unaware of one another, causing a duplicate value to be used.

MarkusWinandJan 21, 2026, 11:05 AM

INSERT ... ON CONFLICT has fewer surprises in context of concurrency: https://modern-sql.com/caniuse/merge#illogical-errors

Besides portability, there is IMHO nothing against INSERT ... ON CONFLICT if it does what you need.

philjohnJan 20, 2026, 7:31 PM

If you're doing large batch inserts, I've found using the COPY INTO the fastest way, especially if you use the binary data format so there's no overhead on the postgres server side.

sirfzJan 20, 2026, 8:34 PM

That doesn't work well with conflicts tho iirc

nicwolffJan 23, 2026, 5:54 AM

COPY INTO a temp table and INSERT INTO... SELECT FROM... ON CONFLICT UPDATE...

sc68calJan 20, 2026, 5:28 PM

Great article, shows a lot of interesting PostgreSQL features. I have used PostgreSQL and MySQL for decades, and this article showed me that I have barely scratched the surface of what is possible.

booiJan 20, 2026, 9:08 PM

I've used Postgres for more than a decade and everytime I wade into the docs I feel the same way, I'm barely scratching the surface. It's so immensely powerful.

ciesJan 21, 2026, 12:21 AM

I love what LLMs are doing for me in PG's SQL. I discovered many features by having LLMs write them for me, often spot-on 100% on first prompt.

Since I know conceptually how RDBMSes work, I can ask vey specifically what I want. Also asking for feedback on schemas/queries really helped me. I use a lot more of PGs features now!

whalesaladJan 20, 2026, 9:31 PM

PostgreSQL is like Emacs. It's an operating system disguised as something else.

mistrial9Jan 21, 2026, 12:06 AM

the way PG was originally implemented does have some overlap with operating systems design IMHO.. PG internals define and use PG tables in internal schema to implement core architectural features. The PG code that bootstraps the PG environment is minimal in important ways.

jefuriiJan 23, 2026, 6:50 PM

I've heard Postgres described as a "database framework".

sgarlandJan 21, 2026, 12:47 AM

Not discussed in TFA: BRIN indices. For the given date example, they would be perfect - if you can guarantee that whatever you’re indexing is monotonic, they’re a godsend. Tiny and performant.

ComputerGuruJan 21, 2026, 1:15 AM

Minor correction: you don’t even have to guarantee the data is monotonic, it just performs best when it mostly is (good example is when handling incoming timestamps data on the server so messages/rexords are only generally monotonic but may be processed out of order).

How well do they work for UUIDv7? You’d probably have to tune (increase?) pages_per_range, but even though each index entry is 16 bytes you have to consider the btree index on the same is also similarly affected (or worse).

ComputerGuruJan 21, 2026, 12:47 AM

The no unique constraint using hash index thing is something I always run into. Am I missing something or is it just glue code missing to translate that syntax to a exclusion constraint with a hash index, allowing foreign keys to reference the column in the process, and possibly also fixing the ON CONFLICT approach in one go?

OptionOfTJan 20, 2026, 4:24 PM

Remarkably fresh content.

It's interesting how both virtual columns and hash indexes work, but feel like they're bolted on, vs being made part of the whole ecosystem so that they work seamlessly.

ComputerGuruJan 21, 2026, 12:58 AM

Virtual columns are basically one or two minor patches from being fully done. Pg18 brought us most of the way there.

Hash indices have long been crippled; they shipped almost unusable but every few years get a good QoL update. I think automatic unique constraints are the big thing left there.

pksunkaraJan 20, 2026, 5:01 PM

I think a stored generated column allows you to create an index on it directly. Isn't it better approach?

tczMUFlmoNkJan 20, 2026, 5:13 PM

The article explains why they want to avoid this option:

> Starting at version 14, PostgreSQL supports generated columns - these are columns that are automatically populated with an expression when we insert the row. Sounds exactly like what we need but there is a caveat - the result of the expression is materialized - this means additional storage, which is what we were trying to save in the first place!

pksunkaraJan 20, 2026, 5:22 PM

Thanks, missed that part. I would still be interested in knowing how much additional storage that adds, if the OP is interested in updating the article.

zenmacJan 20, 2026, 5:43 PM

>I think a stored generated column allows you to create an index on it directly. Isn't it better approach?

Is it also possible to create index (maybe partial index) on expressions?

masklinnJan 20, 2026, 8:32 PM

That's the first solution (a function based index), however it has the drawback of fragility: a seemingly innocent change to the query can lead to not matching the index's expression anymore). Which is why the article moves on to generated columns.

rpswJan 20, 2026, 5:11 PM

I assume it would increase the storage usage, which they say they are trying to avoid in that example.

adrian17Jan 21, 2026, 9:44 AM

Couldn't the first example be better done by having a Plan enum type? It'd both be a little more lightweight than text, and better at handling bad queries (mistyped filter resulting in error instead of empty result set).

MarazanJan 21, 2026, 1:55 PM

My favourite PostgresSQL optimization was running a SELECT on a multi-billion row table before running a DELETE so that the shared_memory_buffer would be filled with the rows that would need to be deleted.

Postgres makes DELETEs single threaded, this includes the selection part of the DELETE. By running a completely separate SELECT first Postgres would multithread the SELECT and populate the cache fast. Then the single thread DELETE can operate on in-memory data and not endlessly block loading data from disk.

SigmundAJan 20, 2026, 5:58 PM

>Currently, constraint exclusion is enabled by default only for cases that are often used to implement table partitioning via inheritance trees. Turning it on for all tables imposes extra planning overhead that is quite noticeable on simple queries, and most often will yield no benefit for simple queries.

PG's lack of plan caching strikes again, this sort of thing is not a concern in other DB's that reuse query plans.

singronJan 20, 2026, 7:01 PM

PG does reuse plans, but only if you prepare a query and run it more than 5 times on that connection. See plan_cache_mode[0] and the PREPARE docs it links to. This works great on simple queries that run all the time.

It sometimes really stinks on some queries since the generic plan can't "see" the parameter values anymore. E.g. if you have an index on (customer_id, item_id) and run a query where `customer_id = $1 AND item_id = ANY($2)` ($2 is an array parameter), the generic query plan doesn't know how many elements are in the array and can decide to do an elaborate plan like a bitmap index scan instead of a nested loop join. I've seen the generic plan flip-flop in a situation like this and have a >100x load difference.

The plan cache is also per-connection, so you still have to plan a query multiple times. This is another reason why consolidating connections in PG is important.

0: https://www.postgresql.org/docs/current/runtime-config-query...

SigmundAJan 20, 2026, 9:09 PM

Yes manual query preparation by client [1] is what you did in MSSQL server up until v7.0 I believe, which was 1998 when it started doing automatic caching based on statement text. I believe it also cached stored procedures before v7.0 which is one reason they were recommended for all application code access to the database back then.

MSSQL server also does parameter sniffing now days and can have multiple plans based on the parameters values it also has a hint to guide or disable sniffing because many times a generic plan is actually better, again something else PG doesn't have, HINTS [2].

PG being process based per connection instead of thread based makes it much more difficult to share plans between connections and it also has no plan serialization ability. Where MSSQL can save plans to xml and they can be loaded on other servers and "frozen" to use that plan if desired, they can also be loaded into plan inspection tools that way as well [3].

1. https://learn.microsoft.com/en-us/sql/relational-databases/n...

2. https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-tr...

3. https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-tr...

mxeyJan 21, 2026, 8:35 AM

PostgreSQL shares other caches between processes so they probably could have a global plan cache if they wanted. I wonder why they don’t though.

One possible reason is that the planner configuration can be different per connection, so the plans might not transfer

SigmundAJan 21, 2026, 1:22 PM

In MSSQL Server part of the plan match is the various session/connection options, if they are different there are different plans cached.

I believe the plan data structure PG is intimately tied to process space memory addresses since it was never thought to share between them and can even contain executable code that was generated.

This makes it difficult to share between processes without a heavy redesign but would be a good change IMO.

anarazelJan 21, 2026, 1:14 PM

> PostgreSQL shares other caches between processes so they probably could have a global plan cache if they wanted. I wonder why they don’t though.

> One possible reason is that the planner configuration can be different per connection, so the plans might not transfer

That's part of it, another big part is that the transactional DDL makes it more complicated, as different sessions might require different plans.

danielheathJan 20, 2026, 9:57 PM

The hash technique for uniqueness isn’t supported for indexes because it doesn’t handle hash collisions. The authors proposed solution suffers the same problem- values which do not already exist in the table will sometimes be rejected because they have the same hash as something that was already saved.

DiggseyJan 20, 2026, 11:06 PM

This is completely untrue. While the index only stores the hashes, the table itself stores the full value and postgres requires both the hash and the full value to match before rejecting the new row. Ie. Duplicate hashes are fine.

tizzyJan 22, 2026, 12:55 PM

That's super interesting and I am convinced by the dbfiddle but is not very intuitive or well documented? https://www.postgresql.org/docs/current/hash-index.html

mxeyJan 21, 2026, 8:33 AM

This is very good to know because it means this exclusion constraint workaround is a better approach over using a SQL hash function and a btree if you want to enforce uniqueness on values too long for a btree index.

gmcabritaJan 21, 2026, 11:25 AM

Your comment is 100% not true: https://dbfiddle.uk/Iu-u886S

pphyschJan 20, 2026, 8:23 PM

Is the Hash Index method strictly superior to creating a unique "hash" column and precomputing the hash in the application or query?

ComputerGuruJan 21, 2026, 12:54 AM

Sibling comment mentions less space but also your alternative doesn’t (naively) handle supporting values that differ but hash to the same value.

Of course the hash index also outperforms a unique (btree) index on top of separately calculating the hash, in addition to the storage overhead, row bloat, lack of guarantees regarding the hash unless you expose it to Postgres as a user-defined function AND add a check constraint.

sirfzJan 20, 2026, 10:38 PM

It'll use less storage space

vjay15Jan 23, 2026, 12:31 PM

Actually interesting read thanks haki!

themafiaJan 20, 2026, 7:24 PM

I moved into the cloud a few years ago and so I don't get to play with fixed server infrastructure like pgsql as much anymore.

Is the syntax highlighting built into pgsql now or is that some other wrapper that provides that? (it looks really nice).

tuetuopayJan 20, 2026, 8:00 PM

I generally use pgcli to that end. Works well, has a few niceties like clearer transaction state, better reconnect, syntax highlighting, and better autocomplete that works in many more cases than plain psql (it can even autocomplete on clauses when foreign key relations are defined!).

My only gripe with it is its insistence on adding a space after a line break when the query is too long, making copy/paste a pain for long queries.

hans_castorpJan 21, 2026, 2:17 PM

No relational database implements syntax highlighting. The tools you use to write and execute SQL queries implement that.

folliJan 20, 2026, 7:27 PM

You can use an IDE like IntelliJ and you get syntax highlighting, code completion etc.

lasgaweJan 20, 2026, 6:03 PM

some points from this article that I didn't know before.

Unconventional PostgreSQL Optimizations

Comments