PostgreSQL Storage Internals
Understanding how PostgreSQL stores data helps you make informed decisions about schema design and capacity planning. Each row in a PostgreSQL table has a fixed overhead of 23 bytes for the HeapTupleHeaderData structure, which contains transaction visibility information (xmin, xmax, cmin, cmax), an infomask with status flags, and a pointer offset to the row data. This overhead exists regardless of how small your actual data is, which means tables with many narrow rows have a higher overhead ratio than tables with fewer, wider rows.
Beyond the row header, PostgreSQL stores column data according to each type's binary format. Fixed-size types like integer (4 bytes), bigint (8 bytes), and boolean (1 byte) always use the same amount of space. Variable-length types like varchar and text use a 1-byte or 4-byte length prefix followed by the actual data. PostgreSQL also applies alignment padding between columns to ensure proper memory alignment, typically rounding up to 4-byte or 8-byte boundaries.
Index Storage Considerations
This calculator adds 30% to the raw table size to estimate index overhead, which is a reasonable starting point for typical tables. In practice, index sizes vary widely depending on the number and type of indexes. A standard B-tree index on a bigint primary key adds approximately 8 bytes per row plus B-tree node overhead. A composite index on multiple columns stores all indexed column values plus the row pointer. GIN indexes for full-text search or JSONB containment queries can be several times larger than the source data.
TOAST and Large Values
PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to handle values larger than approximately 2KB. When a row exceeds the page size limit of 8KB, PostgreSQL compresses large column values and may store them in a separate TOAST table. This means a text column with an average of 200 bytes will be stored inline, but a text column with an average of 5KB will be compressed and partially stored out-of-line. This calculator uses the average bytes you specify, which works well for inline storage but may overestimate for very large text values that TOAST would compress.
Capacity Planning Best Practices
For production capacity planning, take the estimate from this tool and add a safety buffer. Account for dead rows that accumulate between vacuum cycles (typically 10-20% of table size). Consider WAL (Write-Ahead Log) storage, which is separate from table data. Plan for temporary space needed during operations like CREATE INDEX CONCURRENTLY, VACUUM FULL, and pg_dump. A common rule of thumb is to provision 2-3x the estimated data size for total database storage, ensuring room for indexes, WAL, temporary files, and growth.
UUID vs Integer Primary Keys
The choice between UUID (16 bytes) and bigint (8 bytes) primary keys affects storage at scale. With 100 million rows, the primary key column alone uses 1.6 GB with UUIDs versus 0.8 GB with bigint. Each index that references the primary key also pays this cost. However, UUIDs enable distributed ID generation without coordination, which is essential for microservices. Consider using ordered UUIDs (UUIDv7) to maintain B-tree locality while keeping the distributed generation benefit.
Frequently Asked Questions
How does PostgreSQL store rows internally?
Each row has a 23-byte header for transaction visibility, followed by column data with alignment padding. Fixed types use constant space; variable types use a length prefix plus the actual data.
Why add 30% for index overhead?
B-tree indexes on primary keys and foreign keys typically add 20-40% to the raw table size. This 30% estimate accounts for a primary key index and one or two secondary indexes.
How much space does a UUID column use?
PostgreSQL stores UUIDs as 16 bytes in compact binary format. This is twice the size of a bigint (8 bytes) and adds up at scale across the column and all referencing indexes.
What is the difference between varchar and text storage?
In PostgreSQL, varchar and text have identical storage mechanics. Both use TOAST for values over 2KB. The only difference is that varchar(n) enforces a maximum length. For estimation, the average actual data length is what matters.
How accurate is this estimation?
This provides a reasonable estimate for planning. Actual sizes vary due to alignment padding, TOAST compression, fill factor, dead rows, and WAL. Add a 20-50% buffer for production planning.
Save your results & get weekly tips
Get calculator tips, formula guides, and financial insights delivered weekly. Join 10,000+ readers.
No spam. Unsubscribe anytime.