Database Size Estimator

Q: How does PostgreSQL store rows internally?

Each PostgreSQL row has a 23-byte header called HeapTupleHeaderData that stores transaction visibility information (xmin, xmax), command ID, and a pointer to the row data. After this header, the actual column data is stored. PostgreSQL also adds alignment padding between columns. The total row size is the sum of column data plus the 23-byte overhead plus any alignment padding.

Q: Why add 30% for index overhead?

B-tree indexes, which are the default in PostgreSQL, typically add 2-3x the size of the indexed columns. When you account for primary keys, foreign keys, and common query indexes, a 30% overhead estimate on top of raw table data is a reasonable starting point. Heavily indexed tables (like those with full-text search or GIN indexes) can have index sizes exceeding the table itself.

Q: How much space does a UUID column use?

A UUID in PostgreSQL uses exactly 16 bytes of storage. This is the binary representation of the 128-bit UUID value. While UUIDs appear as 36-character strings (like 550e8400-e29b-41d4-a716-446655440000), PostgreSQL stores them in a compact binary format. Compared to bigint (8 bytes), UUIDs use twice the storage per row, which adds up significantly at scale.

Q: What is the difference between varchar and text storage?

In PostgreSQL, varchar(n) and text have nearly identical storage characteristics. Both use TOAST (The Oversized-Attribute Storage Technique) for values larger than about 2KB, compressing and potentially storing them out-of-line. The only difference is that varchar(n) enforces a maximum length constraint. For size estimation, the average length of actual data stored matters more than the column type.

Q: How accurate is this estimation?

This tool provides a reasonable estimate for capacity planning. Actual PostgreSQL storage varies due to alignment padding (typically 8-byte alignment), TOAST compression for large text values, page fill factor (default 90%), dead rows from updates/deletes before vacuum, and WAL overhead. For production planning, add a 20-50% buffer above the estimate to account for these factors.

Estimate PostgreSQL table storage requirements based on column definitions and row count. Includes row overhead and index size estimates for capacity planning.

PostgreSQL Storage Internals

Understanding how PostgreSQL stores data helps you make informed decisions about schema design and capacity planning. Each row in a PostgreSQL table has a fixed overhead of 23 bytes for the HeapTupleHeaderData structure, which contains transaction visibility information (xmin, xmax, cmin, cmax), an infomask with status flags, and a pointer offset to the row data. This overhead exists regardless of how small your actual data is, which means tables with many narrow rows have a higher overhead ratio than tables with fewer, wider rows.

Beyond the row header, PostgreSQL stores column data according to each type's binary format. Fixed-size types like integer (4 bytes), bigint (8 bytes), and boolean (1 byte) always use the same amount of space. Variable-length types like varchar and text use a 1-byte or 4-byte length prefix followed by the actual data. PostgreSQL also applies alignment padding between columns to ensure proper memory alignment, typically rounding up to 4-byte or 8-byte boundaries.

Index Storage Considerations

This calculator adds 30% to the raw table size to estimate index overhead, which is a reasonable starting point for typical tables. In practice, index sizes vary widely depending on the number and type of indexes. A standard B-tree index on a bigint primary key adds approximately 8 bytes per row plus B-tree node overhead. A composite index on multiple columns stores all indexed column values plus the row pointer. GIN indexes for full-text search or JSONB containment queries can be several times larger than the source data.

TOAST and Large Values

PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to handle values larger than approximately 2KB. When a row exceeds the page size limit of 8KB, PostgreSQL compresses large column values and may store them in a separate TOAST table. This means a text column with an average of 200 bytes will be stored inline, but a text column with an average of 5KB will be compressed and partially stored out-of-line. This calculator uses the average bytes you specify, which works well for inline storage but may overestimate for very large text values that TOAST would compress.

Capacity Planning Best Practices

For production capacity planning, take the estimate from this tool and add a safety buffer. Account for dead rows that accumulate between vacuum cycles (typically 10-20% of table size). Consider WAL (Write-Ahead Log) storage, which is separate from table data. Plan for temporary space needed during operations like CREATE INDEX CONCURRENTLY, VACUUM FULL, and pg_dump. A common rule of thumb is to provision 2-3x the estimated data size for total database storage, ensuring room for indexes, WAL, temporary files, and growth.

UUID vs Integer Primary Keys

The choice between UUID (16 bytes) and bigint (8 bytes) primary keys affects storage at scale. With 100 million rows, the primary key column alone uses 1.6 GB with UUIDs versus 0.8 GB with bigint. Each index that references the primary key also pays this cost. However, UUIDs enable distributed ID generation without coordination, which is essential for microservices. Consider using ordered UUIDs (UUIDv7) to maintain B-tree locality while keeping the distributed generation benefit.

Frequently Asked Questions

How does PostgreSQL store rows internally?

Each row has a 23-byte header for transaction visibility, followed by column data with alignment padding. Fixed types use constant space; variable types use a length prefix plus the actual data.

Why add 30% for index overhead?

B-tree indexes on primary keys and foreign keys typically add 20-40% to the raw table size. This 30% estimate accounts for a primary key index and one or two secondary indexes.

How much space does a UUID column use?

PostgreSQL stores UUIDs as 16 bytes in compact binary format. This is twice the size of a bigint (8 bytes) and adds up at scale across the column and all referencing indexes.

What is the difference between varchar and text storage?

In PostgreSQL, varchar and text have identical storage mechanics. Both use TOAST for values over 2KB. The only difference is that varchar(n) enforces a maximum length. For estimation, the average actual data length is what matters.

How accurate is this estimation?

This provides a reasonable estimate for planning. Actual sizes vary due to alignment padding, TOAST compression, fill factor, dead rows, and WAL. Add a 20-50% buffer for production planning.

Estimated Sizes

Column Type Reference

Embed This

PostgreSQL Storage Internals

Index Storage Considerations

TOAST and Large Values

Capacity Planning Best Practices

UUID vs Integer Primary Keys

Frequently Asked Questions

How does PostgreSQL store rows internally?

Why add 30% for index overhead?

How much space does a UUID column use?

What is the difference between varchar and text storage?

How accurate is this estimation?

Related Calculators

Byte Converter

Bandwidth Calculator

JSON Formatter

You Might Also Need

Bandwidth Calculator

Byte/Data Size Converter

API Response Time Calculator

Recommended Reading

How Much Should You Tip? A Complete Tipping Guide

GPA Calculator: How to Calculate Your Grade Point Average