Skip to main content

VACUUM

What is VACUUM?

Vacuuming is the database's garbage collection for row versions — it removes dead tuples, reclaims space, and prevents internal ID wraparound. In PostgreSQL, it's essential housekeeping, usually handled automatically by autovacuum, but DBAs sometimes run it manually for large/busy tables.

Why vacuuming is needed

Databases that use MVCC (multi-version concurrency control) don't overwrite rows in place when you update or delete. Instead:

  • UPDATE: inserts a new row version (a tuple) and marks the old one as obsolete.
  • DELETE: just marks the row as dead, but keeps it until no transaction can still see it.

This means tables accumulate dead tuples (old versions of rows). If nothing removes them:

  • Tables and indexes bloat (wasted space, slower scans).
  • Queries may see outdated data if old tuples stick around incorrectly.
  • Transaction IDs can wrap around (in Postgres), which risks data corruption.

So vacuuming is the process of cleaning up old/dead tuples and maintaining system metadata.

VACUUM in different databases

  • PostgreSQL: VACUUM does several things: reclaims space, maintains visibility maps, advances relfrozenxid.
  • MySQL (InnoDB): doesn't have explicit VACUUM, but background purge threads clean up old row versions and undo logs.
  • Oracle: similar concept under the hood, but automated by undo/redo management; DBAs rarely "vacuum."
  • SQLite: has a VACUUM command, but it's different: it rebuilds the entire database file to reclaim free space and defragment it.

Key trade-offs

  • Regular vacuuming keeps tables healthy but consumes I/O and CPU.
  • Too little vacuuming → bloat, wraparound risks.
  • Too aggressive vacuuming → contention with queries (though Postgres is pretty good at minimizing this with autovacuum).