TL;DR: Once you prepare the connection, pg_dump is your friend.
Recently, we had to migrate a large-ish DB (1.5 TB) from a Windows installation to a Linux one in order to accommodate a series of third-party extensions. We split the process into four steps:
- Prepare the migration
- Migrate users
- Migrate the database
- Wrap up
Let’s say we have the two machines:
- SourceMachine - the machine containing the source DB (the Windows machine in my case)
- DestinationMachine - the destination machine
Note: All PostgreSQL commands are symmetrical in the sense that you can run them either on the source machine or on the destination machine. However, IMHO it’s easier to run them on the Linux machine (pipes, ssh and su readily available). Since the destination machine was Linux, we chose to execute the migration commands on it.
To prepare the migration, we need to give one machine access to the other’s resources. Remote access is managed by pg_hba.conf, so we need to edit it:
Log in on the source machine
Go to your pg_hba.conf location. On Windows installations, it is under the directory where you installed PostgreSQL (e.g.
C:\Program Files\PostgreSQL). On Linux, it’s under /etc (e.g. /etc/postgresql/&lt;version&gt;/main/ on Debian-based systems).
Note: Make sure you edit the right one in case you have multiple versions installed :)
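If you’re not sure which copy of pg_hba.conf a running server is actually reading, you can ask the server itself (this assumes you can connect as the postgres superuser):

```shell
# Ask the running server which pg_hba.conf it uses
psql -U postgres -c "SHOW hba_file;"
```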
Add the following to pg_hba.conf:
```
# First we authorize DestinationMachine to access SourceMachine
# by adding:
host    all    all    <DestinationMachine IP address>/24    md5
```
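Changes to pg_hba.conf are not picked up automatically. After editing the file on the SourceMachine, reload the configuration (one way, assuming a superuser connection, is the built-in pg_reload_conf function):

```shell
# Reload the server configuration without restarting PostgreSQL
psql -U postgres -c "SELECT pg_reload_conf();"
```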
We need access not only to the database we need to migrate, but also to other databases, e.g. for user migration.
We need to take note of installation specifics e.g. custom ports on the source/destination machines.
Now we are ready to connect from the destination machine.
Since we only have one DB, we can assume we want to migrate all users. This makes life easier :)
On the DestinationMachine we run:
```shell
# run the command as "postgres" user
su postgres
# copy the users, groups...
pg_dumpall -g -h <SourceMachine> -U postgres | psql
```
The command does the following:
- On the SourceMachine, get the global bits (users, groups…) using the postgres user.
- Instead of printing the result to STDOUT, pipe it to the psql command, which executes the dump on the DestinationMachine.
If the command ends successfully, then the destination machine will have the same users as the source.
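One way to sanity-check this (a sketch, using the same placeholder host names as above) is to list the roles on both machines and compare:

```shell
# List roles on the source...
psql -h <SourceMachine> -U postgres -c "\du"
# ...and on the destination; the two lists should match
psql -U postgres -c "\du"
```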
This is the meat of the problem and it’s solved by a simple command executed on DestinationMachine:
```shell
# run the command as "postgres" user
su postgres
# copy the DB
pg_dump -C -h <SourceMachine> -U postgres <DB_NAME> | psql
```
What it does is the following:
- Dump the database from the SourceMachine using the *postgres* user, adding the CREATE DATABASE command at the beginning of the dump (the -C flag)
- Pipe the output…
- … to psql command on the DestinationMachine (local)
I’d advise you to create a database on the DestinationMachine before executing the above command (using the CREATE DATABASE SQL command).
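Once the dump finishes, a quick sanity check is to compare the database sizes on the two machines. They won’t match byte for byte, but they should be in the same ballpark:

```shell
# Size of the database on the source...
psql -h <SourceMachine> -U postgres -c "SELECT pg_size_pretty(pg_database_size('<DB_NAME>'));"
# ...and on the destination
psql -U postgres -c "SELECT pg_size_pretty(pg_database_size('<DB_NAME>'));"
```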
Note: This command may time out while waiting for locks on tables on the SourceMachine. If it does, you can:
- Change the timeout period using:

```shell
PGOPTIONS="-c statement_timeout=0" pg_dump ...
```

as suggested here, or
- Re-run the command above without any changes and hope for the best :)

You should clean the DestinationMachine before re-running the command, though…
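Cleaning up before a retry can be as simple as dropping the partially restored database (destructive, so double-check the name first):

```shell
# Remove the partially restored database on the DestinationMachine
# DANGER: destructive; verify <DB_NAME> before running
psql -U postgres -c "DROP DATABASE IF EXISTS <DB_NAME>;"
```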
Now that all the data has been transferred, we need to do a couple of configuration tasks:
Remove the extra line we added to pg_hba.conf at the beginning; we don’t need that connection anymore.
Replicate the other customisations from the SourceMachine’s pg_hba.conf to the DestinationMachine’s pg_hba.conf. For example, we had explicit access to the database for some groups:
```
host    database    +db_group_ro    0.0.0.0/0    md5
host    database    +db_group_rw    0.0.0.0/0    md5
```
Perform whatever extra configuration is needed on the DestinationMachine to finish (e.g. firewall, physical users…)
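One more wrap-up step worth a mention: pg_dump does not carry over the planner statistics, so it’s a good idea to regenerate them on the DestinationMachine before putting it into service:

```shell
# Rebuild planner statistics for the migrated database
vacuumdb --analyze-only -U postgres -d <DB_NAME>
```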
Now you can use the DestinationMachine with the same privileges. If needed, you can retire the SourceMachine and give the DestinationMachine the same name/IP address, making the switch transparent to ordinary users.