Declarative Version-Controlled Database Migrations: My Hackathon Onboarding Project

In my previous projects before joining Commit, the tech stack I used (including PostGraphile, which I highly recommend) relied on writing SQL and PL/pgSQL code to define the database, business and authorization layers of the application. This code was deployed onto PostgreSQL databases using `sqitch` and unit-tested with`pgTap.`

As the projects grew and features were added and refined, one pain point with our tooling was that the way our database code was written was not very readable: as with many database migration tools, one tends to add incremental changes to the code, often leading to situations where the reader has to open a series of files to understand the structure of a table. While `sqitch`, with its `rework` command allows developers to update existing code rather than adding incremental changes, it requires the code to be idempotent (i.e. it can be run any number of times and the outcome will be the same). Writing idempotent SQL code can be tricky and makes collaboration more difficult for less-experienced developers.

What I Made

For my Hackathon Onboarding Project, I explored the creation of a tool that aims to make this process easier. My project, PostGit, is a command-line tool that integrates existing PostgreSQL schema diffing tools (e.g. migra) with Git. It allows developers to focus on writing declarative schema migrations, i.e., to define the schema they want, rather than defining incremental migrations. Let’s follow the simple example of a todo list. In this application, you may want to define the following table.

```sql
create table todo (
	id int primary key generated as identity,
	task text,
	is_done bool not null default false
);
```

In the following release, you may want to add a feature that allows archiving existing todo records, by adding an `archived_at` column to the table. If you are writing classical database migrations, you may need to add a new file containing the following:

```sql
alter table todo add column archived_at timestamptz;
```

Instead, with PostGit, you can edit the original SQL file like so.

```sql
create table todo (
	id int primary key generated as identity,
	task text,
	is_done bool not null default false,
	archived_at timestamptz
);
```

You can then commit the changes, and the `postgit diff` command can calculate the migration for you:

```
$ postgit diff --from HEAD^1 --to HEAD schema
alter table "public"."todo" add column "archived_at" timestamp with time zone;
```

Here we simply ask `postgit` to calculate the difference between the sql files located in the `schema/` folder of our git repository, between the last two commits. When you want to deploy those changes to the production database, you can use the `postgit push` command, which takes the same arguments as `postgit diff`.

For a seamless local development experience, the `postgit watch` command will automatically deploy changes to your local database as you save your SQL files.

How It Works

Written in Rust

On top of allowing me to work on any project of my choosing, the Commit Hackathon Onboarding Project was a great opportunity to learn new technologies. Having started to learn Rust a few weeks before joining Commit, I decided that this would be my first Rust project.

Beyond the usual learning resources, what I found most useful was Blessed.rs, which provides guidance when it comes to choosing crates (i.e. Rust’s “packages”).

The PostGit code uses clap to define the CLI via a couple of simple structs and enum, gitoxide to retrieve files in the git repository, tokio_postgres to connect to the database, notify to watch files in the local directory, and serde and toml to parse the configuration file.

Clap Example

The struct defining the arguments for the `postgit diff` command (that’s *almost* all you need for a complete CLI)

```rust
#[derive(Args)]
pub struct DiffArgs {
   /// Path to the root of the git repository
   #[arg(long, short, default_value = ".")]
   pub repo_path: String,
  
   /// Git revision where the source schema can be found.
   /// This may be omitted for the first migration, when the database is empty
	#[arg(long, short)]
	pub from: Option<String>, 
	
	/// Git revision where the target schema can be found.
	#[arg(long, short)]
	pub to: String,
	
	/// Path to the source schema at the source ref, if different from the target path
	#[arg(long)]
	pub source_path: Option<String>,
	
	/// Path to the schema file or directory, relative to the repo root
	pub path: String,
}
```

The output generated by `clap` when running `postgit diff --help` :

```
Usage: postgit diff [OPTIONS] --to <TO> <PATH>

Arguments:
 <PATH>  Path to the schema file or directory, relative to the repo root

Options:
 -r, --repo-path <REPO_PATH>      Path to the root of the git repository [default: .]
 -f, --from <FROM>                Git revision where the source schema can be found. This may be omitted for the first migration, when the database is empty
 -t, --to <TO>                    Git revision where the target schema can be found
     --source-path <SOURCE_PATH>  Path to the source schema at the source ref, if different from the target path
 -h, --help                       Print help information
 -V, --version                    Print version information
```

Schema Diffing

As mentioned above, PostGit reuses existing schema diffing tools. `migra` is the default tool, but it can be configured. If you want to use Supabase’s diff CLI for pgAdmin4, you can configure the tool as follows:

```toml
[diff_engine]
command='docker run --network=host supabase/pgadmin-schema-diff $1 $2'
```

Ultimately, the accuracy and maturity of PostGit’s approach will depend on having a reliable and complete schema diffing tool.

This tooling should be combined with other guidelines found in the PostgreSQL ecosystem, including linting for migration safety with a tool such as squawk and best practices such as ensuring that tables/columns are deprecated and not used by any client before they are dropped.

What I Learned & Next Steps

In a couple of weeks, I have learned a lot about the Rust language and its ecosystem and I found many blog articles confirming that, while PostgreSQL is steadily gaining in popularity, PostgreSQL schema migrations are still a challenging topic.

I will continue improving PostGit to provide better error handling, a smooth user experience and perhaps add a GitHub action to help integrate it in existing projects. I am also considering writing tutorials to explain how PostGit can be used in a project, allowing potential users to be confident that they are using the tool as intended.