Does cloud-native PostgreSQL🐘 dream of Git🐙 branches?

Foreword#

Imagine if your database could branch the way Git branches code. Creating a new development environment would no longer take hours of data copying; it would finish in seconds, just like creating a Git branch.

In traditional development workflows, we are used to version control for code: feature branches, hotfix branches, experimental branches... When it comes to databases, however, we (including my team, I apologize to everyone here 🙇) often fall into the dilemma of "one massive production database".

Developers either use small sample data (lacking authenticity) or wait for a lengthy backup-and-restore process (inefficient).

🪄 Muggles, do you know what ZFS is?#

Ah, today's programmers have it too rough; they start with

ext4
xfs
btrfs

Looking back at our ancient gods Solaris and BSD, the ancient gods whisper in your ear: ~~Cthulhu🐙Cthulhu🐙Cthulhu~~

Have you thought about using ZFS?

ZFS (originally short for Zettabyte File System) was developed by Sun Microsystems, which Oracle later acquired. Sun released it as open source under the CDDL license as part of OpenSolaris, and it was later popularized by FreeBSD. ZFS is not just a file system; it is a revolutionary storage solution that combines a file system with a volume manager and fundamentally changes our understanding of data storage and management.

Copy-on-Write, abbreviated as COW 🐄#

The core magic of ZFS lies in the Copy-on-Write (COW) technology. Traditional file systems directly overwrite existing data when you modify a file, while ZFS adopts a smarter approach:

Original data is never overwritten in place: When you change data, ZFS writes the new version to a new location and leaves the old blocks intact.
Snapshots are instantaneous: Creating a snapshot merely records a point in time without involving data copying.
Cloning is almost free: New clones share the same data blocks as the original data, only diverging when modifications occur.
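The three properties above can be illustrated with a toy model. This is only a sketch of the copy-on-write idea, not how ZFS is implemented: the `CowVolume` class, its block map, and the shared `store` dictionary are all hypothetical names invented for the illustration.

```python
from __future__ import annotations


class CowVolume:
    """Toy copy-on-write volume: a block map pointing into a shared block store."""

    def __init__(self, store: dict[int, bytes], block_map: dict[int, int] | None = None):
        self.store = store                      # physical blocks, shared by all volumes
        self.block_map = dict(block_map or {})  # logical block number -> physical block id

    def write(self, logical: int, data: bytes) -> None:
        # Never overwrite in place: allocate a new physical block and repoint the map.
        new_id = len(self.store)
        self.store[new_id] = data
        self.block_map[logical] = new_id

    def read(self, logical: int) -> bytes:
        return self.store[self.block_map[logical]]

    def snapshot(self) -> dict[int, int]:
        # A snapshot is only a frozen copy of the block map; no data is copied.
        return dict(self.block_map)

    @classmethod
    def clone(cls, store: dict[int, bytes], snap: dict[int, int]) -> CowVolume:
        # A clone starts from the snapshot's map and shares every block with it.
        return cls(store, snap)


store: dict[int, bytes] = {}
prod = CowVolume(store)
prod.write(0, b"users v1")
prod.write(1, b"orders v1")

snap = prod.snapshot()
branch = CowVolume.clone(store, snap)
branch.write(1, b"orders v2 (experiment)")

print(prod.read(1))    # production still sees its own data
print(branch.read(1))  # the branch sees its modification
print(len(store))      # 3 physical blocks: two shared originals plus one diverged block
```

Only the block that the branch modified takes extra space; everything else is shared, which is why a clone costs almost nothing until it diverges.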

This is like the manifestation of parallel universe theory in computer storage: all "branches" exist in the same physical space, only diverging into independent entities when needed.

Guarantee of Data Consistency 🔗#

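Another killer feature of ZFS is that it is a transactional file system. On-disk state changes atomically: an update either completely succeeds or completely fails. This is crucial for databases because it makes snapshots consistent: a snapshot captures the whole dataset at a single instant, so you will never get a "half-baked" database state. For PostgreSQL this means that, as long as the data directory and the WAL are captured in the same snapshot, the copy looks like the state after a sudden crash, and the server can recover from it by replaying the WAL on startup.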

Why do databases need "branching"?#

Pain points of traditional database management#

In traditional development environments, databases are often the biggest bottleneck:

Resource contention: Multiple developers share the same test database, affecting each other.
Data pollution: Data modifications during testing affect other tests.
Environment inconsistency: Data differences between production and development environments lead to bugs.
Difficult recovery: Time-consuming data recovery is needed after destructive testing.

The value of branching databases#

Database branching addresses these fundamental issues:

Isolation: Each developer has their own independent database instance, without interference.

Authenticity: Development environments can be created based on snapshots of production data, ensuring data authenticity.

Recoverability: Recovering from any destructive operation is as quick as deleting the branch and creating a fresh one.

Parallelism: Multiple feature developments can proceed in parallel without blocking each other.

The Philosophy of Technical Architecture#

Wisdom Learned from Git#

The success of Git lies not only in its technical implementation but also in its design philosophy:

Distributed: Each developer has a complete history.
Lightweight branches: Creating a branch incurs almost no cost.
Parallel development: Multiple features can be developed simultaneously.
Version history: You can revert to any historical state.

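We apply these concepts to database management: roughly speaking, a snapshot plays the role of a commit, a clone plays the role of a branch, and destroying a clone corresponds to deleting a branch.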
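One way to picture how the Git concepts carry over is a small translation table from Git operations to `zfs` subcommands. The sketch below only builds command lines and prints them; it does not run anything. The dataset name `tank/pgdata` and the `<base>-<branch>` naming scheme are assumptions made up for the example, and the Git analogies are loose rather than exact.

```python
from __future__ import annotations

import shlex

# Hypothetical dataset that holds the PostgreSQL data directory.
BASE = "tank/pgdata"


def create_branch(name: str, base: str = BASE) -> list[list[str]]:
    """Rough analogue of `git branch <name>`: snapshot the base, then clone it."""
    snapshot = f"{base}@{name}"
    return [
        ["zfs", "snapshot", snapshot],
        ["zfs", "clone", snapshot, f"{base}-{name}"],
    ]


def delete_branch(name: str, base: str = BASE) -> list[list[str]]:
    """Rough analogue of `git branch -D <name>`: drop the clone, then its snapshot."""
    return [
        ["zfs", "destroy", f"{base}-{name}"],
        ["zfs", "destroy", f"{base}@{name}"],
    ]


def reset_hard(snapshot_name: str, base: str = BASE) -> list[list[str]]:
    """Rough analogue of `git reset --hard`: roll the dataset back to a snapshot.

    Without extra flags, `zfs rollback` only accepts the most recent snapshot.
    """
    return [["zfs", "rollback", f"{base}@{snapshot_name}"]]


if __name__ == "__main__":
    for command in create_branch("feature-a") + delete_branch("feature-a"):
        print(shlex.join(command))  # print only; nothing is executed
```

The clone has to be destroyed before its origin snapshot, because a snapshot that still has dependent clones cannot be removed on its own.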

Evolution of Storage Architecture#

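Traditional database backup solutions: every environment gets its own full physical copy of the data, produced by a backup followed by a restore.

ZFS-based branching solution: a single snapshot of the production dataset is taken, and every environment is a clone of that snapshot that shares all unchanged blocks with it.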

Practical Application Scenarios#

Scenario 1: Parallelization of Feature Development#

Traditional method:

Developer A tests new features in the test database.
Developer B must wait for A to finish before starting testing.
Test data is polluted by A's actions, requiring B to prepare data again.

Branching method:

Developer A creates a feature-a branch database.
Developer B creates a feature-b branch database.
Both can develop in parallel without affecting each other.
After testing is complete, branches are deleted directly.
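The branching method can be sketched as a small registry that hands each developer a separate clone and a separate PostgreSQL port. This is a dry-run illustration that only returns command lines; `BranchRegistry`, the `tank/branches` parent dataset, the `/branches/<name>` mount points, and the starting port 5433 are all hypothetical choices, not part of any real tool. It also assumes the parent dataset already exists and leaves out details such as a `postmaster.pid` file carried over from a running source cluster, which may need to be removed before the clone will start.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    name: str
    dataset: str
    mountpoint: str
    port: int


class BranchRegistry:
    """Hands out one clone and one port per branch (commands are returned, not run)."""

    def __init__(self, snapshot, parent="tank/branches", first_port=5433):
        self.snapshot = snapshot      # e.g. "tank/pgdata@base"
        self.parent = parent          # assumed to exist already
        self.next_port = first_port
        self.branches = {}

    def create(self, name):
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        branch = Branch(name, f"{self.parent}/{name}", f"/branches/{name}", self.next_port)
        self.next_port += 1
        self.branches[name] = branch
        commands = [
            ["zfs", "clone", "-o", f"mountpoint={branch.mountpoint}",
             self.snapshot, branch.dataset],
            # Start a separate PostgreSQL instance on the clone, on its own port.
            ["pg_ctl", "start", "-D", branch.mountpoint, "-o", f"-p {branch.port}"],
        ]
        return branch, commands

    def delete(self, name):
        branch = self.branches.pop(name)
        return [
            ["pg_ctl", "stop", "-D", branch.mountpoint],
            ["zfs", "destroy", branch.dataset],
        ]


if __name__ == "__main__":
    registry = BranchRegistry("tank/pgdata@base")
    for developer_branch in ("feature-a", "feature-b"):
        branch, commands = registry.create(developer_branch)
        print(branch.name, "-> port", branch.port)
        for command in commands:
            print("  ", " ".join(command))
```

Because each branch has its own dataset and port, neither developer can pollute the other's data, and deleting a branch leaves the other untouched.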

Scenario 2: Safety of Data Analysis#

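Data analysts often need to perform complex queries on production data without affecting the production system. With branching, they can work on a clone of a recent production snapshot, served by its own PostgreSQL instance, instead of querying the production instance directly.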

Scenario 3: New Approaches to Disaster Recovery#

Traditional disaster recovery relies on backup restoration, which is time-consuming and uncertain. With ZFS snapshot technology, you can:

Quick rollback: Immediately roll back to the last snapshot upon discovering an issue.
Parallel recovery: Test recovery plans without affecting the current system.
Incremental recovery: Step through the snapshots one at a time, verifying data integrity at each point in time.
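For the quick-rollback case, the main decision is which snapshot to return to. The sketch below picks the newest snapshot taken before an incident and builds the matching `zfs rollback` command without running it. The snapshot names, timestamps, and the `tank/pgdata` dataset are invented for the example, and it assumes PostgreSQL has been stopped before the rollback is actually executed.

```python
from __future__ import annotations

from datetime import datetime


def pick_rollback_target(snapshots: dict[str, datetime], incident: datetime) -> str:
    """Return the name of the newest snapshot taken strictly before the incident."""
    earlier = {name: taken for name, taken in snapshots.items() if taken < incident}
    if not earlier:
        raise LookupError("no snapshot predates the incident")
    return max(earlier, key=earlier.get)


def rollback_command(dataset: str, snapshot: str) -> list[str]:
    # -r also destroys any snapshots newer than the target, so this is destructive.
    return ["zfs", "rollback", "-r", f"{dataset}@{snapshot}"]


# Hypothetical hourly snapshots and incident time.
snapshots = {
    "hourly-0900": datetime(2024, 1, 1, 9, 0),
    "hourly-1000": datetime(2024, 1, 1, 10, 0),
    "hourly-1100": datetime(2024, 1, 1, 11, 0),
}
incident = datetime(2024, 1, 1, 10, 40)

target = pick_rollback_target(snapshots, incident)
print(rollback_command("tank/pgdata", target))
```

To rehearse a recovery plan without touching the live dataset, the same target snapshot can be cloned instead of rolled back.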

Performance Revolution#

Comparison of Time Costs#

Traditional solution:

Creating test environment: 2-4 hours (depending on data volume)
Restoring environment: 1-2 hours
Cleaning environment: 30 minutes

ZFS branching solution:

Creating test environment: 30 seconds
Restoring environment: delete and rebuild, 30 seconds
Cleaning environment: completed instantly
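Adding up the figures above for one full create, restore, and clean cycle gives a feel for the gap. The numbers are the ones quoted in this section; treating the "instant" cleanup as 0 seconds is a simplifying assumption.

```python
HOUR = 3600

# Figures quoted above, in seconds: (create, restore, clean).
traditional_best = (2 * HOUR, 1 * HOUR, 30 * 60)
traditional_worst = (4 * HOUR, 2 * HOUR, 30 * 60)
zfs_branching = (30, 30, 0)  # "instant" cleanup approximated as 0 seconds


def cycle_seconds(steps):
    """Total time for one create + restore + clean cycle."""
    return sum(steps)


best_ratio = cycle_seconds(traditional_best) / cycle_seconds(zfs_branching)
worst_ratio = cycle_seconds(traditional_worst) / cycle_seconds(zfs_branching)
print(f"{best_ratio:.0f}x to {worst_ratio:.0f}x faster per full cycle")
# prints "210x to 390x faster per full cycle"
```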

Optimization of Storage Costs#

With traditional solutions, creating 5 test environments from a 100GB production database requires 500GB of storage space, one full copy per environment.

Using the ZFS branching solution:

Original data: 100GB
Incremental data for 5 branches: usually less than 20GB
Total storage requirement: about 120GB

Storage space saved: 76% (about 120GB instead of 500GB)
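The 76% figure follows directly from the numbers above, as this short calculation shows. The 20GB delta is the upper estimate quoted for all five branches combined; real savings depend on how much each branch actually changes.

```python
production_gb = 100
environments = 5
branch_delta_gb = 20  # combined changes across all five branches, per the estimate above

full_copies_gb = production_gb * environments   # one full copy per environment
zfs_total_gb = production_gb + branch_delta_gb  # shared original plus diverged blocks
saved = (full_copies_gb - zfs_total_gb) / full_copies_gb

print(f"{full_copies_gb} GB vs {zfs_total_gb} GB, saved {saved:.0%}")
# prints "500 GB vs 120 GB, saved 76%"
```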

Shift in Operations and Maintenance Philosophy#

From "Maintenance" to "Creation"#

Traditional operations and maintenance focus more on maintaining the stability of existing systems, while branching databases allow operations personnel to:

Quickly experiment: New configuration optimizations can be safely tested in cloned environments.
Version management: Database states now have version histories.
Rollback capability: Any changes can be quickly rolled back.

Reconstruction of Development Processes#

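Branching databases promote a deeper DevOps culture: each change can be tested against its own short-lived branch database, which is created before the tests run and destroyed afterwards.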
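In a CI pipeline, that create-then-destroy lifecycle maps naturally onto a context manager, so the branch is cleaned up even when the tests fail. The sketch below is an assumption-laden illustration: `ephemeral_branch`, the `run` callback, and the `tank/pgdata` dataset are hypothetical, and the example passes `print` as `run` so that commands are shown rather than executed.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_branch(name, run, base="tank/pgdata"):
    """Create a throwaway branch for one CI job and always clean it up.

    `run` is any callable that takes a command as a list of strings.
    """
    snapshot = f"{base}@{name}"
    clone = f"{base}-{name}"
    run(["zfs", "snapshot", snapshot])
    try:
        run(["zfs", "clone", snapshot, clone])
        try:
            yield clone  # migrations and tests run against this dataset
        finally:
            run(["zfs", "destroy", clone])
    finally:
        # The snapshot is removed last, after the clone that depends on it.
        run(["zfs", "destroy", snapshot])


if __name__ == "__main__":
    with ephemeral_branch("ci-42", run=print) as dataset:
        print(f"run migrations and tests against {dataset}")
```

To execute the commands for real, `run` could be something like `lambda cmd: subprocess.run(cmd, check=True)`, given sufficient privileges on the pool.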

What I think the future of the database industry should be#

Database branching is not just a technical improvement; it represents a fundamental shift in data management philosophy:

Data as code: Database states can be versioned like code.
Environment as a service: Creating a development environment is as simple as starting a service.
Testing as a safe activity: Tests run against a branch and pose no risk to production data.

This approach lowers the barrier to database management. Complex operations that once only senior DBAs could perform can now be handled easily by ordinary developers.