LemonHX

CEO of Limit-LAB. Likes tinkering with low-level code and intends to change the world.

Does cloud-native PostgreSQL🐘 dream of Git🐙 branches?

Foreword#

Imagine if your database could branch the way Git branches code. Creating a new development environment would no longer require hours of data copying; it could be completed in seconds, just like creating a Git branch.

In traditional development processes, we are accustomed to version control for code: feature branches, hotfix branches, experimental branches... When it comes to databases, however, we (including my team, I apologize to everyone here 🙇) often fall into the dilemma of "a massive production database": developers either use small sample data (which lacks authenticity) or wait for a lengthy backup-and-restore process (which is inefficient).

🪄 Muggles, do you know what ZFS is?#

Ah, today's programmers have it too rough; they start with

  • ext4
  • xfs
  • btrfs

Looking back at our ancient gods Solaris and BSD, the ancient gods whisper in your ear: Cthulhu🐙Cthulhu🐙Cthulhu

Have you thought about using ZFS?

ZFS (Zettabyte File System) was developed by Sun Microsystems, which was later acquired by Oracle. It was open-sourced under the CDDL license as part of OpenSolaris and popularized by FreeBSD. ZFS is not just a file system; it is a revolutionary storage solution that fundamentally changes our understanding of data storage and management.

Copy-on-Write, abbreviated as COW 🐄#

The core magic of ZFS lies in Copy-on-Write (COW). Traditional file systems overwrite existing data in place when you modify a file, while ZFS adopts a smarter approach:

  1. Original data is never overwritten in place: When you change data, ZFS writes the modified blocks to a new location.
  2. Snapshots are instantaneous: Creating a snapshot merely records a point in time without involving data copying.
  3. Cloning is almost free: New clones share the same data blocks as the original data, only diverging when modifications occur.

This is like the manifestation of parallel universe theory in computer storage: all "branches" exist in the same physical space, only diverging into independent entities when needed.
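To make the three points above concrete, here is a minimal toy model in Python. It is not how ZFS is implemented (ZFS shares a block-pointer tree rather than copying a table); `BlockStore` and `Dataset` are hypothetical names invented for this sketch. It only shows the idea that a snapshot freezes pointers, and that a clone allocates new blocks only for what it changes.

```python
from types import MappingProxyType


class BlockStore:
    """Toy block pool: blocks are appended and never overwritten."""

    def __init__(self):
        self._blocks = []

    def put(self, data: bytes) -> int:
        self._blocks.append(data)
        return len(self._blocks) - 1  # block id = position in the pool

    def get(self, block_id: int) -> bytes:
        return self._blocks[block_id]

    def block_count(self) -> int:
        return len(self._blocks)


class Dataset:
    """Maps logical offsets to block ids in a shared BlockStore."""

    def __init__(self, store, block_map=None):
        self.store = store
        self.block_map = dict(block_map) if block_map is not None else {}

    def write(self, offset: int, data: bytes) -> None:
        # Copy-on-write: put the data in a new block and repoint the offset.
        self.block_map[offset] = self.store.put(data)

    def read(self, offset: int) -> bytes:
        return self.store.get(self.block_map[offset])

    def snapshot(self):
        # Freeze the pointer table only; no data block is copied.
        return MappingProxyType(dict(self.block_map))

    @classmethod
    def clone(cls, store, snapshot):
        # A clone starts from the snapshot's pointers, so it shares every block.
        return cls(store, snapshot)


store = BlockStore()
prod = Dataset(store)
for i in range(4):
    prod.write(i, f"row-{i}".encode())

snap = prod.snapshot()
branch = Dataset.clone(store, snap)
branch.write(2, b"row-2-modified")   # only this write allocates a new block

print(store.block_count())           # 5: four shared blocks plus one diverged block
print(prod.read(2), branch.read(2))  # b'row-2' b'row-2-modified'
```

The branch and production still point at the same blocks for offsets 0, 1 and 3; only offset 2 has diverged.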

Guarantee of Data Consistency 🔗#

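Another killer feature of ZFS is that it is a transactional file system. Writes are committed in atomic transaction groups, so a change either lands completely or not at all, and a snapshot always captures one consistent on-disk state. This is crucial for databases: as long as the data directory and the WAL are captured in the same atomic snapshot, PostgreSQL can start from it as if recovering from a crash, so you will never get a "half-baked" database state.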

Why do databases need "branching"?#

Pain points of traditional database management#

In traditional development environments, databases are often the biggest bottleneck:

  1. Resource contention: Multiple developers share the same test database, affecting each other.
  2. Data pollution: Data modifications during testing affect other tests.
  3. Environment inconsistency: Data differences between production and development environments lead to bugs.
  4. Difficult recovery: Time-consuming data recovery is needed after destructive testing.

The value of branching databases#

Database branching addresses these fundamental issues:

Isolation: Each developer has their own independent database instance, without interference.

Authenticity: Development environments can be created based on snapshots of production data, ensuring data authenticity.

Recoverability: Any destructive operation can be quickly undone by deleting the branch and creating a fresh one from the snapshot.

Parallelism: Multiple feature developments can proceed in parallel without blocking each other.

The Philosophy of Technical Architecture#

Wisdom Learned from Git#

The success of Git lies not only in its technical implementation but also in its design philosophy:

  • Distributed: Each developer has a complete history.
  • Lightweight branches: Creating a branch incurs almost no cost.
  • Parallel development: Multiple features can be developed simultaneously.
  • Version history: You can revert to any historical state.

We apply these concepts to database management: a ZFS snapshot plays the role of a commit, and a clone of that snapshot plays the role of a lightweight branch.

Evolution of Storage Architecture#

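Traditional database backup solutions: every new environment is a full dump and restore of the production data, so time and disk usage grow with each copy.

ZFS-based branching solution: production is snapshotted once, and every environment is a clone of that snapshot that stores only its own changes.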

Practical Application Scenarios#

Scenario 1: Parallelization of Feature Development#

Traditional method:

  • Developer A tests new features in the test database.
  • Developer B must wait for A to finish before starting testing.
  • Test data is polluted by A's actions, requiring B to prepare data again.

Branching method:

  • Developer A creates a feature-a branch database.
  • Developer B creates a feature-b branch database.
  • Both can develop in parallel without affecting each other.
  • After testing is complete, branches are deleted directly.
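The branching method above could be sketched with the standard `zfs snapshot`, `zfs clone` and `zfs destroy` subcommands. This is a rough sketch, not my team's actual tooling: the dataset names `tank/pgdata` and `tank/branches` are assumptions made up for illustration, and the code only builds the command lines instead of running them. Starting a PostgreSQL instance on each clone's mountpoint (on its own port) is left out.

```python
import shlex

PROD_DATASET = "tank/pgdata"    # hypothetical dataset holding the PostgreSQL data directory
BRANCH_ROOT = "tank/branches"   # hypothetical parent dataset for all branch clones


def snapshot_cmd(dataset: str, snap_name: str) -> list[str]:
    # Record a point in time; no data is copied.
    return ["zfs", "snapshot", f"{dataset}@{snap_name}"]


def create_branch_cmd(dataset: str, snap_name: str, branch: str) -> list[str]:
    # -p creates missing parent datasets (here BRANCH_ROOT) if needed.
    return ["zfs", "clone", "-p", f"{dataset}@{snap_name}", f"{BRANCH_ROOT}/{branch}"]


def delete_branch_cmd(branch: str) -> list[str]:
    # Deleting the clone is how a branch is thrown away after testing.
    return ["zfs", "destroy", f"{BRANCH_ROOT}/{branch}"]


def plan_scenario(snap_name: str, branches: list[str]) -> list[list[str]]:
    # One snapshot of production, then one clone per developer branch.
    plan = [snapshot_cmd(PROD_DATASET, snap_name)]
    plan += [create_branch_cmd(PROD_DATASET, snap_name, b) for b in branches]
    return plan


for argv in plan_scenario("base", ["feature-a", "feature-b"]):
    print(shlex.join(argv))
```

Each list can be handed to `subprocess.run(argv, check=True)` on a host that actually has ZFS. Note that a snapshot cannot be destroyed while clones still depend on it, so branches are deleted before the base snapshot.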

Scenario 2: Safety of Data Analysis#

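Data analysts often need to perform complex queries on production data without affecting the production system. With branching, they can run those queries against a clone of a production snapshot instead of the production instance itself.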

Scenario 3: New Approaches to Disaster Recovery#

Traditional disaster recovery relies on backup restoration, which is time-consuming and uncertain. With ZFS snapshot technology, you can:

  1. Quick rollback: Immediately roll back to the last snapshot upon discovering an issue.
  2. Parallel recovery: Test recovery plans without affecting the current system.
  3. Incremental recovery: Gradually verify the data integrity at each point in time.
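As a hedged sketch of the first two points, the following Python builds the commands for a quick rollback and for rehearsing a recovery on a scratch clone. The dataset `tank/pgdata`, the data directory `/tank/pgdata` and the scratch name `tank/recovery-test` are assumptions for illustration only. `zfs rollback` without extra flags only goes back to the most recent snapshot, which matches the "last snapshot" wording above.

```python
def rollback_plan(dataset: str, snap_name: str, data_dir: str) -> list[list[str]]:
    """Stop PostgreSQL, roll the dataset back to its latest snapshot, start again."""
    return [
        ["pg_ctl", "stop", "-D", data_dir, "-m", "fast"],
        # Plain rollback works only if snap_name is the most recent snapshot;
        # going further back needs -r, which destroys the newer snapshots.
        ["zfs", "rollback", f"{dataset}@{snap_name}"],
        ["pg_ctl", "start", "-D", data_dir],
    ]


def rehearsal_cmd(dataset: str, snap_name: str, scratch: str) -> list[str]:
    """Clone a snapshot so a recovery plan can be tested away from the live system."""
    return ["zfs", "clone", f"{dataset}@{snap_name}", scratch]


# Hypothetical names, used only to show the shape of the plan.
for argv in rollback_plan("tank/pgdata", "before-migration", "/tank/pgdata"):
    print(" ".join(argv))
print(" ".join(rehearsal_cmd("tank/pgdata", "before-migration", "tank/recovery-test")))
```

The rehearsal clone leaves the live dataset untouched, so a recovery procedure can be tried there first and the clone destroyed afterwards.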

Performance Revolution#

Comparison of Time Costs#

Traditional solution:

  • Creating test environment: 2-4 hours (depending on data volume)
  • Restoring environment: 1-2 hours
  • Cleaning environment: 30 minutes

ZFS branching solution:

  • Creating test environment: 30 seconds
  • Restoring environment: delete and rebuild, 30 seconds
  • Cleaning environment: completed instantly

Optimization of Storage Costs#

With traditional solutions, creating 5 test environments from a 100GB production database requires 500GB of storage space, one full copy per environment.

Using the ZFS branching solution:

  • Original data: 100GB
  • Incremental data for 5 branches: usually less than 20GB
  • Total storage requirement: about 120GB

Storage space saved: about 76% (120GB instead of 500GB)
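The arithmetic behind that figure, as a small sketch. The function name and parameters are made up for illustration, and the 20GB of incremental data is the estimate from the list above, not a measurement.

```python
def storage_comparison(prod_gb: float, environments: int, delta_gb_total: float):
    """Compare full copies per environment with clones that share the original blocks."""
    traditional_gb = prod_gb * environments   # one full copy per test environment
    branched_gb = prod_gb + delta_gb_total    # original data plus all branch deltas
    saved = 1 - branched_gb / traditional_gb  # fraction of storage saved
    return traditional_gb, branched_gb, saved


traditional_gb, branched_gb, saved = storage_comparison(100, 5, 20)
print(f"{traditional_gb} GB vs {branched_gb} GB, saved {saved:.0%}")
# prints: 500 GB vs 120 GB, saved 76%
```

The saving depends entirely on how much each branch diverges: the more data a branch rewrites, the closer it gets to the cost of a full copy.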

Shift in Operations and Maintenance Philosophy#

From "Maintenance" to "Creation"#

Traditional operations and maintenance focuses on keeping existing systems stable, while branching databases allow operations personnel to:

  1. Quickly experiment: New configuration optimizations can be safely tested in cloned environments.
  2. Version management: Database states now have version histories.
  3. Rollback capability: Any changes can be quickly rolled back.
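One way to read the "version history" point: the snapshots of a dataset, listed in creation order, are the history of the database state. Below is a minimal sketch that builds the `zfs list` command and parses its output. The dataset name and the sample output are assumptions invented for illustration, not real logs.

```python
def list_snapshots_cmd(dataset: str) -> list[str]:
    # -H: no header, tab-separated; -o name: only the name column;
    # -t snapshot: snapshots only; -s creation: oldest first; -r: include descendants.
    return ["zfs", "list", "-H", "-o", "name", "-t", "snapshot",
            "-s", "creation", "-r", dataset]


def parse_history(output: str, dataset: str) -> list[str]:
    """Return the snapshot names of exactly `dataset`, in the order listed."""
    prefix = dataset + "@"
    return [line[len(prefix):] for line in output.splitlines()
            if line.startswith(prefix)]


# Hypothetical output, shaped like what the command above would print.
sample_output = (
    "tank/pgdata@monday\n"
    "tank/pgdata@tuesday\n"
    "tank/pgdata/child@other\n"
)
print(parse_history(sample_output, "tank/pgdata"))  # ['monday', 'tuesday']
```

Snapshots of child datasets are filtered out because `-r` lists them too.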

Reconstruction of Development Processes#

Branching databases promote a deeper DevOps culture: each code branch can get its own database branch, created when work starts and deleted when testing is complete.

What I think the future of the database industry should be#

Database branching is not just a technical improvement; it represents a fundamental shift in data management philosophy:

  1. Data as code: Database states can be versioned like code.
  2. Environment as a service: Creating a development environment is as simple as starting a service.
  3. Safe testing: Tests run on branches pose no risk to production data.

This technology lowers the barrier to database management. Complex operations that only senior DBAs could perform in the past can now be easily accomplished by ordinary developers.
