Foreword#
Imagine if your database could branch the way Git branches code. Creating a new development environment would no longer require hours of data copying; it could be completed in seconds, just like creating a Git branch.
In traditional development processes, we are accustomed to version control for code: feature branches, hotfix branches, experimental branches... However, when it comes to databases, we (including my team, I apologize to everyone here 🙇) often fall into the dilemma of "a massive production database": developers either use small sample data (which lacks authenticity) or wait for a lengthy backup-and-restore process (which is inefficient).
🪄 Muggles, do you know what ZFS is?#
Ah, today's programmers have it too rough; they start with
- ext4
- xfs
- btrfs
Looking back at our ancient gods Solaris and BSD, the ancient gods whisper in your ear: Cthulhu🐙Cthulhu🐙Cthulhu
Have you thought about using ZFS?
ZFS (the name originally stood for Zettabyte File System) was developed by Sun Microsystems, which was later acquired by Oracle. Sun open-sourced it under the CDDL license as part of OpenSolaris, and it was then ported to FreeBSD, where it became popular; the open-source line continues today as OpenZFS. ZFS is not just a file system: it combines the file system and the volume manager in one layer, which changes how we think about storing and managing data.
Copy-on-Write, abbreviated as COW 🐄#
The core magic of ZFS lies in the Copy-on-Write (COW) technology. Traditional file systems directly overwrite existing data when you modify a file, while ZFS adopts a smarter approach:
- Original data is never modified: When you need to change data, ZFS writes the modified data to a new location.
- Snapshots are instantaneous: Creating a snapshot merely records a point in time without involving data copying.
- Cloning is almost free: New clones share the same data blocks as the original data, only diverging when modifications occur.
This is like the manifestation of parallel universe theory in computer storage: all "branches" exist in the same physical space, only diverging into independent entities when needed.
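To make the block-sharing idea concrete, here is a toy model in Python. It is only a sketch of the principle, not how ZFS is implemented: the `BlockStore` and `Volume` classes are hypothetical names invented for this illustration, and a real ZFS snapshot keeps a reference to the root of a block tree instead of copying a map.

```python
class BlockStore:
    """Toy block pool: blocks are written once and never modified."""

    def __init__(self):
        self.blocks = {}   # block id -> bytes
        self.next_id = 0

    def write_new(self, data: bytes) -> int:
        # Always allocate a fresh block; existing blocks are left untouched.
        block_id = self.next_id
        self.blocks[block_id] = data
        self.next_id += 1
        return block_id


class Volume:
    """Toy volume: a map from logical block number to a block id in the store."""

    def __init__(self, store: BlockStore, block_map=None):
        self.store = store
        self.block_map = dict(block_map or {})

    def write(self, lbn: int, data: bytes) -> None:
        # Copy-on-write: write to a new location, then repoint the map.
        self.block_map[lbn] = self.store.write_new(data)

    def read(self, lbn: int) -> bytes:
        return self.store.blocks[self.block_map[lbn]]

    def clone(self) -> "Volume":
        # Only the map (metadata) is copied; the data blocks are shared.
        return Volume(self.store, self.block_map)


store = BlockStore()
prod = Volume(store)
prod.write(0, b"users v1")
prod.write(1, b"orders v1")

branch = prod.clone()           # shares both blocks with prod
branch.write(1, b"orders v2")   # diverges only for block 1
```

After the last line, `prod` still reads its original data, `branch` sees its own version of block 1, and block 0 exists only once in the store.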
Guarantee of Data Consistency 🔗#
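Another killer feature of ZFS is that it is a transactional file system. Writes are grouped into transactions that either commit completely or not at all, and a snapshot is taken atomically at a single point in time. This is crucial for databases: as long as the data files and the write-ahead log live in the same dataset, a snapshot is crash-consistent, so the database can recover from it the same way it would after a power loss. You will never get a "half-baked" database state.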
Why do databases need "branching"?#
Pain points of traditional database management#
In traditional development environments, databases are often the biggest bottleneck:
- Resource contention: Multiple developers share the same test database, affecting each other.
- Data pollution: Data modifications during testing affect other tests.
- Environmental consistency: Data differences between production and development environments lead to bugs.
- Difficult recovery: Time-consuming data recovery is needed after destructive testing.
The value of branching databases#
Database branching addresses these fundamental issues:
- Isolation: Each developer has their own independent database instance, without interference.
- Authenticity: Development environments can be created from snapshots of production data, ensuring data authenticity.
- Recoverability: After any destructive operation, you recover by deleting the branch and creating a fresh one.
- Parallelism: Multiple features can be developed in parallel without blocking each other.
The Philosophy of Technical Architecture#
Wisdom Learned from Git#
The success of Git lies not only in its technical implementation but also in its design philosophy:
- Distributed: Each developer has a complete history.
- Lightweight branches: Creating a branch incurs almost no cost.
- Parallel development: Multiple features can be developed simultaneously.
- Version history: You can revert to any historical state.
We apply these concepts to database management:
Evolution of Storage Architecture#
Traditional database backup solutions:
ZFS-based branching solution:
Practical Application Scenarios#
Scenario 1: Parallelization of Feature Development#
Traditional method:
- Developer A tests new features in the test database.
- Developer B must wait for A to finish before starting testing.
- Test data is polluted by A's actions, requiring B to prepare data again.
Branching method:
- Developer A creates a feature-a branch database.
- Developer B creates a feature-b branch database.
- Both can develop in parallel without affecting each other.
- After testing is complete, branches are deleted directly.
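As a rough sketch of what the branching method looks like on the command line, the Python helper below only builds the `zfs` commands and prints them; it does not run anything. The dataset names `tank/dbdata` and `tank/branches` are assumptions made up for this example, and starting a separate database instance on each clone (with its own port) is left out.

```python
import shlex
from typing import List

# Hypothetical layout: the production data lives in tank/dbdata and
# branches are created under tank/branches (which must already exist).
SOURCE = "tank/dbdata"
BRANCH_PARENT = "tank/branches"


def create_branch(source: str, branch: str) -> List[List[str]]:
    """Commands to snapshot the source and clone the snapshot as a branch."""
    snapshot = f"{source}@{branch}"
    return [
        ["zfs", "snapshot", snapshot],
        ["zfs", "clone", snapshot, f"{BRANCH_PARENT}/{branch}"],
    ]


def delete_branch(source: str, branch: str) -> List[List[str]]:
    """Commands to drop a branch: the clone first, then its origin snapshot."""
    return [
        ["zfs", "destroy", f"{BRANCH_PARENT}/{branch}"],
        ["zfs", "destroy", f"{source}@{branch}"],
    ]


for name in ("feature-a", "feature-b"):
    for cmd in create_branch(SOURCE, name):
        print(shlex.join(cmd))
```

The clone is destroyed before the snapshot because a snapshot cannot be destroyed while a clone still depends on it.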
Scenario 2: Safety of Data Analysis#
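Data analysts often need to run complex queries on production data without affecting the production system. A branch cloned from a production snapshot gives them the full dataset to query, while the production instance itself is left untouched.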
Scenario 3: New Approaches to Disaster Recovery#
Traditional disaster recovery relies on backup restoration, which is time-consuming and uncertain. With ZFS snapshot technology, you can:
- Quick rollback: Immediately roll back to the last snapshot upon discovering an issue.
- Parallel recovery: Test recovery plans without affecting the current system.
- Incremental recovery: Gradually verify the data integrity at each point in time.
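The first two points can be sketched the same way. The Python helper below picks the most recent snapshot from a list and builds either a `zfs rollback` command or a `zfs clone` command for rehearsing a recovery on the side. The snapshot names and timestamps are invented for the example, and I am assuming the database is stopped before a rollback is actually run.

```python
from typing import List, Tuple


def latest_snapshot(snapshots: List[Tuple[str, int]]) -> str:
    """Return the name of the snapshot with the newest creation time.

    Each entry is (snapshot name, creation time in Unix seconds).
    """
    return max(snapshots, key=lambda s: s[1])[0]


def rollback_command(snapshot: str) -> List[str]:
    # Quick rollback: revert the dataset in place to the given snapshot.
    return ["zfs", "rollback", snapshot]


def recovery_test_command(snapshot: str, scratch_dataset: str) -> List[str]:
    # Parallel recovery: clone the snapshot and test the plan on the clone,
    # leaving the live dataset alone.
    return ["zfs", "clone", snapshot, scratch_dataset]


# Hypothetical snapshot listing for a dataset called tank/dbdata.
snaps = [
    ("tank/dbdata@hourly-01", 1_700_000_000),
    ("tank/dbdata@hourly-03", 1_700_007_200),
    ("tank/dbdata@hourly-02", 1_700_003_600),
]
target = latest_snapshot(snaps)
print(rollback_command(target))
print(recovery_test_command(target, "tank/recovery-test"))
```

Note that plain `zfs rollback` only goes back to the most recent snapshot; reaching an older one requires the `-r` flag, which destroys the snapshots taken after it.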
Performance Revolution#
Comparison of Time Costs#
Traditional solution:
- Creating test environment: 2-4 hours (depending on data volume)
- Restoring environment: 1-2 hours
- Cleaning environment: 30 minutes
ZFS branching solution:
- Creating test environment: 30 seconds
- Restoring environment: delete and rebuild, 30 seconds
- Cleaning environment: completed instantly
Optimization of Storage Costs#
With traditional full copies, creating 5 test environments from a 100GB production database requires 500GB of storage space.
Using the ZFS branching solution:
- Original data: 100GB
- Incremental data for 5 branches: usually less than 20GB
- Total storage requirement: about 120GB
Storage space saved: 76%
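The arithmetic behind that figure, following the accounting used above (full copies on one side, original plus incremental data on the other), can be written out as a small Python function. The function name and parameters are just for illustration; the 20GB of incremental data is the estimate from the list above, not a measurement.

```python
def storage_comparison(base_gb: float, branches: int, delta_total_gb: float):
    """Compare full copies against copy-on-write branches.

    base_gb:        size of the production database
    branches:       number of test environments
    delta_total_gb: total data changed across all branches
    """
    full_copies_gb = base_gb * branches    # every environment is a full copy
    cow_total_gb = base_gb + delta_total_gb  # shared base plus the changes
    saved_fraction = 1 - cow_total_gb / full_copies_gb
    return full_copies_gb, cow_total_gb, saved_fraction


full, cow, saved = storage_comparison(base_gb=100, branches=5, delta_total_gb=20)
print(f"full copies: {full} GB, ZFS branches: {cow} GB, saved: {saved:.0%}")
```

The saving grows with the number of branches and shrinks as branches modify more data, so the 76% applies only to this particular mix.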
Shift in Operations and Maintenance Philosophy#
From "Maintenance" to "Creation"#
Traditional operations and maintenance focus more on maintaining the stability of existing systems, while branching databases allow operations personnel to:
- Quickly experiment: New configuration optimizations can be safely tested in cloned environments.
- Version management: Database states now have version histories.
- Rollback capability: Any changes can be quickly rolled back.
Reconstruction of Development Processes#
Branching databases promote a deeper DevOps culture:
I think the future of the database industry should be#
Database branching is not just a technical improvement; it represents a fundamental shift in data management philosophy:
- Data as code: Database states can be versioned like code.
- Environment as a service: Creating a development environment is as simple as starting a service.
- Testing as a safe operation: Tests run against a branch pose no risk to production data.
This approach lowers the barrier to database management. Complex operations that only senior DBAs could perform in the past can now be easily accomplished by ordinary developers.