Distributed Version Control with svk
I started to use Subversion one year ago and liked the elegant file-system design a lot. Soon it became impossible for me to go back to CVS. This means that I felt uncomfortable whenever I was working on projects using CVS, and I wanted to see a tool to keep my Subversion repository in sync with a CVS repository. This would not only save me time importing snapshots into vendor branches, but it would also give me the whole history when I’m not online.
I found Barrie’s VCP and wrote a Subversion driver. Then I understood why people said Subversion was slow. My driver invoked the svn
command, and it took something like 30 hours to convert from a CVS repository that resulted in 3000 revisions in the Subversion repository.
Fortunately the Subversion developers made the code easy and ready for wrapping into different languages using SWIG. At that time, only Python bindings were implemented, so I had to do the Perl bindings myself. With the Perl bindings implemented rapidly, VCP gets much faster, and I also started writing SVN::Mirror
, a module that enables mirroring between Subversion repositories. When I felt bored, I would add Subversion back-end support to tools like Bloxsom and Kwiki.
Then the season for traveling came. As I’m far more productive and creative while disconnected from the Internet, I realized I need a distributed version control system, and decided to give myself a year break to develop such a tool to enable me to be even more productive in the future. svk was born soon after my birthday in September 2003.
Why?
There are other distributed version control systems available: Arch, monotone
, darcs
. The functionalities they offer are more or less equivalent.
svk
, however, is written in Perl, and so might be more hackable by a large community. svk
also has a set of commands similar to those of cvs
. On top of this, svk
plans to implement transparent interpolation between different version-control systems.
As I don’t see any strong argument suggesting one system over another, it’s really up to you to try and decide.
Design Decisions
Subversion has a layered design:
fs
: Underlying versioned tree library usingbdb
.repos
: Higher-level support for thefs
, like log messages.ra
: Repository access. Abstracted protocol handlers.wc
: Working copy handling.client
: Implements the commands for clients.
The first design decision was to drop the wc
and ra
layers. Elaborating the Subversion design mentality: “Bandwidth is expensive, disks are cheap,” we should really keep a local copy for every revision – and SVN::Mirror
is already available for such purposes.
Having everything in a local repository, we don’t need anything like the bloated wc
implementation at all. The wc
library not only has the .svn
metadata directory to confuse your favorite utilities like diff
and grep
, but also stores a text-base that makes your checkout twice the size of the actual content. XD (which is the character-wise increment of wc
) was written to maintain checkout copies in a lightweight manner.
Next, I found the most important component of Subversion is not on the above list. It’s the delta library that defines the API for describing tree deltas; this is definitely the core thing in tree-based version control systems. It’s called “Delta Editor.”
For example, running a delta between revision 1 and revision 3 will generate a series of method calls (add_directory
, open_file
, apply_textdelta
, close_file
, close_directory
, etc.), to the targeted editor object. These method calls describe the changes made from revision 1 to revision 3.
While svk
was self-hosting within two months of development, I started to refactor the existing code to center around this interface. With Perl, I could easily stack the editors together, making each editor do its own job, adding arbitrary callbacks as extension to the API, and all of the fun things you know you can do with Perl. Much of the functionality is abstracted and it resulted in the following core components of svk
:
- An editor that receives delta calls to modify the checkout copy.
- A function to generate delta calls for describing the modification done to the checkout copy.
- An editor that takes delta calls and merges them with a tree to generate non-conflicting calls.
Together with these, the logic behind most of the commands became just a question of gluing together a delta generator and the appropriate editors.
Additionally, with Perl’s flexible PerlIO layers system, keyword expansion (like $Id$
in cvs
) was done within one hour. The reusable part of this was abstracted out to the PerlIO::via::dynamic
module on CPAN.
Now let’s see svk
in action.
A First Look
I hate typing those long URLs when using Subversion. So mapping repositories to shorter names is a must:
$ svk depotmap
This will help you create a default repository at ~/.svk/local
, and you could refer to it by //
in the future. If you have a Subversion repository on the disk, you could add another line: test: '/path/to/repos'
. Then you have immediate access to the existing repository – only with the shorter name /test/
instead of file:///long-path-plus-auto-complete-wont-work
.
Now let’s put something in it:
$ svk import //project/vendor /path/to/project-0.01
This will do what you think: import things into //project/vendor
. Repeat the command with a newer version of this project, say 0.02, you’ll have a vendor branch tracked on the path.
Like Subversion, branches and tags are implemented as cheap file system copies:
$ svk cp -m 'development trunk for project' //project/vendor //project/trunk
Now let’s check it out:
$ svk checkout //project/trunk ~/work/project
If you have experience with cvs
or Subversion, you’ll find it familiar when trying to add, modify, remove, or commit files. svk log
will give a change history of files or directories.
Suppose you import project-0.02 after branching trunk, and want to merge the changes from the vendor branch. You just need to:
$ svk smerge //project/vendor //project/trunk
svk
remembers branch and merge history, so it does things automatically for you. If there are conflicts, just replace //project/trunk
with a checkout path such as ~/project/trunk
. You will be able to see the conflicts. Resolve them and commit once done. Merging is no more painful.
Once merged, you could bring the checkout copy of your trunk to the latest revision with svk update
.
Working with Remote Repositories
As mentioned earlier, svk
uses SVN::Mirror
to handle remote repository access. You need to mirror them before you can use them:
$ svk mirror //project/trunk https://svn.somewhere.org/repos/trunk
$ svk sync //project/trunk
Currently you need to set up a Subversion server (either using Apache2 or svnserve
). See relevant articles or tutorials about it.
Now create a local branch, and prepare for traveling:
$ svk cp -m 'create a local branch' //project/trunk //project/local
You could now check out //project/local
and work on it just as above. Of course you could still create your own branch with cp //project/local //project/new-feature
.
Use svk sync
to sync the latest trunk when connected. Merging the new changes from trunk to your local branch works just like the previous example of merging from a vendor branch. How about merging your local changes back to the remote repository?
$ svk smerge //project/local //project/trunk
Transparent, isn’t it?
You should use smerge -C
in advance to check if there are conflicts. Even if your local branch is not merged from the latest trunk, svk
will merge the changes for you and commit to the remote repository directly, provided there’s no conflicts. But be sure to sync the latest trunk first.
In fact, if you are online and about to commit a minor change, you could forget about the process “modify on local branch, then merge back.” Just do:
$ svk switch //project/trunk
$ svk commit
This first line means we now switch from the local branch to trunk, which is the path containing the mirrored archive. The switch command will keep your local change and apply it to trunk as if those modifications are done to a checkout of trunk. Then the svk commit
on the mirrored path will just commit the changes directly to the remote server and then sync the path for you. If the server is temporarily unavailable, just switch back to local and merge back later.
You could also merge individual changes. Find the change number you want with svk log
, and use:
$ svk cmerge -c 113,125-128,130 //project/trunk //project/stable
Now if you are working on projects where you don’t have the permission to commit, you could easily generate a diff
and submit it to the author:
$ svk diff //project/trunk //project/local
Working with Multiple Repositories
Many people track development of several projects. Once you use svk
to mirror the projects, you can run svk sync -a
to sync all of them.
Now suppose another hacker uses svk
and adds a feature to the project and publishes his own branch, and you wish to experiment with or utilize his feature:
$ svk mirror //project/new-feature http://svn.somewhere.else/repos/trunk
$ svk sync //project/new-feature
Then you could merge the changes from him:
$ svk smerge //project/new-feature ~/work/project
Or you might decide to merge that branch to trunk directly:
$ svk smerge //project/new-feature //project/trunk
You could also use the cmerge
command described above to merge specific changes only from that new-feature branch.
This is the minimum case of the distributed development model. The idea is that everyone could create his private branch of the product and then to be merged back by the maintainer. There have been arguments against such model, but I am not going into them here. Although tools somehow promote certain models to solve problems, we change the model or just use another tool when we have to.
There are several features planned in the near future:
Changeset signing and verification
Signing the modified file in a commit with gpg
. This is already done; it’s just that the SVN::Mirror
side hasn’t been able to propagate and verify the signatures.
VCP integration
This would enable mirroring (and thus branching) from alien version control systems, like cvs
or perforce
. Imagine:
$ svk mirror //foo/fromcvs cvs://cvs.server/foo@trunk
This will make me (and perhaps other people) more comfortable when working with projects which use other version control systems, and also less confused when switching between different command sets for working with different projects.
Patch manager
Non-committers can already easily generate the diff
as shown above. While it would be good to register a merge history with the patch manager, large projects that need to merge many developer-submitted patches would find it handy to have a feature which allowed a review, then test, then click-to-apply for a particular change.
The development of svk
is rather rapid, so expect them coming soon!
Conclusion
Five months after the birth of svk
, it had become a fast, full-featured distributed version control system. This is possible mainly because the flexibility of Perl and the spirit of Perl – use something that exists to create new things. Besides, the commands are designed to DWIM!
If you find it interesting, get a copy from the home page and install it just like any other Perl module. Hopefully I’ll then receive your complaints, make svk
better, and make the open source world more productive.
Tags
Feedback
Something wrong with this article? Help us out by opening an issue or pull request on GitHub