Revision control, also known as version control, source control or software configuration management (SCM), is the management of changes to documents, programs, and other information stored as computer files.
Version control systems track (and can revert) incremental changes to files, can optionally report those changes as they are made (for example, to a mailing list), and can be used concurrently by many developers.
Distributed vs Centralized
To better understand how these two version control tools compare, we first need to understand the difference between centralized and distributed (decentralized) version control systems.
Distributed revision control (DRCS) takes a peer-to-peer approach, as opposed to the client-server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer’s working copy of the codebase is a bona-fide repository. Distributed revision control conducts synchronization by exchanging patches (change-sets) from peer to peer. This results in some important differences from a centralized system:
- No canonical, reference copy of the codebase exists by default; only working copies.
- Common operations (such as commits, viewing history, and reverting changes) are fast, because there is no need to communicate with a central server.
- Rather, communication is only necessary when pushing or pulling changes to or from other peers.
- Each working copy effectively functions as a remote backup of the codebase and of its change-history, providing natural protection against data loss.
Other differences are as follows:
- In DVCS there may be many “central” repositories.
- Code from disparate repositories is merged based on a web of trust, i.e., historical merit or quality of changes.
- Numerous different development models are possible, such as development / release branches or a Commander / Lieutenant model, allowing for efficient delegation of topical developments in very large projects.
- Lieutenants are project members who have the power to dynamically decide which branches to merge.
- The network is not involved in most operations.
- A separate set of “sync” operations is available for committing changes to, or receiving changes from, remote repositories.
DVCS proponents point to several advantages of distributed version control systems over the traditional centralized model:
- Allows users to work productively even when not connected to a network
- Makes most operations much faster since no network is involved
- Allows participation in projects without requiring permissions from project authorities, and thus arguably better fosters a culture of meritocracy instead of requiring “committer” status
- Allows private work, so users can use their revision control system even for early drafts they do not want to publish
- Avoids relying on a single physical machine as a single point of failure.
- Still permits centralized control of the “release version” of the project
- A disadvantage of DVCS is that the initial clone of a repository is slower than a centralized checkout, because all branches and the full revision history are copied. This can matter when access speed is low and the project is large.
- Another problem with DVCS is the lack of the locking mechanisms that are part of most centralized VCSs and still play an important role for non-mergeable binary files such as graphic assets.
SVN (Subversion) is a centralized version control system (CVCS) and belongs to the second generation of version control tools. It was created in 2000 and is currently maintained by the Apache Software Foundation.
SVN is the third in a line of revision control systems: first RCS, then CVS, and finally SVN. SVN offers VCS features (labeling and merging), but its tag is just a directory copy (like a branch, except you are not “supposed” to touch anything in a tag directory), and its merging is still complicated, relying on metadata recorded to remember what has already been merged.
Git is a distributed version control system (DVCS) and belongs to the third generation of version control tools.
Git was initially designed and developed by Linus Torvalds for Linux kernel development. Every Git working directory is a full-fledged repository with complete history and full revision-tracking capabilities, independent of network access or a central server.
Git started as a file content manager (a tool designed to make merging files easy) and evolved into a true version control system based on a DAG (directed acyclic graph) of commits, in which branches are simply named references into the history rather than data in their own right, and tags are true metadata.
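To make the “branches are references into a DAG of commits” idea concrete, here is a minimal Python sketch. The class and method names (`Repo`, `commit`, `branch`, `log`) are hypothetical and the model lives only in memory; it does not reproduce Git’s object storage, only the relationship between commits, parent links and branch pointers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """A node in the commit DAG (hypothetical model, not Git's on-disk format)."""
    id: str
    message: str
    parents: tuple[str, ...] = ()


@dataclass
class Repo:
    commits: dict[str, Commit] = field(default_factory=dict)
    # A branch is just a name -> commit-id pointer; it stores no file data.
    branches: dict[str, str] = field(default_factory=dict)

    def commit(self, branch: str, cid: str, message: str, *merged: str) -> None:
        """Add a commit on top of `branch` (plus any merged tips) and move the pointer."""
        tip = self.branches.get(branch)
        parents = ((tip,) if tip else ()) + merged
        self.commits[cid] = Commit(cid, message, parents)
        self.branches[branch] = cid

    def branch(self, name: str, start: str) -> None:
        """Creating a branch copies one pointer; no files are copied."""
        self.branches[name] = self.branches[start]

    def log(self, branch: str) -> list[str]:
        """Commits reachable from the branch tip, in depth-first order."""
        order: list[str] = []
        seen: set[str] = set()
        stack = [self.branches[branch]]
        while stack:
            cid = stack.pop()
            if cid in seen:
                continue
            seen.add(cid)
            order.append(cid)
            # Push parents reversed so the first parent is visited first.
            stack.extend(reversed(self.commits[cid].parents))
        return order


if __name__ == "__main__":
    repo = Repo()
    repo.commit("main", "a1", "initial import")
    repo.commit("main", "b2", "add parser")
    repo.branch("feature", "main")
    repo.commit("feature", "c3", "experiment")
    repo.commit("main", "d4", "fix bug")
    repo.commit("main", "e5", "merge feature", "c3")  # merge commit: two parents
    print(repo.log("feature"))  # ['c3', 'b2', 'a1']
    print(repo.log("main"))     # ['e5', 'd4', 'b2', 'a1', 'c3']
```

Note that `branch` is constant-time: history is whatever is reachable through parent links, so a branch never needs its own copy of the data.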
GIT over SVN
Git was designed from the ground up as a distributed version control system. Being distributed means that multiple redundant repositories and branching are first-class concepts of the tool.
In a distributed VCS like Git, every user has a complete copy of the repository data stored locally, which makes access to file history extremely fast and allows full functionality when disconnected from the network. It also means every user has a complete backup of the repository. Have 20 users? You probably have more than 20 complete backups of the repository, as some users keep more than one repository for the same project. If any repository is lost due to system failure, only the changes unique to that repository are lost. If users frequently push and fetch changes with each other, this tends to be a small amount of loss, if any.
In a centralized VCS like Subversion, only the central repository has the complete history. This means users must communicate over the network with the central repository to obtain a file’s history. Backups must be maintained independently of the VCS. If the central repository is lost due to system failure, it must be restored from backup, and changes made since the last backup are likely to be lost. Depending on the backup policies in place, this could amount to several person-weeks’ worth of work.
Because Git is distributed, you do not have to give other people commit access for them to use the versioning features. Instead, you decide when to merge what from whom.
Subversion, by contrast, controls access centrally, so a user needs commit access even to check in daily work. In Git, users can keep their own work under version control while the canonical source remains controlled by the repository owner.
Branches in Git are a core concept used every day by every user. In Subversion they are more cumbersome and often used sparingly.
Branches are so central in Git because every developer’s working directory is itself a branch. Even if two developers are modifying two unrelated files at the same time, it is easy to view their two working directories as different branches stemming from the same common base revision of the project.
Git’s handling of branches:
- Automatically tracks the project revision the branch started from; this information is necessary to merge the branch back to trunk.
- Records branch merge events, including:
  - Author, time and date.
  - Branch and revision information.
  - Changes made on the branch(es), which remain attributed to their original authors and original timestamps.
  - The changes made to complete the merge, which are attributed to the merging user.
  - Why the merge was done (optional; can be supplied by the user).
- Automatically starts the next merge at the last merge. Knowing which revision was last merged is necessary to successfully merge the same branches together again in the future.
This differs from Subversion’s handling of branches. As of Subversion 1.5:
- It automatically tracks the project revision the branch started from; like Git, Subversion remembers where a branch originated.
- The merged changes and any edits needed to complete the merge are committed together as one revision, so if the merging user had to modify 12 lines of code to complete the merge successfully, you can’t tell what those 12 lines were or how they differ from the versions on the branches being merged.
In Subversion, branches and tags are all just copies. This is sometimes inconvenient: for example, it is easy to check out the whole repository, with every branch and tag, by mistake. Branch paths and file paths live in the same namespace but have different semantics, which can be confusing.
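Git’s “automatically starts the next merge at the last merge” behavior falls out of the commit DAG: once a branch has been merged, the merged commit becomes a common ancestor of both branches. The Python sketch below finds the best common ancestors (the idea behind `git merge-base`). The function names and the tiny example history are hypothetical, and Git’s real implementation is considerably more optimized.

```python
from __future__ import annotations


def ancestors(parents: dict[str, tuple[str, ...]], start: str) -> set[str]:
    """Every commit reachable from `start`, including `start` itself."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen


def merge_bases(parents: dict[str, tuple[str, ...]], a: str, b: str) -> set[str]:
    """Best common ancestors: common ancestors that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(d != c and c in ancestors(parents, d) for d in common)
    }


# Hypothetical history: main is A-B-D-F-G, feature is B-C-E,
# and F is a merge commit whose parents are D (main) and C (feature).
history = {
    "A": (), "B": ("A",), "C": ("B",), "D": ("B",),
    "F": ("D", "C"), "E": ("C",), "G": ("F",),
}

if __name__ == "__main__":
    print(merge_bases(history, "D", "C"))  # {'B'}: first merge starts at the fork point
    print(merge_bases(history, "G", "E"))  # {'C'}: next merge starts at the last merged commit
```

Because the earlier merge commit F lists C as a parent, the second merge only has to consider E, the work done on `feature` since C.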
Performance (Speed of Operation)
Git is extremely fast. Since all operations except push and fetch are local, no network latency is involved when you:
- Perform a diff.
- View file history.
- Commit changes.
- Merge branches.
- Obtain any other revision of a file (not just the prior committed revision).
- Switch branches.
Smaller Space Requirements
Git’s repository and working directory sizes are much smaller than SVN’s.
For example, the Mozilla repository is reported to be almost 12 GB when stored in SVN using the FSFS backend. Previously, the FSFS backend also required over 240,000 files in a single directory to record all 240,000 commits made over the project’s 10-year history; this was fixed in SVN 1.5, where every 1,000 revisions are placed in a separate directory. The exact same history is stored in Git in only two files totaling just over 420 MB, which means SVN needs roughly 30 times the disk space to store the same history.
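The figures in this paragraph can be checked with a few lines of Python. The sketch assumes the default sharded FSFS layout, `db/revs/<revision // 1000>/<revision>` (the 1,000-revisions-per-directory scheme mentioned above); the byte counts are just the paragraph’s rounded figures in decimal units, not fresh measurements.

```python
SHARD_SIZE = 1000  # revisions per directory in a sharded FSFS repository (assumed default)


def fsfs_rev_path(rev: int, shard_size: int = SHARD_SIZE) -> str:
    """Location of a revision file under the sharded FSFS layout (simplified)."""
    return f"db/revs/{rev // shard_size}/{rev}"


def shard_dirs(total_revs: int, shard_size: int = SHARD_SIZE) -> int:
    """Directories needed for revisions 0 .. total_revs-1 (ceiling division)."""
    return -(-total_revs // shard_size)


svn_bytes = 12 * 10**9   # "almost 12 GB" in SVN (rounded figure from the text)
git_bytes = 420 * 10**6  # "just over 420 MB" in Git (rounded figure from the text)

if __name__ == "__main__":
    print(fsfs_rev_path(239_999))           # db/revs/239/239999
    print(shard_dirs(240_000))              # 240 directories instead of one huge one
    print(f"{svn_bytes / git_bytes:.1f}x")  # 28.6x, i.e. "roughly 30x"
```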
One reason for Git’s smaller footprint is that an SVN working directory always contains two copies of each file: one for the user to work with and another, hidden in .svn/, to aid operations such as status, diff and commit. In contrast, a Git working directory requires only one small index file that stores about 100 bytes of data per tracked file. On projects with many files, this can make a substantial difference in the disk space required per working copy.
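A back-of-the-envelope Python sketch of the per-checkout overhead described here. It models only the two effects the paragraph names (SVN’s hidden pristine copy of every file versus Git’s roughly 100-byte index entry per tracked file); it ignores Git’s object store in .git/ and any filesystem block overhead, and the project size is hypothetical.

```python
INDEX_ENTRY_BYTES = 100  # approximate Git index cost per tracked file (figure from the text)


def svn_pristine_overhead(n_files: int, avg_file_bytes: int) -> int:
    """Extra bytes an SVN working copy spends on hidden pristine copies (one per file)."""
    return n_files * avg_file_bytes


def git_index_overhead(n_files: int) -> int:
    """Extra bytes a Git working directory spends on its index (object store not counted)."""
    return n_files * INDEX_ENTRY_BYTES


if __name__ == "__main__":
    files, avg = 50_000, 8_000  # hypothetical project: 50,000 files averaging 8 KB
    print(svn_pristine_overhead(files, avg) // 10**6, "MB")  # 400 MB
    print(git_index_overhead(files) // 10**6, "MB")          # 5 MB
```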
Because a full Git clone is often smaller than a full SVN checkout, Git working directories (including the repository) are typically smaller than the corresponding SVN working directories. There are even ways in Git to share one repository across many working directories, but unlike SVN, this requires the working directories to be collocated.
Line Ending Conversion
Subversion can be easily configured to automatically convert line endings to CRLF or LF, depending on the native line ending used by the client’s operating system. This conversion feature is useful when Windows and UNIX users are collaborating on the same set of source code.
It is also possible to configure a fixed line ending independent of the native operating system. Files such as a Makefile need to use only LFs, even when they are accessed from Windows. This can be adjusted in a global config and overridden in user configs. Binary files are checked in with a binary flag (as with CVS, except that SVN almost always sets it automatically) and so are never converted or keyword-substituted.
Subversion also allows line ending conversion to be specified on a file-by-file basis. However, if the user does not check the binary flag when adding a file (Subversion prints, for every added file, whether it recognized it as binary), binary content might get corrupted.
Git versions prior to 1.5.1 never convert files: they assume that every file is opaque and should not be modified. Git 1.5.1 and later make line ending conversion configurable (the core.autocrlf setting). Git’s advantage over Subversion is that you do not have to manually specify which files the conversion applies to; it happens automatically (hence autocrlf).
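A simplified Python sketch of what core.autocrlf does. The values `true`, `input` and `false` are real Git settings, but the function names are hypothetical, and the binary check (a NUL byte in the first 8,000 bytes) only approximates Git’s heuristic. Real Git applies further rules, such as core.safecrlf and the `text` attribute, that this sketch omits.

```python
FIRST_FEW_BYTES = 8000  # how far this sketch looks for a NUL byte


def looks_binary(data: bytes) -> bool:
    """Heuristic: treat content as binary if it has a NUL byte near the start."""
    return b"\0" in data[:FIRST_FEW_BYTES]


def to_repository(data: bytes, autocrlf: str) -> bytes:
    """On commit: 'true' and 'input' store text with LF-only line endings."""
    if autocrlf in ("true", "input") and not looks_binary(data):
        return data.replace(b"\r\n", b"\n")
    return data


def to_working_tree(data: bytes, autocrlf: str) -> bytes:
    """On checkout: only 'true' converts LF back to CRLF; 'input' and 'false' do not."""
    if autocrlf == "true" and not looks_binary(data):
        # Normalize first so existing CRLF pairs are not turned into CR CR LF.
        return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return data


if __name__ == "__main__":
    windows_text = b"line one\r\nline two\r\n"
    stored = to_repository(windows_text, "true")
    print(stored)                            # b'line one\nline two\n'
    print(to_working_tree(stored, "true"))   # b'line one\r\nline two\r\n'
    print(to_working_tree(stored, "input"))  # b'line one\nline two\n'
```

The key design point is that the repository always holds LF-only text, so Windows and UNIX checkouts of the same commit differ only in their working trees.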
SVN over GIT
Since Subversion only supports a single repository there is little doubt about where something is stored. Once a user knows the repository URL they can reasonably assume that all materials and all branches related to that project are always available at that location. Backup to tape/CD/DVD is also simple as there is exactly one location that needs to be backed up regularly.
Since Git is distributed by nature, not everything related to a project is necessarily stored in the same location. There may therefore be some confusion about where to obtain a particular branch, unless the repository location is always explicitly specified. There may also be some confusion about which repositories are backed up to tape/CD/DVD regularly and which aren’t.
Since Subversion has a single central repository it is possible to specify read and write access controls in a single location and have them be enforced across the entire project.
Detection and properties
Subversion can be used with binary files: they are detected automatically, and if detection fails you have to mark a file as binary yourself. The same is true of Git.
The difference is that Git, by default, treats every file as opaque (effectively binary) and never converts it. If you have to have CR+LF line endings (even though most modern programs grok the saner LF-only line endings just fine), you have to tell Git so. Git will then autodetect whether a file is text (just like Subversion) and act accordingly. As in Subversion, you can correct an erroneous autodetection by setting a Git attribute.
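The decision described above (an explicit attribute wins, otherwise the content is autodetected) can be sketched in Python. It is loosely modeled on .gitattributes, where `text` and `-text` are real attributes and a later matching line overrides an earlier one; the pattern matching is simplified (slash-free patterns match the file name only), the NUL-byte test only approximates Git’s heuristic, and the function names are hypothetical.

```python
from __future__ import annotations

import posixpath
from fnmatch import fnmatchcase


def looks_binary(data: bytes) -> bool:
    """Approximate autodetection: a NUL byte in the first 8000 bytes means binary."""
    return b"\0" in data[:8000]


def text_attribute(path: str, rules: list[tuple[str, str]]) -> str | None:
    """Return 'text', '-text' or None for `path`; later matching rules override earlier ones."""
    result = None
    for pattern, attr in rules:
        # Simplification: slash-free patterns match the base name, others the full path.
        target = path if "/" in pattern else posixpath.basename(path)
        if fnmatchcase(target, pattern):
            result = attr
    return result


def is_text(path: str, data: bytes, rules: list[tuple[str, str]]) -> bool:
    attr = text_attribute(path, rules)
    if attr == "text":
        return True   # explicitly text: overrides a wrong "binary" guess
    if attr == "-text":
        return False  # explicitly not text: never convert line endings
    return not looks_binary(data)  # no attribute: fall back to autodetection


if __name__ == "__main__":
    rules = [("*.txt", "text"), ("*.dat", "-text"), ("legacy.txt", "-text")]
    print(is_text("docs/notes.txt", b"a\0b", rules))  # True: attribute beats detection
    print(is_text("src/legacy.txt", b"hello", rules))  # False: the later rule wins
    print(is_text("README", b"hello\n", rules))        # True: autodetected as text
```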
In earlier versions of Git, seemingly minor changes to binary files, such as adjusting an image’s brightness, could be different enough that Git interpreted the result as a new file, splitting the content history. Because Subversion tracks history by file, history for such changes is preserved.
Partial Checkout/Bandwidth Requirements
With Subversion, you can check out just a subdirectory of a repository. This is not possible with a standard Git clone: for a large project, you always have to download the whole repository, even if you only need the current version of a single subdirectory. At a time when fast Internet connections are mostly available only in cities and mobile data traffic is expensive, Git can cost much more time and money in rural areas or on mobile devices. This is arguably mitigated by the small size of Git repositories.
In other cases, requirements other than raw repository size motivate a partial checkout, e.g. access control (you can’t restrict read access to part of a repository with Git) or directory layout requirements. There is no general solution for this problem other than to split the original Git repository into multiple repositories and then clone only the one you need. (Git submodules can mitigate some of the difficulties of managing the resulting collection of repositories.)
Shorter and Predictable Revision Numbers
First, because SVN assigns revision numbers sequentially (starting from 1), even very old projects such as Mozilla have short unique revision numbers (Mozilla’s are at most 6 digits long). Many users find this convenient when entering revisions for historical research. They also find the number easy to embed in their product, supposedly making it easy to determine which sources were used to build a particular executable. However, since the revision number is global to the entire repository, including all branches, the question remains which branch a given revision number corresponds to, unless the last committed revision is recorded: because revisions are global to a repository, the last committed revision makes it possible to determine which branch was used.
Because Git uses a SHA-1 hash to uniquely identify a commit, each revision can only be described by a 40-character hexadecimal string; however, this string identifies not only the revision but also its entire history, since every commit’s hash covers the hashes of its parents. In practice, the first 8 characters tend to be unique within a project, but most users try not to rely on this over the long term. Rather than embedding long commit SHA-1s into executables, Git users generate a uniquely named tag. This is an additional step, but a simple one.
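The Python sketch below shows how such an ID comes about. Git hashes each object as `<type> <size>`, a NUL byte, then the content; a commit’s content names its tree and its parent’s ID, so two commits with identical trees and messages but different histories get different IDs. The author identity and timestamp are made-up placeholder values, and the helper names are hypothetical.

```python
from __future__ import annotations

import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over Git's object encoding: '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = git_object_id("tree", b"")  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904


def commit_id(tree: str, parent: str | None, message: str) -> str:
    """ID of a commit object; the parent's ID is part of the hashed text."""
    # Hypothetical fixed identity and timestamp so the result is reproducible.
    ident = "A U Thor <author@example.com> 1700000000 +0000"
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())


if __name__ == "__main__":
    root = commit_id(EMPTY_TREE, None, "initial")
    child = commit_id(EMPTY_TREE, root, "second")
    other_root = commit_id(EMPTY_TREE, None, "initial, reworded")
    rebased = commit_id(EMPTY_TREE, other_root, "second")
    # Same tree and message, different history, so the abbreviated IDs differ.
    print(child[:8], rebased[:8])
```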
Secondly, SVN’s revision numbers are predictable. If the current commit is 435, the next one will be 436. This makes it very easy to step through a few sequential revisions, e.g. to look at differences or to revert to an old revision to find when a regression was introduced. Furthermore, without looking up any additional information, you know that commit 436 was made after 435. Doing the same in Git requires looking at the log.
Git partially compensates for this with shorthand syntax: each ^ appended to a revision means “the parent of”, so you can add any number of ^ to indicate how far back to go (e8fa9c~3 is equivalent to e8fa9c^^^). On a linear history, e8fa9c^^^..e8fa9c, for instance, shows e8fa9c and its two preceding revisions, because a range A..B lists the commits reachable from B but not from A. (However, Git provides no shorthand syntax for going forward in time.)
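A small Python sketch of how this shorthand resolves, using a toy in-memory parent map with made-up short IDs. It follows Git’s documented meaning of `^` (first parent), `~n` (n first-parent steps) and `A..B` (reachable from B but not from A), but ignores the rest of Git’s revision syntax, such as `^2` for a second parent.

```python
from __future__ import annotations

import re


def resolve(rev: str, parents: dict[str, list[str]]) -> str:
    """Resolve 'X', 'X^', 'X^^^' or 'X~3' by following first parents (simplified)."""
    m = re.fullmatch(r"([0-9a-f]+)((?:\^|~\d+)*)", rev)
    if m is None:
        raise ValueError(f"unsupported revision syntax: {rev!r}")
    commit = m.group(1)
    for step in re.findall(r"\^|~\d+", m.group(2)):
        count = 1 if step == "^" else int(step[1:])
        for _ in range(count):
            commit = parents[commit][0]  # IndexError if we walk past a root commit
    return commit


def reachable(start: str, parents: dict[str, list[str]]) -> set[str]:
    """Every commit reachable from `start`, including `start` itself."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen


def rev_range(spec: str, parents: dict[str, list[str]]) -> set[str]:
    """'A..B': commits reachable from B that are not reachable from A."""
    left, right = spec.split("..")
    return reachable(resolve(right, parents), parents) - reachable(resolve(left, parents), parents)


# Hypothetical linear history, newest first: e8fa9c -> c0ffee -> b0b0 -> a11 -> f00
history = {"e8fa9c": ["c0ffee"], "c0ffee": ["b0b0"], "b0b0": ["a11"], "a11": ["f00"], "f00": []}

if __name__ == "__main__":
    print(resolve("e8fa9c^^^", history), resolve("e8fa9c~3", history))  # a11 a11
    print(sorted(rev_range("e8fa9c^^^..e8fa9c", history)))  # ['b0b0', 'c0ffee', 'e8fa9c']
```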
Both VC systems can deliver almost everything a modern developer needs. Personally, I prefer Git because it is more intuitive (to me) and faster. I think it is even more reliable.
What do you think? What VCS are you using?