Why Version Control Matters

The software you write as a developer is often subject to frequent changes.  Bugs are discovered, features are requested, and content needs to be updated.

If you make a change to a file, and then later find out something went awry, it’s helpful to be able to go back to the previous version or get a report of what actually changed.

What about restoring from a backup?   Backups are usually performed daily and won't help us if we need to go back to this morning's version of a file.

You could use different filenames to keep multiple versions:

sales_report.html
sales_report_20170103_v1.html
sales_report_20170103_v2.html
sales_report_20170103_final.html
sales_report_20170103.html

This gets messy and hard to manage very quickly.  It also fails to show you how the files differ.  And if another person updates one of the files, you have no record of who made the change or why.
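To make "how the files are different" concrete, here is a minimal Python sketch that compares two versions with the standard library's difflib module.  The file names and contents are made up for illustration, and a real version control tool does this comparison for you.

```python
import difflib

# Two hypothetical versions of the same report, as lists of lines.
old_version = [
    "<h1>Sales Report</h1>",
    "<p>Total: 100</p>",
]
new_version = [
    "<h1>Sales Report</h1>",
    "<p>Total: 125</p>",
]


def describe_changes(old, new):
    """Return the unified diff between two versions as a list of lines."""
    return list(difflib.unified_diff(
        old,
        new,
        fromfile="sales_report_v1.html",  # hypothetical file names
        tofile="sales_report_v2.html",
        lineterm="",
    ))


diff_lines = describe_changes(old_version, new_version)
print("\n".join(diff_lines))
```

In the printed diff, the removed line starts with a minus sign and the added line starts with a plus sign.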

Version control solves the problem of reviewing and retrieving previous changes and allows us to use a single filename rather than having copies of files with slightly different names for versions and dates. 

What is version control?

A version control system (VCS) is software that tracks changes to files for the purpose of retrieving specific versions later.  It's typically used to track changes to source code, but can be used for many different types of files.  You can request any version at any time, and you'll have a snapshot of the file at your fingertips.

Version control integrates work done simultaneously by different team members. In most cases, edits to different files or even the same file can be combined without losing any work. When two people make conflicting changes to the same part of a file, the version control system asks for manual intervention to reconcile the changes (these are called merge conflicts).
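The idea behind combining edits can be sketched as a toy three-way merge: compare each person's version against the common starting point (the base).  This sketch assumes all three versions have the same number of lines, which real tools do not require, and the function name merge_lines is my own.

```python
def merge_lines(base, mine, theirs):
    """Toy three-way merge of equal-length line lists.

    Returns (merged, conflicts), where conflicts holds the indexes of
    lines that both sides changed in different ways.
    """
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "mine" changed this line
            merged.append(m)
        else:             # both changed it differently: merge conflict
            merged.append(b)
            conflicts.append(i)
    return merged, conflicts


base = ["title", "intro", "footer"]
mine = ["Title", "intro", "footer"]          # I edited line 0
theirs = ["title", "intro", "Footer v2"]     # a teammate edited line 2

# Edits to different lines combine cleanly.
clean, clean_conflicts = merge_lines(base, mine, theirs)

# Edits to the same line need a human to decide.
theirs_conflicting = ["TITLE", "intro", "footer"]
_, conflicts = merge_lines(base, mine, theirs_conflicting)
```

The first merge keeps both edits.  The second reports line 0 as a conflict, because both people changed it and the tool cannot know which edit should win.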

Version control gives access to historical versions of your project.  This is basically insurance against risks such as updates that break the app, crashes, or data loss.  If you make a mistake, you can roll back to a previous version.  You can reproduce and understand a bug report on a previous version of the software.  You can also undo specific edits without losing the work that was done in the meantime.  For any part of a file, you can determine when, why, and by whom it was edited.

Many people assume that version control only applies to large teams.   However, a history of all modifications is useful even if you're a solo developer.  Version control also enables you to work on a project on different computers. 

How does version control work?

Version control uses a repository to store all the changes to the files, along with versioning information, so it can revert to a previous snapshot.

Your working copy (sometimes called a checkout) is your personal copy of all the files in the project. You make arbitrary edits to this copy, without affecting your teammates. When you are finished with the edits, you commit your changes to a repository.

A commit instructs version control to update the repository with the latest version of the file(s).  Each commit corresponds to a particular version and stores a reference to the commit that came before it.  Along with the changes, a commit records a message describing them, plus the author and timestamp of the change.

commit 1e22ae80f563dc594eedf18b6784c9f420ee89f0
Author: Amir Boroumand <mail@domain.com>
Date: Tue Dec 20 15:55:49 2016 -0500

Added exception handling to getPost method in Post class

The file's state can be compared to a previous version to see what has changed over time.  Once a change is committed, other developers can see it.  They can also pull down the change, and the version control tool will automatically update the contents of any files that were changed.
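To show how commits chain together, here is a minimal Python sketch of a repository that stores snapshots with a reference to the previous commit.  The Commit and Repository classes, the Post.txt file, and the first commit are all hypothetical.  Real systems store data far more efficiently, so treat this as an illustration rather than how any particular tool works.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    files: Dict[str, str]    # snapshot: file name -> contents
    message: str
    author: str
    date: str
    parent: Optional[str]    # id of the previous commit, None for the first

    def commit_id(self) -> str:
        # Hash the snapshot and metadata to get a stable identifier.
        payload = repr((sorted(self.files.items()), self.message,
                        self.author, self.date, self.parent))
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class Repository:
    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.head: Optional[str] = None   # id of the latest commit

    def commit(self, files: Dict[str, str], message: str,
               author: str, date: str) -> str:
        new = Commit(dict(files), message, author, date, parent=self.head)
        cid = new.commit_id()
        self.commits[cid] = new
        self.head = cid
        return cid

    def log(self) -> List[str]:
        # Walk the parent references from newest to oldest.
        messages, cid = [], self.head
        while cid is not None:
            messages.append(self.commits[cid].message)
            cid = self.commits[cid].parent
        return messages

    def checkout(self, cid: str) -> Dict[str, str]:
        # Return a copy of the snapshot stored in the given commit.
        return dict(self.commits[cid].files)


repo = Repository()
first = repo.commit({"Post.txt": "getPost without error handling"},
                    "Added Post class",                 # hypothetical commit
                    "Amir Boroumand", "2016-12-19")
second = repo.commit({"Post.txt": "getPost with error handling"},
                     "Added exception handling to getPost method in Post class",
                     "Amir Boroumand", "2016-12-20")

print(repo.log())            # newest message first
print(repo.checkout(first))  # the older snapshot is still retrievable
```

Because each commit points at its parent, walking those references from the newest commit backwards gives you the full history, and any earlier snapshot can be retrieved by its id.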

Types of Version Control

Version control systems fall into two camps: centralized and distributed.

Centralized Version Control

Centralized version control is based on the idea of a single repository on a server and developers committing their changes to this repository directly. 

A developer does not have to keep multiple copies of files on their system, because the version control tool can talk to the central copy and retrieve any version on demand.

When working with centralized version control, your workflow will typically look like this:

  • Pull down any changes other people have made from the central server.
  • Make your changes on a working copy locally.
  • Commit your changes to the central server, so the rest of the team can see them.
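The three steps above can be sketched in Python with a toy central server.  The CentralServer class and its method names are hypothetical, and rejecting a commit whenever anyone else has committed first is a simplification of what real centralized tools do.

```python
class CentralServer:
    """Toy model of a single shared repository."""

    def __init__(self, files):
        self.revisions = [dict(files)]   # revision 0 is the initial import

    @property
    def latest(self):
        return len(self.revisions) - 1

    def update(self):
        """Return (revision number, copy of the files) for the newest revision."""
        return self.latest, dict(self.revisions[-1])

    def commit(self, base_revision, files):
        """Store a new revision; refuse if the working copy is out of date."""
        if base_revision != self.latest:
            raise RuntimeError("working copy is out of date; update first")
        self.revisions.append(dict(files))
        return self.latest


server = CentralServer({"sales_report.html": "v1"})

# 1. Pull down the latest changes from the central server.
revision, working_copy = server.update()

# 2. Make your changes on the working copy locally.
working_copy["sales_report.html"] = "v2"

# 3. Commit to the central server so the rest of the team can see them.
new_revision = server.commit(revision, working_copy)
```

Notice that the commit goes straight to the server, so this step needs a network connection in a real setup.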

Some examples of centralized version control systems are CVS, Subversion (SVN), and Perforce.

Distributed Version Control

The second (more popular) type of version control is distributed.   Here, each user clones the central repository to create their own personal repository which includes a complete history of the project.

After you commit to your own repository, others won't have access to your changes until you push those changes to the central repository.  If you want to see other people's changes, you need to pull the changes into your repository first.

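Notice that the commit and update commands only move changes between the working copy and the local repository, without affecting any other repository.  By contrast, the push and pull commands move changes between the local repository and the central repository.  (These command names follow Mercurial, where pull leaves your working copy untouched.  In Git, git pull also updates your working copy, while git fetch only updates your local repository.)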

Now the workflow will look like this:

  • You create a local copy of the entire project from the central repository (called a clone).
  • You work on a feature and commit the necessary changes to your own repository.
  • When finished, you push your changes from your repository to the central repository.
  • The other developers can now perform the pull & update steps to see your changes in their repository.
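Here is a minimal Python sketch of the workflow above.  The Repo class and the names alice and bob are hypothetical, and history is modeled as a simple list of commit messages, whereas real distributed tools track a graph of commits and can merge diverged histories.

```python
class Repo:
    """Toy repository whose history is an ordered list of commit messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def clone(self):
        # A clone carries the complete history, not just the latest files.
        return Repo(self.history)

    def commit(self, message):
        # Commits are purely local until they are pushed.
        self.history.append(message)

    def push(self, remote):
        # Only allow a push when the remote history is a prefix of ours.
        if remote.history != self.history[:len(remote.history)]:
            raise RuntimeError("remote has new changes; pull first")
        remote.history = list(self.history)

    def pull(self, remote):
        # Only allow a pull when our history is a prefix of the remote's.
        if self.history != remote.history[:len(self.history)]:
            raise RuntimeError("histories diverged; a merge is needed")
        self.history = list(remote.history)


central = Repo(["initial commit"])
alice = central.clone()
bob = central.clone()

alice.commit("add login feature")
central_before_push = list(central.history)   # central has not changed yet

alice.push(central)   # now the central repository has Alice's commit
bob.pull(central)     # and Bob can bring it into his own repository
```

Until Alice pushes, her commit exists only in her own repository.  After the push and pull, all three repositories hold the same history.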

Here are some advantages of distributed version control:

  • It works offline - You only need to be online to push your changes.  Otherwise, you are free to work in your repository even if you're on an airplane or the central repository is down.
  • It’s fast - Most operations are done locally so you're not relying on a flaky network connection to commit your changes.
  • No single point of failure -  Since every developer on the team has a full backup of the project data, losing the repository server is a minor inconvenience.  Any team member can push to a new server and the whole team can be easily up and running in a matter of minutes.

Some examples of distributed version control systems are Git and Mercurial.

Which version control system should I use?

Any version control system is better than none.  

Today, Git is by far the most popular VCS, judging by Google Trends data.  However, Apache Subversion and Mercurial are still used in many environments.

For new projects, I recommend Git because of its widespread adoption and its rich feature set. 

Stay tuned for a future post where I cover the basics of getting started with Git.