All Articles

Intro To PyTorch

2018 has been a revolutionary year for the field of Deep Learning, especially with the release of new libraries and numerous features in existing ones. Let us quickly go through some current Deep Learning libraries: TensorFlow, PyTorch, Apache MXNet, Chainer, TuriCreate, and CNTK. There are also wrappers written around these libraries to simplify the use and creation of deep learning architectures; some of those wrappers include Keras.

My favorite part is the convenience these libraries provide: they simplify deep learning architecture design, multi-GPU/distributed training, and the creation of custom layers and custom loss functions; they offer model zoos with pools of pre-trained models; and they support converting trained models from library-specific formats to platform-specific ones (iOS, Android, and Raspberry Pi).

It looks like we have a lot to cover to keep ourselves on our toes and match the pace of current trends in Deep Learning, and, after a long gap, I would like to start with an introduction to PyTorch. The contents of this blog are as follows:

  • Environment Setup
  • Introductory (modular) code as a skeleton for DL applications.
  • Data Loading
  • Model Training and Testing
  • Visualization
  • Model Saving and Loading

For simplicity, we are covering only a subset of topics in this blog to get your wheels rolling in the field. We are aiming for a high-level overview that can eventually help you design your own Deep Learning toolkit.

Environment Setup

There are many ways to set up your Deep Learning environment –

  • Cloud VMs – Microsoft Azure, Google Cloud or Amazon’s AWS provide specialized VMs with pre-configured tools and libraries that can help you with your DL/ML journey.
  • Custom Setup – You can either have a local machine with decent GPUs or a cloud VM (with an OS of your choice), and set up your deep learning environment yourself using either virtualenv or Anaconda.

My personal choice for environment setup is Anaconda (for both Cloud VMs and Custom Setups). You can download and install Anaconda from the official Anaconda website. After downloading and installing Anaconda, you can set up your working environment using the following commands –

$ conda create -n oml python=3.6
$ conda activate oml
(oml) $ conda install pytorch torchvision -c pytorch
(oml) $ conda install tensorflow-gpu
(oml) $ conda install tensorboardX
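
Once the environment is created, it is worth a quick sanity check that Python can see the PyTorch installation (and the GPU, if any). This snippet is a generic check, not specific to this setup:

```python
import torch

# Print the installed PyTorch version
print(torch.__version__)
# True only if CUDA drivers and a compatible GPU are available
print(torch.cuda.is_available())
```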

Skeleton Code for DL applications

Long gone are the days when we created a single Python script for our Deep Learning models. To efficiently train large architectures on large datasets, it is far less painful to follow a modular code pattern. Many open-source contributors already follow a specific pattern, which can look overwhelming to newcomers. For simplicity, I would like to use this section to describe how I organize my Deep Learning projects. Please note that it is up to each individual to organize their own projects; I am just presenting the structure that is most optimal for my workflow. Here is a skeleton of my project structure –

├── (Entry point to the application)
├──  (Net class for init/train/test DL models)
├── models/ (Directory containing different DL arch.)
|   ├──
|   ├── <model_name>.py
|   └── ...
├── loaders/ (Directory containing data/model loaders)
|   ├──
|   ├── (DataLoader class for loading data)
|   └── (ModelLoader class for loading model)
├── LICENSE  (License of your choice)
└── (Documentation for Setup, Running & Results)

The code for this repo can be seen at – LINK. The entry-point script is the starting point for the entire pipeline, with the necessary arguments to train or run inference with the Deep Learning models. The Net class, defined in its own file, uses one of those arguments to load the dataset and the model structure for further computations.
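
As a sketch of what such an entry point can look like, here is a minimal argparse-based version. The flag names mirror the ones used later in this post, but the exact argument set is an assumption, not the repo's definitive interface:

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description='Entry point for the DL pipeline')
    parser.add_argument('--phase', choices=['train', 'test'], default='train',
                        help='Run training or inference')
    parser.add_argument('--data_name', default='mnist',
                        help='Dataset to load (selects the DataLoader branch)')
    parser.add_argument('--model_name', default='lenet',
                        help='Architecture to load from models/')
    parser.add_argument('--continue_train', type=int, default=0,
                        help='1 to resume from a saved checkpoint')
    return parser.parse_args(argv)

if __name__ == '__main__':
    args = parse_args()
    # Hand the parsed arguments to the Net class, e.g.:
    # net = Net(args); net.train() if args.phase == 'train' else net.test()
```

The Net class then only ever sees a single args object, which keeps the rest of the pipeline independent of how arguments are collected.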

Data Loading

Loading data efficiently for training and testing used to be a large hassle once upon a time. In PyTorch, loading and handling data has become easy thanks to the module, and torchvision supports a lot of popular datasets for easy loading through torchvision.datasets. Sample code for loading MNIST data can be written as follows –

def loadMNIST(self, args):
    # Note: args.data_dir (the dataset directory) is an assumed argument name
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    self.train_loader =
        datasets.MNIST(args.data_dir, train=True, download=True, transform=transform),
        batch_size=args.train_batch_size, shuffle=True, **self.kwargs)
    self.test_loader =
        datasets.MNIST(args.data_dir, train=False, download=True, transform=transform),
        batch_size=args.test_batch_size, shuffle=True, **self.kwargs)

Model Training and Testing

The Net class contains the helper code for training and testing the initialized model. This is the place where we initialize the required model, load the datasets required for training and testing, load the necessary optimizer, and load/save models. One interesting thing to consider is the _build_model(self) method. For training, we can either load a pre-trained model or start training from scratch. And while loading the model, we can choose to run training/inference on the GPU or the CPU. If multiple GPUs are available, PyTorch's nn.DataParallel can help with easy multi-GPU training.

def _build_model(self):
    # Load the model
    _model_loader = ModelLoader(self.args)
    self.model = _model_loader.model
    # If continue_train, load the pre-trained weights
    # (self.args.checkpoint_path is an assumed argument name)
    if self.args.phase == 'train':
        if self.args.continue_train:
            self.model.load_state_dict(torch.load(self.args.checkpoint_path))
    # If multiple GPUs are available, automatically include DataParallel
    if self.args.multi_gpu and torch.cuda.device_count() > 1:
        self.model = nn.DataParallel(self.model)


Visualization

While training large deep neural networks, it is helpful to visualize the loss, accuracy, and other important metrics that can help us debug our networks. TensorBoard can really come in handy for this. The interesting fact is that we can integrate TensorBoard into our PyTorch-based DL pipeline with the help of TensorBoardX. The code for integrating it is as easy as –

# Initialize summary writer (requires: from tensorboardX import SummaryWriter)
self.writer = SummaryWriter('runs/{}'.format(self.args.data_name))
# Add the values to Summary Writer
self.writer.add_scalar('train/loss', loss.item(), self.args.iter_count)

You can start the TensorBoard session and run the training by using the following command –

(oml) $ tensorboard --logdir=./runs/ --host localhost --port 6007 & python <entry-point>.py --phase train --continue_train 0

# Go to http://localhost:6007 to see the results.


Save/Load Models

It is important to save your models periodically in the middle of training. Long training runs may hit unexpected OS errors or out-of-memory errors, and saving trained models periodically can preserve a lot of the time and resources invested in the training phase. PyTorch model saving and loading is as easy as –

# Save the model, model_filename)
# Load the (state_dict to) model
self.model.load_state_dict(torch.load(model_filename))

The source code for this blog is open-source so that other DL enthusiasts can use it as a primer for their own DL-related projects involving PyTorch.