Activation maps

This notebook shows how to use osculari to obtain a network's activation maps at multiple layers. Activation maps are the starting point for several further analyses, including representational similarity analysis (RSA).
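As a rough illustration of how activation maps feed into RSA, the sketch below builds a representational dissimilarity matrix (RDM) from one layer's activations for several images. It is not part of osculari: the helper name `rdm_from_activations` is hypothetical, the activations are random stand-ins, and 1 minus the Pearson correlation is only one common choice of dissimilarity.

```python
import numpy as np


def rdm_from_activations(activations):
    """Return an (n_images, n_images) matrix of 1 - Pearson correlation."""
    activations = np.asarray(activations)
    # flatten each image's activation map into a single feature vector
    features = activations.reshape(activations.shape[0], -1)
    # np.corrcoef treats every row as one variable, so the result is
    # the correlation between every pair of images
    return 1 - np.corrcoef(features)


# random stand-in for one layer's activations of five images
# (batch, kernels, height, width); real values would come from the network
rng = np.random.default_rng(0)
fake_activations = rng.standard_normal((5, 64, 27, 27))

rdm = rdm_from_activations(fake_activations)
print(rdm.shape)
```

Each entry of `rdm` compares two images in that layer's representation; the diagonal is zero because every image is perfectly correlated with itself.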

If you are running this notebook on Google Colab, install osculari by uncommenting and executing the cell below.

# !pip install osculari

# importing required packages
from osculari import models

import numpy as np
import requests
from matplotlib import pyplot as plt
from PIL import Image as pil_image
import torch
import torchvision.transforms as torch_transforms

AlexNet

The models.ActivationLoader class provides a simple way to load activations from one or several layers of a network. It requires the following arguments:

architecture is the name of the network you want to load (e.g. resnet50 or vit_b_32). It should be one of the available models.
weights defines the pretrained weights. It can be one of the following formats:
- Path to a local file.
- Downloadable URL of the pretrained weights.
- A string naming one of the available weights; for instance, PyTorch resnet50 supports one of the following strings: ["DEFAULT", "IMAGENET1K_V1", "IMAGENET1K_V2"].
- The same name as architecture, which loads the network’s default weights.
layers determines the read-out (cut-off) layer(s). The layers available for each network can be obtained by calling the models.available_layers() function with the architecture's name.

In this example, we obtain activation maps of all AlexNet layers.

architecture = 'alexnet'         # network's architecture
weights = 'alexnet'              # the pretrained weights
readout_kwargs = {               # parameters for loading activations from the pretrained network
    'architecture': architecture, 
    'weights': weights,
    'layers': models.available_layers(architecture)
}
activation_loader = models.ActivationLoader(**readout_kwargs)

# reading an image
url = 'https://github.com/pytorch/hub/raw/master/images/dog.jpg'
input_img = pil_image.open(requests.get(url, stream=True).raw)

img_size = 224
mean, std = activation_loader.normalise_mean_std
# converting it to torch tensor
transforms = torch_transforms.Compose([
    torch_transforms.Resize((img_size, img_size)),
    torch_transforms.ToTensor(),
    torch_transforms.Normalize(mean=mean, std=std)
])
torch_img = torch.stack([transforms(input_img)])
print('Shape of the input image:', torch_img.shape)

# visualising the image that will be input to the network
img_vis = torch_img.numpy().squeeze().transpose(1, 2, 0) * std + mean
img_vis = np.maximum(np.minimum(img_vis, 1), 0)
plt.imshow(img_vis)
plt.axis('off')
plt.show()

Shape of the input image: torch.Size([1, 3, 224, 224])

[Figure: the resized input image]
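The visualisation cell above undoes the normalisation before plotting. The sketch below shows on a tiny random image why multiplying by `std` and adding `mean` recovers the original pixel values. The mean and std values are the usual ImageNet statistics and are an assumption here; the values returned by `activation_loader.normalise_mean_std` may differ between networks.

```python
import numpy as np

# assumed ImageNet statistics (one value per colour channel)
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# stand-in image with values in [0, 1], laid out as height x width x channel
rng = np.random.default_rng(0)
img_hwc = rng.random((4, 4, 3))

# what ToTensor followed by Normalize does: move channels first,
# then compute (x - mean) / std separately for every channel
img_chw = (img_hwc.transpose(2, 0, 1) - mean[:, None, None]) / std[:, None, None]

# the inverse used for plotting: move channels last, then x * std + mean,
# which broadcasts over the last (channel) axis
recovered = img_chw.transpose(1, 2, 0) * std + mean
# np.clip has the same effect as nesting np.minimum inside np.maximum
recovered = np.clip(recovered, 0, 1)

print(np.allclose(recovered, img_hwc))
```

The final clipping only guards against values drifting slightly outside [0, 1] due to floating-point rounding, which `plt.imshow` would otherwise warn about.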

We can now load activation maps for our image. Note that activation_loader is like any other torch.nn.Module and is callable. In this example, we feed the network a single image, but a batch of several images can be passed as well.

# loading the activation maps
activation_maps = activation_loader(torch_img)
print('Layers whose activation maps are stored, with the corresponding shapes:')
for layer_name, activations in activation_maps.items():
    print('\tLayer: %s' % layer_name, '\tshape:', activations.shape)

Layers whose activation maps are stored, with the corresponding shapes:
	Layer: feature0 	shape: torch.Size([1, 64, 55, 55])
	Layer: feature1 	shape: torch.Size([1, 64, 55, 55])
	Layer: feature2 	shape: torch.Size([1, 64, 27, 27])
	Layer: feature3 	shape: torch.Size([1, 192, 27, 27])
	Layer: feature4 	shape: torch.Size([1, 192, 27, 27])
	Layer: feature5 	shape: torch.Size([1, 192, 13, 13])
	Layer: feature6 	shape: torch.Size([1, 384, 13, 13])
	Layer: feature7 	shape: torch.Size([1, 384, 13, 13])
	Layer: feature8 	shape: torch.Size([1, 256, 13, 13])
	Layer: feature9 	shape: torch.Size([1, 256, 13, 13])
	Layer: feature10 	shape: torch.Size([1, 256, 13, 13])
	Layer: feature11 	shape: torch.Size([1, 256, 13, 13])
	Layer: feature12 	shape: torch.Size([1, 256, 6, 6])
	Layer: classifier1 	shape: torch.Size([1, 4096])
	Layer: classifier2 	shape: torch.Size([1, 4096])
	Layer: classifier4 	shape: torch.Size([1, 4096])
	Layer: classifier5 	shape: torch.Size([1, 4096])
	Layer: fc 	shape: torch.Size([1, 1000])

The output above lists the layers whose activation maps were loaded, together with the shape of each map:

The first dimension corresponds to the batch size. It is 1 in this example because we fed the network a single image.
The featureX layers have three further dimensions: the number of kernels (channels), the spatial height, and the spatial width.
The classifierX and fc layers are one-dimensional vectors per image.
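The spatial sizes in the listing follow from the standard output-size formula for convolution and pooling, floor((size + 2 * padding - kernel) / stride) + 1. The sketch below reproduces them for a 224-pixel input. The kernel, stride, and padding values are assumed from torchvision's AlexNet definition rather than read from osculari, and the ReLU layers (feature1, 4, 7, 9, 11) are left out because they do not change the spatial size.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer name, kernel, stride, padding), assumed from torchvision's AlexNet
alexnet_spatial_ops = [
    ('feature0', 11, 4, 2),   # convolution
    ('feature2', 3, 2, 0),    # max pooling
    ('feature3', 5, 1, 2),    # convolution
    ('feature5', 3, 2, 0),    # max pooling
    ('feature6', 3, 1, 1),    # convolution
    ('feature8', 3, 1, 1),    # convolution
    ('feature10', 3, 1, 1),   # convolution
    ('feature12', 3, 2, 0),   # max pooling
]

size = 224  # input height and width
sizes = {}
for name, kernel, stride, padding in alexnet_spatial_ops:
    size = conv_out_size(size, kernel, stride, padding)
    sizes[name] = size

for name, out_size in sizes.items():
    print(name, out_size)
```

The computed sizes (55, 27, 27, 13, 13, 13, 13, 6) match the height and width reported for the corresponding layers in the listing.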

Visualising activations

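Let’s visualise the activation maps of the first twelve kernels in each featureX layer. Layers whose output is a one-dimensional vector (classifierX and fc) have no spatial map and are skipped.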

for layer_name, activations in activation_maps.items():
    if len(activations.shape) <= 2:
        continue
    fig = plt.figure(figsize=(16, 2))
    for i in range(12):
        ax = fig.add_subplot(1, 12, i+1)
        ax.matshow(activations[0, i].detach().numpy(), cmap='gray')
        ax.axis('off')
    fig.suptitle('Layer: %s' % layer_name, fontsize=24)
    fig.tight_layout()

[Figures: one row per layer, feature0 to feature12, each showing the activation maps of the first twelve kernels]

Vision Transformer

Let’s now look at the activation maps of a vision transformer, using the same input image.

architecture = 'vit_b_32'        # network's architecture
weights = 'vit_b_32'             # the pretrained weights
readout_kwargs = {               # parameters for loading activations from the pretrained network
    'architecture': architecture, 
    'weights': weights,
    'layers': models.available_layers(architecture)
}
activation_loader = models.ActivationLoader(**readout_kwargs)

# loading the activation maps
activation_maps = activation_loader(torch_img)
print('Layers whose activation maps are stored, with the corresponding shapes:')
for layer_name, activations in activation_maps.items():
    print('\tLayer: %s' % layer_name, '\tshape:', activations.shape)

Layers whose activation maps are stored, with the corresponding shapes:
	Layer: conv_proj 	shape: torch.Size([1, 768, 7, 7])
	Layer: block0 	shape: torch.Size([1, 50, 768])
	Layer: block1 	shape: torch.Size([1, 50, 768])
	Layer: block2 	shape: torch.Size([1, 50, 768])
	Layer: block3 	shape: torch.Size([1, 50, 768])
	Layer: block4 	shape: torch.Size([1, 50, 768])
	Layer: block5 	shape: torch.Size([1, 50, 768])
	Layer: block6 	shape: torch.Size([1, 50, 768])
	Layer: block7 	shape: torch.Size([1, 50, 768])
	Layer: block8 	shape: torch.Size([1, 50, 768])
	Layer: block9 	shape: torch.Size([1, 50, 768])
	Layer: block10 	shape: torch.Size([1, 50, 768])
	Layer: block11 	shape: torch.Size([1, 50, 768])
	Layer: fc 	shape: torch.Size([1, 1000])

The output above lists the layers whose activation maps were loaded, together with the shape of each map:

The first dimension corresponds to the batch size. It is 1 in this example because we fed the network a single image.
The conv_proj layer has 768 kernels, each producing a map of spatial resolution \(7 \times 7\).
The blockX layers each output a 50-by-768 matrix, that is, 50 tokens with 768 features each. The first token is the “[class]” token and the other 49 correspond to the 7-by-7 grid of image patches (a 224-pixel image split into 32-pixel patches gives 7 patches per side).
The fc layer is a one-dimensional vector of 1000 elements.
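To view the blockX outputs as spatial maps, the class token can be separated from the patch tokens and the patch tokens rearranged into a 7-by-7 grid. The sketch below does this on a random stand-in array with the same shape as a blockX output. It assumes that the class token comes first and that patch tokens are stored row by row, as in torchvision's ViT implementation; the variable names are illustrative only.

```python
import numpy as np

img_size, patch_size, hidden_dim = 224, 32, 768
grid = img_size // patch_size          # patches per side
num_tokens = grid * grid + 1           # patch tokens plus one class token

# random stand-in for one blockX output: (batch, tokens, features)
rng = np.random.default_rng(0)
block_out = rng.standard_normal((1, num_tokens, hidden_dim))

# assumed layout: class token first, then patches in row-major order
class_token = block_out[:, 0]          # (batch, features)
patch_tokens = block_out[:, 1:]        # (batch, grid * grid, features)

# rearrange to (batch, features, height, width), like a convolutional layer
patch_maps = patch_tokens.reshape(1, grid, grid, hidden_dim).transpose(0, 3, 1, 2)

print(class_token.shape, patch_tokens.shape, patch_maps.shape)
```

With this layout, `patch_maps[0, i]` is a 7-by-7 map for feature `i` and can be plotted in the same way as the AlexNet featureX maps.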