Materials and Recordings

Former Session Metadata:

📅 Date

May 8th, 2021 11:00 AM (PDT) → 12:00 PM

🗺️ Where

Video call link:


Role Playing

Paper Title

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale


While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
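As a warm-up for discussion, here is a minimal sketch of the "sequences of image patches" idea from the abstract, in plain Python. The function name `image_to_patches` and the toy 4x4 image are illustrative assumptions, not code from the paper. The full model also linearly projects each patch and adds position embeddings, which this sketch leaves out.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (a list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    # Walk the image in raster order, one patch_size x patch_size tile at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy single-channel 4x4 image (hypothetical data) split into 2x2 patches.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = image_to_patches(toy_image, patch_size=2)
print(len(tokens), tokens[0])  # prints: 4 [0, 1, 4, 5]
```

Each flattened patch plays the role of one "word" in the sequence the Transformer consumes. The title's 16x16 refers to the patch size in pixels.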

📅 Date

May 15th, 2021 11:00 AM (PDT) → 12:00 PM


Role Playing

Assigned Roles

Reviewer: @ashima @Arushi Rai

Archaeologist: @hari.hp12

Researcher: @Sahanave

Practitioner: @Ben Pikus [CV_MOD]

Hacker: @pcuenq

Paper Title

Space-Time Correspondence as a Contrastive Random Walk


This paper proposes a simple self-supervised approach for learning a representation for visual correspondence from raw video. We cast correspondence as prediction of links in a space-time graph constructed from video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a representation in which pairwise similarity defines transition probability of a random walk, so that long-range correspondence is computed as a walk along the graph. We optimize the representation to place high probability along paths of similarity. Targets for learning are formed without supervision, by cycle-consistency: the objective is to maximize the likelihood of returning to the initial node when walking along a graph constructed from a palindrome of frames. Thus, a single path-level constraint implicitly supervises chains of intermediate comparisons. When used as a similarity metric without adaptation, the learned representation outperforms the self-supervised state-of-the-art on label propagation tasks involving objects, semantic parts, and pose. Moreover, we demonstrate that a technique we call edge dropout, as well as self-supervised adaptation at test-time, further improve transfer for object-centric correspondence.
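The cycle-consistency objective described above can be sketched roughly as follows, in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the helper names, the dot-product similarity, and the temperature value of 0.07 are all chosen here for illustration. Each frame is a list of node embeddings, and at least two frames are required.

```python
import math


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def transition(nodes_from, nodes_to, temperature=0.07):
    """Row-stochastic matrix: softmax over dot-product similarities to the next frame."""
    return [
        softmax([sum(a * b for a, b in zip(u, v)) / temperature for v in nodes_to])
        for u in nodes_from
    ]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cycle_walk_loss(frames, temperature=0.07):
    """Negative log-probability of returning to the start node on a palindrome walk."""
    palindrome = frames + frames[-2::-1]  # e.g. [f0, f1, f2] -> [f0, f1, f2, f1, f0]
    walk = transition(palindrome[0], palindrome[1], temperature)
    for current, following in zip(palindrome[1:], palindrome[2:]):
        walk = matmul(walk, transition(current, following, temperature))
    return -sum(math.log(walk[i][i]) for i in range(len(walk))) / len(walk)


# Hypothetical toy data: two clearly distinct nodes that do not move across three frames.
frame = [[1.0, 0.0], [0.0, 1.0]]
print(cycle_walk_loss([frame, frame, frame]))
```

When the nodes are easy to tell apart, the walk returns to its start with probability close to 1 and the loss is near zero. When every node looks the same, the walk is uniform and the loss rises to log(number of nodes).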

📅 Date

June 5th, 2021 11:00 AM - 12:00 PM PST

πŸ—ΊοΈ Where

Video call link:


Copy of Role Playing

Paper Title:

Differentiable Patch Selection for Image Recognition


Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high resolution images. Our method may be interfaced with any downstream neural network, is able to aggregate information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.


Reviewer - @Vivek#3397

Archaeologist - Arushi

Practitioner - @Salman#6748

Researcher - @hari.hp12#5526

Hacker - @AdrianDalessandro#3778

📅 Date

June 12th, 2021 11:00 AM - 12:00 PM PST

πŸ—ΊοΈ Where

Video call link:


Copy of Role Playing

Paper Title:

What is being transferred in transfer learning?


One desired capability for machines is the ability to transfer their knowledge of one domain to another where data is (usually) scarce. Despite ample adaptation of transfer learning in various deep learning applications, we yet do not understand what enables a successful transfer and which part of the network is responsible for that. In this paper, we provide new tools and analyses to address these fundamental questions. Through a series of analyses on transferring to block-shuffled images, we separate the effect of feature reuse from learning low-level statistics of data and show that some benefit of transfer learning comes from the latter. We present that when training from pre-trained weights, the model stays in the same basin in the loss landscape and different instances of such model are similar in feature space and close in parameter space.

🧸 Roles:

Reviewer - johniac#6093

Archaeologist - AdrianDalessandro#3778

Practitioner - Arushi?

Researcher - picpic#4916

Hacker - Arushi?

📅 Date

July 10th, 2021 11:00 AM - 12:00 PM PST

πŸ—ΊοΈ Where

Video call link:

Dial-in: (US) +1 669-234-8179

PIN: 195 166 859#


Copy of Role Playing

Paper Title:

ResNet paper: Deep residual learning for image recognition [2016]


Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.

The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
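To make "learning residual functions with reference to the layer inputs" concrete, here is a minimal sketch in plain Python. It assumes two small fully connected layers as a stand-in for the convolution and normalization layers of the real networks. The names `residual_block`, `weights_1`, and `weights_2` are hypothetical.

```python
def relu(vector):
    return [max(0.0, value) for value in vector]


def linear(weights, vector):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, weights_1, weights_2):
    """Compute relu(F(x) + x), where F is the two-layer residual branch."""
    residual = linear(weights_2, relu(linear(weights_1, x)))  # F(x)
    return relu([r + x_i for r, x_i in zip(residual, x)])     # skip connection adds x back


# With an all-zero residual branch, the block passes a non-negative input through unchanged.
zero = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block([1.0, 2.0], zero, zero)
print(identity_out)  # prints: [1.0, 2.0]
```

The point to notice is that the stacked layers only have to model the difference from the input. An identity mapping corresponds to driving the residual branch toward zero rather than fitting the identity from scratch.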

🧸 Roles:

Reviewer: @picpic

Archaeologist: @johniac



Hacker: @Luke_66

Be sure to fill out the Copy of Role Assignment Form soon if you would like to do a role for any future session.

📅 Date

July 17th, 2021 11:00 AM - 12:00 PM PST

🔗 Links Shared

In this upcoming session we are holding a personal project workshop (not a paper reading session) where members can present their work in progress and solicit feedback! Please fill out the following form to present:

📅 Date

July 17th and July 24th, 2021 11:00 AM - 12:00 PM PST

πŸ—ΊοΈ Where

Video call link:


Personal Project Workshop - Members have an opportunity to present ongoing work and collect feedback.

Copy of Lineup

📅 Date

August 7th, 2021 11:00 AM - 12:00 PM PST

πŸ—ΊοΈ Where

Video call link:


Copy of Role Playing

Paper Title:

Barlow Twins: Self-Supervised Learning via Redundancy Reduction


Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn embeddings which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the embedding vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow's redundancy-reduction principle applied to a pair of identical networks. Barlow Twins does not require large batches nor asymmetry between the network twins such as a predictor network, gradient stopping, or a moving average on the weight updates. Intriguingly it benefits from very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.
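The objective described above can be sketched in a few lines of plain Python. This is a toy version for discussion, not the authors' code: the function names are hypothetical, and the off-diagonal weight of 0.005 is an assumed default. Inputs are two batches of embeddings (lists of rows), one per distorted view.

```python
import math


def standardize_columns(batch):
    """Return each embedding dimension as a column with zero mean and unit std."""
    n, dims = len(batch), len(batch[0])
    columns = []
    for d in range(dims):
        values = [row[d] for row in batch]
        mean = sum(values) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        columns.append([(v - mean) / (std + 1e-12) for v in values])
    return columns


def barlow_twins_loss(z_a, z_b, off_diag_weight=0.005):
    """Push the cross-correlation matrix of the two views toward the identity."""
    cols_a = standardize_columns(z_a)
    cols_b = standardize_columns(z_b)
    n = len(z_a)
    loss = 0.0
    for i, col_a in enumerate(cols_a):
        for j, col_b in enumerate(cols_b):
            c_ij = sum(a * b for a, b in zip(col_a, col_b)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2              # invariance: diagonal should be 1
            else:
                loss += off_diag_weight * c_ij ** 2    # redundancy: off-diagonal should be 0
    return loss


# Hypothetical toy embeddings: identical views with decorrelated dimensions.
view = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(barlow_twins_loss(view, view))
```

A constant (collapsed) embedding cannot put ones on the diagonal of the cross-correlation matrix, which is the sense in which the abstract says the objective naturally avoids trivial solutions.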

🧸 Roles:

Reviewer - @Adriano D. (he/him) [CV_MOD]

Archaeologist - @jaberkow

Practitioner - @Gaurav Kakoti

Researcher - @Arushi Rai [CV_MOD]

Hacker -

📅 Date

August 14th, 2021 11:00 AM - 12:00 PM PST

πŸ—ΊοΈ Where

Video call link:
