https://drive.google.com/drive/u/0/folders/19snI8VUmIHmxqCOv3Apewk19xD0ns929
Date
May 8, 2021, 11:00 AM – 12:00 PM (PDT)
Where
Video call link: https://meet.google.com/gny-cvaa-wyp
Format
Paper Title
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Abstract
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
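For discussion, a minimal PyTorch sketch of the core idea: split the image into 16x16 patches, linearly embed them, prepend a class token, and run a standard Transformer encoder. The patch size follows the title; the TinyViT name and all hyperparameters are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Sketch of the ViT idea; dimensions are illustrative, not the paper's."""
    def __init__(self, image_size=224, patch=16, dim=192, depth=4, heads=3, classes=1000):
        super().__init__()
        n = (image_size // patch) ** 2
        # A strided conv is a common way to implement the linear patch embedding.
        self.to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))       # learnable [CLS] token
        self.pos = nn.Parameter(torch.zeros(1, n + 1, dim))   # learnable position embeddings
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, classes)

    def forward(self, x):                                     # x: (B, 3, H, W)
        p = self.to_patches(x).flatten(2).transpose(1, 2)     # (B, N, dim) patch tokens
        p = torch.cat([self.cls.expand(len(x), -1, -1), p], dim=1) + self.pos
        return self.head(self.encoder(p)[:, 0])               # classify from [CLS]
```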
Date
May 15, 2021, 11:00 AM – 12:00 PM (PDT)
Format
Assigned Roles
Reviewer: @ashima @Arushi Rai
Archaeologist: @hari.hp12
Researcher: @Sahanave
Practitioner: @Ben Pikus [CV_MOD]
Hacker: @pcuenq
Paper Title
Space-Time Correspondence as a Contrastive Random Walk
Abstract
This paper proposes a simple self-supervised approach for learning a representation for visual correspondence from raw video. We cast correspondence as prediction of links in a space-time graph constructed from video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a representation in which pairwise similarity defines transition probability of a random walk, so that long-range correspondence is computed as a walk along the graph. We optimize the representation to place high probability along paths of similarity. Targets for learning are formed without supervision, by cycle-consistency: the objective is to maximize the likelihood of returning to the initial node when walking along a graph constructed from a palindrome of frames. Thus, a single path-level constraint implicitly supervises chains of intermediate comparisons. When used as a similarity metric without adaptation, the learned representation outperforms the self-supervised state-of-the-art on label propagation tasks involving objects, semantic parts, and pose. Moreover, we demonstrate that a technique we call edge dropout, as well as self-supervised adaptation at test-time, further improve transfer for object-centric correspondence.
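A minimal PyTorch sketch of the cycle-consistency objective described above, assuming per-frame patch embeddings have already been computed; the function name, temperature value, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cycle_walk_loss(feats, tau=0.07):
    """feats: (T, N, D) patch embeddings for T frames, N nodes per frame.
    Walk along the palindrome 0..T-1..0 and ask each node to return to itself."""
    feats = F.normalize(feats, dim=-1)                    # cosine similarity via dot product
    T, N, _ = feats.shape
    order = list(range(T)) + list(range(T - 2, -1, -1))   # palindrome of frame indices
    walk = torch.eye(N, device=feats.device)
    for a, b in zip(order[:-1], order[1:]):
        sim = feats[a] @ feats[b].T / tau                 # pairwise node similarities
        walk = walk @ torch.softmax(sim, dim=-1)          # chain transition probabilities
    # Maximize the likelihood of each node returning to its starting position.
    return F.nll_loss(torch.log(walk + 1e-8), torch.arange(N, device=feats.device))
```

A single call to this loss on a short clip's features supervises every intermediate frame-to-frame comparison along the walk, which is the path-level constraint the abstract describes.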
Date
June 5, 2021, 11:00 AM – 12:00 PM (PDT)
Where
Video call link: https://meet.google.com/cnv-qsha-hak
Format
Paper Title:
Differentiable Patch Selection for Image Recognition
Abstract:
Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high resolution images. Our method may be interfaced with any downstream neural network, is able to aggregate information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.
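A hedged sketch of the selection step: the paper builds on a perturbed-maximum Top-K estimator, but a simpler straight-through Top-K conveys the same end-to-end trainability (the function name and shapes are assumptions, not the authors' code).

```python
import torch

def straight_through_topk(scores, k):
    """scores: (B, N) relevance score per patch.
    Forward pass uses a hard 0/1 mask; backward pass uses softmax gradients."""
    soft = torch.softmax(scores, dim=-1)                     # used only for gradients
    hard = torch.zeros_like(scores)
    hard.scatter_(-1, scores.topk(k, dim=-1).indices, 1.0)   # hard top-k selection mask
    return hard + soft - soft.detach()                       # straight-through estimator

# Usage sketch: weight patch features by the mask so gradients reach the scorer.
# selected = straight_through_topk(scorer(patches), k=8).unsqueeze(-1) * patch_feats
```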
Roles:
Reviewer - @Vivek#3397
Archaeologist - Arushi
Practitioner - @Salman#6748
Researcher - @hari.hp12#5526
Hacker - @AdrianDalessandro#3778
Date
June 12, 2021, 11:00 AM – 12:00 PM (PDT)
Where
Video call link: https://meet.google.com/cnv-qsha-hak
Format
Paper Title:
What is being transferred in transfer learning?
Abstract:
One desired capability for machines is the ability to transfer their knowledge of one domain to another where data is (usually) scarce. Despite ample adaptation of transfer learning in various deep learning applications, we yet do not understand what enables a successful transfer and which part of the network is responsible for that. In this paper, we provide new tools and analyses to address these fundamental questions. Through a series of analyses on transferring to block-shuffled images, we separate the effect of feature reuse from learning low-level statistics of data and show that some benefit of transfer learning comes from the latter. We present that when training from pre-trained weights, the model stays in the same basin in the loss landscape and different instances of such model are similar in feature space and close in parameter space.
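A small PyTorch sketch of the block-shuffling probe the abstract mentions: permuting non-overlapping blocks destroys global structure while preserving low-level image statistics (the function name and tensor layout are assumptions).

```python
import torch

def block_shuffle(img, block_size):
    """Randomly permute non-overlapping spatial blocks of an image tensor (C, H, W).
    Assumes block_size evenly divides H and W."""
    c, h, w = img.shape
    gh, gw = h // block_size, w // block_size
    blocks = img.reshape(c, gh, block_size, gw, block_size)
    blocks = blocks.permute(1, 3, 0, 2, 4).reshape(gh * gw, c, block_size, block_size)
    blocks = blocks[torch.randperm(gh * gw)]                 # shuffle the blocks
    blocks = blocks.reshape(gh, gw, c, block_size, block_size)
    return blocks.permute(2, 0, 3, 1, 4).reshape(c, h, w)    # reassemble the image
```

Fine-tuning on images shuffled at decreasing block sizes is how the paper teases apart feature reuse from learning low-level statistics.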
Roles:
Reviewer - johniac#6093
Archaeologist - AdrianDalessandro#3778
Practitioner - Arushi?
Researcher - picpic#4916
Hacker - Arushi?
Date
July 10, 2021, 11:00 AM – 12:00 PM (PDT)
Where
Video call link: meet.google.com/ayz-mmgz-fvo
Dial-in: (US) +1 669-234-8179
PIN: 195 166 859#
Format
Paper Title:
Deep Residual Learning for Image Recognition [2016]
Abstract:
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers (8x deeper than VGG nets) but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.
The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
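A minimal PyTorch sketch of the residual reformulation for the stride-1, equal-channel case: the stacked layers learn F(x) and the block outputs F(x) + x via an identity shortcut (layer sizes are illustrative).

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block: learn the residual F(x) rather than the full mapping."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity shortcut: the key idea of the paper
```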
Roles:
Reviewer: @picpic
Archaeologist: @johniac
Researcher:
Practitioner:
Hacker: @Luke_66
Be sure to fill out the form soon if you would like to take on a role for any future session.
Date
Links Shared
This upcoming session we are holding a personal project workshop (not a paper reading session) where members can present their current work in progress and solicit feedback! Please fill out the following form to present: https://airtable.com/shrBrhD6TSTqaLETM
Date
Where
Format
Personal Project Workshop - Members have an opportunity to present ongoing work and collect feedback.
Date
Where
Format
Paper Title:
Abstract:
Roles:
Reviewer - @Adriano D. (he/him) [CV_MOD]
Archaeologist - @jaberkow
Practitioner - @Gaurav Kakoti
Researcher - @Arushi Rai [CV_MOD]
Hacker -
Date
Where
Format
Paper Title:
Abstract:
Roles:
Reviewer -
Archaeologist -
Practitioner -
Researcher -
Hacker -
Date
Where