Work Experience

  • 05/2022 - Present

    Research Intern

    Google - TensorFlow Model Garden Team, Mountain View, CA

  • 01/2022 - 05/2022

    Research Assistant

    University of Texas at Dallas

  • 08/2021 - 12/2021

    Teaching Assistant

    University of Texas at Dallas

  • 05/2021 - 08/2021

    AI Research Intern

    NVIDIA Autonomous Vehicles Team, Santa Clara, CA

  • 01/2021 - 05/2021

    Teaching Assistant

    University of Texas at Dallas

  • 05/2020 - 01/2021

    AI Research Intern

    NVIDIA Autonomous Vehicles Team, Santa Clara, CA

  • 08/2019 - 05/2020

    Teaching Assistant

    University of Texas at Dallas

  • 03/2019 - 08/2019

    Research Assistant

    Advisor: Prof. Stefanos Nikolaidis, ICAROS lab, University of Southern California

  • 04/2018 - 12/2018

    Machine Learning Engineer

    AitoeLabs, India

  • 12/2017 - 06/2018

    Research Intern

    Advisor: Prof. Ganesh Ramakrishnan, Indian Institute of Technology, Bombay, India

  • 05/2017 - 07/2017

    Eklavya Research Intern

    Advisor: Prof. Deepak B. Phatak, Indian Institute of Technology, Bombay, India

  • 12/2015 - 01/2016

    Research Intern

    Tata Consultancy Services, Innovation Labs, Mumbai, India

Education

  • Doctor of Philosophy 2023

    Ph.D. in Computer Engineering

    University of Texas at Dallas

  • Master of Science 2019

    MS in Computer Science

    University of Southern California

    Transferred to the University of Texas at Dallas for research in optimization and data subset selection

  • Bachelor of Technology 2018

    B.Tech in Computer Science and Engineering

    SGGS Institute of Engineering and Technology

    (An Autonomous Institute of the Government of Maharashtra)

  • Senior Secondary 2014

    All India Senior School Certificate Exam

    M.H High School, Thane

Achievements and Awards

  • Apr 2022
    Jan Van der Ziel Fellowship
    A prestigious merit-based fellowship awarded to one Ph.D. student at the University of Texas at Dallas.
  • Apr 2022
    Runner Up at the UT Dallas Three Minute Thesis (3MT) competition
    The 3MT competition, started by The University of Queensland, challenges PhD students to present their research to a general audience in under three minutes using just one PowerPoint slide.
  • Apr 2018
    Best Student Award
    Awarded by Tata Consultancy Services for being the most outstanding student.
  • Apr 2018
    Best Project Award
    Awarded by Tata Consultancy Services for the work done for the Indian Space Research Organisation (ISRO) on 'Content Based Image Retrieval from AWiFS Images Repository of IRS Resourcesat-2 Satellite Based on Water Bodies and Burnt Areas'.
  • Dec 2017
    Honorable Mention at ACM ICPC 2017
    Qualified for ACM ICPC 2017 Regionals
  • Aug 2017
    Departmental Gold Medalist - Junior Year
    Ranked 1/160 in the Computer Science & Engineering department in junior year.

Projects and Open-Source Contributions

  • DISTIL: Deep dIverSified inTeractIve Learning

    DISTIL implements a number of state-of-the-art active learning algorithms.

  • TRUST: TaRgeted sUbSet selecTion

    TRUST supports a number of algorithms for targeted selection, which provide a mechanism for including additional information, in the form of data, to prioritize the semantics of the selection.

  • Vis-DSS: Visual Data Selection and Summarization

    An open-source toolkit for Image and Video Summarization, Data Subset Selection and Diversified Active Learning using Submodular functions.

  • Jensen: Convex Optimization and ML toolkit

    A C++ toolkit with API support for Convex Optimization and Machine Learning.

  • Massive scale search and recognition (Bhopal Police, Madhya Pradesh, India)

    Implemented person search, face search, face recognition, and text search on thousands of hours of footage from surveillance cameras for the police department in Bhopal.

  • Compliance and Quality Monitoring System (Ministry of Rural Development)

    Led a team of four people to develop a product for the Ministry of Rural Development. The product comprised four classroom compliance checks that enabled the user to monitor how long classes were actually conducted during a day, the attendance in each class, the instructor who taught each class, and the number of people wearing uniforms.

  • CBIR on AWiFS Data from Large Satellite Image Repository (Indian Space Research Organization)

    Worked extensively on the analytics, including machine learning, feature engineering, and code optimization, to identify water bodies and burnt areas in satellite images using multiple algorithms.

Publications

TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices using Submodular Mutual Information

Suraj Kothawade, Saikat Ghosh, Sumit Shekhar, Yu Xiang, and Rishabh Iyer
Preprint: arXiv:2010.08593.

Abstract

Deep neural network-based object detectors have shown great success in a variety of domains, such as autonomous vehicles and biomedical imaging. Their success is known to depend on a large amount of data from the domain of interest. While deep models often perform well in terms of overall accuracy, they often underperform on rare yet critical data slices. For example, slices like "motorcycle at night" or "bicycle at night" are rare but critical for self-driving applications, and false negatives on such slices could result in failures and accidents. Active learning (AL) is a well-known paradigm for incrementally and adaptively building training datasets with a human in the loop. However, current AL-based acquisition functions are not well equipped to tackle real-world datasets with rare slices, since they rely on uncertainty scores or global image descriptors. We propose TALISMAN, a novel framework for Targeted Active Learning for object detectIon with rare slices using Submodular MutuAl iNformation. Our method uses submodular mutual information functions instantiated with features of the regions of interest (RoIs) to efficiently target and acquire data points with rare slices. We evaluate our framework on the standard PASCAL VOC07+12 dataset and on BDD100K, a real-world self-driving dataset. We observe that TALISMAN outperforms other methods in terms of average precision on rare slices and in terms of mAP.
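The abstract says the submodular mutual information functions are instantiated with RoI features but does not say how image-level similarities are formed. A minimal sketch, assuming each image is represented by a set of RoI feature vectors and that an image's similarity to a rare-slice query is its best cosine match over all RoI pairs (an illustrative choice, not necessarily the paper's; the function names are hypothetical):

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of x to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def roi_similarity(query_rois: np.ndarray, image_rois: np.ndarray) -> float:
    """Assumed image-to-query similarity: best cosine match over all
    (query RoI, image RoI) pairs.

    query_rois: (q, d) RoI features of one rare-slice exemplar.
    image_rois: (r, d) RoI features (e.g. proposals) of one unlabeled image.
    """
    sims = l2_normalize(query_rois) @ l2_normalize(image_rois).T  # (q, r)
    return float(sims.max())


def query_similarity_matrix(query_rois, pool_rois) -> np.ndarray:
    """(num_pool_images, num_queries) kernel that an SMI function could use.

    query_rois: list of (q_j, d) arrays, one per query exemplar.
    pool_rois:  list of (r_i, d) arrays, one per unlabeled image.
    """
    return np.array([[roi_similarity(q, img) for q in query_rois]
                     for img in pool_rois])
```

Matching at the RoI level lets a small rare object drive the score even when the rest of the image looks ordinary, which a single global image descriptor would tend to wash out.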

PROBE: Deep Submodular Networks for Subset Selection

Suraj Kothawade, and Rishabh Iyer
Preprint: arXiv:2010.08593.

Abstract

Deep models are becoming increasingly prevalent in summarization problems (e.g., of documents, videos, and images) due to their ability to learn complex feature interactions and representations. However, they do not model characteristics such as diversity, representation, and coverage, which are also very important for summarization tasks. Submodular functions, on the other hand, naturally model these characteristics because of their diminishing returns property. Most approaches for modeling and learning submodular functions rely on very simple models, such as weighted mixtures of submodular functions. Unfortunately, these models only learn the relative importance of the different submodular functions (such as diversity, representation, or importance) and cannot learn the more complex feature representations that are often required for state-of-the-art performance. We propose Deep Submodular Networks (DSN), an end-to-end learning framework that facilitates the learning of more complex features and richer functions, crafted to better model all aspects of summarization. The DSN framework can be used to learn features appropriate for summarization from scratch. We demonstrate the utility of DSNs on both generic and query-focused image-collection summarization and show significant improvement over the state of the art. In particular, we show that DSNs outperform simple mixture models using off-the-shelf features. We also show that a DSN with just four submodular functions, trained end to end, performs comparably to the state-of-the-art mixture model with a hand-crafted set of 594 components and outperforms other methods for image-collection summarization.
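The abstract contrasts richer learned functions with weighted mixtures but gives no formula. As general background, a deep submodular function stacks concave, non-decreasing activations over non-negative weighted sums, which keeps the result monotone submodular at any depth. The sketch below is a generic two-layer instance with hypothetical names and arbitrary weights, not the DSN architecture from the paper:

```python
import numpy as np


def deep_submodular_fn(A, feats, W1, w2):
    """Evaluate a generic two-layer deep submodular function on subset A.

    feats: (n, d) non-negative item features.
    W1:    (k, d) non-negative first-layer weights.
    w2:    (k,)   non-negative second-layer weights.
    Concave, non-decreasing activations (sqrt, log1p) over non-negative
    weighted sums keep the function monotone submodular.
    """
    idx = sorted(A)
    modular = feats[idx].sum(axis=0) if idx else np.zeros(feats.shape[1])
    hidden = np.sqrt(W1 @ modular)        # layer 1: concave of modular sums
    return float(np.log1p(w2 @ hidden))   # layer 2: concave of a non-negative mix
```

Learning W1 and w2 (and the features) end to end is what separates such a model from a fixed weighted mixture, where only the top-level weights are trained.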

AUTO-DISCERN: Autonomous Driving Using Common Sense Reasoning

Suraj Kothawade, Vinaya Khandelwal, Kinjal Basu, Huaduo Wang, Gopal Gupta
Preprint: arXiv:2110.13606.

Abstract

Driving an automobile involves observing the surroundings and then making driving decisions (steer, brake, coast, etc.) based on these observations. In autonomous driving, all of these tasks have to be automated. Autonomous driving technology has so far relied primarily on machine learning techniques. We argue that the appropriate technology should be used for each task: while machine learning is good at observing and automatically understanding an automobile's surroundings, driving decisions are better automated via commonsense reasoning than via machine learning. In this paper, we (i) discuss how commonsense reasoning can be automated using answer set programming (ASP) and the goal-directed s(CASP) ASP system, and (ii) develop the AUTO-DISCERN system, which uses this technology to automate decision-making in driving. The goal of our research is to develop an autonomous driving system that works by simulating the mind of a human driver. Since its driving decisions are based on human-style reasoning, they are explainable, their ethics can be ensured, and they will always be correct, provided the system modeling and system inputs are correct.

PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Subset Selection

Suraj Kothawade, Vishal Kaushal, Ganesh Ramakrishnan, Jeff Bilmes, Rishabh Iyer
Conference Paper: To appear in the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022).

Abstract

With ever-increasing dataset sizes, subset selection techniques are becoming increasingly important for a plethora of tasks. It is often necessary to guide the subset selection to achieve certain desiderata, which include focusing on or targeting certain data points while avoiding others. Examples of such problems include (i) targeted learning, where the goal is to find subsets with rare classes or rare attributes on which the model is underperforming, and (ii) guided summarization, where data (e.g., an image collection, text, a document, or a video) is summarized for quicker human consumption with specific additional user intent. Motivated by such applications, we present PRISM, a rich class of PaRameterIzed Submodular information Measures. Through novel functions and their parameterizations, PRISM offers a variety of modeling capabilities that enable a trade-off between desired qualities of a subset, like diversity or representation, and similarity/dissimilarity with a set of data points. We demonstrate how PRISM can be applied to the two real-world problems mentioned above, which require guided subset selection. In doing so, we show that PRISM interestingly generalizes some past work, reinforcing its broad utility. Through extensive experiments on diverse datasets, we demonstrate the superiority of PRISM over the state of the art in targeted learning and in guided image-collection summarization.
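The "avoiding others" side of guided selection can be illustrated with a conditional-gain function. A minimal sketch, assuming the facility-location conditional gain form f(A | P) = sum_i max(max_{j in A} s_ij - nu * max_{j in P} s_ij, 0), where P is a private set to stay away from; the helper names are hypothetical and this shows the idea rather than PRISM's implementation:

```python
import numpy as np


def flcg(A, S_vv, S_vp, nu=1.0):
    """Assumed facility-location conditional gain f(A | P).

    S_vv: (n, n) similarities among ground-set items.
    S_vp: (n, p) similarities between ground-set items and a non-empty private set P.
    nu:   how strongly coverage already provided by P is discounted.
    """
    if not A:
        return 0.0
    cover_A = S_vv[:, sorted(A)].max(axis=1)  # how well A covers each item
    cover_P = S_vp.max(axis=1)                # how well P already covers it
    return float(np.maximum(cover_A - nu * cover_P, 0.0).sum())


def greedy(f, n, budget):
    """Standard greedy maximization of a set function over range(n)."""
    A = set()
    for _ in range(budget):
        gains = {j: f(A | {j}) - f(A) for j in range(n) if j not in A}
        A.add(max(gains, key=gains.get))
    return A
```

Items that P already covers contribute nothing, so greedy selection drifts toward regions of the data that the private set does not represent.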

SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios

Suraj Kothawade, Nathan Beck, Krishnateja Killamsetty, Rishabh Iyer
Conference Paper: To appear at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Abstract

Active learning has proven useful for minimizing labeling costs by selecting the most informative samples. However, existing active learning methods do not work well in realistic scenarios such as class imbalance or rare classes, out-of-distribution data in the unlabeled set, and redundancy. In this work, we propose SIMILAR (Submodular Information Measures based actIve LeARning), a unified active learning framework that uses recently proposed submodular information measures (SIM) as acquisition functions. We argue that SIMILAR not only works in standard active learning but also easily extends to the realistic settings above, acting as a one-stop solution for active learning that scales to large real-world datasets. Empirically, we show that SIMILAR significantly outperforms existing active learning algorithms, by as much as ~5%-18% for rare classes and ~5%-10% for out-of-distribution data, on several image classification datasets such as CIFAR-10, MNIST, and ImageNet.
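One family of submodular information measures usable as an acquisition function is log-determinant mutual information between a candidate batch A and a query set Q. A minimal sketch, assuming the form I(A; Q) = log det(K_AA) - log det(K_AA - eta^2 K_AQ K_QQ^{-1} K_QA) with an RBF kernel plus a small ridge for numerical stability; the kernel, eta, and function names are illustrative assumptions, and in practice the batch would be grown greedily:

```python
import numpy as np


def rbf_kernel(X, gamma=1.0, ridge=1e-3):
    """Positive-definite RBF kernel; the ridge keeps determinants well defined."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq) + ridge * np.eye(len(X))


def logdet_mi(A, K, Q, eta=1.0):
    """Assumed log-determinant mutual information I(A; Q).

    K:    (m, m) positive-definite kernel over unlabeled items and queries.
    A, Q: disjoint lists of indices into K; eta <= 1 keeps the Schur term PD.
    """
    if not A:
        return 0.0
    K_AA = K[np.ix_(A, A)]
    K_AQ = K[np.ix_(A, Q)]
    K_QQ = K[np.ix_(Q, Q)]
    schur = K_AA - eta**2 * K_AQ @ np.linalg.solve(K_QQ, K_AQ.T)
    return float(np.linalg.slogdet(K_AA)[1] - np.linalg.slogdet(schur)[1])
```

For a single candidate this reduces to -log(1 - k_aq^2 / (k_aa k_qq)), so a point close to the query scores far higher than a distant one.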

Robotic Lime Picking by Considering Leaves as Permeable Obstacles

Heramb Nemlekar, Ziang Liu, Suraj Kothawade, Sherdil Niyaz, Barath Raghavan, Stefanos Nikolaidis
Conference Paper: International Conference on Intelligent Robots and Systems (IROS 2021).

Abstract

Robotic lime picking is challenging: lime plants have dense foliage, which makes it difficult for a robotic arm to grasp a lime without touching leaves. Existing approaches either do not consider leaves or treat them as obstacles to be avoided completely, which often results in undesirable or infeasible plans. We focus on reaching a lime in the presence of dense foliage by treating the leaves of a plant as 'permeable obstacles' with a collision cost. We then adapt the rapidly-exploring random tree star (RRT*) algorithm to fruit harvesting by incorporating the cost of collision with leaves into the path cost. To reduce the time required to find low-cost paths to the goal, we bias the growth of the tree using an artificial potential field (APF). We compare our proposed method with prior work in a 2-D environment and in a 6-DOF robot simulation. Our experiments and a real-world demonstration on a robotic lime-picking task demonstrate the applicability of our approach.
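The key modeling step is folding leaf contact into the path cost that RRT* minimizes. A minimal 2-D sketch, assuming leaves are modeled as disks and the collision cost is proportional to the length of path inside them; the disk model, leaf_weight, and function names are illustrative assumptions, and tree growth and APF biasing are omitted:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Leaf = Tuple[float, float, float]  # hypothetical leaf model: (center_x, center_y, radius)


def segment_cost(p: Point, q: Point, leaves: List[Leaf],
                 leaf_weight: float = 5.0, samples: int = 100) -> float:
    """Length of segment p->q plus a penalty proportional to the length of
    the segment lying inside any leaf disk (estimated by sampling)."""
    length = math.dist(p, q)
    inside = 0
    for i in range(samples):
        t = (i + 0.5) / samples  # midpoint rule along the segment
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in leaves):
            inside += 1
    return length + leaf_weight * length * inside / samples


def path_cost(path: List[Point], leaves: List[Leaf], **kw) -> float:
    """Cost RRT* would accumulate along a path: the sum of its segment costs."""
    return sum(segment_cost(a, b, leaves, **kw) for a, b in zip(path, path[1:]))
```

Because leaves are permeable rather than forbidden, a short path through foliage can still win when the detour around it is long; leaf_weight sets that trade-off.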

Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, et al.
Conference Paper: IEEE Winter Conference on Applications of Computer Vision (WACV 2019)

Abstract

Supervised machine learning based state-of-the-art computer vision techniques are generally data hungry. Curating data for them poses the challenges of expensive human labeling, inadequate computing resources, and long experiment turnaround times. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage, and representation and can be used to eliminate redundancy, lending itself well to training data subset selection. These functions can also improve the efficiency of active learning, further reducing human labeling effort by selecting a subset of the examples obtained using conventional uncertainty sampling techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training data subset selection and for reducing labeling effort. We demonstrate this across a variety of computer vision tasks, including Gender Recognition, Face Recognition, Scene Recognition, Object Detection, and Object Recognition. Our results show that diversity-based subset selection done in the right way can increase accuracy by up to 5-10% over existing baselines, particularly in settings where less training data is available. This allows complex machine learning models such as Convolutional Neural Networks to be trained with much less training data and labeling cost while incurring minimal performance loss.
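The two diversity models named above can be sketched with their standard greedy algorithms. This assumes a precomputed non-negative similarity matrix for Facility-Location and a distance matrix for max-min Dispersion; the function names are hypothetical and the paper's feature and similarity choices are not reproduced:

```python
import numpy as np


def facility_location_greedy(S, budget):
    """Greedy maximization of f(A) = sum_i max_{j in A} S[i, j].

    S: (n, n) non-negative similarity matrix. Rewards subsets that
    represent every item well (representation / coverage).
    """
    n = S.shape[0]
    selected, cover = [], np.zeros(n)
    for _ in range(budget):
        # marginal gain of column j = sum_i max(S[i, j] - cover[i], 0)
        gains = np.maximum(S - cover[:, None], 0).sum(axis=0)
        gains[selected] = -np.inf
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, S[:, j])
    return selected


def dispersion_greedy(D, budget, start=0):
    """Greedy max-min dispersion: repeatedly add the item farthest from
    everything selected so far. D: (n, n) float distance matrix."""
    selected = [start]
    while len(selected) < budget:
        min_dist = D[:, selected].min(axis=1)
        min_dist[selected] = -np.inf
        selected.append(int(np.argmax(min_dist)))
    return selected
```

In the active learning setting described above, either routine would be run on the candidates returned by uncertainty sampling to drop near-duplicates before labeling.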

A Framework towards Domain Specific Video Summarization

Vishal Kaushal, Sandeep Subramanian, Suraj Kothawade, Rishabh Iyer, Ganesh Ramakrishnan
Conference Paper: IEEE Winter Conference on Applications of Computer Vision (WACV 2019)

Abstract

In light of exponentially increasing video content, video summarization has recently attracted a lot of attention due to its ability to save time and storage. The characteristics of a good video summary depend on the particular domain in question. We propose a novel framework for domain-specific video summarization. Given a video from a particular domain, our system can produce a summary based on what is important for that domain, while also possessing other desired characteristics, such as representativeness, coverage, and diversity, as suited to that domain. Past related work has focused either on supervised approaches that rank snippets to produce a summary or on unsupervised approaches that generate the summary as a subset of snippets with the above characteristics. We look at the joint problem of learning the domain-specific importance of segments as well as the desired summary characteristics for that domain. Our studies show that the more efficient way of incorporating domain-specific relevance into a summary is to obtain ratings of shots rather than binary inclusion/exclusion information. We also argue that ratings can be seen as a unified representation of all possible ground truth summaries of a video, taking us one step closer to dealing with the challenges associated with multiple ground truth summaries of a video. We also propose a novel evaluation measure that is more naturally suited to assessing the quality of a video summary for the task at hand than F1-like measures. It leverages the ratings information and is richer in appropriately modeling the desirable and undesirable characteristics of a summary. Lastly, we release a gold-standard dataset for furthering research in domain-specific video summarization, which, to our knowledge, is the first dataset with long videos across several domains with rating annotations.

Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance

Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, et al.
Conference Paper: IEEE Winter Conference on Applications of Computer Vision (WACV 2019)

Abstract

This paper addresses the automatic summarization of videos in a unified manner. In particular, we propose a framework for multi-faceted summarization for extractive, query-based, and entity summarization (summarization at the level of entities like objects, scenes, humans, and faces in the video). We investigate several summarization models that capture notions of diversity, coverage, representation, and importance, and argue for the utility of these different models depending on the application. While most prior work on submodular summarization has focused on combining several models and learning weighted mixtures, we focus on the explainability of different models and featurizations, and on how they apply to different domains. We also provide implementation details on summarization systems and the different modalities involved. We hope that the study in this paper will help practitioners choose the right summarization models for the problems at hand.

Learning Collaborative Action Plans from YouTube Videos

Hejia Zhang, Po-Jen Lai, Sayan Paul, Suraj Kothawade and Stefanos Nikolaidis
Conference Paper: International Symposium on Robotics Research (ISRR 2019)

Abstract

Videos from the World Wide Web provide a rich source of information that robots could use to acquire knowledge about manipulation tasks. Previous work has focused on generating action sequences from unconstrained videos for a single robot performing manipulation tasks by itself. However, robots operating in the same physical space as people need not only to perform actions autonomously but also to coordinate seamlessly with their human counterparts. This often requires representing and executing collaborative manipulation actions, such as handing over a tool or holding an object for the other agent. We present a system for knowledge acquisition of collaborative manipulation action plans that outputs commands to the robot in the form of visual sentences. We show the performance of the system on 12 unlabeled action clips taken from collaborative cooking videos on YouTube. We view this as the first step towards extracting collaborative manipulation action sequences from unconstrained, unlabeled online videos.

Object Level Targeted Selection using Deep Template Matching

Suraj Kothawade, Donna Roy, Michele Fenzi, Elmar Haussman, Jose M. Alvarez, and Christoph Angerer.
Workshop Paper: Machine Learning for Autonomous Driving Workshop at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

Abstract

Retrieving images with objects that are semantically similar to objects of interest (OOI) in a query image has many practical use cases, for example fixing failures like false negatives/positives of a learned model or mitigating class imbalance in a dataset. The targeted selection task requires finding the relevant data in a large-scale pool of unlabeled data, where manual mining is infeasible. Further, the OOI are often small (occupying less than 1% of the image area), occluded, and co-exist with many semantically different objects in cluttered scenes. Existing semantic image retrieval methods often focus on mining for larger geographical landmarks and/or require extra labeled data, such as images/image pairs with similar objects, to mine images with generic objects. We propose a fast and robust template matching algorithm in the DNN feature space that retrieves semantically similar images at the object level from a large unlabeled pool of data. We project the region(s) around the OOI in the query image to the DNN feature space for use as the template. This enables our method to focus on the semantics of the OOI without requiring extra labeled data. In the context of autonomous driving, we evaluate our system for targeted selection by using failure cases of object detectors as OOI. We demonstrate its efficacy on a large unlabeled dataset with 2.2M images and show high recall in mining for images with small OOI. We compare our method against a well-known semantic image retrieval method that also does not require extra labeled data. Lastly, we show that our method is flexible and seamlessly retrieves images with one or more semantically different co-occurring OOI.
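The core retrieval idea, matching a feature-space template against feature maps, can be illustrated with plain cosine-similarity template matching. A minimal sketch, assuming features have already been extracted by some DNN backbone into (H, W, C) arrays and that the template is the feature crop around the OOI; the exhaustive sliding loop and the names are illustrative, not the paper's fast algorithm:

```python
import numpy as np


def match_template(feature_map, template):
    """Slide a (th, tw, c) feature template over an (H, W, c) feature map.

    Returns the best cosine similarity and its top-left (y, x) location.
    """
    H, W, _ = feature_map.shape
    th, tw, _ = template.shape
    t = template.ravel()
    t = t / (np.linalg.norm(t) + 1e-12)
    best, best_loc = -np.inf, None
    for y in range(H - th + 1):
        for x in range(W - tw + 1):
            patch = feature_map[y:y + th, x:x + tw].ravel()
            score = float(patch @ t / (np.linalg.norm(patch) + 1e-12))
            if score > best:
                best, best_loc = score, (y, x)
    return best, best_loc


def rank_pool(feature_maps, template):
    """Rank unlabeled images by their best template-match score, highest first."""
    scores = [match_template(f, template)[0] for f in feature_maps]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

Scoring the best local match, rather than a global image embedding, is what lets a small OOI dominate the ranking even in a cluttered scene.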

Targeted Active Learning using Submodular Mutual Information for Imbalanced Medical Image Classification

Suraj Kothawade, Lakshman Tamil, and Rishabh Iyer.
Workshop Paper: Medical Imaging Meets NeurIPS Workshop at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

Abstract

Training deep learning models that perform well on all classes of a medical dataset is a challenging task. Performance is often suboptimal on some classes due to the natural class imbalance that comes with medical data. An effective way to tackle this problem is targeted active learning, where we iteratively add data points belonging to the rare classes to the training data. However, existing active learning methods are ineffective at targeting rare classes in medical datasets. In this work, we propose TALISMAN, a framework for targeted active learning that uses submodular mutual information functions as acquisition functions. We show that TALISMAN outperforms state-of-the-art active learning methods by ~10%-12% in rare-class accuracy and ~4%-6% in overall accuracy on the Path-MNIST and Pneumonia-MNIST image classification datasets.

Submodular Mutual Information for Targeted Data Subset Selection

Suraj Kothawade, Vishal Kaushal, Ganesh Ramakrishnan, Jeff Bilmes, and Rishabh Iyer.
Workshop Paper: ICLR 2021 Workshop "From Shallow to Deep: Overcoming Limited and Adverse Data"

Abstract

With the rapid growth of data, it is becoming increasingly difficult to train or improve deep learning models with the right subset of data. We show that this problem can be effectively solved, at an additional labeling cost, by targeted data subset selection (TSS), where a subset of unlabeled data points similar to an auxiliary set is added to the training data. We do so using a rich class of Submodular Mutual Information (SMI) functions and demonstrate their effectiveness for image classification on the CIFAR-10 and MNIST datasets. Lastly, we compare the performance of SMI functions for TSS with other state-of-the-art methods for closely related problems like active learning. Using SMI functions, we observe a ~20-30% gain over the model's performance before re-training with the added targeted subset, ~12% more than other methods.
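Targeted selection with an SMI function can be sketched end to end with a facility-location variant. This assumes the form I(A; Q) = sum_{q in Q} max_{a in A} s_aq + eta * sum_{a in A} max_{q in Q} s_aq, where Q is the auxiliary (query) set, maximized greedily; the names are hypothetical and the choice of SMI instance and eta are illustrative:

```python
import numpy as np


def flqmi(A, S_aq, eta=1.0):
    """Assumed facility-location query mutual information I(A; Q).

    S_aq: (n, q) similarities between unlabeled items and auxiliary (query) items.
    Term 1 rewards covering every query point; term 2 rewards picking items
    that are each close to some query point.
    """
    if not A:
        return 0.0
    sub = S_aq[sorted(A)]  # (|A|, q)
    return float(sub.max(axis=0).sum() + eta * sub.max(axis=1).sum())


def targeted_select(S_aq, budget, eta=1.0):
    """Greedily pick `budget` unlabeled items that maximize the FLQMI score."""
    A = set()
    for _ in range(budget):
        base = flqmi(A, S_aq, eta)
        best = max((j for j in range(len(S_aq)) if j not in A),
                   key=lambda j: flqmi(A | {j}, S_aq, eta) - base)
        A.add(best)
    return A
```

The selected items would then be labeled and added to the training data, which is the additional labeling cost the abstract refers to.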

SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios

Suraj Kothawade, Nathan Beck, Krishnateja Killamsetty, Rishabh Iyer
Workshop Paper: ICML 2021 Workshop on Subset Selection in Machine Learning.

Abstract

Active learning has proven useful for minimizing labeling costs by selecting the most informative samples. However, existing active learning methods do not work well in realistic scenarios such as class imbalance or rare classes, out-of-distribution data in the unlabeled set, and redundancy. In this work, we propose SIMILAR (Submodular Information Measures based actIve LeARning), a unified active learning framework that uses recently proposed submodular information measures (SIM) as acquisition functions. We argue that SIMILAR not only works in standard active learning but also easily extends to the realistic settings above, acting as a one-stop solution for active learning that scales to large real-world datasets. Empirically, we show that SIMILAR significantly outperforms existing active learning algorithms, by as much as ~5%-18% for rare classes and ~5%-10% for out-of-distribution data, on several image classification datasets such as CIFAR-10, MNIST, and ImageNet.

AUTO-DISCERN: Autonomous Driving Using Common Sense Reasoning

Suraj Kothawade, Vinaya Khandelwal, Kinjal Basu, Huaduo Wang, Gopal Gupta
Workshop Paper: ICLP 2021 Workshop on Goal-directed Execution of Answer Set Programs

Abstract

Driving an automobile involves observing the surroundings and then making driving decisions (steer, brake, coast, etc.) based on these observations. In autonomous driving, all of these tasks have to be automated. Autonomous driving technology has so far relied primarily on machine learning techniques. We argue that the appropriate technology should be used for each task: while machine learning is good at observing and automatically understanding an automobile's surroundings, driving decisions are better automated via commonsense reasoning than via machine learning. In this paper, we (i) discuss how commonsense reasoning can be automated using answer set programming (ASP) and the goal-directed s(CASP) ASP system, and (ii) develop the AUTO-DISCERN system, which uses this technology to automate decision-making in driving. The goal of our research is to develop an autonomous driving system that works by simulating the mind of a human driver. Since its driving decisions are based on human-style reasoning, they are explainable, their ethics can be ensured, and they will always be correct, provided the system modeling and system inputs are correct.

Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework

Vishal Kaushal, Suraj Kothawade, Rishabh Iyer and Ganesh Ramakrishnan
Workshop Paper: In Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery at ACM MM, pp. 37-44, 2020.

Abstract

Automatic video summarization is still an unsolved problem due to several challenges. We take steps towards making automatic video summarization more realistic by addressing them. Firstly, the currently available datasets either have very short videos or have few long videos of only a particular type. We introduce a new benchmarking dataset, VISIOCITY, which comprises longer videos across six different categories with dense concept annotations capable of supporting different flavors of video summarization, and which can be used for other vision problems. Secondly, for long videos, human reference summaries are difficult to obtain. We present a novel recipe based on Pareto optimality to automatically generate multiple reference summaries from indirect ground truth present in VISIOCITY. We show that these summaries are on par with human summaries. Thirdly, we demonstrate that in the presence of multiple ground truth summaries (due to the highly subjective nature of the task), learning from a single combined ground truth summary using a single loss function is not a good idea. We propose a simple recipe, VISIOCITY-SUM, to enhance an existing model using a combination of losses and demonstrate that it beats the current state-of-the-art techniques when tested on VISIOCITY. We also show that a single measure to evaluate a summary, as is the current typical practice, falls short. We propose a framework for better quantitative assessment of summary quality that is closer to human judgment than a single measure such as F1. We report the performance of a few representative video summarization techniques on VISIOCITY, assessed using various measures, bring out the limitations of the techniques and/or the assessment mechanism in modeling human judgment, and demonstrate the effectiveness of our evaluation framework in doing so.
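The Pareto-optimality recipe rests on a non-domination filter over candidate summaries. A minimal sketch, assuming each candidate has already been scored on several measures where higher is better (for example, hypothetical diversity and importance scores); how VISIOCITY generates and scores candidates is not reproduced here, and the function names are illustrative:

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is at least as good on every measure and strictly
    better on at least one (higher is better for all measures)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of candidate summaries that no other candidate dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]
```

Keeping the whole front, rather than the single best candidate under one weighted score, yields several different but equally defensible reference summaries, which matches the motivation for multiple ground truths.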