The PDML Lab gratefully acknowledges funding support from the National Science Foundation under grants IIS-1750911 (Chen), IIS-1815696 (Chen, Hu), and IIS-1441479 (Zhang, Rose, Scardamalia, Chen); ARO-W911NF1720129 (Chen); NIH (Chen); and IARPA (Chen).
Currently funded projects include:
- IARPA (University-PI: Chen. Lead PI: Brian Clipp, Kitware). Hidden Activity Signal and Trajectory Anomaly Characterization (HAYSTAC). Currently, large-scale human movement modeling focuses on aggregated, high-level data to study migration, the spread of disease, and other patterns of behavior. For example, hourly toll booth counts indicate traffic volume. The IARPA HAYSTAC program aims to capture more fine-grained, individual human movement patterns to identify what characterizes “normal” movement while upholding public expectations for privacy. In contrast to toll booth counts, this could mean modeling the behavior of vehicles traveling along a toll road. Where do they originate from? Where do they typically go after the toll booth, and what stops do they make along a given route? How can we model this behavior without violating any individual’s privacy? To help achieve these goals, Kitware will lead a team including the University of Maryland, CityData.ai, the University of Central Florida, the University of North Carolina at Chapel Hill, and the University of Texas at Dallas. The team will be responsible for simulating urban human movement patterns at a mega-city scale over long periods of time, up to a year. The team will detect anomalous patterns of behavior in the huge volume of simulated trajectories that represent a city’s inhabitants. They will also plan itineraries, in which multiple teams compete, to identify what trajectories are generated from activity that was inserted by one of the other teams. Successful HAYSTAC systems will be capable of identifying trajectories created by subtle deviations against normal while also being able to generate activity and corresponding trajectories which are not distinguishable from normal. This game of hide and seek will ultimately lead to a deep characterization of normal human activity while maintaining an expectation of privacy that can be used to train AI algorithms.
- NSF DMS-2220574 (PI: Chen). ATD: Sparse and Localized Graph Convolutional Networks for Anomaly Detection and Active Learning. Many applications produce massive quantities of data in which complex relationships and interdependencies are naturally modeled as graphs, such as road networks, mobile networks, and public health surveillance networks. Despite considerable attention to graph convolutional networks (GCNs), the vast majority of existing works are limited to balanced node classification and are thus ineffective at detecting anomalous nodes accurately. This project aims at a sparse and localized GCN for detecting anomalous patterns. In addition, a novel active learning framework will be developed that learns the optimal query strategy to reduce the number of training labels. Theoretical investigations will be performed to interpret the anomalous patterns in an attempt to align with intuitions and clarifications from domain experts. The computational tools developed will be applicable specifically in the fields of remote sensing and geospatial information. Furthermore, the investigators will incorporate the research results into developing new interdisciplinary courses with a focus on both theory and application. These courses will serve as a springboard for student recruitment, such as graduate students interested in using this research as the subject of a Ph.D. thesis and undergraduates interested in a summer research project.
- NSF FAI-2147375 (PI: Chen). FAI: A novel paradigm for fairness-aware deep learning models on data streams. Massive amounts of information are transferred constantly between different domains in the form of data streams. Social networks, blogs, online businesses, and sensors all generate immense data streams. Such data streams are received in patterns that change over time. While this data can be assigned to specific categories, objects, and events, their distribution is not constant. These categories are subject to distribution shifts, which are often due to changes in the underlying environmental, geographical, economic, and cultural contexts. For example, the risk levels in loan applications have been subject to distribution shifts during the COVID-19 pandemic. This is because loan risks are based on factors associated with the applicants, such as employment status and income. Such factors are usually relatively stable, but have changed rapidly due to the economic impact of the pandemic. As a result, existing loan recommendation systems need to be adapted to limited examples. This project will develop open software to help users evaluate online fairness in algorithms, mitigate potential biases, and examine utility-fairness trade-offs. It will implement two real-world applications: online crime event recognition from video data and online purchase behavior prediction from click-stream data. To amplify the impact of this project in research and education, this project will leverage STEM programs for students with diverse backgrounds, gender, and race/ethnicity. Its activities include seminars, workshops, short courses, and research projects for students. This project aims to develop a new and innovative paradigm for designing, implementing, and evaluating online fairness-aware Deep Learning (DL) models. Such models would be used for classification tasks in noisy and non-stationary data streams.
This project will focus on five areas. First, the project will explore how to ensure a variety of fairness principles are incorporated in a DL model in online and non-stationary settings. The project will also look at how to identify a neural network architecture that will reflect the causal structure and be adaptive to distribution shifts. The project also looks at how the DL model will learn global initialization of primal parameters (associated with model accuracy) and dual parameters (associated with model fairness). Finally, the project looks at how to make online learning algorithms robust to uncertainties in model estimation of fairness and how to, ultimately, interpret the fairness of an online DL model. By bridging the areas of neural architecture search, online meta-learning, and fairness-aware deep learning techniques, this project advances state-of-the-art research in Fairness in AI. This project will offer the following innovations: (1) disentangle underlying sensitive and non-sensitive causal variables from raw features via causal representation learning; (2) identify adaptive architectures for data streams via differentiable architecture search; (3) learn effective initializations for both primal and dual model parameters in an online-within-online manner; (4) develop robust versions of the algorithms to deal with uncertainties in model fairness and tasks; and (5) identify the training examples and latent causal variables responsible for model adaptation using local and global interpretations.
- NSF IIS-2107449 (PI: Chen). III: Medium: Collaborative Research: MUDL: Multidimensional Uncertainty-Aware Deep Learning Framework. People encounter serious hurdles in finding effective decision-making solutions to real-world problems because of uncertainty from a lack of information, conflicting information, and/or unsure observations. Critical safety concerns have been consistently highlighted because how to interpret this uncertainty has not been carefully investigated. If the uncertainty is misinterpreted, this can result in unnecessary risk. For example, a self-driving car can misdetect a human in the road. An artificial intelligence-based medical assistant may misdiagnose cancer as a benign tumor. Further, a phishing email can be classified as a normal email. The consequences of all these misdetections or misclassifications caused by different types of uncertainty add risk and potential adverse events. Artificial intelligence (AI) researchers have actively explored how to solve various decision-making problems under uncertainty. However, no prior research has looked into how different approaches to studying uncertainty in AI can leverage each other. This project studies how to measure different causes of uncertainty and use them to solve diverse decision-making problems more effectively. This project can help develop trustworthy AI algorithms that can be used in many real-world decision-making problems. In addition, this project is highly transdisciplinary so that it can encourage broader, newer, and more diverse approaches. To magnify the impact of this project in research and education, this project leverages multicultural, diversity, and STEM programs for students with diverse backgrounds and under-represented populations. This project also includes seminar talks, workshops, short courses, and/or research projects for high school and community college students.
This project aims to develop a suite of deep learning (DL) techniques by considering multiple types of uncertainties caused by different root causes and employ them to maximize the effectiveness of decision-making in the presence of highly intelligent, adversarial attacks. This project makes a synergistic but transformative research effort to study: (1) how different types of uncertainties can be quantified based on belief theory; (2) how the estimates of different types of uncertainties can be considered in DL-based approaches; and (3) how multiple types of uncertainties influence the effectiveness and efficiency of decision-making in high-dimensional, complex problems. This project advances the state-of-the-art research by performing the following: (1) Proposing a scalable, robust unified DL-based framework to effectively infer predictive multidimensional uncertainty caused by heterogeneous root causes in adversarial environments. (2) Dealing with multidimensional uncertainty based on neural networks. (3) Enhancing both decision effectiveness and efficiency by considering multidimensional uncertainty-aware designs. (4) Testing proposed approaches to ensure their robustness in the presence of intelligent adversarial attackers with advanced deception tactics based on both simulation models and visualization tools.
- NSF IIS-1750911 (PI: Chen). CAREER: SPARK: A Theoretical Framework for Discovering Complex Patterns in Big Attributed Networks. Recent advances in sensing and computing techniques have led to a need for massive quantities of data to be aggregated from heterogeneous information sources in fields such as science, engineering, and business that are naturally modeled in the form of big attributed networks. A big attributed network (BAN) is characterized by a combination of (a) high-dimensional and heterogeneous network topologies and (b) high-dimensional and heterogeneous attribute data. Effective analysis of BAN data relies on simultaneous subgraph mining and feature selection for discovering complex patterns that are interesting or significant. However, as yet little has been done to bridge these two important research areas. The focus of this project is therefore to unify a wide range of complex pattern discovery tasks including, for example, the detection and forecasting of societal events (disasters, civil unrest), anomalous patterns (disease outbreaks, cyberattacks), discriminative subnetworks (cancer diagnosis), knowledge patterns (new knowledge building) and storylines (intelligence analysis), and to resolve the fundamental modeling, algorithmic, and interactive challenges associated with ubiquitous BAN data in today's big data era. Preliminary studies on real datasets have strongly demonstrated a preference for sparse convolution filters on the graph. It is also desirable to have joint localization in both the vertex and graph frequency domains for computational efficiency and structural feature extraction. By combining sparsity and localization, the project aims at a sparse and localized GCN paradigm that efficiently detects anomalous nodes.
In particular, this project will address the following questions: (1) how to design graph convolutions to effectively enforce sparse and localized patterns for anomaly detection; (2) how to optimize the network architecture and algorithm design; and (3) how to reduce the user's burden in the collection of labeled data by active learning, which involves a sequence of dynamic queries for labeling a small number of nodes that are most effective in anomaly detection. Overall, this project will advance the algorithmic and theoretical foundations of GNNs and anomaly detection. The research will contribute to the areas of time-frequency analysis, harmonic analysis, uncertainty quantification, and nonconvex optimization.
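To make the notion of a vertex-domain "localized" graph convolution more concrete, here is a minimal sketch in plain Python. It uses the widely known symmetric normalization with self-loops, D^{-1/2}(A + I)D^{-1/2}, purely as an illustration. It is not the sparse and localized GCN proposed by the project, and the function names `normalized_adjacency` and `propagate` are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(adj, signal):
    """Apply one hop of graph convolution to a scalar node signal.

    No learned weights or nonlinearity are included (illustrative only).
    """
    norm = normalized_adjacency(adj)
    n = len(signal)
    return [sum(norm[i][j] * signal[j] for j in range(n)) for i in range(n)]


# Path graph 0 - 1 - 2 with a unit impulse placed on node 0.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
impulse = [1.0, 0.0, 0.0]
one_hop = propagate(path, impulse)
```

After one hop the impulse reaches node 1 but not node 2, which is what localization in the vertex domain means in this simple setting: each output value depends only on a node's immediate neighborhood.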
Recently completed projects include:
- NSF IIS-1815696 (PI: Chen). A novel paradigm for detecting complex anomalous patterns in multi-modal, heterogeneous, and high-dimensional multi-source data sets. One of the greatest challenges in modern data analysis is to identify subtle, complex anomalous patterns (subsets of a data set that are novel or unexpected) within ubiquitous multi-modal, heterogeneous, and high-dimensional multi-source data sets in the current big data era. The detection of such salient patterns is an indispensable tool for knowledge mining and discovery in important applications across many fields of science, engineering, and business, including the early detection of infectious disease outbreaks, crime hotspots, network intrusions, false advertising, cyber botnets, customer activity monitoring and user profiling, and fraudulent medical claims, among others. The project research goal is to develop a new and innovative paradigm for discovering complex and subtle anomalous patterns in ubiquitous multi-modal, heterogeneous, and high-dimensional multi-source datasets in the current big data era. The key idea is to generalize the idea of meta-analysis from the statistical community and to reframe the problem as a search over all subsets of nonparametric statistical tests that are conducted on individual record-level features, in order to find the subsets (anomalous patterns) that are jointly significant. The project is focused on real-world problems related to biosurveillance and cybersecurity with two challenging applications: early detection of rare and infectious disease outbreaks (e.g., foodborne, Hantavirus, yellow fever) and Sybil attacks (e.g., spammers, fake users, and compromised normal users).
- ARO-W911NF1720129 (PI: Chen). Uncertainty Management for Dynamic Decision Making. Managing uncertainty is one of the key factors that critically affect effective decision making. A tactical military environment is often characterized by highly dynamic, high-tempo operations in the presence of hostile entities. The disadvantaged characteristics of a tactical environment make mission-related decisions highly challenging because the evidence for a decision may be incomplete, uncertain, incorrect, or even unavailable. Further, any cognitive biases or errors introduced by a human decision maker can make the decision-making process harder. This research aims to provide efficient decision-making tools for Warfighters in a highly disadvantaged tactical environment by efficiently reducing uncertainty in perceiving a given situation or problem. This research proposes three key thrusts: (1) extending a belief model, called subjective logic, to consider different sources of uncertainty in the decision-making process; (2) developing a data reduction algorithm that reduces uncertainty in data; and (3) building a decision network consisting of rational agents with uncertainty-based utility functions that are adaptive to the dynamics of uncertainty in time and space. The proposed research will leverage existing techniques in belief models, machine learning, and game theory to maximize the effectiveness of decisions while efficiently lowering the degree of uncertainty.
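For readers unfamiliar with subjective logic, the belief model named in the ARO project above, the following minimal Python sketch shows how a standard binomial opinion can be formed from evidence counts. It covers only the textbook mapping from evidence to belief, disbelief, and uncertainty, not the project's extension of the model. The class and function names are hypothetical, and the default prior weight of 2 and base rate of 0.5 are conventional choices assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinomialOpinion:
    """Belief, disbelief, and uncertainty mass (summing to 1) plus a base rate."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float

    def projected_probability(self) -> float:
        # The uncertainty mass is distributed according to the base rate.
        return self.belief + self.base_rate * self.uncertainty


def opinion_from_evidence(positive: float, negative: float,
                          base_rate: float = 0.5,
                          prior_weight: float = 2.0) -> BinomialOpinion:
    """Map counts of supporting and opposing observations to an opinion."""
    total = positive + negative + prior_weight
    return BinomialOpinion(
        belief=positive / total,
        disbelief=negative / total,
        uncertainty=prior_weight / total,
        base_rate=base_rate,
    )


# With no evidence the opinion is fully uncertain; evidence shrinks uncertainty.
vacuous = opinion_from_evidence(0, 0)
informed = opinion_from_evidence(8, 0)
```

Unlike a single probability, the opinion keeps "how much evidence do I have" separate from "which way does the evidence point", which is why this kind of model is a natural fit for decisions made on incomplete information.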
Prof. Feng gratefully acknowledges past funding support from the National Science Foundation (NSF), grants IIS-1750911 and IIS-1815696, as well as grants from the Army Research Office (ARO), the National Institutes of Health (NIH), IARPA, and the US Department of Transportation (DOT).
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF, NIH, ARO, IARPA, or DOT.