Research: Anastasiu Lab

Our research is focused on algorithmic design for machine learning problems with real-world applications and impact, especially those with unconventional inputs, such as sparse data, sets of multivariate time series, and video streams. Dr. Anastasiu believes there is great benefit in partnering with domain experts when designing such methods, which has resulted in a number of government- and industry-funded collaborative projects. This page shows a subset of the current and previous projects Dr. Anastasiu and his students have worked on.

Current Projects
	Video Anomaly Anticipation Partners: Woven By Toyota, NVIDIA We aim to develop real-time artificial intelligence (AI) models for anomaly anticipation in video streams, i.e., models that predict an anomaly before it happens. So far, the research community has primarily focused on the video anomaly detection (VAD) problem, which identifies anomalies only after they occur. While it is important to detect anomalies and intervene to assist, e.g., by sending an ambulance in response to an automobile accident, advances in machine learning and AI now make it possible to consider preventing anomalies from happening in the first place. However, to prevent user dissatisfaction, the proposed methods should also be able to explain why an action was taken to avert the anticipated anomaly. [ ANASTASIU2025 KACHHADIYAPA25 ]
	AI Models for Hydrologic Flow Prediction Funding: Santa Clara Valley Water Authority The project aims to develop predictive artificial intelligence-based tools that will assist the Santa Clara Valley Water Authority (SCVWA) in its core business goals of flood protection and water supply planning. Examples of predictive capabilities may include predicting flow in streams and subsequent reservoir inflow, the probability of a wet or dry year, and the flood potential of creeks. [ LIA2025PAKDD CIKM '24 AAAI'24 LI2023ARXIV LI2023BIGDATA AAAI'23/IAAI'23/EAAI'23 LI2022ARXIV ]
	Smart City Transportation Partner: Milind Naphade Focusing on the problems of traffic congestion and safety, in collaboration with NVIDIA and professors from several universities, we have organized the AI Cities Challenge and associated workshops. The challenge encourages the development of state-of-the-art deep learning methods for transportation analytics and their integration in smart city operations. Problems tackled through the challenge include multi-camera multi-target vehicle re-identification and tracking, estimating traffic flow characteristics, detecting anomalies caused by crashes or stalled vehicles, and multi-sensor tracking in urban environments. In addition to helping organize the challenge, our lab has also participated in the challenge each year from its inception. [ RAHMANVSWGAW2023 AICITYCHALLENGE2023 VATSA2023 VATSA2022 CVPRW'22 CVPRW'21 JBDAT2020 CVPRW'20 CVPRW'19 CVPR 2019 SOSE 2019 CVPRW'18 CVPRW'18 SmartWorld'17 SmartWorld'17 ]
	Kidney Health Monitoring Partner: Alessandro Bellofiore We are developing novel methods for characterizing the severity of kidney disease with the ease of snapping a picture. A specially designed test strip developed by Dr. Bellofiore's team facilitates a colorimetric reaction between alkaline picric acid and creatinine in a blood sample that has been applied to the test strip. Our lab designed a system that uses state-of-the-art deep learning localization models to capture high-quality images of the test strip using a cell phone, and then processes them using computer vision and machine learning techniques to predict the concentration of creatinine in the sample based on the change in color. The predicted creatinine concentration is then used to classify the severity of the kidney disease as normal, intermediate risk, or kidney failure. [ WHELANEBA2024 GHTC 2022 NKFSCM2018 ]
	Antibiofilm & Antithrombotic Peptide Prediction Partner: Anand Ramasubramanian A biofilm is an assembly of surface-associated microbial cells that is enclosed in an extracellular polymeric substance matrix. We hypothesize that antibiofilm peptides can be predicted by computational methods that use machine learning to screen peptides from diverse habitats in order to identify new peptides that will block biofilm formation, thus allowing traditional antibacterial medicines to provide more effective treatment. [ FIM2022 bioRxiv'21 ]
	Nearest Neighbor Search Partner: Gheorghi Guzun Finding neighbors is a foundational problem that many machine learning algorithms depend on. We have developed and continue to develop serial and parallel solutions to the problems of constructing neighborhood graphs and nearest neighbor search for sparse data, in which objects have few features with non-zero values. By ignoring objects not similar enough to be part of the final neighborhood, our methods have achieved order-of-magnitude performance gains over state-of-the-art baselines. [ iDSC 2017 JDSA'17 JPDC'17 IA3 2016 DSAA'16 IA3 2015 CIKM'15 ICDE 2014 ]
	Active Learning Effectiveness Funding: San José State University, Santa Clara University I developed the Competitive Learning Platform (CLP), an active learning tool for encouraging and motivating student engagement in computer engineering courses. Additionally, I have integrated hands-on training via Jupyter Notebook activities in most of my courses. I measured the effectiveness of these methods through an analysis of both trace evidence from students using the tools and surveys from multiple classes using the tools. [ VATSGA2022 FIE 2018 ]
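To make the Nearest Neighbor Search entry above more concrete, the following is a minimal sketch of neighborhood graph construction for sparse data. It is not the lab's published method: it only illustrates the general idea of never scoring objects that cannot be similar, here by using an inverted index so that pairs sharing no non-zero feature are skipped entirely. The function names, the dict-based vector format, and the `min_sim` threshold are all assumptions made for illustration; the published methods rely on tighter pruning bounds that are not reproduced here.

```python
from collections import defaultdict
from math import sqrt


def normalize(vec):
    """Scale a sparse vector (dict: feature -> value) to unit length."""
    norm = sqrt(sum(v * v for v in vec.values()))
    return {f: v / norm for f, v in vec.items()} if norm > 0 else {}


def neighborhood_graph(vectors, min_sim):
    """Return {i: [(j, sim), ...]} for all pairs with cosine similarity >= min_sim.

    Hypothetical helper: candidates are found through an inverted index,
    so objects with no shared non-zero feature are never compared.
    """
    unit = [normalize(v) for v in vectors]
    index = defaultdict(list)  # feature -> [(object id, value)] for earlier objects
    graph = defaultdict(list)
    for i, vec in enumerate(unit):
        # Accumulate dot products only with earlier objects sharing a feature.
        scores = defaultdict(float)
        for f, v in vec.items():
            for j, w in index[f]:
                scores[j] += v * w
        # Keep only the pairs that are similar enough to be neighbors.
        for j, sim in scores.items():
            if sim >= min_sim:
                graph[i].append((j, sim))
                graph[j].append((i, sim))
        # Index the current object so later objects can find it.
        for f, v in vec.items():
            index[f].append((i, v))
    return dict(graph)


docs = [
    {"a": 1.0, "b": 1.0},
    {"a": 1.0, "b": 1.0, "c": 0.1},
    {"d": 2.0},
    {"c": 1.0, "d": 1.0},
]
graph = neighborhood_graph(docs, min_sim=0.5)
for i in sorted(graph):
    print(i, [(j, round(sim, 3)) for j, sim in graph[i]])
```

In this toy input, objects 0 and 2 share no features, so their similarity is never computed. Objects 1 and 3 share only the low-weight feature `c`, so the pair is scored but falls below the threshold and is dropped.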
Previous Projects
	Open Modification Spectral Library Search Funding: National Science Foundation Partner: William Stafford Noble Open Modification Spectral Library Search is a technique that seeks to match results from mass spectrometry experiments (spectra) with protein-labeled spectra from massive spectral databases. The high computational complexity of the search has led to approximate methods being primarily used to search large databases, and these methods fail to identify many of the spectra. We are developing efficient exact serial and parallel algorithms for spectral library search that use theoretical properties of the proximity measure to prune much of the search space and quickly identify matches. This research has the potential to significantly improve our ability to characterize peptides that are present in biological samples and could eventually lead to a better understanding of and potential cures for diseases. [ LINAA23 KDD'19 ]
	Mass-Cytometry Partner: Edgar A. Arriaga With the aim of analyzing large multidimensional single-cell datasets, we developed a Cosine-based Tanimoto similarity-refined graph method for community detection using the Leiden algorithm (CosTaL). As a graph-based clustering method, CosTaL transforms the cells with high-dimensional features into a weighted k-nearest-neighbor (kNN) graph. The cells are represented by the vertices of the graph, while an edge between two vertices represents the close relatedness between the two cells. Specifically, CosTaL builds an exact kNN graph using cosine similarity and uses the Tanimoto coefficient as the refining strategy to re-weight the edges in order to improve the effectiveness of clustering. [ LINAA23 bioRxiv'22 ]
	Parallel Bipartite Network Projection Funding: Intel Labs Partner: Shaden Smith Bipartite networks are often used to describe the relationship between entities of two different types. While many network analysis methods have been proposed for unipartite networks, fewer exist for bipartite ones. Thus, a key kernel in the analysis of bipartite networks is computing a projection of the network onto the nodes of one or both of the partitions, which then allows completing standard graph analytics tasks on the projected network. In this project, we are analyzing various parallel implementations of graph projection algorithms for current and potential architectures as part of a hardware/software co-design effort that seeks significant efficiency improvements in graph analytics solutions.
	DNS Analytics Funding: Infoblox Partners: Magdalini Eirinaki, Cricket Liu We will analyze active domain name system (DNS) data with the goal of extracting useful knowledge from large amounts of such data. Initially, we are looking to develop novel methods that can identify out-of-place requests faster and with higher precision than existing approaches. Then, we will design methods that can elicit new insights from these data that current methods may not be able to uncover. [ MCCSE 2019 ]
	Anomaly Detection in Expense Reports Funding: Flex Ltd. This project aims to develop a deep learning-based document processing pipeline for the automatic validation of and anomaly detection in expense reports. Existing methods are insufficient for the task, in part due to the multinational and multilingual aspects of the Flex business, and continue to produce high error rates. Deep learning will play a crucial role in each processing stage of the pipeline. We are working on multi- and mixed-language optical character recognition models, localization-based knowledge extraction methods, and a vision-enhanced rules-based anomaly detection engine.
	Autism Spectrum Prediction Partner: Megan C. Chang Our lab developed novel methods to analyze longitudinal multivariate time series data for accurately diagnosing autism spectrum disorders (ASD) in children. ASDs are a group of conditions characterized by impairments in reciprocal social interaction and by the presence of restricted and repetitive behaviors. Data we analyzed were collected during a sensory challenge protocol in which we observed the reactions of subjects to eight stimuli. The challenge in analyzing these types of data is their length, each series having an average of 2M points for each sensor. [ iDSC 2019 KAPOOR2019-TR ]
	Behavioral Evolution Analysis Funding: Intel Labs The goal of machine learning is often to describe and predict human behavior. We live in a world where behavior is constantly being captured. Every time someone checks out at a grocery store, for example, a record of their transaction, the items they purchased and associated prices, is stored on a server. These captured actions result in massive datasets, the analysis of which can provide us with a wealth of valuable information. We developed scalable machine learning approaches for analyzing these types of longitudinal user activity data for large groups of users in order to characterize and predict changes in user behavior. [ ICDE 2015 ]
	Human Activity Recognition As smartphones and wearable devices become ubiquitous, they open the door for human activity tracking and the discovery of associated trends from sensor data. Though fitness tracking devices currently give customized information about a user's daily physical activities, understanding general trends across users that span location, age, or gender could help the insurance or health care industries offer personalized plans and services. In this project, we analyzed a massive time series sensor dataset collected through a custom-built phone application from many users in order to automatically categorize their activities. By leveraging efficient optimization techniques, our method is able to automatically derive prototypical activity patterns and identify dynamic changes in activity across time. [ MCCSE 2019 ]
	Constrained Wireless Network Planning Partner: San José OEM We developed data structures and methods for city-wide wireless network planning that efficiently identify the optimal placement of network antennae in a metropolitan environment. [ CIKM'18 IEEE SCI 2017 ]
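The graph construction described in the Mass-Cytometry entry above (an exact kNN graph built with cosine similarity, then re-weighted with the Tanimoto coefficient) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the CosTaL implementation. It assumes the Tanimoto coefficient is computed on the cells' feature vectors, which the description above does not specify. It uses a brute-force all-pairs search, assumes non-zero vectors, and omits the Leiden community detection step. All function names are hypothetical.

```python
from math import sqrt


def dot(a, b):
    """Dot product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))


def tanimoto(a, b):
    """Tanimoto coefficient for real-valued vectors (assumed non-zero)."""
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)


def knn_graph(cells, k):
    """Exact kNN graph by brute force.

    Neighbors are chosen by cosine similarity; each directed edge (i, j)
    is then weighted by the Tanimoto coefficient (assumed refinement).
    """
    edges = {}
    for i, a in enumerate(cells):
        sims = [(cosine(a, b), j) for j, b in enumerate(cells) if j != i]
        sims.sort(reverse=True)  # most similar first
        for _, j in sims[:k]:
            edges[(i, j)] = tanimoto(a, cells[j])
    return edges


cells = [
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 1.0],
    [0.1, 1.0],
]
edges = knn_graph(cells, k=1)
for (i, j), w in sorted(edges.items()):
    print(i, j, round(w, 3))
```

In this toy input, cells 0 and 1 point in the same direction, so their cosine similarity is 1.0, but their Tanimoto weight is only 2/3 because the coefficient also accounts for the difference in magnitude. A weighted graph of this kind would then be passed to a community detection algorithm.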
