Our research is focused on algorithmic design for machine learning problems with real-world applications and impact, especially those with unconventional inputs, such as sparse data, sets of multivariate time series, and video streams. Dr. Anastasiu believes there is great benefit in partnering with domain experts when designing such methods, which has resulted in a number of government- and industry-funded collaborative projects. This page shows a subset of the current and previous projects Dr. Anastasiu and his students have worked on.

Current Projects

Open Modification Spectral Library Search

Funding: National Science Foundation
Partner: William Stafford Noble
Open Modification Spectral Library Search is a technique that seeks to match results from mass spectrometry experiments (spectra) with protein-labeled spectra from massive spectral databases. Because this search is computationally expensive, large databases are primarily searched with approximate methods, which fail to identify many of the spectra. We are developing efficient exact serial and parallel algorithms for spectral library search that use theoretical properties of the proximity measure to prune much of the search space and quickly identify matches. This research has the potential to significantly improve our ability to characterize the peptides present in biological samples and could eventually lead to a better understanding of, and potential cures for, diseases.
[   LINAA23   KDD'19   ]
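To illustrate what pruning the search space means in the simplest possible case, here is a minimal Python sketch. It is not the lab's algorithm. It assumes each spectrum is reduced to a unit-normalized sparse vector of intensities binned by m/z and scored with cosine similarity; the names `bin_spectrum`, `build_index`, and `search` and the `bin_width` and `threshold` parameters are hypothetical. An inverted index over bins means that library spectra sharing no bin with the query are never touched, which is exact because their cosine is zero. The sketch does not model the mass shifts introduced by modifications, and the bounds described above are considerably stronger than this one.

```python
from collections import defaultdict
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Turn (m/z, intensity) peaks into a unit-length sparse vector keyed by m/z bin."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // bin_width)] += intensity
    norm = sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm > 0 else {}


def build_index(library):
    """Inverted index: m/z bin -> list of (library id, normalized intensity)."""
    index = defaultdict(list)
    for lib_id, vec in library.items():
        for b, v in vec.items():
            index[b].append((lib_id, v))
    return index


def search(query, index, threshold):
    """Return (cosine, library id) pairs with cosine >= threshold, best first.

    Only library spectra that share at least one bin with the query are scored.
    Every other spectrum has cosine 0, so skipping it is exact for threshold > 0.
    """
    scores = defaultdict(float)
    for b, q in query.items():
        for lib_id, v in index.get(b, ()):
            scores[lib_id] += q * v
    hits = [(s, lib_id) for lib_id, s in scores.items() if s >= threshold]
    return sorted(hits, reverse=True)
```

For example, after `index = build_index({"A": bin_spectrum(...), ...})`, calling `search(bin_spectrum(query_peaks), index, 0.6)` scores only the library entries whose bins overlap the query's.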

AI Models for Hydrologic Flow Prediction

Funding: Santa Clara Valley Water Authority
The project aims to develop predictive tools based on artificial intelligence that will assist the Santa Clara Valley Water Authority (SCVWA) in its core business goals of flood protection and water supply planning. Examples of predictive capabilities may include predicting flow in streams and the resulting reservoir inflow, the probability of a wet or dry year, and the flood potential of creeks.
[   li2022AAAI23  LI2022ARXIV   ]

Smart City Transportation

Partner: Milind Naphade
Focusing on the problems of traffic congestion and safety, we have organized the AI City Challenge and its associated workshops in collaboration with NVIDIA and professors from several universities. The challenge encourages the development of state-of-the-art deep learning methods for transportation analytics and their integration into smart city operations. Problems tackled through the challenge include multi-camera multi-target vehicle re-identification and tracking, estimating traffic flow characteristics, detecting anomalies caused by crashes or stalled vehicles, and multi-sensor tracking in urban environments. In addition to helping organize the challenge, our lab has participated in it every year since its inception.
[   RAHMANVSWGAW2023   AICITYCHALLENGE2023   VATSA2023   VATSA2022   CVPRW'22   CVPRW'21   JBDAT2020   CVPRW'20   CVPRW'19   CVPR 2019   SOSE 2019   CVPRW'18   CVPRW'18   SmartWorld'17   SmartWorld'17   ]

Kidney Health Monitoring

Partner: Alessandro Bellofiore
We are developing novel methods for characterizing the severity of kidney disease with the ease of snapping a picture. A specially designed test strip developed by Dr. Bellofiore's team facilitates a colorimetric reaction between alkaline picric acid and creatinine in a blood sample applied to the strip. Our lab designed a system that uses state-of-the-art deep learning localization models to capture high-quality images of the test strip with a cell phone, and then processes these images using computer vision and machine learning techniques to predict the creatinine concentration in the sample from the change in color. The predicted creatinine concentration is then used to classify the severity of kidney disease as normal, intermediate risk, or kidney failure.
[   WHELANEBA2024   GHTC 2022   NKFSCM2018   ]

Antibiofilm & Antithrombotic Peptide Prediction

Partner: Anand Ramasubramanian
Biofilm is an assembly of surface-associated microbial cells enclosed in an extracellular polymeric substance matrix. We hypothesize that antibiofilm peptides can be predicted computationally, using machine learning to screen peptides from diverse habitats and identify new peptides that block biofilm formation, thus allowing traditional antibacterial medicines to provide more effective treatment.
[   FIM2022   bioRxiv'21   ]

Nearest Neighbor Search

Partner: Gheorghi Guzun
Finding neighbors is a foundational problem that many machine learning algorithms depend on. We have developed and continue to develop serial and parallel solutions to the problems of constructing neighborhood graphs and nearest neighbor search for sparse data, in which objects have few features with non-zero values. By ignoring objects not similar enough to be part of the final neighborhood, our methods have achieved order-of-magnitude performance gains over state-of-the-art baselines.
[   iDSC 2017   JDSA'17   JPDC'17   IA3 2016   DSAA'16   IA3 2015   CIKM'15   ICDE 2014   ]
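The idea of ignoring objects that cannot be similar enough can be illustrated with a small exact sketch for cosine similarity over sparse vectors. It is a simplified example, not the lab's published algorithms: it builds a threshold-based neighborhood graph (all pairs with similarity at least a hypothetical `threshold`) using an inverted index, visits each query's features in decreasing weight order, and stops admitting new candidates once the norm of the remaining query features falls below the threshold. By the Cauchy-Schwarz inequality, no unit-norm object first encountered after that point can reach the threshold. The function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def normalize(vec):
    """Scale a sparse vector {feature: weight} to unit L2 norm."""
    norm = sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()}


def build_index(data):
    """Inverted index: feature -> list of (object id, weight)."""
    index = defaultdict(list)
    for obj, vec in data.items():
        for f, w in vec.items():
            index[f].append((obj, w))
    return index


def neighbors_above(query_id, data, index, threshold):
    """Exact cosine neighbors of data[query_id] with similarity >= threshold.

    Query features are visited in decreasing weight order. An object whose first
    shared feature comes at position i scores at most the norm of the query
    restricted to features i onward (Cauchy-Schwarz, unit-norm objects), so once
    that suffix norm drops below the threshold no new candidates are admitted.
    """
    q = data[query_id]
    feats = sorted(q, key=q.get, reverse=True)
    suffix_norm = [0.0] * len(feats)
    running = 0.0
    for i in range(len(feats) - 1, -1, -1):
        running += q[feats[i]] ** 2
        suffix_norm[i] = sqrt(running)
    scores = {}
    for i, f in enumerate(feats):
        admit_new = suffix_norm[i] >= threshold
        for obj, w in index.get(f, ()):
            if obj == query_id:
                continue
            if obj in scores:
                scores[obj] += q[f] * w
            elif admit_new:
                scores[obj] = q[f] * w
    return {obj: s for obj, s in scores.items() if s >= threshold}


def neighborhood_graph(data, threshold):
    """Map every object to its exact neighbors with cosine >= threshold."""
    index = build_index(data)
    return {obj: neighbors_above(obj, data, index, threshold) for obj in data}
```

Already-admitted candidates keep accumulating their scores, so the final filter is exact; only objects that provably cannot reach the threshold are skipped.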

Active Learning Effectiveness

Funding: San José State University, Santa Clara University
I developed the Competitive Learning Platform (CLP), an active learning tool for encouraging and motivating student engagement in computer engineering courses. I have also integrated hands-on training via Jupyter Notebook activities into most of my courses. I measured the effectiveness of these methods by analyzing both trace data from students' use of the tools and surveys from multiple classes that used them.
[   VATSGA2022   FIE 2018   ]

Previous Projects

Mass-Cytometry

Partner: Edgar A. Arriaga
With the aim of analyzing large multidimensional single-cell datasets, we developed CosTaL, a Cosine-based Tanimoto similarity-refined graph method for community detection using the Leiden algorithm. As a graph-based clustering method, CosTaL transforms cells with high-dimensional features into a weighted k-nearest-neighbor (kNN) graph in which each vertex represents a cell and an edge between two vertices indicates that the two cells are closely related. Specifically, CosTaL builds an exact kNN graph using cosine similarity and uses the Tanimoto coefficient as a refining strategy to re-weight the edges, improving the effectiveness of clustering.
[   LINAA23   bioRxiv'22   LIAA2022   ]
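A toy version of the graph-construction step might look like the following. It builds an exact, brute-force cosine kNN graph and re-weights each edge with a Tanimoto coefficient. Computing that coefficient on the cells' feature vectors, as x.y / (|x|^2 + |y|^2 - x.y), is an assumption of this sketch; the description above does not say exactly what it is computed on. The Leiden community detection step is omitted; the weighted edges would be handed to a Leiden implementation such as the `leidenalg` Python package. Function names are hypothetical.

```python
from math import sqrt


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Cosine similarity of two dense, non-zero feature vectors."""
    return dot(x, y) / (sqrt(dot(x, x)) * sqrt(dot(y, y)))


def tanimoto(x, y):
    """Continuous Tanimoto coefficient x.y / (|x|^2 + |y|^2 - x.y)."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


def knn_graph(cells, k):
    """Brute-force exact cosine kNN graph with Tanimoto-reweighted edges.

    Returns an undirected edge dict {(i, j): weight} with i < j.
    """
    edges = {}
    for i, x in enumerate(cells):
        sims = sorted(
            ((cosine(x, y), j) for j, y in enumerate(cells) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            edges[(min(i, j), max(i, j))] = tanimoto(x, cells[j])
    return edges
```

Brute force is quadratic in the number of cells; it is used here only because it is obviously exact.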

Parallel Bipartite Network Projection

Funding: Intel Labs
Partner: Shaden Smith
Bipartite networks are often used to describe relationships between entities of two different types. While many network analysis methods have been proposed for unipartite networks, fewer exist for bipartite ones. A key kernel in the analysis of bipartite networks is therefore computing a projection of the network onto the nodes of one or both partitions, which allows standard graph analytics tasks to be completed on the projected network. In this project, we analyzed various parallel implementations of graph projection algorithms for current and potential architectures as part of a hardware/software co-design effort that sought significant efficiency improvements in graph analytics solutions.
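One common way to define the projection onto one side, U, is to connect two U nodes whenever they share a neighbor in V and to weight that edge by the number of shared neighbors. The sketch below assumes that definition (other weightings exist) and splits the V nodes into independent chunks whose partial counts are summed. That decomposition is where a parallel implementation would distribute work, although this version runs serially. Names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_chunk(v_to_us):
    """Count shared V-neighbors for every pair of U nodes, for one chunk of V nodes."""
    counts = Counter()
    for us in v_to_us.values():
        for a, b in combinations(sorted(us), 2):
            counts[(a, b)] += 1
    return counts


def project(edges, num_chunks=2):
    """Project a bipartite edge list [(u, v), ...] onto its U side.

    Returns {(u1, u2): number of shared V neighbors}. Each chunk of V nodes is
    processed independently and the partial counts are summed.
    """
    v_to_us = defaultdict(set)
    for u, v in edges:
        v_to_us[v].add(u)
    vs = list(v_to_us)
    total = Counter()
    for c in range(num_chunks):
        chunk = {v: v_to_us[v] for v in vs[c::num_chunks]}
        total.update(project_chunk(chunk))
    return dict(total)
```

Because each V node contributes to the counts independently, the per-chunk results can be merged in any order, which is what makes this kernel amenable to parallel execution.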

DNS Analytics

Funding: Infoblox
Partners: Magdalini Eirinaki, Cricket Liu
We analyzed active domain name system (DNS) data with the goal of extracting useful knowledge from large amounts of such data. We first sought to develop novel methods that identify out-of-place requests faster and with higher precision than existing approaches, and then to design methods that elicit new insights from these data that current methods may not be able to uncover.
[   MCCSE 2019   ]

Anomaly Detection in Expense Reports

Funding: Flex Ltd.
This project aimed to develop a deep learning-based document processing pipeline for the automatic validation of, and anomaly detection in, expense reports. Existing methods were insufficient for the task, in part due to the multinational and multilingual aspects of the Flex business, and continued to produce high error rates. The pipeline was designed to use deep learning in each of its processing stages. We worked on multi- and mixed-language optical character recognition models, localization-based knowledge extraction methods, and a vision-enhanced rule-based anomaly detection engine.

Autism Spectrum Prediction

Partner: Megan C. Chang
Our lab developed novel methods that analyze longitudinal multivariate time series data to accurately diagnose autism spectrum disorders (ASD) in children. ASDs are a group of conditions characterized by impairments in reciprocal social interaction and by the presence of restricted and repetitive behaviors. The data we analyzed were collected during a sensory challenge protocol in which we observed subjects' reactions to eight stimuli. The main challenge in analyzing these data is their length: each series has an average of 2 million points per sensor.
[   iDSC 2019   KAPOOR2019-TR   ]

Behavioral Evolution Analysis

Funding: Intel Labs
The goal of machine learning is often to describe and predict human behavior. We live in a world where behavior is constantly being captured. Every time someone checks out at a grocery store, for example, a record of the transaction, including the items purchased and their prices, is stored on a server. These captured actions result in massive datasets whose analysis can provide a wealth of valuable information. We developed scalable machine learning approaches for analyzing such longitudinal user activity data for large groups of users in order to characterize and predict changes in user behavior.
[   ICDE 2015   ]

Human Activity Recognition

As smartphones and wearable devices become ubiquitous, they open the door to tracking human activity and discovering associated trends from sensor data. Though fitness tracking devices currently provide customized information about a user's daily physical activities, understanding general trends across users that span location, age, or gender could help the insurance and health care industries offer personalized plans and services. In this project, we analyzed a massive time series sensor dataset, collected from many users through a custom-built phone application, in order to automatically categorize their activities. By leveraging efficient optimization techniques, our method automatically derives prototypical activity patterns and identifies dynamic changes in activity over time.
[   MCCSE 2019   ]

Constrained Wireless Network Planning

Partner: San José OEM
We developed data structures and methods for city-wide wireless network planning that aim to efficiently identify the optimal placement of network antennae in a metropolitan environment.
[   CIKM'18   IEEE SCI 2017   ]