Eight new interdisciplinary research projects have won seed funding from Princeton University’s Schmidt DataX Fund, marking the third round of grants awarded by the fund since 2019. The fund, supported by a major gift from the Schmidt Futures Foundation, offers grants to explore using artificial intelligence and machine learning to accelerate discovery.

The eight funded projects involve 13 faculty across seven departments and programs, ranging from computer science and engineering to Near Eastern studies and psychology.

The projects run the gamut, from deciphering a large body of documents from 11th- to 13th-century Egypt and improving the performance of organic semiconductor devices to improving the security of autonomous driving and discovering the dynamics of human thought.

“These projects are exciting as they explore how big and difficult problems can be solved using modern data analytics and machine learning approaches while accelerating scientific discovery. The central idea is to replace slow and laborious processes with machine-assisted methods,” said Peter Ramadge, director of the Center for Statistics and Machine Learning (CSML). “These projects are not limited to traditional ‘technical’ areas, but also cover problems on a large scale in the humanities and social sciences.”

CSML oversees a series of efforts made possible by the Schmidt DataX Fund to expand the reach of data science and machine learning on campus. These efforts include hiring data scientists and overseeing the awarding of DataX grants. This is the third round of DataX seed funding.

The eight new research projects and their faculty

Handwritten text recognition for the Princeton Geniza project (HTR4PGP)
Marina Rustow, Khedouri A. Zilkha Professor of Jewish Civilization in the Near East and Professor of Near Eastern Studies and History

The Cairo Geniza, a cache of medieval manuscripts discovered in an Egyptian synagogue, has helped historians piece together networks of ordinary merchants, craftsmen, women, children and slaves, stretching from Spain to Sumatra. But in the century since the cache was discovered, researchers have published fewer than 5,000 of its documentary texts. HTR4PGP seeks to triple that number by using machine learning to produce searchable transcripts.

Machine Learning Methods for Next Generation Immunoepidemiology
C. Jessica Metcalf, Associate Professor of Ecology and Evolutionary Biology and Public Affairs, Princeton School of Public and International Affairs (SPIA)
Bryan Grenfell, Kathryn Briger and Sarah Fenton Professor of Ecology and Evolutionary Biology and Public Affairs, SPIA

High-dimensional immunological data are increasingly available, especially as the COVID-19 pandemic highlights the importance of record keeping. Yet, appropriate analytical methods that provide insight into population outcomes and disease control remain elusive. To solve this problem, this project aims to develop a process that uses machine learning to analyze immunological data and uncover the hidden mechanisms of infectious diseases that affect entire populations.

Activation of crystalline organic semiconductor devices
Barry Rand, Associate Professor of Electrical and Computer Engineering and Andlinger Center for Energy and the Environment
Adji Bousso Dieng, Lecturer in Computer Science

Organic semiconductor devices are multi-layered, sometimes involving seven or eight separate layers, but these layers are disordered, which limits the devices’ performance. This project aims to address that limitation by using data science to design crystalline layers that will advance these devices.

Computational approaches to discovering the dynamics of the stream of thought
Yael Niv, Professor of Psychology and Neuroscience
Diana Tamir, Associate Professor of Psychology

In our minds, thoughts flow continuously and freely. The typical methods used to analyze these spontaneous and unconstrained thoughts are laborious, expensive and inefficient. To address these constraints, this project aims to develop machine learning tools to efficiently analyze the content and dynamics of spontaneous thought in order to provide insight into the function of thought and its clinical implications.

Learning how fast Antarctica’s ice shelves are melting using neural networks
Ching-Yao Lai, Assistant Professor of Geosciences

As the climate warms, substantial melting of ice shelves may accelerate sea level rise, but current methods for estimating melt rates cope poorly with data that are noisy and low in resolution. To more accurately assess the extent and speed of ice-shelf melting in Antarctica, and so better gauge the impact of this consequence of climate change, this project proposes to use neural networks trained on both observational data and physical laws.

Efficient estimation of sampled random field parameters in geophysics
Frederik Simons, Professor of Geosciences

The geosciences are full of data such as measurements of the seismic, gravitational and magnetic properties of the Earth, but the techniques to analyze them are often inefficient because geoscience data can be incomplete and noisy. To circumvent these problems, the researchers have developed a computational technique for analyzing geoscience data that is efficient, fast and robust. To advance this technique, the project proposes to test it on terrestrial and planetary field data, with a focus on the topography of Venus and the bathymetry of the Earth’s ocean floor.

Proven perception and control for safe autonomous driving
Jaime Fernandez Fisac, Assistant Professor of Electrical and Computer Engineering
Prateek Mittal, Associate Professor of Electrical and Computer Engineering

Autonomous vehicles are poised to revolutionize transportation, but they still face hurdles in navigating unexpected, even hostile, situations. This project aims to overcome these challenges by unifying robust visual perception and safe path planning under a common framework.

Assessment frameworks for privacy, auditing, and assessment in federated learning
Sanjeev Arora, Charles C. Fitzmorris Professor of Computer Science
Kai Li, Paul M. Wythes ’55 P86 and Marcia R. Wythes P86 Professor of Computer Science

Federated learning is an emerging machine learning technique that allows models to be trained on a centralized cloud server while using data from different users, who keep their data on their own devices. This framework is intended to address concerns about security and privacy in the online world, but there is little research evaluating the level of security and privacy that federated learning actually provides. This project aims to develop and evaluate different federated learning frameworks that offer privacy-preservation mechanisms, to identify whether traces of users’ data remain in the trained model, and to assess the contribution of each participating user to the final model.
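
For readers unfamiliar with the idea, the basic training loop can be illustrated with a minimal sketch. It assumes federated averaging, one common scheme, applied to a one-parameter model y = w * x; it is not the project's method, it includes none of the privacy mechanisms the project will study, and all function names and data below are hypothetical.

```python
from typing import List, Tuple

# One client's private (x, y) pairs; in this sketch they never leave the client.
Dataset = List[Tuple[float, float]]


def local_update(w: float, data: Dataset, lr: float = 0.01, steps: int = 5) -> float:
    """Run a few gradient-descent steps on one client's data for the model y = w * x."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global: float, clients: List[Dataset]) -> float:
    """One round: clients train locally, the server averages the returned weights."""
    total = sum(len(data) for data in clients)
    # Only the updated weight and the sample count are sent back, not the raw data.
    updates = [(local_update(w_global, data), len(data)) for data in clients]
    return sum(w * n for w, n in updates) / total


def train(clients: List[Dataset], rounds: int = 50, w0: float = 0.0) -> float:
    """Repeat federated rounds starting from the initial weight w0."""
    w = w0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


# Hypothetical data: three devices, each holding samples of y = 3 * x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
w_final = train(clients)
```

In this toy setting the averaged weight approaches 3, the slope shared by all three devices. The questions the project raises begin where the sketch ends: whether the weights sent to the server still reveal something about each device's data, and how much each device contributed to the final model.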

For more information on past DataX recipients, see the 2019 cohort and the 2021 cohort.