Date: 26 Mar 2021
“Everyone knows that in research there are no final answers, only insights that allow one to formulate new questions,” wrote the Italian-American microbiologist Salvador Luria in his 1984 autobiography. This remains true in modern scientific research, where our questions have grown more ambitious and our tools more powerful. Scientists use hypothesis-driven testing to ask questions about the universe, and as those questions become more interesting and complex, the tools and methodologies needed to answer them become more complex as well, demanding better measurement and modeling capabilities.
This tightly coupled loop can be seen in nearly every research field. The Large Hadron Collider at CERN in Switzerland was built by a collaboration of over 10,000 scientists, at a cost of roughly 4 billion euros, to study the fundamental laws of physics; it provided the first detection of the Higgs boson. Quite an expensive instrument for observing such a small particle. Sandia National Laboratories built the “Z Machine,” which can create energy bursts hotter than the center of the sun, for 80 million dollars to study how materials react in extreme environments. While expensive, advancing scientific knowledge often requires significant monetary investment. Quantum computing is in the early stages of this same process, with open challenges in controlling and measuring quantum states.
High-performance computing facilities provide the hardware needed to analyze and model the world around us. Holding the title of world's largest supercomputer is an international competition: currently Japan's Fugaku machine leads, followed by the Summit machine at Oak Ridge National Laboratory. These computers have millions of processing cores plus accelerator hardware, and they consume enormous amounts of energy. Scientists use these machines to ask questions about the world. In my areas of interest (climate, weather, and remote sensing), large computing facilities enable us to improve the accuracy of our predictions. With improved accuracy, scientists can study the environment in ever finer detail. More detail and complexity, however, open up more questions.
Today, deep learning (DL) is the new tool invading many areas of science. DL is a branch of artificial intelligence (AI) research in which models learn useful representations of data from the patterns within it. These models have been found to learn highly accurate mappings between domains, providing solutions to a wide range of tasks. Siri and Alexa are good examples, using DL for speech recognition; Google Photos uses DL to recognize people across your albums. This technology is becoming pervasive, and as data availability continues to grow, it opens up more and more opportunities for DL applications.
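To make "learning a mapping" concrete, here is a deliberately tiny, hypothetical sketch (all names and choices are my own, not from any production system): a two-layer neural network fit by gradient descent to reproduce the XOR function, a mapping no purely linear model can capture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input/output pairs defining the XOR mapping we want the model to learn.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# Two layers: 2 inputs -> 8 hidden units (tanh) -> 1 output (sigmoid).
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

def forward(X, W1, b1, W2, b2):
    """Run the network forward, returning hidden activations and outputs."""
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return h, p

_, p0 = forward(X, W1, b1, W2, b2)
mse_before = float(np.mean((p0 - y) ** 2))

lr = 0.5
for _ in range(5000):
    h, p = forward(X, W1, b1, W2, b2)
    # Backpropagate the mean-squared-error gradient through both layers.
    dp = 2.0 * (p - y) / len(X) * p * (1.0 - p)
    dW2 = h.T @ dp
    db2 = dp.sum(axis=0)
    dh = (dp @ W2.T) * (1.0 - h ** 2)   # tanh derivative
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    # Gradient-descent update.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

_, p1 = forward(X, W1, b1, W2, b2)
mse_after = float(np.mean((p1 - y) ** 2))
print(f"MSE before training: {mse_before:.3f}, after: {mse_after:.3f}")
```

Real DL systems differ mainly in scale, not in kind: more layers, more data, and automatic differentiation instead of hand-derived gradients, but the same loop of forward pass, loss, and gradient update.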
The question becomes: how should scientists use DL? Should we expect AI to generate insight for us, or is that too ambitious? In scientific research we ask a question, test the hypothesis, analyze the data, and report results, in a continuous feedback loop. A DL methodology used to test a hypothesis will generate data to be analyzed. If the analysis validates the hypothesis, did the DL model help improve the results? If so, does improving the model help in future hypothesis testing? Are there anomalies in the outputs that cannot be explained? Could those be insights humans have yet to understand? And how would we know?
This blog will focus on the intersection of artificial intelligence, science, and environmental technologies. My research applies deep learning to climate science and remote sensing.