NASA’s Jet Propulsion Lab Tackles Big Data

NASA’s Jet Propulsion Laboratory, like many large organizations, is taking on the Big Data problem: the task of analyzing enormous data sets to find actionable information.

In JPL’s case, the job involves collecting and mining data from 22 spacecraft and 10 instruments including the Mars Science Laboratory’s Curiosity rover and the Kepler space telescope. Tom Soderstrom, IT chief technology officer at JPL, joked that his biggest Big Data challenge is more down to Earth: dealing effectively with his email inbox. But kidding aside, JPL now confronts Big Data as a key problem and a key opportunity.

“If we define the Big Data era as beginning where our current systems are no longer effective, we have already entered this epoch,” Soderstrom explained.

The Problem Defined

Soderstrom defines a Big Data problem as having one or more of the following “V”s:

High Volume

“We are already overflowing with radar data and earth science data,” Soderstrom said. “Once we get optical communications into space, the problem will increase by orders of magnitude.”

Rapid Velocity

JPL finds its data arriving at a faster and faster rate, an issue for both spacecraft and ground-based systems. Soderstrom said JPL faces the question of how best to deal with ever-increasing data speeds in real time.

Large Variety

JPL's engineering data once consisted of structured data. Today, JPL needs to combine structured SQL data with unstructured NoSQL data, and Soderstrom said reconciling the two can be very time consuming.
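
The reconciliation Soderstrom describes can be sketched in miniature. The snippet below is a hypothetical illustration, not JPL's actual pipeline: it joins rows from a relational table with free-form JSON documents on a shared key (all names and values are invented for the example).

```python
import sqlite3
import json

# Structured side: telemetry-style rows in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (craft TEXT, temp_c REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("curiosity", -63.0), ("kepler", -85.0)])

# Unstructured side: JSON documents whose schema may vary per record.
docs = [
    json.loads('{"craft": "curiosity", "notes": {"status": "nominal"}}'),
    json.loads('{"craft": "kepler", "notes": {"status": "safe mode"}}'),
]

# Reconcile: index the documents by key, then join against the SQL rows.
by_craft = {d["craft"]: d for d in docs}
combined = [
    {"craft": craft, "temp_c": temp,
     "status": by_craft[craft]["notes"]["status"]}
    for craft, temp in conn.execute("SELECT craft, temp_c FROM readings")
    if craft in by_craft
]
```

Even in this toy form, the time-consuming part Soderstrom alludes to is visible: the unstructured side has no enforced schema, so each field path (`notes.status` here) must be discovered and mapped by hand.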

High Viscosity

Here, the issue is data discovery and manipulation. Soderstrom notes that data is becoming more difficult to detect, extract and combine.

Significant Value

An organization must be able to build business cases for Big Data if it is to determine which problem to take on first.

“It is challenging to come up with crisp business value and return on investment (ROI) for a Big Data problem,” Soderstrom said. “However, this is the key to prioritizing and solving Big Data problems.”

The payoff is potentially huge. Soderstrom said Big Data can yield scientific advances without the need to invest in big-ticket items.

“If we could effectively combine data from various sources — such as oceans data with ozone data with hurricane data — we could detect new science without needing to build new instruments or launch new spacecraft,” Soderstrom said.
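
The kind of cross-source fusion Soderstrom describes can be reduced to a simple keyed join. The sketch below is purely illustrative: the datasets, keys, and values are all invented, and real Earth-science fusion would involve gridded data, unit conversion, and resampling far beyond this.

```python
# Three hypothetical datasets keyed by (region, month); values are made up.
ocean_temp = {("gulf", "aug"): 30.1, ("gulf", "sep"): 29.4}   # sea-surface temp, C
ozone = {("gulf", "aug"): 271, ("gulf", "sep"): 268}          # Dobson units
hurricanes = {("gulf", "aug"): 2, ("gulf", "sep"): 3}         # storm counts

# Fuse on the keys common to all three sources, so co-occurring
# signals can be examined side by side.
fused = [
    {"key": k, "sst_c": ocean_temp[k],
     "ozone_du": ozone[k], "storms": hurricanes[k]}
    for k in sorted(ocean_temp.keys() & ozone.keys() & hurricanes.keys())
]
```

The point of the sketch is the one Soderstrom makes: the raw observations already exist, and the new information comes from aligning them on a common key rather than from new instruments.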

Solving the Problem

Outdated IT systems represent one aspect of JPL’s Big Data challenge. System upgrades and the use of cloud computing will help address that issue, Soderstrom said. But new systems aren’t the only issue. JPL also needs to cultivate people with the skills to manage and analyze the data.

“Training our current workforce and augmenting with new personnel skilled in the new Big Data IT systems can solve this,” Soderstrom added.

One sought-after Big Data specialist is the data scientist. This role combines a range of skills in fields including statistics, programming, machine learning/data mining, structured and unstructured data, Big Data tools and modeling. A data scientist should also possess domain knowledge — science or engineering, for example — and the ability to provide a data narrative.

“Simply put, the data scientist teaches the data to tell an interesting story we didn’t already know,” Soderstrom said.

Data scientists won’t be expected to become experts in every Big Data field, but will need a high level of proficiency across the board, he noted. A data scientist who is 80 percent good in many disciplines is better than one who is 100 percent good in any single discipline.

Other qualities include a penchant for exploring data and finding patterns.

“Because much of the exploration will be demonstrated via rapid prototyping, the data scientist will need to use visualization to help tell the story,” Soderstrom said.

Data scientists work together in teams, which could include student contributors who supplement the workforce. The team approach is characteristic of JPL’s testing of emerging technologies.

“We do this in a highly collaborative fashion by establishing working groups and testing the interesting technologies in actual useful prototypes,” Soderstrom said.

“This is part of our journey to redefine IT from the traditional Information Technology definition into Innovating Together.”

Advice for Agencies

Soderstrom suggested that JPL and other federal entities contend with similar Big Data challenges. He said agencies will need to upgrade IT tools and their staffers’ skills, noting that strategic recruiting will play a role.

As for getting started, Soderstrom recommended hiring or appointing a data scientist. That person can come from outside the agency or within it, he said, noting that the latter option will prove easier and less expensive.

Soderstrom also advised agencies to go for some quick wins and avoid analysis paralysis.

“Look for the tasty low-hanging Big Data fruit,” Soderstrom said. “These are problems that have significant business impact if solved. An end user, who is facilitated by the data scientist, articulates these business problems. They are short enough that they can be prototyped and demonstrated within a three-month period and with a low budget.”

Soderstrom advocates a learn-by-doing approach that helps organizations set the stage for tackling additional Big Data projects. The ability to learn from a Big Data experiment is the key success metric, he said.

“Learn on easy problems,” Soderstrom said. “Then you will know where to make the next round of investments and what ROI you could expect.”