Lillian Ingster, director of the National Death Index (NDI), has access to the entire universe of death data in the United States since 1979. Ingster shares her experience managing this data with her Federal peers as often as she can.
NDI, a branch of the National Center for Health Statistics (NCHS), is a centralized database that houses death record information from 54 jurisdictions: all 50 states, New York City, Washington, D.C., the Virgin Islands, and Puerto Rico. NDI collects death records for those who died in the U.S.
“We don’t really collect marriage and divorce records. That kind of bit the dust in the 1990s,” Ingster said. “But we do collect birth and death records.”
The Centers for Disease Control and Prevention (CDC), which encompasses NCHS, was founded 70 years ago. Ingster, who began work with NDI five years ago, said she likes to share her experience with data management, and frequently attends meetings with representatives from other Federal agencies to discuss big data.
“A lot of people use big data. We’ve been doing it for decades,” Ingster said. “We talk to our peers throughout the Federal government. I hope maybe I can give people a little insight into what we do on our end.”
Ingster warned big data collectors to be mindful of information that comes from many sources. She said that sometimes two different sources can provide different readings of the same information.
She said, for example, that asking someone their height and weight would yield a different result from measuring their height and weight. In her experience, she said, people tend to grow a couple of inches and lose 10 pounds when asked what their measurements are.
In addition to advising people to recognize the exact sources of their data, Ingster urged big data managers to go into their analysis with a hypothesis established in advance. She said that having a hypothesis up front keeps analysts from getting lost in vast amounts of data.
“When you have big data lakes and you’re doing data mining, you have to have an a priori hypothesis. If you just go fishing in those big data lakes, and you find significant results, those results are likely to be garbage,” Ingster said. “Any statistician worth his or her salt will tell you your results aren’t worth the paper they’re printed on if you don’t have a prior hypothesis.”
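Ingster's point about fishing can be illustrated with a small simulation. The sketch below is not NDI's method; it is a minimal example that assumes a conventional 5 percent significance cutoff and uses hypothetical function names. It runs many comparisons on pure random noise, where no real effect exists, and counts how many still look "significant."

```python
import random

def count_false_positives(num_tests=1000, group_size=50, z_crit=1.96, seed=0):
    """Compare two groups of pure noise num_tests times and count
    how many comparisons cross the usual 5 percent cutoff."""
    rng = random.Random(seed)
    # Standard error of a difference in means when every value has variance 1
    std_err = (2 / group_size) ** 0.5
    hits = 0
    for _ in range(num_tests):
        group_a = [rng.gauss(0, 1) for _ in range(group_size)]
        group_b = [rng.gauss(0, 1) for _ in range(group_size)]
        mean_a = sum(group_a) / group_size
        mean_b = sum(group_b) / group_size
        z = (mean_a - mean_b) / std_err
        if abs(z) > z_crit:
            hits += 1  # "significant" even though both groups are noise
    return hits

def chance_of_any_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    on noise comes out significant at level alpha."""
    return 1 - (1 - alpha) ** num_tests

if __name__ == "__main__":
    print("Spurious hits in 1000 tests on noise:", count_false_positives())
    print("Chance of at least one spurious hit in 20 tests:",
          round(chance_of_any_false_positive(20), 4))
```

Under these assumptions, roughly one comparison in 20 will look significant by chance alone, so an analyst who tests hundreds of unplanned comparisons should expect a steady supply of spurious findings. A hypothesis stated in advance limits the number of tests and makes a significant result easier to trust.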
Researchers in private industry submit applications to NDI and pay a fee for the chance to match their own records against the database's information, which includes death date, state of death, death certificate number, and cause of death. Ingster said these researchers frequently use the data matching service to create mortality profiles or industry assessments. She stressed that NDI's data is for research only, and that it is not a tool to look up how a favorite aunt died.
Ingster said she looks forward to attending big data talks in the future. In addition to imparting lessons from her experiences with data, she also uses such events to listen to what other agencies are working on with their data.
“There are a lot of diverse areas of big data, but, conceptually, there are overlaps,” Ingster said. “It’s a lot of data. There’s a lot of information that can be gleaned, but you’ve got to do it the right way.”