Federal data managers need to know the personality of their data sets in order to control them, according to Thomas Beach, chief data strategist and portfolio manager of the U.S. Patent and Trademark Office’s Digital Service and Big Data department.
Beach said USPTO’s data team benefited from stepping back to examine what its data really was, rather than simply drafting a strategy to manage it. Beach spoke at a webinar titled “Government’s IT Modernization Imperative,” sponsored by Dell EMC and Govplace, on April 19.
“Know the personality of data and what it means,” Beach said. “Optimization means critical paths and the user community.”
Last April, USPTO launched its Open Data and Mobility program, a portal that makes the agency’s data available to the public, allowing people to view information on patents and research history.
Beach said the agency has continued to double down on big data projects since the open data portal’s launch. For example, USPTO created a repository of the reference documents examiners consult most often. The platform stores reams of reference materials; each examiner’s favorites sit near the top of the page and “bubble up to them” when they visit the site.
“We look at what we do as cognitive assistance. We’re not looking to replace, we’re looking to improve areas,” Beach said. “They’re now starting from the 50-yard line, as opposed to the first yard line.”
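USPTO has not described the ranking logic behind the “bubble up” behavior, but the general idea of surfacing each examiner’s most frequently opened references first can be illustrated with a minimal sketch. The class and document names below (ReferenceShelf, MPEP-2106, and so on) are hypothetical and are used only to show a simple frequency-based ranking, not the agency’s actual implementation.

```python
from collections import Counter


class ReferenceShelf:
    """Hypothetical illustration of a 'bubble up' shelf: the documents an
    examiner opens most often are ranked toward the top of the page."""

    def __init__(self) -> None:
        self.open_counts = Counter()  # document id -> number of times opened

    def record_open(self, doc_id: str) -> None:
        # Each visit to a reference document bumps its count.
        self.open_counts[doc_id] += 1

    def top_documents(self, n: int = 10) -> list[str]:
        # The most frequently opened documents "bubble up" first.
        return [doc_id for doc_id, _ in self.open_counts.most_common(n)]


if __name__ == "__main__":
    shelf = ReferenceShelf()
    for doc in ["MPEP-2106", "MPEP-2106", "35-USC-101", "MPEP-2106", "35-USC-103"]:
        shelf.record_open(doc)
    # Prints ['MPEP-2106', '35-USC-101', '35-USC-103']; ties keep insertion order.
    print(shelf.top_documents(3))
```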
USPTO, one of the oldest Federal offices, has been disseminating information since its inception. However, Beach admitted that it could be better at making its data accessible to the public.
“There’s a huge pent-up appetite for a better understanding of what we do,” Beach said. “For that to happen, the public needs to have access to it.”
USPTO is not the only Federal agency overseeing big data projects while trying to modernize infrastructure. The Government Accountability Office called on agencies to prioritize IT modernization in a May 2016 report, which found that the Federal government spent more than 75 percent of its fiscal year 2015 IT budget on maintaining outdated technology.
The National Cancer Institute, part of the National Institutes of Health, analyzes large amounts of patient data with machine learning. Jeff Shilling, chief of IT and infrastructure services at NCI’s Center for Biomedical Informatics and Information Technology, said the institute built a cloud-based storage system for the data produced by its new genome sequencing technology.
Under a contract with the University of Chicago, NCI’s CBIIT built the Genomic Data Commons to store and manage the sequencing data. Shilling said the people behind big data analytics are what make computers’ results meaningful.
“They can do one-off experiments, where a lot of times that’s hindered because it takes two weeks to get something up and running,” Shilling said. “You have to have a great workforce. They’re the thing that turns these fast stupid machines into solutions.”
George Jakabcin, chief information officer of the Treasury Inspector General for Tax Administration, echoed Beach, stressing the importance of knowing how sensitive certain data sets are. Although TIGTA’s data analytics program is eight years old, Jakabcin said it is not as sophisticated as USPTO’s.
He also said that employees need to have an understanding of their agency’s big data mission.
“You’ve got to understand what your data is and how sensitive it is. This is relatively new territory for us all,” Jakabcin said. “We often get in the conundrum of doing the same thing over and over. We need to have a vision and set the compass rose so we’re all moving in the same direction. We’re working with fast idiots. That’s what computers are.”