During a panel discussion on effective AI integration strategies, experts recommended that organizations developing AI pilot programs start small, test thoroughly, and follow established best practices.
Federal and industry speakers shared best practices for getting started with AI pilot development during a MeriTalk webinar, including weighing mission objectives, being intentional about AI governance, and maintaining organizational values.
“Right up front when you’re working with generative AI is a focus on ethical and responsible AI,” said Harry Dreany, AI development lead at the U.S. Marine Corps Warfighting Lab. “If you get to a point where you’re really, really happy with your product and capability, but you haven’t addressed things like ethics and responsible AI, it could lead to you having to sort of circle back and restart, or put a large … negative impact on your timeline and getting it established.”
Jeff Winterich, account chief technologist for HPE’s Department of Defense team, added that once the foundations of pilot development have been addressed, starting small is key to integrating AI into workflows.
“Always start small [with] something that you can get your hands around, show how AI can enhance those operations, and then … move on to a bigger project after you get comfortable with the technology,” said Winterich. “Think of a use case that’s going to move the needle and get everybody excited.”
Replicating real environments with the right data during testing is important for finding the best large language model (LLM) system and producing optimal outcomes, the panelists said. For instance, they said considerations such as who will be using the AI system and whether the tests have enough compute to replicate system response speed are vital.
Because sandbox environments offer limited room to work, Dreany said that designing tests around specific tasks tied to model accuracy and performance, with clear boundaries and criteria, can help teams gain a thorough understanding of a model’s capabilities.
“Instead of doing discovery learning or sort of a free-for-all, you want your LLM to do very specific things for your organization,” said Dreany. “If I test my LLM to do the things that I need it to do in order … to meet my mission objectives, that will make this test environment very, very efficient and very, very productive for me.”
Testing should also include evaluating the flexibility of models for long-term use, according to Robin Braun, vice president of AI and data strategy at Government Acquisitions, Inc. (GAI).
“You want to make sure that you’re using this year’s model, not two years ago model, because you will have vastly different results,” said Braun.
Additional recommendations included developing partnerships with industry and collaborating with other Federal agencies to improve cost efficiency, as well as complying with Federal standards and guidance to mitigate system security concerns.
Watch the full conversation.