How one HR department’s ChatGPT experiment recovered from ‘total failure’

CHICAGO — After a long journey in management consulting, Susan Anderson was used to horror stories in which the shiny new object of a company’s obsession turned out to be a passing fad.

But she had a sense that generative AI would be different, Anderson told the audience at the Society for Human Resource Management 2024 Annual Conference Monday.

Anderson, head of HR compliance services and content at Mitratech, shared how she led her team through a ChatGPT experiment that revealed the limits and the potential of generative AI for them. She gave SHRM attendees tips on how to fail fast — and what employers need to do to enable experimentation at an organizational level.

The first step: ‘A total failure’

Anderson’s team at Mineral, a company within Mitratech, answered complicated HR questions posed by clients. They needed answers to be precise, nuanced and fast; the expected turnaround time was 24 hours or less.

The team’s first reaction to ChatGPT was fear, frustration and “self-righteous indignation,” she said. But Anderson wanted to make her experts’ demanding jobs easier, and the technology was going to be on the scene regardless of what the team did, so they experimented with it.

The company chose ChatGPT because it was among the first generative AI options out of the gate, with security protocols that Anderson and her IT and legal teams saw as sufficient. The experiment team input questions to ChatGPT on topics such as FMLA, FLSA, ADA and immigration, chosen because they would not have issues with recency bias. That was a key consideration in the experiment, as ChatGPT 3.5 (the model at the time) used old data.

That first implementation was “a total failure,” Anderson said. The plan was to experiment with these questions for six weeks. They pulled the plug at four.

The team struggled to understand prompt crafting, such as how to phrase questions so that the tool would give a sufficient answer. “What it gave us back was garbage,” Anderson said. Answers lacked necessary nuance, especially when questions involved issues that concerned both state and federal law — a big part of their work.

The experiment also revealed something about the emotional experience employees may go through when confronted with automated tools, Anderson said. That fear and indignation shifted to frustration and a belief that the tool was not very effective.

Take two: Revealing the critical nature of human involvement

The first failure was not the end of the team’s engagement with ChatGPT, however, Anderson said. 

Mineral experimented again, this time with ChatGPT 4.0. The goal was to explore “human in the loop” collaboration with ChatGPT and to monitor employees’ experience with the tool, with their emotional reactions in mind.

For three weeks, every question went through the company’s custom ChatGPT platform, which remembered the company’s preferred way of generating answers. Mineral trained the team on how to input a question while stripping personally identifiable information.

At the end of each week, Anderson asked the experimenters how it was going. How did it perform? How complicated was it?

Quality improved dramatically, Anderson said, though the learning curve remained. Feedback in the first week was very different from feedback in the third, she noted, and productivity increased from week to week.

The employees learned to challenge the tool and press it for reasoning, further emphasizing the criticality of the human in the loop, Anderson said. The experiment team started to have more fun getting creative with the tool, especially once they understood its limitations and what they could legitimately challenge it on.

Empower experimentation

Anderson provided three tips for companies seeking to improve their own experimentation with AI.

  1. Establish a solid foundation. “You can’t jump into this not having a plan,” Anderson said. How risk tolerant or wary is the organization? Does it have a policy on AI use? If not, make one.
  2. Build knowledge and skills to experiment and overcome resistance. Anderson said this step was fun for her as she watched her employees overcome the learning curve. They moved from fear to annoyance to excitement, finding new ways to experiment and innovate with the tool, including challenging its approach.
  3. Foster a culture of experimentation. When a company forces employees to go through an approval process to use AI tools, employees get spooked and avoid trying them. “The rules are important, for every reason you can imagine,” Anderson said. But making the tool secure and focusing on how employees actually use it can help make tool use and experimentation a stronger part of the culture and work.

Overall, emotional intelligence will be a competitive advantage, Anderson said, especially in encouraging experimentation with these tools. AI won’t displace experts — but it will displace those not using AI, she said.