New York-based Wine Enthusiast offers online shoppers what it calls everything they need to live the wine lifestyle — from the wine itself to corkscrews, glasses, wine cellars, furniture, and even two magazines on the topic. The company also fields 100,000 customer service inquiries annually.
During the COVID-19 pandemic, the 45-year-old online retailer's presence boomed. Consumers were staying home, nesting, building out their perfect office space, and drinking more.
For more than a year, Wine Enthusiast had been using a SaaS-based system from San Francisco-based startup Pathlight to manage performance metrics for its customer-facing teams. Then Pathlight introduced a new generative artificial intelligence (genAI) product called Conversation Intelligence, which could transcribe every customer service conversation, grade representatives against company metrics, and flag potential issues.
The large language model (LLM) underlying the tool is fed Wine Enthusiast's own data so it can understand company policies and procedures, determine whether a representative followed them, and judge whether a customer was satisfied after a call, said John Burke, head of customer service and systems at Wine Enthusiast.
Historically, the company had to manually review each customer service call to identify customer trends or problems, a task that was infeasible at scale. As a result, Wine Enthusiast could only perform a superficial analysis of customer service conversations. And when complaints arose, they were reported anecdotally, making recurring problems nearly impossible to identify.
Now, genAI tools essentially operate as autonomous analysts, according to Burke. The LLMs behind the tools can rapidly scan most customer conversations, analyze the content, and condense the transcripts into reports that highlight consumer trends and product issues.
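Conceptually, the "autonomous analyst" workflow the article describes has three steps per call: transcribe the recording, score the representative against policy, and surface topics for trend reports. The sketch below illustrates that shape only; the function names, fields, and keyword matching are stand-ins, not Pathlight's actual product, which uses an LLM rather than phrase matching.

```python
from dataclasses import dataclass

@dataclass
class CallReport:
    """Per-call output of the hypothetical analysis pipeline."""
    transcript: str
    policy_score: float    # 0-100: share of required steps the rep followed
    flagged_topics: list   # topics surfaced for trend reporting

def transcribe(recording: bytes) -> str:
    # Placeholder: a real system would run speech-to-text here.
    return recording.decode("utf-8")

def score_against_policy(transcript: str, required_phrases: list) -> float:
    # Crude stand-in for LLM grading: fraction of required steps present.
    hits = sum(1 for phrase in required_phrases if phrase in transcript.lower())
    return 100.0 * hits / len(required_phrases)

def flag_topics(transcript: str, topics: list) -> list:
    return [t for t in topics if t in transcript.lower()]

def analyze_call(recording: bytes) -> CallReport:
    transcript = transcribe(recording)
    return CallReport(
        transcript=transcript,
        policy_score=score_against_policy(
            transcript, ["thanks for calling", "is there anything else"]),
        flagged_topics=flag_topics(
            transcript, ["refund", "wine cellar", "corkscrew"]),
    )

report = analyze_call(
    b"Thanks for calling! The wine cellar part is on the way. Is there anything else?")
print(report.policy_score, report.flagged_topics)
```

Aggregating `flagged_topics` across thousands of such reports is what turns individual calls into the trend summaries Burke describes bringing to marketing meetings.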
John Burke is the head of Customer Experience at Wine Enthusiast.
He recently spoke with Computerworld about the implementation of genAI at Wine Enthusiast, including the project's history, the challenges faced along the way, and the resulting benefits.
What problem did you expect genAI to solve? "Our customer service footprint was relatively small, so it couldn't handle the inflow of customers effectively. Contrary to common understanding, customer service isn't just point-of-sale service. It also spans product warranties and support. For instance, our wine cellars are designed to last 10 to 15 years, which means they will require maintenance and spare parts.
"When I was brought into this role, my responsibility was to grow this part of the business to meet customer expectations. That's especially crucial today, when customers are used to the immediacy and technological polish of companies like Amazon. But we had to find a way to do it without adding 60 more hires."
What was your strategy for solving the problem? Initially, to improve communication with our customers, we transitioned to Zendesk as our main tool. Zendesk made communication easier, but one issue remained: understanding why customers were reaching out to us.
We started by asking our service team to answer a few questions at the end of each conversation to help us identify the topics discussed. We discovered that inquiries made up 90% of the reasons for contact. But what were those inquiries about?
I don’t point fingers at the team. They are continuously switching between calls and don’t want to have to pause to answer multiple questions.
My focus isn't solely on the number of calls or tickets handled; it's on the quality and consistency of the service we deliver. We found Pathlight beneficial because of its sophisticated coaching platform, which aggregates various essential metrics and presents them as an intuitive "Health Score." That allows the team to better understand their performance.
"Instead of saying, 'You're doing well on first-contact resolution, but your chat response time needs work,' we can say, 'Your cumulative Health Score is 90, and here are the areas to focus on for improvement.'
"About twelve months into our partnership with Pathlight…, they announced plans for a product that wouldn't just evaluate the [service] representative's performance, but would also dissect every single conversation: what was discussed, the sentiment, how the issue was resolved, and whether policy and procedure were followed. That's what kicked off our exploration of AI."
Do most of your service representative communications take place through voice calls or messaging apps? "Our communications are about 70% voice calls; the remaining 30% is everything else. The difficulty was extracting meaningful insights from phone conversations that sometimes run 20, 30, even 40 minutes.
"This was the crux of our problem. With Pathlight, we can now assign digital scores to our representatives. But my leadership team faced a dilemma: balancing their own tasks against the need to evaluate the team. They were saying, 'John, it took me 20 minutes just to analyze one phone conversation. How am I supposed to do my job and also assess the team?'
"Historically, the only time we looked at a [service] recording was when a customer registered a complaint. We'd then investigate what went wrong. So we were consistently evaluating team performance based on the worst conversations, while neglecting the hundreds of perfectly pleasant interactions the team has."
Can you tell us about the amount of work involved in evaluating agents before the genAI rollout? "My leadership team isn't particularly large. I believe they spent about half their time conducting evaluations, and a considerable chunk of those centered on damage control: for example, when a customer was angry because an order didn't arrive on time. Many team members felt as though they were lawyers putting together a case against a client. That time went to identifying top performers, spotting those who needed extra guidance and training, or simply ensuring adherence to our processes.
"In our case, we pay close attention to specific business metrics. We strive for customer satisfaction, but we also can't just give away the store. The challenge lies in striking a balance: making customers feel satisfied when things go wrong, without instantly resorting to a full refund."
How did your old method of evaluating customer support fall short of your company's objectives? "We found ourselves only glancing at the absolute worst cases. A major hurdle for me was attending our marketing and commerce meetings, where the questions were usually: which products are popular or disliked, and what issues keep recurring? I knew it was a problem that this weekly meeting began with me asking my team on Slack, 'What has been the topic of discussion this week?'
"It was so anecdotal, and I felt awkward presenting that to the marketing team. Their follow-up questions were always, 'How many? Which customers? Which product lines?' Every time, all I could say was, 'That's all the information I have.'"
When did your genAI deployment begin, and when was it finished? "We started in August of last year, and it took nearly a month of adaptation. We went live around September and have been running ever since. I'd say we're mostly done tweaking the prompt. We've got it well tuned to who we are and what our conversations should look like, which has been extremely useful.
"We've practically eliminated manual grading. We don't do it anymore; we just let the system handle it."
Were you worried that Pathlight's cloud-based LLM would train on your proprietary data and could possibly disclose it later? "I've studied AI and I appreciate being at the forefront of technology, so I've kept up with privacy concerns and the ethical boundaries of AI: governance and similar issues. I didn't immediately have that worry, partly because we're not a bank. We're not in insurance or healthcare. If the language model wanted to learn from our customer base, I wasn't particularly bothered by that.
“Though there were initial concerns — mainly related to customer credit card details — Pathlight was transparent about their model. It is designed to identify and remove such sensitive data, which alleviated my worries. The only data we don’t own is our customers’ personal info, and ensuring its security gave us the confidence to proceed.”
Did you form a dedicated genAI team to implement the platform, or did you primarily lean on Pathlight for their expertise?
"Being a modestly sized business, we couldn't dedicate an entirely new team to this initiative. The implementation was largely handled by me and a few of my managers working closely with Pathlight. The first interaction, where they assessed our calls and demonstrated preliminary findings, wasn't even with a finished product but an early prototype. We got to see how the solution was evolving and felt we contributed to some facets of product development."
You’ve labeled your genAI technology as “autonomous analysts.” Why did you choose this name, and how does it operate?
"Pathlight's pitch for the product was somewhat the inverse of its actual value for us. They thought it would primarily eliminate the manual process of evaluating your team, with better understanding of customer interactions as a secondary benefit.
“For us, the value was precisely the opposite: we were more interested in understanding what our customers were talking about and addressing potential issues preemptively. As a result, our team’s performance naturally improved.”
"So, having this robot in the background listening to calls all day long and surfacing the stuff most important to us, on both the agent and customer level, was incredibly helpful. My team's biggest complaint before was that they were spending half their day or more not even doing the work, just listening and scrubbing through calls and then going through the manual evaluation process. That's another area we struggled in.
“My leadership team has different backgrounds. They have different management styles. One of my managers who has been in this industry for 40 years is a tough grader. It takes a lot to impress her. So, when I looked at scores when manually graded, the agents she evaluated were generally graded a lot lower than one of our other managers who is a little more forgiving.
“When we switched to AI, that bias was removed. What we were seeing was the actual analysis of the conversation without the human nature of thinking, ‘Well, the agent has had a tough week.’ Or ‘the customer was really laying into them, and I think they really did well enough.’ We removed that element from the equation.”
How do you store your customer service interactions, and how does Pathlight's LLM sift through them? "We currently use a cloud-based telephony system called Aircall. Aircall and Pathlight integrate through APIs. Conversations are recorded securely on the Aircall side, and we give Pathlight access to those recordings for a brief period to analyze them and move on.
"That was important to us: we didn't have to change how we operate. We kept using our existing phone and ticketing systems, only granting Pathlight secure access to the specific data it needed for evaluation."
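Granting a third party brief, scoped access to recordings is commonly done with time-limited signed tokens or URLs. The snippet below sketches that generic pattern with an HMAC-signed, expiring token; it is illustrative only and does not reflect Aircall's or Pathlight's actual APIs, and the secret and identifiers are made up.

```python
import hashlib
import hmac
import time

SECRET = b"shared-secret"  # illustrative; real systems use managed credentials

def grant_temporary_access(recording_id: str, ttl_seconds: int, now=None) -> str:
    """Issue a time-limited, signed token for one recording."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    payload = f"{recording_id}:{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_access(token: str, now=None) -> bool:
    """Accept the token only if the signature matches and it hasn't expired."""
    recording_id, expires, sig = token.rsplit(":", 2)
    payload = f"{recording_id}:{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    return hmac.compare_digest(sig, expected) and int(expires) > current

# Five-minute access window for one recording (fixed clock for demonstration).
token = grant_temporary_access("call-1234", ttl_seconds=300, now=1_000_000)
print(verify_access(token, now=1_000_100))  # within the window
print(verify_access(token, now=1_000_400))  # expired
```

The appeal of this pattern matches what Burke describes: the recordings never leave the system of record, and the analyzer's access automatically lapses once the window closes.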
Did you encounter any obstacles? For instance, did you have to label your data for easier detection?
"Truth be told, we're still refining it even now. Much of the utility hinges on the instructions fed to the AI model upfront. In our case, that meant educating the model about our business. It goes beyond simply stating, 'We sell wine'; conversations invariably reference things like corkscrews, furniture, magazine stories, and refunds."
“After a few iterations with Pathlight’s assistance, we realized, ‘It’s not quite comprehending our customer yet.’”
"Another area where we had to train the model was our procedures. Initially, the AI couldn't conclusively tell us whether a customer's issue had been addressed; it didn't understand what 'resolved' meant in our business context. A return? A refund? A credit? Over time, through repeated refinement of the prompts, we helped the system grasp that the customer doesn't always have to end the conversation on a happy note, provided we've met particular business-protection goals and offered a satisfactory experience. Even if the customer is slightly irritated, we'd still have met our standard."
"I think that was a learning process for us. We had an initial prompt we built, but it wasn't until we started seeing the output that we realized we needed to tell it a little more about our business and our products for it to really understand what we were looking for."
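The prompt-refinement loop Burke describes amounts to layering business context and an explicit definition of "resolved" into the instructions sent with each transcript. The sketch below shows one way such a prompt could be assembled; the wording, structure, and function names are hypothetical illustrations, not Wine Enthusiast's or Pathlight's actual prompts.

```python
# Hypothetical business context of the kind Burke describes adding iteratively.
BUSINESS_CONTEXT = """\
Wine Enthusiast sells wine, wine cellars, corkscrews, glasses, and furniture,
and publishes two magazines. Conversations may reference any of these.
"""

# Encodes the lesson from the interview: "resolved" doesn't require a happy
# customer, only that policy was followed and a remedy was offered.
RESOLUTION_RULES = """\
Treat a call as RESOLVED if any of the following occurred:
- a return, refund, or store credit was issued per policy, or
- the agent answered the question and offered further help.
The customer does not have to sound happy: a mildly irritated customer still
counts as resolved if policy was followed and a remedy was offered.
"""

def build_analysis_prompt(transcript: str) -> str:
    """Assemble the per-call prompt sent to the LLM (illustrative structure)."""
    return (
        f"{BUSINESS_CONTEXT}\n{RESOLUTION_RULES}\n"
        "Classify the call below as RESOLVED or UNRESOLVED and explain why.\n\n"
        f"Transcript:\n{transcript}"
    )

prompt = build_analysis_prompt(
    "Customer asked about a wine cellar part; agent issued a credit.")
print(prompt)
```

Keeping the context and rules as separate blocks mirrors the iterative process in the interview: each round of unsatisfying output becomes a new rule appended to the prompt, without rebuilding it from scratch.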