In-house AI chatbots—LLMs augmented with an organization’s own data—are rapidly gaining traction as GenAI tools that can bring significant practical benefits to organizations of all types and sizes. By augmenting an LLM application with internal data, companies can leverage GenAI technology to expand automation, accelerate content creation, improve data analysis and customer outcomes, and much more. By setting up the app on premises, they can engage the workforce-multiplying effects of AI while still protecting data security and sovereignty, ensuring regulatory compliance, and maintaining operational independence.

For decision-makers interested in exploring the capacity-boosting potential of an in-house AI chatbot in a small business or departmental context, it can be difficult to know what server hardware they truly require. While it’s true that AI apps and processes often require significant computing resources, including powerful and expensive GPUs, this is not always the case. In a recent set of tests, we showed that a Supermicro H14 Hyper DP server powered by the latest generation of AMD EPYC processors can effectively support the in-house AI chatbot needs of a small business or department—all with inferencing on the CPUs and without requiring an investment in GPUs.

We used an end-to-end chatbot benchmark service, PTChatterly, to evaluate the capabilities of two servers: a new Supermicro H14 Hyper DP powered by two AMD EPYC 9965 CPUs and a 4-year-old Supermicro H12 Ultra powered by two older AMD processors. In our tests, the chatbot utilized a Llama LLM augmented with in-house data via local retrieval-augmented generation (RAG). To deliver acceptable performance in our tests, a server solution had to support a given number of simultaneous users while the chatbot delivered a complete response to most users within 10 seconds.
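As a rough illustration, the acceptance criterion above can be sketched in a few lines of Python. This is a hypothetical simplification, not part of PTChatterly; the function name and sample latencies are invented for the example.

```python
from statistics import median

def meets_target(latencies_s, target_s=10.0):
    """Return True if the median end-to-end response time, in seconds,
    across simultaneous users is within the target."""
    return median(latencies_s) <= target_s

# Hypothetical per-user end-to-end response times for 18 simultaneous users.
sample = [8.2, 9.1, 7.5, 9.8, 10.4, 8.9, 9.3, 7.8, 9.9,
          8.4, 9.6, 10.1, 8.7, 9.0, 9.4, 8.1, 9.7, 8.8]
print(meets_target(sample))  # True: the median of this sample is under 10 s
```

A median-based check like this tolerates a few slow outliers while still requiring that most users see a complete response within the target window.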

We found that the new Supermicro H14 Hyper DP server with EPYC 9965 CPUs could provide up to 18 simultaneous users with end-to-end responses in a median of 10 seconds, with answers beginning to appear within one second. In many corporate settings, only a few employees would be interacting with the chatbot at any given time, rather than all hitting it simultaneously as in our test, so the H14 can likely support far more users in practice. In contrast, the older Supermicro server configuration could not support even a single user within the acceptable response time.

Our tests show that by upgrading to a new Supermicro H14 Hyper DP server powered by two AMD EPYC 9965 processors, small organizations or departments within larger companies could effectively support the computational demands of an in-house AI chatbot—allowing them to enjoy the potential advantages of AI while avoiding the expense of GPU-based solutions.

To learn more about how we evaluated the in-house LLM chatbot capabilities of Supermicro H14 Hyper DP servers powered by AMD EPYC 9965 CPUs, check out the report below.