In just a short time, generative AI (GenAI) use has grown at a remarkable rate, quickly moving from futuristic curiosity to something many people encounter every day, whether they know it or not. GenAI's potential, however, is much deeper and more transformative than producing a steady stream of content for social media feeds. Companies are recognizing GenAI's capacity to revolutionize workflows, create new efficiencies across all aspects of business, refine products and services, and much more. Conversational GenAI applications, such as large language models (LLMs) that utilize private in-house data, can help organizations harness the power of AI to multiply the value of the data they already have.

A successful in-house augmented LLM project requires hardware capable of handling the resource-intensive nature of GenAI. The server solution you choose to power your LLM must support both the number of simultaneous users you expect and response times that fall within an acceptable threshold. One way to get started is to invest in high-performance server hardware, but you don’t want to spend more than necessary or end up with hardware that underperforms. Objective, relevant server performance data that reflects real-world scenarios can help you build a successful in-house LLM initiative from the ground up.
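To make that sizing criterion concrete, the sketch below shows one simple way to probe how many simultaneous users a deployment sustains before responses exceed a latency threshold. It is a minimal Python illustration that assumes a hypothetical OpenAI-compatible chat endpoint, model name, and threshold; it is not how PTChatterly works, and a real test harness would ramp load far more efficiently.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "http://llm-server:8000/v1/chat/completions"  # hypothetical URL
THRESHOLD_S = 5.0  # example acceptable response-time threshold, in seconds
PROMPT = "Summarize last quarter's support tickets."  # stand-in user question

def one_request() -> float:
    """Send one chat request and return its wall-clock latency in seconds."""
    body = json.dumps({
        "model": "llama-3.1-405b",  # assumed model identifier
        "messages": [{"role": "user", "content": PROMPT}],
    }).encode()
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

def meets_threshold(concurrency: int, pct: float = 0.95) -> bool:
    """True if the pct-th percentile latency of `concurrency`
    simultaneous requests falls within THRESHOLD_S."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda _: one_request(), range(concurrency)))
    index = max(0, int(pct * len(latencies)) - 1)
    return latencies[index] <= THRESHOLD_S

# Step concurrency up one user at a time until responses exceed the
# threshold; the last passing level is the supported user count.
users = 1
while meets_threshold(users):
    users += 1
print(f"Simultaneous users within the response-time threshold: {users - 1}")
```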

To help you plan for your GenAI server needs, we used the PTChatterly testing service to assess the augmented LLM performance of a Dell PowerEdge XE9680 server equipped with eight NVIDIA H100 SXM Tensor Core GPUs. For our tests, we used Llama 3.1 405B, a high-precision, very large LLM. In addition, we calculated the expected five-year total cost of ownership (TCO) of the XE9680 solution to show how much a GenAI project might cost over time.

We found that the XE9680 solution supported 68 simultaneous users within an acceptable response time threshold—while still leaving room for other workloads. In addition, our research showed that with a rack of six XE9680 servers equipped like those in our tests, an in-house augmented LLM could support up to 408 simultaneous users for a total cost of $8.3M over five years.
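As a quick sanity check, the rack-level figures follow directly from the single-server result. Note that the per-user costs below are our own back-of-the-envelope derivation for illustration; they do not appear in the study itself.

```python
# Arithmetic using only the figures stated above; per-user costs are
# derived here for illustration and are not figures from the study.
users_per_server = 68        # simultaneous users one XE9680 supported
servers_per_rack = 6         # XE9680 servers in the rack configuration
five_year_tco = 8_300_000    # total five-year cost for the rack, in USD

rack_users = users_per_server * servers_per_rack  # 68 * 6 = 408
cost_per_user = five_year_tco / rack_users        # ~$20,343 per user
cost_per_user_month = cost_per_user / (5 * 12)    # ~$339 per user per month

print(f"{rack_users} users; ${cost_per_user:,.0f}/user over five years "
      f"(about ${cost_per_user_month:,.0f}/user/month)")
```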

If you’re exploring ways to multiply the value of your data with a very large, high-precision LLM, check out this study.