American artificial intelligence (AI) company Cerebras Systems has announced the launch of what it claims is the world's fastest AI inference solution.
Cerebras said the new offering gives developers access to its oversized chips to run AI applications, and positioned it as a more affordable alternative to industry-standard NVIDIA processors.
NVIDIA graphics processing units (GPUs) used to train and deploy large AI models, such as those behind OpenAI's ChatGPT, can be difficult to obtain and expensive to run. Running a trained model to generate responses, a process known as inference, is a particular cost burden for developers.
Cerebras' chips, called Wafer Scale Engines, aim to tackle the data-crunching demands of AI applications by using one large chip in place of hundreds or thousands of chips strung together.
According to its website, Cerebras Inference delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, which the company says is 20x faster than NVIDIA GPU-based hyperscale clouds. It also claims the industry's best pricing, at 10 cents per million tokens for Llama 3.1 8B and 60 cents per million tokens for Llama 3.1 70B.
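To put those figures in context, a rough back-of-the-envelope calculation (illustrative only, using the throughput and per-token prices quoted above; the workload size is an arbitrary example) shows how long generating 100,000 tokens would take and what it would cost at the advertised rates:

```python
# Illustrative estimate based on the figures cited above:
# 1,800 tokens/s and $0.10 per million tokens for Llama 3.1 8B,
# 450 tokens/s and $0.60 per million tokens for Llama 3.1 70B.
# The numbers are the article's; the calculation is only a sketch.

def estimate(tokens: int, tokens_per_second: float, usd_per_million_tokens: float):
    """Return (seconds to generate, cost in USD) for a given number of output tokens."""
    seconds = tokens / tokens_per_second
    cost = tokens / 1_000_000 * usd_per_million_tokens
    return seconds, cost

# A hypothetical 100,000-token workload on each model at the quoted rates:
for model, tps, price in [("Llama 3.1 8B", 1800, 0.10), ("Llama 3.1 70B", 450, 0.60)]:
    secs, usd = estimate(100_000, tps, price)
    print(f"{model}: ~{secs:.0f} seconds, ~${usd:.2f}")
```

At those rates, the example workload would take roughly a minute and cost about a cent on the smaller model, and under four minutes and six cents on the larger one.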
Cerebras CEO Andrew Feldman told Reuters the company is "delivering performance that cannot be achieved by a GPU".
“We are doing it at the highest accuracy and we are offering it at the lowest price,” said Feldman.
In January, the non-profit medical centre Mayo Clinic and Cerebras Systems said they would collaborate to develop AI models for the healthcare industry.
According to reports, Mayo Clinic will use Cerebras systems and computing chips to draw on decades of anonymised patient data and medical records to develop its own AI models.