AMD unveils its first small language model AMD-135M

Equipped with speculative decoding feature

Advanced Micro Devices (AMD) has announced the launch of its first small language model, AMD-135M, specifically tailored for private business deployments.

The new model belongs to the Llama family and supports speculative decoding.

"AMD is excited to release its very first small language model, AMD-135M with Speculative Decoding," the company said in a blog post.

"This work demonstrates the commitment to an open approach to AI which will lead to more inclusive, ethical, and innovative technological progress, helping ensure that its benefits are more widely shared, and its challenges more collaboratively addressed."

AMD's entry into the AI market is marked by a strategic focus on both hardware and software. The company is leveraging its expertise in chip design to develop cutting-edge AI accelerators while simultaneously investing in AI research and development.

AMD recently made headlines with its acquisition of Silo AI, a European AI firm, although it remains unclear whether the development of AMD-135M is directly tied to this acquisition.

The deal, which is still pending approval from various regulatory authorities, could potentially provide AMD with a broader base of AI expertise for future projects.

One of the key features of AMD-135M is its use of speculative decoding, a technique that allows for faster and more efficient token generation during inference.

This method uses a smaller "draft model" to propose several candidate tokens cheaply. A larger, more accurate "target model" then verifies or corrects those candidates in a single forward pass, so each pass of the target model can yield more than one token. This improves inference speed and reduces the number of memory accesses.
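The draft-and-verify loop described above can be illustrated with a minimal sketch. This is not AMD's implementation: the two "models" are hypothetical toy functions over integer tokens, verification is greedy (a candidate is accepted only if it matches the target's own choice), and the per-position calls to the target stand in for what a real system would compute in one batched forward pass.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the large, accurate model."""
    return (sum(prefix) * 3 + len(prefix)) % 10


def draft_model(prefix: List[int]) -> int:
    """Hypothetical small model: agrees with the target except
    when the prefix length is a multiple of 4."""
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 10


def greedy_decode(prompt: List[int], target: Model, num_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(prompt: List[int], draft: Model, target: Model,
                       num_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, target passes used)."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k candidate tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target scores every candidate position. In a real system
        #    this is one batched forward pass, so it is counted once.
        target_passes += 1
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]
        # 3. Accept the longest run of candidates the target agrees with.
        accepted = 0
        while accepted < k and proposal[accepted] == verified[accepted]:
            accepted += 1
        tokens.extend(proposal[:accepted])
        # 4. Add the target's own token: the correction at the first
        #    mismatch, or a bonus token if every candidate was accepted.
        tokens.append(verified[accepted])
    return tokens[:goal], target_passes


prompt = [1, 2, 3]
baseline = greedy_decode(prompt, target_model, 12)
speculative, passes = speculative_decode(prompt, draft_model, target_model, 12)
print(speculative == baseline, passes)
```

In this toy setup the speculative output is identical to plain greedy decoding with the target model, but the target is invoked fewer times than the number of tokens generated. How large the saving is in practice depends on how often the draft model's guesses are accepted.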

AMD says it has demonstrated substantial performance gains using speculative decoding on various AMD platforms, including the Instinct MI250 accelerator and Ryzen AI processors.

The chip maker has released two versions of AMD-135M: the base model and a coding-optimised variant. Both models benefit from speculative decoding technology, resulting in accelerated inference performance.

AMD-Llama-135M was trained from scratch on a massive dataset consisting of 670 billion tokens of general-purpose data. The training process took six days and was powered by four 8-way AMD Instinct MI250-based nodes, which AMD refers to as "four AMD MI250 nodes."

AMD-Llama-135M-code builds on the base model and was fine-tuned with an additional 20 billion tokens specifically geared toward coding tasks. This fine-tuning process took four days on the same AMD MI250 hardware.
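As a rough back-of-the-envelope check, the token counts and durations above imply an average training throughput. The sketch below assumes, purely for illustration, that each token was processed once and that training ran continuously for the stated number of days on all four nodes. Neither assumption is confirmed by AMD, so the results are order-of-magnitude estimates only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_throughput(total_tokens: float, days: float, nodes: int = 4):
    """Return (tokens per second overall, tokens per second per node),
    assuming one pass over the data and uninterrupted training."""
    overall = total_tokens / (days * SECONDS_PER_DAY)
    return overall, overall / nodes


# Figures quoted in the article: 670B tokens in six days, 20B tokens in four days.
pretrain_total, pretrain_per_node = average_throughput(670e9, 6)
finetune_total, finetune_per_node = average_throughput(20e9, 4)

print(f"pretraining: {pretrain_total:,.0f} tokens/s overall, "
      f"{pretrain_per_node:,.0f} per node")
print(f"fine-tuning: {finetune_total:,.0f} tokens/s overall, "
      f"{finetune_per_node:,.0f} per node")
```

Under these assumptions pretraining averages roughly 1.3 million tokens per second across the four nodes. The much lower implied figure for fine-tuning suggests the 20 billion tokens may have been seen more than once or that the run was not continuous, which the article does not specify.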

AMD says it is open-sourcing all assets of the AMD-135M model to support AI development and to encourage the use of AMD hardware for training and inference.

In recent months, there has been a continuous flow of announcements about new language models boasting enhanced capabilities compared to their predecessors.

In April, Meta introduced Llama 3, the latest in its large language model (LLM) series, claiming notable performance upgrades over earlier versions. Llama 3 was released in two versions: Llama 3 8B, featuring 8 billion parameters, and Llama 3 70B, with 70 billion parameters.

In July, Microsoft announced a new experimental LLM called SpreadsheetLLM, designed specifically to work with spreadsheet data. While still in the research phase, the model aims to bridge the gap between powerful AI models and the complexities of spreadsheets.