AMD
together we advance_AI
AI is increasingly pervasive across the modern world. It’s driving our smart technology in retail, cities, factories and healthcare, and transforming our digital homes. AMD offers advanced AI acceleration from data center to edge, enabling high performance and high efficiency to make the world smarter.
Getting Started with Hugging Face Transformers
AMD’s Ryzen™ AI family of laptop processors provides users with an integrated Neural Processing Unit (NPU), which offloads AI processing tasks from the host CPU and GPU. Ryzen™ AI software consists of the Vitis™ AI execution provider (EP) for ONNX Runtime combined with quantization tools and a pre-optimized model zoo. All of this is made possible by Ryzen™ AI technology, built on the AMD XDNA™ architecture, which is purpose-built to run AI workloads efficiently and locally and offers a host of benefits for developers building the next groundbreaking AI app. Details on getting started with Hugging Face models are available on the Optimum page.
The following section describes how to use the most common transformers on Hugging Face for inference workloads on select AMD Instinct™ accelerators and AMD Radeon™ GPUs using the AMD ROCm software ecosystem. This base knowledge can be leveraged to start fine-tuning from a base model or even to start developing your own model. General Linux and ML experience is a prerequisite.
1. Confirm you have a supported AMD hardware platform
Is my hardware supported with ROCm on Linux?
2. Install ROCm driver, libraries and tools
Follow the detailed installation instructions for your Linux-based platform.
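After the installation, a quick sanity check can confirm that the driver and command-line tools are visible. The following is a minimal sketch rather than an official verification procedure: it assumes a standard Linux ROCm install, where the kernel driver exposes `/dev/kfd` and the `rocminfo` and `rocm-smi` tools are on the `PATH`. The function name `rocm_environment_report` is hypothetical.

```python
import os
import shutil


def rocm_environment_report():
    """Collect basic hints that a ROCm stack is installed on this machine."""
    return {
        # /dev/kfd is the compute device node created by the amdgpu kernel driver.
        "kfd_device_present": os.path.exists("/dev/kfd"),
        # rocminfo and rocm-smi ship with ROCm; None means they are not on PATH.
        "rocminfo_path": shutil.which("rocminfo"),
        "rocm_smi_path": shutil.which("rocm-smi"),
    }


for name, value in rocm_environment_report().items():
    print(f"{name}: {value}")
```

A missing device node or tool does not prove that the hardware is unsupported. It only suggests that the installation steps should be revisited.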
3. Install Machine Learning Frameworks
Pip installation is an easy way to acquire all the required packages and is described in more detail below.
If you prefer to use a container strategy, check out the pre-built images at ROCm Docker Hub and AMD Infinity Hub after installing the required dependencies.
PyTorch
AMD ROCm is fully integrated into the mainline PyTorch ecosystem. Pip wheels are built and tested as part of the stable and nightly releases. Go to pytorch.org and use the 'Install PyTorch' widget. Select 'Stable + Linux + Pip + Python + ROCm' to get the specific pip installation command.
An example command line (note the versioning of the whl file):
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
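To check that the installed wheel is a ROCm build and can see a GPU, something like the sketch below can help. It relies on two PyTorch behaviors: ROCm devices are exposed through the regular `torch.cuda` API, and `torch.version.hip` is set only on ROCm builds. The helper name `describe_torch_backend` is hypothetical, and the guard around the import exists only so the snippet also runs where PyTorch is absent.

```python
import importlib.util


def describe_torch_backend():
    """Return a small summary of the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        # A version string on ROCm builds, None on CPU-only or CUDA builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm GPUs are reported through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
    }


print(describe_torch_backend())
```

If `hip_version` is `None`, the wheel was most likely installed from the default index rather than the ROCm index URL.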
TensorFlow
AMD ROCm is upstreamed into the TensorFlow GitHub repository. Pre-built wheels are hosted on pypi.org.
The latest version can be installed with this command:
pip install tensorflow-rocm
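A similar check for TensorFlow is sketched below. It uses the standard `tf.config.list_physical_devices` call and assumes GPUs appear under the `"GPU"` device type with the ROCm wheel. The function name `list_tensorflow_gpus` is hypothetical.

```python
import importlib.util


def list_tensorflow_gpus():
    """Return the names of GPUs visible to TensorFlow, or None if it is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None

    import tensorflow as tf

    # An empty list means TensorFlow imported fine but found no usable GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(list_tensorflow_gpus())
```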
4. Use a Hugging Face Model
Now that you have the base requirements installed, get the latest transformer models.
pip install transformers
This allows you to easily import any of the base models into your Python application. Here is an example using GPT2 in PyTorch:
from transformers import GPT2Tokenizer, GPT2Model
# Download (or load from the local cache) the pretrained tokenizer and weights
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
text = "Replace me by any text you'd like."
# Tokenize the text and return PyTorch tensors
encoded_input = tokenizer(text, return_tensors='pt')
# Forward pass; the result holds the hidden states for each input token
output = model(**encoded_input)
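The example above runs on the CPU unless the model and inputs are moved to the GPU. The sketch below shows one way to do that. It assumes a ROCm build of PyTorch, where the device string is still `"cuda"`. The helper names `pick_device` and `gpt2_last_hidden_state` are hypothetical, and calling the second one downloads the `gpt2` weights on first use.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch can see a GPU (ROCm included), else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


def gpt2_last_hidden_state(text, device=None):
    """Run GPT2 on `text` and return the final hidden states as a tensor."""
    import torch
    from transformers import GPT2Model, GPT2Tokenizer

    device = device or pick_device()
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    # Move the weights to the target device and switch to inference mode.
    model = GPT2Model.from_pretrained("gpt2").to(device).eval()
    # The tokenizer output must live on the same device as the model.
    encoded_input = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():  # gradients are not needed for inference
        output = model(**encoded_input)
    return output.last_hidden_state
```

Calling `gpt2_last_hidden_state("Replace me by any text you'd like.")` returns a tensor of shape `(1, number_of_tokens, 768)` for the base `gpt2` checkpoint.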
All of the 200+ standard transformer models are regularly tested with our supported hardware platforms. This implies that derivatives of those core models should also function correctly. Let us know if you run into issues at our ROCm Community page.
Here are a few of the more popular ones to get you started:
Click on the 'Use in Transformers' button to see the exact code to import a specific model into your Python application.
5. Optimum Support
For a deeper dive into using Hugging Face libraries on AMD GPUs, check out the Optimum page describing details on Flash Attention 2, GPTQ Quantization and ONNX Runtime integration.
Serving a model with TGI
Text Generation Inference (a.k.a. “TGI”) provides an end-to-end solution to deploy large language models for inference at scale.
TGI is already usable in production on AMD Instinct™ GPUs through the docker image ghcr.io/huggingface/text-generation-inference:1.2-rocm. Make sure to refer to the documentation concerning the support and any limitations.
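Once a TGI container is running, it can be queried over HTTP. The sketch below uses only the Python standard library and targets TGI's `/generate` route, which accepts a JSON body with `inputs` and `parameters` and returns a JSON object containing `generated_text`. The address `http://127.0.0.1:8080` is an assumption that depends on how the container ports were mapped, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: a TGI container is already running and reachable at this address.
TGI_URL = "http://127.0.0.1:8080"


def build_generate_request(prompt, max_new_tokens=32, base_url=TGI_URL):
    """Build (but do not send) a POST request for TGI's /generate route."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=32, base_url=TGI_URL, timeout=60):
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["generated_text"]
```

With a server running, `generate("What is ROCm?")` returns the completion as a string. Splitting request construction from sending keeps the payload easy to inspect without a live server.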
Benchmarking
The Optimum-Benchmark is available as a utility to easily benchmark the performance of transformers on AMD GPUs, across normal and distributed settings, with various supported optimizations and quantization schemes.
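For a rough first measurement before setting up Optimum-Benchmark, a hand-rolled timing loop can be enough. The sketch below is not the Optimum-Benchmark API. It is a generic latency helper with a hypothetical name, `measure_latency`, that times any zero-argument callable after a few warmup runs.

```python
import statistics
import time


def measure_latency(fn, warmup=3, iterations=10):
    """Time `fn` over several runs and return summary statistics in seconds."""
    # Warmup runs absorb one-time costs such as kernel compilation and caching.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        # GPU work is asynchronous: for GPU workloads, `fn` should block until
        # the device is done, for example by calling torch.cuda.synchronize().
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "iterations": iterations,
    }


print(measure_latency(lambda: sum(range(100_000))))
```

Numbers from a loop like this are only indicative. The dedicated benchmark tool covers distributed settings, optimizations, and quantization schemes that this helper does not.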
Useful Links and Blogs
- Detailed Llama-2 results showcasing the Optimum benchmark on AMD Instinct MI250
- Check out our blog titled Run a ChatGPT-like Chatbot on a Single GPU with ROCm
- Complete ROCm Documentation for installation and usage
- Find extended training content and connect with the development community at the Developer Hub