Andrew Reed

andrewrreed

AI & ML interests

Applied ML, Practical AI, Inference & Deployment, LLMs, Multi-modal Models, RAG

Posts 2

IMO, the "grounded generation" feature from Cohere's CommandR+ has flown under the radar...

For RAG use cases, responses directly include inline citations, making source attribution an inherent part of generation rather than an afterthought 😎
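To make the idea concrete, here is a minimal sketch of how citation spans could be rendered inline once a grounded response comes back. It assumes each citation is a dict with `start`, `end`, and `document_ids` keys and that spans do not overlap; the helper name `render_inline_citations` and the sample data are hypothetical and hand-written, not real model output.

```python
from typing import Dict, List


def render_inline_citations(text: str, citations: List[Dict]) -> str:
    """Append a [doc_id, ...] marker after each cited span of `text`."""
    out = []
    cursor = 0
    # Walk spans left to right so the original character offsets stay valid.
    for cit in sorted(citations, key=lambda c: c["start"]):
        out.append(text[cursor:cit["end"]])
        out.append(" [" + ", ".join(cit["document_ids"]) + "]")
        cursor = cit["end"]
    out.append(text[cursor:])  # whatever follows the last cited span
    return "".join(out)


# Hypothetical response data, for illustration only (assumed shape).
answer = "Emperor penguins are the tallest penguins. They live in Antarctica."
citations = [
    {"start": 0, "end": 41, "document_ids": ["doc_0"]},
    {"start": 43, "end": 66, "document_ids": ["doc_0", "doc_1"]},
]
print(render_inline_citations(answer, citations))
```

Because the spans come back alongside the generated text, attribution becomes a formatting step rather than a separate retrieval-matching pass.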

Who's working on an open dataset with inline citations like this that the HF community can fine-tune on?

🔗 CommandR+ Docs: https://docs.cohere.com/docs/retrieval-augmented-generation-rag

🔗 Model on the 🤗 Hub: CohereForAI/c4ai-command-r-plus

🚀 It's now easier than ever to switch from OpenAI to open LLMs

Hugging Face's TGI (Text Generation Inference) now supports a Messages API that is compatible with OpenAI's Chat Completions API

This means you can transition code that uses OpenAI client libraries (or frameworks like LangChain 🦜 and LlamaIndex 🦙) to run open models by changing just two lines of code 🤗

⭐ Here's how:
from openai import OpenAI

# initialize the client but point it to TGI
client = OpenAI(
    base_url="<ENDPOINT_URL>" + "/v1/",  # replace with your endpoint url
    api_key="<HF_API_TOKEN>",  # replace with your token
)
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500
)

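# iterate over the streamed chunks and print each piece of text as it arrives
for chunk in chat_completion:
    # delta.content can be None (e.g. on the final chunk), so fall back to ""
    print(chunk.choices[0].delta.content or "", end="")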
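If you want the full reply as a single string instead of printing tokens as they arrive, a small helper along these lines should work. This is a sketch, not part of TGI or the OpenAI client: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only mimic the `chunk.choices[0].delta.content` shape used above so the example runs without an endpoint.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # defensively skip chunks that carry no choices
            continue
        content = chunk.choices[0].delta.content
        if content:  # skip None or empty deltas
            parts.append(content)
    return "".join(parts)


def fake_chunk(content):
    # Stand-in object (hypothetical) with the same attribute path as a real chunk.
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream; with a live endpoint you would pass `chat_completion` instead.
fake_stream = [fake_chunk("Open "), fake_chunk("source"), fake_chunk(None)]
full_text = collect_stream(fake_stream)
print(full_text)
```

With a real endpoint, `collect_stream(chat_completion)` would take the place of the printing loop.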


🔗 Blog post ➡️ https://huggingface.co/blog/tgi-messages-api
🔗 TGI docs ➡️ https://huggingface.co/docs/text-generation-inference/en/messages_api