DeepInfra
DeepInfra is a serverless inference-as-a-service platform that provides access to a variety of LLMs and embedding models. This notebook goes over how to use LangChain with DeepInfra for language models.
Set the Environment API Key
Make sure to get your API key from DeepInfra: log in and generate a new token.
You are given 1 hour of free serverless GPU compute to test different models (see here).
You can print your token with deepctl auth token.
# get a new token: https://deepinfra.com/login?from=%2Fdash
from getpass import getpass
DEEPINFRA_API_TOKEN = getpass()
import os
os.environ["DEEPINFRA_API_TOKEN"] = DEEPINFRA_API_TOKEN
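When you rerun the notebook, there is no need to re-enter the token once it is already in the environment. The following is a minimal sketch that prompts only when DEEPINFRA_API_TOKEN is missing. The helper name ensure_deepinfra_token is hypothetical and is not part of LangChain or DeepInfra.

```python
import os
from getpass import getpass


def ensure_deepinfra_token(env_var: str = "DEEPINFRA_API_TOKEN") -> str:
    """Return the token from the environment, prompting only if it is missing."""
    token = os.environ.get(env_var)
    if not token:
        # Fall back to an interactive prompt, as in the getpass() cell above.
        token = getpass(f"{env_var}: ")
        os.environ[env_var] = token
    return token
```

Call ensure_deepinfra_token() before creating the DeepInfra instance. If the variable is already exported in your shell, no prompt appears.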
Create the DeepInfra instance
You can also use our open-source deepctl tool to manage your model deployments. You can view a list of available parameters here.
from langchain_community.llms import DeepInfra
llm = DeepInfra(model_id="meta-llama/Llama-2-70b-chat-hf")
llm.model_kwargs = {
    "temperature": 0.7,
    "repetition_penalty": 1.2,
    "max_new_tokens": 250,
    "top_p": 0.9,
}
# run inference directly via the wrapper
llm.invoke("Who let the dogs out?")
'This is a question that has puzzled many people'
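It can help to see roughly what a single call sends to DeepInfra. The following is a sketch that builds, but does not send, the HTTP request. It assumes that the wrapper POSTs to an endpoint of the form https://api.deepinfra.com/v1/inference/{model_id}, passes the prompt as "input" with the model_kwargs merged into the same JSON body, and authenticates with a bearer token. These endpoint and payload details are assumptions, so check them against DeepInfra's current API documentation. The function name build_inference_request is hypothetical.

```python
import json
import urllib.request

# Assumed endpoint pattern; verify against DeepInfra's API docs before relying on it.
DEEPINFRA_INFERENCE_URL = "https://api.deepinfra.com/v1/inference/{model_id}"


def build_inference_request(
    model_id: str, prompt: str, token: str, model_kwargs: dict
) -> urllib.request.Request:
    """Build (without sending) a POST request carrying the prompt plus sampling params."""
    # Assumed payload shape: the prompt under "input", model_kwargs merged alongside it.
    body = {"input": prompt, **model_kwargs}
    return urllib.request.Request(
        url=DEEPINFRA_INFERENCE_URL.format(model_id=model_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Under these assumptions, the sampling parameters set through llm.model_kwargs (temperature, repetition_penalty, max_new_tokens, top_p) travel as plain JSON fields next to the prompt. Changing them therefore requires no code changes beyond updating the dict.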
# run streaming inference
for chunk in llm.stream("Who let the dogs out?"):
    print(chunk)
Will
Smith
.
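Printing each chunk on its own line is useful for seeing the stream, but usually you want the text to render continuously and keep the full completion afterwards. The following is a minimal sketch that does both. It uses a hypothetical list of chunks in place of llm.stream(...) so it runs without an API key. The chunk boundaries shown are illustrative, not real model output.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str]) -> str:
    """Echo each streamed chunk as it arrives and return the joined completion."""
    parts = []
    for chunk in chunks:
        # Print without a trailing newline so chunks render as continuous text.
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # end the line once the stream is exhausted
    return "".join(parts)


# Hypothetical chunks standing in for llm.stream("Who let the dogs out?").
fake_chunks = ["Will", " Smith", "."]
full_text = collect_stream(fake_chunks)
```

With the real model, pass llm.stream("Who let the dogs out?") instead of fake_chunks. Any iterable of strings works, because the helper only iterates and joins.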