Spring AI now supports NVIDIA’s Large Language Model API, offering integration with a wide range of models. By leveraging NVIDIA’s OpenAI-compatible API, Spring AI allows developers to use NVIDIA’s LLMs through the familiar Spring AI API.
We’ll explore how to configure and use the Spring AI OpenAI chat client to connect with the NVIDIA LLM API.
- The demo application code is available in the nvidia-llm GitHub repository.
- The Spring AI / NVIDIA integration documentation.
Prerequisites
- Create an NVIDIA account with sufficient credits.
- Select your preferred LLM model from NVIDIA’s offerings, such as meta/llama-3.1-70b-instruct.
- From the model’s page, obtain the API key for your chosen model.
Dependencies
To get started, add the Spring AI OpenAI starter to your project.
For Maven, add this to your pom.xml:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
For Gradle, add this to your build.gradle:
dependencies {
    implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter'
}
Ensure you’ve added the Spring Milestone and Snapshot repositories and the Spring AI BOM to your build.
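For reference, the Maven BOM import looks like the following sketch; the version shown is a placeholder, so use the milestone or snapshot release that matches your project:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <!-- placeholder version; use the release you target -->
            <version>1.0.0-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>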
Configuring Spring AI
To use NVIDIA LLM API with Spring AI, we need to configure the OpenAI client to point to the NVIDIA LLM API endpoint and use NVIDIA-specific models.
Add the following environment variables to your project:
export SPRING_AI_OPENAI_API_KEY=<NVIDIA_API_KEY>
export SPRING_AI_OPENAI_BASE_URL=https://integrate.api.nvidia.com
export SPRING_AI_OPENAI_CHAT_OPTIONS_MODEL=meta/llama-3.1-70b-instruct
export SPRING_AI_OPENAI_EMBEDDING_ENABLED=false
export SPRING_AI_OPENAI_CHAT_OPTIONS_MAX_TOKENS=2048
Alternatively, you can add these to your application.properties file:
spring.ai.openai.api-key=<NVIDIA_API_KEY>
spring.ai.openai.base-url=https://integrate.api.nvidia.com
spring.ai.openai.chat.options.model=meta/llama-3.1-70b-instruct
# The NVIDIA LLM API doesn't support embeddings.
spring.ai.openai.embedding.enabled=false
# The NVIDIA LLM API requires this parameter to be set explicitly, or a server error will be thrown.
spring.ai.openai.chat.options.max-tokens=2048
Key points:
- The api-key is set to your NVIDIA API key.
- The base-url is set to NVIDIA’s LLM API endpoint: https://integrate.api.nvidia.com
- The model is set to one of the models available on NVIDIA’s LLM API.
- The NVIDIA LLM API requires the max-tokens parameter to be set explicitly, or a server error will be thrown.
- Since the NVIDIA LLM API is LLM-only, we can disable the embedding endpoint with embedding.enabled=false.
Check the reference documentation for the complete list of configuration properties.
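If you prefer to configure the chat model programmatically rather than through properties, a minimal sketch might look like this. It assumes the pre-1.0 OpenAiChatOptions builder method names (withModel, withMaxTokens) and an NVIDIA_API_KEY environment variable, so adjust it to the Spring AI version you use:
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.ai.openai.api.OpenAiApi;

// Point the OpenAI client at the NVIDIA LLM API endpoint.
var openAiApi = new OpenAiApi("https://integrate.api.nvidia.com", System.getenv("NVIDIA_API_KEY"));

// Select the model and set the mandatory max-tokens parameter.
var chatModel = new OpenAiChatModel(openAiApi,
        OpenAiChatOptions.builder()
                .withModel("meta/llama-3.1-70b-instruct")
                .withMaxTokens(2048)
                .build());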
Code Example
Now that we’ve configured Spring AI to use NVIDIA LLM API, let’s look at a simple example of how to use it in your application.
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import reactor.core.publisher.Flux;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    @Autowired
    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // Returns the complete response in a single payload.
    @GetMapping("/ai/generate")
    public String generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return chatClient.prompt().user(message).call().content();
    }

    // Streams the response as it is generated.
    @GetMapping("/ai/generateStream")
    public Flux<String> generateStream(
            @RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return chatClient.prompt().user(message).stream().content();
    }
}
In the ChatController.java example, we’ve created a simple REST controller with two endpoints:
- /ai/generate: Generates a single response to a given prompt.
- /ai/generateStream: Streams the response, which can be useful for longer outputs or real-time interactions.
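Assuming the application runs on Spring Boot’s default port 8080, you can exercise both endpoints with curl:
curl "http://localhost:8080/ai/generate?message=Tell%20me%20a%20joke"
curl "http://localhost:8080/ai/generateStream?message=Tell%20me%20a%20joke"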
Tool/Function Calling
The NVIDIA LLM API supports tool/function calling when you select one of the models that support it.
You can register custom Java functions with your ChatModel and have the provided LLM model intelligently choose to output a JSON object containing arguments to call one or many of the registered functions.
This is a powerful technique to connect the LLM capabilities with external tools and APIs.
Find out more about the Spring AI / OpenAI Function Calling support.
Tool Example
Here’s a simple example of how to use tool/function calling with Spring AI:
import java.util.function.Function;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Description;

@SpringBootApplication
public class NvidiaLlmApplication {

    public static void main(String[] args) {
        SpringApplication.run(NvidiaLlmApplication.class, args);
    }

    @Bean
    CommandLineRunner runner(ChatClient.Builder chatClientBuilder) {
        return args -> {
            var chatClient = chatClientBuilder.build();

            var response = chatClient.prompt()
                .user("What is the weather in Amsterdam and Paris?")
                .functions("weatherFunction") // reference by bean name.
                .call()
                .content();

            System.out.println(response);
        };
    }

    @Bean
    @Description("Get the weather in location")
    public Function<MockWeatherService.WeatherRequest, MockWeatherService.WeatherResponse> weatherFunction() {
        return new MockWeatherService();
    }

    public static class MockWeatherService implements Function<MockWeatherService.WeatherRequest, MockWeatherService.WeatherResponse> {

        public record WeatherRequest(String location, String unit) {}
        public record WeatherResponse(double temp, String unit) {}

        @Override
        public WeatherResponse apply(WeatherRequest request) {
            // Mock values: Amsterdam gets 20 degrees, everywhere else 25.
            double temperature = request.location().contains("Amsterdam") ? 20 : 25;
            return new WeatherResponse(temperature, request.unit());
        }
    }
}
In the NvidiaLlmApplication.java example, when the model needs weather information, it will automatically call the weatherFunction bean, which can then fetch real-time weather data.
The expected response looks like this:
The weather in Amsterdam is currently 20 degrees Celsius, and the weather in Paris is currently 25 degrees Celsius.
Key Considerations
When using NVIDIA LLM API with Spring AI, keep the following points in mind:
- Model Selection: NVIDIA offers a wide range of models from various providers. Choose the appropriate model for your use case.
- API Compatibility: The NVIDIA LLM API is designed to be compatible with the OpenAI API, which allows for easy integration with Spring AI.
- Performance: NVIDIA’s LLM API is optimized for high-performance inference. You may notice improved response speeds, especially for larger models.
- Specialized Models: NVIDIA offers models specialized for different tasks, such as code completion, math problems, and general chat. Select the most appropriate model for your specific needs.
- API Limits: Be aware of any rate limits or usage quotas associated with your NVIDIA API key.
References
For further information, check the Spring AI and OpenAI reference documentation.
- Spring AI NVIDIA Reference docs
- NVIDIA LLM API
- Spring AI – NVIDIA – Demo
- Previous Spring AI blog posts:
- Spring AI Embraces OpenAI’s Structured Outputs: Enhancing JSON Response Reliability
- Spring AI with Groq – a blazingly fast AI inference engine
- Spring AI with Ollama Tool Support
- Spring AI – Structured Output
- Spring AI – Multimodality – Orbis Sensualium Pictus
- Function Calling in Java and Spring AI using the latest Mistral AI API
- Spring Cloud Function for Azure Function
Conclusion
Integrating NVIDIA LLM API with Spring AI opens up new possibilities for developers looking to leverage high-performance AI models in their Spring applications.
By repurposing the OpenAI client, Spring AI makes it straightforward to switch between different AI providers, allowing you to choose the best solution for your specific needs.
As you explore this integration, remember to stay updated with the latest documentation from both Spring AI and NVIDIA LLM API, as features and model availability may evolve over time.
We encourage you to experiment with different models and compare their performance and outputs to find the best fit for your use case.
Happy coding, and enjoy the speed and capabilities that NVIDIA LLM API brings to your AI-powered Spring applications!