In the rapidly evolving field of software development, artificial intelligence has emerged as a powerful technology for enhancing productivity and code quality. This article describes a practical approach to implementing AI-assisted development environments using VMware Cloud Foundation (VCF) 5.2.1, the comprehensive private cloud platform, deployed on a CPU-based architecture to run smaller large language models (LLMs).
Benefits of VCF for Deploying Small Language Models
VCF deployed on a CPU-based architecture forms the infrastructure layer that makes AI assistants a reality for developers. VMware vSphere is one of the critical components of VCF, and vSphere 8 Update 3 offers several features that are particularly beneficial for deploying and managing VMs optimized for AI workloads:
- Enhanced NUMA Awareness: vSphere provides improved NUMA scheduling capabilities, which are critical for optimizing the performance of large-scale, CPU-intensive workloads like AI models.
- Advanced Resource Management: The platform allows for fine-grained control over CPU and memory resources, enabling administrators to create VMs that are precisely tailored for AI workloads.
- Automated VM Deployment: vSphere supports the automated deployment of specialized VMs through features like vSphere Content Library and VM Templates, making it easier to scale AI-assisted development environments across an organization (a deployment sketch follows this list).
- Performance Optimization: Enhanced DRS algorithms allow vSphere to optimize VM placement across hosts, helping ensure that AI workloads have access to the resources they need.
- Integration with Modern Development Tools: vSphere’s compatibility with containerization and modern CI/CD pipelines allows for seamless integration of AI-assisted development tools into existing workflows.
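To make the automated-deployment point concrete, here is a minimal sketch of cloning a worker VM from an existing vSphere template using the open-source pyVmomi SDK. The vCenter hostname, credentials, and object names are hypothetical placeholders, and error handling is omitted for brevity; treat this as an outline under those assumptions rather than a production script.

# Minimal pyVmomi sketch: clone an AI-workload VM from an existing template.
# Host, credentials, and object names below are hypothetical placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; verify certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
content = si.RetrieveContent()

def find_by_name(vim_type, name):
    # Return the first managed object of the given type with a matching name.
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim_type], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

template = find_by_name(vim.VirtualMachine, "llm-worker-template")
cluster = find_by_name(vim.ClusterComputeResource, "ai-cluster")
datacenter = find_by_name(vim.Datacenter, "dc01")

# Place the clone in the cluster's root resource pool and power it on.
relocate = vim.vm.RelocateSpec(pool=cluster.resourcePool)
clone_spec = vim.vm.CloneSpec(location=relocate, powerOn=True)
template.CloneVM_Task(folder=datacenter.vmFolder, name="llm-worker-01", spec=clone_spec)

Disconnect(si)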
Technical Overview
The AMD EPYC™ 9654 CPU, with 96 cores per socket, is at the heart of this setup. Combined with optimized software and the advanced virtualization capabilities of VCF, these processors perform well when running smaller LLMs such as StarChat2 15B, a model known for its proficiency in code generation and completion tasks.
llama.cpp: Powering Efficient LLM Inference
llama.cpp is a crucial component in our AI-assisted development setup. It’s an open-source C++ library designed for efficient LLM inference, particularly on CPU-based systems. Here are some key features of llama.cpp:
- CPU Optimization: llama.cpp is explicitly optimized for CPU inference, making it ideal for our AMD EPYC™-based setup.
- Quantization Techniques: It employs advanced quantization methods to reduce model size and computational requirements without significantly sacrificing model quality.
- Cross-Platform Compatibility: llama.cpp can run on various platforms with limited resources, enhancing its versatility.
- Performance Enhancements: When compiled with AMD AOCL and AOCC, llama.cpp can leverage the AMD EPYC™ architecture’s high core counts and large cache sizes, resulting in substantial performance improvements for LLM workloads.
- Memory Efficiency: Its efficient memory management allows large models to be run on systems with limited RAM.
- Customization Options: llama.cpp offers various parameters for fine-tuning inference, such as context size, temperature, and top-k sampling (see the sketch following this list).
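To make these options concrete, the following is a minimal sketch that drives llama.cpp from Python through the llama-cpp-python bindings. The GGUF file path and the parameter values are illustrative assumptions, not the exact configuration used in this setup.

# Minimal llama-cpp-python sketch: load a GGUF model and tune inference parameters.
# The model path and parameter values are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/starchat2-15b-v0.1.Q5_K_S.gguf",  # hypothetical local path
    n_ctx=16384,   # context window; StarChat2 15B supports 16k tokens
    n_threads=96,  # match the EPYC cores available to the VM
)

output = llm(
    "# Python function that returns the nth Fibonacci number\n",
    max_tokens=256,   # cap the completion length
    temperature=0.2,  # low temperature for more deterministic code
    top_k=40,         # sample only from the 40 most likely tokens
)
print(output["choices"][0]["text"])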
StarChat2 15B: A Powerful LLM for Code Generation
StarChat2 15B (version 0.1) is our large language model for this AI-assisted development environment. Here are some key attributes of this LLM:
- Model Size: With 15 billion parameters, StarChat2 15B balances model complexity and computational requirements, making it suitable for CPU-based inference.
- Context Window: It features a 16k token context size, allowing it to process longer code snippets and more complex programming tasks.
- Coding Proficiency: StarChat2 15B is trained specifically for coding tasks, making it highly effective for code generation, completion, and explanation.
- Multi-language Support: The model can handle multiple programming languages, enhancing its versatility in diverse development environments.
- Instruction Following: StarChat2 15B can follow detailed instructions, making it ideal for generating specific code implementations or unit tests.
- Quantization Compatibility: The model we use is quantized (Q5_K_S version), which reduces its memory footprint and computational requirements while largely preserving output quality (a loading sketch follows this list).
- Fine-tuning Potential: While we use a pre-trained version, StarChat2 15B can be further fine-tuned for specific coding styles or domain-specific tasks if needed.
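As an illustration of how this quantized model can be loaded and queried, here is a minimal chat-completion sketch using the llama-cpp-python bindings. The GGUF file name is a hypothetical placeholder; recent versions of the bindings apply the chat template embedded in the GGUF metadata, so prompt formatting is handled automatically.

# Minimal sketch: chat-style code generation with a quantized StarChat2 15B GGUF.
# The file name is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./models/starchat2-15b-v0.1.Q5_K_S.gguf", n_ctx=16384)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user",
         "content": "Write a Python function that checks whether a string is a palindrome."},
    ],
    max_tokens=256,
    temperature=0.2,
)
print(response["choices"][0]["message"]["content"])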
AI-Assisted XGBoost Model Implementation
To demonstrate the capabilities of this AI-assisted development environment, we used the StarChat2 15B model to generate code for training an XGBoost model on the Iris dataset. Here’s a breakdown of the process:
- Data Loading and Preprocessing: The AI assistant generated code to load the Iris dataset using scikit-learn and split it into training and testing sets.
- Model Training: The code created an XGBClassifier instance and fit it to the training data.
- Evaluation: The model’s performance was evaluated using an accuracy score on the test set.
In our setup, we integrated a llama.cpp instance (running a quantized version of the StarChat2 15B LLM) with Microsoft VS Code via the “collama” extension. From collama’s chat box, we submitted the following prompt:
“Write the Python code to train an XGBoost model for classifying the Iris dataset. Include data loading, preprocessing, model training, and a simple evaluation. Use scikit-learn to load the dataset and split it into training and testing sets. Return the code in a single cell and add comments where you consider appropriate.”
Within seconds of submitting the prompt, the AI assistant begins streaming its response, which eventually looks like the following code block:
# Import required libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create XGBoost classifier
model = XGBClassifier()
# Train the model
model.fit(X_train, y_train)
# Make predictions on the testing set
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy: %.2f%%' % (accuracy * 100.0))
This code demonstrates the AI assistant’s ability to generate Python code and comments required to implement a complete machine learning workflow, from data preparation to model evaluation.
Automated Unit Test Generation
To further showcase the AI-assisted development environment’s capabilities, we prompted the StarChat2 15B model to create a unit test for the XGBoost model training code. The AI assistant generated a comprehensive unit test using Python’s unittest framework.
Critical aspects of the generated unit test:
- Test Setup: The test case includes necessary imports and sets up the test environment.
- Model Training Verification: The test ensures that the XGBoost model can be trained without errors on the Iris dataset.
- Accuracy Threshold Check: The test verifies that the trained model achieves a minimum accuracy threshold on the test set.
Here’s a snippet of the generated unit test code:
import unittest
from sklearn import datasets
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
class TestXGBoostModel(unittest.TestCase):
    def setUp(self):
        # Load Iris dataset
        iris = datasets.load_iris()
        X = iris.data
        y = iris.target
        # Split the dataset
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        # Create and train the model
        self.model = XGBClassifier()
        self.model.fit(self.X_train, self.y_train)

    def test_model_accuracy(self):
        # Make predictions
        y_pred = self.model.predict(self.X_test)
        # Calculate accuracy
        accuracy = accuracy_score(self.y_test, y_pred)
        # Assert that accuracy is above a threshold (e.g., 90%)
        self.assertGreater(accuracy, 0.9, "Model accuracy is below the expected threshold")

if __name__ == '__main__':
    unittest.main()
This code block demonstrates the AI assistant’s ability to generate unit tests that verify the functionality of the XGBoost training code it produced previously and ensure the model meets a predefined performance standard.
Conclusion
Implementing AI-powered coding assistants on CPU-based infrastructure, namely AMD EPYC™ processors and VCF, represents a significant advancement in development tooling. This approach demonstrates the strong capabilities of CPU technology in AI applications and showcases how VCF, the comprehensive private cloud platform, can be leveraged to deploy, manage, and scale AI-assisted development environments efficiently.
The combination of llama.cpp’s efficient inference capabilities and StarChat2 15B’s powerful code generation abilities creates a robust AI-assisted development environment. This setup, running on AMD EPYC™ CPUs and managed with VCF, provides developers with a powerful tool for accelerating coding tasks, improving code quality, and enhancing overall productivity.
As this technology continues to evolve, we expect to see further improvements in performance and capabilities, potentially reshaping the software development landscape. The ability to run sophisticated AI models on CPU infrastructure opens new possibilities for organizations to integrate AI assistance into their development workflows without needing specialized GPU hardware.
Next Steps
If you want to reproduce the setup we used to achieve the results in this article, please refer to the accompanying GitHub repository for detailed setup instructions.