🗣️ Speech-LLM-Speech: Containerized Conversational AI Pipeline
Developed an end-to-end conversational AI pipeline that processes speech input through three modular components: Automatic Speech Recognition (ASR) using Whisper.cpp, an LLM decision maker integrating OpenAI/Gemini/Ollama APIs, and text-to-speech synthesis via Google Cloud. The system uses ROS2 for inter-node communication and Docker for deployment. GitHub repo: https://github.com/naren200/speech-llm-speech
%%{init: {'theme': 'dark', 'themeVariables': {
'background': 'transparent',
'primaryBorderColor': '#4FD1C550',
'lineColor': '#4FD1C580',
'textColor': '#FFFFFF',
'edgeLabelBackground': '#1d27383d',
'edgeLabelColor': '#FFFFFF'
}}}%%
graph LR
subgraph Docker_Network["Docker Network | ROS2 Domain"]
direction TB
A[["Whisper ASR<br/>(Docker Container)"]]:::asr -->|/recognized_speech| B[["Decision Maker<br/>(Docker Container)"]]:::llm
B -->|/text_to_speak| C[["Google TTS<br/>(Docker Container)"]]:::tts
end
D[🎤 Audio Input]:::input --> A
C --> E[🔈 Synthesized Speech]:::output
F[[GPT-4]]:::openai -->|OpenAI API| B
G[[Llama 2]]:::huggingface -->|Huggingface API| B
H[[Ollama]]:::ollama -->|Local LLM| B
classDef asr fill:#2B7A78,stroke:#38B2AC,color:#FFFFFF
classDef llm fill:#7295d1b7,stroke:#718096,color:#FFFFFF
classDef tts fill:#6B46c157,stroke:#9F7AEA,color:#FFFFFF
classDef openai fill:#3182ce63,stroke:#63B3ED
classDef huggingface fill:#38a1696e,stroke:#68D391
classDef ollama fill:#dd6c2069,stroke:#F6AD55
Technical Highlights
- Containerized ROS2 Nodes: Independently deployable Docker containers for ASR (C++), LLM decision maker (C++), and TTS (C++)
- Multi-LLM Integration: Dynamic API selection between OpenAI, HuggingFace, and local Ollama models
- Real-Time Audio Processing: Whisper.cpp optimization for WAV/MP3 parsing with 2.5s latency
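The audio preprocessing mentioned above can be sketched in a few lines. Whisper models take 16 kHz mono audio, so input has to be downmixed and resampled before transcription. The project does this in C++; the Python below is only an illustrative sketch using the standard library, and the function names `load_wav_mono_16k` and `resample_linear` are hypothetical. It assumes 16-bit PCM WAV input, uses plain linear interpolation with no anti-aliasing filter, and does not cover MP3 decoding.

```python
import array
import sys
import wave

TARGET_RATE = 16_000  # sample rate Whisper models expect


def resample_linear(samples, src_rate, dst_rate):
    """Resample a list of floats by linear interpolation (no anti-aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for n in range(out_len):
        pos = n * step
        i = int(pos)
        if i >= last:
            # Past the final source sample: hold the last value.
            out.append(samples[last])
        else:
            frac = pos - i
            out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


def load_wav_mono_16k(source):
    """Read a 16-bit PCM WAV (path or file object) as mono floats at 16 kHz."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian

    # Downmix by averaging interleaved channels, then scale to [-1.0, 1.0).
    mono = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return resample_linear(mono, rate, TARGET_RATE)
```

Calling `load_wav_mono_16k("input.wav")` (the file name is a placeholder) returns a list of floats that could be handed to a transcription step. A production pipeline would normally low-pass filter before downsampling.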
Key Features
- ROS2 /recognized_speech and /text_to_speak topics for modular communication
- CMake integration for Whisper.cpp with custom audio preprocessing
- Docker Compose orchestration for multi-container deployment
- GPU acceleration support via NVIDIA Container Toolkit
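To make the topic wiring concrete, here is a plain-Python stand-in for the message flow: ASR output goes to /recognized_speech, the decision maker answers on /text_to_speak, and TTS consumes that topic. This is not ROS2 or rclpy code. `TopicBus` and `wire_pipeline` are hypothetical names, and the lambda standing in for the LLM is a placeholder. The sketch only shows why the three stages can be swapped independently when they share nothing but topic names.

```python
from collections import defaultdict


class TopicBus:
    """In-process stand-in for publish/subscribe; not the ROS2 API."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every callback registered on this topic.
        for callback in list(self._subscribers[topic]):
            callback(message)


def wire_pipeline(bus, decide, speak):
    """Connect decision maker and TTS stages using the project's topic names."""
    bus.subscribe(
        "/recognized_speech",
        lambda text: bus.publish("/text_to_speak", decide(text)),
    )
    bus.subscribe("/text_to_speak", speak)


# Usage: the "ASR" side publishes a transcript; "TTS" here just records it.
spoken = []
bus = TopicBus()
wire_pipeline(bus, decide=lambda text: f"You said: {text}", speak=spoken.append)
bus.publish("/recognized_speech", "hello")
```

After the last line, `spoken` holds the reply that the TTS stage would synthesize. In the real system each stage runs in its own container and ROS2 handles delivery across the Docker network.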
Challenges Solved
- Whisper.cpp Integration: Resolved CMake build issues and audio parsing challenges
- ROS2-Docker Networking: Configured cross-container discovery using shared ROS domains
- LLM Response Optimization: Implemented confidence-based API fallback mechanism
- Audio Format Handling: Added resampling pipeline for MP3/WAV compatibility
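One way a confidence-based fallback could look is sketched below. The README does not say how confidence is computed or in which order the APIs are tried, so everything here is an assumption: backends are tried in the order given, each returns a `(text, confidence)` pair, the first reply at or above a threshold wins, and if none qualifies the best-scoring reply is returned. `Reply`, `ask_with_fallback`, and the 0.7 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A backend is a (name, callable) pair; the callable returns (text, confidence).
Backend = Tuple[str, Callable[[str], Tuple[str, float]]]


@dataclass
class Reply:
    backend: str
    text: str
    confidence: float


def ask_with_fallback(
    prompt: str,
    backends: Sequence[Backend],
    threshold: float = 0.7,  # assumed cutoff, not taken from the project
) -> Optional[Reply]:
    """Return the first confident reply, else the best one seen, else None."""
    best: Optional[Reply] = None
    for name, call in backends:
        try:
            text, confidence = call(prompt)
        except Exception:
            continue  # API or network failure: fall through to the next backend
        reply = Reply(name, text, confidence)
        if confidence >= threshold:
            return reply
        if best is None or confidence > best.confidence:
            best = reply
    return best
```

With stub backends such as `[("openai", call_openai), ("ollama", call_ollama)]` (both callables being placeholders), a failed or low-confidence cloud call falls through to the local model without the caller needing to know which backend answered.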
Prerequisites
sudo apt install -y gnome-terminal
For further instructions, follow here
Launch Full System
git clone https://github.com/naren200/speech-llm-speech.git
cd speech-llm-speech
# Start all services
./start_all_docker.sh
Demo Video: End-to-End System Walkthrough
Technical Deep Dive
Core Components
Component | Tech Stack | Optimization |
---|---|---|
ASR | C++17, Whisper.cpp | SIMD acceleration |
LLM | Python 3.9, FastAPI | Async API calls |
TTS | gTTS, SoundFile | Audio buffering |
Troubleshooting
Common Issues:
- Audio Device Permissions:
sudo usermod -aG audio $USER
sudo reboot
- ROS2 Discovery:
export ROS_DOMAIN_ID=42
export ROS_LOCALHOST_ONLY=0
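When nodes in different containers cannot see each other, it can help to check these two variables inside each container. The helper below is a small sketch, not part of the repository. `discovery_problems` is a hypothetical name, and the expected domain of 42 is simply the value used in the export line above.

```python
import os


def discovery_problems(env=None, expected_domain=42):
    """Return human-readable reasons why ROS2 discovery may fail."""
    env = os.environ if env is None else env
    problems = []

    domain = env.get("ROS_DOMAIN_ID")
    if domain is None:
        problems.append("ROS_DOMAIN_ID is unset, so ROS2 falls back to domain 0")
    elif not domain.isdigit():
        problems.append(f"ROS_DOMAIN_ID={domain!r} is not a non-negative integer")
    elif int(domain) != expected_domain:
        problems.append(
            f"ROS_DOMAIN_ID={domain} differs from the expected {expected_domain}"
        )

    # A value of 1 restricts communication to the local host only.
    if env.get("ROS_LOCALHOST_ONLY", "0") == "1":
        problems.append("ROS_LOCALHOST_ONLY=1 blocks cross-container discovery")

    return problems


if __name__ == "__main__":
    for line in discovery_problems() or ["environment looks consistent"]:
        print(line)
```

Running it in every container and comparing the output is a quick way to spot a container that was started without the shared domain ID.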
Debugging Modes:
# Developer mode with shell access
./start_docker.sh transcribe --developer=true
# Force rebuild containers
./start_docker.sh decide --build=true
Reference information:
GitHub Page: https://github.com/naren200/speech-llm-speech
Demo Video: https://youtu.be/7YaoBxjnQag
Docker Hub: https://hub.docker.com/u/naren200