

Local Whisper AI Integration: Example Setup

This guide provides a practical demonstration of setting up an on-premises Whisper AI instance on the same server as your PBX. The example uses Ubuntu 24.10, with the service endpoint located at http://127.0.0.1:8000/transcribe/.

Key Notes

  • The whisper_api.py script protects the transcription endpoint with HTTP Basic authentication; the username and password are defined directly in the script.
  • The example uses the tiny model for transcription. You can switch to a larger model (small, medium, or large), but larger models require more VRAM; see the sketch after this list, and refer to the OpenAI Whisper GitHub repository for details.
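
Once Whisper is installed (Step 3), switching models is a one-line change. The snippet below is a minimal sketch, not part of the setup steps, showing how a larger model could be loaded onto a GPU when one is available; whisper.load_model accepts an optional device argument, and torch is installed as a Whisper dependency:

import torch
import whisper

# Larger models ("medium", "large") need correspondingly more VRAM;
# "tiny" runs comfortably on CPU, as used in this guide.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("large", device=device)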

Step 1: Update System and Install Dependencies

Run the following commands to update your system and install the required dependencies:

sudo apt update && sudo apt upgrade -y
sudo apt install ffmpeg git python3-pip python3.12-venv -y

Step 2: Set Up a Python Virtual Environment

Create and activate a Python virtual environment:

python3 -m venv whisper-env
source whisper-env/bin/activate

Step 3: Install Whisper and Required Libraries

Install Whisper and the necessary Python libraries:

pip install openai-whisper
pip install fastapi uvicorn
pip install python-multipart
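
As an optional sanity check (not part of the original steps), you can confirm the installation works before building the API by transcribing a file directly in Python. Here, sample.wav is only a placeholder for any audio file available on the server:

import whisper

# Load the smallest model and transcribe a test recording.
model = whisper.load_model("tiny")
result = model.transcribe("sample.wav")  # replace with a real audio file path
print(result["text"])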

Step 4: Create the whisper_api.py Script

Create a file named whisper_api.py and add the following code. Update the USERNAME and PASSWORD variables as needed:

from fastapi import FastAPI, UploadFile, File, Depends, HTTPException, status
from fastapi.security import HTTPBasic, HTTPBasicCredentials
import whisper
import shutil

app = FastAPI()
security = HTTPBasic()
model = whisper.load_model("tiny")  # Change to "small", "medium", or "large" if needed

# Define username and password
USERNAME = "user"
PASSWORD = "password"

# Authentication function
def authenticate(credentials: HTTPBasicCredentials = Depends(security)):
    if credentials.username != USERNAME or credentials.password != PASSWORD:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid credentials",
            headers={"WWW-Authenticate": "Basic"},
        )
    return credentials.username

@app.post("/transcribe/")
async def transcribe_audio(file: UploadFile = File(...), user: str = Depends(authenticate)):
    with open(file.filename, "wb") as buffer:
        shutil.copyfileobj(file.file, buffer)

    result = model.transcribe(file.filename)
    return {"user": user, "transcription": result["text"]}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)

Step 5: Run the API Service in the Background

Start the API service in the background using nohup (make sure the virtual environment from Step 2 is still activated so that python resolves to it):

nohup python whisper_api.py > whisper.log 2>&1 &

Step 6: Monitor Logs

To check the logs, use the following command. The output should look similar to the example below; the FP16 warning is expected when Whisper runs on a CPU-only host:

cat whisper.log

nohup: ignoring input
INFO: Started server process [1442923]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
/root/whisper-env/lib/python3.12/site-packages/whisper/transcribe.py:126: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
INFO: 127.0.0.1:49828 - "POST /transcribe/ HTTP/1.1" 200 OK
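
To verify the service end to end, you can send a short recording to the endpoint yourself. The following is a minimal test client, not part of the original guide, assuming the requests package is installed (pip install requests), a local file named test.wav, and the credentials configured in whisper_api.py:

import requests

# POST a local recording to the transcription endpoint using HTTP Basic auth.
with open("test.wav", "rb") as audio:
    response = requests.post(
        "http://127.0.0.1:8000/transcribe/",
        files={"file": ("test.wav", audio, "audio/wav")},
        auth=("user", "password"),  # must match USERNAME/PASSWORD in whisper_api.py
    )

print(response.status_code)
print(response.json())

A successful request returns HTTP 200 with a JSON body containing the transcription text, matching the POST entry shown in the log above.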