OpenAI-compatible API serving chat, embeddings, speech-to-text, and text-to-speech. One key, multiple models, zero cloud costs.
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.maxpetrusenko.com/v1",
    api_key="your-api-key",
)

# Chat
response = client.chat.completions.create(
    model="gemma4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# Vision-language
response = client.chat.completions.create(
    model="shizhengpt",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "Analyze visible tongue features as JSON."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
    ]}],
    max_tokens=2048,
)
print(response.choices[0].message.content)

# Embeddings
embeddings = client.embeddings.create(
    model="nomic-embed-text",
    input="search query",
)
print(f"Dimensions: {len(embeddings.data[0].embedding)}")
# Chat
curl https://llm.maxpetrusenko.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gemma4","messages":[{"role":"user","content":"Hello!"}]}'

# Text-to-Speech
curl https://llm.maxpetrusenko.com/v1/audio/speech \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello from the gateway."}' \
  --output speech.wav
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llm.maxpetrusenko.com/v1",
  apiKey: "your-api-key",
});

const res = await client.chat.completions.create({
  model: "gemma4",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.choices[0].message.content);
| Model | Type | Use for | Details |
|---|---|---|---|
| gemma4 | Chat | Reasoning, agent tasks | Google Gemma 4 8B, 32K context, Q4_K_M |
| qwen3:8b | Chat | Routing, summaries, classification | Alibaba Qwen 3 8B, fast |
| nomic-embed-text | Embedding | Semantic search, RAG retrieval | 768 dimensions, 8K input |
| qwen35-35b-a3b-iq2m | Chat/code | Local 35B MoE reasoning/coding on 16GB Mac mini | Qwen3.5-35B-A3B UD-IQ2_M GGUF, 10.6 GiB; aliases: qwen35, qwen35-35b, qwen35-35b-a3b |
| shizhengpt | Vision-language | TCM tongue image observations | ShizhenGPT-7B-VL, MLX Q4 |
| whisper | STT | Audio transcription | whisper.cpp base.en, Apple Silicon optimized |
| piper | TTS | Speech synthesis | piper-tts, en_US lessac medium voice |
Send messages to /v1/chat/completions and get completions back. Works with the chat models, and with shizhengpt for messages that combine an image and text.
{
  "model": "gemma4",
  "messages": [{"role": "user", "content": "Explain quantum computing"}],
  "temperature": 0.7
}
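For the shizhengpt image+text case, the image travels as a base64 data URL inside an `image_url` content part, as in the Python quick-start. Below is a minimal sketch of building such a message from raw image bytes. The helper name `image_message` and the default MIME type are illustrative assumptions, not part of the gateway.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build one user message with a text part and an inline image part.

    Hypothetical helper: the message shape mirrors the quick-start example.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

The result can be passed as `messages=[image_message(prompt, data)]` with `model="shizhengpt"`.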
Generate vector embeddings for text. Returns 768-dimensional vectors.
{
  "model": "nomic-embed-text",
  "input": "Your text here"
}
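For semantic search, the returned vectors are typically compared with cosine similarity. The sketch below ranks document vectors against a query vector in plain Python. The function names are illustrative, and the choice of cosine similarity is an assumption about how you want to score matches, not something the gateway prescribes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

In practice `query_vec` and each entry of `doc_vecs` would be an `embedding` list taken from a `nomic-embed-text` response.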
Transcribe audio files to text. Send as multipart/form-data with a file field.
curl https://llm.maxpetrusenko.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_KEY" \
  -F "file=@YOUR_AUDIO_FILE"
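The same multipart/form-data upload can be sketched in Python with only the standard library. `encode_multipart` and `transcribe` are hypothetical helper names, and the `audio/wav` content type is an assumption about the input file. The response format is not described here, so the sketch returns the raw response body as text.

```python
import os
import urllib.request
import uuid


def encode_multipart(field, filename, data, content_type="application/octet-stream"):
    """Encode one file field as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, api_key, base_url="https://llm.maxpetrusenko.com/v1"):
    """Upload an audio file in the `file` field and return the raw response text."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart(
            "file", os.path.basename(path), f.read(), "audio/wav"
        )
    req = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```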
Convert text to speech. Returns a WAV audio file.
{
  "input": "Text you want spoken aloud."
}
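A standard-library sketch of the same request in Python, which also checks that the bytes that come back look like a WAV file before writing them to disk. The helper names are illustrative, and the RIFF/WAVE header check is an extra safeguard assumed here, not something the gateway requires.

```python
import json
import urllib.request


def build_speech_request(text, api_key, base_url="https://llm.maxpetrusenko.com/v1"):
    """Build the POST request for /audio/speech with a JSON body."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=payload,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def looks_like_wav(data):
    """WAV files start with 'RIFF', four size bytes, then 'WAVE'."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def save_speech(text, out_path, api_key):
    """Request speech for `text` and write the returned audio to `out_path`."""
    with urllib.request.urlopen(build_speech_request(text, api_key)) as resp:
        audio = resp.read()
    if not looks_like_wav(audio):
        raise ValueError("response does not look like a WAV file")
    with open(out_path, "wb") as f:
        f.write(audio)
```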
List all available models.
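A sketch of reading the model list from Python. Two things are assumed rather than documented here: that the list lives at `/v1/models`, and that the response follows the OpenAI list shape (`{"data": [{"id": ...}, ...]}`). Both follow OpenAI conventions, so verify them against the live gateway. The function names are illustrative.

```python
import json
import urllib.request


def model_ids(payload):
    """Extract model IDs from an OpenAI-style list response (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(api_key, base_url="https://llm.maxpetrusenko.com/v1"):
    """Fetch the model list; the /models path is an assumption."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))
```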
Health check. No authentication required.
All endpoints (except /health and /docs) require a bearer token:
Authorization: Bearer your-api-key
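To keep the key out of source code, the header can be built from an environment variable. This is a minimal sketch; the variable name `LLM_GATEWAY_API_KEY` and the helper `auth_headers` are hypothetical choices, not names the gateway defines.

```python
import os


def auth_headers(env_var="LLM_GATEWAY_API_KEY"):
    """Return the bearer-token header, reading the key from the environment."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"set {env_var} to your API key")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be merged into the headers of any authenticated request.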