I fine-tuned a 0.5B LLM to classify support tickets for $10/month

https://silentworks.tech/test

Comments

molchanovartem · Jan 26, 2026, 6:00 AM
Hi HN,

I built a support ticket classifier using a fine-tuned Qwen2.5-0.5B model. It determines intent, category, urgency, sentiment, and routing — all in a single inference.
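Concretely, the model is trained to emit one small JSON object per ticket, so a single forward pass covers all five fields. It looks roughly like this (field names and label values here are illustrative, not the exact production schema):

```python
# Illustrative output for one ticket; label sets and field names are examples,
# not the exact schema the deployed model uses.
example_output = {
    "intent": "refund_request",
    "category": "billing",
    "urgency": "high",
    "sentiment": "negative",
    "route_to": "billing_team",
}
```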

*Why I built this:* A company needed to automate ticket routing but couldn't use cloud LLM APIs due to data privacy requirements. Self-hosted was the only option.

*Stack:*
- Qwen2.5-0.5B-Instruct (fine-tuned, not LoRA)
- GGUF Q4_K_M quantization (350MB)
- llama-cpp-python + FastAPI
- Docker on a $10/mo VPS
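The serving layer is small. Here's a minimal sketch of the kind of endpoint I run; the model path, system prompt, and sampling settings below are placeholders rather than my exact config:

```python
import json

from fastapi import FastAPI
from llama_cpp import Llama
from pydantic import BaseModel

app = FastAPI()

# Load the quantized GGUF once at startup (path and thread count are placeholders).
llm = Llama(
    model_path="models/qwen2.5-0.5b-tickets-q4_k_m.gguf",
    n_ctx=2048,
    n_threads=4,
)

SYSTEM_PROMPT = (
    "Classify the support ticket. Respond with one JSON object with keys: "
    "intent, category, urgency, sentiment, route_to."
)

class Ticket(BaseModel):
    text: str

@app.post("/classify")
def classify(ticket: Ticket) -> dict:
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ticket.text},
        ],
        temperature=0.0,  # deterministic labels
        max_tokens=128,   # the JSON object is short
    )
    raw = out["choices"][0]["message"]["content"]
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The fine-tune makes malformed JSON rare, but fail soft just in case.
        return {"error": "unparseable_output", "raw": raw}
```

If you want a hard guarantee on the output shape, llama-cpp-python also supports grammar-constrained (GBNF) decoding.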

*Results:*
- ~90% accuracy on intent/category (on a synthetic ~4K-example dataset; with real data and 5-10K examples, accuracy improves)
- ~150ms per classification on Apple Silicon, 3-5s on a budget VPS (old Xeon without AVX2)

*When this makes sense vs cloud APIs:*
- Data must stay on-premise
- High volume (>10K/month) where API costs add up
- Narrow classification task (not general chat)

*Try it:*
- Demo: https://silentworks.tech/test
- API docs: https://silentworks.tech/docs
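If you'd rather call it from code, a request against the sketch above looks like this; the hosted API's path and payload may differ, so check the docs link for the real schema:

```python
import requests

# Hypothetical call against the local /classify sketch above; the hosted API's
# path and payload may differ (see the API docs).
resp = requests.post(
    "http://localhost:8000/classify",
    json={"text": "I was charged twice this month, please refund the duplicate."},
    timeout=30,
)
print(resp.json())
```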

Happy to discuss the implementation details, training approach, or deployment setup.

---

Contact: https://t.me/var_molchanov