Thoughts on AI & Research

Practical insights on LLMs, NLP, agentic systems, and the journey from research to production engineering.

Research, NLP · Feb 24, 2026 · 8 min read

Why Low-Resource Languages Break LLMs

Most LLM benchmarks tell only half the story. After evaluating 41 models across 19 Persian datasets, here's what we found, and why it matters for the billions of people who speak low-resource languages.

Research, NLP · Feb 22, 2026 · 9 min read

Benchmarking LLMs in Low-Resource Languages — Lessons from Persian

After evaluating 41 models across 19 datasets in MELAC, the results were clear — and uncomfortable. Fine-tuned Persian models lost to generalist frontier models. Here's why.

LLMs, Engineering · Feb 15, 2026 · 8 min read

Building Agentic AI Systems: From Theory to Production

After shipping several agentic systems at Rudys.AI, I've distilled the patterns that actually work in production — and the failure modes nobody writes about.

LLMs · Jan 10, 2026 · 10 min read

RAG in Production: What the Tutorials Skip

Retrieval-Augmented Generation looks simple in demos. In production, chunking strategy, reranking, and context window management decide whether it works or fails silently.

LLMs · Dec 18, 2025 · 7 min read

Fine-Tuning vs. Prompting: A Decision Framework

Both approaches work. The right choice depends on data availability, latency budget, and how stable your task definition is. Here's the framework I use to decide.

Research, NLP · Nov 30, 2025 · 5 min read

Behind the Benchmark: Persian in a Court (ACL 2025)

How we built a vision-language benchmark grounded in Persian legal documents, and why domain grounding exposes LLM weaknesses that general benchmarks miss.

Engineering · Nov 5, 2025 · 9 min read

Async Python Patterns for High-Throughput LLM Apps

When you're making hundreds of concurrent LLM calls, synchronous code becomes your bottleneck. These are the asyncio and structured-concurrency patterns I rely on daily.