Krunal Kanojiya

Technical Content Writer

© 2026 Krunal Kanojiya · Built with Next.js

Category · 12 articles

LLMs & Deep Learning

How large language models actually work — transformer architecture and attention, pre-training and fine-tuning, RLHF, inference optimization, KV caching, and the deep learning fundamentals that power modern AI systems.

LLMs & Deep Learning · 15 min read

Exploration Hacking: The RL Failure Mode That ML Engineers Need to Understand Now

A new paper from MATS, Anthropic, Google DeepMind, and UC San Diego shows that AI models can deliberately underperform during their own RL training to prevent capability updates. Here is what exploration hacking is, how it works, what the experiments found, and what mitigations actually help.

#reinforcement-learning #ai-safety #llm
May 05, 2026
LLMs & Deep Learning · 12 min read

Why 1M Tokens Is a Trap: The Hidden Cost of Long Context Windows

A 1M-token context window is a capability, not a strategy. This article breaks down why bigger context often leads to worse reasoning, higher costs, and lazy system design — and what disciplined long-context engineering actually looks like.

#long-context #context-window #llm
Apr 22, 2026
LLMs & Deep Learning · 12 min read

Evaluation, Inference, and Deployment: Shipping an LLM Product That Actually Works

Building a model is one thing. Knowing whether it works, making it fast enough to serve, and keeping it working in production is another. This final article covers benchmarks, quantization, KV cache, latency, and what breaks when you move from research to real users.

#evaluation #benchmarks #quantization
Apr 21, 2026
LLMs & Deep Learning · 13 min read

KV Cache Explained: How LLMs Generate Text Without Recomputing Everything

KV cache is the reason your LLM can generate text fast without recomputing the entire conversation at every step. This article explains how key-value caching works in transformer inference, why it is both essential and expensive, and how modern serving systems and techniques such as vLLM, PagedAttention, and GQA manage it at scale.

#kv-cache #transformer #inference
Apr 21, 2026
LLMs & Deep Learning · 11 min read

Fine-tuning and RLHF: How a Pre-trained Model Becomes a Useful Assistant

Pre-training gives a model language. Fine-tuning and RLHF give it behavior. This article covers supervised fine-tuning, reward modeling, PPO-based alignment, and Direct Preference Optimization — the full post-training stack that turns a text predictor into an AI assistant.

#fine-tuning #rlhf #sft
Apr 19, 2026
LLMs & Deep Learning · 14 min read

Pre-training and Language Modeling: How a Transformer Learns to Predict Text

The transformer architecture from Article 6 starts as a random function. Pre-training is what turns it into a language model. This article covers next-token prediction, scaling laws, data quality, and why capabilities like reasoning emerge from a training objective that never mentions them.

#pre-training #language-modeling #scaling-laws
Apr 18, 2026
LLMs & Deep Learning · 17 min read

Transformer Architecture and Attention: Why Every Modern LLM Is Built This Way

The transformer solved three problems that broke RNNs: sequential computation, vanishing gradients over long distances, and fixed-size bottlenecks. This article walks through self-attention from dot products to multi-head attention, the full transformer block, and how modern optimizations like FlashAttention and GQA work.

#transformer #attention #self-attention
Apr 15, 2026
LLMs & Deep Learning · 15 min read

Sequence Modeling and RNNs: The Problem That Made Transformers Inevitable

Before transformers took over, RNNs were the standard approach for sequences. Understanding what they got right, what broke at scale, and exactly why the vanishing gradient problem made long-range learning nearly impossible is what makes transformer attention click into place.

#rnn #lstm #gru
Apr 11, 2026
LLMs & Deep Learning · 15 min read

Embeddings and Representation Learning: How Models Turn Words Into Math

Embeddings are how neural networks turn raw tokens into something they can actually reason about. This article covers token embeddings, positional embeddings, the evolution from Word2Vec to RoPE, and why the geometry of the vector space matters for everything downstream.

#embeddings #representation-learning #word2vec
Apr 09, 2026
LLMs & Deep Learning · 14 min read

Neural Networks and Backpropagation: Where the Math Starts Doing Something

This is where linear algebra and probability stop being theory and start training a model. A full walkthrough of how neural networks are structured, how a forward pass works, how backpropagation computes gradients, and what modern optimizers like AdamW actually do differently.

#neural-networks #backpropagation #deep-learning
Apr 08, 2026
LLMs & Deep Learning · 16 min read

I Built a Tiny GPT Model From Scratch: Here's Exactly How It Works

How GPT really works, explained by building a 10M-parameter model from scratch in PyTorch. Covers tokenization, attention, transformer blocks, training, and text generation — all in ~300 lines of Python.

#gpt #transformer #pytorch
Mar 31, 2026
LLMs & Deep Learning · 8 min read

TurboQuant Explained: Google's Breakthrough in AI Model Compression

Google's TurboQuant compresses AI memory by 6x and speeds up attention computation by 8x without retraining. Here is what it actually does, how it works, and what it means for anyone building or running AI systems.

#turboquant #ai #llm
Mar 31, 2026