Krunal Kanojiya

Technical content writer from India. I write clear, accurate articles on blockchain, AI, machine learning, and software development.

LLMs & Deep Learning

How large language models actually work. Covers transformer architecture, attention, pre-training, fine-tuning, RLHF, inference optimization, KV caching, and the deep learning basics that power modern AI systems.

LLMs & Deep Learning · 15 min read

Exploration Hacking: The RL Failure Mode That ML Engineers Need to Understand Now

A new paper from MATS, Anthropic, Google DeepMind, and UC San Diego shows that AI models can deliberately underperform during their own RL training to prevent capability updates. Here is what exploration hacking is, how it works, what the experiments found, and what mitigations actually help.

#reinforcement-learning #ai-safety #llm

May 05, 2026

LLMs & Deep Learning · 12 min read

Why 1M Tokens Is a Trap: The Hidden Cost of Long Context Windows

A 1M-token context window is a capability, not a strategy. This article breaks down why a bigger context often leads to worse reasoning, higher costs, and lazy system design. It also shows what disciplined long-context engineering actually looks like.

#long-context #context-window #llm

Apr 22, 2026

LLMs & Deep Learning · 12 min read

Evaluation, Inference, and Deployment: Shipping an LLM Product That Actually Works

Building a model is one thing. Knowing whether it works, making it fast enough to serve, and keeping it working in production is another. This final article covers benchmarks, quantization, KV cache, latency, and what breaks when you move from research to real users.

#evaluation #benchmarks #quantization

Apr 21, 2026

LLMs & Deep Learning · 13 min read

KV Cache Explained: How LLMs Generate Text Without Recomputing Everything

KV cache is the reason your LLM can generate text quickly without recomputing the entire conversation at every step. This article explains how key-value caching works in transformer inference, why it is both essential and expensive, and how techniques like PagedAttention (used in vLLM) and grouped-query attention (GQA) keep it manageable at scale.

#kv-cache #transformer #inference

Apr 21, 2026

LLMs & Deep Learning · 11 min read

Fine-tuning and RLHF: How a Pre-trained Model Becomes a Useful Assistant

Pre-training gives a model language. Fine-tuning and RLHF give it behavior. This article covers supervised fine-tuning, reward modeling, PPO-based alignment, and Direct Preference Optimization (DPO): the full post-training stack that turns a text predictor into an AI assistant.

#fine-tuning #rlhf #sft

Apr 19, 2026

LLMs & Deep Learning · 14 min read

Pre-training and Language Modeling: How a Transformer Learns to Predict Text

The transformer architecture from Article 6 starts as a random function. Pre-training is what turns it into a language model. This article covers next-token prediction, scaling laws, data quality, and why capabilities like reasoning emerge from a training objective that never mentions them.

#pre-training #language-modeling #scaling-laws

Apr 18, 2026

LLMs & Deep Learning · 17 min read

Transformer Architecture and Attention: Why Every Modern LLM Is Built This Way

The transformer solved three problems that broke RNNs: sequential computation, vanishing gradients over long distances, and fixed-size bottlenecks. This article walks through self-attention from dot products to multi-head, the full transformer block, and how modern optimizations like FlashAttention and GQA work.

#transformer #attention #self-attention

Apr 15, 2026

LLMs & Deep Learning · 15 min read

Sequence Modeling and RNNs: The Problem That Made Transformers Inevitable

Before transformers took over, RNNs were the standard approach for sequences. Once you understand what they got right, what broke at scale, and exactly why the vanishing gradient problem made long-range learning nearly impossible, transformer attention clicks into place.

#rnn #lstm #gru

Apr 11, 2026

LLMs & Deep Learning · 15 min read

Token and Positional Embeddings: How Transformers Represent Words (RoPE Explained)

How a transformer represents tokens inside the model: the token embedding lookup table, positional embeddings, the path from sinusoidal encoding to RoPE and ALiBi, and why vector-space geometry matters for everything downstream.

#embeddings #representation-learning #word2vec

Apr 09, 2026

LLMs & Deep Learning · 14 min read

Neural Networks and Backpropagation: Where the Math Starts Doing Something

This is where linear algebra and probability stop being theory and start training a model. A full walkthrough of how neural networks are structured, how a forward pass works, how backpropagation computes gradients, and what modern optimizers like AdamW actually do differently.

#neural-networks #backpropagation #deep-learning

Apr 08, 2026

LLMs & Deep Learning · 16 min read

I Built a Tiny GPT Model From Scratch: Here's Exactly How It Works

How GPT really works, explained by building a 10M-parameter model from scratch in PyTorch. Covers tokenization, attention, transformer blocks, training, and text generation, all in about 300 lines of Python.

#gpt #transformer #pytorch

Mar 31, 2026

LLMs & Deep Learning · 8 min read

TurboQuant Explained: Google's Breakthrough in AI Model Compression

Google's TurboQuant compresses AI memory by 6x and speeds up attention computation by 8x without retraining. Here is what it actually does, how it works, and what it means for anyone building or running AI systems.

#turboquant #ai #llm

Mar 31, 2026