What Is a Vector in Machine Learning? Simple Explanation
A clear, research-backed explanation of what vectors are in machine learning. Learn how they represent data, how vector operations like dot product and cosine similarity work, and why they are the foundation of every ML model and AI application you use.
Every machine learning model you have ever used runs entirely on numbers. When you type a query into a search engine, upload a photo for classification, or ask a language model a question, the first thing that happens before any computation is that your data gets turned into numbers. Not stored as numbers. Represented as numbers in a specific structure.
That structure is a vector.
Understanding what a vector is, and what you can do with one, is not optional background knowledge for ML practitioners. It is the foundation. The dot product, the similarity search, the training loop, the embedding — all of it is vector math.
This article covers what a vector is from first principles, how it represents different types of data, what operations matter in practice, and how vectors connect to the broader concepts of embeddings and vector databases that power modern AI applications.
The Mathematical Definition
In mathematics and physics, a vector is an object with magnitude and direction. That geometric definition is useful for physics problems involving forces and velocity, but in machine learning the more useful definition is simpler: a vector is an ordered list of numbers.
# A 3-dimensional vector
v = [2.5, 170.0, 3.0]
The word "ordered" matters. The list [2.5, 170.0, 3.0] and the list [170.0, 3.0, 2.5] are different vectors, even though they contain the same numbers. The position of each number is its meaning.
Each number in a vector is called a component, an element, or a dimension. A vector with three elements lives in three-dimensional space. A vector with 768 elements lives in 768-dimensional space. The math is the same regardless of the number of dimensions.
According to GeeksforGeeks' machine learning documentation, vectors in machine learning are fundamental for data representation and are applied in classification, regression, clustering, and deep learning.
The Hierarchy: Scalars, Vectors, Matrices, Tensors
Before going further, the relationship between these four terms is worth establishing clearly because they appear constantly in ML literature.
A scalar is a single number. Temperature in Celsius, a probability score, a loss value — all scalars.
temperature = 36.6 # scalar
loss = 0.342 # scalar
probability = 0.87 # scalar
A vector is a one-dimensional ordered list of numbers. Each element is a scalar.
# Feature vector for a house
house = [1200.0, 3.0, 8.5, 2.0]
# [area_sqft, bedrooms, school_rating, bathrooms]
A matrix is a two-dimensional grid of numbers. Think of it as multiple vectors stacked as rows or columns. In neural networks, weight matrices are what the model learns during training.
import numpy as np
# Weight matrix: 3 inputs, 2 neurons
W = np.array([
[0.5, -0.3],
[0.8, 0.1],
[-0.2, 0.9]
])
# Shape: (3, 2)
A tensor is the generalization of all the above. A scalar is a rank-0 tensor. A vector is a rank-1 tensor. A matrix is a rank-2 tensor. An image stored as height x width x channels is a rank-3 tensor. PyTorch and TensorFlow use tensors as their core data structure for exactly this reason — they can represent any shape of numerical data.
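The rank hierarchy maps one-to-one onto NumPy array shapes. A quick sketch, with shapes chosen arbitrarily for illustration:

```python
import numpy as np

scalar = np.array(36.6)              # rank-0 tensor, shape ()
vector = np.array([1.0, 2.0, 3.0])   # rank-1 tensor, shape (3,)
matrix = np.ones((3, 2))             # rank-2 tensor, shape (3, 2)
image = np.zeros((28, 28, 3))        # rank-3 tensor: height x width x channels

for t in (scalar, vector, matrix, image):
    print(t.ndim, t.shape)
```

The `ndim` attribute is exactly the tensor rank, which is why the terms "rank" and "number of dimensions" are used interchangeably in framework documentation.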
What a Vector Actually Represents
The abstract definition is clear enough. The more important question is what a vector represents when you are solving a real problem.
Example 1: Tabular Data
Suppose you are predicting house prices. Each house in your dataset has several measurable attributes: area in square feet, number of bedrooms, number of bathrooms, distance from city center in kilometers, and age of the property in years.
# House A
house_a = [1450.0, 3.0, 2.0, 5.2, 12.0]
# [area, bedrooms, bathrooms, dist_km, age_years]
# House B
house_b = [980.0, 2.0, 1.0, 8.7, 25.0]
Each house is now a point in a five-dimensional space. The linear regression model learns a weight vector w such that the dot product of w and the house vector approximates the price.
# Linear regression: price = w · x + b
w = np.array([120.0, 8500.0, 6000.0, -3200.0, -500.0])
b = 50000.0
house_a = np.array([1450.0, 3.0, 2.0, 5.2, 12.0])
predicted_price = np.dot(w, house_a) + b
print(f"Predicted price: ${predicted_price:,.0f}")
# Predicted price: $238,860
The equation Y = Xw + b, where X is a feature vector, w is a weights vector, and b is the bias term, is the defining formula of linear regression, as described in GeeksforGeeks' ML reference.
Example 2: Text Data
Text cannot be fed into a model directly. It has to be converted to numbers. One classic technique is to represent a document as a vector of word counts. If your vocabulary is 10,000 words, each document becomes a 10,000-dimensional vector where each element is how many times that word appears.
# Vocabulary (simplified): ["cat", "dog", "run", "sleep", "bark"]
# Index: 0 1 2 3 4
doc1 = np.array([3, 0, 1, 2, 0]) # "cat cat cat run sleep sleep"
doc2 = np.array([0, 4, 0, 1, 3]) # "dog dog dog dog sleep bark bark bark"
doc3 = np.array([2, 1, 1, 3, 0]) # mixed content
Modern techniques go further. Word2Vec, introduced by Tomas Mikolov and colleagues at Google in 2013, maps each word to a dense vector of 100 to 300 dimensions where words used in similar contexts end up close together in vector space. The classic demonstration is that vector arithmetic captures semantic relationships:
vector("king") - vector("man") + vector("woman") ≈ vector("queen")That is not a trick. It is the geometry of the learned vector space working correctly.
Example 3: Images
A grayscale image of 28x28 pixels contains 784 individual pixel values, each between 0 and 255. Flatten that two-dimensional grid into a single list and you have a 784-dimensional vector.
from PIL import Image
import numpy as np
img = Image.open("digit.png").convert("L") # grayscale
pixel_array = np.array(img) # shape: (28, 28)
feature_vector = pixel_array.flatten() # shape: (784,)
print(feature_vector.shape)
# (784,)
A color image adds a third channel dimension. A 224x224 RGB image used by models like ResNet flattens to 150,528 dimensions before any learned representation is applied.
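The 150,528 figure is just shape arithmetic, which a dummy array confirms:

```python
import numpy as np

rgb = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy all-black RGB image
flat = rgb.flatten()
print(flat.shape)     # (150528,)
print(224 * 224 * 3)  # 150528
```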
Vector Operations That Matter
Knowing what a vector is matters less than knowing what you can do with one. Three operations appear everywhere in machine learning.
1. Vector Addition and Subtraction
Adding two vectors produces a new vector. Subtraction works the same way. Both operations are element-wise.
a = np.array([2.0, 3.0, 5.0])
b = np.array([1.0, 4.0, 2.0])
addition = a + b # [3.0, 7.0, 7.0]
subtraction = a - b # [1.0, -1.0, 3.0]
In neural networks, adding a bias vector to the output of a weight multiplication is one of the most repeated operations in the entire forward pass.
2. Dot Product
The dot product multiplies corresponding elements of two vectors and sums the results. It produces a single number (a scalar).
a = np.array([2.0, 3.0, 5.0])
b = np.array([1.0, 4.0, 2.0])
# Manual dot product
dot = (2.0 * 1.0) + (3.0 * 4.0) + (5.0 * 2.0)
# = 2 + 12 + 10 = 24
# NumPy
dot = np.dot(a, b) # 24.0
The dot product is the key operation in almost every neural network layer. When a layer multiplies input activations by its weight matrix, it performs a dot product for each neuron. The result tells the neuron how much the input aligns with the pattern the weight vector has learned. According to Machine Learning Mastery, the dot product is the key tool for calculating vector projections, vector decompositions, and determining orthogonality.
3. Cosine Similarity
Cosine similarity measures the angle between two vectors rather than the distance between them. It ranges from -1 (opposite directions) to 1 (identical direction), with 0 meaning the vectors are orthogonal: no directional relationship at all. For non-negative vectors such as raw word counts, the value stays between 0 and 1.
The formula:
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
Where ||A|| is the L2 norm (length) of vector A.
import numpy as np
from numpy.linalg import norm
def cosine_similarity(vec_a, vec_b):
dot = np.dot(vec_a, vec_b)
return dot / (norm(vec_a) * norm(vec_b))
# Two documents about machine learning
doc_ml_1 = np.array([3, 2, 0, 5, 1]) # word counts for topic A words
doc_ml_2 = np.array([4, 1, 0, 6, 2]) # slightly longer, same topic
# A document about cooking
doc_cooking = np.array([0, 0, 7, 0, 0])
sim_same_topic = cosine_similarity(doc_ml_1, doc_ml_2)
sim_diff_topic = cosine_similarity(doc_ml_1, doc_cooking)
print(f"Same topic: {sim_same_topic:.4f}") # ~0.9990 — very similar
print(f"Diff topic: {sim_diff_topic:.4f}") # ~0.0000 — not similarThis is exactly how vector search in vector databases works. The query is converted to a vector, and the database finds stored vectors with the highest cosine similarity. According to the Towards Data Science article on cosine similarity, cosine similarity is preferred over Euclidean distance for text and document search because it normalizes for document length.
Row Vectors vs Column Vectors
This distinction matters more than it seems when you start reading research papers.
A row vector arranges elements horizontally: v = [x1, x2, x3]. Its shape is (1, 3).
A column vector arranges elements vertically, with each element on its own line. Its shape is (3, 1).
row_vec = np.array([[2, 3, 5]]) # shape: (1, 3)
col_vec = np.array([[2], [3], [5]]) # shape: (3, 1)
# In practice, 1-D arrays are more common
flat_vec = np.array([2, 3, 5]) # shape: (3,)
NumPy supports all three forms, but the distinction is critical when doing matrix multiplication. A (1, 3) row vector multiplied by a (3, 3) weight matrix gives a (1, 3) output. Transpose the vector to shape (3, 1) and that same multiplication is no longer even defined; the order of the operands has to change too.
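Those shape rules can be checked directly. The identity matrix below is a stand-in for a learned weight matrix, used only to make the example self-contained:

```python
import numpy as np

row_vec = np.array([[2.0, 3.0, 5.0]])  # shape (1, 3)
W = np.eye(3)                          # (3, 3) stand-in for a weight matrix

out = row_vec @ W
print(out.shape)   # (1, 3)

col_vec = row_vec.T                    # shape (3, 1)
# col_vec @ W would raise a shape-mismatch error: (3, 1) x (3, 3)
out2 = W @ col_vec                     # but (3, 3) @ (3, 1) works
print(out2.shape)  # (3, 1)
```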
Dense Vectors vs Sparse Vectors
Not all vectors look the same. The classification into dense and sparse is important because each requires a different storage and indexing strategy.
A dense vector has most or all elements as non-zero values. Embedding models produce dense vectors. A typical text embedding from OpenAI's embedding API produces 1536 non-zero floats.
# Dense vector — all elements carry information
dense = np.array([0.41, -1.22, 0.03, 2.18, 0.77, -0.55, 0.91, ...])
A sparse vector has most elements as zero, with a small number of non-zero values. Classic bag-of-words representations are sparse. In a vocabulary of 50,000 words, most documents use fewer than 500 unique words — so at least 49,500 elements are zero.
# Sparse vector — almost all zeros
# Vocabulary size: 10,000 words
# Document uses only 4 unique words
sparse = [0, 0, 0, 3, 0, 0, ..., 1, 0, ..., 2, 0, ..., 5, ...]
# ^ ^ ^ ^
# word at index 3 index 1200 index 7400 index 9998
The dense vs sparse comparison is covered in full detail in the dedicated cluster article. The short version: dense vectors capture semantic meaning, sparse vectors capture lexical precision, and hybrid search combines both.
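The storage win from sparsity is easy to see in a minimal sketch. Keeping only index-value pairs in a plain dict mirrors the idea behind dedicated formats like SciPy's CSR; the helper functions here are made up for the example:

```python
# A sparse vector stored as {index: value} instead of 10,000 floats
vocab_size = 10_000
sparse = {3: 3, 1200: 1, 7400: 2, 9998: 5}

def sparse_dot(a, b):
    # Iterate only over the smaller vector's non-zeros;
    # every other product is zero and can be skipped.
    small, big = (a, b) if len(a) <= len(b) else (b, a)
    return sum(v * big.get(i, 0) for i, v in small.items())

print(len(sparse), "stored values instead of", vocab_size)
print(sparse_dot(sparse, {3: 2, 9998: 1}))  # 3*2 + 5*1 = 11
```

The dot product skips all zero positions, which is why sparse formats pay off both in memory and in compute when vectors are mostly empty.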
How Vectors Flow Through a Neural Network
A neural network is, at its core, a sequence of vector operations. Understanding this removes much of the mystery around how deep learning works.
Input Layer
x = [0.5, 1.2, -0.3, 0.8] ← input vector (4 features)
Hidden Layer 1
z1 = W1 · x + b1 ← matrix multiply + bias add
a1 = ReLU(z1) ← apply activation function
Hidden Layer 2
z2 = W2 · a1 + b2
a2 = ReLU(z2)
Output Layer
z3 = W3 · a2 + b3
output = Softmax(z3) ← probabilities across classes
Every weight matrix W1, W2, W3 is learned during training through backpropagation. The gradients that update those weights are also vectors and matrices. The entire learning process is linear algebra applied repeatedly.
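The pipeline above translates into NumPy almost line for line. The weights below are random placeholders rather than trained values, and the layer sizes (4 → 5 → 5 → 3) are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

x = np.array([0.5, 1.2, -0.3, 0.8])  # input vector (4 features)

# Random placeholder weights and zero biases
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)
W3, b3 = rng.normal(size=(3, 5)), np.zeros(3)

a1 = relu(W1 @ x + b1)        # hidden layer 1
a2 = relu(W2 @ a1 + b2)       # hidden layer 2
output = softmax(W3 @ a2 + b3)  # class probabilities

print(output)        # three non-negative values
print(output.sum())  # sums to 1.0
```

Every step is either a matrix-vector product, a vector addition, or an element-wise function, which is the point: the forward pass is nothing but vector math.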
Ian Goodfellow, Yoshua Bengio, and Aaron Courville's deep learning textbook dedicates its second chapter entirely to the linear algebra foundations — vectors, matrices, and tensor operations — before covering any learning algorithm. That sequencing is deliberate.
Vectors in Specific ML Algorithms
The same vector abstraction shows up across completely different algorithm families.
Linear Regression
The model learns a weight vector w that minimizes the error between w · x and the true label y. The prediction is a dot product between the input vector and the learned weight vector.
Support Vector Machines
SVMs find a hyperplane that maximally separates two classes. That hyperplane is defined by a normal vector. New data points are classified based on which side of the hyperplane they fall on, calculated via a dot product. The name "support vector" refers to the training examples closest to the decision boundary.
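A minimal sketch of that decision rule, with a hand-picked normal vector and offset standing in for what a real SVM would learn from data:

```python
import numpy as np

# Hyperplane defined by normal vector w and offset b (toy values)
w = np.array([1.0, -2.0])
b = 0.5

def classify(x):
    # Sign of w·x + b tells us which side of the hyperplane x is on
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([3.0, 0.0])))  # 1   (w·x + b = 3.5)
print(classify(np.array([0.0, 2.0])))  # -1  (w·x + b = -3.5)
```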
K-Means Clustering
K-Means assigns each data point to the cluster whose centroid is closest. The centroid is the average vector of all points in the cluster. Distance is measured using Euclidean distance between vectors.
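One assignment-and-update step can be sketched in a few lines of NumPy, using toy points and centroids:

```python
import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])  # toy starting centroids

# Assignment step: each point goes to its nearest centroid (Euclidean)
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
assignments = dists.argmin(axis=1)
print(assignments)  # [0 0 1 1]

# Update step: each centroid becomes the mean vector of its points
new_centroids = np.array(
    [points[assignments == k].mean(axis=0) for k in range(2)]
)
print(new_centroids)  # [[1.25 1.5 ] [8.5  8.75]]
```

K-Means simply alternates these two steps until the assignments stop changing; both steps are pure vector arithmetic.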
Word2Vec in NLP
Word2Vec, introduced at Google in 2013, represents each word as a 100 to 300-dimensional dense vector trained so that words appearing in similar contexts are nearby in vector space. The model is a shallow neural network whose learned internal representations are the vectors you keep after training. As the Word2Vec Wikipedia article notes, words which appear in similar contexts are mapped to vectors which are nearby as measured by cosine similarity.
from gensim.models import Word2Vec
sentences = [
["the", "cat", "sat", "on", "the", "mat"],
["the", "dog", "slept", "on", "the", "floor"],
["cats", "and", "dogs", "are", "pets"]
]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
# Each word is now a 100-dimensional vector
cat_vector = model.wv["cat"] # shape: (100,)
dog_vector = model.wv["dog"] # shape: (100,)
similarity = model.wv.similarity("cat", "dog")
print(f"cat-dog similarity: {similarity:.4f}") # high — appear in similar contextsFrom Vectors to Embeddings
The word "vector" and the word "embedding" are often used interchangeably in AI contexts. They are not exactly the same thing, but the distinction is subtle.
A vector is the mathematical object — an ordered list of numbers. An embedding is a specific type of vector produced by a model that has been trained to encode semantic meaning. Every embedding is a vector. Not every vector is an embedding.
OpenAI's text-embedding-3-small model converts a sentence into a 1536-dimensional vector where sentences with similar meaning have similar vectors. That vector is an embedding. The embeddings article covers how models learn to produce them and which model you should use for different tasks.
Why Vectors Are the Foundation of Vector Databases
A vector database stores millions of these high-dimensional vectors and finds the nearest ones to any incoming query vector at millisecond speed.
The entire retrieval process is vector math. The query is embedded into a vector. The database calculates similarity between the query vector and stored vectors using cosine similarity or Euclidean distance. The top-K closest vectors are returned as results.
User Query: "How do I reset my password?"
↓
Embedding model → [0.41, -1.22, 0.03, ..., 0.77] (1536 floats)
↓
Vector database ANN search
↓
Most similar stored vectors
↓
["Reset instructions", "Account recovery", "Two-factor reset"]Without an understanding of what a vector is and how similarity is measured, the rest of the RAG pipeline — indexing, retrieval, reranking — is a black box. With it, you can reason about why retrieval fails and how to fix it.
The latent space article goes deeper on the geometry that makes this work: why certain vectors cluster together, what each dimension represents, and how arithmetic in vector space produces meaningful results.
Vector Norms
One concept that appears often in regularization and similarity computation is the vector norm. The L2 norm, also called the Euclidean norm, is the straight-line length of the vector.
v = np.array([3.0, 4.0])
l2_norm = np.linalg.norm(v)
print(l2_norm) # 5.0 (Pythagorean theorem: sqrt(3² + 4²) = 5)
The L2 norm is the denominator in cosine similarity. Dividing by it normalizes the vector's length to 1, which is why cosine similarity measures direction independently of magnitude. A short document and a long document on the same topic point in the same direction in vector space, so their cosine similarity is high even though their raw word count vectors have very different lengths.
L1 norm (sum of absolute values) and L-infinity norm (maximum absolute value) are used in specific regularization contexts, but L2 is the default for most ML and similarity tasks.
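All three norms are available from np.linalg.norm by passing the order argument:

```python
import numpy as np

v = np.array([3.0, -4.0])

l1 = np.linalg.norm(v, 1)         # 7.0: sum of absolute values
l2 = np.linalg.norm(v)            # 5.0: Euclidean length (the default)
linf = np.linalg.norm(v, np.inf)  # 4.0: largest absolute value
print(l1, l2, linf)
```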
Practical Code: Working With Vectors in NumPy
NumPy is the standard library for vector math in Python. Every major ML framework — PyTorch, TensorFlow, JAX, scikit-learn — is built on top of it or implements the same interface.
import numpy as np
# Define vectors
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
# Basic operations
print(a + b) # [5.0, 7.0, 9.0]
print(a - b) # [-3.0, -3.0, -3.0]
print(a * 3) # [3.0, 6.0, 9.0] — scalar multiplication
# Dot product
print(np.dot(a, b)) # 32.0
# L2 norm
print(np.linalg.norm(a)) # 3.7417
# Cosine similarity
def cosine_sim(v1, v2):
return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(cosine_sim(a, b)) # 0.9746 — very similar direction
# Stack multiple vectors into a matrix
matrix = np.stack([a, b]) # shape: (2, 3)
print(matrix.shape)
The full NumPy documentation on array operations covers reshaping, transposing, and batching vectors into matrices, which is the next step after single-vector operations.
What "High-Dimensional" Actually Means
Most vectors in modern ML are not three-dimensional. They are hundreds or thousands of dimensions long.
A 2-dimensional vector is a point on a flat plane. A 3-dimensional vector is a point in physical space. A 1536-dimensional vector is a point in 1536-dimensional space — something that cannot be visualized, but can be computed with using the same rules.
The key property that transfers to high dimensions: two vectors that are close together in that space represent content that is similar. Two vectors far apart represent content that is unrelated. This holds whether the space is three-dimensional or 3000-dimensional.
The challenge is that our intuitions about space break down in high dimensions. As the number of dimensions increases, the volume of the space grows so fast that most pairs of points become roughly equidistant. This is called the curse of dimensionality, and it is why standard B-tree indexes fail for vector search — a point covered in depth in the why traditional indexes fail article.
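That concentration effect is easy to observe empirically. The sketch below compares the relative spread of pairwise distances between random points in 3 versus 1,000 dimensions; the sample size is chosen arbitrarily to keep the computation small:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(dim, n=100):
    # Relative spread (std / mean) of pairwise distances
    # between n random points in the unit hypercube
    pts = rng.random((n, dim))
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    d = d[np.triu_indices(n, k=1)]  # unique pairs only
    return d.std() / d.mean()

print(f"   3-D: {distance_spread(3):.3f}")
print(f"1000-D: {distance_spread(1000):.3f}")
# In high dimensions the spread collapses: nearly every pair of
# points sits at roughly the same distance from every other.
```

When "nearest" and "farthest" differ by only a few percent, pruning strategies that work for B-trees have nothing to prune on, which is why vector search needs purpose-built ANN indexes.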
Summary
A vector in machine learning is an ordered list of numbers. Each element is one feature or dimension of a data point. A house can be a vector. A word can be a vector. An image can be a vector. A sentence can be a vector.
The operations that matter are addition, dot product, cosine similarity, and norm calculation. All of them are implemented efficiently in NumPy and underlie every layer of every neural network you will ever work with.
From this foundation, the next logical step is understanding how models learn to produce good vectors — ones where closeness in vector space means closeness in meaning. That is the subject of embeddings.
After that, the question becomes how to store and search millions of those vectors fast. That is the subject of vector databases.
Sources and Further Reading
- GeeksforGeeks. Vectors for Machine Learning. geeksforgeeks.org/machine-learning/vectors-for-ml
- Machine Learning Mastery. A Gentle Introduction to Vectors for Machine Learning. machinelearningmastery.com/gentle-introduction-vectors-machine-learning
- H2O.ai. What Is a Vector? h2o.ai/wiki/vector
- Algolia. What Are Vectors and How Do They Apply to Machine Learning? algolia.com/blog/ai/what-are-vectors-and-how-do-they-apply-to-machine-learning
- Wikipedia. Word2Vec. en.wikipedia.org/wiki/Word2vec
- Serokell. Word2Vec: Explanation and Examples. serokell.io/blog/word2vec
- Towards Data Science. Cosine Similarity: How Does It Measure the Similarity. towardsdatascience.com/cosine-similarity
- Pathmind. A Beginner's Guide to Word2Vec and Neural Word Embeddings. wiki.pathmind.com/word2vec
- NumPy. Official Documentation. numpy.org/doc/stable
- Goodfellow, Bengio, Courville. Deep Learning, Chapter 2: Linear Algebra. deeplearningbook.org
- Shelf.io. How Vectors in Machine Learning Supply AI Engines with Data. shelf.io/blog/vectors-in-machine-learning
Krunal Kanojiya
Technical Content Writer and former software developer from India. I write in-depth articles on blockchain, AI/ML, data engineering, web development, and developer careers. Currently at Lucent Innovation, previously at Cromtek Solution and freelance.