What Is a Vector in Machine Learning? Simple Explanation
A clear, research-backed explanation of what vectors are in machine learning. Learn how they represent data, how vector operations like dot product and cosine similarity work, and why they are the foundation of every ML model and AI application you use.
Every machine learning model you have ever used runs entirely on numbers. When you type a query into a search engine, upload a photo for classification, or ask a language model a question, the first thing that happens before any computation is that your data gets turned into numbers. Not stored as numbers. Represented as numbers in a specific structure.
That structure is a vector.
Understanding what a vector is, and what you can do with one, is not optional background knowledge for ML practitioners. It is the foundation. The dot product, the similarity search, the training loop, the embedding — all of it is vector math.
This article covers what a vector is from first principles, how it represents different types of data, what operations matter in practice, and how vectors connect to the broader concepts of embeddings and vector databases that power modern AI applications.
The Mathematical Definition
In mathematics and physics, a vector is an object with magnitude and direction. That geometric definition is useful for physics problems involving forces and velocity, but in machine learning the more useful definition is simpler: a vector is an ordered list of numbers.
# A 3-dimensional vector
v = [2.5, 170.0, 3.0]
The word "ordered" matters. The list [2.5, 170.0, 3.0] and the list [170.0, 3.0, 2.5] are different vectors, even though they contain the same numbers. The position of each number is its meaning.
Each number in a vector is called a component, an element, or a dimension. A vector with three elements lives in three-dimensional space. A vector with 768 elements lives in 768-dimensional space. The math is the same regardless of the number of dimensions.
According to GeeksforGeeks' machine learning documentation, vectors in machine learning are fundamental for data representation and are applied in classification, regression, clustering, and deep learning.
The Hierarchy: Scalars, Vectors, Matrices, Tensors
Before going further, the relationship between these four terms is worth establishing clearly because they appear constantly in ML literature.
A scalar is a single number. Temperature in Celsius, a probability score, a loss value — all scalars.
temperature = 36.6 # scalar
loss = 0.342 # scalar
probability = 0.87 # scalar
A vector is a one-dimensional ordered list of numbers. Each element is a scalar.
# Feature vector for a house
house = [1200.0, 3.0, 8.5, 2.0]
# [area_sqft, bedrooms, school_rating, bathrooms]
A matrix is a two-dimensional grid of numbers. Think of it as multiple vectors stacked as rows or columns. In neural networks, weight matrices are what the model learns during training.
import numpy as np
# Weight matrix: 3 inputs, 2 neurons
W = np.array([
[0.5, -0.3],
[0.8, 0.1],
[-0.2, 0.9]
])
# Shape: (3, 2)
A tensor is the generalization of all the above. A scalar is a rank-0 tensor. A vector is a rank-1 tensor. A matrix is a rank-2 tensor. An image stored as height x width x channels is a rank-3 tensor. PyTorch and TensorFlow use tensors as their core data structure for exactly this reason — they can represent any shape of numerical data.
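The rank hierarchy maps one-to-one onto NumPy array shapes. A quick sketch, with shapes chosen arbitrarily for illustration:

```python
import numpy as np

scalar = np.array(36.6)              # rank-0 tensor, shape ()
vector = np.array([1.0, 2.0, 3.0])   # rank-1 tensor, shape (3,)
matrix = np.ones((3, 2))             # rank-2 tensor, shape (3, 2)
image = np.zeros((28, 28, 3))        # rank-3 tensor: height x width x channels

for t in (scalar, vector, matrix, image):
    print(t.ndim, t.shape)
```

The `ndim` attribute is exactly the tensor rank, which is why the terms "rank" and "number of dimensions" are used interchangeably in framework documentation.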
What a Vector Actually Represents
The abstract definition is clear enough. The more important question is what a vector represents when you are solving a real problem.
Example 1: Tabular Data
Suppose you are predicting house prices. Each house in your dataset has several measurable attributes: area in square feet, number of bedrooms, number of bathrooms, distance from city center in kilometers, and age of the property in years.
# House A
house_a = [1450.0, 3.0, 2.0, 5.2, 12.0]
# [area, bedrooms, bathrooms, dist_km, age_years]
# House B
house_b = [980.0, 2.0, 1.0, 8.7, 25.0]
Each house is now a point in a five-dimensional space. The linear regression model learns a weight vector w such that the dot product of w and the house vector approximates the price.
# Linear regression: price = w · x + b
w = np.array([120.0, 8500.0, 6000.0, -3200.0, -500.0])
b = 50000.0
house_a = np.array([1450.0, 3.0, 2.0, 5.2, 12.0])
predicted_price = np.dot(w, house_a) + b
print(f"Predicted price: ${predicted_price:,.0f}")
# Predicted price: $238,860
The equation Y = Xw + b, where X is a feature vector, w is a weights vector, and b is the bias term, is the defining formula of linear regression, as described in GeeksforGeeks' ML reference.
Example 2: Text Data
Text cannot be fed into a model directly. It has to be converted to numbers. One classic technique is to represent a document as a vector of word counts. If your vocabulary is 10,000 words, each document becomes a 10,000-dimensional vector where each element is how many times that word appears.
# Vocabulary (simplified): ["cat", "dog", "run", "sleep", "bark"]
# Index: 0 1 2 3 4
doc1 = np.array([3, 0, 1, 2, 0]) # "cat cat cat run sleep sleep"
doc2 = np.array([0, 4, 0, 1, 3]) # "dog dog dog dog sleep bark bark bark"
doc3 = np.array([2, 1, 1, 3, 0]) # mixed content
Modern techniques go further. Word2Vec, introduced by Tomas Mikolov and colleagues at Google in 2013, maps each word to a dense vector of 100 to 300 dimensions where words used in similar contexts end up close together in vector space. The classic demonstration is that vector arithmetic captures semantic relationships:
vector("king") - vector("man") + vector("woman") ≈ vector("queen")That is not a trick. It is the geometry of the learned vector space working correctly.
Example 3: Images
A grayscale image of 28x28 pixels contains 784 individual pixel values, each between 0 and 255. Flatten that two-dimensional grid into a single list and you have a 784-dimensional vector.
from PIL import Image
import numpy as np
img = Image.open("digit.png").convert("L") # grayscale
pixel_array = np.array(img) # shape: (28, 28)
feature_vector = pixel_array.flatten() # shape: (784,)
print(feature_vector.shape)
# (784,)
A color image adds a third channel dimension. A 224x224 RGB image used by models like ResNet flattens to 150,528 dimensions before any learned representation is applied.
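The 150,528 figure is just shape arithmetic, which a dummy array confirms:

```python
import numpy as np

rgb = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy all-black RGB image
flat = rgb.flatten()
print(flat.shape)     # (150528,)
print(224 * 224 * 3)  # 150528
```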
Vector Operations That Matter
Knowing what a vector is matters less than knowing what you can do with one. Three operations appear everywhere in machine learning.
1. Vector Addition and Subtraction
Adding two vectors produces a new vector. Subtraction works the same way. Both operations are element-wise.
a = np.array([2.0, 3.0, 5.0])
b = np.array([1.0, 4.0, 2.0])
addition = a + b # [3.0, 7.0, 7.0]
subtraction = a - b # [1.0, -1.0, 3.0]
In neural networks, adding a bias vector to the output of a weight multiplication is one of the most repeated operations in the entire forward pass.
2. Dot Product
The dot product multiplies corresponding elements of two vectors and sums the results. It produces a single number (a scalar).
a = np.array([2.0, 3.0, 5.0])
b = np.array([1.0, 4.0, 2.0])
# Manual dot product
dot = (2.0 * 1.0) + (3.0 * 4.0) + (5.0 * 2.0)
# = 2 + 12 + 10 = 24
# NumPy
dot = np.dot(a, b) # 24.0
The dot product is the key operation in almost every neural network layer. When a layer multiplies input activations by its weight matrix, it performs a dot product for each neuron. The result tells the neuron how much the input aligns with the pattern the weight vector has learned. According to Machine Learning Mastery, the dot product is the key tool for calculating vector projections, vector decompositions, and determining orthogonality.
3. Cosine Similarity
Cosine similarity measures the angle between two vectors rather than the distance between them. It ranges from -1 (opposite directions) to 1 (identical direction), with 0 meaning the vectors are orthogonal: no directional relationship at all. For non-negative vectors such as raw word counts, the value stays between 0 and 1.
The formula:
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
Where ||A|| is the L2 norm (length) of vector A.
import numpy as np
from numpy.linalg import norm
def cosine_similarity(vec_a, vec_b):
dot = np.dot(vec_a, vec_b)
return dot / (norm(vec_a) * norm(vec_b))
# Two documents about machine learning
doc_ml_1 = np.array([3, 2, 0, 5, 1]) # word counts for topic A words
doc_ml_2 = np.array([4, 1, 0, 6, 2]) # slightly longer, same topic
# A document about cooking
doc_cooking = np.array([0, 0, 7, 0, 0])
sim_same_topic = cosine_similarity(doc_ml_1, doc_ml_2)
sim_diff_topic = cosine_similarity(doc_ml_1, doc_cooking)
print(f"Same topic: {sim_same_topic:.4f}") # ~0.9990 — very similar
print(f"Diff topic: {sim_diff_topic:.4f}") # ~0.0000 — not similarThis is exactly how vector search in vector databases works. The query is converted to a vector, and the database finds stored vectors with the highest cosine similarity. According to the Towards Data Science article on cosine similarity, cosine similarity is preferred over Euclidean distance for text and document search because it normalizes for document length.
Row Vectors vs Column Vectors
This distinction matters more than it seems when you start reading research papers.
A row vector arranges elements horizontally: v = [x1, x2, x3]. Its shape is (1, 3).
A column vector arranges elements vertically, with each element on its own line. Its shape is (3, 1).
row_vec = np.array([[2, 3, 5]]) # shape: (1, 3)
col_vec = np.array([[2], [3], [5]]) # shape: (3, 1)
# In practice, 1-D arrays are more common
flat_vec = np.array([2, 3, 5]) # shape: (3,)
NumPy supports all three forms, but the distinction is critical when doing matrix multiplication. A (1, 3) row vector multiplied by a (3, 3) weight matrix gives a (1, 3) output. Transpose the vector to shape (3, 1) and that same multiplication is no longer even defined; the order of the operands has to change too.
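Those shape rules can be checked directly. The identity matrix below is a stand-in for a learned weight matrix, used only to make the example self-contained:

```python
import numpy as np

row_vec = np.array([[2.0, 3.0, 5.0]])  # shape (1, 3)
W = np.eye(3)                          # (3, 3) stand-in for a weight matrix

out = row_vec @ W
print(out.shape)   # (1, 3)

col_vec = row_vec.T                    # shape (3, 1)
# col_vec @ W would raise a shape-mismatch error: (3, 1) x (3, 3)
out2 = W @ col_vec                     # but (3, 3) @ (3, 1) works
print(out2.shape)  # (3, 1)
```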
Dense Vectors vs Sparse Vectors
Not all vectors look the same. The classification into dense and sparse is important because each requires a different storage and indexing strategy.
A dense vector has most or all elements as non-zero values. Embedding models produce dense vectors. A typical text embedding from OpenAI's embedding API produces 1536 non-zero floats.
# Dense vector — all elements carry information
dense = np.array([0.41, -1.22, 0.03, 2.18, 0.77, -0.55, 0.91, ...])
A sparse vector has most elements as zero, with a small number of non-zero values. Classic bag-of-words representations are sparse. In a vocabulary of 50,000 words, most documents use fewer than 500 unique words — so at least 49,500 elements are zero.
# Sparse vector — almost all zeros
# Vocabulary size: 10,000 words
# Document uses only 4 unique words
sparse = [0, 0, 0, 3, 0, 0, ..., 1, 0, ..., 2, 0, ..., 5, ...]
# ^ ^ ^ ^
# word at index 3 index 1200 index 7400 index 9998
The dense vs sparse comparison is covered in full detail in the dedicated cluster article. The short version: dense vectors capture semantic meaning, sparse vectors capture lexical precision, and hybrid search combines both.
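The storage win from sparsity is easy to see in a minimal sketch. Keeping only index-value pairs in a plain dict mirrors the idea behind dedicated formats like SciPy's CSR; the helper functions here are made up for the example:

```python
# A sparse vector stored as {index: value} instead of 10,000 floats
vocab_size = 10_000
sparse = {3: 3, 1200: 1, 7400: 2, 9998: 5}

def sparse_dot(a, b):
    # Iterate only over the smaller vector's non-zeros;
    # every other product is zero and can be skipped.
    small, big = (a, b) if len(a) <= len(b) else (b, a)
    return sum(v * big.get(i, 0) for i, v in small.items())

print(len(sparse), "stored values instead of", vocab_size)
print(sparse_dot(sparse, {3: 2, 9998: 1}))  # 3*2 + 5*1 = 11
```

The dot product skips all zero positions, which is why sparse formats pay off both in memory and in compute when vectors are mostly empty.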
How Vectors Flow Through a Neural Network
A neural network is, at its core, a sequence of vector operations. Understanding this removes much of the mystery around how deep learning works.
Input Layer
x = [0.5, 1.2, -0.3, 0.8] ← input vector (4 features)
Hidden Layer 1
z1 = W1 · x + b1 ← matrix multiply + bias add
a1 = ReLU(z1) ← apply activation function
Hidden Layer 2
z2 = W2 · a1 + b2
a2 = ReLU(z2)
Output Layer
z3 = W3 · a2 + b3
output = Softmax(z3) ← probabilities across classes
Every weight matrix W1, W2, W3 is learned during training through backpropagation. The gradients that update those weights are also vectors and matrices. The entire learning process is linear algebra applied repeatedly.
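The pipeline above translates into NumPy almost line for line. The weights below are random placeholders rather than trained values, and the layer sizes (4 → 5 → 5 → 3) are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

x = np.array([0.5, 1.2, -0.3, 0.8])  # input vector (4 features)

# Random placeholder weights and zero biases
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)
W3, b3 = rng.normal(size=(3, 5)), np.zeros(3)

a1 = relu(W1 @ x + b1)        # hidden layer 1
a2 = relu(W2 @ a1 + b2)       # hidden layer 2
output = softmax(W3 @ a2 + b3)  # class probabilities

print(output)        # three non-negative values
print(output.sum())  # sums to 1.0
```

Every step is either a matrix-vector product, a vector addition, or an element-wise function, which is the point: the forward pass is nothing but vector math.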
Ian Goodfellow, Yoshua Bengio, and Aaron Courville's deep learning textbook dedicates its second chapter entirely to the linear algebra foundations — vectors, matrices, and tensor operations — before covering any learning algorithm. That sequencing is deliberate.
Vectors in Specific ML Algorithms
The same vector abstraction shows up across completely different algorithm families.
Linear Regression
The model learns a weight vector w that minimizes the error between w · x and the true label y. The prediction is a dot product between the input vector and the learned weight vector.
Support Vector Machines
SVMs find a hyperplane that maximally separates two classes. That hyperplane is defined by a normal vector. New data points are classified based on which side of the hyperplane they fall on, calculated via a dot product. The name "support vector" refers to the training examples closest to the decision boundary.
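A minimal sketch of that decision rule, with a hand-picked normal vector and offset standing in for what a real SVM would learn from data:

```python
import numpy as np

# Hyperplane defined by normal vector w and offset b (toy values)
w = np.array([1.0, -2.0])
b = 0.5

def classify(x):
    # Sign of w·x + b tells us which side of the hyperplane x is on
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([3.0, 0.0])))  # 1   (w·x + b = 3.5)
print(classify(np.array([0.0, 2.0])))  # -1  (w·x + b = -3.5)
```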
K-Means Clustering
K-Means assigns each data point to the cluster whose centroid is closest. The centroid is the average vector of all points in the cluster. Distance is measured using Euclidean distance between vectors.
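One assignment-and-update step can be sketched in a few lines of NumPy, using toy points and centroids:

```python
import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])  # toy starting centroids

# Assignment step: each point goes to its nearest centroid (Euclidean)
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
assignments = dists.argmin(axis=1)
print(assignments)  # [0 0 1 1]

# Update step: each centroid becomes the mean vector of its points
new_centroids = np.array(
    [points[assignments == k].mean(axis=0) for k in range(2)]
)
print(new_centroids)  # [[1.25 1.5 ] [8.5  8.75]]
```

K-Means simply alternates these two steps until the assignments stop changing; both steps are pure vector arithmetic.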
Word2Vec in NLP
Word2Vec, introduced at Google in 2013, represents each word as a 100 to 300-dimensional dense vector trained so that words appearing in similar contexts are nearby in vector space. The model is a shallow neural network whose learned internal representations are the vectors you keep after training. As the Word2Vec Wikipedia article notes, words which appear in similar contexts are mapped to vectors which are nearby as measured by cosine similarity.
from gensim.models import Word2Vec
sentences = [
["the", "cat", "sat", "on", "the", "mat"],
["the", "dog", "slept", "on", "the", "floor"],
["cats", "and", "dogs", "are", "pets"]
]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
# Each word is now a 100-dimensional vector
cat_vector = model.wv["cat"] # shape: (100,)
dog_vector = model.wv["dog"] # shape: (100,)
similarity = model.wv.similarity("cat", "dog")
print(f"cat-dog similarity: {similarity:.4f}") # high — appear in similar contextsFrom Vectors to Embeddings
The word "vector" and the word "embedding" are often used interchangeably in AI contexts. They are not exactly the same thing, but the distinction is subtle.
A vector is the mathematical object — an ordered list of numbers. An embedding is a specific type of vector produced by a model that has been trained to encode semantic meaning. Every embedding is a vector. Not every vector is an embedding.
OpenAI's text-embedding-3-small model converts a sentence into a 1536-dimensional vector where sentences with similar meaning have similar vectors. That vector is an embedding. The embeddings article covers how models learn to produce them and which model you should use for different tasks.
Why Vectors Are the Foundation of Vector Databases
A vector database stores millions of these high-dimensional vectors and finds the nearest ones to any incoming query vector at millisecond speed.
The entire retrieval process is vector math. The query is embedded into a vector. The database calculates similarity between the query vector and stored vectors using cosine similarity or Euclidean distance. The top-K closest vectors are returned as results.
User Query: "How do I reset my password?"
↓
Embedding model → [0.41, -1.22, 0.03, ..., 0.77] (1536 floats)
↓
Vector database ANN search
↓
Most similar stored vectors
↓
["Reset instructions", "Account recovery", "Two-factor reset"]Without an understanding of what a vector is and how similarity is measured, the rest of the RAG pipeline — indexing, retrieval, reranking — is a black box. With it, you can reason about why retrieval fails and how to fix it.
The latent space article goes deeper on the geometry that makes this work: why certain vectors cluster together, what each dimension represents, and how arithmetic in vector space produces meaningful results.
Vector Norms
One concept that appears often in regularization and similarity computation is the vector norm. The L2 norm, also called the Euclidean norm, is the straight-line length of the vector.
v = np.array([3.0, 4.0])
l2_norm = np.linalg.norm(v)
print(l2_norm) # 5.0 (Pythagorean theorem: sqrt(3² + 4²) = 5)
The L2 norm is the denominator in cosine similarity. Dividing by it normalizes the vector's length to 1, which is why cosine similarity measures direction independently of magnitude. A short document and a long document on the same topic point in the same direction in vector space, so their cosine similarity is high even though their raw word count vectors have very different lengths.
L1 norm (sum of absolute values) and L-infinity norm (maximum absolute value) are used in specific regularization contexts, but L2 is the default for most ML and similarity tasks.
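All three norms are available from np.linalg.norm by passing the order argument:

```python
import numpy as np

v = np.array([3.0, -4.0])

l1 = np.linalg.norm(v, 1)         # 7.0: sum of absolute values
l2 = np.linalg.norm(v)            # 5.0: Euclidean length (the default)
linf = np.linalg.norm(v, np.inf)  # 4.0: largest absolute value
print(l1, l2, linf)
```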
Practical Code: Working With Vectors in NumPy
NumPy is the standard library for vector math in Python. Every major ML framework — PyTorch, TensorFlow, JAX, scikit-learn — is built on top of it or implements the same interface.
import numpy as np
# Define vectors
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
# Basic operations
print(a + b) # [5.0, 7.0, 9.0]
print(a - b) # [-3.0, -3.0, -3.0]
print(a * 3) # [3.0, 6.0, 9.0] — scalar multiplication
# Dot product
print(np.dot(a, b)) # 32.0
# L2 norm
print(np.linalg.norm(a)) # 3.7417
# Cosine similarity
def cosine_sim(v1, v2):
return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(cosine_sim(a, b)) # 0.9746 — very similar direction
# Stack multiple vectors into a matrix
matrix = np.stack([a, b]) # shape: (2, 3)
print(matrix.shape)
The full NumPy documentation on array operations covers reshaping, transposing, and batching vectors into matrices, which is the next step after single-vector operations.
What "High-Dimensional" Actually Means
Most vectors in modern ML are not three-dimensional. They are hundreds or thousands of dimensions long.
A 2-dimensional vector is a point on a flat plane. A 3-dimensional vector is a point in physical space. A 1536-dimensional vector is a point in 1536-dimensional space — something that cannot be visualized, but can be computed with using the same rules.
The key property that transfers to high dimensions: two vectors that are close together in that space represent content that is similar. Two vectors far apart represent content that is unrelated. This holds whether the space is three-dimensional or 3000-dimensional.
The challenge is that our intuitions about space break down in high dimensions. As the number of dimensions increases, the volume of the space grows so fast that most pairs of points become roughly equidistant. This is called the curse of dimensionality, and it is why standard B-tree indexes fail for vector search — a point covered in depth in the why traditional indexes fail article.
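That concentration effect is easy to observe empirically. The sketch below compares the relative spread of pairwise distances between random points in 3 versus 1,000 dimensions; the sample size is chosen arbitrarily to keep the computation small:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(dim, n=100):
    # Relative spread (std / mean) of pairwise distances
    # between n random points in the unit hypercube
    pts = rng.random((n, dim))
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    d = d[np.triu_indices(n, k=1)]  # unique pairs only
    return d.std() / d.mean()

print(f"   3-D: {distance_spread(3):.3f}")
print(f"1000-D: {distance_spread(1000):.3f}")
# In high dimensions the spread collapses: nearly every pair of
# points sits at roughly the same distance from every other.
```

When "nearest" and "farthest" differ by only a few percent, pruning strategies that work for B-trees have nothing to prune on, which is why vector search needs purpose-built ANN indexes.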
Summary
A vector in machine learning is an ordered list of numbers. Each element is one feature or dimension of a data point. A house can be a vector. A word can be a vector. An image can be a vector. A sentence can be a vector.
The operations that matter are addition, dot product, cosine similarity, and norm calculation. All of them are implemented efficiently in NumPy and underlie every layer of every neural network you will ever work with.
From this foundation, the next logical step is understanding how models learn to produce good vectors — ones where closeness in vector space means closeness in meaning. That is the subject of embeddings.
After that, the question becomes how to store and search millions of those vectors fast. That is the subject of vector databases.
Sources and Further Reading
- GeeksforGeeks. Vectors for Machine Learning. geeksforgeeks.org/machine-learning/vectors-for-ml
- Machine Learning Mastery. A Gentle Introduction to Vectors for Machine Learning. machinelearningmastery.com/gentle-introduction-vectors-machine-learning
- H2O.ai. What Is a Vector? h2o.ai/wiki/vector
- Algolia. What Are Vectors and How Do They Apply to Machine Learning? algolia.com/blog/ai/what-are-vectors-and-how-do-they-apply-to-machine-learning
- Wikipedia. Word2Vec. en.wikipedia.org/wiki/Word2vec
- Serokell. Word2Vec: Explanation and Examples. serokell.io/blog/word2vec
- Towards Data Science. Cosine Similarity: How Does It Measure the Similarity. towardsdatascience.com/cosine-similarity
- Pathmind. A Beginner's Guide to Word2Vec and Neural Word Embeddings. wiki.pathmind.com/word2vec
- NumPy. Official Documentation. numpy.org/doc/stable
- Goodfellow, Bengio, Courville. Deep Learning, Chapter 2: Linear Algebra. deeplearningbook.org
- Shelf.io. How Vectors in Machine Learning Supply AI Engines with Data. shelf.io/blog/vectors-in-machine-learning
Krunal Kanojiya
Technical Content Writer and former software developer from India. I write in-depth articles on blockchain, AI/ML, data engineering, web development, and developer careers. Currently at Lucent Innovation, previously at Cromtek Solution and freelance.