Computational Complexity

Gregory M. Kapfhammer

November 3, 2025

Learning objectives

Learning Objectives for Theoretical Machines

  • CS-204-1: Use both intuitive analysis and theoretical proof techniques to correctly distinguish between problems that are tractable, intractable, and uncomputable.
  • CS-204-2: Correctly use one or more variants of the Turing machine (TM) abstraction to both describe and analyze the solution to a computational problem.
  • CS-204-3: Correctly use one or more variants of the finite state machine (FSM) abstraction to describe and analyze the solution to a computational problem.
  • CS-204-4: Use a formal proof technique to correctly classify a problem according to whether or not it is in the P, NP, NP-Hard, and/or NP-Complete complexity class(es).
  • CS-204-5: Apply insights from theoretical proofs concerning the limits of either program feasibility or complexity to the implementation of both correct and efficient real-world Python programs.

Why computational complexity matters

  • Theory aids with learning objectives CS-204-1 and CS-204-4
  • Distinguish tractable from intractable problems
  • Why are some problems efficiently solvable while others are impractical?
  • How do we measure and compare algorithm efficiency?
  • How do we classify problems by their complexity class?

Why computational complexity?

  • Time matters: Some algorithms are fast, others impossibly slow
  • Comparing efficiency: Formal ways to compare algorithms
  • Growth rates: How does running time change with input size?
  • Cryptography: Security depends on computational hardness
  • Question: What problems are efficiently solvable?
  • Complexity distinguishes feasible from infeasible computation
  • Computational complexity reveals theoretical and practical limits
  • Exponential algorithms may work sometimes but are often unusable!

Review: finite automata

  • Finite automata: Restricted Turing machines with limited memory
  • DFA vs NFA: Deterministic versus nondeterministic computation
  • Regular languages: Languages recognized by finite automata
  • Regular expressions: Compact notation for pattern matching
  • Equivalence: DFAs, NFAs, and regular expressions recognize the same class of languages
  • Limitations: Cannot count or maintain unbounded memory
  • Okay, finite automata showed computational restrictions!
  • Now time restrictions: how quickly can problems be solved?

Review: Pattern recognition efficiency

  • Linear time: DFAs process input in \(O(n)\) time
  • Single pass: Read input once, left to right
  • No backtracking: Decisions made immediately
  • Applications: Lexical analysis, pattern matching, protocol verification
  • Connection to complexity: Automata provide efficient solutions for restricted problems. So, how else can restriction prove beneficial?
  • Automata theory connects to and motivates complexity theory
  • Finite automata solve their problems efficiently (i.e., in linear time)
  • Now we ask: which problems need more time? How much more?

Algorithm efficiency challenges

Different Inputs

  • Varying sizes of input data
  • Different types of data
  • Degrees of sortedness
  • Degrees of duplication
  • Different data distributions

Different Computers

  • Diverse hardware setups
  • Varying system performance
  • Different system configurations
  • Different operating systems
  • Variation in system load

Ultimate goal: a rigorous analytical evaluation method that yields actionable insights and supports understanding and prediction

Worst-case time

  • Growth rates matter: How running time changes with input size \(n\)
  • Asymptotic analysis: Focus on behavior for large \(n\), ignore constants
  • Polynomial vs exponential: Fundamental distinction for tractability
  • Big-O notation: Formalize and compare algorithm efficiency

Understanding growth rates is crucial: an algorithm taking \(100n\) steps is fundamentally faster than one taking \(n^2\) steps for large \(n\). Asymptotic analysis lets proofgrammers compare algorithms meaningfully!

Understanding algorithm efficiency

  • Absolute time: Depends on hardware, language, implementation
  • Growth rate: How time changes with input size \(n\)
  • Asymptotic analysis: Focus on behavior for large \(n\)
  • Comparing algorithms: Growth rate matters more than constants
  • Question: How to effectively formalize growth rates?
  • Asymptotic analysis ignores constant factors and focuses on growth
  • Since absolute time varies by hardware and implementation, focus on worst-case time complexity to enable meaningful comparisons across machines and implementations. Okay, let’s dive into growth rates!

Vast gulf between growth rates

Growth rate comparison

  • Insight: Wow, these growth rates differ dramatically!
  • Insight: Polynomial growth is feasible, exponential is not!
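
To make the comparison concrete, this short Python sketch (hypothetical code, not one of the book's programs) tabulates several growth rates for modest input sizes; even at \(n = 40\), the factorial column dwarfs every polynomial column:

```python
import math

# Hypothetical sketch: contrast polynomial and exponential growth rates
# for a few small input sizes to see how quickly they diverge
for n in (10, 20, 30, 40):
    print(f"n={n:>2}  n^2={n**2}  n^3={n**3}  2^n={2**n}  n!={math.factorial(n)}")
```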

Exponential versus polynomial

  • Polynomial time: \(n\), \(n^2\), \(n^3\), \(n^{100}\) (i.e., broadly tractable)
  • Exponential time: \(2^n\), \(3^n\), \(n!\) (i.e., intractable for large \(n\))
  • Key insight: Polynomial is feasible, exponential is not
  • Numerical Example: \(2^{100} \approx 10^{30}\) versus \(100^2 = 10,000\)
  • Cryptography: Security relies on exponential-time attacks
  • Polynomial-time problems are considered “efficiently computable”
  • Exponential-time problems are generally “computationally impractical”
  • The polynomial/exponential boundary demarcates tractability

Grasp Big-O notation

  • Upper bound: Big-O provides a formal upper bound on growth rate
  • Dominant terms: Keep only the fastest growing term, drop constants
  • Standard language: Main notation for discussing algorithm efficiency
  • Common mistakes: Input size versus value, arithmetic operation costs

Big-O notation standardizes efficiency discussions: \(5n^2 + 100n\) becomes \(O(n^2)\) because we focus on dominant terms and ignore constants. This enables meaningful comparisons across system configurations!

Formalizing asymptotic growth

  • Big-O notation: Upper bound on growth rate
  • Dominant terms: Keep fastest growing term
  • Drop constants: \(5n^2\) becomes \(O(n^2)\)
  • Drop lower terms: \(n^2 + 100n\) becomes \(O(n^2)\)
  • Key Question: How to identify dominant terms?
  • Big-O provides a “standard language” for discussing efficiency
  • It enables proofgrammers to communicate about algorithm performance
  • Don’t worry about implementation details or hardware differences since the focus is on growth rates of functions describing algorithms!

Dominant term hierarchy

Common complexity classes

  • Dominant terms define the growth rates, with many easy to compare
  • Form a hierarchy based on “common-sense rules” about growth rates

Simplifying to dominant terms

Dominant term examples

  • For a function \(f(n)\), define the dominant term as \(DT(f)\)
  • The \(DT(f)\) captures the “fastest growing part” of \(f(n)\)
  • Makes growth rates clearer so that algorithm efficiency is easy to see!

Practical definition of Big-O notation

Practical Big-O definition

  • \(f(n) = 3n + 5n \log n\), so \(DT(f) = n \log n\) and \(f(n) \in O(n \log n)\)
  • \(g(n) = 3n \log^2 n\), so \(DT(g) = n \log^2 n\) and \(g(n) \in O(n \log^2 n)\)
  • Also conclude that \(f(n) \in O(g)\) since \(n \log n\) grows more slowly than \(n \log^2 n\), as the numeric check after this list illustrates
  • Example illustrates how big-O captures growth rate comparisons effectively
  • Notice that it captures an “upper bound” on an algorithm’s running time
  • Key task: ignore constants and lower terms to find the dominant term
  • Important question: how do you understand the meaning of \(f \in O(g)\)?
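
A quick numeric check (a hypothetical sketch, assuming base-2 logarithms) of the two dominant terms above shows \(n \log n\) lagging further and further behind \(n \log^2 n\), which is consistent with \(f(n) \in O(g)\):

```python
import math

# Hypothetical sketch: evaluate the dominant terms n*log(n) and n*(log(n))^2
# for growing n; the widening gap is consistent with f(n) being in O(g)
for n in (10**2, 10**4, 10**6):
    n_log_n = n * math.log2(n)
    n_log_sq_n = n * math.log2(n) ** 2
    print(f"n={n:>8}  n log n = {n_log_n:.3e}  n log^2 n = {n_log_sq_n:.3e}")
```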

Ignore constants and lower terms

Practical Big-O examples

  • Determine the growth function for an algorithm as \(f\)
  • Identify the growth function’s dominant term as \(DT(f)\)
  • Express the computational complexity as \(O(DT(f))\)
  • Capture the essential growth behavior of the algorithm
  • Remember that slow growth rates indicate efficient algorithms!

Formal definition of Big-O notation

Formal Big-O definition

  • \(f(n) \in O(g)\) means that positive constants \(c\) and \(N\) exist
  • The main constraint is \(f(n) \leq c \times g(n)\) for all \(n \geq N\)
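
As a worked instance of this definition, reuse the earlier example \(f(n) = 5n^2 + 100n\) with \(g(n) = n^2\): choosing \(c = 6\) and \(N = 100\) gives \(5n^2 + 100n \leq 5n^2 + n \times n = 6n^2\) whenever \(n \geq 100\), so the constraint \(f(n) \leq c \times g(n)\) holds for all \(n \geq N\) and therefore \(f(n) \in O(n^2)\).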

Some common Big-O mistakes

  • Mistake 1: Confusing input size with input value
    • Input size \(n\) is “length in symbols”
    • Input value \(M\) is the number represented
    • Example: Number 1000 has size \(n=4\) but value \(M=1000\)
  • Mistake 2: Assuming all operations are constant time
    • Arithmetic on large numbers is not \(O(1)\)
    • Multiplication is \(O(n^2)\), division is \(O(n^2)\)
    • Addition and subtraction are \(O(n)\)
  • These mistakes may lead to incorrect complexity analysis
  • Remember that arithmetic complexity depends on “number size”
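
A minimal timing sketch (hypothetical code, not from the book) illustrates Mistake 2: multiplying Python integers with more and more digits takes longer and longer, so the operation is clearly not \(O(1)\).

```python
import time

# Hypothetical sketch: time big-integer multiplication as the digit count grows;
# the measured cost increases with the number of digits, so it is not O(1)
for digits in (100_000, 200_000, 400_000):
    x = 10 ** digits - 1                  # an integer consisting of `digits` nines
    start = time.perf_counter()
    product = x * x
    elapsed = time.perf_counter() - start
    print(f"{digits} digits: {elapsed:.4f} seconds")
```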

Running time analysis

  • Practical examples: Analyze Python programs for complexity
  • Loop analysis: Nested loops often show quadratic or higher complexity
  • String operations: Methods like substring search are linear time
  • Turing machines: Formal running time analysis extends to prior models

Running time analysis reveals algorithm efficiency: count how many times the innermost operations execute as a function of input size \(n\). This skill helps proofgrammers write faster, more scalable Python programs. Or, for a Turing machine, you can count how many transitions you follow! In both cases, make sure to “count” from the perspective of the worst-case input!

Analyzing a simple nested loop
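
As a hedged example of the kind of nested-loop program to analyze (hypothetical code, not the book's listing), counting how often the innermost comparison runs gives \(n(n-1)/2\) executions, which is \(O(n^2)\):

```python
def count_equal_pairs(items):
    """Count index pairs (i, j) with i < j whose elements are equal."""
    count = 0
    n = len(items)
    for i in range(n):                     # outer loop: n iterations
        for j in range(i + 1, n):          # inner loop: up to n - 1 iterations
            if items[i] == items[j]:       # innermost comparison: n*(n-1)/2 executions
                count += 1
    return count

print(count_equal_pairs("abacada"))        # the four 'a' characters yield 6 equal pairs
```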

Turing machine running time analysis

containsGAGA Turing machine
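
For reference, here is a Python sketch in the spirit of the book's containsGAGA program (the exact listing may differ); for the Turing machine version, the analogous analysis counts transitions, and either way the running time is linear in the input length.

```python
def containsGAGA(inString):
    """Return 'yes' if the input contains the substring 'GAGA' and 'no' otherwise."""
    # a single left-to-right scan of the input suffices, so the running time is O(n)
    if "GAGA" in inString:
        return "yes"
    else:
        return "no"

print(containsGAGA("CTGAGAC"))   # yes
print(containsGAGA("CCCCCCCC"))  # no
```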

Formal definition of running time

Running time definition

  • Key points: Maximum over all inputs of a given length, i.e., the “worst case”
  • A “computational step” means a single CPU instruction for a Python program
  • Or, it could mean taking a single transition for Turing machines
  • Look at running time and then consider the asymptotic costs!

Python operation costs

Exercise: Analyze this Python program

Python analysis exercise

Run matchingCharIndices

  • Run this program with different inputs and understand how it works!
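
The exercise refers to the book's matchingCharIndices program; the sketch below follows the same idea (the exact code and line numbering referenced later may differ slightly from this sketch):

```python
def matchingCharIndices(inString):
    """Return all pairs 'i,j' where word1[i] equals word2[j] for two space-separated words."""
    # split the single input string into the two words being compared
    (word1, word2) = inString.split(" ")
    # accumulate each matching pair of indices as the string 'i,j'
    pairs = []
    for i in range(len(word1)):        # outer loop over word1
        for j in range(len(word2)):    # inner loop over word2
            if word1[i] == word2[j]:
                thisPair = str(i) + "," + str(j)
                pairs.append(thisPair)
    return " ".join(pairs)

print(matchingCharIndices("hello world"))  # 2,3 3,3 4,1
```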

Initial complexity analysis of matchingCharIndices

  • Function purpose: Find matching character indices between two words
  • Nested loops: Outer loop iterates len(word1) times, inner len(word2) times
  • Total iterations: \(n_1 \times n_2\) where \(n_1 =\) len(word1), \(n_2 =\) len(word2)
  • Worst-case analysis: If both words have roughly equal length \(\approx n/2\), the loops perform about \(n^2/4\) iterations, which is \(O(n^2)\)
  • But wait: What about the work done inside each iteration?

The nested loops suggest \(O(n^2)\) complexity: with input 'aaa aaaaa', the function performs \(3 \times 5 = 15\) iterations. While this analysis is not “wrong” per se, it does not give the complete picture! The full analysis requires examining what happens inside each iteration, particularly the string construction operations! Let’s explore!

Wait, why is it \(O(n^2 \log n)\)?

  • Careful examination: The nested loops are only part of the story!
  • String construction: Line 10 creates thisPair = str(i) + ',' + str(j).
  • Integer-to-string: Converting integer i or j to a string requires \(O(\log n)\) time
  • Why logarithmic?: The number of digits in integer \(k\) is \(O(\log_{10} k)\), so str(k) takes \(O(\log k)\) time. And, importantly, the cost of append is \(O(1)\)!
  • Combined: \(O(n^2)\) iterations times \(O(\log n)\) per iteration = \(O(n^2 \log n)\)
  • Here is the full breakdown for the line ranges in the book’s code:
    • Lines 3-5: \(O(n)\)
    • Lines 7-11: \(O(n^2 \log n)\)
    • Line 13: \(O(n^2 \log n)\)
    • Overall: \(O(n + n^2 \log n + n^2 \log n)\)
    • Simplified: \(O(n^2 \log n)\)

String operations are not free!

  • For a Python list called L, the size is simply len(L)
  • It is more difficult to pin down the “size” of an int!
  • The integer 999 has 3 digits, so converting and/or processing it as a string takes roughly \(\log_{10} 1000 = 3\) steps
  • For input size \(n\), indices can reach \(n\), requiring \(O(\log n)\) time per conversion and/or processing of the string
  • Accordingly, this logarithmic factor multiplies the quadratic loop structure, yielding \(O(n^2 \log n)\) total complexity!
  • Key insight: It is not sufficient to only look at the loop structure!
  • What is the distinction between the “size” and “value” of an input?

Input size versus value

  • Critical distinction: Input size \(n\) is length in symbols, not number’s value
  • Arithmetic costs: Operations on large numbers are not constant time
  • Exponential trap: Loop running \(M\) times is exponential in \(n = \log M\)
  • Cryptography: Factoring large numbers takes exponential time

Don’t confuse size with value: the number 1000 has size \(n=4\) digits but value \(M=1000\). A loop running \(M\) times is exponential in \(n\), not linear. This mistake leads to incorrect complexity analysis! See the book’s discussion of the MCopiesOfC.py program for more details!

Deceptive program analysis

Trick question program called MCopiesOfC.py

  • Start with an empty list and append character ‘C’ \(M\) times
  • This program as a whole requires \(O(M)\) time, … or not?
  • Actually, since the value \(M\) can be as large as \(10^n\), the running time is \(O(10^n)\) in the input size \(n\)!
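
A sketch in the spirit of the book's MCopiesOfC.py (the actual listing may differ): the loop is governed by the value \(M\), which can be as large as \(10^n\) for an \(n\)-digit decimal input.

```python
def MCopiesOfC(inString):
    """Return a string of M copies of 'C', where M is the number the input represents."""
    M = int(inString)      # the value M can be as large as 10^n for an n-digit input
    cList = []
    for i in range(M):     # the loop runs M times, i.e., up to 10^n times
        cList.append("C")
    return "".join(cList)

print(MCopiesOfC("5"))     # CCCCC
```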

Size versus value distinction

Input size versus value for numbers in decimal or binary notation

  • Critical insight: Input size \(n\) is length of input in symbols
  • Input value \(M\) is the number represented in a notation
  • To be clear, the loop runs \(M\) times, not \(n\) times
  • However, this is exponential in \(n\), whether the input is written in decimal or binary notation!
  • Subtle, but important distinction for complexity analysis!

Size versus value examples

Size versus value examples for integers
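
As a worked example consistent with the definitions above (the specific number is chosen only for illustration):

  • The integer \(M = 1{,}000{,}000\) has value one million, but its size is \(n = 7\) symbols in decimal notation and \(n = 20\) symbols in binary notation
  • A loop that runs \(M\) times therefore performs close to \(10^n\) iterations for an \(n\)-digit decimal input, or close to \(2^n\) iterations for an \(n\)-bit binary input, which is exponential in the input size \(n\)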

Factoring is exponential-time

Factoring algorithm

  • Factoring complexity: Loop runs up to \(M\) iterations
  • Input size is actually \(n \approx \log M\) symbols (digits or bits)
  • Time complexity: \(O(M) = O(2^n)\) for an \(n\)-bit input, which is exponential in nature
  • This exponential difficulty underlies RSA cryptography security

Factoring example with Python

  • Small inputs run quickly, but doubling the number of digits roughly squares the worst-case running time
  • This exponential growth makes factoring large numbers infeasible
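
A sketch in the spirit of the book's factoring program (the actual listing may differ): trial division searches for the smallest nontrivial factor and, in the worst case, the loop runs close to \(M\) times, which is exponential in the number of digits \(n\).

```python
def factor(inString):
    """Return the smallest nontrivial factor of M, or 'no' if none exists."""
    M = int(inString)
    for i in range(2, M):  # in the worst case the loop runs nearly M times: exponential in n
        if M % i == 0:
            return str(i)
    return "no"

print(factor("15"))        # 3
print(factor("1022117"))   # 1009 (since 1022117 = 1009 * 1013), found after ~1000 iterations
```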

Aha! Time complexity is measured as a function of the length of the input, not the numerical value(s) of the input!

Arithmetic operation complexity

Arithmetic complexity

Complexity classes

  • Model distinctions: Through the lens of complexity, Turing machines are “less efficient” than a typical computer
  • Model equivalence: Turing machines, Python, and real computers are “polynomially equivalent”
  • Organizing problems: After picking a model, complexity classes group problems by resource requirements
  • Poly versus Expo: Tractable versus intractable problems

Complexity classes organize computational difficulty: Poly contains the “tractable” problems, while the “intractable” ones lie in Expo but outside Poly. This means that the polynomial/exponential boundary determines practical solvability!

Model equivalence

Computational model equivalence

Classical models simulate each other with “only” polynomial overhead

  • Turing machines, Python, and real computers are “equivalent”
  • However, Turing machines tend to be harder to “program”
  • From now on, we use Python programs as our standard model
  • Python programs are a good “abstraction” for Turing machines
  • Python programs are equivalent to random-access Turing machines and real computers up to polynomial factors
  • With that in mind, what are the other complexity classes?
  • Let’s briefly explore these classes in greater detail!

Defining complexity classes

  • Complexity class: Collection of problems sharing complexity properties
  • Const: Problems solvable in constant time \(O(1)\)
  • Lin: Problems solvable in linear time \(O(n)\)
  • Quad: Problems solvable in quadratic time \(O(n^2)\)
  • Poly: Problems solvable in polynomial time \(O(n^k)\)
  • Expo: Problems solvable in exponential time \(O(2^n)\)
  • PolyCheck: Problems whose solutions are verifiable in polynomial time
  • Focus on Poly, Expo, and PolyCheck: These classes are “model independent”
  • They capture fundamental distinctions and generalize via the exponent \(k\)
  • Later chapters explore P, NP, and NP-completeness in depth!

Tractable versus intractable

  • Tractable: Problems in Poly (polynomial time)
    • Examples: Sorting, searching, graph connectivity
    • Feasible for large inputs
  • Intractable: Problems requiring exponential time
    • Examples: Factoring (no known polynomial algorithm)
    • Infeasible for large inputs
  • Open questions: Many problems have unknown complexity
    • Is factoring in Poly?
    • Is every problem in PolyCheck also in Poly? (P versus NP!)

Key insights for proofgrammers

  • Big-O notation: Standard language for algorithm efficiency
  • Asymptotic analysis: Focus on growth rates, not constants
  • Input size versus value: Critical distinction for correct analysis
  • Arithmetic costs: Not constant time for large numbers
  • Complexity classes: Organize problems by resource requirements
  • Practical impact: Exponential algorithms are infeasible
  • Understanding these concepts helps proofgrammers write efficient programs
  • Use these methods to recognize when problems are inherently difficult
  • Remember, complexity theory provides essential tools for algorithm design!