Computational Complexity

Gregory M. Kapfhammer

November 3, 2025

Learning objectives

Learning Objectives for Theoretical Machines

  • CS-204-1: Use both intuitive analysis and theoretical proof techniques to correctly distinguish between problems that are tractable, intractable, and uncomputable.
  • CS-204-2: Correctly use one or more variants of the Turing machine (TM) abstraction to both describe and analyze the solution to a computational problem.
  • CS-204-3: Correctly use one or more variants of the finite state machine (FSM) abstraction to describe and analyze the solution to a computational problem.
  • CS-204-4: Use a formal proof technique to correctly classify a problem according to whether or not it is in the P, NP, NP-Hard, and/or NP-Complete complexity class(es).
  • CS-204-5: Apply insights from theoretical proofs concerning the limits of either program feasibility or complexity to the implementation of both correct and efficient real-world Python programs.

Why computational complexity matters

  • Theory aids with learning objectives CS-204-1 and CS-204-4
  • Distinguish tractable from intractable problems
  • Why are some problems efficiently solvable while others are impractical?
  • How do we measure and compare algorithm efficiency?
  • How do we classify problems by their complexity class?

Why computational complexity?

  • Time matters: Some algorithms are fast, others impossibly slow
  • Comparing efficiency: Formal ways to compare algorithms
  • Growth rates: How does running time change with input size?
  • Cryptography: Security depends on computational hardness
  • Question: What problems are efficiently solvable?
  • Complexity distinguishes feasible from infeasible computation
  • Computational complexity reveals theoretical and practical limits
  • Exponential algorithms may work sometimes but are often unusable!

Review: finite automata

  • Finite automata: Restricted Turing machines with limited memory
  • DFA vs NFA: Deterministic versus nondeterministic computation
  • Regular languages: Languages recognized by finite automata
  • Regular expressions: Compact notation for pattern matching
  • Equivalence: DFAs, NFAs, and regular expressions recognize the same class of languages
  • Limitations: Cannot count or maintain unbounded memory
  • Okay, finite automata showed computational restrictions!
  • Now time restrictions: how quickly can problems be solved?

Review: Pattern recognition efficiency

  • Linear time: DFAs process input in \(O(n)\) time
  • Single pass: Read input once, left to right
  • No backtracking: Decisions made immediately
  • Applications: Lexical analysis, pattern matching, protocol verification
  • Connection to complexity: Automata provide efficient solutions for restricted problems. So, how else can restriction prove beneficial?
  • Automata theory connects to and motivates complexity theory
  • Finite automata solve their problems efficiently (i.e., in linear time)
  • Now we ask: which problems need more time? How much more?

Algorithm efficiency challenges

Different Inputs

  • Varying sizes of input data
  • Different types of data
  • Degrees of sortedness
  • Degrees of duplication
  • Different data distributions

Different Computers

  • Diverse hardware setups
  • Varying system performance
  • Different system configurations
  • Different operating systems
  • Variation in system load

Ultimate goal: a rigorous analytical evaluation method that yields actionable insights and supports understanding and prediction

Worst-case time

  • Growth rates matter: How running time changes with input size \(n\)
  • Asymptotic analysis: Focus on behavior for large \(n\), ignore constants
  • Polynomial vs exponential: Fundamental distinction for tractability
  • Big-O notation: Formalize and compare algorithm efficiency

Understanding growth rates is crucial: an algorithm taking \(100n\) steps is fundamentally faster than one taking \(n^2\) steps for large \(n\). Asymptotic analysis lets proofgrammers compare algorithms meaningfully!

Understanding algorithm efficiency

  • Absolute time: Depends on hardware, language, implementation
  • Growth rate: How time changes with input size \(n\)
  • Asymptotic analysis: Focus on behavior for large \(n\)
  • Comparing algorithms: Growth rate matters more than constants
  • Question: How to effectively formalize growth rates?
  • Asymptotic analysis ignores constant factors and focuses on growth
  • Since absolute time varies by hardware and implementation, focus on worst-case time complexity to enable meaningful comparisons across machines and implementations. Okay, let’s dive into growth rates!

Vast gulf between growth rates

Growth rate comparison

  • Insight: Wow, these growth rates differ dramatically!
  • Insight: Polynomial growth is feasible, exponential is not!
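
To make the comparison concrete, this short Python sketch (hypothetical code, not one of the book's programs) tabulates several growth rates for modest input sizes; even at \(n = 40\), the factorial column dwarfs every polynomial column:

```python
import math

# Hypothetical sketch: contrast polynomial and exponential growth rates
# for a few small input sizes to see how quickly they diverge
for n in (10, 20, 30, 40):
    print(f"n={n:>2}  n^2={n**2}  n^3={n**3}  2^n={2**n}  n!={math.factorial(n)}")
```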

Exponential versus polynomial

  • Polynomial time: \(n\), \(n^2\), \(n^3\), \(n^{100}\) (i.e., broadly tractable)
  • Exponential time: \(2^n\), \(3^n\), \(n!\) (i.e., intractable for large \(n\))
  • Key insight: Polynomial is feasible, exponential is not
  • Numerical Example: \(2^{100} \approx 10^{30}\) versus \(100^2 = 10,000\)
  • Cryptography: Security relies on exponential-time attacks
  • Polynomial-time problems are considered “efficiently computable”
  • Exponential-time problems are generally “computationally impractical”
  • The polynomial/exponential boundary demarcates tractability

Grasp Big-O notation

  • Upper bound: Big-O provides a formal upper bound on growth rate
  • Dominant terms: Keep only the fastest growing term, drop constants
  • Standard language: Main notation for discussing algorithm efficiency
  • Common mistakes: Input size versus value, arithmetic operation costs

Big-O notation standardizes efficiency discussions: \(5n^2 + 100n\) becomes \(O(n^2)\) because we focus on dominant terms and ignore constants. This enables meaningful comparisons across system configurations!

Formalizing asymptotic growth

  • Big-O notation: Upper bound on growth rate
  • Dominant terms: Keep fastest growing term
  • Drop constants: \(5n^2\) becomes \(O(n^2)\)
  • Drop lower terms: \(n^2 + 100n\) becomes \(O(n^2)\)
  • Key Question: How to identify dominant terms?
  • Big-O provides a “standard language” for discussing efficiency
  • It enables proofgrammers to communicate about algorithm performance
  • Don’t worry about implementation details or hardware differences since the focus is on growth rates of functions describing algorithms!

Dominant term hierarchy

Common complexity classes

  • Dominant terms define the growth rates, with many easy to compare
  • Form a hierarchy based on “common-sense rules” about growth rates

Simplifying to dominant terms

Dominant term examples

  • For a function \(f(n)\), define the dominant term as \(DT(f)\)
  • The \(DT(f)\) captures the “fastest growing part” of \(f(n)\)
  • Makes growth rates clearer so that algorithm efficiency is easy to see!

Practical definition of Big-O notation

Practical Big-O definition

  • \(f(n) = 3n + 5n \log n\), so \(DT(f) = n \log n\) and \(f(n) \in O(n \log n)\)
  • \(g(n) = 3n \log^2 n\), so \(DT(g) = n \log^2 n\) and \(g(n) \in O(n \log^2 n)\)
  • Also conclude that \(f(n) \in O(g)\) since \(n \log n\) grows more slowly than \(n \log^2 n\), as the numeric check after this list illustrates
  • Example illustrates how big-O captures growth rate comparisons effectively
  • Notice that it captures an “upper bound” on an algorithm’s running time
  • Key task: ignore constants and lower terms to find the dominant term
  • Important question: how do you understand the meaning of \(f \in O(g)\)?
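
A quick numeric check (a hypothetical sketch, assuming base-2 logarithms) of the two dominant terms above shows \(n \log n\) lagging further and further behind \(n \log^2 n\), which is consistent with \(f(n) \in O(g)\):

```python
import math

# Hypothetical sketch: evaluate the dominant terms n*log(n) and n*(log(n))^2
# for growing n; the widening gap is consistent with f(n) being in O(g)
for n in (10**2, 10**4, 10**6):
    n_log_n = n * math.log2(n)
    n_log_sq_n = n * math.log2(n) ** 2
    print(f"n={n:>8}  n log n = {n_log_n:.3e}  n log^2 n = {n_log_sq_n:.3e}")
```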

Ignore constants and lower terms

Practical Big-O examples

  • Determine the growth function for an algorithm as \(f\)
  • Identify the growth function’s dominant term as \(DT(f)\)
  • Express the computational complexity as \(O(DT(f))\)
  • Capture the essential growth behavior of the algorithm
  • Remember that slow growth rates indicate efficient algorithms!

Formal definition of Big-O notation

Formal Big-O definition

  • \(f(n) \in O(g)\) means that positive constants \(c\) and \(N\) exist
  • The main constraint is \(f(n) \leq c \times g(n)\) for all \(n \geq N\)
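
As a worked instance of this definition, reuse the earlier example \(f(n) = 5n^2 + 100n\) with \(g(n) = n^2\): choosing \(c = 6\) and \(N = 100\) gives \(5n^2 + 100n \leq 5n^2 + n \times n = 6n^2\) whenever \(n \geq 100\), so the constraint \(f(n) \leq c \times g(n)\) holds for all \(n \geq N\) and therefore \(f(n) \in O(n^2)\).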

Some common Big-O mistakes

  • Mistake 1: Confusing input size with input value
    • Input size \(n\) is “length in symbols”
    • Input value \(M\) is the number represented
    • Example: Number 1000 has size \(n=4\) but value \(M=1000\)
  • Mistake 2: Assuming all operations are constant time
    • Arithmetic on large numbers is not \(O(1)\)
    • Multiplication is \(O(n^2)\), division is \(O(n^2)\)
    • Addition and subtraction are \(O(n)\)
  • These mistakes may lead to incorrect complexity analysis
  • Remember that arithmetic complexity depends on “number size”
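
A minimal timing sketch (hypothetical code, not from the book) illustrates Mistake 2: multiplying Python integers with more and more digits takes longer and longer, so the operation is clearly not \(O(1)\).

```python
import time

# Hypothetical sketch: time big-integer multiplication as the digit count grows;
# the measured cost increases with the number of digits, so it is not O(1)
for digits in (100_000, 200_000, 400_000):
    x = 10 ** digits - 1                  # an integer consisting of `digits` nines
    start = time.perf_counter()
    product = x * x
    elapsed = time.perf_counter() - start
    print(f"{digits} digits: {elapsed:.4f} seconds")
```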

Running time analysis

  • Practical examples: Analyze Python programs for complexity
  • Loop analysis: Nested loops often show quadratic or higher complexity
  • String operations: Methods like substring search are linear time
  • Turing machines: Formal running time analysis extends to prior models

Running time analysis reveals algorithm efficiency: count how many times the innermost operations execute as a function of input size \(n\). This skill helps proofgrammers write faster, more scalable Python programs. Or, for a Turing machine, you can count how many transitions you follow! In both cases, make sure to “count” from the perspective of the worst-case input!

Analyzing a simple nested loop
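
As a hedged example of the kind of nested-loop program to analyze (hypothetical code, not the book's listing), counting how often the innermost comparison runs gives \(n(n-1)/2\) executions, which is \(O(n^2)\):

```python
def count_equal_pairs(items):
    """Count index pairs (i, j) with i < j whose elements are equal."""
    count = 0
    n = len(items)
    for i in range(n):                     # outer loop: n iterations
        for j in range(i + 1, n):          # inner loop: up to n - 1 iterations
            if items[i] == items[j]:       # innermost comparison: n*(n-1)/2 executions
                count += 1
    return count

print(count_equal_pairs("abacada"))        # the four 'a' characters yield 6 equal pairs
```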

Turing machine running time analysis

containsGAGA Turing machine
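
For reference, here is a Python sketch in the spirit of the book's containsGAGA program (the exact listing may differ); for the Turing machine version, the analogous analysis counts transitions, and either way the running time is linear in the input length.

```python
def containsGAGA(inString):
    """Return 'yes' if the input contains the substring 'GAGA' and 'no' otherwise."""
    # a single left-to-right scan of the input suffices, so the running time is O(n)
    if "GAGA" in inString:
        return "yes"
    else:
        return "no"

print(containsGAGA("CTGAGAC"))   # yes
print(containsGAGA("CCCCCCCC"))  # no
```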

Formal definition of running time

Running time definition

  • Key points: Maximum over all inputs of a given length, i.e., the “worst case”
  • A “computational step” means a single CPU instruction for a Python program
  • Or, it could mean taking a single transition for Turing machines
  • Look at running time and then consider the asymptotic costs!

Python operation costs

Exercise: Analyze this Python program

Python analysis exercise

Run matchingCharIndices

  • Run this program with different inputs and understand how it works!
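
The exercise refers to the book's matchingCharIndices program; the sketch below follows the same idea (the exact code and line numbering referenced later may differ slightly from this sketch):

```python
def matchingCharIndices(inString):
    """Return all pairs 'i,j' where word1[i] equals word2[j] for two space-separated words."""
    # split the single input string into the two words being compared
    (word1, word2) = inString.split(" ")
    # accumulate each matching pair of indices as the string 'i,j'
    pairs = []
    for i in range(len(word1)):        # outer loop over word1
        for j in range(len(word2)):    # inner loop over word2
            if word1[i] == word2[j]:
                thisPair = str(i) + "," + str(j)
                pairs.append(thisPair)
    return " ".join(pairs)

print(matchingCharIndices("hello world"))  # 2,3 3,3 4,1
```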

Initial complexity analysis of matchingCharIndices

  • Function purpose: Find matching character indices between two words
  • Nested loops: Outer loop iterates len(word1) times, inner len(word2) times
  • Total iterations: \(n_1 \times n_2\) where \(n_1 =\) len(word1), \(n_2 =\) len(word2)
  • Worst-case analysis: If both words have roughly equal length \(\approx n/2\), the loops perform about \(n^2/4\) iterations, which is \(O(n^2)\)
  • But wait: What about the work done inside each iteration?

The nested loops suggest \(O(n^2)\) complexity: with input 'aaa aaaaa', the function performs \(3 \times 5 = 15\) iterations. While this analysis is not “wrong” per se, it does not give the complete picture! The full analysis requires examining what happens inside each iteration, particularly the string construction operations! Let’s explore!

Wait, why is it \(O(n^2 \log n)\)?

  • Careful examination: The nested loops are only part of the story!
  • String construction: Line 10 creates thisPair = str(i) + ',' + str(j).
  • Integer-to-string: Converting integer i or j to a string requires \(O(\log n)\) time
  • Why logarithmic?: The number of digits in integer \(k\) is \(O(\log_{10} k)\), so str(k) takes \(O(\log k)\) time. And, importantly, the cost of append is \(O(1)\)!
  • Combined: \(O(n^2)\) iterations times \(O(\log n)\) per iteration = \(O(n^2 \log n)\)
  • Here is the full breakdown for the line ranges in the book’s code:
    • Lines 3-5: \(O(n)\)
    • Lines 7-11: \(O(n^2 \log n)\)
    • Line 13: \(O(n^2 \log n)\)
    • Overall: \(O(n + n^2 \log n + n^2 \log n)\)
    • Simplified: \(O(n^2 \log n)\)

String operations are not free!

  • For a Python list called L, the size is simply len(L)
  • It is more difficult to pin down the “size” of an int!
  • The integer 999 has 3 digits, so converting and/or processing it as a string takes roughly \(\log_{10} 1000 = 3\) steps
  • For input size \(n\), indices can reach \(n\), requiring \(O(\log n)\) time per conversion and/or processing of the string
  • Accordingly, this logarithmic factor multiplies the quadratic loop structure, yielding \(O(n^2 \log n)\) total complexity!
  • Key insight: It is not sufficient to only look at the loop structure!
  • What is the distinction between the “size” and “value” of an input?

Input size versus value

  • Critical distinction: Input size \(n\) is length in symbols, not number’s value
  • Arithmetic costs: Operations on large numbers are not constant time
  • Exponential trap: Loop running \(M\) times is exponential in \(n = \log M\)
  • Cryptography: Factoring large numbers takes exponential time

Don’t confuse size with value: the number 1000 has size \(n=4\) digits but value \(M=1000\). A loop running \(M\) times is exponential in \(n\), not linear. This mistake leads to incorrect complexity analysis! See the book’s discussion of the MCopiesOfC.py program for more details!

Deceptive program analysis

Trick question program called MCopiesOfC.py

  • Start with an empty list and append character ‘C’ \(M\) times
  • This program as a whole requires \(O(M)\) time, … or not?
  • Actually, since the value \(M\) can be as large as \(10^n\), the running time is \(O(10^n)\) in the input size \(n\)!
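
A sketch in the spirit of the book's MCopiesOfC.py (the actual listing may differ): the loop is governed by the value \(M\), which can be as large as \(10^n\) for an \(n\)-digit decimal input.

```python
def MCopiesOfC(inString):
    """Return a string of M copies of 'C', where M is the number the input represents."""
    M = int(inString)      # the value M can be as large as 10^n for an n-digit input
    cList = []
    for i in range(M):     # the loop runs M times, i.e., up to 10^n times
        cList.append("C")
    return "".join(cList)

print(MCopiesOfC("5"))     # CCCCC
```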

Size versus value distinction

Input size versus value for numbers in decimal or binary notation

  • Critical insight: Input size \(n\) is length of input in symbols
  • Input value \(M\) is the number represented in a notation
  • To be clear, the loop runs \(M\) times, not \(n\) times
  • However, this is exponential in \(n\), whether the input is written in decimal or binary notation!
  • Subtle, but important distinction for complexity analysis!

Size versus value examples

Size versus value examples for integers
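
As a worked example consistent with the definitions above (the specific number is chosen only for illustration):

  • The integer \(M = 1{,}000{,}000\) has value one million, but its size is \(n = 7\) symbols in decimal notation and \(n = 20\) symbols in binary notation
  • A loop that runs \(M\) times therefore performs close to \(10^n\) iterations for an \(n\)-digit decimal input, or close to \(2^n\) iterations for an \(n\)-bit binary input, which is exponential in the input size \(n\)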

Factoring is exponential-time

Factoring algorithm

  • Factoring complexity: Loop runs up to \(M\) iterations
  • Input size is actually \(n \approx \log M\) symbols (digits or bits)
  • Time complexity: \(O(M) = O(2^n)\) for an \(n\)-bit input, which is exponential in nature
  • This exponential difficulty underlies RSA cryptography security

Factoring example with Python

  • Small inputs run quickly, but doubling the number of digits roughly squares the worst-case running time
  • This exponential growth makes factoring large numbers infeasible
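
A sketch in the spirit of the book's factoring program (the actual listing may differ): trial division searches for the smallest nontrivial factor and, in the worst case, the loop runs close to \(M\) times, which is exponential in the number of digits \(n\).

```python
def factor(inString):
    """Return the smallest nontrivial factor of M, or 'no' if none exists."""
    M = int(inString)
    for i in range(2, M):  # in the worst case the loop runs nearly M times: exponential in n
        if M % i == 0:
            return str(i)
    return "no"

print(factor("15"))        # 3
print(factor("1022117"))   # 1009 (since 1022117 = 1009 * 1013), found after ~1000 iterations
```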

Aha! Time complexity is measured as a function of the length of the input, not the numerical value(s) of the input!

Arithmetic operation complexity

Arithmetic complexity

Complexity classes

  • Model distinctions: Through the lens of complexity, Turing machines are “less efficient” than a typical computer
  • Model equivalence: Turing machines, Python, and real computers are “polynomially equivalent”
  • Organizing problems: After picking a model, complexity classes group problems by resource requirements
  • Poly versus Expo: Tractable versus intractable problems

Complexity classes organize computational difficulty: Poly contains the “tractable” problems, while the “intractable” ones lie in Expo but outside Poly. This means that the polynomial/exponential boundary determines practical solvability!

Model equivalence

Computational model equivalence

Classical models simulate each other with “only” polynomial overhead

  • Turing machines, Python, and real computers are “equivalent”
  • However, Turing machines tend to be harder to “program”
  • From now on, we use Python programs as our standard model
  • Python programs are a good “abstraction” for Turing machines
  • Python programs are equivalent to random-access Turing machines and real computers up to polynomial factors
  • With that in mind, what are the other complexity classes?
  • Let’s briefly explore these classes in greater detail!

Defining complexity classes

  • Complexity class: Collection of problems sharing complexity properties
  • Const: Problems solvable in constant time \(O(1)\)
  • Lin: Problems solvable in linear time \(O(n)\)
  • Quad: Problems solvable in quadratic time \(O(n^2)\)
  • Poly: Problems solvable in polynomial time \(O(n^k)\)
  • Expo: Problems solvable in exponential time \(O(2^n)\)
  • PolyCheck: Problems whose solutions are verifiable in polynomial time
  • Focus on Poly, Expo, and PolyCheck: These classes are “model independent”
  • They capture fundamental distinctions and generalize via the exponent \(k\)
  • Later chapters explore P, NP, and NP-completeness in depth!

Tractable versus intractable

  • Tractable: Problems in Poly (polynomial time)
    • Examples: Sorting, searching, graph connectivity
    • Feasible for large inputs
  • Intractable: Problems requiring exponential time
    • Examples: Factoring (no known polynomial algorithm)
    • Infeasible for large inputs
  • Open questions: Many problems have unknown complexity
    • Is factoring in Poly?
    • Is every problem in PolyCheck also in Poly? (P versus NP!)

Key insights for proofgrammers

  • Big-O notation: Standard language for algorithm efficiency
  • Asymptotic analysis: Focus on growth rates, not constants
  • Input size versus value: Critical distinction for correct analysis
  • Arithmetic costs: Not constant time for large numbers
  • Complexity classes: Organize problems by resource requirements
  • Practical impact: Exponential algorithms are infeasible
  • Understanding these concepts helps proofgrammers write efficient programs
  • Use these methods to recognize when problems are inherently difficult
  • Remember, complexity theory provides essential tools for algorithm design!