Generators and Generator Expressions
Generators in Python are a powerful way to create memory-efficient iterators that generate values on-the-fly rather than storing them all in memory. They use lazy evaluation, meaning values are computed only when needed.
Why Use Generators?
Without generators (memory-intensive):

```python
def create_large_list(n: int) -> list:
    """Create a list of squares - stores everything in memory."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


# Problem: creates the entire list in memory
squares = create_large_list(1_000_000)  # Uses ~40MB of memory!
for square in squares:
    print(square)
    break  # Only needed the first value, but the entire list was created!
```

With generators (memory-efficient):

```python
def generate_squares(n: int):
    """Generator that yields squares - memory efficient."""
    for i in range(n):
        yield i * i  # Yields one value at a time


# Values are only computed as needed
squares = generate_squares(1_000_000)  # Uses almost no memory!
for square in squares:
    print(square)
    break  # Only the first value was computed!
```

Understanding Generators
Basic Generator Function
A generator function uses `yield` to produce values:

```python
def countdown(n: int):
    """Simple generator that counts down."""
    print("Starting countdown...")
    while n > 0:
        yield n  # Pauses here and returns n
        n -= 1
    print("Countdown complete!")


# Create a generator object
gen = countdown(5)
print(type(gen))  # <class 'generator'>

# Iterate over the generator
for num in gen:
    print(num)

# Output:
# Starting countdown...
# 5
# 4
# 3
# 2
# 1
# Countdown complete!
```

Generator vs Regular Function
```python
def regular_function(n: int) -> list:
    """Regular function - returns a complete list."""
    result = []
    for i in range(n):
        result.append(i)
    return result  # Returns the entire list


def generator_function(n: int):
    """Generator function - yields values one by one."""
    for i in range(n):
        yield i  # Yields one value at a time


# Regular function
numbers = regular_function(5)
print(numbers)  # [0, 1, 2, 3, 4] - entire list in memory

# Generator function
gen = generator_function(5)
print(gen)        # <generator object generator_function at 0x...>
print(list(gen))  # [0, 1, 2, 3, 4] - convert to a list if needed
```

Common Generator Patterns
1. Infinite Sequences
Generators can represent infinite sequences:

```python
def fibonacci():
    """Generate Fibonacci numbers indefinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Create the generator
fib = fibonacci()

# Get the first 10 Fibonacci numbers
for i, num in enumerate(fib):
    if i >= 10:
        break
    print(num, end=" ")

# Output: 0 1 1 2 3 5 8 13 21 34
```
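As a side note, `itertools.islice` from the standard library can cap an infinite generator without the manual counter used above; a small sketch reusing the `fibonacci()` generator defined in this section:

```python
from itertools import islice

# islice stops after 10 values, so the infinite generator is never exhausted
for num in islice(fibonacci(), 10):
    print(num, end=" ")
# Output: 0 1 1 2 3 5 8 13 21 34
```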
2. Reading Large Files

Generators are perfect for processing large files:

```python
def read_large_file(filename: str):
    """Generator that reads a file line by line."""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()  # Yield one line at a time


# Process a large file without loading it all into memory
for line in read_large_file('large_file.txt'):
    process_line(line)  # Process one line at a time
```

3. Filtering and Transforming
```python
def filter_even(numbers):
    """Generator that filters even numbers."""
    for num in numbers:
        if num % 2 == 0:
            yield num


def square_numbers(numbers):
    """Generator that squares numbers."""
    for num in numbers:
        yield num * num


# Chain generators
numbers = range(10)
even_squares = square_numbers(filter_even(numbers))

for num in even_squares:
    print(num)  # 0, 4, 16, 36, 64
```

4. Stateful Generators
Generators maintain state between calls:

```python
def counter(start: int = 0, step: int = 1):
    """Generator that maintains a running counter."""
    current = start
    while True:
        yield current
        current += step


# Create a counter starting at 10, incrementing by 5
count = counter(10, 5)
print(next(count))  # 10
print(next(count))  # 15
print(next(count))  # 20
```

Generator Expressions
Generator expressions are a concise way to create generators, using a syntax similar to list comprehensions:

```python
# List comprehension - creates the whole list in memory
squares_list = [x * x for x in range(10)]
print(type(squares_list))  # <class 'list'>

# Generator expression - creates a generator
squares_gen = (x * x for x in range(10))
print(type(squares_gen))  # <class 'generator'>

# Use the generator expression
for square in squares_gen:
    print(square)

# Generator expressions are memory efficient
large_gen = (x * x for x in range(1_000_000))
print(next(large_gen))  # 0 - only the first value is computed
```

Advanced Generator Features
1. Sending Values to Generators
Generators can receive values using `.send()`:

```python
def accumulator():
    """Generator that accumulates values sent to it."""
    total = 0
    while True:
        value = yield total  # Receive a value via send()
        if value is None:
            break
        total += value


# Create the generator
acc = accumulator()
next(acc)  # Start (prime) the generator; yields the initial total, 0

# Send values to the generator
print(acc.send(10))  # 10
print(acc.send(20))  # 30
print(acc.send(5))   # 35
acc.close()  # Close the generator
```

2. Throwing Exceptions
You can throw exceptions into generators with `.throw()`:

```python
def resilient_generator():
    """Generator that handles exceptions thrown into it."""
    try:
        for i in range(10):
            yield i
    except ValueError as e:
        print(f"Caught exception: {e}")
        yield "Error handled"


gen = resilient_generator()
print(next(gen))  # 0
print(next(gen))  # 1

# Raises ValueError inside the generator; it catches the exception
# and yields "Error handled", which throw() returns
print(gen.throw(ValueError("Something went wrong")))
```

3. yield from (Delegating to Sub-generators)
`yield from` delegates to another generator:

```python
def numbers():
    """Generator that yields numbers."""
    yield 1
    yield 2
    yield 3


def more_numbers():
    """Generator that yields more numbers."""
    yield 4
    yield 5


def all_numbers():
    """Generator that combines other generators."""
    yield from numbers()       # Delegates to numbers()
    yield from more_numbers()  # Delegates to more_numbers()
    yield 6


# Use the combined generator
for num in all_numbers():
    print(num)  # 1, 2, 3, 4, 5, 6
```

Real-World Examples
1. Processing Large Datasets

```python
import csv


def process_csv_rows(filename: str):
    """Generator that processes CSV rows one at a time."""
    with open(filename, 'r') as file:
        reader = csv.DictReader(file)
        for row in reader:
            # Process one row at a time
            processed = {
                'name': row['name'].upper(),
                'age': int(row['age']),
                'city': row['city'].title(),
            }
            yield processed


# Process a large CSV without loading it all into memory
for row in process_csv_rows('large_file.csv'):
    save_to_database(row)  # Process one row at a time
```

2. Pagination Generator
```python
def paginate(items, page_size: int = 10):
    """Generator that yields pages of items."""
    for i in range(0, len(items), page_size):
        yield items[i:i + page_size]


# Process items in pages
items = list(range(100))
for page in paginate(items, page_size=20):
    print(f"Processing page: {page}")
    # Process the page...
```

3. Sliding Window
```python
def sliding_window(sequence, window_size: int):
    """Generator that yields sliding windows over a sequence."""
    for i in range(len(sequence) - window_size + 1):
        yield sequence[i:i + window_size]


# Get sliding windows of size 3
data = [1, 2, 3, 4, 5, 6]
for window in sliding_window(data, window_size=3):
    print(window)

# Output:
# [1, 2, 3]
# [2, 3, 4]
# [3, 4, 5]
# [4, 5, 6]
```

4. Chunking Data
```python
def chunked(iterable, chunk_size: int):
    """Generator that yields chunks of data."""
    iterator = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterator))
            yield chunk
        except StopIteration:
            if chunk:  # Yield any remaining items
                yield chunk
            break


# Process data in chunks
data = range(25)
for chunk in chunked(data, chunk_size=7):
    print(f"Processing chunk: {chunk}")

# Output:
# Processing chunk: [0, 1, 2, 3, 4, 5, 6]
# Processing chunk: [7, 8, 9, 10, 11, 12, 13]
# Processing chunk: [14, 15, 16, 17, 18, 19, 20]
# Processing chunk: [21, 22, 23, 24]
```
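If you are on Python 3.12 or newer, the standard library already provides a similar helper, `itertools.batched`; note that it yields tuples rather than lists. A brief sketch:

```python
from itertools import batched  # Python 3.12+

for chunk in batched(range(25), 7):
    print(f"Processing chunk: {list(chunk)}")  # Convert each tuple to a list for display
```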
Generator vs Iterator

When to Use Generators
Use generators when:
- ✅ You need memory efficiency
- ✅ You’re processing large datasets
- ✅ You only iterate once
- ✅ You want lazy evaluation
- ✅ You need infinite sequences
When to Use Iterators
Use iterators (classes with `__iter__` and `__next__`) when (see the sketch after this list):
- ✅ You need complex state management
- ✅ You need to reset the iteration
- ✅ You need multiple iteration methods
- ✅ You need to implement other protocols
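A minimal sketch of such an iterator class (the `Countdown` name and its `reset()` method are illustrative, not from any particular library):

```python
class Countdown:
    """Iterator class: keeps explicit state and can be reset."""

    def __init__(self, start: int):
        self.start = start
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    def reset(self) -> None:
        """Restart the iteration - something a plain generator cannot do."""
        self.current = self.start


countdown = Countdown(3)
print(list(countdown))  # [3, 2, 1]
countdown.reset()
print(list(countdown))  # [3, 2, 1] - same object, iterated again
```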
Common Mistakes to Avoid
Mistake 1: Consuming a Generator Multiple Times
```python
# ❌ Bad - a generator can only be consumed once
gen = (x * x for x in range(5))
print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] - generator is exhausted!


# ✅ Good - create a new generator each time
def get_squares():
    return (x * x for x in range(5))


print(list(get_squares()))  # [0, 1, 4, 9, 16]
print(list(get_squares()))  # [0, 1, 4, 9, 16] - works!
```

Mistake 2: Not Handling StopIteration
```python
# ❌ Bad - next() raises StopIteration once the generator is exhausted
gen = (x for x in range(3))
while True:
    value = next(gen)  # Raises StopIteration when exhausted
    print(value)

# ✅ Good - use a for loop, which handles StopIteration for you
gen = (x for x in range(3))
for value in gen:
    print(value)

# Or handle StopIteration explicitly
gen = (x for x in range(3))
while True:
    try:
        value = next(gen)
        print(value)
    except StopIteration:
        break
```

Mistake 3: Modifying an Iterable During Iteration
```python
# ❌ Bad - modifying a list while iterating over it
def filter_evens(numbers):
    for num in numbers:
        if num % 2 != 0:
            numbers.remove(num)  # Modifying while iterating!
    return numbers


# ✅ Good - build a new sequence instead
def filter_evens(numbers):
    return [num for num in numbers if num % 2 == 0]


# Or use a generator
def filter_evens_gen(numbers):
    for num in numbers:
        if num % 2 == 0:
            yield num
```

Performance Considerations
Memory Comparison
```python
import sys

# List comprehension - stores all values
squares_list = [x * x for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(squares_list)} bytes")  # ~8MB
# Note: getsizeof measures only the list object itself,
# not the int objects it references

# Generator expression - stores almost nothing
squares_gen = (x * x for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(squares_gen)} bytes")  # ~200 bytes
```

When Generators Are Slower
Generators can be slower than lists when:
- You need random access (indexing)
- You iterate multiple times
- You need to check membership frequently
For these cases, consider converting to a list:
```python
# Generator for one-time iteration
gen = (x * x for x in range(1000))
for square in gen:
    process(square)  # Fast and memory efficient

# List for multiple iterations
squares = [x * x for x in range(1000)]
for square in squares:  # First iteration
    process(square)
for square in squares:  # Second iteration - needs a list
    process(square)
```

Best Practices
- Use generators for large datasets - memory efficiency matters
- Use generator expressions for one-time iteration - clean and efficient
- Convert to a list if needed multiple times - don't recreate generators
- Use `yield from` for delegation - cleaner than manual iteration
- Document generator behavior - especially if it has side effects
- Handle StopIteration properly - use `for` loops when possible
Key Takeaways
- Generators are memory-efficient iterators using `yield`
- Generator expressions are concise syntax: `(x for x in iterable)`
- Lazy evaluation - values are computed only when needed
- Memory efficient - perfect for large datasets
- Can be infinite - represent infinite sequences
- One-time use - generators can only be consumed once
- Use `yield from` - for delegating to sub-generators
- Convert to a list - if you need multiple iterations
Remember: Generators let you work with large datasets efficiently - generate values on-demand instead of storing everything in memory! ⚡