Generators and Generator Expressions

Create memory-efficient iterators with generators - Python's lazy evaluation power.

Generators in Python are a powerful way to create memory-efficient iterators that generate values on-the-fly rather than storing them all in memory. They use lazy evaluation, meaning values are computed only when needed.

Without generators (memory-intensive):

without_generators.py
def create_large_list(n: int) -> list:
    """Create a list of squares - stores everything in memory"""
    result = []
    for i in range(n):
        result.append(i * i)
    return result

# Problem: creates the entire list in memory
squares = create_large_list(1_000_000)  # Uses ~40MB of memory!

for square in squares:
    print(square)
    break  # Only the first value was needed, but the entire list was created!

With generators (memory-efficient):

with_generators.py
def generate_squares(n: int):
    """Generator that yields squares - memory efficient"""
    for i in range(n):
        yield i * i  # Yields one value at a time

# Only creates values as needed
squares = generate_squares(1_000_000)  # Uses almost no memory!

for square in squares:
    print(square)
    break  # Only the first value was computed!

A generator function uses yield to produce values:

basic_generator.py
def countdown(n: int):
    """Simple generator that counts down"""
    print("Starting countdown...")
    while n > 0:
        yield n  # Pauses here and returns n
        n -= 1
    print("Countdown complete!")

# Create a generator object
gen = countdown(5)
print(type(gen))  # <class 'generator'>

# Iterate over the generator
for num in gen:
    print(num)

# Output:
# Starting countdown...
# 5
# 4
# 3
# 2
# 1
# Countdown complete!
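
You can also step a generator manually with next() to watch the pause/resume behavior directly. A short sketch reusing countdown() from above:

manual_next.py
gen = countdown(3)
print(next(gen))  # Prints "Starting countdown..." first, then 3
print(next(gen))  # 2
print(next(gen))  # 1
# One more next(gen) would print "Countdown complete!" and raise StopIteration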

generator_vs_function.py
def regular_function(n: int) -> list:
    """Regular function - returns a complete list"""
    result = []
    for i in range(n):
        result.append(i)
    return result  # Returns the entire list

def generator_function(n: int):
    """Generator function - yields values one by one"""
    for i in range(n):
        yield i  # Yields one value at a time

# Regular function
numbers = regular_function(5)
print(numbers)  # [0, 1, 2, 3, 4] - the entire list is in memory

# Generator function
gen = generator_function(5)
print(gen)        # <generator object generator_function at 0x...>
print(list(gen))  # [0, 1, 2, 3, 4] - convert to a list if needed

Generators can represent infinite sequences:

infinite_generator.py
def fibonacci():
    """Generate Fibonacci numbers infinitely"""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Create the generator
fib = fibonacci()

# Get the first 10 Fibonacci numbers
for i, num in enumerate(fib):
    if i >= 10:
        break
    print(num, end=" ")
# Output: 0 1 1 2 3 5 8 13 21 34
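
If you'd rather not manage the loop counter yourself, itertools.islice takes a lazy, fixed-size slice of any iterator, including an infinite one. A sketch reusing fibonacci() from above:

islice_example.py
from itertools import islice

# Take the first 10 values without ever materializing the infinite sequence
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]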

Generators are perfect for processing large files:

file_reader.py
def read_large_file(filename: str):
    """Generator that reads a file line by line"""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()  # Yield one line at a time

# Process a large file without loading it all into memory
for line in read_large_file('large_file.txt'):
    process_line(line)  # Process one line at a time

filtering_generator.py
def filter_even(numbers):
    """Generator that filters even numbers"""
    for num in numbers:
        if num % 2 == 0:
            yield num

def square_numbers(numbers):
    """Generator that squares numbers"""
    for num in numbers:
        yield num * num

# Chain generators into a lazy pipeline
numbers = range(10)
even_squares = square_numbers(filter_even(numbers))

for num in even_squares:
    print(num)  # 0, 4, 16, 36, 64

Generators maintain state between calls:

stateful_generator.py
def counter(start: int = 0, step: int = 1):
    """Generator that maintains a counter"""
    current = start
    while True:
        yield current
        current += step

# Create a counter starting at 10, incrementing by 5
count = counter(10, 5)
print(next(count))  # 10
print(next(count))  # 15
print(next(count))  # 20
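
The standard library already ships this exact pattern as itertools.count, which is worth reaching for when you don't need custom logic:

itertools_count.py
from itertools import count

# count(start, step) is an infinite counter, like the generator above
ticker = count(10, 5)
print(next(ticker))  # 10
print(next(ticker))  # 15
print(next(ticker))  # 20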

Generator expressions are a concise way to create generators (like list comprehensions):

generator_expressions.py
# List comprehension - creates a list in memory
squares_list = [x * x for x in range(10)]
print(type(squares_list))  # <class 'list'>

# Generator expression - creates a generator
squares_gen = (x * x for x in range(10))
print(type(squares_gen))  # <class 'generator'>

# Use the generator expression
for square in squares_gen:
    print(square)

# Generator expressions are memory efficient
large_gen = (x * x for x in range(1_000_000))
print(next(large_gen))  # 0 - only the first value is computed
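
A related idiom: when a generator expression is the sole argument to a function call, you can drop the extra parentheses and feed it straight to consumers like sum(), max(), or any():

genexpr_argument.py
# The call's parentheses double as the generator expression's parentheses
total = sum(x * x for x in range(10))
print(total)  # 285

has_big_square = any(x * x > 50 for x in range(10))
print(has_big_square)  # True (8 * 8 = 64)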

Generators can receive values using .send():

generator_send.py
def accumulator():
    """Generator that accumulates values"""
    total = 0
    while True:
        value = yield total  # Receive a value via send()
        if value is None:
            break
        total += value

# Create the generator
acc = accumulator()
next(acc)  # Start (prime) the generator: runs it to the first yield

# Send values to the generator
print(acc.send(10))  # 10
print(acc.send(20))  # 30
print(acc.send(5))   # 35
acc.close()  # Close the generator
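
Calling .close() raises GeneratorExit inside the generator at the paused yield, which the generator can catch to run cleanup. A minimal sketch (the "resource" here is simulated with prints):

generator_close.py
def managed_resource():
    """Generator that cleans up when closed"""
    print("Acquiring resource")
    try:
        while True:
            yield "data"
    except GeneratorExit:
        # Raised at the paused yield when .close() is called
        print("Releasing resource")
        raise

gen = managed_resource()
print(next(gen))  # Acquiring resource, then data
gen.close()       # Releasing resource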

You can throw exceptions into generators:

generator_throw.py
def resilient_generator():
    """Generator that handles exceptions"""
    try:
        for i in range(10):
            yield i
    except ValueError as e:
        print(f"Caught exception: {e}")
        yield "Error handled"

gen = resilient_generator()
print(next(gen))  # 0
print(next(gen))  # 1
# throw() raises inside the generator and returns the next value yielded
print(gen.throw(ValueError("Something went wrong")))
# Caught exception: Something went wrong
# Error handled

yield from (Delegating to Sub-generators)


yield from delegates to another generator:

yield_from.py
def numbers():
    """Generator that yields numbers"""
    yield 1
    yield 2
    yield 3

def more_numbers():
    """Generator that yields more numbers"""
    yield 4
    yield 5

def all_numbers():
    """Generator that combines other generators"""
    yield from numbers()       # Delegates to numbers()
    yield from more_numbers()  # Delegates to more_numbers()
    yield 6

# Use the combined generator
for num in all_numbers():
    print(num)  # 1, 2, 3, 4, 5, 6
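
yield from also shines in recursive generators. A sketch that flattens arbitrarily nested lists (flatten is an illustrative name, not a standard function):

flatten.py
def flatten(nested):
    """Recursively yield scalar items from nested lists"""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # Delegate to the recursive call
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]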

Generators also shine in practical data-processing patterns. Processing a large CSV file row by row:

large_dataset.py
import csv

def process_csv_rows(filename: str):
    """Generator that processes CSV rows one at a time"""
    with open(filename, 'r') as file:
        reader = csv.DictReader(file)
        for row in reader:
            # Normalize one row at a time
            processed = {
                'name': row['name'].upper(),
                'age': int(row['age']),
                'city': row['city'].title(),
            }
            yield processed

# Process a large CSV without loading it all into memory
for row in process_csv_rows('large_file.csv'):
    save_to_database(row)  # Process one row at a time

pagination.py
def paginate(items, page_size: int = 10):
    """Generator that yields pages of items"""
    for i in range(0, len(items), page_size):
        yield items[i:i + page_size]

# Process items in pages
items = list(range(100))
for page in paginate(items, page_size=20):
    print(f"Processing page: {page}")
    # Process the page...

sliding_window.py
def sliding_window(sequence, window_size: int):
    """Generator that yields sliding windows over a sequence"""
    for i in range(len(sequence) - window_size + 1):
        yield sequence[i:i + window_size]

# Get sliding windows of size 3
data = [1, 2, 3, 4, 5, 6]
for window in sliding_window(data, window_size=3):
    print(window)
# Output:
# [1, 2, 3]
# [2, 3, 4]
# [3, 4, 5]
# [4, 5, 6]
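
The version above relies on len() and slicing, so it only works on sequences. A sketch using collections.deque that windows any iterable, including another generator:

sliding_window_deque.py
from collections import deque

def sliding_window_iter(iterable, window_size: int):
    """Sliding windows over any iterable, not just sequences"""
    window = deque(maxlen=window_size)  # The oldest item falls off automatically
    for item in iterable:
        window.append(item)
        if len(window) == window_size:
            yield tuple(window)

for window in sliding_window_iter(range(1, 7), window_size=3):
    print(window)  # (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6)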

chunking.py
def chunked(iterable, chunk_size: int):
    """Generator that yields fixed-size chunks of data"""
    iterator = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterator))
            yield chunk
        except StopIteration:
            if chunk:  # Yield any remaining items
                yield chunk
            break

# Process data in chunks
data = range(25)
for chunk in chunked(data, chunk_size=7):
    print(f"Processing chunk: {chunk}")
# Output:
# Processing chunk: [0, 1, 2, 3, 4, 5, 6]
# Processing chunk: [7, 8, 9, 10, 11, 12, 13]
# Processing chunk: [14, 15, 16, 17, 18, 19, 20]
# Processing chunk: [21, 22, 23, 24]
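
On Python 3.12+, the standard library covers this with itertools.batched, which yields tuples rather than lists:

batched_example.py
from itertools import batched  # Python 3.12+

for chunk in batched(range(25), 7):
    print(f"Processing chunk: {chunk}")
# Processing chunk: (0, 1, 2, 3, 4, 5, 6)
# ...
# Processing chunk: (21, 22, 23, 24)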

Use generators when:

  • ✅ You need memory efficiency
  • ✅ You’re processing large datasets
  • ✅ You only iterate once
  • ✅ You want lazy evaluation
  • ✅ You need infinite sequences

Use iterators (classes with __iter__ and __next__, sketched after this list) when:

  • ✅ You need complex state management
  • ✅ You need to reset the iteration
  • ✅ You need multiple iteration methods
  • ✅ You need to implement other protocols
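
A minimal sketch of such an iterator class - here a resettable countdown (the class name and reset() method are illustrative, not a standard API):

resettable_iterator.py
class Countdown:
    """Iterator class whose state can be reset"""

    def __init__(self, start: int):
        self.start = start
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    def reset(self):
        """Restart the iteration - something a plain generator can't do"""
        self.current = self.start

cd = Countdown(3)
print(list(cd))  # [3, 2, 1]
cd.reset()
print(list(cd))  # [3, 2, 1] - the same object, iterated again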

Mistake 1: Consuming Generator Multiple Times

consuming_twice.py
# ❌ Bad - a generator can only be consumed once
gen = (x * x for x in range(5))
print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] - the generator is exhausted!

# ✅ Good - create a new generator each time
def get_squares():
    return (x * x for x in range(5))

print(list(get_squares()))  # [0, 1, 4, 9, 16]
print(list(get_squares()))  # [0, 1, 4, 9, 16] - works!
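
If you genuinely need two passes over a single generator, itertools.tee splits it into independent iterators. Note that tee buffers items internally, so the memory savings shrink if one copy runs far ahead of the other:

tee_example.py
from itertools import tee

gen = (x * x for x in range(5))
first, second = tee(gen, 2)  # Two independent iterators over one source
print(list(first))   # [0, 1, 4, 9, 16]
print(list(second))  # [0, 1, 4, 9, 16]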

Mistake 2: Not Handling StopIteration

stop_iteration.py
# ❌ Bad - next() raises StopIteration once the generator is exhausted
gen = (x for x in range(3))
while True:
    value = next(gen)  # Raises StopIteration after the last value
    print(value)

# ✅ Good - use a for loop, which handles StopIteration for you
gen = (x for x in range(3))
for value in gen:
    print(value)

# Or handle StopIteration explicitly
gen = (x for x in range(3))
while True:
    try:
        value = next(gen)
        print(value)
    except StopIteration:
        break

Mistake 3: Modifying Iterable During Iteration

modifying_during_iteration.py
# ❌ Bad - modifying the list while iterating over it
def filter_evens(numbers):
    for num in numbers:
        if num % 2 != 0:
            numbers.remove(num)  # Removing while iterating skips elements!
    return numbers

# ✅ Good - build a new sequence instead
def filter_evens(numbers):
    return [num for num in numbers if num % 2 == 0]

# Or use a generator
def filter_evens_gen(numbers):
    for num in numbers:
        if num % 2 == 0:
            yield num

The memory difference is easy to measure with sys.getsizeof (which reports the size of the container object itself):

memory_comparison.py
import sys

# List comprehension - stores all the values
squares_list = [x * x for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(squares_list)} bytes")  # ~8MB for the list object alone

# Generator expression - stores almost nothing
squares_gen = (x * x for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(squares_gen)} bytes")  # ~200 bytes

Generators can be slower than lists when:

  • You need random access (indexing)
  • You iterate multiple times
  • You need to check membership frequently

For these cases, consider converting to a list:

when_to_convert.py
# Generator for one-time iteration
gen = (x * x for x in range(1000))
for square in gen:
    process(square)  # Fast and memory efficient

# List for multiple iterations
squares = [x * x for x in range(1000)]
for square in squares:  # First iteration
    process(square)
for square in squares:  # Second iteration - needs the list
    process(square)

Best practices:

  1. Use generators for large datasets - Memory efficiency matters
  2. Use generator expressions for one-time iteration - Clean and efficient
  3. Convert to a list if needed multiple times - Don't recreate generators
  4. Use yield from for delegation - Cleaner than manual iteration
  5. Document generator behavior - Especially if it has side effects
  6. Handle StopIteration properly - Use for loops when possible

Key takeaways:

  • Generators are memory-efficient iterators using yield
  • Generator expressions are concise syntax: (x for x in iterable)
  • Lazy evaluation - Values are computed only when needed
  • Memory efficient - Perfect for large datasets
  • Can be infinite - Represent infinite sequences
  • One-time use - Generators can only be consumed once
  • Use yield from - For delegating to sub-generators
  • Convert to a list - If you need multiple iterations

Remember: Generators let you work with large datasets efficiently - generate values on-demand instead of storing everything in memory!