Phase 4 — Building & Deploying

Building & Deploying

APIs, Flask, databases, Docker, CI/CD, and professional project structure. From consuming software to building and shipping it.

Chapters 13–18 · Phase Gate + TaskForge
Before You Begin Phase 4

This phase assumes you can: navigate the terminal and use basic shell commands (Ch 10), create Git repositories, make commits, and push to GitHub (Ch 11-12), and write Python functions with error handling and tests (Ch 05, 09).

Chapter 13 APIs and HTTP

Why This Matters Now

Almost every modern application talks to other applications over the internet. Weather apps fetch forecasts. Payment systems process charges. AI tools send prompts and receive completions. The language they all speak is HTTP, and the conversation pattern they follow is called an API. Understanding APIs is how you go from writing code that runs on your machine to writing code that connects to the world.

What Is an API?

An API (Application Programming Interface) is a contract between two programs. One program says "send me a request in this format, and I'll send you a response in that format." The API defines the rules—what you can ask for, how to ask, and what you'll get back.

You've already used APIs without knowing it. When you call len([1, 2, 3]) in Python, you're using a function API—you pass a list, it returns the length. Web APIs work the same way, except the function lives on a remote server and the call travels over the internet.

Think of a restaurant analogy. You (the client) read the menu (the API documentation), place an order with the waiter (send a request), the kitchen (the server) prepares your food, and the waiter brings it back (the response). You never walk into the kitchen. The menu is the contract.

HTTP: The Language of the Web

HTTP (HyperText Transfer Protocol) is the protocol that clients and servers use to communicate. Every time you visit a website, your browser sends an HTTP request and receives an HTTP response. APIs use the same protocol.

An HTTP exchange has two parts:

  • Request—sent by the client. Contains a method, a URL, headers, and optionally a body.
  • Response—sent by the server. Contains a status code, headers, and a body.
The HTTP request-response cycle: the client sends a request with a method and URL, the server processes it, and returns a response with a status code and data.

HTTP Methods

The method tells the server what you want to do. There are four you need to know:

HTTP methods and their purposes
Method   Purpose                Example
GET      Read/retrieve data     Get a list of tasks
POST     Create new data        Add a new task
PUT      Update existing data   Mark a task as done
DELETE   Remove data            Delete a task

GET and DELETE typically don't send a body. POST and PUT send data in the request body (usually JSON).

Status Codes

Every HTTP response includes a status code—a three-digit number that tells the client what happened. You don't need to memorize all of them, just these four ranges:

HTTP status code ranges and their meanings
Range   Meaning                              Common Codes
2xx     Success                              200 OK, 201 Created
3xx     Redirect                             301 Moved Permanently
4xx     Client error (you made a mistake)    400 Bad Request, 404 Not Found
5xx     Server error (they have a problem)   500 Internal Server Error

When something goes wrong, the status code is the first thing to check. A 404 means the URL is wrong. A 401 means you're not authenticated. A 500 means the server is broken—not your fault.

JSON: The Data Format

APIs need a shared format for sending data. The standard is JSON (JavaScript Object Notation). You already know JSON—it looks exactly like Python dictionaries and lists:

{
    "id": 1,
    "title": "Buy groceries",
    "done": false
}

Python's json module converts between JSON strings and Python objects. But when using the requests library (below), this conversion happens automatically.
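A quick sketch of that round trip with the standard-library json module:

```python
import json

# A JSON string, exactly as it would arrive over the wire
raw = '{"id": 1, "title": "Buy groceries", "done": false}'

task = json.loads(raw)   # JSON text -> Python dict
print(task["title"])     # Buy groceries
print(task["done"])      # False (JSON false becomes Python False)

encoded = json.dumps(task)       # Python dict -> JSON text
roundtrip = json.loads(encoded)  # parses back to an equal dict
```

Notice that JSON's false, true, and null become Python's False, True, and None.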

Headers

Headers are metadata attached to requests and responses. They carry information about the message itself, not the data. Two headers you'll see constantly:

  • Content-Type: application/json — tells the server "I'm sending you JSON."
  • Authorization: Bearer <token> — proves your identity (like a password).

You rarely set headers manually—tools like the requests library handle most of them for you.

Using Python's requests Library

Python's requests library makes HTTP calls simple. It's not part of the standard library, so you need to install it:

Current Tool (March 2026)

requests (Python, free). The most popular HTTP library for Python. Install inside your virtual environment: pip install requests.

Terminal:
# Make sure your virtual environment is activated first
pip install requests

Making a GET Request

import requests

response = requests.get("https://api.github.com/users/octocat")

print(response.status_code)  # 200
print(response.json())       # Python dict with user data

requests.get() sends a GET request to the URL. The response object gives you the status code with .status_code and the parsed JSON body with .json().

Making a POST Request

import requests

data = {"title": "Buy groceries", "done": False}
response = requests.post("http://localhost:5000/tasks", json=data)

print(response.status_code)  # 201
print(response.json())       # {"id": 1, "title": "Buy groceries", "done": false}

The json=data parameter automatically converts the Python dict to JSON and sets the Content-Type header for you.

All Four Methods

import requests

BASE = "http://localhost:5000"

# GET    - read
r = requests.get(f"{BASE}/tasks")

# POST   - create
r = requests.post(f"{BASE}/tasks", json={"title": "New task"})

# PUT    - update
r = requests.put(f"{BASE}/tasks/1/done")

# DELETE - remove
r = requests.delete(f"{BASE}/tasks/1")

Consuming a Public API

Let's try a real public API. The GitHub API is free and requires no authentication for basic requests:

import requests

# Get public info about a GitHub user
response = requests.get("https://api.github.com/users/octocat")

if response.status_code == 200:
    user = response.json()
    print(f"Name: {user['name']}")
    print(f"Public repos: {user['public_repos']}")
    print(f"Followers: {user['followers']}")
else:
    print(f"Error: {response.status_code}")

Always check the status code before using the response data. A 200 means success. Anything else means something went wrong, and response.json() might not contain what you expect.

Common Misconceptions

"APIs Are Only for Web Developers"

APIs are how programs talk to each other. Data scientists use APIs to fetch datasets. DevOps engineers use APIs to manage cloud infrastructure. AI agents use APIs to call language models. If you write code that interacts with any external service, you're using an API.

"GET and POST Are Interchangeable"

They're not. GET retrieves data and should never modify anything on the server. POST creates new data. Using the wrong method confuses other developers and breaks tooling that relies on these conventions. The method communicates intent.

TaskForge Connection

Right now TaskForge is a CLI tool—you interact with it from the terminal. But what if a web frontend, a mobile app, or an AI agent wants to manage tasks? They would need an API. In Chapter 14, you'll build exactly that: a Flask web API for TaskForge. But first, you'll practice consuming an API with Python's requests library so you understand both sides of the conversation.

Micro-Exercises

1: Fetch a GitHub User

Use requests to fetch information about any GitHub user and print their name and number of public repos.

import requests

response = requests.get("https://api.github.com/users/octocat")
data = response.json()
print(f"Name: {data['name']}")
print(f"Repos: {data['public_repos']}")

Try replacing "octocat" with your own GitHub username.

2: Check the Status Code

Request a URL that doesn't exist and confirm you get a 404:

import requests

response = requests.get("https://api.github.com/users/this-user-does-not-exist-999999")
print(response.status_code)  # 404

Now try https://api.github.com/ (the root) and confirm you get a 200.

Try This Now

Write a Python script that fetches the 5 most recent public repositories for a GitHub user and prints their names and star counts.

github_repos.py:

import requests

username = "octocat"
url = f"https://api.github.com/users/{username}/repos"
response = requests.get(url, params={"sort": "created", "per_page": 5})

if response.status_code == 200:
    repos = response.json()
    for repo in repos:
        stars = repo["stargazers_count"]
        print(f"{repo['name']} - {stars} stars")
else:
    print(f"Error: {response.status_code}")

Verification: Running the script prints 5 repository names with their star counts. Change the username variable and run again—you should see different results.

If this doesn't work: (1) ModuleNotFoundError: No module named 'requests' → make sure your virtual environment is activated and run pip install requests. (2) ConnectionError → check your internet connection. (3) Status code 403 → you've hit the GitHub API rate limit (60 requests/hour for unauthenticated users). Wait a few minutes and try again.

Every web service you've ever used—from checking the weather to asking an AI a question—works through HTTP requests and responses. You now understand the protocol that connects the internet.

Interactive Exercises

Knowledge Check

Which HTTP method is used to create a new resource?

Knowledge Check

What does a 404 status code mean?

Design Challenge: "Parse API Response"

Write extract_repos(json_str) that parses a JSON string of GitHub repos and returns a list of (name, stars) tuples, sorted by stars descending.

Use json.loads() to parse the string into a Python list.

Use a list comprehension to extract (name, stargazers_count) tuples.

Sort with sorted(items, key=lambda x: x[1], reverse=True).
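Putting those three hints together, one possible extract_repos looks like this (the sample JSON string is invented for illustration):

```python
import json

def extract_repos(json_str):
    """Parse a JSON array of repo objects into (name, stars) tuples,
    sorted by stars descending."""
    repos = json.loads(json_str)
    items = [(repo["name"], repo["stargazers_count"]) for repo in repos]
    return sorted(items, key=lambda x: x[1], reverse=True)

sample = '[{"name": "alpha", "stargazers_count": 5}, {"name": "beta", "stargazers_count": 12}]'
print(extract_repos(sample))  # [('beta', 12), ('alpha', 5)]
```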

Optional: Web Scraping with Python

This Section Is Optional

Web scraping extends the HTTP skills from this chapter but is not required for later chapters. Skip it if you want to move forward; come back when you need to extract data from web pages.

Web scraping means making HTTP requests to web pages and extracting data from the HTML. The difference from APIs: APIs return structured JSON, web pages return HTML that you parse.

When to Scrape vs. When to Use an API

If the site has an API, use the API — it's faster, more reliable, and more polite. Scrape only when there's no API and the data is publicly accessible.

Scraping Ethics

Respect robots.txt, rate-limit your requests with time.sleep(), and check terms of service. Scraping is a tool, not a right.
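One way to follow the rate-limiting advice is a small helper that sleeps between fetches. To keep this sketch runnable offline, fetch is any callable; in real scraping code you would pass something like requests.get:

```python
import time

def rate_limited(fetch, urls, delay=1.0):
    """Call fetch(url) for each URL, pausing `delay` seconds between requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # be polite: never hammer a site
        results.append(fetch(url))
    return results

# Stub fetcher for demonstration; swap in requests.get for real scraping
pages = rate_limited(lambda u: f"<html>{u}</html>", ["a", "b"], delay=0.01)
print(pages)  # ['<html>a</html>', '<html>b</html>']
```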

Basics with BeautifulSoup

Dependency Note

Note: BeautifulSoup (pip install beautifulsoup4) is an optional tool not required for the exercises. The exercise below uses Python’s built-in html.parser instead.

import requests
from bs4 import BeautifulSoup

# Fetch the page
response = requests.get("https://example.com/data")
soup = BeautifulSoup(response.text, "html.parser")

# Find elements
title = soup.find("h1").text
links = [a["href"] for a in soup.find_all("a")]
rows = soup.select("table tr")  # CSS selector

Common patterns:

  • soup.find("tag") — first matching element
  • soup.find_all("tag") — all matching elements
  • soup.select(".class") — CSS selector
  • tag.text or tag.get_text(strip=True) — extract text
  • tag["href"] — extract attribute

Exercise: Parse HTML Data

Given an HTML string, extract all link texts and URLs into a list of dicts. Uses Python's built-in html.parser since BeautifulSoup isn't available in the browser.

Inside handle_starttag, check if tag == "a". If so, set self.in_a = True and reset self.current_text = "".

Loop through attrs with for name, value in attrs: and check if name == "href": to grab the URL.

Full solution sketch: in handle_starttag, when tag == "a", set self.in_a = True, reset self.current_text = "", and loop over attrs, setting self.current_href = value whenever name == "href".
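Assembled from those hints, a complete html.parser solution might look like this:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect {'text': ..., 'url': ...} dicts for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.in_a = False
        self.current_text = ""
        self.current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_a = True
            self.current_text = ""
            for name, value in attrs:
                if name == "href":
                    self.current_href = value

    def handle_data(self, data):
        if self.in_a:
            self.current_text += data

    def handle_endtag(self, tag):
        if tag == "a" and self.in_a:
            self.links.append({"text": self.current_text, "url": self.current_href})
            self.in_a = False

parser = LinkExtractor()
parser.feed('<p><a href="https://example.com">Example</a></p>')
print(parser.links)  # [{'text': 'Example', 'url': 'https://example.com'}]
```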

Chapter 14 Building Web APIs with Flask

Why This Matters Now

Chapter 13 taught you to consume APIs. Now you build one. This is the bridge from user to creator—and it's what the Phase 4 Gate requires.

What Is Flask?

Flask is a minimal Python web framework. It lets you turn Python functions into API endpoints with just a few lines of code. Unlike larger frameworks (Django), Flask gives you only what you need and gets out of the way.

Current Tool (March 2026)

Flask (Python, free). Minimal web framework for building APIs and web apps. Install: pip install flask.

Your First Flask App

A complete Flask app in 5 lines:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello, World!"

if __name__ == "__main__":
    app.run(debug=True)

Save as app.py and run with python3 app.py. Open http://localhost:5000 in your browser. You just built a web server.

@app.route("/") is a decorator—it tells Flask which URL triggers which function. debug=True auto-reloads on code changes and shows helpful error pages.

Routes and Methods

Each endpoint is a URL + HTTP method combination. Flask defaults to GET only. To accept other methods, specify them explicitly:

@app.route("/tasks", methods=["GET", "POST"])
def handle_tasks():
    if request.method == "GET":
        # return all tasks
        pass
    elif request.method == "POST":
        # create a new task
        pass

You can also use separate functions for each method, which is cleaner:

@app.route("/tasks", methods=["GET"])
def get_tasks():
    # return all tasks
    pass

@app.route("/tasks", methods=["POST"])
def create_task():
    # create a new task
    pass

Returning JSON

APIs return JSON, not HTML. Flask provides jsonify() to convert Python dicts to proper JSON responses with the correct Content-Type header:

from flask import Flask, jsonify

@app.route("/status")
def status():
    return jsonify({"status": "running", "version": "1.0"}), 200

The second value (200) is the HTTP status code. Common codes:

Common HTTP status codes for Flask API responses
Code   Meaning        When to Use
200    OK             Successful GET or PUT
201    Created        Successful POST that creates a resource
400    Bad Request    Client sent invalid data
404    Not Found      Resource doesn't exist
500    Server Error   Something broke on your end

Reading Request Data

When a client sends data to your API, you need to read it. Flask provides two main ways:

from flask import Flask, request, jsonify

# JSON body (for POST/PUT requests)
@app.route("/tasks", methods=["POST"])
def create_task():
    data = request.json          # parse JSON body
    title = data.get("title")   # safely get a field
    return jsonify({"title": title}), 201

# Query parameters (for GET requests)
# GET /tasks?status=pending
@app.route("/tasks", methods=["GET"])
def get_tasks():
    status = request.args.get("status", "all")  # default to "all"
    return jsonify({"filter": status})

request.json parses the JSON body sent by the client. request.args reads URL query parameters (everything after the ?).

Building a TaskForge API

Let's build a complete REST API for TaskForge, step by step. We'll use an in-memory list to store tasks—no database needed yet.

Step 1: Setup and Data Store

from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory task storage
tasks = []
next_id = 1

Step 2: GET /tasks — List All Tasks

@app.route("/tasks", methods=["GET"])
def get_tasks():
    return jsonify(tasks), 200

This returns the entire task list as a JSON array. Simple and direct.

Step 3: POST /tasks — Add a Task

@app.route("/tasks", methods=["POST"])
def create_task():
    global next_id
    data = request.json

    if not data or "title" not in data:
        return jsonify({"error": "title is required"}), 400

    task = {
        "id": next_id,
        "title": data["title"],
        "done": False
    }
    tasks.append(task)
    next_id += 1
    return jsonify(task), 201

Notice the validation: if the client doesn't send a title, we return a 400 Bad Request with a clear error message. Never trust client input.

Step 4: PUT /tasks/<id>/done — Mark Task Complete

@app.route("/tasks/<int:task_id>/done", methods=["PUT"])
def complete_task(task_id):
    for task in tasks:
        if task["id"] == task_id:
            task["done"] = True
            return jsonify(task), 200

    return jsonify({"error": "task not found"}), 404

<int:task_id> is a URL variable—Flask extracts the number from the URL and passes it as a function parameter. If no task matches, we return 404.

Step 5: Run It

if __name__ == "__main__":
    app.run(debug=True)
A request arrives, Flask matches the URL to a route, the handler function runs and returns data, and the client receives a JSON response.

Testing Your API

With your Flask app running in one terminal, open another terminal and test with curl:

Terminal:
# List all tasks (empty at first)
curl http://localhost:5000/tasks

# Add a task
curl -X POST -H "Content-Type: application/json" \
     -d '{"title": "Buy groceries"}' \
     http://localhost:5000/tasks

# Add another task
curl -X POST -H "Content-Type: application/json" \
     -d '{"title": "Write tests"}' \
     http://localhost:5000/tasks

# List again (now shows 2 tasks)
curl http://localhost:5000/tasks

# Mark task 1 as done
curl -X PUT http://localhost:5000/tasks/1/done

# Verify it changed
curl http://localhost:5000/tasks

You can also test with Python requests (from Chapter 13):

import requests

# Add a task
r = requests.post("http://localhost:5000/tasks",
                   json={"title": "Test from Python"})
print(r.status_code)  # 201
print(r.json())       # {"id": 1, "title": "Test from Python", "done": false}

# List all tasks
r = requests.get("http://localhost:5000/tasks")
print(r.json())       # [{"id": 1, ...}]

Error Handling in APIs

A good API returns clear error responses so the client knows what went wrong and how to fix it:

# Bad request: missing required field
@app.route("/tasks", methods=["POST"])
def create_task():
    data = request.json
    if not data or "title" not in data:
        return jsonify({"error": "title is required"}), 400

    if not isinstance(data["title"], str) or len(data["title"].strip()) == 0:
        return jsonify({"error": "title must be a non-empty string"}), 400

    # ... create the task

# Not found
@app.route("/tasks/<int:task_id>/done", methods=["PUT"])
def complete_task(task_id):
    for task in tasks:
        if task["id"] == task_id:
            task["done"] = True
            return jsonify(task), 200
    return jsonify({"error": f"task {task_id} not found"}), 404

# Catch unexpected errors
@app.errorhandler(500)
def internal_error(e):
    return jsonify({"error": "internal server error"}), 500

The pattern: always return JSON with an "error" key and an appropriate status code. Never return a bare string or an HTML error page from an API.

Common Misconceptions

"Flask Is Only for Small Projects"

Flask scales. Instagram, Pinterest, and Netflix have used Flask in production. The difference between a toy project and a production app is architecture, not the framework. Flask gives you the flexibility to add complexity only when you need it.

"You Need a Database for an API"

Not to start. Our TaskForge API uses a Python list—perfectly fine for learning and prototyping. Data disappears when you restart the server, but that's a problem you'll solve in Chapter 15 with SQLite. Start simple, add complexity when you need it.

TaskForge Connection

You just turned TaskForge from a CLI tool into a web API. Any program—a web frontend, a mobile app, another Python script, or an AI agent—can now create and manage tasks by making HTTP requests. In the Phase 4 Gate, this API is part of the required artifact.

Micro-Exercises

1: Hello API

Create a Flask app with a single GET endpoint that returns {"message": "Hello, World!"}.

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def hello():
    return jsonify({"message": "Hello, World!"})

if __name__ == "__main__":
    app.run(debug=True)

Run it and visit http://localhost:5000 in your browser. You should see the JSON response.

2: Greeting Endpoint

Add a POST endpoint that accepts a name in JSON and returns a greeting.

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/greet", methods=["POST"])
def greet():
    data = request.json
    name = data.get("name", "stranger")
    return jsonify({"greeting": f"Hello, {name}!"}), 200

if __name__ == "__main__":
    app.run(debug=True)

Test: curl -X POST -H "Content-Type: application/json" -d '{"name": "Alice"}' http://localhost:5000/greet

Try This Now

Build a complete REST API for TaskForge with GET /tasks, POST /tasks, and PUT /tasks/<id>/done endpoints. Test each with curl.

taskforge_api.py:

from flask import Flask, jsonify, request

app = Flask(__name__)

tasks = []
next_id = 1

@app.route("/tasks", methods=["GET"])
def get_tasks():
    return jsonify(tasks), 200

@app.route("/tasks", methods=["POST"])
def create_task():
    global next_id
    data = request.json
    if not data or "title" not in data:
        return jsonify({"error": "title is required"}), 400
    task = {"id": next_id, "title": data["title"], "done": False}
    tasks.append(task)
    next_id += 1
    return jsonify(task), 201

@app.route("/tasks/<int:task_id>/done", methods=["PUT"])
def complete_task(task_id):
    for task in tasks:
        if task["id"] == task_id:
            task["done"] = True
            return jsonify(task), 200
    return jsonify({"error": "task not found"}), 404

if __name__ == "__main__":
    app.run(debug=True)

Verification: curl http://localhost:5000/tasks returns a JSON array. curl -X POST -H "Content-Type: application/json" -d '{"title":"Test"}' http://localhost:5000/tasks creates a task.

If this doesn't work: (1) Port already in use → kill the existing process or use app.run(port=5001). (2) Import errors → make sure Flask is installed in your venv (pip install flask). (3) curl returns HTML instead of JSON → make sure you're using jsonify(), not returning a plain string.

You just built a web API from scratch. Any program on the internet can now interact with TaskForge—the same pattern behind every web service, from GitHub to Anthropic's Claude API.

Interactive Exercises

Knowledge Check

What decorator makes a function handle GET requests to /tasks?

Knowledge Check

What status code should a successful POST request return when creating a resource?

Test Your API Endpoints

Below is a simplified TaskForge API with get_tasks() and add_task(). Write 4 test functions that verify: (1) GET returns an empty list initially, (2) POST adds a task and returns 201, (3) POST without a title returns 400, (4) GET after adding returns the task. The test helpers get(path) and post(path, data) are provided.

test_get_empty: call get("/tasks") and assert the status is 200 and the json is [].

test_add_task: call post("/tasks", {"title": "Test"}) and assert status is 201 and the returned json has the title.

test_add_without_title: call post("/tasks", {}) and assert status is 400.

Design Challenge: "Request Validator"

Write validate_task(data) that validates a task dict: 'title' is required (non-empty string), 'priority' is optional (must be 'high', 'medium', or 'low'), 'due_date' is optional (must match YYYY-MM-DD). Return a list of error strings, or empty list if valid.

Check if 'title' key exists AND is a non-empty string.

For priority, use if 'priority' in data and data['priority'] not in ('high', 'medium', 'low').

For due_date, try datetime.strptime(data['due_date'], '%Y-%m-%d') in a try/except. Import datetime at top.
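Following those hints, one possible validate_task implementation:

```python
from datetime import datetime

def validate_task(data):
    """Return a list of error strings; an empty list means the task is valid."""
    errors = []

    title = data.get("title")
    if not isinstance(title, str) or not title.strip():
        errors.append("title is required and must be a non-empty string")

    if "priority" in data and data["priority"] not in ("high", "medium", "low"):
        errors.append("priority must be 'high', 'medium', or 'low'")

    if "due_date" in data:
        try:
            datetime.strptime(data["due_date"], "%Y-%m-%d")
        except (ValueError, TypeError):
            errors.append("due_date must match YYYY-MM-DD")

    return errors

print(validate_task({"title": "Ship it", "priority": "high", "due_date": "2026-03-01"}))  # []
print(validate_task({"priority": "urgent"}))  # two error strings
```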

Connecting a Frontend to Your API

You built an API. Now let's connect it to a web page. You need just enough HTML, CSS, and JavaScript to consume your own API — this is minimum viable frontend, not a full web development course.

HTML in 5 Minutes

HTML is a tree of nested tags that describe a document's structure:

<!DOCTYPE html>
<html>
<head>
    <title>TaskForge</title>
</head>
<body>
    <h1>My Tasks</h1>
    <ul id="task-list"></ul>
    <input id="new-task" type="text" placeholder="New task...">
    <button id="add-btn">Add Task</button>
    <script src="app.js"></script>
</body>
</html>
Reading HTML, Not Mastering It

AI generates HTML constantly. You need to read it, not master it. Focus on understanding the tree structure: which tags are parents, which are children, and what id and class attributes do.

CSS in 2 Minutes

CSS controls how your HTML looks — colors, layout, spacing, fonts. AI generates CSS well, so you mainly need to read it rather than write it from scratch. The key concept: a <link rel='stylesheet' href='style.css'> tag in your HTML imports a CSS file that styles elements by matching selectors to tags, classes, and IDs.

JavaScript fetch() in 5 Minutes

fetch() calls your API from the browser. Here's a complete example that talks to a TaskForge Flask API:

// GET: Fetch all tasks and display them
fetch("http://localhost:5000/api/tasks")
  .then(response => response.json())
  .then(tasks => {
    const list = document.getElementById("task-list");
    list.innerHTML = "";
    tasks.forEach(task => {
      const li = document.createElement("li");
      li.textContent = task.title;
      list.appendChild(li);
    });
  });

// POST: Add a new task
function addTask(title) {
  fetch("http://localhost:5000/api/tasks", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ title: title, status: "todo" })
  })
  .then(response => response.json())
  .then(task => console.log("Created:", task));
}
Going Deeper: Frontend Development

This section teaches the minimum to consume an API. If you want to build real frontends, start with MDN Web Docs — the authoritative reference for HTML, CSS, and JavaScript.

Chapter 15 Database Basics

Why This Matters Now

TaskForge currently loses all data when you close it. Databases solve this permanently. Every real application uses one.

What Is a Database?

A database is structured storage that persists beyond program execution. When you close your Python script, variables disappear. When you close a program backed by a database, the data stays. Every real application—from TaskForge to GitHub to the Anthropic API—stores its data in a database.

SQL vs NoSQL

There are two major families of databases. For this course, we focus on SQL.

Comparison of SQL and NoSQL databases
Feature          SQL                                          NoSQL
Structure        Tables with rows and columns                 Flexible documents or key-value pairs
Schema           Strict: define columns before inserting data Flexible: each document can differ
Query Language   SQL (Structured Query Language)              Varies by database
Examples         PostgreSQL, SQLite, MySQL                    MongoDB, Redis, DynamoDB
Best For         Structured data with relationships           Rapidly changing schemas, caching

SQLite — Your First Database

We start with SQLite because it removes every barrier to getting started:

  • No server—it's a library, not a service. No installation, no configuration, no passwords.
  • Built into Python—import sqlite3 works out of the box. No pip install required.
  • File-based—your entire database is a single file (tasks.db). Copy it, back it up, delete it—just like any file.

SQLite is not a toy. It's used in production by every iPhone, every Android phone, every web browser, and every copy of Python. For learning and for single-user applications like TaskForge, it's the right choice.

Basic SQL

SQL is a language for talking to databases. The core is just 5 commands:

CREATE TABLE — Define Your Structure

CREATE TABLE tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    done BOOLEAN DEFAULT 0
);

This creates a table with three columns. PRIMARY KEY AUTOINCREMENT means each task gets a unique, auto-assigned ID. NOT NULL means the title is required. DEFAULT 0 means new tasks start as not done.

INSERT — Add Data

INSERT INTO tasks (title, done) VALUES ('Buy groceries', 0);
INSERT INTO tasks (title, done) VALUES ('Write tests', 0);
INSERT INTO tasks (title, done) VALUES ('Deploy API', 1);

SELECT — Read Data

-- Get all tasks
SELECT * FROM tasks;

-- Get only incomplete tasks
SELECT * FROM tasks WHERE done = 0;

-- Get just titles
SELECT title FROM tasks WHERE done = 0;

UPDATE — Modify Data

UPDATE tasks SET done = 1 WHERE id = 1;

Always include a WHERE clause. Without it, you update every row.

DELETE — Remove Data

DELETE FROM tasks WHERE id = 1;

Same rule: always use WHERE. DELETE FROM tasks with no WHERE deletes everything.

Python's sqlite3 Module

Python includes sqlite3 in the standard library. Here's the workflow:

import sqlite3

# 1. Connect to database (creates file if it doesn't exist)
conn = sqlite3.connect("tasks.db")

# 2. Create a cursor (the object that executes SQL)
cursor = conn.cursor()

# 3. Execute SQL
cursor.execute("""
    CREATE TABLE IF NOT EXISTS tasks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        done BOOLEAN DEFAULT 0
    )
""")

# 4. Insert data
cursor.execute("INSERT INTO tasks (title, done) VALUES (?, ?)", ("Buy groceries", 0))

# 5. Commit changes (saves to disk)
conn.commit()

# 6. Query data
cursor.execute("SELECT * FROM tasks")
rows = cursor.fetchall()
for row in rows:
    print(row)  # (1, 'Buy groceries', 0)

# 7. Close the connection
conn.close()

Key functions: connect() opens the database, cursor() creates an executor, execute() runs SQL, fetchall() retrieves results, commit() saves changes, close() cleans up.
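Two refinements worth knowing: sqlite3.connect(":memory:") gives you a throwaway in-memory database (handy for experiments and tests), and a connection used as a context manager commits automatically on success and rolls back on error:

```python
import sqlite3

# In-memory database: exists only while the connection is open
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# `with conn:` wraps the statements in a transaction:
# commit on success, rollback if an exception is raised
with conn:
    conn.execute("INSERT INTO tasks (title) VALUES (?)", ("Buy groceries",))

row = conn.execute("SELECT title FROM tasks").fetchone()
print(row[0])  # Buy groceries
conn.close()
```

conn.execute() is a shortcut that creates a cursor for you, so the explicit cursor() step from the workflow above is optional.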

Parameterized Queries

SQL Injection — Never Use String Formatting

ALWAYS use ? placeholders. NEVER build SQL strings with f-strings, .format(), or + concatenation. String formatting lets attackers inject malicious SQL into your database. This is not theoretical—SQL injection is one of the most common security vulnerabilities in real applications.

# DANGEROUS — never do this
title = input("Task title: ")
cursor.execute(f"INSERT INTO tasks (title) VALUES ('{title}')")
# A user could type: '); DROP TABLE tasks; --
# And your entire table is deleted.

# SAFE — always do this
title = input("Task title: ")
cursor.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
# The ? placeholder handles escaping automatically.

The ? placeholder tells sqlite3 to safely escape the value. This one rule prevents an entire class of security vulnerabilities.

TaskForge Connection

Replace TaskForge's JSON file storage with SQLite. Here's the before and after:

Before: JSON File Storage

import json

def load_tasks(filepath):
    try:
        with open(filepath, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def save_tasks(tasks, filepath):
    with open(filepath, "w") as f:
        json.dump(tasks, f, indent=2)

def add_task(tasks, title, filepath):
    task = {"id": len(tasks) + 1, "title": title, "done": False}
    tasks.append(task)
    save_tasks(tasks, filepath)
    return task

After: SQLite Storage

import sqlite3

def get_connection():
    conn = sqlite3.connect("tasks.db")
    conn.row_factory = sqlite3.Row  # access columns by name
    return conn

def init_db():
    conn = get_connection()
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            done BOOLEAN DEFAULT 0
        )
    """)
    conn.commit()
    conn.close()

def add_task(title):
    conn = get_connection()
    conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
    conn.commit()
    conn.close()

def list_tasks():
    conn = get_connection()
    rows = conn.execute("SELECT * FROM tasks").fetchall()
    conn.close()
    return [dict(row) for row in rows]

def complete_task(task_id):
    conn = get_connection()
    conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
    conn.commit()
    conn.close()

The biggest difference: no more loading the entire file into memory, no more rewriting the whole file on every change. The database handles reads and writes efficiently, even with thousands of tasks.

Your Python code sends SQL through the sqlite3 module, which reads and writes a single database file. No server needed.

ORMs — A Preview

An ORM (Object-Relational Mapper) bridges the gap between Python objects and database rows. Instead of writing raw SQL, you define Python classes that map to tables:

# Raw SQL (what you learned above)
cursor.execute("SELECT * FROM tasks WHERE done = 0")
rows = cursor.fetchall()

# ORM style (conceptual — SQLAlchemy)
pending_tasks = Task.query.filter_by(done=False).all()

The most popular Python ORM is SQLAlchemy. We won't teach it in depth here—raw SQL gives you a better understanding of what ORMs do under the hood. But know that ORMs exist and that most production applications use them.

Common Misconceptions

"JSON Files Are Fine for Storage"

They work for simple cases, but they don't handle concurrent access (two processes writing at once corrupts the file), they require loading the entire dataset into memory, and they have no query capability—you can't ask "give me all incomplete tasks" without loading everything. Databases solve all three problems.
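The no-query-capability point is easy to see in code. A small sketch with an in-memory database: the filtering happens inside SQLite, and Python receives only the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, done INTEGER DEFAULT 0)")
conn.executemany(
    "INSERT INTO tasks (title, done) VALUES (?, ?)",
    [("Buy groceries", 0), ("Write tests", 1), ("Deploy API", 0)],
)

# With JSON you'd load every task into memory and filter in Python.
# With SQL, the database returns only the rows you asked for.
pending = conn.execute("SELECT title FROM tasks WHERE done = 0").fetchall()
print(pending)  # [('Buy groceries',), ('Deploy API',)]
conn.close()
```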

"SQL Is Hard"

The basics are 5 commands: CREATE, INSERT, SELECT, UPDATE, DELETE. You just learned them. The advanced features (joins, indexes, transactions) exist, but you can build real applications with just the basics.
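All five fit in one short script. A sketch against an in-memory database, so there's nothing to clean up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, done INTEGER DEFAULT 0)")  # CREATE
cur.execute("INSERT INTO tasks (title) VALUES (?)", ("Learn SQL",))                             # INSERT
cur.execute("UPDATE tasks SET done = 1 WHERE title = ?", ("Learn SQL",))                        # UPDATE
print(cur.execute("SELECT title, done FROM tasks").fetchone())  # SELECT -> ('Learn SQL', 1)
cur.execute("DELETE FROM tasks WHERE done = 1")                                                 # DELETE
print(cur.execute("SELECT COUNT(*) FROM tasks").fetchone()[0])  # 0
conn.close()
```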

Micro-Exercises

1: Create and Populate

Create a SQLite database with a tasks table and insert 3 tasks using the Python sqlite3 module.

import sqlite3

conn = sqlite3.connect("practice.db")
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE IF NOT EXISTS tasks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        done BOOLEAN DEFAULT 0
    )
""")

cursor.execute("INSERT INTO tasks (title) VALUES (?)", ("Buy groceries",))
cursor.execute("INSERT INTO tasks (title) VALUES (?)", ("Write tests",))
cursor.execute("INSERT INTO tasks (title) VALUES (?)", ("Deploy API",))
conn.commit()

# Verify
rows = cursor.execute("SELECT * FROM tasks").fetchall()
for row in rows:
    print(row)

conn.close()

2: Query Incomplete Tasks

Write a query that returns only incomplete tasks.

import sqlite3

conn = sqlite3.connect("practice.db")
cursor = conn.cursor()

cursor.execute("SELECT * FROM tasks WHERE done = 0")
incomplete = cursor.fetchall()
for task in incomplete:
    print(task)

conn.close()

Try This Now

Refactor TaskForge to use SQLite instead of JSON files. Create functions: init_db(), add_task(title), list_tasks(), complete_task(id).

taskforge_db.py

import sqlite3

DB_PATH = "taskforge.db"

def get_connection():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn

def init_db():
    conn = get_connection()
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            done BOOLEAN DEFAULT 0
        )
    """)
    conn.commit()
    conn.close()

def add_task(title):
    conn = get_connection()
    conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
    conn.commit()
    conn.close()

def list_tasks():
    conn = get_connection()
    rows = conn.execute("SELECT * FROM tasks").fetchall()
    conn.close()
    return [dict(row) for row in rows]

def complete_task(task_id):
    conn = get_connection()
    conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
    conn.commit()
    conn.close()

if __name__ == "__main__":
    init_db()
    add_task("Buy groceries")
    add_task("Write chapter 15")
    print("All tasks:", list_tasks())
    complete_task(1)
    print("After completing task 1:", list_tasks())

Verification: Tasks persist between program runs. Close and reopen—your tasks are still there. Run the script, then comment out the add_task lines and run again—list_tasks() still returns the previously added tasks.

If this doesn't work: (1) "database is locked" → make sure every connection gets closed with conn.close(); note that sqlite3's with conn: block commits or rolls back a transaction but does not close the connection. (2) If the table doesn't exist, call init_db() at startup. (3) sqlite3.OperationalError: no such table → you're connecting to a different file than the one where the table was created. Check DB_PATH.

You just replaced a fragile JSON file with a real database. TaskForge data now survives restarts, handles queries efficiently, and is protected against injection attacks. This is how every production application stores data.

Interactive Exercises

Knowledge Check

Which SQL statement retrieves data from a table?

Guided Exercise: "SQL in the Browser"

This exercise runs real SQLite in your browser. Create a 'tasks' table, insert 3 tasks, then query for incomplete ones.

Design Challenge: "TaskDB Class"

Build a TaskDB class backed by SQLite in-memory. Methods: add(title) returns the new task ID, complete(task_id) marks it done, search(keyword) finds tasks by title substring, stats() returns a dict with total, completed, and pending counts.

In __init__, create the table with id, title, and done columns.

For add, use cursor.lastrowid after INSERT to get the new ID.

For search, use WHERE title LIKE ? with f'%{keyword}%' as the parameter.

Practice: Interactive SQL

Complete Select Star SQL — a free interactive tutorial that teaches SQL using real data. It covers SELECT, WHERE, GROUP BY, JOIN, and subqueries. You can finish it in an afternoon.

Chapter 16 Docker and Containers

Why This Matters Now

Docker solves "works on my machine" forever. It's how professional teams ensure consistent environments, and it's how you'll run Claude Code agents in isolated containers in Phase 6.

What Is Docker?

Docker lets you package your application and everything it needs—Python version, libraries, system tools—into a single container that runs identically on any machine. Think of a shipping container: it doesn't matter whether the ship is going to Tokyo or Rotterdam, the box is the same and the contents arrive intact. Docker containers work the same way for software.

Containers are not virtual machines. A VM emulates an entire operating system with its own kernel—heavyweight and slow to start. A container shares the host OS kernel and only isolates the application layer. This makes containers lightweight (megabytes instead of gigabytes) and fast to start (seconds instead of minutes).

Key Concepts

Key Docker concepts and analogies
ConceptWhat It IsAnalogy
ImageA snapshot/blueprint of an environmentLike a class in Python—a template
ContainerA running instance of an imageLike an object—a live instance created from the class
DockerfileInstructions to build an imageLike a recipe—step-by-step build instructions
docker-composeA tool to run multiple containers togetherLike an orchestra conductor—coordinates multiple players

Installing Docker

Install Docker Desktop, which includes the Docker engine, CLI, and a GUI dashboard:

  • macOS: Download from docker.com/products/docker-desktop. Drag to Applications. Launch Docker Desktop—the whale icon appears in your menu bar.
  • Windows: Download the installer from the same URL. Enable WSL 2 backend during setup. Restart when prompted.

Verify the installation:

Terminal

docker --version
docker compose version

Your First Container

Run Docker's built-in test image to confirm everything works:

Terminal

docker run hello-world

Docker pulls the hello-world image from Docker Hub (a public registry of images), creates a container from it, runs it, and prints a success message. That's the entire workflow: pull, create, run.

Now try something more useful—a Python REPL inside a container:

Terminal

docker run -it python:3.12 python3

The -it flags give you an interactive terminal. You're now inside a Python 3.12 environment that's completely isolated from your host machine. Type exit() to leave.

Writing a Dockerfile

A Dockerfile is a text file that tells Docker how to build an image. Here's one for TaskForge:

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "taskforge.py"]

Each line is an instruction:

Dockerfile instructions explained line by line
LineWhat It Does
FROM python:3.12-slimStart from the official Python 3.12 image (slim variant, smaller size)
WORKDIR /appSet the working directory inside the container to /app
COPY requirements.txt .Copy the requirements file into the container first (for caching)
RUN pip install -r requirements.txtInstall Python dependencies inside the container
COPY . .Copy all project files into the container
CMD ["python", "taskforge.py"]Default command when the container starts

Why copy requirements.txt before everything else? Docker caches each layer. If your code changes but your dependencies don't, Docker reuses the cached dependency layer—making rebuilds much faster.

Building and Running

Build the image and run a container from it:

Terminal

# Build an image tagged "taskforge" from the current directory
docker build -t taskforge .

# Run a container from the image
docker run -it taskforge

The -t taskforge flag gives your image a name (tag). The . tells Docker to use the current directory as the build context (where to find the Dockerfile and files to copy).

Volumes: Persisting Data

Containers are ephemeral—when you stop a container, any data it created inside is lost. This is a problem for TaskForge's SQLite database. Volumes mount a host directory into the container so data survives restarts:

Terminal

docker run -it -v $(pwd)/data:/app/data taskforge

This maps ./data on your host to /app/data inside the container. The SQLite database file lives in this shared directory, so it persists even when the container stops.

Docker Compose Basics

When your project needs multiple services (a web API, a database, etc.), Docker Compose manages them all with a single file:

# docker-compose.yml
services:
  web:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - ./data:/app/data
    command: ["python", "-m", "flask", "run", "--host=0.0.0.0"]

Start everything with one command:

Terminal

# Start all services
docker compose up

# Stop all services
docker compose down

The ports mapping "5000:5000" means: forward port 5000 on your host to port 5000 in the container. This is how you access the Flask API from your browser.
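When you do add a second service, each one gets its own entry under services. Here's a hedged sketch, not part of TaskForge yet: the postgres:16 image, the password, and the pgdata volume name are illustrative placeholders.

```yaml
services:
  web:
    build: .
    ports:
      - "5000:5000"
    depends_on:
      - db        # start the database before the web API
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # placeholder; use a secret in practice
    volumes:
      - pgdata:/var/lib/postgresql/data   # named volume so data persists

volumes:
  pgdata:
```

One docker compose up now starts both containers and wires them onto the same network, so web can reach the database by the hostname db.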

Essential Commands

Essential Docker commands
CommandWhat It Does
docker build -t name .Build an image from a Dockerfile
docker run -it nameRun a container interactively
docker psList running containers
docker stop idStop a running container
docker rm idRemove a stopped container
docker logs idView container output/logs
docker exec -it id bashOpen a shell inside a running container

.dockerignore

Just like .gitignore tells git which files to skip, .dockerignore tells Docker which files to exclude from the build context. This keeps images small and avoids copying sensitive data:

.venv
__pycache__
.git
.env
*.pyc
.pytest_cache

Common Misconceptions

Misconception: "Docker Is Only for Deployment"

Docker is just as valuable during development. It ensures every developer on the team has the same environment—same Python version, same library versions, same system tools. No more "it works on my machine but not yours."

Misconception: "Containers Are Virtual Machines"

Containers share the host OS kernel. They don't emulate hardware or boot a full OS. This is why containers start in seconds while VMs take minutes, and why a container image is megabytes while a VM image is gigabytes.

TaskForge Connection

Containerize TaskForge. Write a Dockerfile, build an image, and run it. Verify that TaskForge works the same inside the container as it does on your host machine. This is the foundation for running Claude Code agents in Docker in Phase 6.

Diagram: three containers running on one Docker Engine atop the host operating system: a TaskForge API container (Python 3.12, Flask + SQLite), a PostgreSQL database container with a persistent volume, and a Claude agent container (Node.js + Claude Code CLI).
Containers are isolated but share the Docker Engine and host OS. Each container has its own filesystem, network, and process space—but they're lightweight because they share the kernel.

Micro-Exercises

1: Run Python in Docker

Run docker run -it python:3.12 python3 and execute print('Hello from Docker!') inside the container. Type exit() to leave.

2: Build a TaskForge Image

Write a Dockerfile for TaskForge and build it with docker build -t taskforge .. Verify with docker images that the image appears.

Try This Now

Create a docker-compose.yml that runs TaskForge's Flask API on port 5000 with a volume mount for the SQLite database:

docker-compose.yml

services:
  web:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - ./data:/app/data
    command: ["python", "-m", "flask", "run", "--host=0.0.0.0"]

Start it with docker compose up, then verify from another terminal:

Terminal

curl http://localhost:5000/tasks

Verification: TaskForge runs inside Docker, accessible from your host machine. Data persists across docker compose down and docker compose up.

If this doesn't work: (1) Port conflict → change "5000:5000" to "5001:5000" and use curl http://localhost:5001/tasks. (2) permission denied → make sure Docker Desktop is running. (3) Build fails → check that requirements.txt exists in your project root.

You just containerized an application so it runs identically anywhere—your laptop, a teammate's machine, or a cloud server. This is the infrastructure that makes isolated AI agent containers possible.

Interactive Exercises

Knowledge Check

What does FROM python:3.12 do in a Dockerfile?

Knowledge Check

What does -v $(pwd):/app do in a docker run command?

Docker Workflow

Chapter 17 CI/CD and GitHub Actions

Why This Matters Now

Every push to GitHub can automatically run your tests, check your code, and even deploy your app. CI/CD turns manual quality checks into automated pipelines. In Phase 5, you'll use CI-triggered agents that depend on this infrastructure.

What Is CI/CD?

CI (Continuous Integration) means automatically running tests every time code is pushed to a repository. Instead of remembering to run pytest before merging, the system does it for you—every single time, without fail.

CD (Continuous Deployment/Delivery) takes it further: after tests pass, the code is automatically deployed to production (deployment) or packaged and ready for a one-click release (delivery).

The core idea is simple: catch bugs before they reach production. If a test fails, the pipeline stops and notifies you. No broken code gets deployed. No "I forgot to run tests" disasters.

CI/CD terminology and pipeline stages
TermWhat It MeansWhen It Runs
CIRun tests and checks automaticallyOn every push or pull request
CD (Delivery)Package code, ready for manual deployAfter CI passes on main branch
CD (Deployment)Automatically deploy to productionAfter CI passes on main branch

GitHub Actions Basics

GitHub Actions is GitHub's built-in CI/CD system. It's free for public repos and has generous free-tier minutes for private repos. The key concepts:

GitHub Actions core concepts
ConceptWhat It IsAnalogy
WorkflowA YAML file defining your automation pipelineA recipe with ordered steps
TriggerThe event that starts a workflowThe "go" signal
JobA set of steps that run on the same machineOne cook at one station
StepA single command or action within a jobOne instruction in the recipe

Workflows live in .github/workflows/ inside your repository. Each workflow is a YAML file (a human-readable data format, similar to JSON but with indentation instead of braces).

Common triggers:

  • push — runs when code is pushed to specified branches
  • pull_request — runs when a PR is opened or updated
  • schedule — runs on a cron schedule (e.g., nightly tests)
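For instance, a nightly test run uses the schedule trigger with a cron expression. A small sketch (the 03:00 UTC time is an arbitrary choice):

```yaml
name: Nightly Tests
on:
  schedule:
    # fields: minute hour day-of-month month day-of-week (times are UTC)
    - cron: "0 3 * * *"
```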

Your First Workflow

Here's a complete, working GitHub Actions workflow that runs your tests on every push and pull request:

name: Run Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: python -m pytest

Let's walk through each line:

GitHub Actions workflow file explained line by line
LineWhat It Does
name: Run TestsA human-readable name that appears in the GitHub Actions UI
on: push: branches: [main]Trigger this workflow when code is pushed to the main branch
on: pull_request: branches: [main]Also trigger when a PR targets main
jobs: test:Define a job called test
runs-on: ubuntu-latestRun on a fresh Ubuntu virtual machine (GitHub provides this for free)
uses: actions/checkout@v4Check out your repository code into the VM—without this, the VM is empty
uses: actions/setup-python@v5Install the specified Python version on the VM
with: python-version: '3.12'Use Python 3.12 specifically
run: pip install -r requirements.txtInstall your project's dependencies
run: python -m pytestRun your test suite—if any test fails, the workflow fails

Save this as .github/workflows/test.yml in your repository. Push to GitHub, and the workflow runs automatically.

Adding More Checks

Tests alone aren't enough. Professional pipelines also check code quality:

Linting with ruff

Linting catches style issues, unused imports, and potential bugs without running the code. Add a linting step to your workflow:

      - run: pip install ruff
      - run: ruff check .

ruff is fast and catches many common Python mistakes. An alternative is flake8, which has been the standard for years.

Type Checking

If you use type hints in your Python code, tools like mypy can check them in CI. This is more advanced—for now, know that it exists and can be added as another step.
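If you want to try it, the steps mirror the ruff ones. A sketch assuming your package directory is named taskforge/:

```yaml
      - run: pip install mypy
      - run: mypy taskforge/
```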

Matrix Strategy: Multiple Python Versions

Your code might work on Python 3.12 but break on 3.11. A matrix strategy tests against multiple versions in parallel:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11', '3.12', '3.13']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements.txt
      - run: python -m pytest

This creates three parallel jobs, one for each Python version. If any fails, you know exactly which version has the problem.

Status Badges

Add a badge to your README.md that shows whether your tests are passing:

![Tests](https://github.com/YOUR-USERNAME/YOUR-REPO/actions/workflows/test.yml/badge.svg)

Replace YOUR-USERNAME and YOUR-REPO with your actual GitHub username and repository name. The badge turns green when tests pass and red when they fail—visible to anyone viewing the repo.

Secrets Management

Some workflows need API keys or tokens (e.g., for deployment or calling external services). Never hardcode credentials in your workflow files or source code.

GitHub provides Secrets—encrypted variables stored in your repository settings. To add one: go to your repo on GitHub, navigate to Settings, then Secrets and variables, then Actions, and click "New repository secret."

Reference secrets in your workflow with the ${{ secrets.NAME }} syntax:

      - run: python deploy.py
        env:
          API_KEY: ${{ secrets.API_KEY }}
          DATABASE_URL: ${{ secrets.DATABASE_URL }}

Never Print Secrets

GitHub automatically masks secrets in logs, but avoid echo $API_KEY or print(os.environ["API_KEY"]) in your workflow steps. If a secret leaks, rotate it immediately.

When Things Fail

Your workflow will fail. This is normal—it's doing its job by catching problems. Here's how to debug:

Reading the logs: On GitHub, go to the Actions tab, click the failed workflow run, click the failed job, and expand the failed step. The log shows the exact error and line number.

Common failures:

Common CI/CD pipeline failures and fixes
SymptomLikely CauseFix
ModuleNotFoundErrorMissing dependencyAdd the package to requirements.txt
Wrong Python version errorsCode uses features from a newer PythonMatch the python-version in your workflow to your development version
Tests pass locally but fail in CIDifferent OS, missing env vars, or hardcoded pathsUse os.path.join() instead of hardcoded paths; check all env vars are set
Permission deniedScript not executable or writing to protected pathAdd chmod +x step or write to a writable directory
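The "tests pass locally but fail in CI" row deserves a concrete example. A short sketch of the portable-path fix:

```python
import os
from pathlib import Path

# Hardcoded separators break when CI runs on a different OS
windows_only = "data\\tasks.db"   # fails on the Ubuntu CI runner

# Portable: let the standard library pick the separator
joined = os.path.join("data", "tasks.db")
as_path = Path("data") / "tasks.db"

print(joined)         # data/tasks.db on Linux/macOS, data\tasks.db on Windows
print(as_path.parts)  # ('data', 'tasks.db') on every OS
```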

Common Misconceptions

Misconception: "CI Is Only for Big Teams"

Even solo projects benefit from automated tests. You'll forget to run tests before pushing. You'll break something you didn't realize was connected. CI catches these mistakes automatically—it's a safety net for teams of one just as much as teams of fifty.

Misconception: "If Tests Pass Locally, They'll Pass in CI"

CI runs on a different operating system (usually Ubuntu), a potentially different Python version, and a clean environment with no leftover state. Environment variables you set locally don't exist in CI. Files outside your repo don't exist. This is exactly the point—CI proves your code works in a clean environment, not just on your machine.

TaskForge Connection

Add a GitHub Actions workflow to TaskForge that runs pytest on every push and pull request. Save it as .github/workflows/test.yml in your TaskForge repository. After pushing, visit the Actions tab on GitHub to see your tests run automatically. This is the same CI infrastructure that CI-triggered agents will use in Phase 5.

The CI/CD pipeline: every push triggers automated tests. If they pass, code can be deployed. If they fail, you're notified before anything breaks in production.

Micro-Exercises

1: Create a Workflow for TaskForge

Create the file .github/workflows/test.yml in your TaskForge repository with the following content:

name: Run Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: python -m pytest

2: Add a Linting Step

Add a step to your workflow that checks code formatting with ruff. Insert these lines after the pip install step and before the pytest step:

      - run: pip install ruff
      - run: ruff check .

Try This Now

Push TaskForge to GitHub (from Ch 12), then push the .github/workflows/test.yml you created. Go to the Actions tab on GitHub and watch the workflow run. Make a test fail intentionally, push again, and see the red X.

Terminal

# Ensure the workflows directory exists
mkdir -p .github/workflows

# Create the workflow file (or copy the one from Exercise 1)
# Then push to GitHub
git add .github/workflows/test.yml
git commit -m "Add CI workflow"
git push

# Now intentionally break a test
# Edit a test to assert something wrong, e.g.:
# assert 1 == 2
git add -A
git commit -m "Break a test intentionally"
git push

# Go to GitHub > Actions tab and watch the red X appear

Verification: The Actions tab shows a green checkmark on your first push. After breaking a test and pushing again, it shows a red X. You can see the test output in the logs.

If this doesn't work: (1) Workflow doesn't trigger → check the on: section matches your branch name (e.g., main vs master). (2) Python not found → ensure the actions/setup-python step is included. (3) Dependencies missing → ensure pip install -r requirements.txt runs before tests. (4) No Actions tab → make sure the repo is on GitHub, not just local.

You just automated your quality checks so they run on every push—the same infrastructure that CI-triggered agents use to review code, run tests, and guard production in professional workflows.

Interactive Exercises

Knowledge Check

What does on: push mean in a GitHub Actions workflow?

Knowledge Check

What does runs-on: ubuntu-latest specify?

CI/CD Setup

Chapter 18 Project Architecture

Why This Matters Now

You can now build APIs, persist data, containerize applications, and run automated tests. But as projects grow, the way you organize code matters as much as the code itself. Poor architecture turns a working project into an unmaintainable one. Good architecture lets you—and AI coding tools—navigate, extend, and refactor with confidence.

Why Architecture Matters

When TaskForge was a single file, you could hold the entire program in your head. Now it has a Flask API, database functions, tests, a Dockerfile, and a CI pipeline. If all of that lived in one file, you'd spend more time scrolling than coding.

Architecture is how you organize code into files, folders, and modules so that each piece has a clear purpose. Good architecture provides three things:

  • Findability—you can locate any piece of functionality quickly.
  • Changeability—you can modify one part without breaking others.
  • Readability—a new developer (or an AI agent reading your codebase) can understand the structure without reading every line.

Separation of Concerns

Separation of concerns means each file or module handles one responsibility. The API layer handles HTTP requests and responses. The database layer handles data storage and retrieval. The business logic handles the rules of your application. When these concerns are separated, a change to how you store data doesn't require changes to your API routes, and vice versa.

Think of it like a restaurant. The waiter takes orders (API layer), the kitchen cooks food (business logic), and the pantry stores ingredients (data layer). If you reorganize the pantry, the waiter doesn't need retraining.
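Here's the separation in miniature: a toy sketch with hypothetical function names (db_add, create_task, post_tasks) and storage reduced to a list. Each layer talks only to the one below it.

```python
# --- data layer: knows only how to store and fetch ---
_store = []

def db_add(title):
    task = {"id": len(_store) + 1, "title": title, "done": False}
    _store.append(task)
    return task

# --- business logic: knows only the rules ---
def create_task(title):
    if not title.strip():
        raise ValueError("title must not be empty")
    return db_add(title.strip())

# --- API layer: knows only request/response shapes (simplified to dicts) ---
def post_tasks(request_json):
    try:
        task = create_task(request_json.get("title", ""))
        return {"status": 201, "body": task}
    except ValueError as err:
        return {"status": 400, "body": {"error": str(err)}}

print(post_tasks({"title": "Buy groceries"})["status"])  # 201
print(post_tasks({"title": "   "})["status"])            # 400
```

Swapping the list for SQLite changes only db_add; the API layer never notices.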

Common Patterns: From Flat Scripts to Packages

Projects evolve through predictable stages as they grow:

Stage 1: Single File

Everything in one file. Fine for scripts under 100 lines.

# taskforge.py — everything in one file
tasks = []

def add_task(title):
    tasks.append({"title": title, "done": False})

def list_tasks():
    return tasks

if __name__ == "__main__":
    add_task("Buy groceries")
    print(list_tasks())

Stage 2: Multiple Modules

Split by responsibility. Each file does one thing.

# db.py — database operations
# api.py — Flask routes
# cli.py — command-line interface
# models.py — data structures

Stage 3: Package with Directory Structure

Group related modules into directories. Add configuration, tests, and infrastructure files at the top level.

Directory Structure Conventions

Here is a conventional Python project layout. You don't need to memorize it—just know the pattern exists so you can recognize it and follow it.

TaskForge Project Structure

taskforge/
    taskforge/              (Python package)
        __init__.py         (marks as package)
        api.py              (Flask routes)
        db.py               (database layer)
        cli.py              (CLI commands)
    tests/                  (test suite)
        test_api.py         (API tests)
        test_db.py          (database tests)
    Dockerfile              (container build)
    docker-compose.yml      (multi-container)
    requirements.txt        (dependencies)
    .gitignore              (git exclusions)
    .env                    (local config)
    .github/workflows/      (CI pipelines)
    README.md               (project docs)
A conventional Python project separates source code and tests from configuration and infrastructure files. Each file has a single, clear responsibility.

Key conventions:

  • Source code lives in a package directory (same name as the project) with an __init__.py file.
  • Tests live in a separate tests/ directory, mirroring the source structure.
  • Configuration files (Dockerfile, requirements.txt, .gitignore) live at the project root.
  • CI workflows live in .github/workflows/.

Configuration Management

Applications need configuration: database paths, API keys, debug flags. Hardcoding these values into your source code is fragile and insecure. Two approaches:

Environment Variables

Set configuration outside the code, in the environment where the program runs:

import os

DB_PATH = os.environ.get("TASKFORGE_DB", "taskforge.db")
DEBUG = os.environ.get("FLASK_DEBUG", "0") == "1"

os.environ.get() reads an environment variable with a fallback default. This means the same code works in development (using the default) and in production (using the environment variable set by Docker or CI).
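You can watch both modes from one script. A small sketch that sets the variable to simulate production, then removes it to simulate development:

```python
import os

# Production-style: the deploy system (Docker, CI) sets the variable
os.environ["TASKFORGE_DB"] = "/data/prod.db"
print(os.environ.get("TASKFORGE_DB", "taskforge.db"))  # /data/prod.db

# Development-style: nothing is set, so the default applies
del os.environ["TASKFORGE_DB"]
print(os.environ.get("TASKFORGE_DB", "taskforge.db"))  # taskforge.db
```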

The .env File

During development, you can store environment variables in a .env file at the project root:

# .env — local development configuration
TASKFORGE_DB=taskforge.db
FLASK_DEBUG=1

Add .env to your .gitignore so it never gets committed. Each developer has their own local .env. Production uses real environment variables set by the deployment system.
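Libraries such as python-dotenv read this file for you. As a rough sketch of what such a loader does under the hood (load_env is a hypothetical helper; real loaders also handle quoting and variable expansion):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Minimal sketch of a .env loader: read KEY=VALUE lines into
    os.environ, skipping comments and blank lines."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault: real environment variables win over .env values
        os.environ.setdefault(key.strip(), value.strip())
```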

Never Commit Secrets

API keys, database passwords, and tokens must never appear in source code or be committed to git. Use environment variables or secrets management (Chapter 17). If you accidentally commit a secret, rotate it immediately—even after deleting the file, the secret exists in git history forever.

The TaskForge Evolution

Here's how TaskForge grew across the phases, and why each structural change was made:

TaskForge project structure evolution across phases
PhaseStructureWhy
Phase 2Single file: taskforge.pyEverything in one place while learning functions, classes, and data structures.
Phase 3Added .gitignore, pushed to GitHubVersion control and collaboration require separating tracked from untracked files.
Phase 4Multiple files: api.py, db.py, cli.py, tests/, Dockerfile, .github/workflows/Each new capability (API, database, containers, CI) needs its own file. Mixing them would make every file enormous and every change risky.

The progression was natural: you split code when a file gets too long, when two parts change for different reasons, or when you need to test pieces independently. You never reorganize for the sake of it—you reorganize when the current structure creates friction.

When to Refactor

Refactoring means restructuring code without changing what it does. You refactor when:

  • A file is too long to navigate (rough threshold: 200+ lines).
  • Two parts of the code change for different reasons (e.g., API routes and database queries).
  • You find yourself scrolling past large blocks of code to reach the part you need.
  • Tests are hard to write because everything is tangled together.

You do not refactor when:

  • The current structure works and is easy to understand.
  • You're adding a feature and reorganizing at the same time (do one at a time).
  • You're copying a "best practice" structure for a 50-line script. Simple code deserves simple structure.

"You Need Perfect Architecture Before Writing Code"

You don't. Start with the simplest structure that works. Reorganize when you feel friction—not before. Over-engineering a directory structure for a 100-line script is wasted effort. Architecture should emerge from real needs, not theoretical ideals.

Common Misconceptions

"More Files Always Means Better Organization"

Splitting a 30-line module into 6 files of 5 lines each makes the project harder to understand, not easier. Each file should contain a meaningful, cohesive unit of functionality. If you can't describe what a file does in one sentence, it's either too big (split it) or too vague (merge it with something related).

TaskForge Connection

Take the TaskForge code you've built across Phases 2–4 and reorganize it into the conventional structure shown in the diagram above. Create a taskforge/ package directory with __init__.py, api.py, db.py, and cli.py. Move tests into a tests/ directory. Keep Dockerfile, docker-compose.yml, requirements.txt, and .github/workflows/ at the project root. Verify that python -m pytest and docker compose up still work after the reorganization.

Micro-Exercises

1: Identify the Layers

Look at the TaskForge code you've written so far. Identify which lines belong to the API layer, the database layer, and the CLI layer. Write a comment next to each section marking its layer.

# In your current taskforge files, label sections like this:

# --- API LAYER ---
@app.route("/tasks", methods=["GET"])
def get_tasks():
    ...

# --- DATABASE LAYER ---
def init_db():
    conn = sqlite3.connect(DB_PATH)
    ...

# --- CLI LAYER ---
if __name__ == "__main__":
    import sys
    command = sys.argv[1]
    ...

If you find all three layers in one file, that's a sign it's ready to be split.

2: Create the Package Structure

Create the directory structure for TaskForge. You don't need to move code yet—just create the empty files:

Terminal

# Create two folders: one for your code, one for tests
# -p means "create parent folders if needed" (no error if they exist)
mkdir -p taskforge tests

# "touch" creates an empty file (or updates its timestamp if it exists)
# __init__.py tells Python this folder is a package it can import from
touch taskforge/__init__.py

# One file per responsibility: API routes, database logic, CLI
touch taskforge/api.py
touch taskforge/db.py
touch taskforge/cli.py

# Matching test files — one per module
touch tests/test_api.py
touch tests/test_db.py

Run find . -name "*.py" to verify the structure looks correct.

Try This Now

Reorganize TaskForge into the package structure. Move database functions into taskforge/db.py, Flask routes into taskforge/api.py, and CLI logic into taskforge/cli.py. Update imports so everything still works.

taskforge/db.py

import sqlite3
import os

DB_PATH = os.environ.get("TASKFORGE_DB", "taskforge.db")

def get_connection():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn

def init_db():
    conn = get_connection()
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            done BOOLEAN DEFAULT 0
        )
    """)
    conn.commit()
    conn.close()

def add_task(title):
    conn = get_connection()
    conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
    conn.commit()
    conn.close()

def list_tasks():
    conn = get_connection()
    rows = conn.execute("SELECT * FROM tasks").fetchall()
    conn.close()
    return [dict(row) for row in rows]

def complete_task(task_id):
    conn = get_connection()
    conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
    conn.commit()
    conn.close()
taskforge/api.py

from flask import Flask, jsonify, request
from taskforge.db import init_db, add_task, list_tasks, complete_task

app = Flask(__name__)
init_db()

@app.route("/tasks", methods=["GET"])
def get_tasks():
    return jsonify(list_tasks()), 200

@app.route("/tasks", methods=["POST"])
def create_task():
    data = request.json
    if not data or "title" not in data:
        return jsonify({"error": "title is required"}), 400
    add_task(data["title"])
    return jsonify({"status": "created"}), 201

@app.route("/tasks/<int:task_id>/done", methods=["PUT"])
def mark_complete(task_id):
    complete_task(task_id)
    return jsonify({"status": "updated"}), 200
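The listings above cover db.py and api.py; cli.py is left to you. Here is a minimal sketch of what it could look like, assuming the CLI accepts add, list, and done subcommands (the command names are an assumption, not specified in the text). The try/except lets the sketch run on its own with in-memory stand-ins; inside the real package, the import from taskforge.db succeeds and the stand-ins are never defined.

```python
# taskforge/cli.py -- a minimal sketch; the add/list/done command names are assumptions.
import sys

try:
    # In the real project, the data layer comes from the db module:
    from taskforge.db import init_db, add_task, list_tasks, complete_task
except ImportError:
    # Stand-ins so this sketch also runs outside the package.
    _TASKS = []

    def init_db():
        _TASKS.clear()

    def add_task(title):
        _TASKS.append({"id": len(_TASKS) + 1, "title": title, "done": 0})

    def list_tasks():
        return list(_TASKS)

    def complete_task(task_id):
        for task in _TASKS:
            if task["id"] == task_id:
                task["done"] = 1


def main(argv):
    """Dispatch one command: add TITLE | list | done ID. Returns an exit code."""
    if not argv:
        print("usage: python -m taskforge.cli [add TITLE | list | done ID]")
        return 1
    command, args = argv[0], argv[1:]
    if command == "add" and args:
        add_task(" ".join(args))
    elif command == "list":
        for task in list_tasks():
            marker = "x" if task["done"] else " "
            print(f"[{marker}] {task['id']}: {task['title']}")
    elif command == "done" and args:
        complete_task(int(args[0]))
    else:
        print(f"unknown command: {command}")
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    init_db()
    sys.exit(main(sys.argv[1:]))
```

Note the dependency direction: cli.py imports from db.py, never the other way around, just like api.py.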

Verification: Run python -m pytest from the project root—all tests pass. Run flask --app taskforge.api run and test with curl http://localhost:5000/tasks—the API works. Run docker compose up—the container starts successfully.
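The verification step assumes you already have tests. A sketch of what tests/test_db.py could check against the db module above. In the project, the functions would come from `from taskforge.db import ...`; they are repeated inline here (same code as the db.py listing) only so the sketch runs on its own:

```python
import os
import sqlite3
import tempfile

# Point the code at a throwaway database before DB_PATH is read.
os.environ["TASKFORGE_DB"] = os.path.join(tempfile.mkdtemp(), "test.db")

# In the real project:
#   from taskforge.db import init_db, add_task, list_tasks, complete_task
DB_PATH = os.environ["TASKFORGE_DB"]

def get_connection():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn

def init_db():
    conn = get_connection()
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            done BOOLEAN DEFAULT 0
        )
    """)
    conn.commit()
    conn.close()

def add_task(title):
    conn = get_connection()
    conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
    conn.commit()
    conn.close()

def list_tasks():
    conn = get_connection()
    rows = conn.execute("SELECT * FROM tasks").fetchall()
    conn.close()
    return [dict(row) for row in rows]

def complete_task(task_id):
    conn = get_connection()
    conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
    conn.commit()
    conn.close()

# --- the tests: pytest discovers any function named test_* ---
def test_add_and_list():
    init_db()
    add_task("write tests")
    tasks = list_tasks()
    assert tasks[-1]["title"] == "write tests"
    assert tasks[-1]["done"] == 0

def test_complete():
    init_db()
    add_task("ship it")
    task_id = list_tasks()[-1]["id"]
    complete_task(task_id)
    done = {t["id"]: t["done"] for t in list_tasks()}
    assert done[task_id] == 1
```

Setting TASKFORGE_DB to a temporary path is why db.py reads the location from the environment: tests never touch your real taskforge.db.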

If this doesn't work: (1) ModuleNotFoundError: No module named 'taskforge' → make sure taskforge/__init__.py exists and you're running commands from the project root. (2) Circular imports → make sure db.py doesn't import from api.py. Dependencies should flow one direction: api.py imports from db.py, not the reverse. (3) Tests can't find modules → run pip install -e . to install your package in editable mode, or use python -m pytest from the root directory.
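The pip install -e . fix mentioned above requires packaging metadata at the project root. A minimal pyproject.toml sketch, assuming setuptools as the build backend (the name and version values are placeholders):

```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "taskforge"
version = "0.1.0"

[tool.setuptools]
packages = ["taskforge"]
```

With this file in place, an editable install makes `import taskforge` work from anywhere, including inside pytest, without path tricks.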

Architecture isn't about following rules—it's about making your future self's life easier. A well-structured project is one where every file has a clear purpose and every change has a predictable location.

Interactive Exercises

Knowledge Check

What is separation of concerns?

Design Challenge: "Refactor the Monolith"

This 30-line script mixes input parsing, validation, and output formatting. Refactor it into 3 functions: parse_input(raw), validate(data), and format_output(data). The main logic should just call these 3 functions.

parse_input should only split and strip — no validation.

validate should check the data dict and return error string or None.

format_output takes a validated dict and returns the formatted string.
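The original 30-line script isn't reproduced here, but the target shape can be sketched. A minimal version, assuming comma-separated "name, age" input (the exact input format is an assumption for illustration):

```python
def parse_input(raw):
    """Split and strip only -- no validation lives here."""
    parts = [p.strip() for p in raw.split(",")]
    return {
        "name": parts[0] if parts else "",
        "age": parts[1] if len(parts) > 1 else "",
    }

def validate(data):
    """Check the parsed dict; return an error string, or None if it's fine."""
    if not data["name"]:
        return "name is required"
    if not data["age"].isdigit():
        return "age must be a number"
    return None

def format_output(data):
    """Format an already-validated dict; assumes validate() returned None."""
    return f"{data['name']} is {data['age']} years old"

def main(raw):
    # The main logic is just the three calls, in order.
    data = parse_input(raw)
    error = validate(data)
    if error:
        return f"error: {error}"
    return format_output(data)
```

Each function now has one concern, so each can be tested in isolation: you can feed validate() a hand-built dict without ever touching parsing or formatting.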

Architecture Review

Phase 4 Gate Checkpoint & TaskForge Full Stack

Minimum Competency

Build a REST API with Flask, persist data with SQLite, containerize with Docker, and run automated tests via CI. Understand project directory structure and separation of concerns.

Your Artifact

TaskForge with: Flask REST API (GET/POST/PUT endpoints), SQLite database persistence, a Dockerfile, a docker-compose.yml, a GitHub Actions workflow that runs tests on push, and a clean project structure.

Verification

curl http://localhost:5000/tasks returns JSON. Data persists across server restarts. docker compose up runs TaskForge. GitHub Actions shows green checkmark.

Failure Signal

If your API returns HTML instead of JSON, or data disappears on restart, or Docker build fails → return to the specific chapter covering that topic.

TaskForge Checkpoint

TaskForge is now a full-stack application: CLI, REST API, database, container, CI pipeline. It's ready for data structures and algorithms in Phase 5.

What You Can Now Do

  • Consume and build REST APIs
  • Persist data with SQL databases
  • Containerize applications with Docker
  • Automate testing with CI/CD pipelines
  • Structure projects for maintainability
Bridge to Phase 5

You now have a full-stack application with professional infrastructure. Phase 5 dives into the data structures and algorithms that power everything you've built—how lists, hash tables, and trees work under the hood, and how to measure and compare their performance. Understanding these foundations makes you a stronger engineer, whether you're writing code by hand or evaluating AI-generated solutions.

Bridge Exercise: Human Code vs AI Code

You've been building TaskForge by hand. Now compare your approach to an AI-generated version of the same feature. This is exactly the skill you'll practice throughout Phase 5.

Spot the Differences

Below are two implementations of a get_task_stats function. One was written by a human following the patterns from this guide. The other was AI-generated. Read both, then write a function evaluate() that returns a dictionary with three keys: "missing_validation" (which version skips input validation—"human" or "ai"), "over_engineered" (which version adds unnecessary complexity), and "better_for_production" (your judgment).

Look at what each function assumes about its input. Does the human version handle missing keys? Does the AI version add fields nobody requested?

The human version uses t["status"] (crashes on missing key) and assumes only two states. The AI version imports datetime and adds a timestamp nobody asked for.
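Following those hints, one possible evaluate() looks like this. The "better_for_production" value is explicitly a judgment call, not an official answer:

```python
def evaluate():
    return {
        # The human version indexes t["status"] directly and crashes on a missing key.
        "missing_validation": "human",
        # The AI version imports datetime and adds a timestamp nobody requested.
        "over_engineered": "ai",
        # Judgment call: here we favor the simpler human version, on the grounds
        # that a missing .get() default is a one-line fix, while unrequested
        # fields tend to get depended on and become hard to remove.
        "better_for_production": "human",
    }
```

Whatever you choose for the third key, the habit to build is naming the specific evidence: which line skips validation, which import is unneeded.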