Building & Deploying
APIs, Flask, databases, Docker, CI/CD, and professional project structure. From consuming software to building and shipping it.
This phase assumes you can: navigate the terminal and use basic shell commands (Ch 10), create Git repositories, make commits, and push to GitHub (Ch 11-12), and write Python functions with error handling and tests (Ch 05, 09).
Chapter 13 APIs and HTTP
Almost every modern application talks to other applications over the internet. Weather apps fetch forecasts. Payment systems process charges. AI tools send prompts and receive completions. The language they all speak is HTTP, and the conversation pattern they follow is called an API. Understanding APIs is how you go from writing code that runs on your machine to writing code that connects to the world.
What Is an API?
An API (Application Programming Interface) is a contract between two programs. One program says "send me a request in this format, and I'll send you a response in that format." The API defines the rules—what you can ask for, how to ask, and what you'll get back.
You've already used APIs without knowing it. When you call len([1, 2, 3]) in Python, you're using a function API—you pass a list, it returns the length. Web APIs work the same way, except the function lives on a remote server and the call travels over the internet.
Think of a restaurant analogy. You (the client) read the menu (the API documentation), place an order with the waiter (send a request), the kitchen (the server) prepares your food, and the waiter brings it back (the response). You never walk into the kitchen. The menu is the contract.
HTTP: The Language of the Web
HTTP (HyperText Transfer Protocol) is the protocol that clients and servers use to communicate. Every time you visit a website, your browser sends an HTTP request and receives an HTTP response. APIs use the same protocol.
An HTTP exchange has two parts:
- Request—sent by the client. Contains a method, a URL, headers, and optionally a body.
- Response—sent by the server. Contains a status code, headers, and a body.
HTTP Methods
The method tells the server what you want to do. There are four you need to know:
| Method | Purpose | Example |
|---|---|---|
| GET | Read/retrieve data | Get a list of tasks |
| POST | Create new data | Add a new task |
| PUT | Update existing data | Mark a task as done |
| DELETE | Remove data | Delete a task |
GET and DELETE typically don't send a body. POST and PUT send data in the request body (usually JSON).
Status Codes
Every HTTP response includes a status code—a three-digit number that tells the client what happened. You don't need to memorize all of them, just these four groups:
| Range | Meaning | Common Codes |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created |
| 3xx | Redirect | 301 Moved Permanently |
| 4xx | Client error (you made a mistake) | 400 Bad Request, 404 Not Found |
| 5xx | Server error (they have a problem) | 500 Internal Server Error |
When something goes wrong, the status code is the first thing to check. A 404 means the URL is wrong. A 401 means you're not authenticated. A 500 means the server is broken—not your fault.
JSON: The Data Format
APIs need a shared format for sending data. The standard is JSON (JavaScript Object Notation). You already know JSON—it looks exactly like Python dictionaries and lists:
{
  "id": 1,
  "title": "Buy groceries",
  "done": false
}
Python's json module converts between JSON strings and Python objects. But when using the requests library (below), this conversion happens automatically.
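A quick round-trip with the standard-library json module shows the conversion in both directions:

```python
import json

# JSON string -> Python objects (json.loads)
task = json.loads('{"id": 1, "title": "Buy groceries", "done": false}')
print(task["title"])  # Buy groceries
print(task["done"])   # False (JSON's false becomes Python's False)

# Python dict -> JSON string (json.dumps)
text = json.dumps({"done": True})
print(text)  # {"done": true}
```

Note the translation: JSON's `false`/`true`/`null` become Python's `False`/`True`/`None`, and vice versa.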
Headers
Headers are metadata attached to requests and responses. They carry information about the message itself, not the data. Two headers you'll see constantly:
- Content-Type: application/json — tells the server "I'm sending you JSON."
- Authorization: Bearer <token> — proves your identity (like a password).
You rarely set headers manually—tools like the requests library handle most of them for you.
Using Python's requests Library
Python's requests library makes HTTP calls simple. It's not part of the standard library, so you need to install it:
requests (Python, free). The most popular HTTP library for Python. Install inside your virtual environment: pip install requests.
Making a GET Request
import requests
response = requests.get("https://api.github.com/users/octocat")
print(response.status_code) # 200
print(response.json()) # Python dict with user data
requests.get() sends a GET request to the URL. The response object gives you the status code with .status_code and the parsed JSON body with .json().
Making a POST Request
import requests
data = {"title": "Buy groceries", "done": False}
response = requests.post("http://localhost:5000/tasks", json=data)
print(response.status_code) # 201
print(response.json()) # {"id": 1, "title": "Buy groceries", "done": false}
The json=data parameter automatically converts the Python dict to JSON and sets the Content-Type header for you.
All Four Methods
import requests
BASE = "http://localhost:5000"
# GET - read
r = requests.get(f"{BASE}/tasks")
# POST - create
r = requests.post(f"{BASE}/tasks", json={"title": "New task"})
# PUT - update
r = requests.put(f"{BASE}/tasks/1/done")
# DELETE - remove
r = requests.delete(f"{BASE}/tasks/1")
Consuming a Public API
Let's try a real public API. The GitHub API is free and requires no authentication for basic requests:
import requests
# Get public info about a GitHub user
response = requests.get("https://api.github.com/users/octocat")
if response.status_code == 200:
    user = response.json()
    print(f"Name: {user['name']}")
    print(f"Public repos: {user['public_repos']}")
    print(f"Followers: {user['followers']}")
else:
    print(f"Error: {response.status_code}")
Always check the status code before using the response data. A 200 means success. Anything else means something went wrong, and response.json() might not contain what you expect.
Common Misconceptions
"APIs are only for web developers." Not true. APIs are how programs talk to each other. Data scientists use APIs to fetch datasets. DevOps engineers use APIs to manage cloud infrastructure. AI agents use APIs to call language models. If you write code that interacts with any external service, you're using an API.
"GET and POST are interchangeable." They're not. GET retrieves data and should never modify anything on the server. POST creates new data. Using the wrong method confuses other developers and breaks tooling that relies on these conventions. The method communicates intent.
TaskForge Connection
Right now TaskForge is a CLI tool—you interact with it from the terminal. But what if a web frontend, a mobile app, or an AI agent wants to manage tasks? They would need an API. In Chapter 14, you'll build exactly that: a Flask web API for TaskForge. But first, you'll practice consuming an API with Python's requests library so you understand both sides of the conversation.
Micro-Exercises
Use requests to fetch information about any GitHub user and print their name and number of public repos.
import requests
response = requests.get("https://api.github.com/users/octocat")
data = response.json()
print(f"Name: {data['name']}")
print(f"Repos: {data['public_repos']}")
Try replacing "octocat" with your own GitHub username.
Request a URL that doesn't exist and confirm you get a 404:
import requests
response = requests.get("https://api.github.com/users/this-user-does-not-exist-999999")
print(response.status_code) # 404
Now try https://api.github.com/ (the root) and confirm you get a 200.
Write a Python script that fetches the 5 most recent public repositories for a GitHub user and prints their names and star counts.
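One possible solution sketch. It uses GitHub's public `/users/<name>/repos` endpoint with the `sort` and `per_page` query parameters; the `top_repos` helper name is our own choice:

```python
def top_repos(repos, n=5):
    """Pure helper: (name, stars) pairs for the first n repo objects."""
    return [(repo["name"], repo["stargazers_count"]) for repo in repos[:n]]

if __name__ == "__main__":
    import requests  # needs: pip install requests

    username = "octocat"  # swap in any GitHub username
    try:
        response = requests.get(
            f"https://api.github.com/users/{username}/repos",
            params={"sort": "created", "per_page": 5},  # five newest repos
            timeout=10,
        )
    except requests.RequestException as exc:
        response = None
        print(f"Request failed: {exc}")

    if response is not None and response.status_code == 200:
        for name, stars in top_repos(response.json()):
            print(f"{name}: {stars} stars")
    elif response is not None:
        print(f"Error: {response.status_code}")
```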
Verification: Running the script prints 5 repository names with their star counts. Change the username variable and run again—you should see different results.
If this doesn't work: (1) ModuleNotFoundError: No module named 'requests' → make sure your virtual environment is activated and run pip install requests. (2) ConnectionError → check your internet connection. (3) Status code 403 → you've hit the GitHub API rate limit (60 requests/hour for unauthenticated users). Wait a few minutes and try again.
Interactive Exercises
Knowledge Check
Which HTTP method is used to create a new resource?
Knowledge Check
What does a 404 status code mean?
Design Challenge: "Parse API Response"
Write extract_repos(json_str) that parses a JSON string of GitHub repos and returns a list of (name, stars) tuples, sorted by stars descending.
Use json.loads() to parse the string into a Python list.
Use a list comprehension to extract (name, stargazers_count) tuples.
Sort with sorted(items, key=lambda x: x[1], reverse=True).
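Assembled from the hints above, one possible sketch of extract_repos:

```python
import json

def extract_repos(json_str):
    """Parse a JSON array of repo objects into (name, stars) tuples, most-starred first."""
    repos = json.loads(json_str)
    pairs = [(repo["name"], repo["stargazers_count"]) for repo in repos]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

sample = '[{"name": "a", "stargazers_count": 2}, {"name": "b", "stargazers_count": 7}]'
print(extract_repos(sample))  # [('b', 7), ('a', 2)]
```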
Optional: Web Scraping with Python
Web scraping extends the HTTP skills from this chapter but is not required for later chapters. Skip it if you want to move forward; come back when you need to extract data from web pages.
Web scraping means making HTTP requests to web pages and extracting data from the HTML. The difference from APIs: APIs return structured JSON, web pages return HTML that you parse.
When to Scrape vs. When to Use an API
If the site has an API, use the API — it's faster, more reliable, and more polite. Scrape only when there's no API and the data is publicly accessible.
Respect robots.txt, rate-limit your requests with time.sleep(), and check terms of service. Scraping is a tool, not a right.
Basics with BeautifulSoup
Note: BeautifulSoup (pip install beautifulsoup4) is an optional tool not required for the exercises. The exercise below uses Python’s built-in html.parser instead.
import requests
from bs4 import BeautifulSoup
# Fetch the page
response = requests.get("https://example.com/data")
soup = BeautifulSoup(response.text, "html.parser")
# Find elements
title = soup.find("h1").text
links = [a["href"] for a in soup.find_all("a")]
rows = soup.select("table tr") # CSS selector
Common patterns:
- soup.find("tag") — first matching element
- soup.find_all("tag") — all matching elements
- soup.select(".class") — CSS selector
- tag.text or tag.get_text(strip=True) — extract text
- tag["href"] — extract attribute
Exercise: Parse HTML Data
Given an HTML string, extract all link texts and URLs into a list of dicts. Uses Python's built-in html.parser since BeautifulSoup isn't available in the browser.
Inside handle_starttag, check if tag == "a". If so, set self.in_a = True and reset self.current_text = "".
Loop through attrs with for name, value in attrs: and check if name == "href": to grab the URL.
Full solution for the start tag: when tag == "a", set self.in_a = True and self.current_text = ""; then loop over attrs and set self.current_href = value whenever name == "href".
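Putting the hints together, a sketch using the standard-library HTMLParser (the class name and the in_a/current_text attributes follow the naming in the hints):

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect {'text': ..., 'url': ...} dicts for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.in_a = False
        self.current_text = ""
        self.current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_a = True
            self.current_text = ""
            self.current_href = None
            for name, value in attrs:  # attrs is a list of (name, value) pairs
                if name == "href":
                    self.current_href = value

    def handle_data(self, data):
        if self.in_a:
            self.current_text += data

    def handle_endtag(self, tag):
        if tag == "a":
            self.links.append({"text": self.current_text.strip(),
                               "url": self.current_href})
            self.in_a = False

parser = LinkParser()
parser.feed('<p><a href="https://example.com">Example</a></p>')
print(parser.links)  # [{'text': 'Example', 'url': 'https://example.com'}]
```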
Chapter 14 Building Web APIs with Flask
Chapter 13 taught you to consume APIs. Now you build one. This is the bridge from user to creator—and it's what the Phase 4 Gate requires.
What Is Flask?
Flask is a minimal Python web framework. It lets you turn Python functions into API endpoints with just a few lines of code. Unlike larger frameworks (Django), Flask gives you only what you need and gets out of the way.
Flask (Python, free). Minimal web framework for building APIs and web apps. Install: pip install flask.
Your First Flask App
A complete Flask app in just a few lines:
from flask import Flask
app = Flask(__name__)
@app.route("/")
def hello():
    return "Hello, World!"

if __name__ == "__main__":
    app.run(debug=True)
Save as app.py and run with python3 app.py. Open http://localhost:5000 in your browser. You just built a web server.
@app.route("/") is a decorator—it tells Flask which URL triggers which function. debug=True auto-reloads on code changes and shows helpful error pages.
Routes and Methods
Each endpoint is a URL + HTTP method combination. Flask defaults to GET only. To accept other methods, specify them explicitly:
@app.route("/tasks", methods=["GET", "POST"])
def handle_tasks():
    if request.method == "GET":
        # return all tasks
        pass
    elif request.method == "POST":
        # create a new task
        pass
You can also use separate functions for each method, which is cleaner:
@app.route("/tasks", methods=["GET"])
def get_tasks():
    # return all tasks
    pass

@app.route("/tasks", methods=["POST"])
def create_task():
    # create a new task
    pass
Returning JSON
APIs return JSON, not HTML. Flask provides jsonify() to convert Python dicts to proper JSON responses with the correct Content-Type header:
from flask import Flask, jsonify
app = Flask(__name__)

@app.route("/status")
def status():
    return jsonify({"status": "running", "version": "1.0"}), 200
The second value (200) is the HTTP status code. Common codes:
| Code | Meaning | When to Use |
|---|---|---|
| 200 | OK | Successful GET or PUT |
| 201 | Created | Successful POST that creates a resource |
| 400 | Bad Request | Client sent invalid data |
| 404 | Not Found | Resource doesn't exist |
| 500 | Server Error | Something broke on your end |
Reading Request Data
When a client sends data to your API, you need to read it. Flask provides two main ways:
from flask import Flask, request, jsonify
# JSON body (for POST/PUT requests)
@app.route("/tasks", methods=["POST"])
def create_task():
    data = request.json  # parse JSON body
    title = data.get("title")  # safely get a field
    return jsonify({"title": title}), 201

# Query parameters (for GET requests)
# GET /tasks?status=pending
@app.route("/tasks", methods=["GET"])
def get_tasks():
    status = request.args.get("status", "all")  # default to "all"
    return jsonify({"filter": status})
request.json parses the JSON body sent by the client. request.args reads URL query parameters (everything after the ?).
Building a TaskForge API
Let's build a complete REST API for TaskForge, step by step. We'll use an in-memory list to store tasks—no database needed yet.
Step 1: Setup and Data Store
from flask import Flask, jsonify, request
app = Flask(__name__)
# In-memory task storage
tasks = []
next_id = 1
Step 2: GET /tasks — List All Tasks
@app.route("/tasks", methods=["GET"])
def get_tasks():
    return jsonify(tasks), 200
This returns the entire task list as a JSON array. Simple and direct.
Step 3: POST /tasks — Add a Task
@app.route("/tasks", methods=["POST"])
def create_task():
    global next_id
    data = request.json
    if not data or "title" not in data:
        return jsonify({"error": "title is required"}), 400
    task = {
        "id": next_id,
        "title": data["title"],
        "done": False
    }
    tasks.append(task)
    next_id += 1
    return jsonify(task), 201
Notice the validation: if the client doesn't send a title, we return a 400 Bad Request with a clear error message. Never trust client input.
Step 4: PUT /tasks/<id>/done — Mark Task Complete
@app.route("/tasks/<int:task_id>/done", methods=["PUT"])
def complete_task(task_id):
    for task in tasks:
        if task["id"] == task_id:
            task["done"] = True
            return jsonify(task), 200
    return jsonify({"error": "task not found"}), 404
<int:task_id> is a URL variable—Flask extracts the number from the URL and passes it as a function parameter. If no task matches, we return 404.
Step 5: Run It
if __name__ == "__main__":
    app.run(debug=True)
Testing Your API
With your Flask app running in one terminal, open another terminal and test with curl:
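For example, assuming the API from the steps above is running on port 5000:

```shell
# List all tasks (GET)
curl http://localhost:5000/tasks

# Add a task (POST with a JSON body)
curl -X POST -H "Content-Type: application/json" \
     -d '{"title": "Buy groceries"}' \
     http://localhost:5000/tasks

# Mark task 1 as done (PUT)
curl -X PUT http://localhost:5000/tasks/1/done
```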
You can also test with Python requests (from Chapter 13):
import requests
# Add a task
r = requests.post("http://localhost:5000/tasks",
                  json={"title": "Test from Python"})
print(r.status_code) # 201
print(r.json()) # {"id": 1, "title": "Test from Python", "done": false}
# List all tasks
r = requests.get("http://localhost:5000/tasks")
print(r.json()) # [{"id": 1, ...}]
Error Handling in APIs
A good API returns clear error responses so the client knows what went wrong and how to fix it:
# Bad request: missing required field
@app.route("/tasks", methods=["POST"])
def create_task():
    data = request.json
    if not data or "title" not in data:
        return jsonify({"error": "title is required"}), 400
    if not isinstance(data["title"], str) or len(data["title"].strip()) == 0:
        return jsonify({"error": "title must be a non-empty string"}), 400
    # ... create the task

# Not found
@app.route("/tasks/<int:task_id>/done", methods=["PUT"])
def complete_task(task_id):
    for task in tasks:
        if task["id"] == task_id:
            task["done"] = True
            return jsonify(task), 200
    return jsonify({"error": f"task {task_id} not found"}), 404

# Catch unexpected errors
@app.errorhandler(500)
def internal_error(e):
    return jsonify({"error": "internal server error"}), 500
The pattern: always return JSON with an "error" key and an appropriate status code. Never return a bare string or an HTML error page from an API.
Common Misconceptions
"Flask is only for toy projects." Flask scales. Instagram, Pinterest, and Netflix have used Flask in production. The difference between a toy project and a production app is architecture, not the framework. Flask gives you the flexibility to add complexity only when you need it.
"You need a real database from day one." Not to start. Our TaskForge API uses a Python list—perfectly fine for learning and prototyping. Data disappears when you restart the server, but that's a problem you'll solve in Chapter 15 with SQLite. Start simple, add complexity when you need it.
TaskForge Connection
You just turned TaskForge from a CLI tool into a web API. Any program—a web frontend, a mobile app, another Python script, or an AI agent—can now create and manage tasks by making HTTP requests. In the Phase 4 Gate, this API is part of the required artifact.
Micro-Exercises
Create a Flask app with a single GET endpoint that returns {"message": "Hello, World!"}.
from flask import Flask, jsonify
app = Flask(__name__)
@app.route("/")
def hello():
    return jsonify({"message": "Hello, World!"})

if __name__ == "__main__":
    app.run(debug=True)
Run it and visit http://localhost:5000 in your browser. You should see the JSON response.
Add a POST endpoint that accepts a name in JSON and returns a greeting.
from flask import Flask, jsonify, request
app = Flask(__name__)
@app.route("/greet", methods=["POST"])
def greet():
    data = request.json
    name = data.get("name", "stranger")
    return jsonify({"greeting": f"Hello, {name}!"}), 200

if __name__ == "__main__":
    app.run(debug=True)
Test: curl -X POST -H "Content-Type: application/json" -d '{"name": "Alice"}' http://localhost:5000/greet
Build a complete REST API for TaskForge with GET /tasks, POST /tasks, and PUT /tasks/<id>/done endpoints. Test each with curl.
Verification: curl http://localhost:5000/tasks returns a JSON array. curl -X POST -H "Content-Type: application/json" -d '{"title":"Test"}' http://localhost:5000/tasks creates a task.
If this doesn't work: (1) Port already in use → kill the existing process or use app.run(port=5001). (2) Import errors → make sure Flask is installed in your venv (pip install flask). (3) curl returns HTML instead of JSON → make sure you're using jsonify(), not returning a plain string.
Interactive Exercises
Knowledge Check
What decorator makes a function handle GET requests to /tasks?
Knowledge Check
What status code should a successful POST request return when creating a resource?
Test Your API Endpoints
Below is a simplified TaskForge API with get_tasks() and add_task(). Write 4 test functions that verify: (1) GET returns an empty list initially, (2) POST adds a task and returns 201, (3) POST without a title returns 400, (4) GET after adding returns the task. The test helpers get(path) and post(path, data) are provided.
test_get_empty: call get("/tasks") and assert the status is 200 and the json is [].
test_add_task: call post("/tasks", {"title": "Test"}) and assert status is 201 and the returned json has the title.
test_add_without_title: call post("/tasks", {}) and assert status is 400.
Design Challenge: "Request Validator"
Write validate_task(data) that validates a task dict: 'title' is required (non-empty string), 'priority' is optional (must be 'high', 'medium', or 'low'), 'due_date' is optional (must match YYYY-MM-DD). Return a list of error strings, or empty list if valid.
Check if 'title' key exists AND is a non-empty string.
For priority, use if 'priority' in data and data['priority'] not in ('high', 'medium', 'low').
For due_date, try datetime.strptime(data['due_date'], '%Y-%m-%d') in a try/except. Import datetime at top.
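A possible solution sketch built from the hints:

```python
from datetime import datetime

def validate_task(data):
    """Return a list of validation errors; an empty list means the task is valid."""
    errors = []

    # 'title' is required and must be a non-empty string
    title = data.get("title")
    if not isinstance(title, str) or not title.strip():
        errors.append("title is required and must be a non-empty string")

    # 'priority' is optional but restricted to three values
    if "priority" in data and data["priority"] not in ("high", "medium", "low"):
        errors.append("priority must be 'high', 'medium', or 'low'")

    # 'due_date' is optional but must match YYYY-MM-DD
    if "due_date" in data:
        try:
            datetime.strptime(data["due_date"], "%Y-%m-%d")
        except (ValueError, TypeError):
            errors.append("due_date must match YYYY-MM-DD")

    return errors

print(validate_task({"title": "Ship the API", "priority": "high"}))  # []
print(validate_task({"priority": "urgent"}))  # two error strings
```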
Connecting a Frontend to Your API
You built an API. Now let's connect it to a web page. You need just enough HTML, CSS, and JavaScript to consume your own API — this is minimum viable frontend, not a full web development course.
HTML in 5 Minutes
HTML is a tree of nested tags that describe a document's structure:
<!DOCTYPE html>
<html>
<head>
<title>TaskForge</title>
</head>
<body>
<h1>My Tasks</h1>
<ul id="task-list"></ul>
<input id="new-task" type="text" placeholder="New task...">
<button id="add-btn">Add Task</button>
<script src="app.js"></script>
</body>
</html>
AI generates HTML constantly. You need to read it, not master it. Focus on understanding the tree structure: which tags are parents, which are children, and what id and class attributes do.
CSS in 2 Minutes
CSS controls how your HTML looks — colors, layout, spacing, fonts. AI generates CSS well, so you mainly need to read it rather than write it from scratch. The key concept: a <link rel='stylesheet' href='style.css'> tag in your HTML imports a CSS file that styles elements by matching selectors to tags, classes, and IDs.
JavaScript fetch() in 5 Minutes
fetch() calls your API from the browser. Here's a complete example that talks to a TaskForge Flask API:
// GET: Fetch all tasks and display them
fetch("http://localhost:5000/api/tasks")
  .then(response => response.json())
  .then(tasks => {
    const list = document.getElementById("task-list");
    list.innerHTML = "";
    tasks.forEach(task => {
      const li = document.createElement("li");
      li.textContent = task.title;
      list.appendChild(li);
    });
  });

// POST: Add a new task
function addTask(title) {
  fetch("http://localhost:5000/api/tasks", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ title: title, status: "todo" })
  })
    .then(response => response.json())
    .then(task => console.log("Created:", task));
}
This section teaches the minimum to consume an API. If you want to build real frontends, start with MDN Web Docs — the authoritative reference for HTML, CSS, and JavaScript.
Chapter 15 Database Basics
TaskForge currently loses all data when you close it. Databases solve this permanently. Every real application uses one.
What Is a Database?
A database is structured storage that persists beyond program execution. When you close your Python script, variables disappear. When you close a program backed by a database, the data stays. Every real application—from TaskForge to GitHub to the Anthropic API—stores its data in a database.
SQL vs NoSQL
There are two major families of databases. For this course, we focus on SQL.
| Feature | SQL | NoSQL |
|---|---|---|
| Structure | Tables with rows and columns | Flexible documents or key-value pairs |
| Schema | Strict—define columns before inserting data | Flexible—each document can differ |
| Query Language | SQL (Structured Query Language) | Varies by database |
| Examples | PostgreSQL, SQLite, MySQL | MongoDB, Redis, DynamoDB |
| Best For | Structured data with relationships | Rapidly changing schemas, caching |
SQLite — Your First Database
We start with SQLite because it removes every barrier to getting started:
- No server—it's a library, not a service. No installation, no configuration, no passwords.
- Built into Python—import sqlite3 works out of the box. No pip install required.
- File-based—your entire database is a single file (tasks.db). Copy it, back it up, delete it—just like any file.
SQLite is not a toy. It's used in production by every iPhone, every Android phone, every web browser, and every copy of Python. For learning and for single-user applications like TaskForge, it's the right choice.
Basic SQL
SQL is a language for talking to databases. The core is just 5 commands:
CREATE TABLE — Define Your Structure
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    done BOOLEAN DEFAULT 0
);
This creates a table with three columns. PRIMARY KEY AUTOINCREMENT means each task gets a unique, auto-assigned ID. NOT NULL means the title is required. DEFAULT 0 means new tasks start as not done.
INSERT — Add Data
INSERT INTO tasks (title, done) VALUES ('Buy groceries', 0);
INSERT INTO tasks (title, done) VALUES ('Write tests', 0);
INSERT INTO tasks (title, done) VALUES ('Deploy API', 1);
SELECT — Read Data
-- Get all tasks
SELECT * FROM tasks;
-- Get only incomplete tasks
SELECT * FROM tasks WHERE done = 0;
-- Get just titles
SELECT title FROM tasks WHERE done = 0;
UPDATE — Modify Data
UPDATE tasks SET done = 1 WHERE id = 1;
Always include a WHERE clause. Without it, you update every row.
DELETE — Remove Data
DELETE FROM tasks WHERE id = 1;
Same rule: always use WHERE. DELETE FROM tasks with no WHERE deletes everything.
Python's sqlite3 Module
Python includes sqlite3 in the standard library. Here's the workflow:
import sqlite3
# 1. Connect to database (creates file if it doesn't exist)
conn = sqlite3.connect("tasks.db")
# 2. Create a cursor (the object that executes SQL)
cursor = conn.cursor()
# 3. Execute SQL
cursor.execute("""
    CREATE TABLE IF NOT EXISTS tasks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        done BOOLEAN DEFAULT 0
    )
""")
# 4. Insert data
cursor.execute("INSERT INTO tasks (title, done) VALUES (?, ?)", ("Buy groceries", 0))
# 5. Commit changes (saves to disk)
conn.commit()
# 6. Query data
cursor.execute("SELECT * FROM tasks")
rows = cursor.fetchall()
for row in rows:
    print(row)  # (1, 'Buy groceries', 0)
# 7. Close the connection
conn.close()
Key functions: connect() opens the database, cursor() creates an executor, execute() runs SQL, fetchall() retrieves results, commit() saves changes, close() cleans up.
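One refinement worth knowing: a sqlite3 connection also works as a context manager, committing automatically on success and rolling back if the block raises (note that it does not close the connection). A small sketch using a throwaway in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# Inside the with-block, changes are committed on success
# and rolled back if an exception is raised.
with conn:
    conn.execute("INSERT INTO tasks (title) VALUES (?)", ("Buy groceries",))

rows = conn.execute("SELECT title FROM tasks").fetchall()
print(rows)  # [('Buy groceries',)]
conn.close()  # still needed: the with-block does not close the connection
```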
Parameterized Queries
ALWAYS use ? placeholders. NEVER build SQL strings with f-strings, .format(), or + concatenation. String formatting lets attackers inject malicious SQL into your database. This is not theoretical—SQL injection is one of the most common security vulnerabilities in real applications.
# DANGEROUS — never do this
title = input("Task title: ")
cursor.execute(f"INSERT INTO tasks (title) VALUES ('{title}')")
# A user could type: '); DROP TABLE tasks; --
# And your entire table is deleted.
# SAFE — always do this
title = input("Task title: ")
cursor.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
# The ? placeholder handles escaping automatically.
The ? placeholder tells sqlite3 to safely escape the value. This one rule prevents an entire class of security vulnerabilities.
TaskForge Connection
Replace TaskForge's JSON file storage with SQLite. Here's the before and after:
Before: JSON File Storage
import json
def load_tasks(filepath):
    try:
        with open(filepath, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def save_tasks(tasks, filepath):
    with open(filepath, "w") as f:
        json.dump(tasks, f, indent=2)

def add_task(tasks, title, filepath):
    task = {"id": len(tasks) + 1, "title": title, "done": False}
    tasks.append(task)
    save_tasks(tasks, filepath)
    return task
After: SQLite Storage
import sqlite3
def get_connection():
    conn = sqlite3.connect("tasks.db")
    conn.row_factory = sqlite3.Row  # access columns by name
    return conn

def init_db():
    conn = get_connection()
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            done BOOLEAN DEFAULT 0
        )
    """)
    conn.commit()
    conn.close()

def add_task(title):
    conn = get_connection()
    conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
    conn.commit()
    conn.close()

def list_tasks():
    conn = get_connection()
    rows = conn.execute("SELECT * FROM tasks").fetchall()
    conn.close()
    return [dict(row) for row in rows]

def complete_task(task_id):
    conn = get_connection()
    conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
    conn.commit()
    conn.close()
The biggest difference: no more loading the entire file into memory, no more rewriting the whole file on every change. The database handles reads and writes efficiently, even with thousands of tasks.
ORMs — A Preview
An ORM (Object-Relational Mapper) bridges the gap between Python objects and database rows. Instead of writing raw SQL, you define Python classes that map to tables:
# Raw SQL (what you learned above)
cursor.execute("SELECT * FROM tasks WHERE done = 0")
rows = cursor.fetchall()
# ORM style (conceptual — SQLAlchemy)
pending_tasks = Task.query.filter_by(done=False).all()
The most popular Python ORM is SQLAlchemy. We won't teach it in depth here—raw SQL gives you a better understanding of what ORMs do under the hood. But know that ORMs exist and that most production applications use them.
Common Misconceptions
"JSON files are good enough, so why bother with a database?" They work for simple cases, but they don't handle concurrent access (two processes writing at once corrupts the file), they require loading the entire dataset into memory, and they have no query capability—you can't ask "give me all incomplete tasks" without loading everything. Databases solve all three problems.
"SQL is too hard to learn." The basics are 5 commands: CREATE, INSERT, SELECT, UPDATE, DELETE. You just learned them. The advanced features (joins, indexes, transactions) exist, but you can build real applications with just the basics.
Micro-Exercises
Create a SQLite database with a tasks table and insert 3 tasks using the Python sqlite3 module.
import sqlite3
conn = sqlite3.connect("practice.db")
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS tasks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        done BOOLEAN DEFAULT 0
    )
""")
cursor.execute("INSERT INTO tasks (title) VALUES (?)", ("Buy groceries",))
cursor.execute("INSERT INTO tasks (title) VALUES (?)", ("Write tests",))
cursor.execute("INSERT INTO tasks (title) VALUES (?)", ("Deploy API",))
conn.commit()
# Verify
rows = cursor.execute("SELECT * FROM tasks").fetchall()
for row in rows:
    print(row)
conn.close()
Write a query that returns only incomplete tasks.
import sqlite3
conn = sqlite3.connect("practice.db")
cursor = conn.cursor()
cursor.execute("SELECT * FROM tasks WHERE done = 0")
incomplete = cursor.fetchall()
for task in incomplete:
    print(task)
conn.close()
Refactor TaskForge to use SQLite instead of JSON files. Create functions: init_db(), add_task(title), list_tasks(), complete_task(id).
Verification: Tasks persist between program runs. Close and reopen—your tasks are still there. Run the script, then comment out the add_task lines and run again—list_tasks() still returns the previously added tasks.
If this doesn't work: (1) If "database is locked", make sure you're calling conn.close() or using with statements. (2) If the table doesn't exist, call init_db() at startup. (3) sqlite3.OperationalError: no such table → you're connecting to a different file than the one where the table was created. Check DB_PATH.
Interactive Exercises
Knowledge Check
Which SQL statement retrieves data from a table?
Guided Exercise: "SQL in the Browser"
This exercise runs real SQLite in your browser. Create a 'tasks' table, insert 3 tasks, then query for incomplete ones.
Design Challenge: "TaskDB Class"
Build a TaskDB class backed by SQLite in-memory. Methods: add(title) returns the new task ID, complete(task_id) marks it done, search(keyword) finds tasks by title substring, stats() returns a dict with total, completed, and pending counts.
In __init__, create the table with id, title, and done columns.
For add, use cursor.lastrowid after INSERT to get the new ID.
For search, use WHERE title LIKE ? with f'%{keyword}%' as the parameter.
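A possible solution sketch following the hints (method names as specified in the challenge):

```python
import sqlite3

class TaskDB:
    """Task store backed by an in-memory SQLite database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE tasks (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "title TEXT NOT NULL, done BOOLEAN DEFAULT 0)"
        )

    def add(self, title):
        cursor = self.conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
        self.conn.commit()
        return cursor.lastrowid  # the auto-assigned ID of the new row

    def complete(self, task_id):
        self.conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
        self.conn.commit()

    def search(self, keyword):
        # Parameterized LIKE: the % wildcards go in the parameter, not the SQL
        return self.conn.execute(
            "SELECT id, title, done FROM tasks WHERE title LIKE ?", (f"%{keyword}%",)
        ).fetchall()

    def stats(self):
        total = self.conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
        completed = self.conn.execute(
            "SELECT COUNT(*) FROM tasks WHERE done = 1"
        ).fetchone()[0]
        return {"total": total, "completed": completed, "pending": total - completed}

db = TaskDB()
task_id = db.add("Write tests")
db.add("Deploy API")
db.complete(task_id)
print(db.search("tests"))  # [(1, 'Write tests', 1)]
print(db.stats())          # {'total': 2, 'completed': 1, 'pending': 1}
```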
Complete Select Star SQL — a free interactive tutorial that teaches SQL using real data. It covers SELECT, WHERE, GROUP BY, JOIN, and subqueries. You can finish it in an afternoon.
Chapter 16 Docker and Containers
Docker solves "works on my machine" forever. It's how professional teams ensure consistent environments, and it's how you'll run Claude Code agents in isolated containers in Phase 6.
What Is Docker?
Docker lets you package your application and everything it needs—Python version, libraries, system tools—into a single container that runs identically on any machine. Think of a shipping container: it doesn't matter whether the ship is going to Tokyo or Rotterdam, the box is the same and the contents arrive intact. Docker containers work the same way for software.
Containers are not virtual machines. A VM emulates an entire operating system with its own kernel—heavyweight and slow to start. A container shares the host OS kernel and only isolates the application layer. This makes containers lightweight (megabytes instead of gigabytes) and fast to start (seconds instead of minutes).
Key Concepts
| Concept | What It Is | Analogy |
|---|---|---|
| Image | A snapshot/blueprint of an environment | Like a class in Python—a template |
| Container | A running instance of an image | Like an object—a live instance created from the class |
| Dockerfile | Instructions to build an image | Like a recipe—step-by-step build instructions |
| docker-compose | A tool to run multiple containers together | Like an orchestra conductor—coordinates multiple players |
Installing Docker
Install Docker Desktop, which includes the Docker engine, CLI, and a GUI dashboard:
- macOS: Download from docker.com/products/docker-desktop. Drag to Applications. Launch Docker Desktop—the whale icon appears in your menu bar.
- Windows: Download the installer from the same URL. Enable the WSL 2 backend during setup. Restart when prompted.
Verify the installation:
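Both commands should print version numbers (the exact versions will vary by machine):

```shell
docker --version          # the Docker engine CLI
docker compose version    # the Compose plugin bundled with Docker Desktop
```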
Your First Container
Run Docker's built-in test image to confirm everything works:
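This is the canonical first command (it pulls the image on first run, so it needs a network connection):

```shell
docker run hello-world
```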
Docker pulls the hello-world image from Docker Hub (a public registry of images), creates a container from it, runs it, and prints a success message. That's the entire workflow: pull, create, run.
Now try something more useful—a Python REPL inside a container:
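The image tag here matches the Python version used in the Dockerfile later in this chapter:

```shell
docker run -it python:3.12
```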
The -it flags give you an interactive terminal. You're now inside a Python 3.12 environment that's completely isolated from your host machine. Type exit() to leave.
Writing a Dockerfile
A Dockerfile is a text file that tells Docker how to build an image. Here's one for TaskForge:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "taskforge.py"]
Each line is an instruction:
| Line | What It Does |
|---|---|
| FROM python:3.12-slim | Start from the official Python 3.12 image (slim variant, smaller size) |
| WORKDIR /app | Set the working directory inside the container to /app |
| COPY requirements.txt . | Copy the requirements file into the container first (for caching) |
| RUN pip install -r requirements.txt | Install Python dependencies inside the container |
| COPY . . | Copy all project files into the container |
| CMD ["python", "taskforge.py"] | Default command when the container starts |
Why copy requirements.txt before everything else? Docker caches each layer. If your code changes but your dependencies don't, Docker reuses the cached dependency layer—making rebuilds much faster.
Building and Running
Build the image and run a container from it:
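From the project directory containing the Dockerfile:

```shell
docker build -t taskforge .   # build the image and tag it "taskforge"
docker run taskforge          # start a container from that image
```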
The -t taskforge flag gives your image a name (tag). The . tells Docker to use the current directory as the build context (where to find the Dockerfile and files to copy).
Volumes: Persisting Data
Containers are ephemeral—data written inside a container is lost when the container is removed, and each new docker run starts from a fresh copy of the image. This is a problem for TaskForge's SQLite database. Volumes mount a host directory into the container so data outlives any individual container:
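A sketch using the -v flag, which takes a host-path:container-path pair ($(pwd) makes the host path absolute):

```shell
mkdir -p data
docker run -v "$(pwd)/data:/app/data" taskforge
```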
This maps ./data on your host to /app/data inside the container. The SQLite database file lives in this shared directory, so it persists no matter what happens to the container.
Docker Compose Basics
When your project needs multiple services (a web API, a database, etc.), Docker Compose manages them all with a single file:
# docker-compose.yml
services:
  web:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - ./data:/app/data
    command: ["python", "-m", "flask", "run", "--host=0.0.0.0"]
Start everything with one command:
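From the directory containing docker-compose.yml:

```shell
docker compose up      # build (if needed) and start every service
docker compose down    # stop and remove the containers when done
```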
The ports mapping "5000:5000" means: forward port 5000 on your host to port 5000 in the container. This is how you access the Flask API from your browser.
Essential Commands
| Command | What It Does |
|---|---|
| docker build -t name . | Build an image from a Dockerfile |
| docker run -it name | Run a container interactively |
| docker ps | List running containers |
| docker stop id | Stop a running container |
| docker rm id | Remove a stopped container |
| docker logs id | View container output/logs |
| docker exec -it id bash | Open a shell inside a running container |
.dockerignore
Just like .gitignore tells git which files to skip, .dockerignore tells Docker which files to exclude from the build context. This keeps images small and avoids copying sensitive data:
.venv
__pycache__
.git
.env
*.pyc
.pytest_cache
Common Misconceptions
Misconception: "Docker is only for deployment." Docker is just as valuable during development. It ensures every developer on the team has the same environment—same Python version, same library versions, same system tools. No more "it works on my machine but not yours."
Misconception: "Containers are just lightweight virtual machines." Containers share the host OS kernel. They don't emulate hardware or boot a full OS. This is why containers start in seconds while VMs take minutes, and why a container image is megabytes while a VM image is gigabytes.
TaskForge Connection
Containerize TaskForge. Write a Dockerfile, build an image, and run it. Verify that TaskForge works the same inside the container as it does on your host machine. This is the foundation for running Claude Code agents in Docker in Phase 6.
Micro-Exercises
Run docker run -it python:3.12 python3 and execute print('Hello from Docker!') inside the container. Type exit() to leave.
Write a Dockerfile for TaskForge and build it with docker build -t taskforge .. Verify with docker images that the image appears.
Create a docker-compose.yml that runs TaskForge's Flask API on port 5000 with a volume mount for the SQLite database:
Start it with docker compose up, then verify from another terminal:
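Assuming the API is mapped to port 5000 (adjust the port if you changed the mapping):

```shell
curl http://localhost:5000/tasks
```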
Verification: TaskForge runs inside Docker, accessible from your host machine. Data persists across docker compose down and docker compose up.
If this doesn't work: (1) Port conflict → change "5000:5000" to "5001:5000" and use curl http://localhost:5001/tasks. (2) permission denied → make sure Docker Desktop is running. (3) Build fails → check that requirements.txt exists in your project root.
Interactive Exercises
Knowledge Check
What does FROM python:3.12 do in a Dockerfile?
Knowledge Check
What does -v $(pwd):/app do in a docker run command?
Docker Workflow
Chapter 17 CI/CD and GitHub Actions
Every push to GitHub can automatically run your tests, check your code, and even deploy your app. CI/CD turns manual quality checks into automated pipelines. In Phase 5, you'll use CI-triggered agents that depend on this infrastructure.
What Is CI/CD?
CI (Continuous Integration) means automatically running tests every time code is pushed to a repository. Instead of remembering to run pytest before merging, the system does it for you—every single time, without fail.
CD (Continuous Deployment/Delivery) takes it further: after tests pass, the code is automatically deployed to production (deployment) or packaged and ready for a one-click release (delivery).
The core idea is simple: catch bugs before they reach production. If a test fails, the pipeline stops and notifies you. No broken code gets deployed. No "I forgot to run tests" disasters.
| Term | What It Means | When It Runs |
|---|---|---|
| CI | Run tests and checks automatically | On every push or pull request |
| CD (Delivery) | Package code, ready for manual deploy | After CI passes on main branch |
| CD (Deployment) | Automatically deploy to production | After CI passes on main branch |
GitHub Actions Basics
GitHub Actions is GitHub's built-in CI/CD system. It's free for public repos and has generous free-tier minutes for private repos. The key concepts:
| Concept | What It Is | Analogy |
|---|---|---|
| Workflow | A YAML file defining your automation pipeline | A recipe with ordered steps |
| Trigger | The event that starts a workflow | The "go" signal |
| Job | A set of steps that run on the same machine | One cook at one station |
| Step | A single command or action within a job | One instruction in the recipe |
Workflows live in .github/workflows/ inside your repository. Each workflow is a YAML file (a human-readable data format, similar to JSON but with indentation instead of braces).
Common triggers:
- push — runs when code is pushed to specified branches
- pull_request — runs when a PR is opened or updated
- schedule — runs on a cron schedule (e.g., nightly tests)
Your First Workflow
Here's a complete, working GitHub Actions workflow that runs your tests on every push and pull request:
name: Run Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: python -m pytest
Let's walk through each line:
| Line | What It Does |
|---|---|
| name: Run Tests | A human-readable name that appears in the GitHub Actions UI |
| on: push: branches: [main] | Trigger this workflow when code is pushed to the main branch |
| on: pull_request: branches: [main] | Also trigger when a PR targets main |
| jobs: test: | Define a job called test |
| runs-on: ubuntu-latest | Run on a fresh Ubuntu virtual machine (GitHub provides this for free) |
| uses: actions/checkout@v4 | Check out your repository code into the VM—without this, the VM is empty |
| uses: actions/setup-python@v5 | Install the specified Python version on the VM |
| with: python-version: '3.12' | Use Python 3.12 specifically |
| run: pip install -r requirements.txt | Install your project's dependencies |
| run: python -m pytest | Run your test suite—if any test fails, the workflow fails |
Save this as .github/workflows/test.yml in your repository. Push to GitHub, and the workflow runs automatically.
Adding More Checks
Tests alone aren't enough. Professional pipelines also check code quality:
Linting with ruff
Linting catches style issues, unused imports, and potential bugs without running the code. Add a linting step to your workflow:
- run: pip install ruff
- run: ruff check .
ruff is fast and catches many common Python mistakes. An alternative is flake8, which has been the standard for years.
Type Checking
If you use type hints in your Python code, tools like mypy can check them in CI. This is more advanced—for now, know that it exists and can be added as another step.
Matrix Strategy: Multiple Python Versions
Your code might work on Python 3.12 but break on 3.11. A matrix strategy tests against multiple versions in parallel:
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11', '3.12', '3.13']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements.txt
      - run: python -m pytest
This creates three parallel jobs, one for each Python version. If any fails, you know exactly which version has the problem.
Status Badges
Add a badge to your README.md that shows whether your tests are passing:

Replace YOUR-USERNAME and YOUR-REPO with your actual GitHub username and repository name. The badge turns green when tests pass and red when they fail—visible to anyone viewing the repo.
Secrets Management
Some workflows need API keys or tokens (e.g., for deployment or calling external services). Never hardcode credentials in your workflow files or source code.
GitHub provides Secrets—encrypted variables stored in your repository settings. To add one: go to your repo on GitHub, navigate to Settings, then Secrets and variables, then Actions, and click "New repository secret."
Reference secrets in your workflow with the ${{ secrets.NAME }} syntax:
- run: python deploy.py
  env:
    API_KEY: ${{ secrets.API_KEY }}
    DATABASE_URL: ${{ secrets.DATABASE_URL }}
GitHub automatically masks secrets in logs, but avoid echo $API_KEY or print(os.environ["API_KEY"]) in your workflow steps. If a secret leaks, rotate it immediately.
When Things Fail
Your workflow will fail. This is normal—it's doing its job by catching problems. Here's how to debug:
Reading the logs: On GitHub, go to the Actions tab, click the failed workflow run, click the failed job, and expand the failed step. The log shows the exact error and line number.
Common failures:
| Symptom | Likely Cause | Fix |
|---|---|---|
| ModuleNotFoundError | Missing dependency | Add the package to requirements.txt |
| Wrong Python version errors | Code uses features from a newer Python | Match the python-version in your workflow to your development version |
| Tests pass locally but fail in CI | Different OS, missing env vars, or hardcoded paths | Use os.path.join() instead of hardcoded paths; check all env vars are set |
| Permission denied | Script not executable or writing to a protected path | Add a chmod +x step or write to a writable directory |
Common Misconceptions
Misconception: "CI is only worth it for big teams." Even solo projects benefit from automated tests. You'll forget to run tests before pushing. You'll break something you didn't realize was connected. CI catches these mistakes automatically—it's a safety net for teams of one just as much as teams of fifty.
Misconception: "If tests pass locally, they'll pass in CI." CI runs on a different operating system (usually Ubuntu), a potentially different Python version, and a clean environment with no leftover state. Environment variables you set locally don't exist in CI. Files outside your repo don't exist. This is exactly the point—CI proves your code works in a clean environment, not just on your machine.
TaskForge Connection
Add a GitHub Actions workflow to TaskForge that runs pytest on every push and pull request. Save it as .github/workflows/test.yml in your TaskForge repository. After pushing, visit the Actions tab on GitHub to see your tests run automatically. This is the same CI infrastructure that CI-triggered agents will use in Phase 5.
Micro-Exercises
Create the file .github/workflows/test.yml in your TaskForge repository with the following content:
name: Run Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: python -m pytest
Add a step to your workflow that checks code formatting with ruff. Insert these lines after the pip install step and before the pytest step:
- run: pip install ruff
- run: ruff check .
Push TaskForge to GitHub (from Ch 12), then push the .github/workflows/test.yml you created. Go to the Actions tab on GitHub and watch the workflow run. Make a test fail intentionally, push again, and see the red X.
Verification: The Actions tab shows a green checkmark on your first push. After breaking a test and pushing again, it shows a red X. You can see the test output in the logs.
If this doesn't work: (1) Workflow doesn't trigger → check the on: section matches your branch name (e.g., main vs master). (2) Python not found → ensure the actions/setup-python step is included. (3) Dependencies missing → ensure pip install -r requirements.txt runs before tests. (4) No Actions tab → make sure the repo is on GitHub, not just local.
Interactive Exercises
Knowledge Check
What does on: push mean in a GitHub Actions workflow?
Knowledge Check
What does runs-on: ubuntu-latest specify?
CI/CD Setup
Chapter 18 Project Architecture
You can now build APIs, persist data, containerize applications, and run automated tests. But as projects grow, the way you organize code matters as much as the code itself. Poor architecture turns a working project into an unmaintainable one. Good architecture lets you—and AI coding tools—navigate, extend, and refactor with confidence.
Why Architecture Matters
When TaskForge was a single file, you could hold the entire program in your head. Now it has a Flask API, database functions, tests, a Dockerfile, and a CI pipeline. If all of that lived in one file, you'd spend more time scrolling than coding.
Architecture is how you organize code into files, folders, and modules so that each piece has a clear purpose. Good architecture provides three things:
- Findability—you can locate any piece of functionality quickly.
- Changeability—you can modify one part without breaking others.
- Readability—a new developer (or an AI agent reading your codebase) can understand the structure without reading every line.
Separation of Concerns
Separation of concerns means each file or module handles one responsibility. The API layer handles HTTP requests and responses. The database layer handles data storage and retrieval. The business logic handles the rules of your application. When these concerns are separated, a change to how you store data doesn't require changes to your API routes, and vice versa.
Think of it like a restaurant. The waiter takes orders (API layer), the kitchen cooks food (business logic), and the pantry stores ingredients (data layer). If you reorganize the pantry, the waiter doesn't need retraining.
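The same idea in miniature. The function names here are hypothetical, chosen only to show where the boundaries sit:

```python
# Data layer: knows how tasks are stored, nothing about rules or display.
def load_tasks(store):
    return list(store)

# Business logic: knows the rules, nothing about storage or display.
def pending_tasks(tasks):
    return [t for t in tasks if not t["done"]]

# Presentation layer: knows formatting, nothing about rules or storage.
def render(tasks):
    return "\n".join(f"[ ] {t['title']}" for t in tasks)

store = [{"title": "Buy groceries", "done": False}, {"title": "Ship v1", "done": True}]
print(render(pending_tasks(load_tasks(store))))  # → [ ] Buy groceries
```

Swapping the list for SQLite changes only `load_tasks`; the other two layers never notice.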
Common Patterns: From Flat Scripts to Packages
Projects evolve through predictable stages as they grow:
Stage 1: Single File
Everything in one file. Fine for scripts under 100 lines.
# taskforge.py — everything in one file
tasks = []

def add_task(title):
    tasks.append({"title": title, "done": False})

def list_tasks():
    return tasks

if __name__ == "__main__":
    add_task("Buy groceries")
    print(list_tasks())
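```python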
Stage 2: Multiple Modules
Split by responsibility. Each file does one thing.
# db.py — database operations
# api.py — Flask routes
# cli.py — command-line interface
# models.py — data structures
Stage 3: Package with Directory Structure
Group related modules into directories. Add configuration, tests, and infrastructure files at the top level.
Directory Structure Conventions
Here is a conventional Python project layout. You don't need to memorize it—just know the pattern exists so you can recognize it and follow it.
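One common layout, using TaskForge's file names as the example (treat it as a pattern to recognize, not a mandate):

```text
taskforge-project/
├── taskforge/              # source package
│   ├── __init__.py
│   ├── api.py              # Flask routes
│   ├── db.py               # database functions
│   └── cli.py              # command-line interface
├── tests/
│   ├── test_api.py
│   └── test_db.py
├── .github/
│   └── workflows/
│       └── test.yml        # CI pipeline
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── .gitignore
```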
Key conventions:
- Source code lives in a package directory (same name as the project) with an __init__.py file.
- Tests live in a separate tests/ directory, mirroring the source structure.
- Configuration files (Dockerfile, requirements.txt, .gitignore) live at the project root.
- CI workflows live in .github/workflows/.
Configuration Management
Applications need configuration: database paths, API keys, debug flags. Hardcoding these values into your source code is fragile and insecure. Two approaches:
Environment Variables
Set configuration outside the code, in the environment where the program runs:
import os
DB_PATH = os.environ.get("TASKFORGE_DB", "taskforge.db")
DEBUG = os.environ.get("FLASK_DEBUG", "0") == "1"
os.environ.get() reads an environment variable with a fallback default. This means the same code works in development (using the default) and in production (using the environment variable set by Docker or CI).
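You can see both paths in a few lines (the variable is unset first so the example is deterministic):

```python
import os

os.environ.pop("TASKFORGE_DB", None)                   # ensure it's unset
print(os.environ.get("TASKFORGE_DB", "taskforge.db"))  # development: the fallback default

os.environ["TASKFORGE_DB"] = "/data/prod.db"           # what Docker or CI would set
print(os.environ.get("TASKFORGE_DB", "taskforge.db"))  # production: the real value wins
```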
The .env File
During development, you can store environment variables in a .env file at the project root:
# .env — local development configuration
TASKFORGE_DB=taskforge.db
FLASK_DEBUG=1
Add .env to your .gitignore so it never gets committed. Each developer has their own local .env. Production uses real environment variables set by the deployment system.
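In practice a package such as python-dotenv reads this file for you. As a sketch of what that involves, here is a minimal stdlib-only loader—a simplified illustration, since real loaders also handle quoting and multiline values:

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Minimal .env loader: KEY=VALUE per line; blank lines and # comments ignored."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())  # real env vars take priority

# Demo with a temporary file so the example is self-contained
Path("demo.env").write_text("# local config\nTASKFORGE_DB=taskforge.db\n")
load_env("demo.env")
print(os.environ["TASKFORGE_DB"])
Path("demo.env").unlink()
```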
API keys, database passwords, and tokens must never appear in source code or be committed to git. Use environment variables or secrets management (Chapter 17). If you accidentally commit a secret, rotate it immediately—even after deleting the file, the secret exists in git history forever.
The TaskForge Evolution
Here's how TaskForge grew across the phases, and why each structural change was made:
| Phase | Structure | Why |
|---|---|---|
| Phase 2 | Single file: taskforge.py | Everything in one place while learning functions, classes, and data structures. |
| Phase 3 | Added .gitignore, pushed to GitHub | Version control and collaboration require separating tracked from untracked files. |
| Phase 4 | Multiple files: api.py, db.py, cli.py, tests/, Dockerfile, .github/workflows/ | Each new capability (API, database, containers, CI) needs its own file. Mixing them would make every file enormous and every change risky. |
The progression was natural: you split code when a file gets too long, when two parts change for different reasons, or when you need to test pieces independently. You never reorganize for the sake of it—you reorganize when the current structure creates friction.
When to Refactor
Refactoring means restructuring code without changing what it does. You refactor when:
- A file is too long to navigate (rough threshold: 200+ lines).
- Two parts of the code change for different reasons (e.g., API routes and database queries).
- You find yourself scrolling past large blocks of code to reach the part you need.
- Tests are hard to write because everything is tangled together.
You do not refactor when:
- The current structure works and is easy to understand.
- You're adding a feature and reorganizing at the same time (do one at a time).
- You're copying a "best practice" structure for a 50-line script. Simple code deserves simple structure.
You don't need to get the structure right upfront. Start with the simplest structure that works. Reorganize when you feel friction—not before. Over-engineering a directory structure for a 100-line script is wasted effort. Architecture should emerge from real needs, not theoretical ideals.
Common Misconceptions
Misconception: "More files means better architecture." Splitting a 30-line module into 6 files of 5 lines each makes the project harder to understand, not easier. Each file should contain a meaningful, cohesive unit of functionality. If you can't describe what a file does in one sentence, it's either too big (split it) or too vague (merge it with something related).
TaskForge Connection
Take the TaskForge code you've built across Phases 2–4 and reorganize it into the conventional structure shown in the diagram above. Create a taskforge/ package directory with __init__.py, api.py, db.py, and cli.py. Move tests into a tests/ directory. Keep Dockerfile, docker-compose.yml, requirements.txt, and .github/workflows/ at the project root. Verify that python -m pytest and docker compose up still work after the reorganization.
Micro-Exercises
Look at the TaskForge code you've written so far. Identify which lines belong to the API layer, the database layer, and the CLI layer. Write a comment next to each section marking its layer.
# In your current taskforge files, label sections like this:

# --- API LAYER ---
@app.route("/tasks", methods=["GET"])
def get_tasks():
    ...

# --- DATABASE LAYER ---
def init_db():
    conn = sqlite3.connect(DB_PATH)
    ...

# --- CLI LAYER ---
if __name__ == "__main__":
    import sys
    command = sys.argv[1]
    ...
If you find all three layers in one file, that's a sign it's ready to be split.
Create the directory structure for TaskForge. You don't need to move code yet—just create the empty files:
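A sketch with mkdir and touch, run from your project root (the file names match the TaskForge Connection above):

```shell
mkdir -p taskforge tests .github/workflows
touch taskforge/__init__.py taskforge/api.py taskforge/db.py taskforge/cli.py
touch tests/test_api.py tests/test_db.py
```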
Run find . -name "*.py" to verify the structure looks correct.
Reorganize TaskForge into the package structure. Move database functions into taskforge/db.py, Flask routes into taskforge/api.py, and CLI logic into taskforge/cli.py. Update imports so everything still works.
Verification: Run python -m pytest from the project root—all tests pass. Run flask --app taskforge.api run and test with curl http://localhost:5000/tasks—the API works. Run docker compose up—the container starts successfully.
If this doesn't work: (1) ModuleNotFoundError: No module named 'taskforge' → make sure taskforge/__init__.py exists and you're running commands from the project root. (2) Circular imports → make sure db.py doesn't import from api.py. Dependencies should flow one direction: api.py imports from db.py, not the reverse. (3) Tests can't find modules → run pip install -e . to install your package in editable mode, or use python -m pytest from the root directory.
Interactive Exercises
Knowledge Check
What is separation of concerns?
Design Challenge: "Refactor the Monolith"
This 30-line script mixes input parsing, validation, and output formatting. Refactor it into 3 functions: parse_input(raw), validate(data), and format_output(data). The main logic should just call these 3 functions.
parse_input should only split and strip — no validation.
validate should check the data dict and return error string or None.
format_output takes a validated dict and returns the formatted string.
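The original 30-line script isn't reproduced here, so the input format below ("name,age" as comma-separated text) is an assumption—but the shape of the refactor is the point: three small functions, each owning one concern, with the main logic reduced to a pipeline:

```python
def parse_input(raw):
    """Only splits and strips; no validation."""
    name, _, age = raw.partition(",")
    return {"name": name.strip(), "age": age.strip()}

def validate(data):
    """Returns an error string, or None if the data is fine."""
    if not data["name"]:
        return "name is required"
    if not data["age"].isdigit():
        return "age must be a number"
    return None

def format_output(data):
    """Takes a validated dict and returns the formatted string."""
    return f"{data['name']} is {data['age']} years old"

def main(raw):
    data = parse_input(raw)
    error = validate(data)
    return error if error else format_output(data)

print(main("Ada, 36"))  # → Ada is 36 years old
print(main("Ada, ??"))  # → age must be a number
```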
Architecture Review
Phase 4 Gate Checkpoint & TaskForge Full Stack
Minimum Competency
Build a REST API with Flask, persist data with SQLite, containerize with Docker, and run automated tests via CI. Understand project directory structure and separation of concerns.
Your Artifact
TaskForge with: Flask REST API (GET/POST/PUT endpoints), SQLite database persistence, a Dockerfile, a docker-compose.yml, a GitHub Actions workflow that runs tests on push, and a clean project structure.
Verification
curl http://localhost:5000/tasks returns JSON. Data persists across server restarts. docker compose up runs TaskForge. GitHub Actions shows green checkmark.
If your API returns HTML instead of JSON, or data disappears on restart, or Docker build fails → return to the specific chapter covering that topic.
TaskForge Checkpoint
TaskForge is now a full-stack application: CLI, REST API, database, container, CI pipeline. It's ready for data structures and algorithms in Phase 5.
What You Can Now Do
- Consume and build REST APIs
- Persist data with SQL databases
- Containerize applications with Docker
- Automate testing with CI/CD pipelines
- Structure projects for maintainability
You now have a full-stack application with professional infrastructure. Phase 5 dives into the data structures and algorithms that power everything you've built—how lists, hash tables, and trees work under the hood, and how to measure and compare their performance. Understanding these foundations makes you a stronger engineer, whether you're writing code by hand or evaluating AI-generated solutions.
Bridge Exercise: Human Code vs AI Code
You've been building TaskForge by hand. Now compare your approach to an AI-generated version of the same feature. This is exactly the skill you'll practice throughout Phase 5.
Spot the Differences
Below are two implementations of a get_task_stats function. One was written by a human following the patterns from this guide. The other was AI-generated. Read both, then write a function evaluate() that returns a dictionary with three keys: "missing_validation" (which version skips input validation—"human" or "ai"), "over_engineered" (which version adds unnecessary complexity), and "better_for_production" (your judgment).
Look at what each function assumes about its input. Does the human version handle missing keys? Does the AI version add fields nobody requested?
The human version uses t["status"] (crashes on missing key) and assumes only two states. The AI version imports datetime and adds a timestamp nobody asked for.