Python Web Scraping Example (BeautifulSoup)
This beginner-friendly example shows how to download a web page, parse its HTML with BeautifulSoup, and extract simple data like the page title and links.
The goal is to help you build a small working scraper. This page focuses on a practical example, not every detail of web scraping.
Quick example
import requests
from bs4 import BeautifulSoup
url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
print("Page title:", soup.title.string)
for link in soup.find_all("a"):
    print(link.get("href"))
Note: This is a minimal working example. You need to install requests and beautifulsoup4 first.
What this example does
This script:
- Downloads HTML from a web page
- Parses the HTML with BeautifulSoup
- Gets the page title
- Finds all link tags
- Prints each link URL
What you need before running it
Before you run the example, make sure you have:
- Python installed
- A basic understanding of Python import statements
- The requests package installed
- The beautifulsoup4 package installed
- Internet access for the example URL
If you need help installing packages, see how to install a Python package with pip.
Install the required packages
You can install both packages with pip:
pip install requests beautifulsoup4
If that does not work, try:
python -m pip install requests beautifulsoup4
What each package does:
- requests downloads the web page
- BeautifulSoup reads the HTML structure so you can search through tags and attributes
Useful commands for checking your setup:
python --version
pip show requests
pip show beautifulsoup4
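If you prefer checking from inside Python, a small sketch using the standard library's importlib.metadata can report what is installed. The package_version helper below is our own name for illustration, not part of any library:

```python
from importlib.metadata import version, PackageNotFoundError

def package_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

# Report both packages the example needs.
for pkg in ("requests", "beautifulsoup4"):
    print(pkg, "->", package_version(pkg) or "not installed")
```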
Step-by-step code breakdown
Here is the same example again:
import requests
from bs4 import BeautifulSoup
url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
print("Page title:", soup.title.string)
for link in soup.find_all("a"):
    print(link.get("href"))
1. Set the target URL
url = "https://example.com"
This is the page you want to scrape.
2. Fetch the page with requests.get()
response = requests.get(url, timeout=10)
This sends an HTTP request and downloads the page.
- url is the page address
- timeout=10 means Python will stop waiting after 10 seconds
If you are new to HTTP requests, how to make an API request in Python explains the basic idea.
3. Stop on HTTP errors
response.raise_for_status()
This is important.
It raises an error if the page could not be fetched correctly, such as:
- 404 Not Found
- 403 Forbidden
- 500 Internal Server Error
Without this line, your code may continue with a bad response and give confusing results later.
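You can see what raise_for_status() does without hitting a real server. The sketch below builds a bare Response object by hand and fakes a 404 status code; this is purely a demonstration trick, not something you would do in a real scraper:

```python
import requests

# Build an empty Response and fake an error status, for illustration only.
resp = requests.models.Response()
resp.status_code = 404

try:
    resp.raise_for_status()
except requests.exceptions.HTTPError as err:
    print("Caught HTTP error:", err)
```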
4. Parse the HTML
soup = BeautifulSoup(response.text, "html.parser")
- response.text is the HTML content as a string
- "html.parser" tells BeautifulSoup which parser to use
After this, soup becomes an object you can search.
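You can also try BeautifulSoup without any network access by parsing an HTML string directly. The snippet below uses a small made-up page so you can experiment offline:

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document so we can experiment without a network request.
html = """
<html>
  <head><title>Demo Page</title></head>
  <body>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)        # the <title> text
print(len(soup.find_all("a")))  # number of <a> tags
```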
5. Get the page title
print("Page title:", soup.title.string)
This tries to find the <title> tag and print its text.
For example, if the HTML contains:
<title>Example Domain</title>
The output will be:
Page title: Example Domain
6. Find all links
for link in soup.find_all("a"):
    print(link.get("href"))
This finds every <a> tag on the page.
Then it prints the value of the href attribute for each one.
Using link.get("href") is safer than:
link["href"]
Why?
- link.get("href") returns None if href is missing
- link["href"] raises an error if href is missing
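A quick offline sketch, using a made-up snippet of HTML, shows the difference:

```python
from bs4 import BeautifulSoup

# One anchor with href and one without, to compare the two lookup styles.
soup = BeautifulSoup('<a href="/home">Home</a><a>No destination</a>', "html.parser")
first, second = soup.find_all("a")

print(first.get("href"))   # the href value
print(second.get("href"))  # None, because this tag has no href

try:
    second["href"]
except KeyError:
    print("Indexing a missing attribute raises KeyError")
```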
Expected output
The exact output depends on the page, but it will usually look something like this:
Page title: Example Domain
https://www.iana.org/domains/example
Keep in mind:
- The page title is printed first
- Each link URL is printed on a new line
- Some links may be relative paths like /about
- Some href values may be None
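The standard library's urllib.parse.urljoin can turn relative paths like these into absolute URLs:

```python
from urllib.parse import urljoin

base = "https://example.com"
print(urljoin(base, "/about"))         # joins a relative path to the base
print(urljoin(base, "contact.html"))   # also works for bare file names
print(urljoin(base, "https://other.org/page"))  # absolute URLs pass through unchanged
```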
Beginner-friendly improvements
The first example is intentionally small. Here is a slightly better version that skips missing links and converts relative URLs into full URLs.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
title = soup.title.string if soup.title else "No title found"
print("Page title:", title)
links = []
for link in soup.find_all("a"):
    href = link.get("href")
    text = link.get_text(strip=True)
    if href is None:
        continue
    full_url = urljoin(url, href)
    links.append((text, full_url))
for text, full_url in links:
    print(f"Link text: {text or '[no text]'}")
    print(f"URL: {full_url}")
    print("-" * 20)
This improved version:
- Skips links where href is None
- Converts relative links into full URLs with urljoin
- Extracts link text with get_text()
- Stores results in a list
- Prints cleaner output
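One more optional refinement: pages often repeat the same URL in several places. This small sketch, using made-up sample data in the same (text, full_url) shape as the improved example, removes duplicates while keeping the original order:

```python
# Made-up sample data standing in for real scraper output.
links = [
    ("Home", "https://example.com/"),
    ("About", "https://example.com/about"),
    ("Home again", "https://example.com/"),
]

seen = set()
unique_links = []
for text, full_url in links:
    if full_url not in seen:  # keep only the first occurrence of each URL
        seen.add(full_url)
        unique_links.append((text, full_url))

print(unique_links)
```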
Common problems when scraping
Web scraping often works well on simple pages, but there are common problems.
The site blocks requests from scripts
Some websites do not allow automated scraping. They may return errors like 403 Forbidden.
The page needs JavaScript before content appears
BeautifulSoup only parses the HTML you give it. It does not run JavaScript.
If the site loads data after the page opens in the browser, your scraper may see very little content.
The HTML structure is different than expected
You may expect a title, a link, or a certain tag, but the page may not contain it.
This can lead to errors such as AttributeError: object has no attribute.
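A defensive lookup pattern avoids this. The sketch below uses a made-up page that has no <h1> tag; soup.find() returns None when a tag is absent, so you can check before using the result:

```python
from bs4 import BeautifulSoup

# A made-up page with no <h1>, to show the missing-tag case.
soup = BeautifulSoup("<p>No headings here</p>", "html.parser")

heading = soup.find("h1")  # returns None when the tag is absent
if heading is None:
    print("No <h1> found")
else:
    print(heading.get_text())
```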
The request fails because of a bad URL or timeout
A typo in the URL or a slow website can cause the request to fail.
The scraper breaks when the site layout changes
If the website changes its HTML, your scraper may stop finding the tags you need.
Important beginner note about legality and ethics
Before scraping any website, be careful.
- Do not scrape sites that forbid it
- Check the site's terms and robots.txt
- Do not send too many requests too quickly
- Only scrape data you are allowed to access
A small practice scraper is fine for learning, but real websites may have rules you need to follow.
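The standard library can help you respect robots.txt. This sketch parses a made-up robots.txt with urllib.robotparser; a real scraper would download the site's actual file first:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch() tells you whether a given user agent may fetch a URL.
print(parser.can_fetch("*", "https://example.com/private/page"))
print(parser.can_fetch("*", "https://example.com/public"))
```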
When this example is enough and when it is not
This example is a good starting point when:
- You are scraping a simple static HTML page
- You want to learn how tags, attributes, and parsing work
- You want a small script that gets titles and links
This example is not enough when:
- The site requires login
- The site depends heavily on JavaScript
- You need a large production scraper
- You need advanced retry logic, rate limiting, or data storage
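If you do eventually need retries, the basic shape looks like this. It is a minimal sketch; the function name and parameters are ours, and real projects often use a dedicated library such as tenacity instead:

```python
import time

def fetch_with_retries(fetch, attempts=3, delay=0.1):
    """Call fetch(); on failure, wait briefly and retry up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise              # out of attempts: re-raise the last error
            time.sleep(delay)      # brief pause before trying again

# Demo: a fake fetch that fails once, then succeeds on the second call.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("temporary failure")
    return "page content"

print(fetch_with_retries(flaky_fetch))
```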
Common mistakes
Here are some problems beginners often hit with this kind of script.
ModuleNotFoundError
You may see an error because requests or bs4 is not installed.
Try:
pip install requests beautifulsoup4
Or:
python -m pip install requests beautifulsoup4
If needed, see how to fix ModuleNotFoundError: No module named X.
AttributeError when a tag does not exist
This can happen if you write code like:
print(soup.title.string)
but the page has no <title> tag.
A safer version is:
title = soup.title.string if soup.title else "No title found"
print(title)
TypeError or bad output from missing href values
Some <a> tags do not have href.
This is why link.get("href") is safer than link["href"].
HTTP errors like 403 or 404
These happen when:
- The page does not exist
- The site blocks the request
- The URL is wrong
Using response.raise_for_status() helps catch this early.
Empty results because content is loaded with JavaScript
If your browser shows content but your script does not, the page may be using JavaScript to load data after the initial HTML response.
In that case, BeautifulSoup alone may not be enough.
FAQ
What is BeautifulSoup used for?
It parses HTML or XML so you can find tags, attributes, and text more easily in Python.
Do I need requests to use BeautifulSoup?
Not always, but beginners often use requests to download the page and BeautifulSoup to parse it.
Why does my scraper return nothing?
The page may use JavaScript, the selectors may be wrong, or the request may have failed.
Why do some links print None?
Some anchor tags do not have an href attribute, so link.get("href") returns None.
Can BeautifulSoup scrape JavaScript-rendered websites?
It can parse the HTML you give it, but it does not run JavaScript by itself.
See also
- How to install a Python package with pip
- How to make an API request in Python
- ModuleNotFoundError: No module named X fix
- AttributeError: object has no attribute fix
- Python simple web scraper for titles example
Try this scraper on a simple practice page first. After that, move to a smaller focused example, such as scraping only page titles or saving the results to a file with the Python open() function.
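As a final exercise, here is one way that saving step might look. This sketch writes a few made-up URLs with open(); a real run would use the links your scraper collected:

```python
# Made-up URLs standing in for real scraper output.
urls = [
    "https://example.com/",
    "https://www.iana.org/domains/example",
]

# Write one URL per line; mode "w" overwrites the file on each run.
with open("links.txt", "w", encoding="utf-8") as f:
    for u in urls:
        f.write(u + "\n")

# Read the file back to confirm what was saved.
with open("links.txt", encoding="utf-8") as f:
    print(f.read())
```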