Python Web Scraping Example (BeautifulSoup)

This beginner-friendly example shows how to download a web page, parse its HTML with BeautifulSoup, and extract simple data like the page title and links.

The goal is to help you build a small working scraper. This page focuses on a practical example, not every detail of web scraping.

Quick example

import requests
from bs4 import BeautifulSoup

url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

print("Page title:", soup.title.string)

for link in soup.find_all("a"):
    print(link.get("href"))

Note: This is a minimal working example. You need to install requests and beautifulsoup4 first.

What this example does

This script:

  • Downloads HTML from a web page
  • Parses the HTML with BeautifulSoup
  • Gets the page title
  • Finds all link tags
  • Prints each link URL

What you need before running it

Before you run the example, make sure you have:

  • Python installed
  • A basic understanding of import
  • The requests package installed
  • The beautifulsoup4 package installed
  • Internet access for the example URL

If you need help installing packages, see how to install a Python package with pip.

Install the required packages

You can install both packages with pip:

pip install requests beautifulsoup4

If that does not work, try:

python -m pip install requests beautifulsoup4

What each package does:

  • requests downloads the web page
  • BeautifulSoup reads the HTML structure so you can search through tags and attributes

Useful commands for checking your setup:

python --version
pip show requests
pip show beautifulsoup4

Step-by-step code breakdown

Here is the same example again:

import requests
from bs4 import BeautifulSoup

url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

print("Page title:", soup.title.string)

for link in soup.find_all("a"):
    print(link.get("href"))

1. Set the target URL

url = "https://example.com"

This is the page you want to scrape.

2. Fetch the page with requests.get()

response = requests.get(url, timeout=10)

This sends an HTTP GET request to the server and downloads the page's HTML.

  • url is the page address
  • timeout=10 means Python will stop waiting after 10 seconds
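If the server takes too long, requests raises a Timeout exception rather than returning a response. A minimal sketch of catching it, wrapped in a hypothetical fetch() helper (the helper name is just for illustration):

```python
import requests

def fetch(url):
    # Download a page, returning None instead of crashing when the
    # server takes longer than 10 seconds to respond.
    try:
        response = requests.get(url, timeout=10)
        return response.text
    except requests.exceptions.Timeout:
        print("The request timed out after 10 seconds")
        return None
```

requests.exceptions.Timeout covers both connection timeouts and read timeouts, so one except clause handles both cases.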

If you are new to HTTP requests, how to make an API request in Python explains the basic idea.

3. Stop on HTTP errors

response.raise_for_status()

This line is easy to overlook, but it is important.

It raises an error if the page could not be fetched correctly, such as:

  • 404 Not Found
  • 403 Forbidden
  • 500 Internal Server Error

Without this line, your code may continue with a bad response and give confusing results later.
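If you want to handle those errors instead of letting the script crash, raise_for_status() pairs naturally with try/except. A small sketch using a hypothetical fetch_html() helper:

```python
import requests

def fetch_html(url):
    # raise_for_status() turns error status codes (403, 404, 500, ...)
    # into an HTTPError, which we catch and report here.
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text
    except requests.exceptions.HTTPError as err:
        print("HTTP error:", err)
        return None
```

This way a 404 or 403 produces one clear error message instead of confusing failures further down in the parsing code.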

4. Parse the HTML

soup = BeautifulSoup(response.text, "html.parser")

  • response.text is the HTML content as a string
  • "html.parser" tells BeautifulSoup which parser to use

After this, soup becomes an object you can search.
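To see what "searchable" means, here is a tiny self-contained example that parses an HTML string directly (no download needed) and looks up tags by name and attribute:

```python
from bs4 import BeautifulSoup

html = """
<html>
  <body>
    <h1 id="main">Hello</h1>
    <p class="intro">First paragraph</p>
    <p>Second paragraph</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

print(soup.find("h1").get_text())               # Hello
print(soup.find("p", class_="intro").get_text())  # First paragraph
print(len(soup.find_all("p")))                  # 2
```

find() returns the first matching tag, while find_all() returns every match as a list.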

5. Get the page title

print("Page title:", soup.title.string)

This tries to find the <title> tag and print its text.

For example, if the HTML contains:

<title>Example Domain</title>

The output will be:

Page title: Example Domain

6. Find and print every link

for link in soup.find_all("a"):
    print(link.get("href"))

This finds every <a> tag on the page.

Then it prints the value of the href attribute for each one.

Using link.get("href") is safer than:

link["href"]

Why?

  • link.get("href") returns None if href is missing
  • link["href"] raises an error if href is missing
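The difference is easy to see with a two-link snippet, one with an href and one without:

```python
from bs4 import BeautifulSoup

html = '<a href="/about">About</a> <a>No destination</a>'
soup = BeautifulSoup(html, "html.parser")

first, second = soup.find_all("a")

print(first.get("href"))   # /about
print(second.get("href"))  # None

try:
    second["href"]         # dictionary-style access raises instead
except KeyError:
    print("KeyError: this <a> tag has no href")
```

Using .get() lets the loop keep going past incomplete tags, which is usually what you want in a scraper.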

Expected output

The exact output depends on the page, but it will usually look something like this:

Page title: Example Domain
https://www.iana.org/domains/example

Keep in mind:

  • The page title is printed first
  • Each link URL is printed on a new line
  • Some links may be relative paths like /about
  • Some href values may be None

Beginner-friendly improvements

The first example is intentionally small. Here is a slightly better version that skips missing links and converts relative URLs into full URLs.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

title = soup.title.string if soup.title else "No title found"
print("Page title:", title)

links = []

for link in soup.find_all("a"):
    href = link.get("href")
    text = link.get_text(strip=True)

    if href is None:
        continue

    full_url = urljoin(url, href)
    links.append((text, full_url))

for text, full_url in links:
    print(f"Link text: {text or '[no text]'}")
    print(f"URL: {full_url}")
    print("-" * 20)

This improved version:

  • Skips links where href is None
  • Converts relative links into full URLs with urljoin
  • Extracts link text with get_text()
  • Stores results in a list
  • Prints cleaner output

Common problems when scraping

Web scraping often works well on simple pages, but there are common problems.

The site blocks requests from scripts

Some websites do not allow automated scraping. They may return errors like 403 Forbidden.

The page needs JavaScript before content appears

BeautifulSoup only parses the HTML you give it. It does not run JavaScript.

If the site loads data after the page opens in the browser, your scraper may see very little content.

The HTML structure is different than expected

You may expect a title, a link, or a certain tag, but the page may not contain it.

This can lead to errors such as AttributeError: 'NoneType' object has no attribute 'string'.

The request fails because of a bad URL or timeout

A typo in the URL or a slow website can cause the request to fail.

The scraper breaks when the site layout changes

If the website changes its HTML, your scraper may stop finding the tags you need.

Important beginner note about legality and ethics

Before scraping any website, be careful.

  • Do not scrape sites that forbid it
  • Check the site's terms and robots.txt
  • Do not send too many requests too quickly
  • Only scrape data you are allowed to access
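One simple way to avoid sending requests too quickly is to pause between them. A sketch with a hypothetical fetch_politely() helper built on time.sleep():

```python
import time
import requests

def fetch_politely(urls, delay=1.0):
    # Fetch each URL in turn, sleeping between requests so the
    # server is not flooded. Returns a {url: status_code} dict.
    results = {}
    for url in urls:
        response = requests.get(url, timeout=10)
        results[url] = response.status_code
        time.sleep(delay)
    return results
```

A one-second delay is only a starting point; some sites publish stricter rules in robots.txt or their terms of service.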

A small practice scraper is fine for learning, but real websites may have rules you need to follow.

When this example is enough and when it is not

This example is a good starting point when:

  • You are scraping a simple static HTML page
  • You want to learn how tags, attributes, and parsing work
  • You want a small script that gets titles and links

This example is not enough when:

  • The site requires login
  • The site depends heavily on JavaScript
  • You need a large production scraper
  • You need advanced retry logic, rate limiting, or data storage

Common mistakes

Here are some problems beginners often hit with this kind of script.

ModuleNotFoundError

You may see an error because requests or bs4 is not installed.

Try:

pip install requests beautifulsoup4

Or:

python -m pip install requests beautifulsoup4

If needed, see how to fix ModuleNotFoundError: No module named X.

AttributeError when a tag does not exist

This can happen if you write code like:

print(soup.title.string)

but the page has no <title> tag.

A safer version is:

title = soup.title.string if soup.title else "No title found"
print(title)

TypeError or bad output from missing href values

Some <a> tags do not have href.

This is why link.get("href") is safer than link["href"].

HTTP errors like 403 or 404

These happen when:

  • The page does not exist
  • The site blocks the request
  • The URL is wrong

Using response.raise_for_status() helps catch this early.

Empty results because content is loaded with JavaScript

If your browser shows content but your script does not, the page may be using JavaScript to load data after the initial HTML response.

In that case, BeautifulSoup alone may not be enough.

FAQ

What is BeautifulSoup used for?

It parses HTML or XML so you can find tags, attributes, and text more easily in Python.

Do I need requests to use BeautifulSoup?

No. BeautifulSoup only needs HTML text as input, but beginners often use requests to download the page and BeautifulSoup to parse it.
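For example, BeautifulSoup is happy to parse an HTML string you already have, with no download step at all:

```python
from bs4 import BeautifulSoup

# HTML can come from anywhere: a string, a local file, or a download.
html = "<html><head><title>Local page</title></head></html>"

soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)  # Local page
```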

Why does my scraper return nothing?

The page may use JavaScript, the selectors may be wrong, or the request may have failed.

Why do some links print None?

Some anchor tags do not have an href attribute, so link.get("href") returns None.

Can BeautifulSoup scrape JavaScript-rendered websites?

It can parse the HTML you give it, but it does not run JavaScript by itself.

Next steps

Try this scraper on a simple practice page first. After that, move to a smaller focused example, such as scraping only page titles or saving the results to a file with the Python open() function.
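As a first step in that direction, here is a minimal sketch that writes scraped links to a text file with open(); the links list below is stand-in data in the same (text, url) shape the improved scraper collects:

```python
# Stand-in data shaped like the improved scraper's results.
links = [
    ("IANA example", "https://www.iana.org/domains/example"),
    ("Home", "https://example.com/"),
]

# One tab-separated "text<TAB>url" line per link.
with open("links.txt", "w", encoding="utf-8") as f:
    for text, full_url in links:
        f.write(f"{text}\t{full_url}\n")
```

Opening the file in "w" mode overwrites it on each run; use "a" instead if you want to append across runs.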