Python Web Scraping Example (BeautifulSoup)
This beginner-friendly example shows how to download a web page, parse its HTML with BeautifulSoup, and extract simple data like the page title and links.
The goal is to help you build a small working scraper. This page focuses on a practical example, not every detail of web scraping.
Quick example
import requests
from bs4 import BeautifulSoup
url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
print("Page title:", soup.title.string)
for link in soup.find_all("a"):
    print(link.get("href"))
Note: This is a minimal working example. You need to install requests and beautifulsoup4 first.
What this example does
This script:
- Downloads HTML from a web page
- Parses the HTML with BeautifulSoup
- Gets the page title
- Finds all link tags
- Prints each link URL
What you need before running it
Before you run the example, make sure you have:
- Python installed
- A basic understanding of Python import statements
- The requests package installed
- The beautifulsoup4 package installed
- Internet access for the example URL
If you need help installing packages, see how to install a Python package with pip.
Install the required packages
You can install both packages with pip:
pip install requests beautifulsoup4
If that does not work, try:
python -m pip install requests beautifulsoup4
What each package does:
- requests downloads the web page
- BeautifulSoup reads the HTML structure so you can search through tags and attributes
Useful commands for checking your setup:
python --version
pip show requests
pip show beautifulsoup4
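If you prefer checking from inside Python, a small sketch using the standard library's importlib.metadata can report what is installed. The package_version helper below is our own name for illustration, not part of any library:

```python
from importlib.metadata import version, PackageNotFoundError

def package_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

# Report both packages the example needs.
for pkg in ("requests", "beautifulsoup4"):
    print(pkg, "->", package_version(pkg) or "not installed")
```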
Step-by-step code breakdown
Here is the same example again:
import requests
from bs4 import BeautifulSoup
url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
print("Page title:", soup.title.string)
for link in soup.find_all("a"):
    print(link.get("href"))
1. Set the target URL
url = "https://example.com"
This is the page you want to scrape.
2. Fetch the page with requests.get()
response = requests.get(url, timeout=10)
This sends an HTTP request and downloads the page.
- url is the page address
- timeout=10 means Python will stop waiting after 10 seconds
If you are new to HTTP requests, how to make an API request in Python explains the basic idea.
3. Stop on HTTP errors
response.raise_for_status()
This is important.
It raises an error if the page could not be fetched correctly, such as:
- 404 Not Found
- 403 Forbidden
- 500 Internal Server Error
Without this line, your code may continue with a bad response and give confusing results later.
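You can see what raise_for_status() does without hitting a real server. The sketch below builds a bare Response object by hand and fakes a 404 status code; this is purely a demonstration trick, not something you would do in a real scraper:

```python
import requests

# Build an empty Response and fake an error status, for illustration only.
resp = requests.models.Response()
resp.status_code = 404

try:
    resp.raise_for_status()
except requests.exceptions.HTTPError as err:
    print("Caught HTTP error:", err)
```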
4. Parse the HTML
soup = BeautifulSoup(response.text, "html.parser")
- response.text is the HTML content as a string
- "html.parser" tells BeautifulSoup which parser to use
After this, soup becomes an object you can search.
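You can also try BeautifulSoup without any network access by parsing an HTML string directly. The snippet below uses a small made-up page so you can experiment offline:

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document so we can experiment without a network request.
html = """
<html>
  <head><title>Demo Page</title></head>
  <body>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)        # the <title> text
print(len(soup.find_all("a")))  # number of <a> tags
```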
5. Get the page title
print("Page title:", soup.title.string)
This tries to find the <title> tag and print its text.
For example, if the HTML contains:
<title>Example Domain</title>
The output will be:
Page title: Example Domain
6. Find all links
for link in soup.find_all("a"):
    print(link.get("href"))
This finds every <a> tag on the page.
Then it prints the value of the href attribute for each one.
Using link.get("href") is safer than:
link["href"]
Why?
- link.get("href") returns None if href is missing
- link["href"] raises an error if href is missing
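A quick offline sketch, using a made-up snippet of HTML, shows the difference:

```python
from bs4 import BeautifulSoup

# One anchor with href and one without, to compare the two lookup styles.
soup = BeautifulSoup('<a href="/home">Home</a><a>No destination</a>', "html.parser")
first, second = soup.find_all("a")

print(first.get("href"))   # the href value
print(second.get("href"))  # None, because this tag has no href

try:
    second["href"]
except KeyError:
    print("Indexing a missing attribute raises KeyError")
```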
Expected output
The exact output depends on the page, but it will usually look something like this:
Page title: Example Domain
https://www.iana.org/domains/example
Keep in mind:
- The page title is printed first
- Each link URL is printed on a new line
- Some links may be relative paths like /about
- Some href values may be None
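The standard library's urllib.parse.urljoin can turn relative paths like these into absolute URLs:

```python
from urllib.parse import urljoin

base = "https://example.com"
print(urljoin(base, "/about"))         # joins a relative path to the base
print(urljoin(base, "contact.html"))   # also works for bare file names
print(urljoin(base, "https://other.org/page"))  # absolute URLs pass through unchanged
```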
Beginner-friendly improvements
The first example is intentionally small. Here is a slightly better version that skips missing links and converts relative URLs into full URLs.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
title = soup.title.string if soup.title else "No title found"
print("Page title:", title)
links = []
for link in soup.find_all("a"):
    href = link.get("href")
    text = link.get_text(strip=True)
    if href is None:
        continue
    full_url = urljoin(url, href)
    links.append((text, full_url))
for text, full_url in links:
    print(f"Link text: {text or '[no text]'}")
    print(f"URL: {full_url}")
    print("-" * 20)
This improved version:
- Skips links where href is None
- Converts relative links into full URLs with urljoin
- Extracts link text with get_text()
- Stores results in a list
- Prints cleaner output
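One more optional refinement: pages often repeat the same URL in several places. This small sketch, using made-up sample data in the same (text, full_url) shape as the improved example, removes duplicates while keeping the original order:

```python
# Made-up sample data standing in for real scraper output.
links = [
    ("Home", "https://example.com/"),
    ("About", "https://example.com/about"),
    ("Home again", "https://example.com/"),
]

seen = set()
unique_links = []
for text, full_url in links:
    if full_url not in seen:  # keep only the first occurrence of each URL
        seen.add(full_url)
        unique_links.append((text, full_url))

print(unique_links)
```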
Common problems when scraping
Web scraping often works well on simple pages, but there are common problems.
The site blocks requests from scripts
Some websites do not allow automated scraping. They may return errors like 403 Forbidden.
The page needs JavaScript before content appears
BeautifulSoup only parses the HTML you give it. It does not run JavaScript.
If the site loads data after the page opens in the browser, your scraper may see very little content.
The HTML structure is different than expected
You may expect a title, a link, or a certain tag, but the page may not contain it.
This can lead to errors such as AttributeError: object has no attribute.
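A defensive lookup pattern avoids this. The sketch below uses a made-up page that has no <h1> tag; soup.find() returns None when a tag is absent, so you can check before using the result:

```python
from bs4 import BeautifulSoup

# A made-up page with no <h1>, to show the missing-tag case.
soup = BeautifulSoup("<p>No headings here</p>", "html.parser")

heading = soup.find("h1")  # returns None when the tag is absent
if heading is None:
    print("No <h1> found")
else:
    print(heading.get_text())
```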
The request fails because of a bad URL or timeout
A typo in the URL or a slow website can cause the request to fail.
The scraper breaks when the site layout changes
If the website changes its HTML, your scraper may stop finding the tags you need.
Important beginner note about legality and ethics
Before scraping any website, be careful.
- Do not scrape sites that forbid it
- Check the site's terms and robots.txt
- Do not send too many requests too quickly
- Only scrape data you are allowed to access
A small practice scraper is fine for learning, but real websites may have rules you need to follow.
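The standard library can help you respect robots.txt. This sketch parses a made-up robots.txt with urllib.robotparser; a real scraper would download the site's actual file first:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch() tells you whether a given user agent may fetch a URL.
print(parser.can_fetch("*", "https://example.com/private/page"))
print(parser.can_fetch("*", "https://example.com/public"))
```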
When this example is enough and when it is not
This example is a good starting point when:
- You are scraping a simple static HTML page
- You want to learn how tags, attributes, and parsing work
- You want a small script that gets titles and links
This example is not enough when:
- The site requires login
- The site depends heavily on JavaScript
- You need a large production scraper
- You need advanced retry logic, rate limiting, or data storage
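If you do eventually need retries, the basic shape looks like this. It is a minimal sketch; the function name and parameters are ours, and real projects often use a dedicated library such as tenacity instead:

```python
import time

def fetch_with_retries(fetch, attempts=3, delay=0.1):
    """Call fetch(); on failure, wait briefly and retry up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise              # out of attempts: re-raise the last error
            time.sleep(delay)      # brief pause before trying again

# Demo: a fake fetch that fails once, then succeeds on the second call.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("temporary failure")
    return "page content"

print(fetch_with_retries(flaky_fetch))
```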
Common mistakes
Here are some problems beginners often hit with this kind of script.
ModuleNotFoundError
You may see an error because requests or bs4 is not installed.
Try:
pip install requests beautifulsoup4
Or:
python -m pip install requests beautifulsoup4
If needed, see how to fix ModuleNotFoundError: No module named X.
AttributeError when a tag does not exist
This can happen if you write code like:
print(soup.title.string)
but the page has no <title> tag.
A safer version is:
title = soup.title.string if soup.title else "No title found"
print(title)
TypeError or bad output from missing href values
Some <a> tags do not have href.
This is why link.get("href") is safer than link["href"].
HTTP errors like 403 or 404
These happen when:
- The page does not exist
- The site blocks the request
- The URL is wrong
Using response.raise_for_status() helps catch this early.
Empty results because content is loaded with JavaScript
If your browser shows content but your script does not, the page may be using JavaScript to load data after the initial HTML response.
In that case, BeautifulSoup alone may not be enough.
FAQ
What is BeautifulSoup used for?
It parses HTML or XML so you can find tags, attributes, and text more easily in Python.
Do I need requests to use BeautifulSoup?
Not always, but beginners often use requests to download the page and BeautifulSoup to parse it.
Why does my scraper return nothing?
The page may use JavaScript, the selectors may be wrong, or the request may have failed.
Why do some links print None?
Some anchor tags do not have an href attribute, so link.get("href") returns None.
Can BeautifulSoup scrape JavaScript-rendered websites?
It can parse the HTML you give it, but it does not run JavaScript by itself.
See also
- How to install a Python package with pip
- How to make an API request in Python
- ModuleNotFoundError: No module named X fix
- AttributeError: object has no attribute fix
- Python simple web scraper for titles example
Try this scraper on a simple practice page first. After that, move to a smaller focused example, such as scraping only page titles or saving the results to a file with the Python open() function.
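As a final exercise, here is one way that saving step might look. This sketch writes a few made-up URLs with open(); a real run would use the links your scraper collected:

```python
# Made-up URLs standing in for real scraper output.
urls = [
    "https://example.com/",
    "https://www.iana.org/domains/example",
]

# Write one URL per line; mode "w" overwrites the file on each run.
with open("links.txt", "w", encoding="utf-8") as f:
    for u in urls:
        f.write(u + "\n")

# Read the file back to confirm what was saved.
with open("links.txt", encoding="utf-8") as f:
    print(f.read())
```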