Python Text Analysis Script Example

This beginner-friendly example shows how to analyze text in Python.

You will build a small script that:

Counts characters
Counts words
Counts lines
Counts how often each word appears

It is a good practice project because it uses basic Python tools in a real script: strings, loops, dictionaries, and printing results clearly.

Quick example

Use this small script if you want a fast example of basic text analysis without reading from a file.

text = "Python is simple. Python is useful."

words = text.lower().replace(".", "").split()
print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Word counts:", counts)

What this script does

This example is useful because it stays small, but still teaches real Python skills.

It shows how to:

Use a string as input text
Analyze text by counting characters, words, and lines
Count repeated words with a dictionary
Build a simple script that you can improve later

If you are new to strings, see Python strings explained: basics and examples.

Skills you practice

By working through this example, you practice:

Working with strings
Using split() to break text into words
Looping through data with for
Storing counts in a dictionary
Printing readable results

If dictionaries are new to you, read Python dictionaries explained.

Basic version: count lines, words, and characters

Start with a simple script that counts the text size in different ways.

text = """Python is simple.
Python is useful.
Python is fun to learn."""

character_count = len(text)
word_count = len(text.split())
line_count = len(text.splitlines())

print("Characters:", character_count)
print("Words:", word_count)
print("Lines:", line_count)

How it works

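len(text) counts every character in the string, including spaces, punctuation, and line breaks
text.split() breaks the text into words wherever there is whitespace (spaces, tabs, or line breaks)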
len(text.split()) gives the number of words
text.splitlines() breaks the text into lines
len(text.splitlines()) gives the number of lines

If you want a closer look at len(), see Python len() function explained.

Expected output

Characters: 59
Words: 11
Lines: 3

The exact character count depends on the text. It includes spaces, punctuation, and the two newline characters that separate the three lines.
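
To see where the character total comes from, here is a small sketch that adds up the length of each line and the line breaks between them. The variable names visible_per_line and newline_count are made up for this illustration.

```python
# Same three-line text as the basic version above.
text = """Python is simple.
Python is useful.
Python is fun to learn."""

lines = text.splitlines()

# Characters on each line, not counting the newline that ends it.
visible_per_line = [len(line) for line in lines]

# Each line break is stored as one "\n" character in the string.
newline_count = text.count("\n")

print("Per line:", visible_per_line)
print("Newlines:", newline_count)
print("Per-line total plus newlines:", sum(visible_per_line) + newline_count)
print("len(text):", len(text))
```

The last two printed values should match, which shows that len() counts line breaks as characters too.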

How word counting works

Now let’s extend the script to count how many times each word appears.

text = "Python is simple. Python is useful. Python is fun."

cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()

counts = {}

for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Words:", words)
print("Word counts:", counts)

Step by step

1. Convert to lowercase

text.lower()

This makes Python and python count as the same word.

2. Remove simple punctuation

text.lower().replace(".", "")

This removes periods, so "simple." becomes "simple". Other punctuation, such as commas or question marks, is left in place.

3. Split into words

words = cleaned_text.split()

This creates a list of words.

4. Create an empty dictionary

counts = {}

This dictionary will store each word and its count.

5. Loop through the words

for word in words:
    counts[word] = counts.get(word, 0) + 1

This does the counting:

counts.get(word, 0) gets the current count
If the word is not in the dictionary yet, it uses 0
Then it adds 1
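
If the get() pattern feels too compact, it may help to compare it with a longer version that checks the dictionary explicitly. The sketch below runs both on the same list of words; the names counts_with_get and counts_with_if are chosen only for this comparison.

```python
words = ["python", "is", "simple", "python", "is", "useful"]

# Version 1: the get() pattern used in this example.
counts_with_get = {}
for word in words:
    counts_with_get[word] = counts_with_get.get(word, 0) + 1

# Version 2: the same logic written out with an explicit check.
counts_with_if = {}
for word in words:
    if word in counts_with_if:
        counts_with_if[word] += 1   # seen before: add 1
    else:
        counts_with_if[word] = 1    # first time: start at 1

print(counts_with_get)
print(counts_with_if)
print("Same result:", counts_with_get == counts_with_if)
```

Both loops build the same dictionary. The get() version simply folds the "first time" case into its default value of 0.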

Expected output

Words: ['python', 'is', 'simple', 'python', 'is', 'useful', 'python', 'is', 'fun']
Word counts: {'python': 3, 'is': 3, 'simple': 1, 'useful': 1, 'fun': 1}

Improving the script

Once the basic version works, you can make it more useful.

Common improvements:

Sort words by frequency
Ignore very common words if needed
Read text from a file instead of using a hardcoded string
Clean more punctuation with replace()
Show only the top 5 or top 10 words

Here is a simple version that sorts word counts from highest to lowest:

text = "Python is simple. Python is useful. Python is fun."

cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

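# Sort the (word, count) pairs by count, highest first.
# Words with the same count keep the order in which they first appeared.
sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)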

print("Most common words:")
for word, count in sorted_counts:
    print(word, count)

Expected output

Most common words:
python 3
is 3
simple 1
useful 1
fun 1
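
Two of the other improvements listed above, ignoring very common words and showing only the top few, can be sketched with collections.Counter from the standard library. This is one possible approach, not the only one. The stop_words set and the top_n value are assumptions made for this illustration.

```python
from collections import Counter

text = "Python is simple. Python is useful. Python is fun."
words = text.lower().replace(".", "").split()

# Hypothetical list of words to ignore; choose your own for real text.
stop_words = {"is", "a", "the", "and", "to"}
filtered_words = [word for word in words if word not in stop_words]

# Counter builds the same word -> count mapping as the manual loop.
counts = Counter(filtered_words)

top_n = 2  # assumed limit; try 5 or 10 for longer texts
print("Top words:")
for word, count in counts.most_common(top_n):
    print(word, count)
```

most_common() returns the pairs already sorted from highest to lowest count, so no separate sorted() call is needed here.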

You can also read text from a file and then use the same logic. For that, see How to read a file in Python.
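
As a rough sketch of the file-based version: the code below first writes a small sample file so that it can run anywhere, then reads it back and applies the same counting steps. The file name sample.txt and its contents are made up for this example. In your own script you would skip the writing step and point file_path at an existing file.

```python
import tempfile
from pathlib import Path

# Create a small sample file so the sketch is self-contained.
# "sample.txt" is a hypothetical name used only for this example.
temp_dir = tempfile.TemporaryDirectory()
file_path = Path(temp_dir.name) / "sample.txt"
file_path.write_text("Python is simple.\nPython is useful.\n", encoding="utf-8")

# Read the whole file into one string, then reuse the same counting steps.
text = file_path.read_text(encoding="utf-8")

words = text.lower().replace(".", "").split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))
print("Word counts:", counts)

temp_dir.cleanup()  # delete the sample file and its folder
```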

Expected output

Your script should show results like these:

Total characters
Total words
Total lines
A dictionary or sorted list of word counts

For example, for the one-line text "Python is simple and Python is useful.":

Characters: 38
Words: 7
Lines: 1
Word counts: {'python': 2, 'is': 2, 'simple': 1, 'and': 1, 'useful': 1}
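
One way to produce all four results together is to wrap the counting steps in a function. The sketch below is only one possible layout; the function name analyze_text and the keys of the dictionary it returns are invented for this example.

```python
def analyze_text(text):
    """Return character, word, and line totals plus per-word counts."""
    # Same cleaning as before: lowercase, remove periods, split on whitespace.
    words = text.lower().replace(".", "").split()

    word_counts = {}
    for word in words:
        word_counts[word] = word_counts.get(word, 0) + 1

    return {
        "characters": len(text),
        "words": len(words),
        "lines": len(text.splitlines()),
        "word_counts": word_counts,
    }


# Sample input chosen for illustration.
report = analyze_text("Python is simple and Python is useful.")

print("Characters:", report["characters"])
print("Words:", report["words"])
print("Lines:", report["lines"])
print("Word counts:", report["word_counts"])
```

Keeping the analysis in a function makes it easier to reuse the same code for a hardcoded string, user input, or text read from a file.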

Beginner debugging tips

If your result looks wrong, check the data at each step.

Useful debug prints:

print(text)
print(text.split())
print(text.lower())
print(words)
print(counts)

These help you see:

The original text
How split() is breaking the text
Whether lowercase conversion worked
What is inside the words list
Whether the dictionary is counting correctly

Good things to check:

Print the cleaned text before counting
Print the words list to confirm the split result
Print the dictionary after the loop
Check punctuation if the counts look strange
Check uppercase and lowercase words if duplicates appear

If you want more practice with dictionaries, see how to loop through a dictionary in Python.

Common mistakes

These are some common problems beginners run into:

Forgetting to lowercase text before counting words
Not removing punctuation, which creates different versions of the same word
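Using split(' ') instead of split(), which produces empty strings when the text contains repeated spaces and does not split on tabs or line breaks
Calling split() on non-string data such as a number, which raises an AttributeError; convert the value with str() first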
Expecting perfect natural language analysis from a simple script
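
The difference between split(' ') and split() is easiest to see on text with repeated spaces. A short sketch, using a made-up sample string:

```python
text = "Python  is   simple"  # note the repeated spaces

# split(" ") cuts at every single space, so repeated spaces
# leave empty strings in the result.
with_space_argument = text.split(" ")
print(with_space_argument)

# split() with no argument treats any run of whitespace as one separator.
without_argument = text.split()
print(without_argument)

print("Items with split(' '):", len(with_space_argument))
print("Items with split():", len(without_argument))
```

The empty strings from split(' ') would be counted as words, which inflates the word total.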

A beginner script like this is great for learning, but it is still simple. Real text analysis usually needs better punctuation handling and more advanced cleaning.
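
As one small step toward better punctuation handling, the sketch below removes all ASCII punctuation at once with str.translate() and string.punctuation instead of chaining several replace() calls. It swaps in a different technique from the replace() approach used in this example, and the sample sentence is made up.

```python
import string

text = "Python is simple, isn't it? Yes: Python is simple!"

# Build a translation table that deletes every ASCII punctuation character.
# string.punctuation includes characters such as . , ! ? : ; " '
remove_punctuation = str.maketrans("", "", string.punctuation)

cleaned_text = text.lower().translate(remove_punctuation)
words = cleaned_text.split()

print(words)
```

This is still a simple approach. For example, it turns isn't into isnt because the apostrophe is removed along with the other punctuation.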

FAQ

Does this script count punctuation as characters?

Yes. len(text) counts all characters in the string, including spaces, punctuation, and line breaks.

Why use lower() before counting words?

It makes words like Python and python count as the same word.

Can I analyze a text file instead of a string?

Yes. Read the file into a string first, then use the same counting steps.

Is split() enough for real text analysis?

It is enough for a beginner example, but more advanced text analysis needs better text cleaning, for example handling commas, quotation marks, and apostrophes.