Python Text Analysis Script Example

This beginner-friendly example shows how to analyze text in Python.

You will build a small script that:

Counts characters
Counts words
Counts lines
Counts how often each word appears

It is a good practice project because it uses basic Python tools in a real script: strings, loops, dictionaries, and printing results clearly.

Quick example #

Use this small script if you want a fast example of basic text analysis without reading from a file.

text = "Python is simple. Python is useful."

words = text.lower().replace(".", "").split()
print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Word counts:", counts)

What this script does #

This example is useful because it stays small, but still teaches real Python skills.

It shows how to:

Use a string as input text
Analyze text by counting characters, words, and lines
Count repeated words with a dictionary
Build a simple script that you can improve later

If you are new to strings, see Python strings explained: basics and examples.

Skills you practice #

By working through this example, you practice:

Working with strings
Using split() to break text into words
Looping through data with for
Storing counts in a dictionary
Printing readable results

If dictionaries are new to you, read Python dictionaries explained.

Basic version: count lines, words, and characters #

Start with a simple script that counts the text size in different ways.

text = """Python is simple.
Python is useful.
Python is fun to learn."""

character_count = len(text)
word_count = len(text.split())
line_count = len(text.splitlines())

print("Characters:", character_count)
print("Words:", word_count)
print("Lines:", line_count)

How it works #

len(text) counts every character in the string
text.split() breaks the text into words
len(text.split()) gives the number of words
text.splitlines() breaks the text into lines
len(text.splitlines()) gives the number of lines

If you want a closer look at len(), see Python len() function explained.

Expected output #

Characters: 59
Words: 11
Lines: 3

The character count includes every space and punctuation mark, plus the two newline characters that separate the three lines.
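To see where the character total comes from, the sketch below breaks it down using the same sample text. The variable names are illustrative and are not part of the original script.

```python
text = """Python is simple.
Python is useful.
Python is fun to learn."""

# len() counts everything, including the "\n" characters between lines
total_chars = len(text)
newline_chars = text.count("\n")

# Joining the split() result drops all whitespace (spaces and newlines)
non_whitespace_chars = len("".join(text.split()))

print("Total characters:", total_chars)
print("Newline characters:", newline_chars)
print("Characters without whitespace:", non_whitespace_chars)
print("Lines as a list:", text.splitlines())
```

Comparing the first and third numbers shows how much of the total is whitespace.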

How word counting works #

Now let’s extend the script to count how many times each word appears.

text = "Python is simple. Python is useful. Python is fun."

cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()

counts = {}

for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Words:", words)
print("Word counts:", counts)

Step by step #

1. Convert to lowercase #

text.lower()

This makes Python and python count as the same word.

2. Remove simple punctuation #

text.lower().replace(".", "")

This removes periods so words like simple. become simple.
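replace(".", "") only handles periods. As one possible extension, the sketch below removes every ASCII punctuation character in a single pass with str.translate() and string.punctuation from the standard library. The sample sentence is made up for illustration.

```python
import string

text = "Python is simple, useful, and fun! Is Python hard? No."

# Build a translation table that deletes every ASCII punctuation character
remove_punctuation = str.maketrans("", "", string.punctuation)

cleaned_text = text.lower().translate(remove_punctuation)
words = cleaned_text.split()

print(cleaned_text)
print(words)
```

Keep in mind that this also deletes apostrophes and hyphens, so a word like don't becomes dont.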

3. Split into words #

words = cleaned_text.split()

This creates a list of words.

4. Create an empty dictionary #

counts = {}

This dictionary will store each word and its count.

5. Loop through the words #

for word in words:
    counts[word] = counts.get(word, 0) + 1

This does the counting:

counts.get(word, 0) gets the current count
If the word is not in the dictionary yet, it uses 0
Then it adds 1

Expected output #

Words: ['python', 'is', 'simple', 'python', 'is', 'useful', 'python', 'is', 'fun']
Word counts: {'python': 3, 'is': 3, 'simple': 1, 'useful': 1, 'fun': 1}
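For comparison, Python's standard library includes collections.Counter, which performs the same tally without a manual loop. This is shown as an alternative to the dictionary approach, not as a replacement for learning it.

```python
from collections import Counter

text = "Python is simple. Python is useful. Python is fun."
words = text.lower().replace(".", "").split()

# Counter is a dict subclass that tallies the items of an iterable
counts = Counter(words)

print(counts["python"])
print(dict(counts))
```

Because Counter is a dictionary subclass, the usual dictionary operations such as counts["python"] and counts.items() still work.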

Improving the script #

Once the basic version works, you can make it more useful.

Common improvements:

Sort words by frequency
Ignore very common words if needed
Read text from a file instead of using a hardcoded string
Clean more punctuation with replace()
Show only the top 5 or top 10 words

Here is a simple version that sorts word counts from highest to lowest:

text = "Python is simple. Python is useful. Python is fun."

cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort the (word, count) pairs by count, highest first
sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)

print("Most common words:")
for word, count in sorted_counts:
    print(word, count)

Expected output #

Most common words:
python 3
is 3
simple 1
useful 1
fun 1
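Two of the other improvements listed earlier, skipping very common words and showing only the top few results, could be sketched as follows. The stop_words set, the top_n value, and the longer sample text are illustrative choices, not a standard list.

```python
text = "Python is simple. Python is useful. Python is fun to learn and fun to use."

# Hypothetical set of words to ignore; adjust it for your own text
stop_words = {"is", "to", "and"}
top_n = 2

words = text.lower().replace(".", "").split()

counts = {}
for word in words:
    if word in stop_words:
        continue  # skip very common words
    counts[word] = counts.get(word, 0) + 1

# Sort by count (highest first) and keep only the first top_n entries
top_words = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:top_n]

for word, count in top_words:
    print(word, count)
```

Slicing the sorted list with [:top_n] is what limits the output; change top_n to 5 or 10 for longer texts.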

You can also read text from a file and then use the same logic. For that, see How to read a file in Python.
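As a rough sketch of that idea, the example below reads a file into a string and reuses the same counting steps. It first writes a temporary sample file so that it runs on its own; in a real script, path would point at a text file you already have.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained.
# In your own script, set `path` to an existing text file instead.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as sample:
    sample.write("Python is simple.\nPython is useful.\n")
    path = sample.name

# Read the whole file into one string
with open(path, encoding="utf-8") as file:
    text = file.read()

words = text.lower().replace(".", "").split()

print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))

os.remove(path)  # clean up the sample file
```

Reading the whole file with read() is fine for small files; very large files may call for line-by-line processing instead.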

Expected output #

Your script should show results like these:

Total characters
Total words
Total lines
A dictionary or sorted list of word counts

For example (the exact numbers depend on your input text):

Characters: 42
Words: 7
Lines: 1
Word counts: {'python': 2, 'is': 2, 'simple': 1, 'and': 1, 'useful': 1}
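One way to collect all four results in a single place is to wrap the steps in a function. The function name analyze_text and the keys of the returned dictionary are assumptions made for this sketch.

```python
def analyze_text(text):
    """Return basic statistics for a string of text."""
    words = text.lower().replace(".", "").split()

    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1

    return {
        "characters": len(text),
        "words": len(words),
        "lines": len(text.splitlines()),
        "word_counts": counts,
    }


result = analyze_text("Python is simple. Python is useful.")

print("Characters:", result["characters"])
print("Words:", result["words"])
print("Lines:", result["lines"])
print("Word counts:", result["word_counts"])
```

Returning a dictionary instead of printing inside the function makes it easier to reuse the results later, for example when sorting or writing them to a file.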

Beginner debugging tips #

If your result looks wrong, check the data at each step.

Useful debug prints:

print(text)
print(text.split())
print(text.lower())
print(words)
print(counts)

These help you see:

The original text
How split() is breaking the text
Whether lowercase conversion worked
What is inside the words list
Whether the dictionary is counting correctly

Good things to check:

Print the cleaned text before counting
Print the words list to confirm the split result
Print the dictionary after the loop
Check punctuation if the counts look strange
Check uppercase and lowercase words if duplicates appear

If you want more practice with dictionaries, see how to loop through a dictionary in Python.
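Alongside print statements, a quick consistency check can catch counting bugs: the individual counts should add up to the length of the words list. The sketch below also uses repr() to reveal hidden whitespace. The messy sample text is made up for this example.

```python
text = "Python  is simple.\nPython is useful. "

words = text.lower().replace(".", "").split()

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# repr() shows newlines and extra spaces that print() hides
print(repr(text))

# The counts should always add up to the number of words
total_counted = sum(counts.values())
print("Words in list:", len(words))
print("Sum of counts:", total_counted)
assert total_counted == len(words), "counts do not match the words list"
```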

Common mistakes #

These are some common problems beginners run into:

Forgetting to lowercase text before counting words
Not removing punctuation, which creates different versions of the same word
Using split(' ') instead of split(), which produces empty strings when the text contains consecutive spaces and does not split on newlines or tabs
Trying to count words before converting non-string data to text
Expecting perfect natural language analysis from a simple script

A beginner script like this is great for learning, but it is still simple. Real text analysis usually needs better punctuation handling and more advanced cleaning.
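The difference between split(' ') and split() is easy to see with a small test string. A minimal sketch:

```python
text = "Python  is   simple"  # note the repeated spaces

# split(" ") splits on every single space, so extra spaces
# produce empty strings in the result
with_space_argument = text.split(" ")

# split() with no argument treats any run of whitespace as one separator
without_argument = text.split()

print(with_space_argument)
print(without_argument)
print(len(with_space_argument), "vs", len(without_argument))
```

The empty strings in the first result would inflate the word total and show up as a word in the counts dictionary.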

FAQ #

Does this script count punctuation as characters? #

Yes. len(text) counts all characters in the string, including spaces and punctuation.

Why use lower() before counting words? #

It makes words like Python and python count as the same word.

Can I analyze a text file instead of a string? #

Yes. Read the file into a string first, then use the same counting steps.

Is split() enough for real text analysis? #

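It is enough for a beginner example. split() only separates on whitespace, so punctuation stays attached to words unless you remove it first. More advanced text analysis needs more thorough cleaning.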
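As one example of more thorough cleaning, a regular expression can extract the words directly instead of deleting punctuation afterwards. The pattern below is an assumption chosen for simple English text: it keeps apostrophes inside words and is not a complete tokenizer.

```python
import re

text = "Python isn't hard; it's simple, useful, and fun!"

# Match runs of letters, optionally followed by an apostrophe and more letters
words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

print(words)
```

Words with accented letters, digits, or hyphens would need a different pattern.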
