Python Text Analysis Script Example

This beginner-friendly example shows how to analyze text in Python.

You will build a small script that:

  • Counts characters
  • Counts words
  • Counts lines
  • Counts how often each word appears

It is a good practice project because it uses basic Python tools in a real script: strings, loops, dictionaries, and printing results clearly.

Quick example

Use this small script if you want a fast example of basic text analysis without reading from a file.

text = "Python is simple. Python is useful."

words = text.lower().replace(".", "").split()
print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Word counts:", counts)

What this script does

This example stays small but still exercises core Python skills.

It shows how to:

  • Use a string as input text
  • Analyze text by counting characters, words, and lines
  • Count repeated words with a dictionary
  • Build a simple script that you can improve later

If you are new to strings, see Python strings explained: basics and examples.

Skills you practice

By working through this example, you practice:

  • Working with strings
  • Using split() to break text into words
  • Looping through data with for
  • Storing counts in a dictionary
  • Printing readable results

If dictionaries are new to you, read Python dictionaries explained.

Basic version: count lines, words, and characters

Start with a simple script that counts the text size in different ways.

text = """Python is simple.
Python is useful.
Python is fun to learn."""

character_count = len(text)
word_count = len(text.split())
line_count = len(text.splitlines())

print("Characters:", character_count)
print("Words:", word_count)
print("Lines:", line_count)

How it works

  • len(text) counts every character in the string
  • text.split() breaks the text into words
  • len(text.split()) gives the number of words
  • text.splitlines() breaks the text into lines
  • len(text.splitlines()) gives the number of lines

If you want a closer look at len(), see Python len() function explained.

Expected output

Characters: 59
Words: 11
Lines: 3

The exact character count depends on the text. len() counts every character, including spaces, punctuation, and the two newline characters that separate the three lines.
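As a small illustration of that point, the sketch below uses a made-up two-line string (not the text from the script above) to show that len() counts spaces, punctuation, and the newline character between lines.

```python
# A made-up two-line sample; "\n" is the newline between the lines.
sample = "Hi there.\nBye."

# Length of each line on its own (spaces and periods are included).
line_lengths = [len(line) for line in sample.splitlines()]

# Number of newline characters in the string.
newline_count = sample.count("\n")

print("Length of each line:", line_lengths)
print("Newline characters:", newline_count)
print("Total characters:", len(sample))  # line lengths plus newlines
```

The total is the sum of the line lengths plus one for each newline, which is why a multi-line string is slightly longer than its visible characters suggest.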

How word counting works

Now let’s extend the script to count how many times each word appears.

text = "Python is simple. Python is useful. Python is fun."

cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()

counts = {}

for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Words:", words)
print("Word counts:", counts)

Step by step

1. Convert to lowercase

text.lower()

This makes Python and python count as the same word.

2. Remove simple punctuation

text.lower().replace(".", "")

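This removes periods so words like simple. become simple. Only periods are removed here; other punctuation, such as commas or question marks, would stay attached to the words.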

3. Split into words

words = cleaned_text.split()

This creates a list of words.

4. Create an empty dictionary

counts = {}

This dictionary will store each word and its count.

5. Loop through the words

for word in words:
    counts[word] = counts.get(word, 0) + 1

This does the counting:

  • counts.get(word, 0) gets the current count
  • If the word is not in the dictionary yet, it uses 0
  • Then it adds 1
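The one-line update can also be written as an explicit if/else. The sketch below is only an illustration of what counts.get(word, 0) + 1 does; the sample word list and the variable names are made up for the comparison.

```python
# Sample word list, made up for this illustration.
words = ["python", "is", "simple", "python"]

# Version 1: the get() shortcut used in the script.
counts_with_get = {}
for word in words:
    counts_with_get[word] = counts_with_get.get(word, 0) + 1

# Version 2: the same logic spelled out with if/else.
counts_with_if = {}
for word in words:
    if word in counts_with_if:
        counts_with_if[word] = counts_with_if[word] + 1  # seen before: add 1
    else:
        counts_with_if[word] = 1  # first time: start at 1

print(counts_with_get)
print(counts_with_if)
print("Same result:", counts_with_get == counts_with_if)
```

Both loops build the same dictionary. The get() version is shorter because it handles the "not seen yet" case in a single expression.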

Expected output

Words: ['python', 'is', 'simple', 'python', 'is', 'useful', 'python', 'is', 'fun']
Word counts: {'python': 3, 'is': 3, 'simple': 1, 'useful': 1, 'fun': 1}

Improving the script

Once the basic version works, you can make it more useful.

Common improvements:

  • Sort words by frequency
  • Ignore very common words if needed
  • Read text from a file instead of using a hardcoded string
  • Clean more punctuation with replace()
  • Show only the top 5 or top 10 words

Here is a simple version that sorts word counts from highest to lowest:

text = "Python is simple. Python is useful. Python is fun."

cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)

print("Most common words:")
for word, count in sorted_counts:
    print(word, count)

Expected output

Most common words:
python 3
is 3
simple 1
useful 1
fun 1
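Several of the other improvements can be combined in one short sketch. The sample text, the punctuation string, the stop_words set, and the top_n value below are assumptions chosen for illustration, not a standard list.

```python
text = "Python is simple, and Python is useful! Is Python fun? Yes."

# Assumed set of punctuation marks to strip; extend it as needed.
cleaned_text = text.lower()
for mark in ".,!?;:":
    cleaned_text = cleaned_text.replace(mark, "")

# Assumed set of very common words to ignore.
stop_words = {"is", "and", "the", "a"}

counts = {}
for word in cleaned_text.split():
    if word in stop_words:
        continue  # skip common words
    counts[word] = counts.get(word, 0) + 1

sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)

top_n = 3  # show only the most frequent words
print("Top", top_n, "words:")
for word, count in sorted_counts[:top_n]:
    print(word, count)
```

Slicing with sorted_counts[:top_n] keeps only the first few entries. Words with equal counts stay in the order they first appeared, because sorted() keeps ties in their original order.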

You can also read text from a file and then use the same logic. For that, see How to read a file in Python.
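As a rough sketch of that idea, the example below writes a small sample file and then reads it back before counting. The file name sample.txt and the temporary folder are assumptions made so the example can run on its own; in a real script you would open your own file instead.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained.
# The name "sample.txt" is hypothetical; use your own file path instead.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("Python is simple.\nPython is useful.\n")

    # Read the whole file into one string.
    with open(path, "r", encoding="utf-8") as file:
        text = file.read()

# From here on, the counting steps are the same as before.
words = text.lower().replace(".", "").split()

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))
print("Word counts:", counts)
```

Once the file content is in the text variable, nothing else in the script needs to change.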

Expected output

Your script should show results like these:

  • Total characters
  • Total words
  • Total lines
  • A dictionary or sorted list of word counts

For example, with the text "Python is simple. Python is useful." from the quick example:

Characters: 35
Words: 6
Lines: 1
Word counts: {'python': 2, 'is': 2, 'simple': 1, 'useful': 1}

Beginner debugging tips

If your result looks wrong, check the data at each step.

Useful debug prints:

print(text)
print(text.split())
print(text.lower())
print(words)
print(counts)

These help you see:

  • The original text
  • How split() is breaking the text
  • Whether lowercase conversion worked
  • What is inside the words list
  • Whether the dictionary is counting correctly

Good things to check:

  • Print the cleaned text before counting
  • Print the words list to confirm the split result
  • Print the dictionary after the loop
  • Check punctuation if the counts look strange
  • Check uppercase and lowercase words if duplicates appear

If you want more practice with dictionaries, see how to loop through a dictionary in Python.

Common mistakes

These are some common problems beginners run into:

  • Forgetting to lowercase text before counting words
  • Not removing punctuation, which creates different versions of the same word
  • Using split(' ') instead of split(), which produces empty strings when there are extra spaces and does not split on newlines or tabs
  • Calling split() on non-string data, such as a number or a list, without converting it to text first
  • Expecting perfect natural language analysis from a simple script
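The split(' ') mistake is easy to see with a short comparison. This is only an illustration; the sample string with extra spaces and a newline is made up.

```python
# A made-up sample with double spaces and a newline before "Really".
text = "Python  is  simple.\nReally simple."

by_space = text.split(" ")   # splits on every single space character
by_default = text.split()    # splits on any run of whitespace

print(by_space)
print(by_default)
print("split(' ') item count:", len(by_space))
print("split() item count:", len(by_default))
```

With split(' '), each extra space produces an empty string, and the newline is not treated as a separator, so two words stay glued together. With split(), you get the five words you would expect.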

A beginner script like this is great for learning, but it is still simple. Real text analysis usually needs better punctuation handling and more advanced cleaning.

FAQ

Does this script count punctuation as characters?

Yes. len(text) counts all characters in the string, including spaces and punctuation.

Why use lower() before counting words?

It makes words like Python and python count as the same word.

Can I analyze a text file instead of a string?

Yes. Read the file into a string first, then use the same counting steps.

Is split() enough for real text analysis?

It is enough for a beginner example, but more advanced text analysis needs better text cleaning.
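One possible next step, shown here only as a sketch, is to pull out words with a regular expression instead of chaining replace() calls. The pattern [a-z']+ is an assumption that suits plain English text; it is not part of the original script and will not handle every language or edge case.

```python
import re

text = "Python is simple, isn't it? Yes: Python is simple!"

# Find runs of lowercase letters and apostrophes; everything else is skipped.
# The pattern is an assumption for plain English text.
words = re.findall(r"[a-z']+", text.lower())

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(words)
print(counts)
```

This removes commas, question marks, colons, and exclamation marks in one step, while keeping contractions such as isn't together.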

See also

Try the same script with file input next, then extend it so it shows the most common words first.