Python Text Analysis Script Example

This beginner-friendly example shows how to analyze text in Python.

You will build a small script that:

  • Counts characters
  • Counts words
  • Counts lines
  • Counts how often each word appears

It is a good practice project because it uses basic Python tools in a real script: strings, loops, dictionaries, and printing results clearly.

Quick example #

Use this small script if you want a fast example of basic text analysis without reading from a file.

text = "Python is simple. Python is useful."

words = text.lower().replace(".", "").split()
print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Word counts:", counts)

What this script does #

This example stays small, but it still exercises core Python skills that you will reuse in larger scripts.

It shows how to:

  • Use a string as input text
  • Analyze text by counting characters, words, and lines
  • Count repeated words with a dictionary
  • Build a simple script that you can improve later

If you are new to strings, see Python strings explained: basics and examples.

Skills you practice #

By working through this example, you practice:

  • Working with strings
  • Using split() to break text into words
  • Looping through data with for
  • Storing counts in a dictionary
  • Printing readable results

If dictionaries are new to you, read Python dictionaries explained.

Basic version: count lines, words, and characters #

Start with a simple script that counts the text size in different ways.

text = """Python is simple.
Python is useful.
Python is fun to learn."""

character_count = len(text)
word_count = len(text.split())
line_count = len(text.splitlines())

print("Characters:", character_count)
print("Words:", word_count)
print("Lines:", line_count)

How it works #

  • len(text) counts every character in the string
  • text.split() breaks the text into words
  • len(text.split()) gives the number of words
  • text.splitlines() breaks the text into lines
  • len(text.splitlines()) gives the number of lines

If you want a closer look at len(), see Python len() function explained.

Expected output #

Characters: 59
Words: 11
Lines: 3

The character count includes spaces, punctuation, and the two newline characters between the three lines, so it changes whenever the text changes.
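As a minimal sketch of how the three measurements behave, the snippet below applies them to a short two-line string. The name `sample` is used only for illustration and is not part of the script above.

```python
# A short two-line string; "\n" is a single newline character.
sample = "one two\nthree"

# split() with no arguments splits on any whitespace, including newlines.
sample_words = sample.split()
print(sample_words)   # ['one', 'two', 'three']

# splitlines() splits only at line breaks.
sample_lines = sample.splitlines()
print(sample_lines)   # ['one two', 'three']

# len() counts every character: 11 letters, 1 space, and 1 newline.
print(len(sample))    # 13
```

Notice that the newline is counted by len() even though it is not visible when the text is printed.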

How word counting works #

Now let’s extend the script to count how many times each word appears.

text = "Python is simple. Python is useful. Python is fun."

cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()

counts = {}

for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Words:", words)
print("Word counts:", counts)

Step by step #

1. Convert to lowercase #

text.lower()

This makes Python and python count as the same word.

2. Remove simple punctuation #

text.lower().replace(".", "")

This removes periods, so a word like "simple." becomes "simple".

3. Split into words #

words = cleaned_text.split()

This creates a list of words.

4. Create an empty dictionary #

counts = {}

This dictionary will store each word and its count.

5. Loop through the words #

for word in words:
    counts[word] = counts.get(word, 0) + 1

This does the counting:

  • counts.get(word, 0) gets the current count
  • If the word is not in the dictionary yet, it uses 0
  • Then it adds 1

Expected output #

Words: ['python', 'is', 'simple', 'python', 'is', 'useful', 'python', 'is', 'fun']
Word counts: {'python': 3, 'is': 3, 'simple': 1, 'useful': 1, 'fun': 1}
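The manual loop can be cross-checked against `collections.Counter` from the standard library, which performs the same tally. This is an alternative sketch, not the method the example above uses, and the name `counter_counts` is introduced here only for illustration.

```python
from collections import Counter

text = "Python is simple. Python is useful. Python is fun."
words = text.lower().replace(".", "").split()

# Manual tally, as in the example above.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Counter does the same counting in one call.
counter_counts = Counter(words)

# Convert the Counter to a plain dict to compare the two results.
print(dict(counter_counts) == counts)   # True
print(counter_counts["python"])         # 3
print(counter_counts["missing"])        # 0 (no KeyError for unseen words)
```

Writing the loop yourself is still worthwhile while learning, because it shows exactly what get(word, 0) contributes.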

Improving the script #

Once the basic version works, you can make it more useful.

Common improvements:

  • Sort words by frequency
  • Ignore very common words if needed
  • Read text from a file instead of using a hardcoded string
  • Remove more punctuation, such as commas and question marks, with additional replace() calls
  • Show only the top 5 or top 10 words

Here is a simple version that sorts word counts from highest to lowest:

text = "Python is simple. Python is useful. Python is fun."

cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)

print("Most common words:")
for word, count in sorted_counts:
    print(word, count)

Expected output #

Most common words:
python 3
is 3
simple 1
useful 1
fun 1
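Two of the improvements listed earlier, ignoring very common words and showing only the top few, might be sketched as follows. The `stop_words` set and the `top_n` value are assumptions chosen for this illustration; adjust both for your own text.

```python
text = "Python is simple. Python is useful. Python is fun."
words = text.lower().replace(".", "").split()

# Hypothetical set of common words to skip; extend it as needed.
stop_words = {"is", "a", "the", "and", "to"}

counts = {}
for word in words:
    if word in stop_words:
        continue  # skip very common words
    counts[word] = counts.get(word, 0) + 1

sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Keep only the first top_n entries of the sorted list.
top_n = 2
top_words = sorted_counts[:top_n]

for word, count in top_words:
    print(word, count)
```

Because sorted() is stable, words with equal counts keep the order in which they first appeared in the text.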

You can also read text from a file and then use the same logic. For that, see How to read a file in Python.
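A minimal sketch of that approach is shown below. The file name `sample.txt` is hypothetical, and the script writes the file itself first so that the example can run on its own; in practice you would point it at an existing file and skip the write and delete steps.

```python
from pathlib import Path

# Hypothetical file name, created here only so the sketch is runnable.
path = Path("sample.txt")
path.write_text("Python is simple.\nPython is useful.\n", encoding="utf-8")

# Read the whole file into one string.
with open(path, encoding="utf-8") as file:
    text = file.read()

# From here on, the logic is the same as with a hardcoded string.
words = text.lower().replace(".", "").split()

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))
print("Word counts:", counts)

# Remove the demo file again.
path.unlink()
```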

Expected output #

Your script should show results like these:

  • Total characters
  • Total words
  • Total lines
  • A dictionary or sorted list of word counts

For example, the output has this shape (the exact numbers depend on your input text):

Characters: 42
Words: 7
Lines: 1
Word counts: {'python': 2, 'is': 2, 'simple': 1, 'and': 1, 'useful': 1}

Beginner debugging tips #

If your result looks wrong, check the data at each step.

Useful debug prints:

print(text)
print(text.split())
print(text.lower())
print(words)
print(counts)

These help you see:

  • The original text
  • How split() is breaking the text
  • Whether lowercase conversion worked
  • What is inside the words list
  • Whether the dictionary is counting correctly

Good things to check:

  • Print the cleaned text before counting
  • Print the words list to confirm the split result
  • Print the dictionary after the loop
  • Check punctuation if the counts look strange
  • Check uppercase and lowercase words if duplicates appear

If you want more practice with dictionaries, see how to loop through a dictionary in Python.
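As an illustration of the punctuation and uppercase checks, this sketch first counts words without any cleaning, so the resulting duplicates are visible, and then counts them again after cleaning. The sample sentence is made up for this example.

```python
text = "Python is fun. I like python. Fun!"

# No lower() and no punctuation removal: near-duplicates stay separate.
raw_counts = {}
for word in text.split():
    raw_counts[word] = raw_counts.get(word, 0) + 1
print(raw_counts)

# Same text after lowercasing and removing "." and "!".
cleaned_words = text.lower().replace(".", "").replace("!", "").split()
clean_counts = {}
for word in cleaned_words:
    clean_counts[word] = clean_counts.get(word, 0) + 1
print(clean_counts)
```

In the first dictionary, "Python" and "python." are separate keys, as are "fun." and "Fun!". In the second, each pair is merged into one key.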

Common mistakes #

These are some common problems beginners run into:

  • Forgetting to lowercase text before counting words
  • Not removing punctuation, which creates different versions of the same word
  • Using split(' ') instead of split(), which produces empty strings when the text contains extra spaces
  • Calling split() on non-string data, such as a number or a list, without converting it to a string first
  • Expecting perfect natural language analysis from a simple script

A beginner script like this is great for learning, but it is still simple. Real text analysis usually needs better punctuation handling and more advanced cleaning.
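The difference between split(' ') and split() is easiest to see on text with extra spaces. The sample string below is made up for this sketch.

```python
text = "Python  is   simple"  # note the repeated spaces

# split(' ') splits at every single space, producing empty strings.
by_space = text.split(" ")
print(by_space)        # ['Python', '', 'is', '', '', 'simple']

# split() treats any run of whitespace as one separator.
by_whitespace = text.split()
print(by_whitespace)   # ['Python', 'is', 'simple']

print(len(by_space), len(by_whitespace))  # 6 3
```

With split(' '), the word count would be reported as 6 instead of 3, and the empty string would show up as a "word" in the dictionary.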

FAQ #

Does this script count punctuation as characters? #

Yes. len(text) counts all characters in the string, including spaces and punctuation.

Why use lower() before counting words? #

It makes words like Python and python count as the same word.

Can I analyze a text file instead of a string? #

Yes. Read the file into a string first, then use the same counting steps.

Is split() enough for real text analysis? #

It is enough for a beginner example, but more advanced text analysis needs better text cleaning.
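One possible next step, offered here as an assumption rather than as part of the original script, is to extract words with a regular expression instead of chaining replace() calls. The pattern below keeps only lowercase letters and apostrophes, so it is still a rough approximation and ignores digits and accented characters.

```python
import re

text = "Python is simple, isn't it? Yes: Python is simple!"

# Find runs of letters and apostrophes in the lowercased text.
words = re.findall(r"[a-z']+", text.lower())
print(words)

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print(counts)
```

Commas, question marks, colons, and exclamation marks are all dropped here without a separate replace() call for each one.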

See also #

Try the same script with file input next, then extend it so it shows the most common words first.
