Python Text Analysis Script Example
This beginner-friendly example shows how to analyze text in Python.
You will build a small script that:
- Counts characters
- Counts words
- Counts lines
- Counts how often each word appears
It is a good practice project because it uses basic Python tools in a real script: strings, loops, dictionaries, and printing results clearly.
Quick example
Use this small script if you want a fast example of basic text analysis without reading from a file.
text = "Python is simple. Python is useful."
words = text.lower().replace(".", "").split()
print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print("Word counts:", counts)
What this script does
This example is useful because it stays small, but still teaches real Python skills.
It shows how to:
- Use a string as input text
- Analyze text by counting characters, words, and lines
- Count repeated words with a dictionary
- Build a simple script that you can improve later
If you are new to strings, see Python strings explained: basics and examples.
Skills you practice
By working through this example, you practice:
- Working with strings
- Using split() to break text into words
- Looping through data with for
- Storing counts in a dictionary
- Printing readable results
If dictionaries are new to you, read Python dictionaries explained.
Basic version: count lines, words, and characters
Start with a simple script that counts the text size in different ways.
text = """Python is simple.
Python is useful.
Python is fun to learn."""
character_count = len(text)
word_count = len(text.split())
line_count = len(text.splitlines())
print("Characters:", character_count)
print("Words:", word_count)
print("Lines:", line_count)
How it works
- len(text) counts every character in the string
- text.split() breaks the text into words
- len(text.split()) gives the number of words
- text.splitlines() breaks the text into lines
- len(text.splitlines()) gives the number of lines
If you want a closer look at len(), see Python len() function explained.
Expected output
Characters: 59
Words: 11
Lines: 3
The character count includes spaces, punctuation, and the two newline characters that separate the three lines, so it changes whenever the text changes.
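It can be easy to forget that len() counts newline characters too. The following is a minimal sketch, using a made-up two-line string rather than the text above, that compares the count with and without the newline.

```python
# Hypothetical two-line sample, short enough to verify by hand.
sample = "ab\ncd"

total_chars = len(sample)                      # counts a, b, newline, c, d
visible_chars = len(sample.replace("\n", ""))  # newline removed before counting

print("With newline:", total_chars)
print("Without newline:", visible_chars)
```

If you only want visible characters in your own script, removing newlines first, as in the second count, is one simple option.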
How word counting works
Now let’s extend the script to count how many times each word appears.
text = "Python is simple. Python is useful. Python is fun."
cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print("Words:", words)
print("Word counts:", counts)
Step by step
1. Convert to lowercase
text.lower()
This makes Python and python count as the same word.
2. Remove simple punctuation
text.lower().replace(".", "")
This removes periods so words like simple. become simple.
3. Split into words
words = cleaned_text.split()
This creates a list of words.
4. Create an empty dictionary
counts = {}
This dictionary will store each word and its count.
5. Loop through the words
for word in words:
    counts[word] = counts.get(word, 0) + 1
This does the counting:
- counts.get(word, 0) gets the current count
- If the word is not in the dictionary yet, it uses 0
- Then it adds 1
Expected output
Words: ['python', 'is', 'simple', 'python', 'is', 'useful', 'python', 'is', 'fun']
Word counts: {'python': 3, 'is': 3, 'simple': 1, 'useful': 1, 'fun': 1}
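To see the get() default in isolation, here is a small sketch with a shortened word list (an assumption made to keep it brief). It also compares the result with collections.Counter, a standard-library class that does the same tally. Counter is not part of the original script and is shown only as a cross-check.

```python
from collections import Counter

words = ["python", "is", "simple", "python"]

counts = {}
# The key is missing at this point, so get() returns the default 0.
print(counts.get("python", 0))

for word in words:
    counts[word] = counts.get(word, 0) + 1

# Counter is a dict subclass that counts items in one call.
counter_counts = Counter(words)

print(counts)
print(counts == dict(counter_counts))
```

Writing the loop by hand is still worthwhile practice, because it shows what Counter does for you.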
Improving the script
Once the basic version works, you can make it more useful.
Common improvements:
- Sort words by frequency
- Ignore very common words if needed
- Read text from a file instead of using a hardcoded string
- Clean more punctuation with replace()
- Show only the top 5 or top 10 words
Here is a simple version that sorts word counts from highest to lowest:
text = "Python is simple. Python is useful. Python is fun."
cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print("Most common words:")
for word, count in sorted_counts:
    print(word, count)
Expected output
Most common words:
python 3
is 3
simple 1
useful 1
fun 1
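Two of the other improvements listed earlier, ignoring very common words and showing only the top few, can be sketched in the same style. The stop_words set and the top_n value below are illustrative assumptions, not a standard list or a recommended cutoff.

```python
text = "Python is simple. Python is useful. Python is fun."
words = text.lower().replace(".", "").split()

# Assumed, deliberately tiny stop-word set; real lists are much longer.
stop_words = {"is", "a", "the", "and"}
top_n = 2  # assumed cutoff for this sketch

counts = {}
for word in words:
    if word in stop_words:
        continue  # skip very common words
    counts[word] = counts.get(word, 0) + 1

sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)
top_words = sorted_counts[:top_n]  # slicing keeps only the first top_n pairs

for word, count in top_words:
    print(word, count)
```

Because sorted() keeps tied items in their original order, words with the same count appear in the order they were first seen.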
You can also read text from a file and then use the same logic. For that, see How to read a file in Python.
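As a rough sketch of that idea, the function below reads a file into a string and reuses the same counting loop. The function name count_words_in_file and the file name sample.txt are hypothetical, and the temporary file exists only so the example runs on its own.

```python
import os
import tempfile


def count_words_in_file(path):
    """Read a UTF-8 text file and return a dictionary of word counts."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    words = text.lower().replace(".", "").split()
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


# Create a small temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("Python is simple.\nPython is useful.\n")
    file_counts = count_words_in_file(path)

print(file_counts)
```

In your own script you would pass the path of a real file instead of creating a temporary one.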
Expected output
Your script should show results like these:
- Total characters
- Total words
- Total lines
- A dictionary or sorted list of word counts
For example (the exact numbers depend on your input text):
Characters: 42
Words: 7
Lines: 1
Word counts: {'python': 2, 'is': 2, 'simple': 1, 'and': 1, 'useful': 1}
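One way to produce all four results together is to wrap the steps in a single function. This is a sketch only: the name analyze_text and the dictionary keys are assumptions made for illustration, and the input is the short text from the quick example.

```python
def analyze_text(text):
    """Return character, word, and line totals plus per-word counts."""
    words = text.lower().replace(".", "").split()
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return {
        "characters": len(text),
        "words": len(words),
        "lines": len(text.splitlines()),
        "word_counts": counts,
    }


report = analyze_text("Python is simple. Python is useful.")
for label, value in report.items():
    print(f"{label}: {value}")
```

Returning a dictionary instead of printing inside the function makes the results easier to reuse, for example when sorting or saving them later.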
Beginner debugging tips
If your result looks wrong, check the data at each step.
Useful debug prints:
print(text)
print(text.split())
print(text.lower())
print(words)
print(counts)
These help you see:
- The original text
- How split() is breaking the text
- Whether lowercase conversion worked
- What is inside the words list
- Whether the dictionary is counting correctly
Good things to check:
- Print the cleaned text before counting
- Print the words list to confirm the split result
- Print the dictionary after the loop
- Check punctuation if the counts look strange
- Check uppercase and lowercase words if duplicates appear
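For the punctuation check in particular, a quick sketch like the one below can list the words that still contain something other than letters or digits. The sample sentence is made up, and the isalnum() test is just one possible check, not a complete cleaning rule.

```python
text = "Python is simple, python is useful."
words = text.lower().replace(".", "").split()

# Words that are not purely letters and digits probably still carry punctuation.
leftovers = [word for word in words if not word.isalnum()]

print("Words:", words)
print("Needs cleaning:", leftovers)
```

Here only the period is removed, so the comma stays attached to one word and that word shows up in the second list.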
If you want more practice with dictionaries, see how to loop through a dictionary in Python.
Common mistakes
These are some common problems beginners run into:
- Forgetting to lowercase text before counting words
- Not removing punctuation, which creates different versions of the same word
- Using split(' ') instead of split(), which produces empty strings when the text contains extra spaces
- Trying to count words before converting non-string data to text
- Expecting perfect natural language analysis from a simple script
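The difference between split(' ') and split() is easiest to see side by side. A minimal sketch, using a made-up string with repeated spaces:

```python
text = "Python  is   simple"  # note the repeated spaces

by_space = text.split(" ")  # splits at every single space
by_whitespace = text.split()  # splits on runs of whitespace

print(by_space)
print(by_whitespace)
print(len(by_space), len(by_whitespace))
```

The first list contains empty strings for each extra space, so len() would overstate the word count. The second list contains only the three words.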
A beginner script like this is great for learning, but it is still simple. Real text analysis usually needs better punctuation handling and more advanced cleaning.
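As one possible next step, the sketch below removes all ASCII punctuation in a single pass with str.translate() and string.punctuation from the standard library. This technique is not used in the script above, and the sample sentence is made up for illustration.

```python
import string

text = "Python is simple, fast, and fun! Is Python useful? Yes."

# Translation table that deletes every ASCII punctuation character.
remove_punctuation = str.maketrans("", "", string.punctuation)
cleaned_words = text.lower().translate(remove_punctuation).split()

print(cleaned_words)
```

This approach also strips apostrophes and hyphens, so a word like don't becomes dont. Whether that is acceptable depends on the text you are analyzing.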
FAQ
Does this script count punctuation as characters?
Yes. len(text) counts all characters in the string, including spaces and punctuation.
Why use lower() before counting words?
It makes words like Python and python count as the same word.
Can I analyze a text file instead of a string?
Yes. Read the file into a string first, then use the same counting steps.
Is split() enough for real text analysis?
It is enough for a beginner example, but more advanced text analysis needs better text cleaning.
See also
- Python strings explained: basics and examples
- Python dictionaries explained
- Python string split() method
- Python len() function explained
- How to read a file in Python
- Python word count script example
Try the same script with file input next, then extend it so it shows the most common words first.