Python Text Analysis Script Example
This beginner-friendly example shows how to analyze text in Python.
You will build a small script that:
- Counts characters
- Counts words
- Counts lines
- Counts how often each word appears
It is a good practice project because it uses basic Python tools in a real script: strings, loops, dictionaries, and printing results clearly.
Quick example #
Use this small script if you want a fast example of basic text analysis without reading from a file.
text = "Python is simple. Python is useful."
# Lowercase the text and drop periods before splitting into words
words = text.lower().replace(".", "").split()
print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))
# Tally how many times each word appears
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print("Word counts:", counts)
What this script does #
This example is useful because it stays small, but still teaches real Python skills.
It shows how to:
- Use a string as input text
- Analyze text by counting characters, words, and lines
- Count repeated words with a dictionary
- Build a simple script that you can improve later
If you are new to strings, see Python strings explained: basics and examples.
Skills you practice #
By working through this example, you practice:
- Working with strings
- Using split() to break text into words
- Looping through data with for
- Storing counts in a dictionary
- Printing readable results
If dictionaries are new to you, read Python dictionaries explained.
Basic version: count lines, words, and characters #
Start with a simple script that counts the text size in different ways.
text = """Python is simple.
Python is useful.
Python is fun to learn."""
character_count = len(text)
word_count = len(text.split())
line_count = len(text.splitlines())
print("Characters:", character_count)
print("Words:", word_count)
print("Lines:", line_count)
How it works #
- len(text) counts every character in the string
- text.split() breaks the text into words
- len(text.split()) gives the number of words
- text.splitlines() breaks the text into lines
- len(text.splitlines()) gives the number of lines
If you want a closer look at len(), see Python len() function explained.
Expected output #
Characters: 59
Words: 11
Lines: 3
The exact character count depends on the text. len() counts every character, including spaces, punctuation, and the newline characters that separate the lines.
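To see where the character total comes from, the count can be broken down by kind of character. This is a minimal sketch using the same three-line text; the variable names are chosen here for illustration only.

```python
text = """Python is simple.
Python is useful.
Python is fun to learn."""

total = len(text)                  # every character, including "\n"
newlines = text.count("\n")        # line breaks between the three lines
spaces = text.count(" ")           # ordinary spaces
# Letters and punctuation only (anything that is not whitespace)
visible = sum(1 for ch in text if not ch.isspace())

print("Total:", total)
print("Newlines:", newlines)
print("Spaces:", spaces)
print("Non-whitespace:", visible)
```

The four numbers add up: the total equals the newlines plus the spaces plus the non-whitespace characters.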
How word counting works #
Now let’s extend the script to count how many times each word appears.
text = "Python is simple. Python is useful. Python is fun."
cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print("Words:", words)
print("Word counts:", counts)
Step by step #
1. Convert to lowercase #
text.lower()
This makes Python and python count as the same word.
2. Remove simple punctuation #
cleaned_text = text.lower().replace(".", "")
This removes periods, so a token like "simple." becomes "simple".
3. Split into words #
words = cleaned_text.split()
This creates a list of words.
4. Create an empty dictionary #
counts = {}
This dictionary will store each word and its count.
5. Loop through the words #
for word in words:
    counts[word] = counts.get(word, 0) + 1
This does the counting:
- counts.get(word, 0) gets the current count
- If the word is not in the dictionary yet, it uses 0
- Then it adds 1
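To look at get() in isolation, here is a small sketch that applies the same update twice to one key and then shows what happens without the default. The key names are arbitrary examples.

```python
counts = {}

# The key is missing, so get() returns the default instead of raising KeyError.
first_lookup = counts.get("python", 0)
print(first_lookup)

counts["python"] = counts.get("python", 0) + 1   # 0 + 1
counts["python"] = counts.get("python", 0) + 1   # 1 + 1
print(counts)

# Direct indexing on a missing key fails, which is why the loop uses get().
try:
    counts["fun"] += 1
except KeyError as error:
    print("KeyError:", error)
```

The failed update leaves the dictionary unchanged, so only the key that went through get() is present at the end.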
Expected output #
Words: ['python', 'is', 'simple', 'python', 'is', 'useful', 'python', 'is', 'fun']
Word counts: {'python': 3, 'is': 3, 'simple': 1, 'useful': 1, 'fun': 1}
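As an alternative to the manual loop, the standard library's collections.Counter can do the same tally. This is a different tool from the dictionary approach shown above, offered as a sketch once the loop version is clear.

```python
from collections import Counter

text = "Python is simple. Python is useful. Python is fun."
words = text.lower().replace(".", "").split()

# Counter is a dict subclass that counts the items it is given.
counter = Counter(words)

print(counter["python"])
# most_common(n) returns the n highest counts as (word, count) pairs.
print(counter.most_common(2))
```

Writing the loop by hand is still worth doing once, because it shows what Counter does internally.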
Improving the script #
Once the basic version works, you can make it more useful.
Common improvements:
- Sort words by frequency
- Ignore very common words if needed
- Read text from a file instead of using a hardcoded string
- Clean more punctuation with replace()
- Show only the top 5 or top 10 words
Here is a simple version that sorts word counts from highest to lowest:
text = "Python is simple. Python is useful. Python is fun."
cleaned_text = text.lower().replace(".", "")
words = cleaned_text.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print("Most common words:")
for word, count in sorted_counts:
    print(word, count)
Expected output #
Most common words:
python 3
is 3
simple 1
useful 1
fun 1
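Several of the other improvements can be combined in one short sketch. Instead of chaining many replace() calls, it uses str.translate() with string.punctuation to remove all ASCII punctuation in one pass. The sample text and the tiny stop_words set are made up for illustration; a real list of common words would be much longer.

```python
import string

text = "Python is simple, fast to learn, and useful! Is Python fun? Yes."

# Build a translation table that deletes every ASCII punctuation character.
remove_punctuation = str.maketrans("", "", string.punctuation)
words = text.lower().translate(remove_punctuation).split()

# Hypothetical stop-word list, used only for this example.
stop_words = {"is", "to", "and"}

counts = {}
for word in words:
    if word in stop_words:
        continue  # skip very common words
    counts[word] = counts.get(word, 0) + 1

# Sort by count, highest first, then keep only the first top_n entries.
top_n = 3
sorted_counts = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for word, count in sorted_counts[:top_n]:
    print(word, count)
```

Because sorted() is stable, words with the same count keep the order in which they first appeared in the text.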
You can also read text from a file and then use the same logic. For that, see How to read a file in Python.
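A minimal sketch of the file-based version follows. So that it runs on its own, it first writes a small sample file to a temporary folder; the file name sample.txt and its contents are assumptions for the example. In your own script you would point file_path at an existing file.

```python
import tempfile
from pathlib import Path

# Create a small sample file so the example is self-contained.
file_path = Path(tempfile.mkdtemp()) / "sample.txt"
file_path.write_text("Python is simple.\nPython is useful.\n", encoding="utf-8")

# Read the whole file into one string, then reuse the same counting steps.
with open(file_path, encoding="utf-8") as file:
    text = file.read()

words = text.lower().replace(".", "").split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print("Characters:", len(text))
print("Words:", len(words))
print("Lines:", len(text.splitlines()))
print("Word counts:", counts)
```

Reading the whole file at once is fine for small files. Very large files are usually processed line by line instead.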
Expected output #
Your script should show results like these:
- Total characters
- Total words
- Total lines
- A dictionary or sorted list of word counts
For example (the exact numbers depend on your input text):
Characters: 42
Words: 7
Lines: 1
Word counts: {'python': 2, 'is': 2, 'simple': 1, 'and': 1, 'useful': 1}
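One way to produce all four results together is to wrap the steps in a function. The sketch below assumes a helper named analyze(), which is a name invented here and not part of Python, and runs it on the short text from the quick example, so its numbers differ from the sample output above.

```python
def analyze(text):
    """Return basic statistics for a string (hypothetical helper)."""
    words = text.lower().replace(".", "").split()
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return {
        "characters": len(text),
        "words": len(words),
        "lines": len(text.splitlines()),
        "counts": counts,
    }


result = analyze("Python is simple. Python is useful.")
print("Characters:", result["characters"])
print("Words:", result["words"])
print("Lines:", result["lines"])
print("Word counts:", result["counts"])
```

Returning a dictionary keeps the counting separate from the printing, which makes the function easier to reuse with file input later.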
Beginner debugging tips #
If your result looks wrong, check the data at each step.
Useful debug prints:
print(text)
print(text.split())
print(text.lower())
print(words)
print(counts)
These help you see:
- The original text
- How split() is breaking the text
- Whether lowercase conversion worked
- What is inside the words list
- Whether the dictionary is counting correctly
Good things to check:
- Print the cleaned text before counting
- Print the words list to confirm the split result
- Print the dictionary after the loop
- Check punctuation if the counts look strange
- Check uppercase and lowercase words if duplicates appear
If you want more practice with dictionaries, see how to loop through a dictionary in Python.
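The following sketch shows what these checks can reveal. It counts a made-up sentence without any cleaning, prints the intermediate values, and then prints the cleaned word list for comparison.

```python
text = "Python is fun. python is Fun"

# Step 1: look at the raw split result.
raw_words = text.split()
print(raw_words)

# Step 2: count without cleaning and inspect the dictionary.
raw_counts = {}
for word in raw_words:
    raw_counts[word] = raw_counts.get(word, 0) + 1
print(raw_counts)

# Step 3: clean first, then compare the word list.
clean_words = text.lower().replace(".", "").split()
print(clean_words)
```

In the uncleaned dictionary, "Python" and "python" are separate keys, and so are "fun." and "Fun". Printing the intermediate lists makes that kind of duplicate easy to spot.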
Common mistakes #
These are some common problems beginners run into:
- Forgetting to lowercase text before counting words
- Not removing punctuation, which creates different versions of the same word
- Using split(' ') instead of split(), which produces empty strings when the text contains extra spaces
- Trying to count words before converting non-string data to text
- Expecting perfect natural language analysis from a simple script
A beginner script like this is great for learning, but it is still simple. Real text analysis usually needs better punctuation handling and more advanced cleaning.
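The split(' ') problem from the list above is easy to reproduce. This short sketch uses a made-up string with repeated spaces.

```python
text = "Python  is   simple"   # two spaces, then three spaces

with_separator = text.split(" ")   # splits at every single space
without_separator = text.split()   # treats any run of whitespace as one gap

print(with_separator)
print(len(with_separator))
print(without_separator)
print(len(without_separator))
```

With an explicit " " separator, each extra space adds an empty string to the list, so the word count comes out too high. Calling split() with no argument avoids this.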
FAQ #
Does this script count punctuation as characters? #
Yes. len(text) counts all characters in the string, including spaces and punctuation.
Why use lower() before counting words? #
It makes words like Python and python count as the same word.
Can I analyze a text file instead of a string? #
Yes. Read the file into a string first, then use the same counting steps.
Is split() enough for real text analysis? #
It is enough for a beginner example, but more advanced text analysis needs better text cleaning.
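As one possible next step beyond split(), a regular expression can pull out the words directly, which handles commas, question marks, and other punctuation in one pass. This is a rough sketch and not a complete tokenizer: the pattern below is an assumption that only recognizes the ASCII letters a to z and apostrophes inside a word.

```python
import re

text = "Python's syntax is simple, isn't it? Python is useful!"

# Match runs of lowercase letters, allowing apostrophes inside a word.
words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
print(words)

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print(counts)
```

Numbers, accented letters, and hyphenated words would need a different pattern.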
See also #
- Python strings explained: basics and examples
- Python dictionaries explained
- Python string split() method
- Python len() function explained
- How to read a file in Python
- Python word count script example
Try the same script with file input next, then extend it so it shows the most common words first.