UnicodeDecodeError: 'utf-8' codec can't decode byte (Fix)

If you see a Python error like this:

UnicodeDecodeError: 'utf-8' codec can't decode byte ...

Python is trying to read data as UTF-8 text, but the bytes do not match valid UTF-8.

This usually happens when:

  • you open a text file with the wrong encoding
  • a CSV file came from Excel or another Windows program
  • you decode bytes from a file, download, or API using the wrong encoding
  • the file is actually binary, not text

In most cases, the fix is to open the file with the correct encoding instead of relying on the default.

Quick fix #

with open('data.txt', 'r', encoding='latin-1') as file:
    text = file.read()

print(text)

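Replace latin-1 with the encoding the file was actually saved in. latin-1 accepts every byte value, so the error goes away even when latin-1 is the wrong choice, and the text may then contain wrong characters. If you do not know the encoding, first check where the file came from.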

What this error means #

This error means:

  • Python is trying to turn bytes into text using UTF-8
  • at least one byte is not valid in UTF-8
  • the file or data was probably saved with a different encoding
  • common encodings include:
    • utf-8
    • latin-1
    • cp1252
    • utf-16

A simple way to think about it:

  • bytes are raw data
  • encoding tells Python how to interpret those bytes as text

If Python uses the wrong encoding, it cannot read the text correctly.
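
A minimal sketch of that idea, using a made-up byte string instead of a real file. The same bytes decode cleanly with the encoding that produced them and fail with UTF-8:

```python
# The word "café" encoded as latin-1: the é becomes the single byte 0xE9.
raw = 'caf\u00e9'.encode('latin-1')

# Decoding with the encoding that was used to write the bytes works.
as_latin1 = raw.decode('latin-1')

# Decoding the same bytes as UTF-8 fails: 0xE9 announces a multi-byte
# sequence, but the continuation bytes it needs are missing.
try:
    raw.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print(exc)

print(as_latin1)
```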

When this error usually happens #

You will often see this error when:

  • reading a text file with open() without the right encoding
  • loading CSV or text data created on another system
  • decoding raw bytes from a network response or binary source
  • reading Windows-created files that use cp1252 instead of UTF-8
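
For bytes that come from a network response, the source often declares its encoding. The sketch below assumes a hypothetical Content-Type header and body (no real request is made) and uses the standard library to pull out the charset. The fallback to utf-8 when no charset is declared is an assumption, not a rule:

```python
from email.message import Message

# Hypothetical response data, standing in for what an HTTP client returns.
content_type = 'text/plain; charset=cp1252'
body = 'Price: 5\u20ac'.encode('cp1252')  # the euro sign is byte 0x80 in cp1252

# Parse the charset parameter out of the header value.
header = Message()
header['Content-Type'] = content_type
charset = header.get_content_charset() or 'utf-8'  # assumed fallback

# Decode with the declared encoding instead of guessing.
text = body.decode(charset)
print(text)
```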

If you are new to file reading, see how to read a file in Python and Python file handling basics.

Example that causes the error #

Here is a typical example.

with open('data.txt', 'r', encoding='utf-8') as file:
    text = file.read()

print(text)

This code works only if data.txt is really saved as UTF-8.

If the file actually uses another encoding such as cp1252, Python may raise:

Traceback (most recent call last):
  File "example.py", line 2, in <module>
    text = file.read()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 10: invalid start byte

How to read the traceback:

  • Where it happened: the file and line (example.py, line 2)
  • What went wrong: the exception type (UnicodeDecodeError)
  • Why: the detailed message (byte 0x92 in position 10 is an invalid start byte)

Read it bottom-up: a byte in the file is not valid UTF-8, and the error was raised while reading the file.

What happened:

  • the file was opened in text mode
  • Python tried to decode it as UTF-8
  • one or more bytes were invalid for UTF-8
  • reading failed
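
To reproduce the failure without needing a particular data.txt, the sketch below writes a small temporary file the way a cp1252 program would and then reads it as UTF-8. The file contents are invented for illustration. The exception object carries the details shown in the message:

```python
import os
import tempfile

# Write a small file as cp1252: the curly apostrophe U+2019 becomes byte 0x92.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write('It\u2019s here'.encode('cp1252'))
    path = tmp.name

try:
    with open(path, 'r', encoding='utf-8') as file:
        file.read()
except UnicodeDecodeError as exc:
    # exc.object holds the bytes being decoded; exc.start is the offset
    # of the first bad byte within them.
    bad_offset = exc.start
    bad_byte = exc.object[exc.start]
    print(f'byte {bad_byte:#x} at position {bad_offset}: {exc.reason}')
finally:
    os.remove(path)
```

For a file this small the reported position is also the offset in the file, which you can use to inspect the surrounding bytes in binary mode.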

Main ways to fix it #

Open the file with the correct encoding #

If you know the file encoding, pass it to open().

with open('data.txt', 'r', encoding='cp1252') as file:
    text = file.read()

print(text)

You can also try latin-1 if that matches the source:

with open('data.txt', 'r', encoding='latin-1') as file:
    text = file.read()

print(text)

For a beginner-friendly explanation of open(), see Python open() function explained.

If the file has a UTF-8 BOM, try utf-8-sig #

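Some files are UTF-8 but begin with a byte order mark (BOM), the three bytes EF BB BF. The utf-8-sig codec strips that marker, while plain utf-8 keeps it as an invisible '\ufeff' character at the start of the text.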

with open('data.txt', 'r', encoding='utf-8-sig') as file:
    text = file.read()

print(text)

This is often useful for text files saved by certain editors or export tools.
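
A short sketch of the difference, decoding an in-memory byte string that stands in for such a file:

```python
import codecs

# A UTF-8 file with a BOM starts with the three bytes EF BB BF.
raw = codecs.BOM_UTF8 + 'hello'.encode('utf-8')

with_plain_utf8 = raw.decode('utf-8')    # keeps the BOM as '\ufeff'
with_utf8_sig = raw.decode('utf-8-sig')  # strips the BOM

print(repr(with_plain_utf8))
print(repr(with_utf8_sig))
```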

Use errors='ignore' or errors='replace' only if needed #

If you must keep reading even when some characters are bad, you can tell Python how to handle decode errors.

Ignore bad characters:

with open('data.txt', 'r', encoding='utf-8', errors='ignore') as file:
    text = file.read()

print(text)

Replace bad characters:

with open('data.txt', 'r', encoding='utf-8', errors='replace') as file:
    text = file.read()

print(text)

Use this carefully:

  • ignore silently removes characters
  • replace changes them to replacement symbols
  • both can hide the real problem

If the text matters, finding the correct encoding is safer.
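
The effect of each option is easier to see on a small byte string. This sketch uses invented cp1252 data and compares both error handlers with the correct decoding:

```python
# "It's 5€" with a curly apostrophe, encoded as cp1252:
# the apostrophe is byte 0x92 and the euro sign is byte 0x80.
raw = 'It\u2019s 5\u20ac'.encode('cp1252')

ignored = raw.decode('utf-8', errors='ignore')    # bad bytes are dropped
replaced = raw.decode('utf-8', errors='replace')  # bad bytes become U+FFFD
correct = raw.decode('cp1252')                    # nothing is lost

print(repr(ignored))
print(repr(replaced))
print(repr(correct))
```

Only the last line recovers the apostrophe and the euro sign. The first two lose them for good.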

Open binary files in binary mode #

If the file is not really text, do not open it in text mode.

Wrong:

with open('image.jpg', 'r', encoding='utf-8') as file:
    data = file.read()

Correct:

with open('image.jpg', 'rb') as file:
    data = file.read()

print(type(data))

Output:

<class 'bytes'>

How to find the right encoding #

The best way is to check where the file came from.

Useful clues:

  • files from modern tools and APIs are often UTF-8
  • text files from Windows programs often use cp1252
  • some exported files use utf-8-sig
  • older data may use latin-1

Try these steps:

  1. Check how the file was created or exported.
  2. Look at the program that produced it.
  3. Test common encodings used by that source.
  4. If it is a CSV file, check the export settings.

CSV files are a common cause of this issue. If that is your case, see how to read a CSV file in Python.

Debugging steps #

These quick tests can help you find the problem.

1. Print the raw bytes #

Open the file in binary mode and inspect the start of the file:

with open('data.txt', 'rb') as f:
    print(f.read(40))

This helps you confirm that:

  • the file exists
  • the file contains bytes
  • the content may not be plain UTF-8 text
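
The first bytes can also hint at the encoding. The helper below is a rough sketch (describe_start is a made-up name) that looks for byte order marks using constants from the codecs module. Treat its answer as a hint, not proof:

```python
import codecs

def describe_start(head):
    """Return a hint about the encoding based on the first bytes of a file."""
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 one.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'UTF-32 BOM: try utf-32'
    if head.startswith(codecs.BOM_UTF8):
        return 'UTF-8 BOM: try utf-8-sig'
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'UTF-16 BOM: try utf-16'
    if b'\x00' in head:
        return 'NUL bytes: possibly binary data or UTF-16 without a BOM'
    return 'no BOM: check the source of the file for its encoding'

print(describe_start(b'\xef\xbb\xbfname,price'))
print(describe_start(b'It\x92s here'))
```

In practice you would pass it the bytes read in binary mode, for example the result of f.read(40).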

2. Try utf-8-sig #

with open('data.txt', 'r', encoding='utf-8-sig') as f:
    print(f.read())

3. Try cp1252 #

with open('data.txt', 'r', encoding='cp1252') as f:
    print(f.read())

4. Try latin-1 #

with open('data.txt', 'r', encoding='latin-1') as f:
    print(f.read())

Also make sure you are opening the correct file path. If Python cannot find the file at all, see FileNotFoundError: No such file or directory.

What not to do #

Common mistakes:

  • Do not use errors='ignore' as your first fix if the text matters.
  • Do not guess random encodings without checking the data source.
  • Do not open binary files like images, PDFs, or Excel files in text mode.
  • Do not assume every CSV file is UTF-8.

Common causes #

Here are the most common reasons for this error:

  • The file was saved in cp1252 or latin-1, not UTF-8.
  • The file contains mixed or damaged text encoding.
  • A binary file was opened as if it were a text file.
  • Bytes from an API or download were decoded with the wrong encoding.
  • A CSV file came from Excel or another tool using a non-UTF-8 encoding.

FAQ #

Why does Python try UTF-8 by default? #

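UTF-8 is the most common text encoding today. bytes.decode() uses it by default, and open() without an encoding argument uses the platform's preferred encoding, which is UTF-8 on most Linux and macOS systems. On Windows the platform default is often a legacy code page such as cp1252, which is one more reason to pass encoding explicitly.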

Should I use errors='ignore' to fix this? #

Only if losing some characters is acceptable. It hides the problem and can remove important text.

What is the difference between latin-1 and cp1252? #

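They are similar single-byte encodings that agree on most byte values. They differ in the range 0x80 to 0x9F: cp1252 maps most of those bytes to printable characters such as curly quotes and the euro sign, while latin-1 maps them to control codes.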
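
A quick sketch of the difference, decoding single bytes directly:

```python
byte = b'\x92'

as_cp1252 = byte.decode('cp1252')   # right single quotation mark, U+2019
as_latin1 = byte.decode('latin-1')  # '\x92', an invisible control character

# latin-1 maps every byte value, so it never raises. Python's cp1252 codec
# leaves a few byte values undefined, including 0x81, so it can still fail.
try:
    b'\x81'.decode('cp1252')
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

print(repr(as_cp1252), repr(as_latin1), cp1252_failed)
```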

Why does this happen more with CSV files from Excel? #

Some CSV exports use encodings like cp1252 instead of UTF-8, especially on Windows.
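
A minimal sketch of reading such a file with the csv module. The temporary file and its contents stand in for a real export, and cp1252 is assumed to be the export's encoding:

```python
import csv
import os
import tempfile

# Stand-in for a CSV exported by a Windows tool in cp1252.
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write('name,price\r\nCaf\u00e9,5\u20ac\r\n'.encode('cp1252'))
    path = tmp.name

try:
    # newline='' lets the csv module handle line endings itself.
    with open(path, 'r', encoding='cp1252', newline='') as file:
        rows = list(csv.reader(file))
finally:
    os.remove(path)

print(rows)
```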
