Fix Unicode Conversion In Python

If you ever end up in a situation where your text contains broken Unicode characters and calling encode('utf8') on it raises an error, you can use the ftfy library to fix the text first and then apply the encoding.

Install it via pip

# for Python 2, pin the 4.4.3 release
pip install ftfy==4.4.3

# for Python 3, use the latest release
pip install ftfy

Then use it like below:

import ftfy
text = 'Broken text… it’s flubberific!'
fixed_text = ftfy.fix_text(text)

# Now safely encode
fixed_text_encoded = fixed_text.encode('utf8')
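
To get a feel for what kind of damage is being repaired, here is a minimal sketch using only the standard library. It assumes one specific mix-up: UTF-8 bytes that were mistakenly decoded as Windows-1252, which is what the garbled apostrophe in the example looks like. The helper name undo_cp1252_mojibake is hypothetical and is not part of ftfy; ftfy detects this and other cases for you, so treat this only as an illustration of the idea.

```python
def undo_cp1252_mojibake(text):
    """Reverse one specific mix-up: UTF-8 bytes that were decoded as Windows-1252.

    Hypothetical helper for illustration only; it is not part of ftfy.
    """
    try:
        # Recover the original bytes, then decode them the way they were meant to be read.
        return text.encode('cp1252').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not match this pattern, so leave it untouched.
        return text


# The escapes spell out a garbled apostrophe: U+00E2, U+20AC, U+2122.
broken = 'it\u00e2\u20ac\u2122s flubberific!'
repaired = undo_cp1252_mojibake(broken)

# The three junk characters collapse back into one curly apostrophe (U+2019).
assert repaired == 'it\u2019s flubberific!'
```

Text that does not fit this pattern, such as plain ASCII, comes back unchanged, which is why the helper catches both encode and decode errors instead of letting them propagate.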

Check out the docs for other options, such as normalization.
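
As a rough illustration of what normalization means, the sketch below uses the standard library's unicodedata module rather than ftfy itself. It shows that the same visible character can be stored as different code point sequences, and that NFC normalization makes them identical before encoding. How ftfy exposes its own normalization options is covered in its docs and is not shown here.

```python
import unicodedata

composed = '\u00e9'      # 'é' stored as a single code point
decomposed = 'e\u0301'   # 'e' followed by a combining acute accent

# The two strings render the same but compare unequal.
assert composed != decomposed

# NFC composes characters where possible, so both forms become identical.
normalized = unicodedata.normalize('NFC', decomposed)
assert normalized == composed

# This matters before encoding: the byte sequences differ until normalized.
assert decomposed.encode('utf8') != composed.encode('utf8')
assert normalized.encode('utf8') == composed.encode('utf8')
```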