If you ever end up in a situation where your text contains broken Unicode characters and you cannot call
encode('utf8') because it raises an error, you can use the ftfy library to fix the text first and then encode it.
Install it via pip:

    # for Python 2
    pip install ftfy==4.4.3

    # for Python 3, use the latest version
    pip install ftfy

Then use it like this:

    import ftfy

    text = 'Broken text… it’s ﬂubberiﬁc!'
    fixed_text = ftfy.fix_text(text)

    # Now safely encode
    fixed_text_encoded = fixed_text.encode('utf8')

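If you want the encode step to never raise, even when ftfy is not installed, something like the following might work. This is only a sketch: `safe_encode` is a hypothetical helper name, not part of ftfy, and it assumes Python 3, where a lone surrogate code point is the typical reason `encode('utf8')` fails.

```python
def safe_encode(text):
    """Hypothetical helper (not part of ftfy): repair the text if ftfy is
    available, then encode to UTF-8 without raising."""
    try:
        import ftfy  # optional dependency; skipped if not installed
        text = ftfy.fix_text(text)
    except ImportError:
        pass
    # errors='replace' substitutes a placeholder for anything that is
    # still unencodable, such as a lone surrogate code point.
    return text.encode('utf8', errors='replace')


# A lone surrogate makes a plain .encode('utf8') raise in Python 3.
broken = 'abc\ud800def'
try:
    broken.encode('utf8')
    raised = False
except UnicodeEncodeError:
    raised = True

encoded = safe_encode(broken)
```

Here `raised` ends up `True` for the plain call, while `safe_encode` returns bytes either way. The exact placeholder byte sequence depends on whether ftfy handled the character first or the `errors='replace'` fallback did.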
Check out the docs for other options, such as normalization.
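To give an idea of what normalization changes, here is a small sketch using only the standard library's `unicodedata` module rather than ftfy itself. As far as I know, ftfy exposes this as a `normalization` argument to `fix_text` (for example `normalization='NFKC'`), but treat that as an assumption and check the docs for the version you installed.

```python
import unicodedata

# 'ﬂubberiﬁc' written with the ligature characters U+FB02 and U+FB01.
text = '\ufb02ubberi\ufb01c'

# NFC keeps compatibility characters such as ligatures as they are.
nfc = unicodedata.normalize('NFC', text)

# NFKC expands them into plain letters: 'fl' and 'fi'.
nfkc = unicodedata.normalize('NFKC', text)

print(len(text), len(nfc), len(nfkc))
print(nfkc)
```

NFKC is the more aggressive choice: it is handy for search and comparison, but it discards distinctions (ligatures, full-width forms) that you may want to keep for display.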