Difference between Encoding and Character Set

To better understand encoding and character sets, we will use the UTF-8 encoding and the Unicode character set in the explanation and examples below.

As we all know, data is stored on a hard disk in binary format, i.e. as 1s and 0s. A system has to read these 1s and 0s and decode them to understand what the data is. If text was written to the hard disk using the UTF-8 encoding, it has to be decoded by a UTF-8 decoder; any other decoder may turn the text into gibberish. Decoding yields numbers (code points), and these numbers are then mapped through a character set, here Unicode, to the characters that humans can read.
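To see why the decoder has to match the encoder, here is a minimal Python sketch. The sample word "héllo" and the choice of Latin-1 as the "wrong" decoder are assumptions made only for illustration.

```python
# Sample text with one non-ASCII character (chosen for illustration).
text = "héllo"

# Encode the text to bytes with UTF-8, as it would be stored on disk.
stored = text.encode("utf-8")

# Decoding with the matching decoder recovers the original text.
right = stored.decode("utf-8")

# Decoding the same bytes with a different decoder (Latin-1 here)
# does not fail, but it produces the wrong characters.
wrong = stored.decode("latin-1")

print(right)  # héllo
print(wrong)  # hÃ©llo
```

The "é" takes two bytes in UTF-8, and Latin-1 reads each of those bytes as a separate character, which is where the gibberish comes from.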

Example: Suppose the text “1234” has to be written to a hard disk. In Unicode the characters 1, 2, 3 and 4 have the code points 49, 50, 51 and 52, so the UTF-8 encoder stores them as 00110001 00110010 00110011 00110100.
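The encoding step can be checked with a short Python sketch. Note that the character "1" is stored as its Unicode code point 49, not as the numeric value 1. The variable names are only illustrative.

```python
text = "1234"

# Character set step: each character maps to a Unicode code point.
code_points = [ord(ch) for ch in text]

# Encoding step: UTF-8 turns the code points into bytes.
encoded = text.encode("utf-8")

# Show each byte as 8 binary digits, the way it sits on disk.
bits = " ".join(format(byte, "08b") for byte in encoded)

print(code_points)  # [49, 50, 51, 52]
print(bits)         # 00110001 00110010 00110011 00110100
```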

Example: Suppose “01101000 01100101 01101100 01101100 01101111” is the data stored on the hard disk. The UTF-8 decoding algorithm decodes it into the numbers “104 101 108 108 111”. When these numbers are looked up in the Unicode character set, they translate to “hello”.
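The reverse direction can be sketched the same way in Python. The bit string below is the stored data from the example, written with 8 bits per byte; the variable names are assumptions for illustration.

```python
# The stored data, one 8-bit group per byte.
bits = "01101000 01100101 01101100 01101100 01101111"

# Turn the bit groups back into raw bytes.
raw = bytes(int(group, 2) for group in bits.split())

# Decoding step: UTF-8 turns the bytes into Unicode code points.
decoded = raw.decode("utf-8")
code_points = [ord(ch) for ch in decoded]

# Character set step: each code point maps to a character.
text = "".join(chr(cp) for cp in code_points)

print(code_points)  # [104, 101, 108, 108, 111]
print(text)         # hello
```

For plain English letters each code point fits in a single byte, so the byte values and the code points are the same numbers. Characters outside that range take two to four bytes in UTF-8.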
