Encoding and Decoding

What is Unicode? (Used for Display)

Characters Before Unicode

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different systems, called character encodings, for assigning these numbers. These early character encodings were limited and could not contain enough characters to cover all the world’s languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.

Early character encodings also conflicted with one another. That is, two encodings could use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) would need to support many different encodings. However, when data is passed through different computers or between different encodings, that data runs the risk of corruption.
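To make the corruption risk concrete, here is a minimal sketch in which text is written under one encoding and read back under another. The class name MojibakeSketch and the sample word are made up for illustration. UTF-8 and ISO-8859-1 are used only because every Java platform is guaranteed to ship them; the same effect occurs between any two encodings that disagree about what a byte means.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: the class and method names are not from the article.
class MojibakeSketch {

    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misread(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café": 4 characters
        String garbled = misread(original);

        // The single character U+00E9 became two bytes in UTF-8 (0xC3 0xA9),
        // and ISO-8859-1 turns each of those bytes into its own character.
        System.out.println(original.length()); // 4
        System.out.println(garbled.length());  // 5
        System.out.println(garbled);
    }
}
```

The bytes themselves never changed in transit; only the assumption about which encoding they were written in did.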

Unicode Characters

Unicode has changed all that!

The Unicode Standard provides a unique number for every character, no matter what platform, device, application or language. It has been adopted by all modern software providers and now allows data to be transported through many different platforms, devices and applications without corruption. Support of Unicode forms the foundation for the representation of languages and symbols in all major operating systems, search engines, browsers, laptops, and smart phones—plus the Internet and World Wide Web (URLs, HTML, XML, CSS, JSON, etc.). Supporting Unicode is the best way to implement ISO/IEC 10646.
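The "unique number" for a character is called its code point. As a small sketch (the class name CodePointSketch and the sample characters are chosen here for illustration), Java can list the code points of a String regardless of how the text is later encoded into bytes:

```java
// Hypothetical illustration: prints the Unicode code point of each character.
class CodePointSketch {

    // Returns the code points of s in the conventional U+XXXX notation.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Latin capital A, the euro sign, and an emoji written as a surrogate pair.
        String sample = "A\u20AC\uD83D\uDE00";
        System.out.println(describe(sample)); // U+0041 U+20AC U+1F600
    }
}
```

Note that the emoji occupies two char values in a Java String but is still one code point, which is why the sketch iterates with codePoints() rather than over individual chars.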

The emergence of the Unicode Standard and the availability of tools supporting it are among the most significant recent global software technology trends.

Converting a String to a byte array using different character sets, and decoding bytes back into a String with a specific character set

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Demo {
	public static void main(String[] args) {
		String text = "Hello";
		printByteArray(text);
		byte[] input = new byte[] {72, 101, 108, 108, 111}; // "Hello" in ASCII
		printString(input);
	}

	// Encoding: String -> byte[] under several character sets
	public static void printByteArray(String text) {
		System.out.println("Default " + Arrays.toString(text.getBytes())); // platform default character set
		System.out.println("ASCII " + Arrays.toString(text.getBytes(StandardCharsets.US_ASCII)));
		System.out.println("UTF-8 " + Arrays.toString(text.getBytes(StandardCharsets.UTF_8)));
		System.out.println("UTF-16 " + Arrays.toString(text.getBytes(StandardCharsets.UTF_16))); // starts with a byte order mark
		System.out.println("ISO " + Arrays.toString(text.getBytes(StandardCharsets.ISO_8859_1)));
		System.out.println("UTF-16 BE " + Arrays.toString(text.getBytes(StandardCharsets.UTF_16BE)));
		System.out.println("UTF-16 LE " + Arrays.toString(text.getBytes(StandardCharsets.UTF_16LE)));
	}

	// Decoding: byte[] -> String under several character sets
	public static void printString(byte[] input) {
		System.out.println("Default : " + new String(input)); // platform default character set
		System.out.println("ASCII : " + new String(input, StandardCharsets.US_ASCII));
		System.out.println("UTF-8 : " + new String(input, StandardCharsets.UTF_8));
		// The UTF-16 variants pair the bytes up into unrelated characters,
		// so these three lines do not print "Hello".
		System.out.println("UTF-16 : " + new String(input, StandardCharsets.UTF_16));
		System.out.println("ISO : " + new String(input, StandardCharsets.ISO_8859_1));
		System.out.println("UTF-16 BE : " + new String(input, StandardCharsets.UTF_16BE));
		System.out.println("UTF-16 LE : " + new String(input, StandardCharsets.UTF_16LE));
	}
}
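Because "Hello" is pure ASCII, the ASCII, UTF-8 and ISO-8859-1 lines above all print the same bytes. The differences only show up with characters outside ASCII. The following sketch (the class name EncodedLengthSketch and the sample character are assumptions made for illustration) counts how many bytes a single "é" needs under each character set:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: compares encoded sizes of one non-ASCII character.
class EncodedLengthSketch {

    // Number of bytes produced when text is encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // "é"
        System.out.println("ASCII     " + encodedLength(text, StandardCharsets.US_ASCII));   // 1 (replaced by '?')
        System.out.println("ISO       " + encodedLength(text, StandardCharsets.ISO_8859_1)); // 1
        System.out.println("UTF-8     " + encodedLength(text, StandardCharsets.UTF_8));      // 2
        System.out.println("UTF-16    " + encodedLength(text, StandardCharsets.UTF_16));     // 4 (2-byte BOM + 2)
        System.out.println("UTF-16 BE " + encodedLength(text, StandardCharsets.UTF_16BE));   // 2

        // ASCII cannot represent "é", so the character is silently lost on encoding.
        String roundTrip = new String(text.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
        System.out.println(roundTrip); // ?
    }
}
```

This is one reason to pass an explicit character set to getBytes and new String instead of relying on the platform default.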

Base64 Encoding and Decoding (Used for Data Transfer)

What is Base64 Encoding?

  • Base64 encoding is a technique in which any input is converted into text that uses only the 64 characters A-Z, a-z, 0-9, + and / (table below)
  • Why Base64? A byte is 8 bits, but some systems treat only 7 bits of each byte as data and use the first bit for control purposes, so older devices (Wi-Fi equipment, routers, modems) could damage the data in transit. Hence it is usually preferred to send data in Base64 format, where each output character carries only 6 bits, i.e. one of 64 unique values.
  • How Base 64 Conversion happens ?
    Input : ABC
    ASCII Encoded : 065 066 067 //ASCII here; you can use UTF-8 or any other encoding
    Bytes Representation : 01000001 01000010 01000011
    ReGroup to Group of 6 : 010000 010100 001001 000011
    Converting into Decimal : 16 20 9 3
    Mapping it with Base 64 Table : QUJD
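    --- Transfer the value over a network ---
    Reverse the process to get the actual value
  • If the input length is not a multiple of 3 bytes, the output is padded with the = symbol so that its length is a multiple of 4 characters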
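The steps above can be sketched by hand in Java. This is only an illustration of the regrouping idea, not how java.util.Base64 is implemented internally; the class name ManualBase64Sketch and its encode method are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of the 8-bit to 6-bit regrouping described above.
class ManualBase64Sketch {

    // The standard Base64 alphabet: index 0..63 maps to one output character.
    private static final String TABLE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i += 3) {
            int available = Math.min(3, data.length - i);

            // Pack up to three bytes into one 24-bit group, zero-filling missing bytes.
            int group = 0;
            for (int j = 0; j < 3; j++) {
                group <<= 8;
                if (j < available) {
                    group |= data[i + j] & 0xFF;
                }
            }

            // Split the 24 bits into four 6-bit values and look each one up.
            // n input bytes yield n + 1 real characters; the rest is '=' padding.
            for (int k = 0; k < 4; k++) {
                if (k <= available) {
                    int sixBits = (group >> (18 - 6 * k)) & 0x3F;
                    out.append(TABLE.charAt(sixBits));
                } else {
                    out.append('=');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("ABC".getBytes(StandardCharsets.US_ASCII))); // QUJD
        System.out.println(encode("AB".getBytes(StandardCharsets.US_ASCII)));  // QUI=
        System.out.println(encode("A".getBytes(StandardCharsets.US_ASCII)));   // QQ==

        // Cross-check against the JDK encoder.
        byte[] sample = "Hello World".getBytes(StandardCharsets.UTF_8);
        System.out.println(encode(sample).equals(Base64.getEncoder().encodeToString(sample))); // true
    }
}
```

In real code, prefer Base64.getEncoder(); the hand-written version is useful only for seeing where the 6-bit groups and the = padding come from.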

Example:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Demo {
	public static void main(String[] args) {
		String inputEncode = "Hello World Hello World Hello World Hello World Hello World Hello World";
		byte[] bytes = inputEncode.getBytes(StandardCharsets.UTF_8);
		System.out.println(Base64.getEncoder().encodeToString(bytes));     // basic encoder
		System.out.println(Base64.getMimeEncoder().encodeToString(bytes)); // MIME encoder
		System.out.println(Base64.getUrlEncoder().encodeToString(bytes));  // URL-safe encoder

		String inputDecode = "QUJD";
		System.out.println(new String(Base64.getDecoder().decode(inputDecode), StandardCharsets.UTF_8)); // ABC
	}
}

Note: The MIME encoder uses the same alphabet as the basic encoder. The only difference is the layout of the output: each line is no longer than 76 characters, and lines are separated by a carriage return followed by a line feed (\r\n).
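A short sketch can check this line-wrapping behaviour and also show what the URL encoder changes. The class name EncoderVariantsSketch and the helper mimeLines are made up for illustration; the two sample bytes are chosen because they happen to encode to the characters + and / in the basic alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration: compares the basic, MIME and URL-safe encoders.
class EncoderVariantsSketch {

    // Encodes text with the MIME encoder and splits the result into its lines.
    static String[] mimeLines(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getMimeEncoder().encodeToString(bytes).split("\r\n");
    }

    public static void main(String[] args) {
        // 71 bytes of input become 96 Base64 characters, which is more than one MIME line.
        String text = "Hello World Hello World Hello World Hello World Hello World Hello World";
        for (String line : mimeLines(text)) {
            System.out.println(line.length() + " : " + line);
        }

        // The URL-safe encoder replaces '+' with '-' and '/' with '_'.
        byte[] tricky = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(Base64.getEncoder().encodeToString(tricky));    // +/8=
        System.out.println(Base64.getUrlEncoder().encodeToString(tricky)); // -_8=
    }
}
```

Joining the MIME lines back together gives exactly the basic encoder's output, so a receiver only needs to ignore the line breaks.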

Reference :

https://www.baeldung.com/java-base64-encode-and-decode
