Encoding is the process of converting data into a format required for a number of information processing needs, including:
- Program compiling and execution
- Data transmission, storage and compression/decompression
- Application data processing, such as file conversion
Encoding can have two meanings:
- In computer technology, encoding is the process of applying a specific code, such as letters, symbols and numbers, to data for conversion into an equivalent coded form.
- In electronics, encoding refers to analog to digital conversion.
In computers, encoding is the process of putting a sequence of characters (letters, numbers, punctuation, and certain symbols) into a specialized format for efficient transmission or storage. Decoding is the opposite process -- the conversion of an encoded format back into the original sequence of characters. Encoding and decoding are used in data communications, networking, and storage. The term is especially applicable to radio (wireless) communications systems.
The code historically used by most computers for text files is known as ASCII (American Standard Code for Information Interchange, pronounced ASK-ee). ASCII can represent uppercase and lowercase alphabetic characters, numerals, punctuation marks, and common symbols. Other commonly used codes include Unicode, BinHex, Uuencode, and MIME. In data communications, Manchester encoding is a special form of encoding in which each binary digit (bit) is represented by a transition between high and low logic states. In radio communications, numerous encoding and decoding methods exist, some of which are used only by specialized groups of people (amateur radio operators, for example). One of the oldest codes, originally employed in the landline telegraph during the 19th century, is Morse code.
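To make the idea of Manchester encoding concrete, here is a minimal sketch. It assumes the IEEE 802.3 convention (a 0 is a high-to-low transition, a 1 is a low-to-high transition) and models logic levels as the characters '1' (high) and '0' (low). The class name ManchesterSketch and both method names are made up for this illustration; they are not part of any library.

```csharp
using System;
using System.Text;

static class ManchesterSketch
{
    // Each data bit becomes two half-bit levels (assumed IEEE 802.3 convention):
    //   0 -> high, low  ("10")
    //   1 -> low, high  ("01")
    public static string Encode(string bits)
    {
        var levels = new StringBuilder(bits.Length * 2);
        foreach (char bit in bits)
        {
            if (bit == '1') levels.Append("01");
            else if (bit == '0') levels.Append("10");
            else throw new ArgumentException("Input must contain only '0' and '1'.", nameof(bits));
        }
        return levels.ToString();
    }

    // Reverses Encode: every pair of levels must contain a mid-bit transition.
    public static string Decode(string levels)
    {
        if (levels.Length % 2 != 0)
            throw new ArgumentException("Expected an even number of levels.", nameof(levels));

        var bits = new StringBuilder(levels.Length / 2);
        for (int i = 0; i < levels.Length; i += 2)
        {
            string pair = levels.Substring(i, 2);
            if (pair == "01") bits.Append('1');
            else if (pair == "10") bits.Append('0');
            else throw new FormatException("No mid-bit transition at position " + i + ".");
        }
        return bits.ToString();
    }

    static void Main()
    {
        string encoded = Encode("1011");
        Console.WriteLine(encoded);         // 01100101
        Console.WriteLine(Decode(encoded)); // 1011
    }
}
```

Because every bit contains a transition, a receiver can recover the clock from the signal itself, at the cost of doubling the number of signal levels sent.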
The terms encoding and decoding are often used in reference to the processes of analog-to-digital conversion and digital-to-analog conversion. In this sense, these terms can apply to any form of data, including text, images, audio, video, multimedia, computer programs, or signals in sensors, telemetry, and control systems. Encoding should not be confused with encryption, a process in which data is deliberately altered so as to conceal its content. Encryption can be done without changing the particular code that the content is in, and encoding can be done without deliberately concealing the content.
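The difference between encoding and encryption can be shown with a small sketch. Base64 (my choice for this illustration; it is the kind of transfer encoding used by MIME) changes the representation of the data, but anyone can reverse it because no key is involved. The class and method names below are hypothetical.

```csharp
using System;
using System.Text;

static class Base64Sketch
{
    // Text -> UTF-8 bytes -> Base64 text. No secret is involved.
    public static string ToBase64(string text) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(text));

    // Base64 text -> UTF-8 bytes -> original text.
    public static string FromBase64(string encoded) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(encoded));

    static void Main()
    {
        string encoded = ToBase64("hello");
        Console.WriteLine(encoded);             // aGVsbG8=
        Console.WriteLine(FromBase64(encoded)); // hello
    }
}
```

The encoded text looks unreadable, but it conceals nothing: decoding needs only knowledge of the scheme, not a key.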
C# example: converting a string from one character encoding to another
using System;
using System.Text;

class Example
{
    static void Main()
    {
        string unicodeString = "This string contains the unicode character Pi (\u03a0)";

        // Create two different encodings.
        Encoding ascii = Encoding.ASCII;
        Encoding unicode = Encoding.Unicode;

        // Convert the string into a byte array.
        byte[] unicodeBytes = unicode.GetBytes(unicodeString);

        // Perform the conversion from one encoding to the other.
        byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);

        // Convert the new byte[] into a char[] and then into a string.
        char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
        ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
        string asciiString = new string(asciiChars);

        // Display the strings created before and after the conversion.
        Console.WriteLine("Original string: {0}", unicodeString);
        Console.WriteLine("Ascii converted string: {0}", asciiString);
    }
}
// The example displays the following output:
//    Original string: This string contains the unicode character Pi (Π)
//    Ascii converted string: This string contains the unicode character Pi (?)
Note that the .NET encoding classes let you choose how characters that cannot be encoded or decoded are handled:
- Silently replace them with a "?" character.
- Substitute a "best fit" character.
- Apply application-specific behavior through the EncoderFallback and DecoderFallback classes, for example substituting the U+FFFD Unicode replacement character.
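The fallback options listed above can be sketched as follows. This is only an illustration: the class FallbackSketch, its two methods, and the "[?]" replacement string are made up here, while Encoding.GetEncoding, EncoderReplacementFallback, and EncoderExceptionFallback are standard .NET types.

```csharp
using System;
using System.Text;

static class FallbackSketch
{
    // Replace every unencodable character with a caller-chosen string.
    public static string ToAsciiWithReplacement(string text, string replacement)
    {
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(replacement),
            new DecoderReplacementFallback(replacement));
        return ascii.GetString(ascii.GetBytes(text));
    }

    // Refuse to encode lossy input: returns false instead of silently losing data.
    public static bool TryEncodeAscii(string text, out byte[] bytes)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            bytes = strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            bytes = Array.Empty<byte>();
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(ToAsciiWithReplacement("Pi (\u03a0)", "[?]")); // Pi ([?])
        Console.WriteLine(TryEncodeAscii("Pi (\u03a0)", out _));         // False
        Console.WriteLine(TryEncodeAscii("Pi", out _));                  // True
    }
}
```

An exception fallback is usually the safer choice when data loss must not go unnoticed; a replacement fallback suits display-only text.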
Character set vs. character encoding
Recently I was asked to explain the difference between character encoding and character set, and I thought it would be interesting to write about this over here as well.
In these two terms, ‘set’ refers to the set of characters and their numbers (code points), and ‘encoding’ refers to the representation of these code points. For example, Unicode is a character set, and UTF-8 and UTF-16 are different character encodings of Unicode.
To illustrate this difference, in the Unicode character set, the € character has code point 8364 (usually written as U+20AC, in hexadecimal notation). Using the UTF-16LE character encoding this is stored as AC 20, while UTF-16BE stores this as 20 AC, and the UTF-8 representation is E2 82 AC.
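The byte sequences above can be checked with a short C# sketch. The class name EuroBytesSketch and the helper Hex are invented for this example; the Encoding properties and BitConverter.ToString are standard .NET.

```csharp
using System;
using System.Text;

static class EuroBytesSketch
{
    // Encode the text and format the resulting bytes as space-separated hex.
    public static string Hex(Encoding encoding, string text) =>
        BitConverter.ToString(encoding.GetBytes(text)).Replace('-', ' ');

    static void Main()
    {
        string euro = "\u20AC";

        // One code point in the character set...
        Console.WriteLine(char.ConvertToUtf32(euro, 0));         // 8364

        // ...three different byte representations, one per encoding.
        Console.WriteLine(Hex(Encoding.Unicode, euro));          // AC 20    (UTF-16LE)
        Console.WriteLine(Hex(Encoding.BigEndianUnicode, euro)); // 20 AC    (UTF-16BE)
        Console.WriteLine(Hex(Encoding.UTF8, euro));             // E2 82 AC (UTF-8)
    }
}
```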
In practice, however, the two terms are used interchangeably. The distinction described above matters little for most non-Unicode character sets (such as Latin-1 and SJIS), because each has essentially a single byte representation, and for single-byte sets like Latin-1 the code point is the byte value itself. As a result, the two terms were historically never kept apart.
The main practical difference in English usage is that the term character set is a little old-fashioned, while character encoding is the more common term nowadays. The likely reason is that character encoding is the more accurate term now that UTF-8 and UTF-16 exist as different encodings of the same character set.
Some examples:
The HTTP protocol uses
Content-Type: text/html; charset=UTF-8
The more recent XML uses
<?xml version="1.0" encoding="UTF-8"?>
This illustrates how they are used synonymously.