|
In character encoding terminology, a code point is any of the numerical values that make up the codespace.[1] For example, ASCII comprises 128 code points in the range 0 to 7F (hexadecimal), Extended ASCII comprises 256 code points in the range 0 to FF, and Unicode comprises 1,114,112 code points in the range 0 to 10FFFF. The Unicode codespace is divided into seventeen planes (the Basic Multilingual Plane and 16 supplementary planes), each with 65,536 (= 2^16) code points. Thus the total size of the Unicode codespace is 17 × 65,536 = 1,114,112. Code points are normally assigned to abstract characters. An abstract character is not a graphical glyph but a unit of textual data; the precise appearance of the character depends on the font. However, code points may also be left reserved for future assignment (most of the Unicode codespace is unassigned) or given other designated functions. A Unicode text file is not necessarily a sequence of code points each stored in a 4-byte block. Instead, an encoding scheme is used to serialize a sequence of code points into a sequence of bytes. A number of such schemes exist, and they trade off space efficiency against ease of encoding. A variable number of bytes can be used for each character; UTF-8, for example, does this and maintains some compatibility with ASCII. Encoding schemes also take endianness into account, and may have the property of being self-synchronizing, meaning that character boundaries can be found without having to read from the beginning of the string. |
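
The plane arithmetic and the difference between a code point and its serialized bytes can be illustrated with a short Python sketch. It is only an illustration: the names `PLANE_SIZE`, `PLANE_COUNT`, `plane_of` and `is_utf8_continuation` are introduced here and do not come from any standard, while `ord` and `str.encode` are Python built-ins.

```python
PLANE_SIZE = 2 ** 16   # 65,536 code points per plane
PLANE_COUNT = 17       # Basic Multilingual Plane plus 16 supplementary planes

# Total size of the Unicode codespace: 17 * 65,536 = 1,114,112
codespace_size = PLANE_COUNT * PLANE_SIZE


def plane_of(code_point: int) -> int:
    """Return the index (0 to 16) of the plane containing the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("value lies outside the Unicode codespace")
    return code_point // PLANE_SIZE


def is_utf8_continuation(byte: int) -> bool:
    """True if the byte has the form 10xxxxxx, i.e. it does not start a character.

    This property is what makes UTF-8 self-synchronizing: from any position,
    skipping continuation bytes leads to the next character boundary.
    """
    return (byte & 0xC0) == 0x80


# Sample characters written as escapes: A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    cp = ord(ch)  # the code point, an integer in the codespace
    print(
        f"U+{cp:04X} plane={plane_of(cp)} "
        f"utf-8={len(ch.encode('utf-8'))} bytes "
        f"utf-16={len(ch.encode('utf-16-be'))} bytes "
        f"utf-32={len(ch.encode('utf-32-be'))} bytes"
    )

# Endianness changes the byte order of the same code point in UTF-16.
little_endian = "A".encode("utf-16-le")
big_endian = "A".encode("utf-16-be")
```

Run as written, the loop shows that UTF-8 uses between one and four bytes per character for these samples, whereas UTF-32 always uses four; the ASCII letter keeps its single-byte ASCII value under UTF-8.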