Windows-1251

From MobileRead

Windows-1251 (also known as code page CP1251) is a popular 8-bit character code, designed to cover languages that use the Cyrillic alphabet.

Introduction

The original ASCII code was designed to work in 7 bits, which offers 128 separate characters. The first 32 (0-31) are reserved for control codes, and most of the rest are printable. Character 127 is defined as the delete (DEL) control code, although some devices display it as a small box. Because computers store most data in 8-bit bytes, another 128 values (128-255) are available. Windows-1251 uses these values for the Cyrillic alphabet. It is much more commonly used than ISO 8859-5, a standard intended for the same purpose that was never really adopted by these users and is missing some Ukrainian characters. In the future, both may eventually give way to Unicode, which is the preferred character set.
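The split between the ASCII half and the upper half can be seen with a minimal Python sketch. It assumes Python's built-in `cp1251` codec as the implementation of Windows-1251; the sample string is only an illustration.

```python
# Sketch: ASCII characters keep their 7-bit values in Windows-1251,
# while Cyrillic letters land in the upper half (128-255).
text = "Hi, Мир"  # illustrative sample: ASCII followed by Cyrillic

# "cp1251" is the name of Python's standard codec for Windows-1251.
encoded = text.encode("cp1251")

for ch, byte in zip(text, encoded):
    half = "ASCII (0-127)" if byte < 128 else "upper half (128-255)"
    print(f"{ch!r} -> {byte:3d}  {half}")

# An 8-bit code stores exactly one byte per character.
assert len(encoded) == len(text)
```

Each character takes one byte, so the encoded length always equals the number of characters.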

1251 Code page layout

The following table shows Windows-1251. Each character is shown with its decimal code and its Unicode equivalent.

Note that the word codes shown are for reference. They will not normally generate these values but will likely generate the equivalent ISO or UTF-8 values depending on the reader (see special characters). The full Windows-1251 includes ASCII values. The ones shown here are unique to this coding.

Unicode Character Code Description[1]
U+0402 Ђ 128
U+0403 Ѓ 129
U+201A ‚ 130 single low 9 quote
U+0453 ѓ 131
U+201E „ 132 double low 9 quote
U+2026 … 133 ellipsis
U+2020 † 134 dagger
U+2021 ‡ 135 double dagger
U+20AC € 136 Euro sign[2]
U+2030 ‰ 137 per mille
U+0409 Љ 138
U+2039 ‹ 139 left arrow quote
U+040A Њ 140
U+040C Ќ 141
U+040B Ћ 142
U+040F Џ 143
U+0452 ђ 144
U+2018 ‘ 145 left single curly quote
U+2019 ’ 146 right single curly quote
U+201C “ 147 left double curly quote
U+201D ” 148 right double curly quote
U+2022 • 149 bullet
U+2013 – 150 en dash
U+2014 — 151 em dash
152 undefined[2]
U+2122 ™ 153 trade mark
U+0459 љ 154
U+203A › 155 right arrow quote
U+045A њ 156
U+045C ќ 157
U+045B ћ 158
U+045F џ 159
U+00A0 160 NBSP
U+040E Ў 161
U+045E ў 162
U+0408 Ј 163
U+00A4 ¤ 164 currency
U+0490 Ґ 165
U+00A6 ¦ 166 broken vertical bar
U+00A7 § 167 section sign
U+0401 Ё 168
U+00A9 © 169 copyright
U+0404 Є 170
U+00AB « 171 left angle quote
U+00AC ¬ 172 not sign
U+00AD 173 SHY (soft hyphen)
U+00AE ® 174 registered trademark
U+0407 Ї 175
U+00B0 ° 176 Degree sign
U+00B1 ± 177 Plus-minus sign
U+0406 І 178
U+0456 і 179
U+0491 ґ 180
U+00B5 µ 181 Micro sign
U+00B6 ¶ 182 paragraph sign
U+00B7 · 183 middle dot
U+0451 ё 184
U+2116 № 185 Numero sign[3]
U+0454 є 186
U+00BB » 187 right angle quote
U+0458 ј 188
U+0405 Ѕ 189
U+0455 ѕ 190
U+0457 ї 191
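Rows in this layout can be regenerated programmatically as a cross-check. The sketch below assumes Python's built-in `cp1251` codec; the helper name `describe` is hypothetical and exists only for this illustration.

```python
def describe(byte_value: int) -> str:
    """Return a table-style row for one Windows-1251 byte value."""
    try:
        ch = bytes([byte_value]).decode("cp1251")
    except UnicodeDecodeError:
        # Python's cp1251 codec treats an unassigned position as an error.
        return f"{byte_value} undefined"
    return f"U+{ord(ch):04X} {ch} {byte_value}"

# Print the non-ASCII punctuation and special-letter range (128-191).
for value in range(128, 192):
    print(describe(value))
```

Positions 160 (NBSP) and 173 (soft hyphen) decode to characters that are invisible when printed, so their rows appear to have an empty character column.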

Unicode Character Code Name
U+0410 А 192 A
U+0411 Б 193 Be
U+0412 В 194 Ve
U+0413 Г 195 Ge
U+0414 Д 196 De
U+0415 Е 197 E
U+0416 Ж 198 Zhe
U+0417 З 199 Ze
U+0418 И 200 I
U+0419 Й 201 short I
U+041A К 202 Ka
U+041B Л 203 El
U+041C М 204 Em
U+041D Н 205 En/Ne
U+041E О 206 O
U+041F П 207 Pe
U+0420 Р 208 Er/Re
U+0421 С 209 Es
U+0422 Т 210 Te
U+0423 У 211 U
U+0424 Ф 212 Ef/Fe
U+0425 Х 213 Kha
U+0426 Ц 214 Tse
U+0427 Ч 215 Che
U+0428 Ш 216 Sha
U+0429 Щ 217 Shcha, Shta
U+042A Ъ 218 hard sign (yer)
U+042B Ы 219 * Russian
U+042C Ь 220 * Russian
U+042D Э 221 * Russian
U+042E Ю 222 Yu
U+042F Я 223 Ya
U+0430 а 224
U+0431 б 225
U+0432 в 226
U+0433 г 227
U+0434 д 228
U+0435 е 229
U+0436 ж 230
U+0437 з 231
U+0438 и 232
U+0439 й 233
U+043A к 234
U+043B л 235
U+043C м 236
U+043D н 237
U+043E о 238
U+043F п 239
U+0440 р 240
U+0441 с 241
U+0442 т 242
U+0443 у 243
U+0444 ф 244
U+0445 х 245
U+0446 ц 246
U+0447 ч 247
U+0448 ш 248
U+0449 щ 249
U+044A ъ 250
U+044B ы 251
U+044C ь 252
U+044D э 253
U+044E ю 254
U+044F я 255
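The letters in codes 192-255 sit in the same order as the Unicode block U+0410 to U+044F, so the two differ by a constant offset. A minimal sketch of that arithmetic follows; the function name `cp1251_letter_to_codepoint` is hypothetical, and Python's `cp1251` codec is assumed for the cross-check.

```python
# U+0410 (А) is stored as byte 192, so the offset is 0x0410 - 192 = 848.
OFFSET = 0x0410 - 192

def cp1251_letter_to_codepoint(byte_value: int) -> int:
    """Map a Windows-1251 byte in 192-255 to its Unicode code point."""
    if not 192 <= byte_value <= 255:
        raise ValueError("only bytes 192-255 follow the fixed offset")
    return byte_value + OFFSET

# Cross-check the arithmetic against Python's cp1251 codec.
for value in range(192, 256):
    expected = bytes([value]).decode("cp1251")
    assert chr(cp1251_letter_to_codepoint(value)) == expected

print(f"U+{cp1251_letter_to_codepoint(236):04X}")  # code point for byte 236 (м)
```

The offset does not apply to codes 128-191, where the characters are not arranged in Unicode order.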

Notes

  1. Unless otherwise indicated the codes listed in the description are the same as Windows-1252 and/or ISO-8859-1.
  2. Not the same code as Windows-1252.
  3. Not the same as ISO-8859-1.
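The overlap described in these notes can be checked with a short sketch. It assumes Python's built-in `cp1251`, `cp1252`, and `latin-1` codecs stand in for Windows-1251, Windows-1252, and ISO-8859-1; the helper names are hypothetical.

```python
def decode_or_none(byte_value: int, codec: str):
    """Decode a single byte, or return None if the codec leaves it undefined."""
    try:
        return bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return None

def same_in(byte_value: int, other_codec: str) -> bool:
    """True if the byte means the same character in cp1251 and other_codec."""
    ch = decode_or_none(byte_value, "cp1251")
    return ch is not None and ch == decode_or_none(byte_value, other_codec)

# Compare only the non-letter range of the code page (128-191).
shared_1252 = [b for b in range(128, 192) if same_in(b, "cp1252")]
shared_latin1 = [b for b in range(128, 192) if same_in(b, "latin-1")]

print("same as Windows-1252:", shared_1252)
print("same as ISO-8859-1:  ", shared_latin1)
```

Code 136 (Euro sign) and code 152 (undefined) are absent from the Windows-1252 list, and code 185 (Numero sign) is absent from the ISO-8859-1 list, which matches notes 2 and 3.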

Coverage

Windows-1251 is intended for languages written in the Cyrillic alphabet, such as Russian. These include
  • Bulgarian
  • Belorussian
  • Russian
  • Macedonian
  • Serbian Cyrillic
  • Ukrainian
  • Azeri
  • Kyrgyz
  • Mongolian
  • Tatar
  • Uzbek

It is the most widely used character set for encoding Bulgarian, Serbian, and Macedonian.

Language-specific characters

  • Ukrainian and Belorussian characters: there are four special letters Ґ, Є, І, and Ї in both upper and lower case.
  • Serbian and Macedonian characters: there are five special characters Ђ, Љ, Њ, Ћ, and Џ in both upper and lower case.
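Whether these letters fit in a given 8-bit code can be tested by attempting to encode them. The sketch below assumes Python's built-in `cp1251` and `iso8859_5` codecs; `can_encode` is a hypothetical helper, and the sample strings contain only the letters listed above.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

# The special letters from the list above, upper and lower case.
ukrainian_belarusian = "ҐґЄєІіЇї"
serbian_macedonian = "ЂђЉљЊњЋћЏџ"

for label, sample in [("Ukrainian/Belarusian", ukrainian_belarusian),
                      ("Serbian/Macedonian", serbian_macedonian)]:
    for codec in ("cp1251", "iso8859_5"):
        print(f"{label:22} {codec:10} {can_encode(sample, codec)}")
```

In this sketch the Ukrainian sample fails under `iso8859_5` because that codec has no position for Ґ, which is one concrete case of the missing Ukrainian characters mentioned in the introduction.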
