Help:Special characters

Latest revision as of 16:27, 8 December 2009

From MediaWiki 1.5, all projects use Unicode (UTF-8) character encoding.

Unicode and ISO 8859-1

Until the end of June 2005, when MediaWiki 1.5 came into use on Wikimedia projects, the English, Dutch, Danish, and Swedish Wikipedias used windows-1252. They declared themselves to be ISO-8859-1, but in practice browsers treat the two as synonymous, and the MediaWiki software made no attempt to prevent use of characters exclusive to windows-1252. Pre-upgrade wikitext in their databases remains stored in windows-1252 and is converted on load (some of it may also have been converted by gradual changes in the way history is stored). Edits made since the upgrade are stored as UTF-8 in the database. This conversion-on-load process is invisible to users. It is also invisible to reusers, because Wikimedia now uses XML dumps rather than database dumps.
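
The conversion on load described above can be pictured with a short Python sketch. This is only an illustration, not MediaWiki's actual code: the function name and the stored_as_utf8 flag are hypothetical, and Python's cp1252 codec stands in for windows-1252.

```python
def load_revision_text(raw: bytes, stored_as_utf8: bool) -> str:
    """Return revision text as a Unicode string (hypothetical helper)."""
    if stored_as_utf8:
        # Revisions saved after the upgrade are already UTF-8.
        return raw.decode("utf-8")
    # Older revisions are windows-1252 bytes; a few byte values are
    # undefined in Python's cp1252 codec, so replace them rather than fail.
    return raw.decode("cp1252", errors="replace")


legacy = b"price: \x80 5"  # 0x80 is the euro sign in windows-1252
print(load_revision_text(legacy, stored_as_utf8=False))
print(load_revision_text("price: € 5".encode("utf-8"), stored_as_utf8=True))
```

Both calls give the same string, which is why the difference in storage is invisible to readers. Decoding the same legacy bytes as strict ISO-8859-1 would instead yield a control character in place of the euro sign.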

Unicode (UTF-8)
a variable number of bytes per character
special characters, including CJK characters, can be treated like normal ones; not only the webpage, but also the edit box shows the character; in addition it is possible to use the multi-character codes; they are not automatically converted in the edit box.

ISO 8859-1
one byte per character
special characters that are not available in the limited character set are stored in the form of a multi-character code; there are usually two or three equivalent representations, e.g. for the character € the named character reference &euro;, the decimal character reference &#8364;, and the hexadecimal character reference &#x20AC;. The edit box shows the entered code, the webpage the resulting character. Unavailable characters which are copied into the edit box are first displayed as the character, and automatically converted to their decimal codes on Preview or Save.
the most common special characters, such as é, are in the character set, so code like &eacute;, although allowed, is not needed.
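
The two storage models can be compared in a few lines of Python. This is a sketch for illustration only: store_in_latin1_wiki is a hypothetical name, and the standard xmlcharrefreplace error handler is used as a stand-in for the automatic conversion to decimal codes, not as a description of how MediaWiki does it.

```python
import html

# The three references for the euro sign all decode to the same character.
references = ["&euro;", "&#8364;", "&#x20AC;"]
decoded = [html.unescape(ref) for ref in references]

# Bytes per character: one in ISO 8859-1, a variable number in UTF-8.
latin1_bytes = len("é".encode("latin-1"))  # 1
utf8_bytes_e = len("é".encode("utf-8"))    # 2
utf8_bytes_euro = len("€".encode("utf-8"))  # 3


def store_in_latin1_wiki(text: str) -> str:
    """Replace characters outside ISO 8859-1 with decimal references."""
    data = text.encode("latin-1", errors="xmlcharrefreplace")
    return data.decode("latin-1")


print(decoded)
print(store_in_latin1_wiki("5 € café"))
```

Here é survives unchanged because it is in the ISO 8859-1 character set, while € becomes &#8364;.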

Note that Special:Export exports using UTF-8 even if the database is encoded in ISO 8859-1; at least, that was already the case for the English Wikipedia when it used version 1.4.

To find out which character set applies in a project, use the browser's "View Source" feature and look for something like this:

<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />

or

<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
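
The same check can be scripted. The following Python sketch pulls the declared character set out of page source that has already been saved as a string; the function name and the regular expression are illustrative assumptions, and a real page may declare its encoding only in an HTTP header, which this does not inspect.

```python
import re
from typing import Optional

# Matches charset=... inside a <meta> tag, with or without quotes.
META_CHARSET = re.compile(
    r'<meta[^>]*charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)


def declared_charset(page_source: str) -> Optional[str]:
    """Return the charset named in the first matching meta tag, if any."""
    match = META_CHARSET.search(page_source)
    return match.group(1).lower() if match else None


old_style = '<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />'
new_style = '<meta http-equiv="Content-type" content="text/html; charset=utf-8" />'
print(declared_charset(old_style))
print(declared_charset(new_style))
```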

Editing

Many characters outside the repertoire of standard ASCII are useful, even necessary, for projects in a language with a non-Latin alphabet. This page contains recommendations on which characters are safe to use and how to use them. There are four ways to enter a non-ASCII character into the wikitext:

Use a link to a special character listed under the edit box to insert that character. Note however that some characters are not displayed in Internet Explorer. In some fonts, e.g. Arial, all the characters in this box are displayed, but it is not convenient for a user to have to switch fonts between webpages. You have to install the CharInsert extension to use this.
Enter the character directly from a foreign keyboard, or by cut and paste from a "character map" type application, or by some special means provided by the operating system or text editing application. On ISO-8859-1 wikis some browsers will change characters outside the charset of the wiki into html numeric character entities (see below).
Use an HTML named character entity reference like &agrave;. This is unambiguous even when the server does not announce the use of any special character set, and even when the character does not display properly on some browsers. However, it may cause difficulties with searches (see below).
Use an HTML numeric character reference like &#161;. Unfortunately some old browsers incorrectly interpret these as references to the native character set. It is, however, the only way to enter Unicode values for which there is no named entity, such as the Turkish letters. In both ISO-8859-1 and Unicode, the code points 128 to 159 are assigned only to rare control codes that are illegal in HTML, so character references in that range such as &#131; are illegal and ambiguous, though many web sites use them. Almost all browsers treat ISO-8859-1 as Windows-1252, which does have printable characters in that range. Such characters often found their way into article titles on English projects, which caused real confusion when trying to create interwiki links to those pages.
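
The ambiguity of the range 128 to 159 can be seen by reading one code under each interpretation. The sketch below is an illustration with a hypothetical helper name; it uses Python's html.unescape, which follows HTML5 parsing rules, as a stand-in for what a modern browser does with such a reference.

```python
import html
import unicodedata


def describe_code(code: int) -> dict:
    """Show how one code (128-255) is read under each interpretation."""
    return {
        # Unicode and ISO-8859-1 agree on the first 256 code points.
        "unicode_category": unicodedata.category(chr(code)),
        "iso_8859_1": bytes([code]).decode("latin-1"),
        # Most browsers read the byte as windows-1252 instead.
        "windows_1252": bytes([code]).decode("cp1252"),
        # HTML5 parsing remaps references in the control-code range.
        "html5_reference": html.unescape(f"&#{code};"),
    }


print(describe_code(131))  # a control code in Unicode, "ƒ" in windows-1252
print(describe_code(161))  # "¡" under every interpretation
```

For 131 the Unicode category is Cc (a control code), yet both the windows-1252 reading and the HTML5 reading give ƒ, which is how such characters slipped into page titles.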

Generally speaking, Western European languages such as Spanish, French, and German pose few problems.

For the purpose of searching, a word with a special character is best written with the character itself (the first or second method above). If a named entity reference is used instead (the third method), a word like Odiliënberg, written as Odili&euml;nberg, can only be found by searching for Odili, euml and/or nberg. This is actually a bug that should be fixed: the entities should be folded into their raw character equivalents so that all searches on them are equivalent. See also Help:Searching.
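
The search problem, and the folding that would fix it, can be reproduced with a toy tokenizer. This is only a sketch: search_tokens is a hypothetical stand-in that splits on non-word characters, not the MediaWiki search engine.

```python
import html
import re


def search_tokens(wikitext: str) -> list:
    """Split text into word tokens, roughly as a naive indexer might."""
    return re.findall(r"\w+", wikitext)


def folded_tokens(wikitext: str) -> list:
    """Fold entity references into raw characters before tokenizing."""
    return search_tokens(html.unescape(wikitext))


print(search_tokens("Odili&euml;nberg"))  # ['Odili', 'euml', 'nberg']
print(folded_tokens("Odili&euml;nberg"))  # ['Odiliënberg']
```

With folding applied, the entity form and the literal form index to the same token.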

Browser issues

Some browsers are known to do nasty things to text in the edit box. Most commonly they convert it to an encoding native to the platform, let the user edit it using a standard edit control, and then convert it back. (The NT line of Windows is internally UCS-2LE, a 2-byte subset of UTF-16, but it has a complete duplicate set of APIs in the Windows ANSI code page, and many older applications tend to use these, especially for things like edit boxes.) The result is that any characters that do not exist in the encoding used for editing get replaced with something that does: often a question mark, though at least one browser has been reported to actually transliterate text.

IE for the Mac

This relatively common browser translates to mac-roman for the edit box with the result it munges most Unicode stuff (usually but not always by replacing them with a question mark). It also munges things that are in ISO-8859-1 but not mac-roman (specifically ¤ ¦ ¹ ² ³ ¼ ½ ¾ Ð × Ý Þ ð ý þ and the soft hyphen) so the problems it causes are not limited to Unicode wikis (though they tend to be much worse on Unicode wikis because they affect actual text and interwiki links rather than just fairly obscure symbols).
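
The damage can be imitated by pushing text through the edit-box encoding and back. The following Python sketch is an approximation under stated assumptions: it uses Python's mac_roman codec and the "replace" error handler to model a browser that substitutes question marks, and the function names are hypothetical.

```python
def round_trip(text: str, encoding: str) -> str:
    """Convert text to the edit-box encoding and back, as a lossy browser would."""
    return text.encode(encoding, errors="replace").decode(encoding)


def lost_characters(text: str, encoding: str) -> list:
    """List the characters of text that the encoding cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


print(round_trip("λ ½ é", "mac_roman"))  # ? ? é
print(round_trip("λ ½ é", "utf-8"))      # unchanged
print(lost_characters("½Ðé", "mac_roman"))
```

The first call loses both a character from outside ISO-8859-1 (λ) and one from inside it (½), while the UTF-8 round trip is lossless.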

Netscape 4.x

Similar issues to IE Mac though the character set converted to and from will obviously not always be mac-roman.

Console browsers

Lynx, Links (in text mode) and W3M convert to the console character set (Lynx and Links actually using a transliteration engine) for editing and convert back on save. If the console character set is UTF-8 then these browsers are Unicode safe but if it isn't they aren't. With Lynx and Links a possible detection method would be to add another edit box to the login form but this won't work for W3M as it doesn't convert the text to the console character set until the user actually attempts to edit it.

The workaround

After English Wikipedia switched to UTF-8 and interwiki bots started replacing HTML entities in interwiki links with literal Unicode text, edits that broke Unicode characters became so common that they could no longer be ignored. A workaround was developed to allow the problematic browsers to edit safely, provided that MediaWiki knows they have problems.

Browsers listed in the setting $wgBrowserBlackList (a list of regexps that match against user agent strings) are supplied text for editing in a special form. Existing hexadecimal HTML entities in the page have an extra leading zero added; non-ASCII characters that are stored in the wikitext are represented as hexadecimal HTML entities with no leading zeros.
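
The special form can be sketched as a pair of reversible transformations. This is a minimal Python illustration of the rule as described above, not MediaWiki's implementation; the function names are hypothetical, and lower-case hexadecimal digits are an arbitrary choice.

```python
import re

HEX_ENTITY = re.compile(r"&#x([0-9A-Fa-f]+);")
ENTITY_OR_NON_ASCII = re.compile(r"&#x([0-9A-Fa-f]+);|[^\x00-\x7F]")


def to_safe_form(text: str) -> str:
    """Text as sent to a listed browser: ASCII only."""
    def replace(match):
        if match.group(1) is not None:
            # Existing hexadecimal entity: add one leading zero.
            return "&#x0" + match.group(1) + ";"
        # Literal non-ASCII character: entity with no leading zero.
        return "&#x{:x};".format(ord(match.group(0)))
    return ENTITY_OR_NON_ASCII.sub(replace, text)


def from_safe_form(text: str) -> str:
    """Undo to_safe_form when the edit is saved."""
    def replace(match):
        digits = match.group(1)
        if digits.startswith("0"):
            # Was an entity in the stored text: drop the added zero.
            return "&#x" + digits[1:] + ";"
        value = int(digits, 16)
        # Leave out-of-range values untouched instead of failing.
        return chr(value) if value <= 0x10FFFF else match.group(0)
    return HEX_ENTITY.sub(replace, text)


stored = "c\u0153ur &#x153; &#x0153;"
print(to_safe_form(stored))
print(from_safe_form(to_safe_form(stored)) == stored)
```

Because the leading zero distinguishes entities that were already in the wikitext from characters that were stored literally, the round trip restores the stored text exactly.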

Currently the default settings only have IE Mac and a specific version of Netscape 4.x for Linux in the blacklist. Nevertheless, this seems to have stopped most of the problem. Hopefully the default list will be expanded in future, but that relies on getting someone with CVS access to commit the changes.

Viewing

Most current browsers have some level of Unicode support, but some do it better than others. The most commonly encountered problem is that Internet Explorer relies on preconfigured font links in the registry rather than actually searching for a font that can display the character in question. This means that Internet Explorer often has to be forced to use particular fonts. Characters in Windows Glyph List 4 should be safe to use without such special measures.

<font face="Arial Unicode MS">...</font> may work, but only for people with that font.

Displaying Special Characters

To display Unicode or other special characters on a web page, one or more Unicode fonts must first be installed on your computer. The browser's settings may also need to be changed for the characters to display properly.

The default font for Latin scripts in the Internet Explorer (IE) web browser for Windows is Times New Roman, which does not include many Unicode blocks. To view special characters properly in IE, set the browser font to one that covers many Unicode blocks, such as TITUS Cyberbit or GNU Unifont, both of which are freely available.

Special symbols should display properly without further configuration in Mozilla Firefox, Konqueror, Opera, Safari and most other recent browsers. As an optional further step, installing rendering engine software gives better (and correct) display of ligatures and combined characters.

To use one of the available Unicode fonts for displaying special characters inside an HTML table, chart, or box, specify class="Unicode" in each of the table's TR row tags (it can also go in each TD tag, but using it on each TR is easier). In wiki table code, put it after the TR equivalent "|-", like this: |- class="Unicode".

To display an individual special character, the template code {{Unicode|char}} can be used for each character. HTML decimal or hexadecimal numeric entity codes can be used in place of char. If a paragraph with many special Unicode characters needs to be displayed, <p class="Unicode"> ... </p> or <span class="Unicode"> ... </span> can be used instead.

Use class="Unicode" in HTML or wiki tags where characters from a wide range of Unicode blocks need to be displayed. If the special characters mostly come from a smaller set of Unicode blocks related to Latin scripts, class="latinx" can be used instead. For characters or symbols of the International Phonetic Alphabet, use class="IPA". For polytonic Greek characters and related symbols, use class="polytonic".

Changing Internet Explorer's (IE) default font

From the IE menu bar, choose Tools -> Internet Options -> Fonts; under "Webpage Font:" there is a scrolling list of fonts. As indicated above, the default selection for Windows is Times New Roman. To view many special characters, select a different font, such as Lucida Sans Unicode, and then select OK.

Alt Keycodes

Many special characters with decimal code numbers below 256 can be typed by holding the Alt key and entering the decimal code on the numeric keypad.

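For example, the character é (small e with acute accent, HTML entity code "&eacute;") can be obtained by pressing Alt + 130. (On US and Western European Windows setups, 130 is the code of é in the OEM code page used by Alt codes without a leading zero. Its Unicode code point is 233, which is entered as Alt + 0233.)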

That is, press and hold the Alt key with your left hand, press the digit keys 1, 3, 0 in sequence on the numeric keypad at the right-hand side of the keyboard, and then release the Alt key.

However, a character such as λ (small lambda) cannot be obtained in Notepad or Internet Explorer (IE) by using its decimal code 955 or 0955 with the Alt key. You will get the wrong character, "╗" or "»".
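
The wrong characters are consistent with the number being reduced modulo 256 and then looked up in a legacy code page. The Python sketch below models that behaviour under explicit assumptions: a US Windows setup with OEM code page 437 (codes without a leading zero) and ANSI code page 1252 (codes with a leading zero). The function name is hypothetical.

```python
def alt_code_result(number: int, leading_zero: bool) -> str:
    """Model the character an Alt code yields in Notepad-style edit boxes."""
    byte = number % 256  # only the low 8 bits survive (assumption)
    code_page = "cp1252" if leading_zero else "cp437"
    return bytes([byte]).decode(code_page)


print(alt_code_result(130, leading_zero=False))  # é
print(alt_code_result(233, leading_zero=True))   # é
print(alt_code_result(955, leading_zero=False))  # ╗
print(alt_code_result(955, leading_zero=True))   # »
```

Since 955 modulo 256 is 187, Alt + 955 and Alt + 0955 land on position 187 of the two code pages, which holds ╗ and » respectively.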

WordPad (included with Windows) accepts decimal code point values above 256, so it can be used to obtain such special characters, which can then be copied and pasted wherever they are needed.

Another way to obtain special characters with decimal code point values above 256 is to type the hexadecimal code point and then press Alt+X. To do this, start WordPad, Word, or a similar editing application (the Alt+X method does not work in Internet Explorer, Notepad, etc.). Type 3BB, which is the hexadecimal code point of the character λ, then press Alt+X. The hex code 3BB turns into the character λ. If you press Alt+X again, the λ character converts back to its hexadecimal code point, 3BB. The character can then be copied and pasted wherever you want to use it. Alternatively, in IE, use its HTML hexadecimal code &#x3BB; or its HTML decimal code &#955;.
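
The conversions between a character, its hexadecimal code point, and its two numeric references are easy to check in Python. The helper names below are hypothetical and only illustrate the arithmetic; they have nothing to do with how WordPad implements Alt+X.

```python
def hex_to_char(hex_code: str) -> str:
    """What Alt+X does to a typed hex code such as 3BB."""
    return chr(int(hex_code, 16))


def char_to_hex(ch: str) -> str:
    """What a second Alt+X does: back to the hex code point."""
    return format(ord(ch), "X")


def hex_reference(ch: str) -> str:
    """HTML hexadecimal character reference for ch."""
    return f"&#x{ord(ch):X};"


def decimal_reference(ch: str) -> str:
    """HTML decimal character reference for ch."""
    return f"&#{ord(ch)};"


lam = hex_to_char("3BB")
print(lam, char_to_hex(lam), hex_reference(lam), decimal_reference(lam))
```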

See also

Help:Displaying a formula

External links

http://www.unicode.org/charts/ Unicode character charts; hexadecimal numbers only; PDF files showing all characters independent of browser capabilities
http://www.unicode.org/help/display_problems.html Help for enabling Unicode support on most platforms
Table of Unicode characters from 1 to 65535 - shows how the decimal character references look in one's browser
HTML 4.0 Character Entity References - shows how the named and decimal character references look in one's browser
FileFormat.Info - details of many Unicode characters, including the named, decimal and hexadecimal character reference, showing how it should look and for each, how it looks in one's browser
Alan Wood's Unicode Resources - comprehensive resource with character test pages for all Unicode ranges, as well as OS-specific Unicode support information and links to fonts and utilities.
CharacterPal - Free Mac OS X Dashboard Widget that displays key combinations for special characters.
A convertor that helps you find the right escape sequence to use - helps when you need to escape ASCII/Unicode characters that are special characters in wiki markup

@@ Line 1: / Line 1: @@
-{{see also|Wikipedia:Mathematical symbols}}
+From MediaWiki 1.5, all projects use '''Unicode (UTF-8)'''  character encoding.
-From MediaWiki 1.5, all projects use '''[[w:Unicode|Unicode]] ([[w:UTF-8|UTF-8]])'''  [[w:character encoding|character encoding]].
 == Unicode and ISO 8859-1 ==
-Until the end of June 2005, when this new version came into use on Wikimedia projects, the English, Dutch, Danish, and Swedish Wikipedias used [[w:Windows-1252|windows-1252]] (they declared themselves to be [[w:ISO-8859-1|ISO-8859-1]] but in reality browsers treat the two as synonymous and the MediaWiki software made no attempt to prevent use of characters exclusive to windows-1252). Pre-upgrade wikitext in their databases remains stored in Windows-1252 and is converted on load (some of it may also have been converted by gradual changes in the way history is stored). Edits made since the upgrade will be stored as UTF-8 in the database. This conversion on load process is invisible to users. It is also invisible to reusers as Wikimedia now uses [[m:Data dumps#What happened to the SQL dumps.3F|XML dumps rather than database dumps]].
+Until the end of June 2005, when this new version came into use on Wikimedia projects, the English, Dutch, Danish, and Swedish Wikipedias used windows-1252 (they declared themselves to be ISO-8859-1 but in reality browsers treat the two as synonymous and the MediaWiki software made no attempt to prevent use of characters exclusive to windows-1252). Pre-upgrade wikitext in their databases remains stored in Windows-1252 and is converted on load (some of it may also have been converted by gradual changes in the way history is stored). Edits made since the upgrade will be stored as UTF-8 in the database. This conversion on load process is invisible to users. It is also invisible to reusers as Wikimedia now uses XML dumps rather than database dumps.
 ;Unicode (UTF-8)
 :*a variable number of bytes per character
-:*special characters, including [[w:CJK|CJK]] characters,  can be treated like normal ones; not only the webpage, but also the edit box shows the character; in addition it is possible to use the multi-character codes; they are not automatically converted in the edit box.
+:*special characters, including CJK characters,  can be treated like normal ones; not only the webpage, but also the edit box shows the character; in addition it is possible to use the multi-character codes; they are not automatically converted in the edit box.
 ;ISO 8859-1
 :*one byte per character
-:*special characters that are not available in the limited character set are stored in the form of a multi-character code; there are usually two or three equivalent representations, e.g. for the character &euro;  the '''named character reference''' &amp;euro; and  the '''decimal character reference''' &amp;#8364; and the '''hexadecimal character reference''' &amp;#x20AC;. The edit box shows the entered code, the webpage the resulting character. Unavailable characters which are copied into the edit box are first displayed as the character, and [[Help:Automatic conversion of wikitext|automatically converted]] to their decimal codes on Preview or Save.
+:*special characters that are not available in the limited character set are stored in the form of a multi-character code; there are usually two or three equivalent representations, e.g. for the character &euro;  the '''named character reference''' &amp;euro; and  the '''decimal character reference''' &amp;#8364; and the '''hexadecimal character reference''' &amp;#x20AC;. The edit box shows the entered code, the webpage the resulting character. Unavailable characters which are copied into the edit box are first displayed as the character, and automatically converted to their decimal codes on Preview or Save.
 :*the most common special characters, such as é, are in the character set, so code like &amp;eacute;, although allowed, is not needed.
@@ Line 26: / Line 24: @@
 ==Editing==
-Many characters not in the repertoire of standard [[w:ASCII|ASCII]] will be useful&mdash;even necessary&mdash;for projects in a non-latin alphabet language. This page contains recommendations for which characters are safe to use and how to use them.  There are four ways to enter a non-ASCII character into the wikitext:
+Many characters not in the repertoire of standard ASCII will be useful&mdash;even necessary&mdash;for projects in a non-latin alphabet language. This page contains recommendations for which characters are safe to use and how to use them.  There are four ways to enter a non-ASCII character into the wikitext:
-* Use a link to a special character listed under the edit box to insert that character. Note however that some characters are not displayed in Internet Explorer:<br />[[m:Image:Special characters under edit box, IE.png|500px]]<br />In some fonts, e.g. Arial, all the characters in this box are displayed, but it is not convenient for a user to have to switch fonts between webpages. You have to install the [[mw:Extension:CharInsert|CharInsert]] extension to use this.
+* Use a link to a special character listed under the edit box to insert that character. Note however that some characters are not displayed in Internet Explorer. In some fonts, e.g. Arial, all the characters in this box are displayed, but it is not convenient for a user to have to switch fonts between webpages. You have to install the CharInsert extension to use this.
 * Enter the character directly from a foreign keyboard, or by cut and paste from a "character map" type application, or by some special means provided by the operating system or text editing application. On ISO-8859-1 wikis some browsers will change characters outside the charset of the wiki into html numeric character entities (see below).
-* Use an [[Help:HTML in wikitext|HTML]] named character entity reference like <code>&amp;agrave;</code>.  This is unambiguous even when the server does not announce the use of any special character set, and even when the character does not display properly on some browsers. However, it may cause difficulties with searches (see below).
+* Use an HTML named character entity reference like <code>&amp;agrave;</code>.  This is unambiguous even when the server does not announce the use of any special character set, and even when the character does not display properly on some browsers. However, it may cause difficulties with searches (see below).
-* Use an HTML numeric character reference like <code>&amp;#161;</code>.  Unfortunately some old browsers incorrectly interpret these as references to the native character set.<!--which ones?-->  It is, however, the only way to enter [[w:Unicode|Unicode]] values for which there is no named entity, such as the [[Help:Turkish characters|Turkish]] letters.  Because the code points 128 to 159 are unused in both [[w:ISO-8859-1|ISO-8859-1]] and [[w:Unicode|Unicode]], character references in that range such as <code>&amp;#131;</code> are illegal and ambiguous, though they are commonly used by many web sites. (They are not technically unused, but they map to rare control codes that are illegal in HTML.) Almost all browsers treat ISO-8859-1 as Windows-1252, which does have printable characters in that space, and they often found their way into article titles on English projects, which really caused confusion when trying to create interwiki links to said pages.
+* Use an HTML numeric character reference like <code>&amp;#161;</code>.  Unfortunately some old browsers incorrectly interpret these as references to the native character set.<!--which ones?-->  It is, however, the only way to enter Unicode values for which there is no named entity, such as the Turkish letters.  Because the code points 128 to 159 are unused in both ISO-8859-1 and Unicode, character references in that range such as <code>&amp;#131;</code> are illegal and ambiguous, though they are commonly used by many web sites. (They are not technically unused, but they map to rare control codes that are illegal in HTML.) Almost all browsers treat ISO-8859-1 as Windows-1252, which does have printable characters in that space, and they often found their way into article titles on English projects, which really caused confusion when trying to create interwiki links to said pages.
-Generally speaking, Western European languages such as Spanish, French, and German pose few problems. For specific details about other languages, see: [[Help:Turkish characters]] and [[Help:Romanian characters]]. (More will be added to this list as contributors in other languages appear.)
+Generally speaking, Western European languages such as Spanish, French, and German pose few problems.
 For the purpose of searching, a word with a special character can best be written using the first method. If the second method is used a word like Odiliënberg can only be found by searching for Odili, euml and/or nberg; this is actually a bug that should be fixed&mdash;the entities should be folded into their raw character equivalents so all searches on them are equivalent. See also [[Help:Searching]].
-===Esperanto===
-<table class="wikitable" style="float: right; margin-left: .5em;">
- <tr><td>in edit box<td>in database and output</tr>
- <tr><td>S<td>S</tr>
- <tr><td>Sx<td>Ŝ</tr>
- <tr><td>Sxx<td>Sx</tr>
- <tr><td>Sxxx<td>Ŝx</tr>
- <tr><td>Sxxxx<td>Sxx</tr>
- <tr><td>Sxxxxx<td>Ŝxx</tr>
-</table>
-MediaWiki installations configured for Esperanto use UTF-8 for storage and display. However when editing the text is converted to a form that is designed to be easier to edit with a standard keyboard.
-The characters for which this applies are: Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ, ŭ. You may enter these directly in the edit box if you have the facilities to do so. However when you edit the page again you will see them encoded as Sx. This form is referred to as "x-sistemo" or "x-kodo". In order to preserve round trip capability when one or more x's follow these characters or their non-accented forms (C, G, H, J, S, U, c, g, h, j, s, u), the number of x's in the edit box is double the number in the actual stored article text.
-For example, the interlanguage link <nowiki>[[en:Luxury car]]</nowiki> to
-[[en:Luxury car]] has to be entered in the edit box as <nowiki>[[en:Luxxury car]]</nowiki> on [[eo:]]. This has caused problems with interwiki update bots in the past.
 ===Browser issues===
-Some browsers are known to do nasty things to text in the edit box. Most commonly they convert it to an encoding native to the platform (whilst the NT line of Windows is internally [[w:UTF-16/UCS-2|UCS-2LE]] (2 Byte subset of UTF-16) it has a complete duplicate set of APIs in the Windows ANSI code page and many older apps tend to use these, especially for things like edit boxes). Then they let the user edit it using a standard edit control and convert it back. The result is that any characters that do not exist in the encoding used for editing get replaced with something that does (often a question mark though at least one browser has been reported to actually transliterate text!).
+Some browsers are known to do nasty things to text in the edit box. Most commonly they convert it to an encoding native to the platform (whilst the NT line of Windows is internally UCS-2LE (2 Byte subset of UTF-16) it has a complete duplicate set of APIs in the Windows ANSI code page and many older apps tend to use these, especially for things like edit boxes). Then they let the user edit it using a standard edit control and convert it back. The result is that any characters that do not exist in the encoding used for editing get replaced with something that does (often a question mark though at least one browser has been reported to actually transliterate text!).
 ====IE for the Mac====
-This relatively common browser translates to [[w:mac-roman|mac-roman]] for the edit box with the result it munges most Unicode stuff (usually but not always by replacing them with a question mark). It also munges things that are in ISO-8859-1 but not mac-roman (specifically ¤ ¦ ¹ ² ³ ¼ ½ ¾ Ð × Ý Þ ð ý þ and the soft hyphen) so the problems it causes are not limited to Unicode wikis (though they tend to be much worse on Unicode wikis because they affect actual text and interwiki links rather than just fairly obscure symbols).
+This relatively common browser translates to mac-roman for the edit box with the result it munges most Unicode stuff (usually but not always by replacing them with a question mark). It also munges things that are in ISO-8859-1 but not mac-roman (specifically ¤ ¦ ¹ ² ³ ¼ ½ ¾ Ð × Ý Þ ð ý þ and the soft hyphen) so the problems it causes are not limited to Unicode wikis (though they tend to be much worse on Unicode wikis because they affect actual text and interwiki links rather than just fairly obscure symbols).
 ====Netscape 4.x====
@@ Line 68: / Line 48: @@
 ====The workaround====
-<table class="wikitable" style="float: right; margin-left: .5em;">
-<tr>
- <td>In database and edit<br>box for normal browsers</td>
- <td>In editbox for<br />[[mw:Manual:$wgBrowserBlackList|trouble browsers]]</td>
-</tr>
-<tr>
- <td>œ<td>&amp;#x153;</td>
-</tr>
-<tr>
- <td>&amp;#x153;<td>&amp;#x0153;</td>
-</tr>
-<tr>
- <td>&amp;#x0153;<td>&amp;#x00153;</td>
-</tr>
-</table>
 After English Wikipedia switched to UTF-8 and interwiki bots started replacing html entities in interwikis with literal unicode text, edits that broke unicode characters became so common they could no longer be ignored. A workaround was developed to allow the problematic browsers to edit safely provided that MediaWiki knew they have problems.
-Browsers listed in the setting [[mw:Manual:$wgBrowserBlackList|$wgBrowserBlackList]] (a list of regexps that match against user agent strings) are supplied text for editing in a special form. Existing hexadecimal html entities in the page have an extra leading zero added, non-ascii characters that are stored in the wikitext are represented as hexadecimal html entities with no leading zeros.
+Browsers listed in the setting $wgBrowserBlackList (a list of regexps that match against user agent strings) are supplied text for editing in a special form. Existing hexadecimal html entities in the page have an extra leading zero added, non-ascii characters that are stored in the wikitext are represented as hexadecimal html entities with no leading zeros.
 Currently the default settings only have IE mac and a specific version of netscape 4.x for linux in the blacklist. Nevertheless it seems to have stopped most of the problem. Hopefully the default list will be expanded in future but that relies on getting someone with cvs access to commit the changes.
 ==Viewing==
-Most current browsers have some level of Unicode support but some do it better than others. The most commonly encountered problem is that Internet Explorer relies on preconfigured font links in the registry rather than actually searching for a font that can display the character in question. This means that Internet Explorer often has to be forced to use particular fonts. On English Wikipedia there are a set of templates to do this. For example {{tlw|unicode}} for general Unicode text, {{tlw|polytonic}} for [[w:polytonic Greek|polytonic Greek]] and {{tlw|IPA}} for the [[w:International Phonetic Alphabet|International Phonetic Alphabet]]. The stuff in [[w:Windows Glyph List 4|Windows Glyph List 4]] should be safe to use without such special measures.
+Most current browsers have some level of Unicode support but some do it better than others. The most commonly encountered problem is that Internet Explorer relies on preconfigured font links in the registry rather than actually searching for a font that can display the character in question. This means that Internet Explorer often has to be forced to use particular fonts. The stuff in Windows Glyph List 4 should be safe to use without such special measures.
 <nowiki><font face="Arial Unicode MS">...</font></nowiki> may work, but only for people with that font.
@@ Line 96: / Line 62: @@
 ==Displaying Special Characters==
-To display Unicode or special characters on web page(s), one or more of the [[w:List of typefaces#Unicode_fonts|Unicode fonts]] need to be present or installed in your computer, first. For proper working functionality, ''setup'' or ''configuration'' or ''settings'' from the web page viewing browser software also needs to be modified.
+To display Unicode or special characters on web page(s), one or more of the Unicode fonts need to be present or installed in your computer, first. For proper working functionality, ''setup'' or ''configuration'' or ''settings'' from the web page viewing browser software also needs to be modified.
-The default font for Latin scripts in [[w:Internet Explorer|Internet Explorer]](IE) web browser for Windows is [[w:Times New Roman|Times New Roman]]. It doesn't include many [[w:Mapping of Unicode characters|Unicode blocks]]. To properly view special characters in IE, you must set your browser font settings to a font that includes many Unicode blocks of characters, such as  [[w:TITUS Cyberbit Basic|TITUS Cyberbit]], [[w:GNU Unifont|GNU Unifont]] which are freely available.
+The default font for Latin scripts in Internet Explorer(IE) web browser for Windows is Times New Roman. It doesn't include many Unicode blocks. To properly view special characters in IE, you must set your browser font settings to a font that includes many Unicode blocks of characters, such as TITUS Cyberbit, GNU Unifont which are freely available.
-Special symbols should display properly without further configuration with [[w:Mozilla Firefox|Mozilla Firefox]], [[w:Konqueror|Konqueror]], [[w:Opera (Internet suite)|Opera]], [[w:Safari (web browser)|Safari]] and most other recent browsers. An optional step can be taken for better (and correct) display of characters with [[w:Ligature (typography)|ligature]] forms, [[w:Combining character|combined characters]], after the previously mentioned steps were followed, is to install a [[w:Unicode#Multilingual_text-rendering_engines|rendering engine]] software.
+Special symbols should display properly without further configuration in Mozilla Firefox, Konqueror, Opera, Safari and most other recent browsers. For better (and correct) display of ligature forms and combining characters, an optional further step after those above is to install text-rendering engine software.
-To use one of the available Unicode fonts for displaying special characters inside a [[w:HTML Table|table]] or chart or box, specify the '''class="Unicode"''' in the table's '''TR''' row tag (or, in each TD tag, but using it in each TR is easier than using it in each TD), in [[Help:Table|wiki table]] code, use that after the (TR equivalent) "'''&#124;-'''" (like, '''&#124;- class="Unicode"''').
+To use one of the available Unicode fonts for special characters inside an HTML table, chart or box, specify '''class="Unicode"''' in the table's '''TR''' row tag (it can also go in each TD tag, but setting it once per TR is easier). In wiki table code, put it after the TR equivalent "'''&#124;-'''", as in '''&#124;- class="Unicode"'''.
-For displaying individual special character, template code '''&#123;&#123;Unicode|'''''char'''''&#125;&#125;''' for each character can be used. HTML decimal or [[w:hexadecimal|hexadecimal]] numeric entity codes can be used in the place of the ''char''. If a paragraph with lots of special Unicode characters need to be displayed, then, '''&#60;p class="Unicode">''' ... '''&#60;/p>''', or, '''&#60;span class="Unicode">''' ... '''&#60;/span>''' code can also be used.
+To display an individual special character, the template code '''&#123;&#123;Unicode|'''''char'''''&#125;&#125;''' can be used for each character. HTML decimal or hexadecimal numeric entity codes can be used in place of ''char''. If a paragraph with many special Unicode characters needs to be displayed, '''&#60;p class="Unicode">''' ... '''&#60;/p>''' or '''&#60;span class="Unicode">''' ... '''&#60;/span>''' can be used instead.
-The class="Unicode" is to be used in web page(s), HTML or wiki tags, where various characters from wide range of various Unicode blocks need to be displayed. If the special characters that need to be displayed on web page(s), are mostly covering fewer Unicode blocks, related to [[w:Unicode Latin|latin scripts]], then '''class="latinx"''' can be used. For special characters or symbols related to [[w:International Phonetic Alphabet|International Phonetic Alphabet]], '''class="IPA"''' can be used. For [[w:Polytonic orthography|polytonic (Greek)]] characters or related symbols, '''class="polytonic"''' can be used.
+Use class="Unicode" in HTML or wiki tags where characters from a wide range of Unicode blocks need to be displayed. If the special characters mostly come from the smaller set of Unicode blocks related to Latin scripts, '''class="latinx"''' can be used. For characters or symbols of the International Phonetic Alphabet, '''class="IPA"''' can be used. For polytonic (Greek) characters or related symbols, '''class="polytonic"''' can be used.
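As an illustration of the markup forms described above, the following Python sketch builds the decimal and hexadecimal numeric references for a character and wraps text in the template and span forms. The function names are hypothetical helpers invented for this example; they are not part of MediaWiki.

```python
def numeric_references(char: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal HTML numeric references for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"


def unicode_template(char: str) -> str:
    """Build a {{Unicode|...}} template call using the decimal reference (hypothetical helper)."""
    decimal, _ = numeric_references(char)
    return "{{Unicode|" + decimal + "}}"


def unicode_span(text: str) -> str:
    """Wrap a run of text in a span carrying class="Unicode" (hypothetical helper)."""
    return f'<span class="Unicode">{text}</span>'


print(numeric_references("λ"))
print(unicode_template("λ"))
print(unicode_span("λόγος"))
```

The same pattern would apply to the other class names mentioned above (latinx, IPA, polytonic) by changing the class attribute.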
 ==== Changing Internet Explorer's (IE) default font ====
-From the IE menu bar, follow this path''':''' &nbsp;{{nowrap|Tools -> Internet Options -> Fonts -> Webpage Font:}}<br>
+From the IE menu bar, follow this path''':''' &nbsp; <code>Tools -> Internet Options -> Fonts -> Webpage Font:</code>
-to a scrolling list of fonts.  As indicated above, the default selection for Windows is [[w:Times New Roman|Times New Roman]].  For viewing of many special characters, select a different font, such as [[w:Lucida Sans Unicode|Lucida Sans Unicode]], and then select '''OK'''.
+to a scrolling list of fonts.  As indicated above, the default selection for Windows is Times New Roman.  For viewing of many special characters, select a different font, such as Lucida Sans Unicode, and then select '''OK'''.
-==Egyptian Hieroglyphs==
-E.g. <nowiki><hiero>P2</hiero></nowiki> gives <hiero>P2</hiero> See [[Help:WikiHiero syntax]].
-This is not dependent on browser capabilities, because it uses images on the servers.
-Hieroglyphs could also be represented using Unicode.  However, browser support is likely to be near non-existent.
-== Shavian text ==
-* Copyleft font is available from [http://www.i18nguy.com/unicode/unicode-font.html here].
-==Linking text with special characters==
-Many users have settings giving underlined links. When linking a special character, in some cases the result may be mistaken for another character with a different meaning:
-Linking + − < > ⊂ ⊃ gives [[+]] [[−]] [[Inequality|<]] [[Inequality|>]] [[⊂]] [[⊃]] which may look like ± = ≤ ≥ ⊆ ⊇. In such cases one can better use a separate link:
-* A ⊂ B (see [[w:Subset|subset]])
-There is less risk of confusion if more than one character is linked, e.g. [[x|''x'' > 3]].
 == Alt Keycodes ==
-&#160;&#160;''See also : [[w:Alt codes|Alt codes]], [[w:Windows Alt keycodes|Windows Alt keycodes]]''
 Many special characters whose decimal code point is below 256 can be typed by holding the '''Alt''' key and entering the decimal code on the numeric keypad ('''Alt + Decimal''').
@@ Line 142: / Line 88: @@
 That is, press and hold the "Alt" key with your left hand, press the digit keys 1, 3, 0 in sequence on the numeric keypad at the right side of the keyboard, then release the Alt key.
-But special characters, for example, &#955; (small lambda) cannot be obtained from its decimal code 955 or 0955, by using it with the Alt key, if used inside Notepad or Internet Explorer ([[w:Internet Explorer|IE]]). You'll get wrong character "╗" or "»".
+However, some special characters, for example &#955; (small lambda), cannot be obtained by typing the decimal code 955 or 0955 with the Alt key in Notepad or Internet Explorer (IE). You will get the wrong character, "╗" or "»", instead.
 The "Wordpad" editor (included with the Windows operating system) accepts decimal code point values above 256, so it can be used to obtain special/Unicode characters, which can then be copied and pasted where needed.
-To obtain such special characters correctly, which have decimal codepoint values above the 256, another option is to use or type its hex equivalent codepoint first, then press '''Alt+X''' keys. To do this, open or start ''Wordpad'', ''Word'', etc editing application software, (this Alt+X process will not work in Internet Explorer, Notepad, etc). Type in '''3BB''', which is a hexadecimal equivalent numeric codepoint of the character '''&#955;''', then press Alt+X. Hexcode ''3BB'' will convert/turn into the ''&#955;'' character. If you press the Alt+X key combination again, then &#955; character will convert back to its hex equivalent codepoint, ''3BB''. Now character(s) can be copy pasted, where you want to use, or, (in [[w:Internet Explorer|IE]]) use its html hexadecimal equivalent code &amp;#x3BB; or its html decimal equivalent code &amp;#955;.
+To obtain special characters with decimal code point values above 256, another option is to type the hexadecimal code point and then press '''Alt+X'''. To do this, open an editing application such as ''Wordpad'' or ''Word'' (Alt+X does not work in Internet Explorer, Notepad, etc.). Type '''3BB''', the hexadecimal code point of the character '''&#955;''', then press Alt+X. The hex code ''3BB'' turns into the ''&#955;'' character. Pressing Alt+X again converts &#955; back to its hex code point, ''3BB''. The character can now be copied and pasted where you want to use it; alternatively (in IE), use its HTML hexadecimal code &amp;#x3BB; or its HTML decimal code &amp;#955;.
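The wrong characters reported above for 955 and 0955 are consistent with the typed number being reduced to a single byte (955 mod 256 = 187) and then looked up in a legacy code page. The Python sketch below reproduces this under stated assumptions: that codes without a leading zero use the OEM code page 437 and codes with a leading zero use Windows-1252, as on a US/Western European Windows setup. The function name is a hypothetical helper, and real behaviour depends on the application and system locale.

```python
def alt_code_result(digits: str) -> str:
    """Approximate the character produced by Alt + digits on the numeric keypad.

    Assumption: only the low byte of the number is kept; a leading zero selects
    Windows-1252, otherwise code page 437 is used.
    """
    value = int(digits) % 256
    code_page = "cp1252" if digits.startswith("0") else "cp437"
    # A few byte values are undefined in Windows-1252; show U+FFFD for those.
    return bytes([value]).decode(code_page, errors="replace")


print(alt_code_result("130"))   # the 1, 3, 0 key sequence described above
print(alt_code_result("955"))   # without leading zero
print(alt_code_result("0955"))  # with leading zero

# The hexadecimal code point typed before Alt+X, and the reverse conversion.
print(f"{ord('λ'):X}")
print(chr(int("3BB", 16)))
```

Under these assumptions 955 and 0955 both reduce to byte 187, which is "╗" in code page 437 and "»" in Windows-1252, matching the two wrong characters mentioned in the text.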
 ==See also==
-*[[Mapping of Unicode characters]]
-*{{ml|Help:Advanced editing|Special characters}}
 *[[Help:Displaying a formula]]
-*[[Help:URL]]
-*[[Help:Romanian characters]]
-*[[Help:Turkish characters]]
-*[[Runic alphabet]]
-*[[Alphabets derived from the Latin]]
-*[[Unicode#input methods]]
-*[[Windows Alt keycodes]]
-*[[Help:Wikitext examples]]
 ==External links==
@@ Line 172: / Line 107: @@
 * A [http://people.w3.org/rishida/scripts/uniview/conversion converter] that helps you find the right escape sequence to use, which is useful when you need to escape ASCII/Unicode characters that are special characters in wiki markup
-[[Category:Wikipedia help]]
+[[Category:Help]]
-[[ar:مساعدة:يونيكود]]
