Bangla-Ring Minimal Glyph Set FAQ:

What is a Minimal Glyph Set?
What does ISO 8859 compliance mean?
Why do we need an MGS?
What is a Standard Glyph Set?
Why do we need the SGS?
Who makes the MGS and the SGS?
How does all this fit into Unicode?
Is this compatible with ISCII?
Do you have a set of proposed rules to define the MGS?
Which software supports MGS?
How do I make a suggestion for the MGS/SGS?

Answers:

What is a Minimal Glyph Set?

The Minimal Glyph Set (MGS) for Bangla is an 8-bit font map for Bangla which is in a certain sense "minimally sufficient" but may not be "complete". The character map in the MGS is completely defined and cannot be changed. This immutable and sufficient character set can display all characters used in contemporary Bangla texts. However, the character set also contains empty spaces for private use that can be utilized for new characters. These spare spaces can also be used to redefine a ligature for improved appearance. The MGS is also ISO 8859 compliant.

What does ISO 8859 compliance mean?

ISO 8859-1 is the character set commonly known as Latin 1. It can represent most Western European languages. The original version of HTML supported only ISO 8859-1; HTML has since expanded its scope to include other character sets. ISO compliance for the MGS primarily means that it does not house glyphs in positions 0-31 and 128-159 of the font character set. These ranges are reserved for control characters. By leaving these spaces unused, the MGS becomes usable for all Web applications and Java.
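
As a rough illustration of the rule above, the following Python sketch checks a font map for glyphs placed in the reserved control ranges. The function name and the representation of a font map as a set of occupied positions are assumptions made for this example; they are not part of the MGS proposal.

```python
# Control ranges that an ISO 8859 compliant font must leave empty,
# as described in the answer above.
C0_RANGE = range(0, 32)      # positions 0-31
C1_RANGE = range(128, 160)   # positions 128-159

def iso8859_violations(occupied_positions):
    """Return the sorted positions that hold a glyph inside a reserved range."""
    return sorted(
        p for p in occupied_positions
        if p in C0_RANGE or p in C1_RANGE
    )

# Hypothetical font map with glyphs at positions 65, 130 and 200.
print(iso8859_violations({65, 130, 200}))  # prints [130]
```

An empty result means the font map keeps clear of both control ranges.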

Why do we need an MGS?

Currently there is no standard Glyph Set or Character Set that can serve as a standard 8-bit font for Bangla (or any Indic language for that matter). The available 8-bit fonts all differ from one another in the sequence and position of characters. They even differ in the actual repertoire of ligatures used in the font. Chaotic font development has led to two problems:

Absence of a standard font map. Fonts usable in one application are therefore unusable in another.
Because the range of applicability of any font is limited, there is no incentive for making well-kerned and well-hinted fonts for computer use.

If the MGS is adopted as a standard then the above problems can be tackled and a new generation of fine fonts will appear which can be used by all applications.

What is a Standard Glyph Set?

The Standard Glyph Set (SGS) for Bangla is a "complete" and "sufficient" font map that is likely to emerge by consensus once the MGS has been in use for a while. The SGS will "embed" the MGS, but may use some or all of the private area spaces of the MGS to expand the scope of the font. There may even be several SGSs, named according to some naming convention, which will differ from each other in the way the private area spaces are used.

Why do we need the SGS?

The SGS will be the first step toward standardizing the 8-bit Bangla font. The SGS will inherit the benefits of the MGS. In addition it will be more versatile, possibly with specific standards targeted at specific uses of the language.

Who makes the MGS and the SGS?

All standards must be evolved by the users of the language. The MGS and SGS will be developed by Bangla font users from all over the world. Parabaas proposes the Bangla Ring, an open, public consortium, to facilitate this process.

How does all this fit into Unicode?

Unicode is a standard for character maps. For Indic languages, the font map standards as described above (MGS and SGS) have simply not emerged yet. The character maps that have been standardized comprise only elementary characters. Unicode fixes the position of the letter "ka", for instance, but says nothing about where to put the glyph for the compound character "kta" in a Unicode font. Any text can be coded in Unicode by expressing compound characters in terms of elementary ones. Simple rules can be formed (for instance by using "hasant" as a control character) so that the actual Bangla text can be retrieved from the Unicode encoding. However, Unicode does not dictate how the text is to be displayed in terms of glyphs. In practice, a Unicode font for Bangla must use the private use area of the font to house many compound glyphs. Indeed there is not enough space in the area earmarked for Bangla to house all of them. However, there is no standard character map for the compound glyphs one needs to house in the private use area. The MGS and the SGS may help us to make a standard Unicode-MGS and Unicode-SGS for Bangla one day.
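
To make the "kta" example concrete, here is a small Python sketch of how a compound character can be encoded from elementary Unicode characters joined by hasant. The code points are those of the Unicode Bengali block; the helper name encode_conjunct is hypothetical and exists only for this illustration.

```python
import unicodedata

KA = "\u0995"      # BENGALI LETTER KA
HASANT = "\u09CD"  # BENGALI SIGN VIRAMA, i.e. the "hasant"
TA = "\u09A4"      # BENGALI LETTER TA

def encode_conjunct(*letters):
    """Encode a compound character as elementary letters joined by hasant."""
    return HASANT.join(letters)

kta = encode_conjunct(KA, TA)

# List the elementary code points that make up the encoded compound.
for ch in kta:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The encoded string holds three code points and no information about which glyph or glyphs a font should use to draw "kta"; that choice is exactly what the MGS and SGS are meant to standardize.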

Is this compatible with ISCII?

ISCII is the equivalent of ASCII for Indic languages as far as text encoding is concerned. It is also just a character map standard for the elementary characters in languages where the character set has a large intersection with the character set of the Brahmi script. In fact the Unicode standard for Indic languages is derived from ISCII. Again, ISCII is purely concerned with text encoding, not text display. Hence MGS and SGS are not in conflict with either ISCII or Unicode. Instead they try to solve a problem left unsolved hitherto by text encoding schemes.

Do you have a set of proposed rules to define the MGS?

The following problems are encountered while trying to formulate a standard Glyph Set for Bangla.

There is not enough space to justify a distinct glyph for every compound character. The ISO compliant font has exactly 190 spaces available for glyphs. Out of this, 38-40 can be used for punctuation characters, special characters and numeric characters. This leaves us with about 150 spaces to fit the font into.
Compound characters may be displayed by using two or more distinct "reduced" forms of elementary characters. But there are no generally accepted forms for all these characters, and font foundries differ widely in their preferences. For instance one font may have a single glyph for "bdha", while another may use two glyphs to display the same. Even elementary characters may be displayed as combinations of two glyphs. For example in ItxBeng, "ka" and "pha" are combinations of two glyphs.
The "elementary" set of glyphs contains many "reduced" forms which appear in the text only in combinations. An example is the "ra phalaa". Depending on the character the "ra-phalaa" is applied to, its position and shape may differ. Hence some fonts use several different glyphs for the "ra-phalaa". The standard font map must somehow justify their number, shapes and positions. There is no literature to guide us in this exercise.
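
The space budget in the first point can be checked with a few lines of Python. How the figure of 190 is reached is an assumption here, since the text does not spell it out: this sketch counts positions 33-126 and 160-255, leaving out the space at 32 and the DEL position at 127.

```python
# Assumed breakdown of the 190 glyph positions in an ISO compliant
# 8-bit font: 33-126 (space and DEL excluded) plus 160-255.
low_half = len(range(33, 127))     # 94 positions
high_half = len(range(160, 256))   # 96 positions
available = low_half + high_half   # 190 positions

def bangla_slots(reserved_for_punctuation):
    """Positions left for Bangla glyphs after punctuation, specials and numerals."""
    return available - reserved_for_punctuation

# The text reserves 38-40 positions for punctuation and similar characters.
print(bangla_slots(38), bangla_slots(40))  # prints 152 150
```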

To some extent the space restriction actually helps us to zero in on a unique determination of the MGS. Severe constraints must be applied to the glyph set for it to fit within the 150 allowed spaces. A lot of ambiguity is removed if we adopt general conventions for reducing the number of glyphs and adhere to those conventions in all cases. The following is a summary of the basic principles that we advocate using in the determination of the MGS.

Maximize Kerning: There is no need for multiple instances of a glyph simply because the glyph must be displaced horizontally by different amounts in different compound characters. The correct solution is to kern whenever kerning can solve the problem. (The font can allow you to shift a pair of glyphs horizontally, toward or away from each other, when they occur next to each other in a text. This is called kerning.) This principle can reduce the over-usage of spaces for "ra-phalaa" and "u-kaar", for instance.
Minimize Special Ligatures: Some compound characters and special characters like "~ncha" or "shu" are so radically distinct in shape that a space must be devoted to them. However "nda" can be displayed as a combination of a "reduced" "na" and a "reduced" "da". Whenever a reusable set of "reduced" elementary characters can reasonably resolve a compound glyph, dispense with assigning a space for that compound glyph. Which compound characters are really special and which are not is still a subjective question. In case of doubt, MGS will settle the question by voting for the combination display. Spaces saved in this manner can always be re-used for the same special glyphs in the private usage area.
Retain one space for each elementary character: The elementary character "aa" will be regarded by most people as a compound glyph formed from the glyphs of "a" and "aa-kaar". MGS will define a kerning suitable for this purpose, and MGS compliant software should use this compound form of the glyph. Nevertheless, in the font character map a space will be left blank following "a" to house a special rendition of "aa". This blank space is to be regarded as belonging to the private usage space. Similarly "ai" can be formed from "e" and a special glyph (a "shNuR"). The "shNuR" is a desirable glyph in the MGS since it can be re-used for "au". The MGS solution is to keep a blank space for "ai" and "au" and put them in the private usage area. A space must be allotted to the "shNuR" somewhere. The basic idea is that elementary glyphs must have their own spaces arranged alphabetically in the font set. If some of them are clearly compound glyphs and spaces can be gained by displaying them as compounds, we will retain blank spaces for them and count these spaces as part of the private usage area. This will allow an MGS compatible font to be used as a character code for encoding text as in ISCII.
Order alphabetically: The glyph set is to be ordered alphabetically. Special characters are to come last and will be arranged alphabetically according to the first character they are used in making.
Retain Punctuations: As mentioned before, common punctuation marks as appearing in the ASCII charset are to be retained in their usual positions. This will make the use of these fonts in applications supporting a single language or font possible.
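
The "Minimize Special Ligatures" principle can be sketched in Python as a lookup that gives a compound character its own glyph only when it is listed as special, and otherwise builds it from reduced forms. All table contents and glyph names below are placeholders invented for this illustration; they are not the actual MGS assignments.

```python
# Hypothetical tables, for illustration only.
SPECIAL_LIGATURES = {"~ncha", "shu"}        # shapes too distinct to compose
REDUCED_PREFIX = {"na": "na(prefix form)"}  # reduced forms used in front
REDUCED_SUFFIX = {"da": "da(suffix form)"}  # reduced forms used behind

def resolve(compound, parts):
    """Return the list of glyphs used to display a two-part compound."""
    if compound in SPECIAL_LIGATURES:
        return [compound]  # one dedicated slot in the font map
    first, second = parts
    # Otherwise combine reduced forms, falling back to the un-reduced
    # character when no reduced form is defined.
    return [REDUCED_PREFIX.get(first, first), REDUCED_SUFFIX.get(second, second)]

print(resolve("nda", ("na", "da")))
print(resolve("~ncha", ("~na", "cha")))
```

Under these assumed tables "nda" is drawn from two reusable glyphs, while "~ncha" occupies a slot of its own.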

Careful consideration must still be given to individual cases, because the above basic principles may not be enough to resolve all issues. A sample for the MGS is proposed below with arguments for deciding ambiguous cases. This proposal should be regarded as just a starting point. Comments and counter-proposals are invited. It is hoped that a workable version of MGS will emerge in a short time from these discussions.

Which software supports MGS?

The members of Bangla Ring will be committed to using the standards adopted in this forum in their computer applications. Applications developed or commissioned by them will support these standards. Whenever a Web page needs a downloadable shared font, an MGS compliant font will be used. Parabaas is making a multilingual Word Processor freely available. Parabaas Axar will serve both Web and Print publications. It is fully MGS compliant.

How do I make a suggestion for the MGS/SGS?

Bangla Ring is an open public forum to discuss issues relating to the MGS and SGS and to facilitate the evolution of the standard. All comments, suggestions and questions should be mailed to the WebMaster. These will be posted on this page. All issues are to be resolved in the open moderated discussion section of Bangla Ring.

The Sample MGS

(We will update this section as changes are suggested and made.)

Annotation:

Empty Spaces must be empty for ISO-8859 compliance. However, some amount of ISO-8859 violation may still work with today’s browsers, although there will be problems with Java.

Spaces with blue text match Latin 1 characters.
Spaces with Red Cross (X) belong to private use area.
Spaces with Red text other than a Cross also belong to private use area, but are filled with suggestions. These may be considered last for private usage.
Spaces with magenta text actually denote "reduced" suffixes or prefixes which I couldn’t find in the font I used. So I have used compound glyphs that have these suffixes or prefixes. The only exception is r which should be a special glyph with no "shNuR". This glyph can then be used to form both þ and r.
Here is a detailed account of the logic behind the construction:

0-64: Up until char 64, the Latin 1 charset is followed, with some characters colored red – these are characters that may not be vitally important to the Bangla font set. The numerals will of course be in Bangla. This section of the charset is simple to understand and agree to (one hopes).
65-117: "aa", "ai", "au", "~na", "ra", ".Da", ".Dha" and "Ya" are absent from the MGS. These should be defined by proper kerning of a pair of other glyphs. (For example "ai" comes by kerning in char 235 with "e" and ".Da" and "ra" come from "Da" and "ba" by kerning in char 237 and char 236 respectively. "Ya" (or "fa") comes by kerning in char 236 with "ya"). However there is an empty space for each one of the missing characters. Also, some special characters from the ASCII set have been left alone. This section is also simple to understand and we should have a quick agreement one way or another about the absent characters and the special characters from ASCII (which are in blue).
118-167: Apart from the 32 space gap, this interval houses the matras. Two matras each are needed for "u", "U" and "R^i" because they may be applied to characters extending down to the baseline or beyond the baseline. Vertical displacement of this sort cannot be handled by kerning. There are two issues in this section: a) How many "u"s, "U"s or "R^i"s? We have chosen the minimum (2). b) How many "e"s and "ai"s? We have chosen 1 each, preferring to make the other forms (with matra on top) by kerning in char 175 (which can be the top matra in this charset).
168-186: The reduced characters that serve as prefixes. This and the next section are the only ones that need very careful handling. I had identified the following characters as needing a prefix form: ka, ga, ~Na, cha, chha, ja, ta, da, na, pa, ma, la, sha, shha, sa. kha, gha, chha, jha, ~na, Tha, Dha, Na, pha, bha don’t seem to have any ligatures other than a "ra-phalaa" or a "ba-phalaa" which can be applied to the un-reduced character itself. How to write "kophtaa"? AdarshaLipi does not have a prefix for "pha" and also does not have a special glyph for "phta". So I think we can write it with the ordinary "pha". But I may be wrong. Ta and Da have "TTa" and "DDa" which I treated as special. "ha" has "hla" which is usually formed using the un-reduced "ha" (but I need confirmation of this). Initially I had intended to keep 2 forms each for "ga", "na", "pa", "ma", "la", "sha" because they all come with a "dNaRi" in the rear and in ligatures may either shed the "dNaRi" or keep it (think of "shcha" and "shma" or "pTa" and "pla"). In the end I retained 2 forms for only "na". We can probably use the un-reduced "pa" for "pla". Am I missing a needed prefix? Maybe I should include more here just to be safe. Which ones are the best candidates?
187-213: Reduced characters that serve as suffixes. This is the other hard section. kha, ga, gha, ~Na, Ta, Tha, Da, Dha, pha, sha, shha, don’t seem to have special suffix forms. "~na" does occur in "yaach~naa", but ordinary "na" can be used for that purpose. "tta", "tra", "bhra" and "kra" all appear as suffixes too. There are two forms each for "na", "ra", "la" and "ba" which seem to occur very frequently as suffixes under all sorts of characters. Is this completely justified? (The small and nearly invisible characters in positions 208-212 are two "ra"s and two "la"s). Am I missing a needed suffix?
214-238: Special characters. Chars 220 and 221 appear as single quotes in the HTML; this is probably due to the lack of ISO compatibility in AdarshaLipi. They should be "~ncha" and "~nja". The last two dots are the dots for "ra" (and "Ya") and ".Da" (and ".Dha"). The one that seems to be in doubt is "DDa". And the position of the last 8 glyphs can be changed so that they come before the special glyphs (i.e. start at char 214).