The Urdu Internet and Making it Pretty

Published: 03.14.08 / 7pm | Category: Internet, Pakistan, Urdu | Author: Nash

Six years ago, in 2002, we barely had a million cell phone subscribers in Pakistan. Then came deregulation and a number of foreign players. Today we have a whopping 70+ million cell phone users, which is more than 50% population penetration.

Now the government and private industry are trying the same thing again, this time with broadband and internet access. The Internet Service Providers Association of Pakistan (ISPAK) estimates we have 2-3 million internet users (March 07 estimates). With WiMax rolling out in Pakistan and heavy investment in internet services, the number of internet users is expected to grow over the next 5 years just as explosively as the number of cell phone users did (10 million broadband users by 2010, according to one estimate).

So the question is: with all that internet access, where will all the Pakistanis go? Alexa is a service that estimates the traffic received by various websites, and it lets you segment the traffic by country. Here are the top 10 websites by traffic for Pakistan according to Alexa:

  1. Google.com.pk (Search)
  2. Yahoo.com (Search)
  3. Live.com (Search)
  4. Google.com (Search)
  5. Orkut.com (Socializing)
  6. Facebook.com (Socializing)
  7. Youtube.com (Video content)
  8. Msn.com (Socializing)
  9. Rapidshare.com (Warez downloads)
  10. Wikipedia.org (Encyclopedia)

What does this tell us? Search is the #1 thing Pakistanis like doing, #2 is socializing, #3 is video, #4 is downloading, #5 is education. Now that’s all great, but you may notice that most of these websites require the surfers to know English. If we’re going to achieve higher penetration, we need to focus more on localization. And what do I mean by localization? I mean using local languages, slang and geographically specific stuff. This entry is about Urdu and how you can use it well.

Using Urdu

Of all the people in Pakistan who can read, far more can read Urdu than can read English. Here is the breakdown of languages in Pakistan according to the CIA Factbook:

Language Share
Punjabi 48%
Sindhi 12%
Siraiki 10%
Pashtu 8%
Urdu 8%
Balochi 3%
Hindko 2%
Brahui 1%
English, Burushaski and others 8%

Other than Urdu and English, all the languages listed above are specific to certain provinces and are not widely spread in Pakistan. Urdu and English however have a general presence all over the country. So, it makes sense for websites targeting Pakistan to incorporate Urdu and other domestic languages. But just how hard is that?

It is possible, and there are three ways to do it:

  • The instant way
  • The hard (wrong) way
  • The right (ideal) way

The Instant Way

Recommended for companies that have English content but no Urdu content yet, and would like to reach the Urdu audience.

Around two years ago, an extremely talented and hardworking batchmate of mine from GIKI, Zeeshan Ahmed, started a project for English-to-Urdu translation. Around four months ago he launched his website, PakTranslations.com. The idea is simple: you can browse the web through the website and his service will translate pages to Urdu for you on the fly. One simple way of doing this is to prefix any URL with “PakTranslations.com?”. Here’s an example using the Pakistan entry on Wikipedia:

http://Paktranslations.com?en.wikipedia.org/wiki/Pakistan
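
The prefix trick is easy to script. Here is a minimal Python sketch of the URL pattern shown above. The function name `paktranslations_url` is made up for this example, and dropping the leading `http://` is an assumption based on the sample URL, not documented behavior of the service.

```python
PREFIX = "http://Paktranslations.com?"  # pattern copied from the example URL above


def paktranslations_url(page_url: str) -> str:
    """Build a translated-page URL by prefixing the target address.

    Hypothetical helper: it drops a leading http:// or https:// because the
    sample URL above carries no scheme after the question mark.
    """
    for scheme in ("http://", "https://"):
        if page_url.startswith(scheme):
            page_url = page_url[len(scheme):]
            break
    return PREFIX + page_url


print(paktranslations_url("http://en.wikipedia.org/wiki/Pakistan"))
```

Run as-is, this prints the same address as the Wikipedia example above.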

The translation service is always improving; Zeeshan’s goal is to keep incorporating feedback into the service for better accuracy (human aided learning, if you will). Eventually he wants to focus on other local languages like Sindhi and so on.

The Hard (Wrong) Way

When the internet was young, everyone wanted to start sharing stuff right away. The technology, especially internationalization and localization support such as Unicode, was not mature enough. There were powerful Urdu editing tools at the time, most notably Urdu InPage from Concept Software. The problem with InPage was that it didn’t work in Unicode, and the best way to get Urdu text from InPage into other programs was to export it as an image. The process is very painful, enough to make any intelligent man scream, but that was the only way we knew how to make an Urdu website. Unfortunately, a lot of people still use this technique. There are a lot of websites that have spent thousands of hours generating images of Urdu. So why is it such a bad idea?

  • You cannot copy, select or manipulate the text
  • The layout is fixed, the designers cannot do anything about the text
  • Search engines do not understand what is written and completely ignore the images.
  • Images instead of text make your content very hard to discover via blogs (which like to quote text), search traffic etc.
  • People who have trouble reading the images cannot change the contrast, size and other factors which they can with text
  • Using images limits what technology can do with your text. For example, there cannot be a screen-reader which can read for the visually impaired.

So, unless you’re in the 1990s, please stop using this technique. Please spare your employee Ashraf from generating images all day and put him to more meaningful work.

The Right (Ideal) Way

First the bad news: the ideal way doesn’t exist yet. But it’s becoming more and more possible every day.

In this day and age (circa 2008), every browser supports Unicode and can display Urdu script. Unfortunately, the default fonts render Urdu in a hideously ugly way. Here are a few samples:

Times New Roman
Arial
Courier New
Georgia
Trebuchet
Verdana

Note: All screenshots were taken on Firefox 3.0 beta 4

But despite the look, a number of these fonts are MUCH better than using images. Because the internet was invented by Europeans and Americans, it is indeed optimized for western languages. Till very recently, having Unicode domain names was not even possible. Almost every system comes with the above mentioned fonts. But there is no good-looking, nay, beautiful font for Urdu.

So, if you have decided to use Urdu text, what do you do if you want your text to appear just as good as you see in local Urdu newspapers and books?

Install The Font on the Client System

This is the technique adopted by BBC Urdu, Deutsche Welle Urdu and others. Both have done a very decent job of taking the right technical direction with their websites. BBC Urdu in particular has capitalized on Urdu with its technical investment in the Urdu surfer’s experience. This has certainly paid off; BBC Urdu is the 13th most visited site from Pakistan and the #1 news site for Pakistan (as of Friday, 14th March).

This approach is pretty decent, but most of these websites focus only on Windows or, worse, only on Internet Explorer, which now has a shrinking 55% market share. Another problem with this approach is that not every user is comfortable downloading and installing executable files (EXE, DMG, BIN etc.) from your relatively unknown website.

Another approach some sites use, such as Urdu News, is to just give you the font along with instructions for installing it on your system. These instructions are typically only for Windows and are hard for the uninitiated user to follow.

So how do we make the end user experience better? No font installation and with pretty fonts that make you want to love the website?

There are two ways here again:

CSS @font-face Module

In CSS 2.0, there was a pretty neat feature that allowed you to define any font you needed in the CSS and place the font file on your web server. The browser did the rest. No installation and no messing about. Unfortunately, this was removed in CSS 2.1 because not a lot of browsers supported it. Except one: Microsoft’s Internet Explorer 4! Its implementation was not based on standards, but it worked. To this day, all IE versions (4 and up, through IE 8 beta 1 at the time of writing) offer limited font embedding through .eot files.

@font-face came back again in the CSS 3.0 Font Module. The embedded font technique is now supported in the following browsers:

IE, Opera and Safari (WebKit)

Who doesn’t yet? Firefox. It’s on their brainstorming list and there has been a long-running ticket on @font-face support since 2001, but no progress yet. Consider the market share of the browsers:

2008        IE (5, 6, 7)    Firefox    Safari    Opera
February    54.4%           37.6%      2.0%      1.4%

from <http://www.w3schools.com/browsers/browsers_stats.asp>

Firefox is a significant target audience and one you cannot ignore. There is a workaround by Kashif Hisam from Pakistan Data Management Services. He has written a Firefox 2.0.0.x plug-in that supports EOT files much like IE does. However, it only works on Windows and does not currently work on the Firefox 3 betas. You can visit his PDMS WEFT plug-in page here.

So, you can use a combination of checks in your CSS files. For IE, you can use EOT; for Safari and Opera, you can use @font-face; and for Firefox 2.0, you can use the PDMS WEFT plug-in to support EOT. It’s not the most efficient solution and it is limiting, but until all browsers support the CSS 3.0 @font-face module, there is not much that we can do.
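
To make the combination concrete, here is a rough Python sketch that writes out the kind of CSS this implies. Everything specific in it is an assumption for illustration: the helper name `font_face_css`, the font file paths, and the idea of declaring the family twice (an .eot rule first, then a TrueType rule that uses `local()` and `format()`, which is expected to be skipped by IE and picked up by browsers implementing the CSS 3 syntax). Treat it as a starting point to test in your target browsers, not a guaranteed recipe.

```python
def font_face_css(family: str, eot_url: str, ttf_url: str) -> str:
    """Return CSS text that declares one font family twice.

    The first rule points at the .eot file, the format IE understands.
    The second rule points at a TrueType file for browsers that implement
    the CSS 3 @font-face syntax; local() lets a browser use an installed
    copy of the font if there is one. The body rule falls back to
    Times New Roman when no embedded font could be loaded.
    """
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{eot_url}");\n'
        "}\n"
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: local("{family}"), url("{ttf_url}") format("truetype");\n'
        "}\n"
        "body {\n"
        f'  font-family: "{family}", "Times New Roman", serif;\n'
        "}\n"
    )


# Hypothetical file locations on your own web server.
print(font_face_css("Nafees Web Naskh",
                    "fonts/NafeesWebNaskh.eot",
                    "fonts/NafeesWebNaskh.ttf"))
```

The fallback in the last rule matters most for Firefox users without the plug-in: they still get real, selectable Unicode text, just in a plainer font.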

Or is there?

Possible Workaround: sIFR or Scalable Inman Flash Replacement

There is a decent workaround written by hardworking folks like Mike Davidson and Mark Wubben.

According to Mike, sIFR is…

“…a method to insert rich typography into web pages without sacrificing accessibility, search engine friendliness, or markup semantics. The method, dubbed sIFR (or Scalable Inman Flash Replacement), [it] is the result of many hundreds of hours of designing, scripting, testing, and debugging…”

Mike has an excellent writeup on how to use sIFR on your websites here. I recommend it to anyone looking for a workaround.

Basically, sIFR uses Adobe Flash technology to embed fonts in your website. It is designed in such a way that if the browser does not have Flash or has JavaScript disabled, it falls back to plain-text mode. Also, it keeps the plain text underneath the Flash, which means search engines and screen readers can still read this text and make it discoverable.
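
The fallback rule can be spelled out as a tiny truth table. This Python sketch only models the decision as I understand it from Mike’s description; the function name `render_mode` and its two flags are made up for the illustration and are not part of sIFR itself.

```python
def render_mode(has_flash: bool, javascript_enabled: bool) -> str:
    """Pick how a heading is shown under a sIFR-style replacement.

    The plain text is always in the page. Flash is layered on top only
    when both Flash and JavaScript are available; otherwise the reader
    simply sees the plain text in a system font.
    """
    if has_flash and javascript_enabled:
        return "flash"
    return "plain-text"


# Walk through all four combinations.
for flash in (True, False):
    for js in (True, False):
        print(f"flash={flash} javascript={js} -> {render_mode(flash, js)}")
```

Only one of the four combinations gets the fancy typography; the other three degrade to ordinary text, which is exactly why the technique stays accessible.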

You can find a wealth of further information on sIFR on their website here.

But be warned: Flash is known to be notoriously bad at dealing with right-to-left languages. I have not played with sIFR yet, so toy with it at your own risk!

Bonus: Urdu Fonts

There are a number of Urdu fonts out on the web. Some are commercial, some are free to use. Other than the default fonts that come with your system, almost all of these fonts are badly made :( Maybe someday someone will come along and write a great open source, free font. If you know of any, please let me know. Here is a list of Urdu fonts that I thought to be worthy. (I was not able to run the PDMS font on Fx; does anyone have any idea?)

Arial
Nafees Nastaleeq (CRULP)
Nafees Pakistani Naskh v2.01 (CRULP)
Nafees Riqa v1.0 (CRULP)
Nafees Web Naskh (CRULP)
Nafees Naskh v2.01 (CRULP)
Aleem Urdu Unicode (Used by Deutsche Welle)
Urdu Naskh Asiatype (Used by BBC)

The problem with most of these fonts:

  • Badly dimensioned. All the fonts you see above are sized at 18 pixels. However, almost no font is 18 pixels tall. This creates significant layout problems for designers working in ems. Some fonts are outrageously disproportionate, like Nafees Nastaleeq from CRULP, which can be as much as 2.5 times the given height. Update: a little more elaboration on this point, in response to Faheem’s comment. I set font-size to 16px and then generated samples of the following fonts:
    Arial
    Nafees Nastaleeq (CRULP)
    Nafees Pakistani Naskh v2.01 (CRULP)
    Nafees Riqa v1.0 (CRULP)
    Nafees Web Naskh (CRULP)
    Urdu Naskh Asiatype (Used by BBC)

    The grid in the background is 16px (the little boxes are 8px). In theory, the text should not ‘bleed’ out of the 16px height. If it does, this will create layout problems. Typographers use the font height (called the ‘em’) as the basis of layout. For example, image sizes are defined as multiples of the line-height, the line-height is defined as a multiple of the font height, and so on. Arial, Nafees Riqa and Nafees Web Naskh are pretty disciplined compared to the remaining three.

  • Bad numeric support. Most of these fonts have bad numeric digit rendering.
  • Small size legibility. Most of these fonts become unreadable at 12 pixels and thus you have to resort to using large (16px+) sizes.
  • Bad or missing Roman rendering. Some of these fonts do not display English terms inline with Urdu (a common requirement).
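
To put numbers on the dimension problem in the first bullet, here is a small Python sketch of the em arithmetic. The function names and the 1.5 line-height multiple are my own choices for illustration; only the 16px size and the 2.5 times worst case come from the text above.

```python
def overflow_px(font_size_px: float, rendered_height_ratio: float) -> float:
    """Pixels by which the glyphs spill out of the em box.

    rendered_height_ratio is the rendered text height divided by the
    declared font size (1.0 means the text fits the em exactly).
    """
    return max(0.0, font_size_px * (rendered_height_ratio - 1.0))


def line_height_px(font_size_px: float, multiple: float) -> float:
    """Line height expressed as a multiple of the font size (the em)."""
    return font_size_px * multiple


# Worst case quoted above for Nafees Nastaleeq at font-size 16px.
print(overflow_px(16, 2.5))
# An example line-height of 1.5em; the multiple is an arbitrary choice.
print(line_height_px(16, 1.5))
```

At 16px, a font that renders at 2.5 times the given height is 40px tall and spills 24px beyond its em box. That is as much as an entire 1.5em line, so any layout computed from the em ends up with overlapping or clipped text.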

Based upon these details, my personal favorites are Nafees Riqa and Urdu Naskh Asiatype. If there is a font that I have missed out of the list, please let me know.

In Conclusion

So in short:

  1. If you’re targeting Pakistan, Urdu has to be in your strategy
  2. Don’t use images for text, use Unicode. Times New Roman is a good commonly available choice.
  3. Until @font-face gets implemented across all browsers (read: Firefox), consider using a combination of EOT (IE), @font-face (Opera & Safari) and fallback or EOT plugin (Firefox).
  4. sIFR is a possible fix, but it is short-term and requires JavaScript as well as Flash.

Updates: Added details on buggy font heights, added conclusion, fixed the timelines for PakTranslations.com at Zeeshan’s input

Muhammed Nasrullah is the CEO of ByteSense (Pvt.) Limited. We make custom technology solutions and are software specialists.

