Friday, April 11, 2014

Aadhar: A Look from First Principles of Technology



My two blog posts [of Mar. 28, 2014 and Apr. 8, 2014] on the legal and constitutional implications of the Aadhar project evoked strong responses - calls, e-mails, comments and the like - from some of my close friends, ex-classmates and ex-colleagues. 

Some said that, by focusing on the legality of the scheme, I was guilty of missing the forest for the trees. Others argued that the benefits of Aadhar were extensive enough to ignore such minor technicalities.

“The end justifies the means, doesn’t it?” they argued.

So the scheme was keenly debated each time. But in each debate / discussion, neither side was convinced by the other’s views, claims or arguments.

And then, I watched on YouTube an interview of Sri. Nandan Nilekani with CNN-IBN’s Sri. Rajdeep Sardesai. In that interview, Sri. Nilekani said, “...once anybody looks at the (Aadhar) scheme from first principles, they will come to the conclusion that this was the right way to do it...”

That was an invitation for me to look at the technological first principles behind Aadhar and to attempt to verify and validate Sri. Nilekani’s claim. Well, he had thrown down the gauntlet, and how could I back off? 

Caveat: While I am not cynical about “unique identification” per se, I am certainly critical of the way UIDAI has gone about the Aadhar implementation. 

Technology Pieces to the Aadhar Puzzle  

As is well-known, the UIDAI paradigm consists of four core operational components: 
  1. Resident Enrollment: Resident data (demographic and biometric) is collected through an application process by Enrollment agencies. Biometric data captured includes all ten finger prints, photograph and both iris scans. The collated resident information is then verified and submitted to the Central Identities Data Repository (CIDR) through designated Registrars. The CIDR runs a de-duplication check to ensure that the resident is not already enrolled.  
  2. Resident Data Storage: The Central Identities Data Repository (CIDR) is at the heart of the UIDAI system. The repository contains demographic and biometric data of all enrolled residents. It stores resident records and issues unique identification numbers based on verification and authentication of resident data.  
  3. Resident Data Update: Updates are periodically effected (based on applications for editing / amending demographic data) to reflect changes in resident data.
  4. Identity Authentication: Registrars will create infrastructure for enabling both online and offline authentication of individual identities based on data contained in the central repository, i.e., the CIDR. (A simplified, purely illustrative sketch of this enrollment-to-authentication flow is given after this list.)
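
To make the four components concrete, here is a minimal sketch of an enroll, de-duplicate and authenticate flow. It is my own illustration: the data structures, the Jaccard-style matcher and the 0.80 threshold are assumptions made purely for exposition, and do not describe UIDAI’s actual interfaces or algorithms.

    # Illustrative sketch only -- not UIDAI's actual design or code.
    import uuid

    DUPLICATE_THRESHOLD = 0.80          # assumed cut-off for "same person"
    cidr = {}                           # aadhaar_number -> resident record

    def match_score(template_a, template_b):
        # Stand-in matcher: Jaccard similarity of two biometric feature sets.
        # A real system would fuse fingerprint and iris matcher scores.
        return len(template_a & template_b) / max(len(template_a | template_b), 1)

    def enroll(demographics, biometrics):
        """Resident Enrollment: de-duplicate against the CIDR, then issue a number."""
        for number, record in cidr.items():
            if match_score(biometrics, record["biometrics"]) >= DUPLICATE_THRESHOLD:
                return None, "rejected: duplicate of " + number
        number = str(uuid.uuid4().int)[:12]          # stand-in for a unique ID number
        cidr[number] = {"demographics": demographics, "biometrics": biometrics}
        return number, "enrolled"

    def authenticate(number, live_biometrics):
        """Identity Authentication: 1:1 match against the stored record."""
        record = cidr.get(number)
        if record is None:
            return False
        return match_score(live_biometrics, record["biometrics"]) >= DUPLICATE_THRESHOLD

    uid, status = enroll({"name": "Resident A"}, {"ridge1", "ridge2", "iris1"})
    print(status, authenticate(uid, {"ridge1", "ridge2", "iris1"}))   # enrolled True

Even this toy version shows why de-duplication is the hard part: every new enrollment must, in effect, be compared against every record already in the gallery, whereas authentication is a simple one-to-one match.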

Resident Enrollment  

Biometric data collection poses major technical challenges. Indeed the success or otherwise of ‘unique identification’ or ‘validation of identity’ of residents depends entirely on the quality, consistency and reliability of biometric data capture and pattern recognition. Hence, let me first examine issues related to biometric data collated by UIDAI.  

A) Fingerprinting 
UIDAI has selected 10 fingerprints, i.e., three slap fingerprint images (4+4+2), as one of the core biometric modalities for unique identification. The quality of the fingerprint data captured is a key factor in the success of ‘unique identification,’ which is accomplished through pattern recognition using algorithms. Fingerprint patterns are aggregates of ridges (arches, loops and whorls), minutiae points, etc. 

UIDAI has standardized on scanners based on optical fingerprint imaging. It is well-known that, as a technology, optical fingerprint imaging has several practical limitations:
  • A scratched or dirty touch surface on the scanner will produce poor-quality fingerprint images.
  • Imaging quality is affected by the condition of the skin on the finger. A dirty or marked finger is difficult to image properly, and most Indians, particularly in rural areas, cannot be expected to have clean fingers.
  • An eroded outer layer of skin (due to aging, hard labor, damaged papillae or otherwise) may wipe out the finger ridges to the point where fingerprints are no longer discernible.
  • In cold temperatures (typically below 15°C, which is common in India, especially during winters), fingers lose moisture. This leads to dry fingertips, which yield poor-quality scans: the scanner registers the print, but it fails during matching.
  • Besides, the BioScan-10 fingerprint scanner from BioEnable Technologies, chosen by UIDAI for enrollments, does not have ‘live finger’ detection (typically done by detecting blood flow in the finger). This means that the equipment can be fooled with fake fingers or images of fingerprints.  
With such practical issues and technological challenges to the quality and reliability of fingerprint image capture, many questions arise. 

To begin with, what is the efficacy of fingerprint pattern recognition, and how effective is identity authentication in the UIDAI system? How reliably will scanners deployed in hot, dusty environments during the enrollment process produce good fingerprint images? What is the average quality level of fingerprint scans across all demographic segments? 
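
Those questions ultimately come down to how match scores behave when image quality degrades. The toy simulation below uses made-up score distributions; the threshold and distribution parameters are my assumptions, chosen only to show how poor captures inflate both false rejects and false accepts, and are not UIDAI figures.

    # Illustrative only: made-up genuine/impostor score distributions showing
    # how a single decision threshold trades false rejects against false accepts.
    import random

    random.seed(42)

    def simulate_rates(threshold, genuine_mean, impostor_mean=0.30,
                       spread=0.10, trials=100_000):
        false_rejects = sum(random.gauss(genuine_mean, spread) < threshold
                            for _ in range(trials))
        false_accepts = sum(random.gauss(impostor_mean, spread) >= threshold
                            for _ in range(trials))
        return false_rejects / trials, false_accepts / trials

    # Poor capture quality pulls genuine scores down and widens the spread of
    # all scores, so the same threshold produces many more errors.
    for quality, g_mean, spread in [("good capture", 0.85, 0.10),
                                    ("poor capture", 0.65, 0.20)]:
        frr, far = simulate_rates(threshold=0.55, genuine_mean=g_mean, spread=spread)
        print(f"{quality}: false reject rate {frr:.2%}, false accept rate {far:.2%}")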


B) Iris Scanning
Iris patterns are no doubt complex, random and unique. They lend themselves to pattern recognition and hence identity authentication. Iris scanning uses camera technology and subtle infrared illumination to acquire images of the detail-rich, intricate structures of the iris. Besides speed of matching and a low probability of false matches, iris scanning offers the advantage of the stability of an internal, protected, yet externally visible organ for identity authentication.

Nevertheless, iris pattern recognition too offers some practical and technological difficulties, such as:
  • Iris scanners are sensitive to lighting levels; hence, accuracy and efficacy can be affected by changes in illumination.  
  • Iris recognition is susceptible to poor image quality and can be tricked using images generated from the digital codes of stored irises.
  • Alcohol consumption degrades recognition, since the pupil dilates or constricts, deforming the iris pattern.
  • Cataract surgery too can change the iris texture, to the point where pattern recognition is no longer feasible.  
Thus, it is apparent that iris recognition too entails uncertainties about effectiveness. 

Database Management System Architecture  

It is further unclear how the UIDAI data repository is designed and implemented. Is the database distributed in an n-tier architecture? Or is it centralized in a monolithic database? If it is the former, then what design precautions have been taken to ensure data integrity and consistency? If it is the latter, what mechanisms are in place to ensure data availability?
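
For illustration only: one standard precaution in a distributed data store is quorum-based replication, where a write is acknowledged only after enough replicas confirm it and a read consults enough replicas to be guaranteed an overlap with every write. The sketch below states the generic rule; it says nothing about the CIDR’s actual architecture, which has not been made public.

    # Generic quorum rule for a replicated store -- purely illustrative.
    # If read_quorum + write_quorum > n_replicas, every read quorum overlaps
    # every write quorum, so reads always see the latest acknowledged write.

    def quorum_is_consistent(n_replicas, write_quorum, read_quorum):
        return write_quorum + read_quorum > n_replicas

    print(quorum_is_consistent(5, 3, 3))   # True:  consistent, tolerates 2 failed replicas
    print(quorum_is_consistent(5, 1, 1))   # False: fast and available, but reads may be stale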

Further, it appears that the impact of network and application architecture on system performance has neither been thoroughly tested nor made public. 

Identity Authentication  

Another area of the Aadhar system that lacks clarity is the robustness and reliability of identity verification. For instance, what is the incidence of false negatives and false positives in the system during identity authentication (which is the primary revenue-generator)? Further, it is not clear if any consistency and reliability tests with regard to identity verification were carried out on the Aadhar application to evaluate its effectiveness.  

Similarly, the impact of network and application architecture on authentication performance does not appear to have been tested.

Aadhar Proof-of-Concept (PoC)

The UIDAI Proof-of-Concept (PoC) was restricted to the resident enrollment process. Further, the Biometric Technology in Aadhar Enrollment report states that the PoC only looked into false positive identification rate (FPIR = 0.057%) and false negative identification rate (FNIR = 0.035%) in the enrollment process. 
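
To put those small-looking percentages in perspective, here is my own back-of-the-envelope extrapolation. It assumes, optimistically, that the PoC rates would hold unchanged as the gallery grows; the enrollment figure used is simply the order of magnitude of India's population.

    # Back-of-envelope extrapolation of the published PoC rates (my arithmetic).
    FPIR = 0.00057                 # false positive identification rate (0.057%)
    FNIR = 0.00035                 # false negative identification rate (0.035%)
    enrollments = 1_200_000_000    # order of magnitude of India's population

    print(f"{FPIR * enrollments:,.0f} residents wrongly flagged as duplicates")   # ~684,000
    print(f"{FNIR * enrollments:,.0f} duplicate enrollments slipping through")    # ~420,000

Even at the PoC rates, a national rollout implies hundreds of thousands of wrongly flagged residents needing manual adjudication, and a comparable number of undetected duplicates.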

Besides, the PoC study was carried out in South India during the summer months, when the impact of many aggravating conditions (such as dry fingertips, ambient light, etc.) would be minimal.

Hence, it is unclear what percentage of false positives and false negatives would have resulted from a study of the end-to-end, "enrollment to de-duplication to repeated identity authentication" process.

Further, the PoC study did not focus on the system’s effectiveness in detecting manipulated biometric submissions (e.g., left hand of one person, right hand of another person and eyes of a third person for scans) for creating fake and fraudulent identities. Were any volume or stress or load tests carried out to determine system robustness and reliability? 

A relevant point to be noted is that the UIDAI Biometrics Standards Committee, in its report titled ‘Biometrics Design Standards for UID Applications’ concluded that two factors raise uncertainty on the extent of accuracy achievable through fingerprints. First, the scaling of database size from fifty million to a billion has not been adequately analyzed. Second, the fingerprint quality, the most important variable for determining accuracy, has not been studied in depth in the Indian context. 

The report goes on to claim that biometric software needs to be tuned to local data. If the software is not tuned, it can generate additional errors in the range of 2 to 3%. As per the report, an unchecked operational process too can increase the false acceptance rate to over 10%. It is not clear how UIDAI went about addressing these issues in the PoC and beyond. 
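
Again for perspective, a quick comparison of the Committee’s warning figures against the PoC rates (my arithmetic, using only the numbers quoted above):

    # How the Standards Committee's warning figures compare with the PoC rate.
    poc_fpir = 0.00057            # PoC false positive identification rate (0.057%)
    untuned_extra_error = 0.02    # lower end of the "2 to 3%" additional error
    unchecked_far = 0.10          # "over 10%" false acceptance rate

    print(f"untuned software: ~{untuned_extra_error / poc_fpir:.0f}x the PoC error rate")  # ~35x
    print(f"unchecked process: 1 in {1 / unchecked_far:.0f} impostor attempts accepted")   # 1 in 10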

Conclusion  

In the overall analysis, the PoC did NOT really look into the end-to-end process of fingerprint data capture, comparison and matching studies over a protracted period of time to truly simulate real-world conditions and to ascertain true error rates of identity authentication. This is contrary to what one would expect for a project with as huge a proposed expenditure outlay as the Aadhar scheme. 

Thus, my evaluation of the UIDAI system from first principles yielded more questions than answers. Indeed, it is unclear whether the cumulative effect of all the system's inadequacies, inefficiencies and inherent weaknesses has been factored into risk assessment.

So then, will Sri. Nandan Nilekani accept that, at a bare minimum, the Aadhar system has been implemented hastily?

Regardless, the questions that will plague people's minds are: 
  1. Did due diligence suffer because it was public money being spent at UIDAI?   
  2. Would Sri. Nilekani have rushed through with the execution of any such project at Infosys?
Well, I guess those questions will never be answered satisfactorily. 

I only hope though that UPA and Sri. Nilekani's haste does not lead to waste on Aadhar!!

Thursday, January 26, 2012

Reviving This Blog; Announcing My First Book

Hello everyone! Three years without a blog post. That is not good!

Well, the fact is that I had taken a long blog-posting sabbatical to devote my creative juices and energies to penning a book, my first. The manuscript of “Gita and the Art of Selling: Memoirs of a Sales Yogi” is all ready; and a contract with the publisher (Leadstart Publishing) is all signed, sealed and delivered.

So I say in an Arnold Schwarzenegger-esque (and The-Terminator-like) vein: “I’m back!”

The Shrimad Bhagavad Gita ["Song of (the) Blessed Lord"], often shortened to simply The Gita, is a sacred Hindu scripture. Authored more than a couple of millennia ago, it is one of the greatest texts in the history of theology, literature & philosophy. It has been described as a lodestar of eternal wisdom that can inspire anyone to accomplishment and enlightenment.

The Gita embodies the teachings of the Divine One, Lord Krishna, a reincarnation of Lord Vishnu, the protector of the universe in the Hindu pantheon of Gods. At the start of the Kurukshetra War in the epic Mahabharata, the warrior prince Arjuna lays down his arms on the battlefield, refusing to fight his own cousins. The Almighty dispels Arjuna’s tumult and turmoil through His discourse, the Bhagavad Gita.

But what is the seemingly tenuous relevance of The Gita to selling? How does a spiritual and philosophical magnum opus relate to business matters? The truths and the tenets espoused in The Gita are amazingly germane to the executive experiencing flap and dither on the battlegrounds of sales. In this book I hope to connect the dots between the precepts of The Gita and the best practices of selling.

Selling is pervasive in modern society. It is an integral part of every walk of our social and economic lives. Professional selling revolves around trade and commerce; yet, the process of preparation for sales is a science and its execution an art. So mastery in the profession requires flair and finesse in the artistic, scientific and commercial elements of the craft.

There are umpteen books on the process, techniques and methodologies of selling. All of them invariably adopt a regimented, structured and scientific approach to the subject matter. Not only do these theoretical publications make for some vapid reading, but they also fail to flesh out the subtleties of sales and the artistry involved in its practice.

In this book, “Gita and the Art of Selling: Memoirs of a Sales Yogi,” I have attempted to use the fine art of story-telling to convey the essence of selling and sales management. It draws inspiration from ancient literary masterpieces, like the Hitopadesha, the Panchatantra, etc., which wed education and entertainment to convey esoteric messages on purity and morality.

The retro storyline of the book is woven around a protagonist, Mahesh Kumar, who joins a nascent outfit, BCL (Bharat Computers Ltd.), and moves up the ranks of its hierarchy. He learns traits essential for sales success and acquires qualities crucial for effective sales management. Mahesh narrates ordeals and occurrences from his stint at the fledgling enterprise that grows rapidly into a behemoth. His career progression is equally impressive.

Thus Mahesh describes the unique sales culture and conventions at BCL, termed 360°-Selling, that make it a shining example of entrepreneurial success in corporate India. He discerns the fascinating links between apposite tenets in the Gita and prudent sales practices.

Mirroring the Gita, this novel has eighteen chapters. Each chapter details a major event or a milestone in Mahesh’s life. All chapters are woven into one seamless narrative thread. The memoirs rely on a rhetorical mode of narration for story-telling, yet use exposition to explain sales theory. The insights expressed in each chapter link to a canon espoused in the Gita.

The novel has been penned as a funnily-serious or seriously-funny (despite the oxymoron) book that reads like a thriller, feels like a soap and serves as a sermon. In some ways, it parallels the mishmash of a typical Bollywood potboiler, with four “-tions” in it: action, emotion and recreation, with education added to the mix.

The book has been scribed with three cornerstones in mind:

1. Style: The body has a narrative style ideal for light reading. The syntactic and semantic presentation is such that the fictional story flows smoothly and reads like a “humorous thriller.” At a superficial nuts and bolts level, I hope the book makes for nice and nuanced reading in English. The plot of this memoir develops like a Sagen, but attempts to achieve the effect of a Märchen. It bridges the traditional chasm that demarcates fiction and non-fiction.

2. Substance: The exposition of sales-related content is non-fictional & non-pedantic. With its 'learn while you laugh' or ‘laugh while you learn’ mode, the novel aids learning. It unravels the processes & practices that are the bedrocks of sales. The building blocks of sales are described using straight-from-the-gut stories. Also exposed is the blueprint for sculpting a big-league sales team.

3. Soul ‘n Spirit: At a sublime, spiritual level, this work serves as a self-help or motivational book, which professionals can use for improving selling skills, personality development, etc. The hope is that the novel will trigger sales success and business achievement for the reader. If the material provides inspiration for taking a leap into entrepreneurship, that is icing on the cake!

My target audience for this literary effort is the well-educated, urbane crowd in the 18-30 years age group. That is a raw, inexperienced group at the dawn of its professional career, which can potentially benefit the most from the wisdom shared: the skills and knowledge I gained during my own stints. I believe the sales insights presented also cater to the needs of the upwardly-mobile, 30-50 years-old demographic.

Will post again when the book hits the stands in all its print and paper glory!

Tuesday, June 24, 2008

Major Reason For The Success of Social Media

In recent times, we have seen the emergence of ‘new’ media sites – aka social or citizen media. Based on the principles of a democratic Web, these sites have witnessed through-the-roof growth. Flickr, YouTube, Digg, etc. are prime examples of such sites that have gained tremendous popularity.

So, what did these companies do that made people flock to these sites? Is the opportunity to express oneself without editorial controls – ‘citizen journalism’ – the reason? Or is the ability to showcase one’s talents – exhibitionism – an adequate explanation for the popularity of these sites? Is there some common denominator amongst the top sites that satisfactorily explains the meteoric growth of social media?

To examine these issues one has to understand how traffic gets generated on the Internet.

Generating Web Traffic
Web visitors arrive at a webpage (thus generating Web traffic) in one of three primary ways:

  1. Typing the webpage address directly into a Web browser. This results in “Direct Traffic.” Generation of a lot of direct traffic is contingent upon a lot of people knowing about the webpage and remembering its address correctly. Further, the webpage may be required to satisfy an existing need that compels the “surfing public” to return to the page repeatedly. Information about the webpage address and its contents has to be disseminated widely amongst Internet users either through promotion (by traditional means or otherwise) or through viral means (word-of-mouth, e-mail, etc.).


  2. Clicking a hyperlink (to the webpage) provided on another webpage the Internet user has visited. Such traffic is termed “Referring Sites Traffic.” If the content on the webpage is useful, it is possible that Internet users may refer to and even hyperlink the webpage (creating an ‘inward link’) in their own Web content. Thus, visitors looking up other Web content may follow the links provided and arrive at the webpage. Such inward links also bump up the importance of the webpage (i.e., its “Page Rank”) and lead to a higher rank in the listing of results for relevant searches.


  3. Clicking a result (i.e., the hyperlink corresponding to the webpage) of a search on a search engine such as Google or Yahoo. This traffic is called “Search Engine Traffic.” As explained earlier, the more inward links to the webpage, the higher the webpage will rank in search results. And the higher a webpage ranks in search results, the more likely the user is to click the link and arrive at the page.


Now, the above explanation holds good for a website, which has multiple pages. So, the traffic of the website is the sum of the traffic generated by each of its individual web pages.
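
As a concrete illustration, web analytics tools typically derive these three buckets from the HTTP referrer of each visit. The sketch below is a toy classifier; the list of search-engine hosts and the function name are my own assumptions, not any particular analytics product’s behavior.

    # Toy classifier: split visits into the three traffic types by referrer.
    from urllib.parse import urlparse

    SEARCH_ENGINE_HOSTS = {"www.google.com", "google.com", "search.yahoo.com"}

    def classify_visit(referrer):
        if not referrer:
            return "Direct Traffic"                # address typed or bookmarked
        host = urlparse(referrer).netloc.lower()
        if host in SEARCH_ENGINE_HOSTS:
            return "Search Engine Traffic"         # arrived via a search result
        return "Referring Sites Traffic"           # followed an inward link

    print(classify_visit(""))                                   # Direct Traffic
    print(classify_visit("http://www.google.com/search?q=x"))   # Search Engine Traffic
    print(classify_visit("http://someblog.example.com/post"))   # Referring Sites Traffic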

Generating Traffic for Social Media
Social media sites are typically content-intensive sites. Apart from lacking “stickiness”, social media sites with very little content are unlikely to generate any appreciable amount of traffic (particularly, referring site and search engine traffic). Besides, most users of these sites initially will be just content consumers (and NOT content contributors).

Further, old media has often provided the seed material (often copyright-protected print and multimedia content) for social media site owners during the initial stages. This has often provided the impetus for Internet surfers to flock these sites. Consequently, it is customary for social media site owners to seed their sites with content (relevant to the site’s theme / concept) initially. Seeding content has proven to be the best bet for attracting initial traffic.

Once the social media site gets seeded, its traffic is likely to increase over a period of time, particularly if the site is actively promoted. Promotion typically means getting inward links into various pages of the site and attracting users through conventional advertising. Aggressively promoted sites thus get more eyeballs, which makes them attractive for online advertising. The social media site owners then capitalize on the opportunity to sell or earmark space on web pages for advertising.

However, it is simply not economically viable for social media sites to sustain traffic growth by seeding content on their sites. Eventually, they must rely on user-contributed content for traffic growth. But what is the motivation for the vast majority of content consumers to go the extra mile and take the time and effort to create and contribute content on social media sites? In the answer to that question lies the secret of the success of many social media sites.

Opportunity for Generating Traffic from Social Media Sites
The Internet, as we all know, is being used for a wide range of business-related activities (including, but not restricted to, online advertising). Clearly, a for-profit website needs to enhance its traffic – the more traffic a site generates, the greater the potential for revenue generation. Thus, owners of business-oriented websites will devote substantial effort to getting eyeballs to their sites.

Individuals seeking to promote themselves (e.g., a speech or presentation; an amateur music composition) have also tended to use social media extensively. Often self-promoters (typically website owners) use “old media” content and copyrighted material to promote themselves (e.g., using an MTV video clip with their website address captioned on it). For owners of for-profit websites, social media offers a no- or low-cost means for site promotion. The absence of editorial control provides an opportunity for creating inward links into their web pages and for generating eyeballs.

Clearly, vested interests drive both ‘self- & site-promoters’ to use social media. Looked at differently, social media sites have proved to be a breeding ground for a new genre of “spamming”. And, being non-disruptive, this form of spamming also has a certain degree of legitimacy (without the ill-will) that conventional (e-mail) spamming never did.

Therefore, it is little wonder that much of the content on social media sites is promotional in nature!

Sunday, March 30, 2008

Does Community Trump Content?

The Online Publishers Association announced in January 2008 that it had added “Community” as a category to its Internet Activity Index (IAI). Since 2003, the IAI has provided monthly reports of time spent with Content, Communications, Commerce and Search; Community is the newest addition.

This is somewhat reflective of the prevalent feeling that “content” is passé and that “community” represents the future of the Web. This thinking may be driven by the astronomical valuations of community sites like Facebook, which have dwarfed the previous valuations (by no means insignificant) of content-centric social media sites.

Pure Community Site
To really examine this issue of “content vs. community”, let us first imagine a website catering to a community brought together by a common interest:
  1. Without ANY content. Assume this hypothetical site is a doorway to a hidden, presumably vibrant private community. In other words, it does not have any content – effectively Web pages – exposed to the external world.
  2. With registered members who can post information regarding their common interest (by, shall we say, filling out a form). Other registered users get periodic emails related to their common interest – in a way similar to a private e-group (such as Yahoo! Groups). In other words, the subscribers to this site do not get to see any content either.

Such a site is unlikely to get indexed effectively by search engines because of the lack of publicly visible content, though it might still rank for some searches if marketing efforts were to focus on the creation of inward links.

Due to aggressive promotion, the Website can derive its traffic organically through searches or through referrals driven by inward links. Traffic can also arrive directly from users who know the Web address of the site. Regardless of the origin, the incoming traffic to the Website is likely to be volatile – pretty much “Get In / Get Out” (GIGO) traffic. The primary reason for this volatility is the lack of content – consequently, retention of traffic is low, as there is neither stickiness nor a compelling reason for the visitor to continue browsing. Thus heavy promotion of the site will be ineffective – visits might go up significantly, but not page views.

The lack of clarity about what the site offers (since there is no content exposed to users) will also result in low new-user registrations. Even the few who do get converted will in all probability soon conclude that the site generates excessive spam and terminate their subscription.

Fundamental Building Block
Thus content is the fundamental building block for any site. It drives stickiness and leads to successful community formation (by compelling users to return). In other words, content is a prerequisite for community, besides being a component of community. After all, as others have pointed out, communities are combinations of content, commerce, communications and search. Others have even declared that “When It Comes to Time Spent Online, Content Trumps Community.”

Wednesday, October 31, 2007

Has Facebook taken Microsoft for a ride?

The Microsoft investment of $240 million to pick up a 1.6% stake in Facebook has attracted significant media attention. The investment obviously puts Facebook’s valuation at a whopping $15 billion. Let us take a look at some cold, hard facts to analyze the merits of the investment.
  1. A Wall Street Journal (Online) report has indicated that Facebook expects to break even in 2007 with revenues of $150 million. The Journal also estimates Facebook’s 2007 earnings at $30 million. That puts the Facebook valuation (of $15 billion) at 500 times estimated earnings. Google, in comparison, is trading at $707 (Oct. 31, 2007) for a market capitalization of $164.66 billion, giving it a value multiple of 49.33 (on earnings of $3.3375 billion) – less than 1/10th the valuation multiple of the Microsoft investment in Facebook. (This arithmetic is reproduced in a short calculation after this list.)
  2. If the same “100X multiple” on revenue were applied to Wal-Mart Stores (2007 revenues of roughly $351 billion), the largest company on the 2007 Fortune 500 list would have a market capitalization of $35.1 trillion – roughly two and a half times the GDP of the US.
  3. Facebook is believed to have more than 50 million users worldwide. Thus, their value works out to $300 per user. It can be estimated that, with its Sept. 2007 global search engine market share of almost 57% (per HitsLink statistics), Google has almost 710 million users worldwide (taking the Sept. 2007 Internet World Stats estimate of Internet users into account). At $300 per user, Google should have a market cap of almost $215 billion, which is about 30% more than its current market cap. And, oh, by the way, that would take Google to a share price of over $900.
  4. Valleywag reports click-through rates on Facebook are astonishingly low at 0.04% (MySpace is at 0.10%). This is probably an indication of the lower disposable incomes of Facebook’s user base. Further, taking comScore Sept. 2007 statistics into account, Facebook generates about 6.0 million ad clicks per month. In comparison, MySpace generates about 45.0 million ad clicks per month. MySpace revenue has been estimated at $525 million for 2007, or roughly $1 per click per year. The corresponding figure for Facebook is $2 per click per year.
  5. According to market research firm Parks Associates, few U.S. consumers are willing to pay a monthly fee to use social networking sites. This online survey of Internet users found 72% of social networking users would stop using a site if required to pay a $2 monthly fee. Likewise, nearly 40% would stop if a site contained too many advertisements. Clearly, Microsoft must have seen value in Facebook’s potential to generate ad revenues and NOT subscription revenues.
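
For the record, the quick calculation behind points 1 and 3 above (all figures as reported at the time):

    # Back-of-envelope reproduction of the valuation arithmetic above.
    facebook_valuation = 240e6 / 0.016                 # $240M for a 1.6% stake -> $15B
    fb_earnings_multiple = facebook_valuation / 30e6   # ~500x estimated 2007 earnings
    google_earnings_multiple = 164.66e9 / 3.3375e9     # ~49.3x

    value_per_fb_user = facebook_valuation / 50e6      # ~$300 per user
    google_at_fb_rate = 710e6 * value_per_fb_user      # ~$213B implied market cap

    print(f"Facebook valuation: ${facebook_valuation / 1e9:.1f}B")
    print(f"Earnings multiples: Facebook {fb_earnings_multiple:.0f}x vs Google {google_earnings_multiple:.1f}x")
    print(f"Google at Facebook's per-user value: ${google_at_fb_rate / 1e9:.0f}B")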

Facebook’s monthly burn rate must be in the vicinity of $15 million. If it has indeed broken even, the Microsoft investment is just insurance money - something that reassures Facebook about its future. Thus, it is hard to see why Facebook is supposedly thinking of raising additional capital from hedge funds. Do we know all that we need to know about what is going on within Facebook? Why would anyone value Facebook so high?

The Microsoft investment is clearly a bet by Microsoft – a “leap of faith” if you will. Microsoft is banking on Facebook eventually becoming a better destination for online advertisers. Microsoft clearly thinks that, unlike Google, Facebook knows a lot about its users: their profiles, hobbies, interests, activities and so on. This helps advertisers run targeted campaigns more effectively. Google, on the contrary, does not know this kind of information. Thus, Microsoft is hoping that Facebook will help it become a serious player in the growing market of "social advertising".