
January 2007, Vol. 6, Issue 1

APPLYING THE FIVE Ws TO THE ONLINE WORLD
Netcraft did not pass a simple, pragmatic website analysis, which is a twist on the journalism maxim commonly known as the Five Ws - Who, What, Where, When, Why, and How. (The Five Ws as a term and practice includes an unnamed H.) Here’s how to apply the Five Ws to any website:

Who: Name the person or people in charge and provide their bios, with photos and an overview, as well as links to downloadable files and/or web pages covering past work experience and client samples. (Some would disagree about the photos, saying that people make rash judgments based on mug shots.) Make it easy for your visitors to know who you are and to contact you or a representative of your business. There’s an oversupply of websites that don’t clearly tell you who runs the business, or that have an anonymous "contact-us" form that does not identify who actually receives it. This is one of the biggest credibility failures of many websites. If you can’t tell people who you are, your readers might think you are hiding something. There’s more to the "who" function of any website. Explaining who you are should also include your version, or someone else’s version, of what gives you the authority to present information that is credible and trustworthy. Netcraft.com failed the vitally important "who" test.

What: Any website should describe the nature of its owner’s business in easy-to-understand terms. The other part of "what" concerns quality of content and design. Is the content well written? Is the website graphically consistent and easy to navigate through? The Internet industry term for this is "usability." In September 2005, the man known as "the king of usability" by Internet Magazine and "the guru of web page usability" by the New York Times, Jakob Nielsen, explained the irony of the web, noting succinctly that, while the primary purpose of the web is to provide information, it is also loaded with "bad content and a lack of information people need - either because it is not provided at all or because it is written in a poor, impenetrable style."2 All you need is one search engine response to a query to realize that this still holds true today.

Where: Something is definitely amiss if a website does not list working e-mail addresses, the real physical address of the business, and working telephone numbers in an easy-to-find spot. Are you located in Silicon Valley, Silicon Alley (the Manhattan version) or Bath, England? Are directions to your place of business provided? Are you hiding?

When: Today, when it comes to information about the Internet, the World Wide Web, and communications and information technology in general, timeliness is vital, as everything changes so quickly. At the same time, we web surfers often fall into the trap of depending too much on the very latest information out there, when, in fact, there’s plenty of valid and important, but older, information available online about any given topic, dating back as far as the web will take you. The important thing is that anything posted on a website should carry some kind of time-stamp, so the reader understands how current the information is and can then judge whether it applies to the task at hand.

Why: What are the motives behind the website owner’s content? Is it clearly spelled out that the purpose of the content is to sell you something? Is the cost clearly noted, or is that information buried someplace at the end of a shopping-cart function? Is the content geared toward only providing information to its visitors in the spirit of sharing, or is there some other, not-so-evident, ulterior motive?

How: How was the information presented on the website discovered or created, and is it consistent with other information from other reliable sources? For example, I could not find any other authority claiming that there were 100 million websites. CNN, however, found it worthwhile to mention this figure like it was an absolute truth, as did hundreds of blogs. One would think that generic, statistical information, such as the total number of websites on the Internet, would have some corroborative numbers elsewhere on the web, but, in this case, I could not find anything.

Finding the Hard to Find
There are numerous websites that fail to meet some of the Five Ws but are still credible and packed with trustworthy, research-based information. Many are in the academic realm, created by educators who typically post their curricula vitae, along with links to their scholarly writings, on unattractive and poorly designed web pages that lack decent metadata.

Metadata, for those who are not familiar with this word, is really not as technical as it sounds. Simply put, it is the identifying words and tags that are put inside a website’s background code. Metadata is utilized by search engines to index and ultimately reveal search results when users conduct queries online. Metadata is not discernible on the web pages we see on our web browsers unless we go to the source view.
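
To make the idea concrete, here is a minimal sketch, in Python, of how a program might read the metadata that sits in a page’s source code but never shows up in the browser window. The sample page, the class name MetaTagCollector and the function name extract_metadata are all made up for illustration; real search engines use far more elaborate methods that are not described here.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects the <title> text and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # Store each named meta tag, e.g. "description" or "keywords".
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page source, invented for this example.
SAMPLE_PAGE = """<html><head>
<title>Faculty Page: Dr. Example</title>
<meta name="description" content="Curriculum vitae and scholarly writings.">
<meta name="keywords" content="information literacy, library science">
</head><body><p>Visible page text.</p></body></html>"""


def extract_metadata(html_text):
    """Return (title, dict of meta tags) found in the given HTML source."""
    parser = MetaTagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.title, parser.meta


if __name__ == "__main__":
    title, meta = extract_metadata(SAMPLE_PAGE)
    print("title:", title)
    for name, content in meta.items():
        print(f"{name}: {content}")
```

A page with no description or keywords tags would simply return an empty dictionary here, which is roughly the situation the poorly tagged academic pages mentioned above are in.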

At a November 2006 Chronicle of Higher Education technology conference, Adam Smith, group business product manager for Google Book Search and Google Scholar, had this to say about metadata: "We love metadata; we’ll take all you can give us, but it is a mess. When you really dig into it a little bit, parsing it and making sense of it for the older material is a disaster. But we are doing our best." Danielle Tiedt, general manager of Windows Live Premium Search, added that "metadata is in a bad state."3

So, without going into extraordinarily technical details, the basic message is that there’s plenty of intelligent life on the web that is not so easy to discover through the major search engines (e.g., Google, Yahoo, MSN and Ask.com in that order of popularity) because of the lack of good metadata.

Accessing the Smart Web Through Proprietary Databases
Another enormous block of authoritative and trustworthy information available online resides within the full text of scholarly journals and various other publications that are accessible only through a paid subscription.

There are two ways to obtain access to the full text of such paid-subscription information: purchase it yourself, or become a member or employee of an institution or company that provides access to it through its library system. Academic libraries, for instance, subscribe to proprietary databases and provide access to numerous paid-subscription publications to their students, faculty and staff. These patrons are authenticated users who are assigned usernames and passwords that let them into the institution’s virtual library.

So, if you don’t have lots of discretionary funds to purchase paid subscriptions, and you are not an authenticated user of some proprietary library system, some of the most authoritative and trustworthy information online is not at your fingertips.

The Drive to Bring More to Your Fingertips
Both Google and Microsoft are developing online services that allow users to more easily gain access to such authoritative and trustworthy content. Google Scholar, for instance, has a Library Links program where partnering academic libraries are able to make their licensed resources available, through link-resolver technology, to those authenticated patrons who prefer to conduct their research through the Google Scholar interface as opposed to going directly to their institutional virtual library system interface.
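
For readers curious about what a link resolver actually receives, the sketch below builds the kind of web address such a system commonly accepts, using the OpenURL format (ANSI/NISO Z39.88-2004). This is an illustration only: the resolver address, the function name build_openurl and the citation are hypothetical, and whether Library Links sends requests in exactly this shape is an assumption, not something confirmed here.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; each library runs its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(citation, base=RESOLVER_BASE):
    """Build an OpenURL-style link that describes one journal article."""
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL version
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # journal metadata format
        "rft.genre": "article",
        "rft.atitle": citation["article_title"],
        "rft.jtitle": citation["journal_title"],
        "rft.date": citation["year"],
        "rft.issn": citation["issn"],
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Made-up citation, used only to show the shape of the link.
    sample = {
        "article_title": "An Example Article",
        "journal_title": "Journal of Examples",
        "year": "2006",
        "issn": "1234-5678",
    }
    print(build_openurl(sample))
```

The library’s resolver, not the search engine, then decides whether the patron’s institution has licensed the full text and where to send the patron to read it.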

Similar to Google Scholar, but not nearly as developed or sophisticated, is Microsoft’s Windows Live Academic, which, according to Microsoft representative Tiedt, is focused on answering online search questions better by analyzing how its users conduct online queries. At the aforementioned Chronicle conference, Tiedt explained that Microsoft was working diligently on developing methods and functions that would bring more authoritative and trusted search results to its users.4

Both of these projects were in their early developmental phase at the time of this writing in early 2007, and it is anyone’s guess when they might be considered out of Beta and acceptably useful to serious researchers.

Overall, if you depend only on search results from Google, Yahoo, MSN, Ask, or pretty much any of the search engines out there to answer your queries, you are not getting the most intelligent results. Most search engines simply give you extremely long lists of results that, for the most part, are a source of confusion and do not put the most authoritative and trustworthy online information available today at the top of their search results.

Search Engine Boom
In a subsequent chapter, more details about search engine technologies are provided. There are many search engine companies in early stages of development that are striving to be alternatives to the big four - Google, Yahoo, Microsoft and Ask. Many of these newcomers’ "main business proposition is to be bought by Google, or for that matter by Yahoo or Microsoft," wrote Miguel Helft in the New York Times. Helft also noted that, according to the National Venture Capital Association, venture capitalists invested about $350 million in 79 start-ups "that had something to do with Internet search" from early 2004 through the end of 2006.5

Other young search engine companies seem to be on a solid pathway for creating a search alternative that will be their own for years to come. One such company is Kosmix, whose co-founder, Anand Rajaraman, explained to me that they are not in business to be bought by Google. Instead, they are in business to provide a more sophisticated alternative to searching than any of the other search engine companies.

It’s interesting to note that Kosmix was founded by two computer science Ph.D.s from Stanford University. The other founder of Kosmix is Venky Harinarayan. Google’s Larry Page and Sergey Brin are both computer science Ph.D. candidates on leave from Stanford, and Yahoo’s co-founders Jerry Yang and David Filo are on leave of absence from Stanford’s electrical engineering Ph.D. program.

Rajaraman and Harinarayan have a strong background in building database technologies, having built online comparison-shopping technologies as co-founders of a company called Junglee, which was acquired by Amazon in 1998 for $250 million. Take a look at their website (Kosmix.com) and you’ll see that their search results are unlike those of the other search engines.

Mass Digitization
Google and Microsoft are also in the mass digitization/eBooks business. Google has "Book Search" and Microsoft has "Windows Live Book Search" - both of which were also in Beta in January 2007. Others, who came into this field well before Google and Microsoft, include the Open Content Alliance, Project Gutenberg, the Million Books project, the University of Virginia Electronic Text Center, and the Internet Archive (not an exhaustive list).

Some of today’s Web pundits say that we have entered a new era in which it is possible to digitize all of the world’s books into a universal, online accessible library. "Might the long-heralded great library of all knowledge really be within our grasp?" asks Kevin Kelly from Wired Magazine in a New York Times Magazine article. His answer is yes, and he provides his proof of concept, in part, by explaining how the digitization process is being accomplished today:

Stanford University is scanning its eight-million-book collection using a state of the art robot from the Swiss Company 4DigitalBooks. This machine, the size of a small SUV, automatically turns the pages of each book as it scans it, at the rate of 1,000 pages per hour. A human operator places a book in a flat carriage, and then pneumatic robot fingers flip the pages - delicately enough to handle rare volumes - under the scanning eyes of digital cameras.6

There’s controversy surrounding the mass digitization world, with questions about copyright infringement and about who gets control of any such universal library. The Associated Press reported that there’s a philosophical debate over whether Google, a commercial entity that is scanning 3,000 books per day, could end up controlling mankind’s accumulated knowledge. The article quoted Brewster Kahle, founder of the Internet Archive, as saying that "they [Google] don’t want the books to appear in anyone else’s search engine but their own, which is a little peculiar for a company that says its mission is to make information universally accessible."

In a subsequent chapter, mass digitization and eBooks are explored in greater depth.



Copyright. All rights reserved. Lorenzo Associates, Inc., P.O. Box 74, Clarence Center, NY 14032.