I've been very actively involved in online genealogy since 1985. I've seen how the practice of genealogy research has radically evolved in a relatively short period of time. At the same time, I've seen how the rapid changes spurred by the widespread adoption of the Internet have made the pursuit of family history more challenging than ever. It has become almost effortless to locate vast amounts of information. Yet, a tremendous effort is required to determine if any of this information is truly relevant to the research in question.
The issue is compounded further by a lack of consistency in the online world. On just about every genealogy conference agenda you'll find a session covering the importance of keeping accurate source records for the information you capture and add to your family tree. And yet, there are no efforts to ensure that the people and companies (i.e. data providers) providing the information online follow comparable standards in accurately reflecting the source of their works.
Ultimately, there are three basic types of genealogical information: original records, secondary documents that transcribe information directly from original records, and documents produced without ever seeing the originals (e.g. a compilation from secondary documents). The further away from an original record you get, the less likely the information has been captured accurately. As a genealogist traversing the World Wide Web, therefore, how do you know which data to trust and which to disregard?
Complicating matters is the fact that in the process of publishing more information, data providers often give their work an alternate title and/or neglect to acknowledge the source they used. Keep in mind, as well, that some data providers publish subsets of larger works, making knowledge of the source even more critical. Without these details, how are we to know if we're looking at something unique or simply another view of the same information we already have captured on our tree?
To emphasize this point, let me give a quick example (of a situation that I frequently encounter). Passenger lists are an excellent resource for genealogists. Passenger manifests are the original record: the book into which the names of every passenger were handwritten (or later typed). These passenger manifests were often microfilmed, making two possible originals for researchers to access. Over the years, many data providers transcribed information directly from the manifests or the microfilm, producing a variety of books, databases and sometimes even newspaper articles (let's call these once-removed items). Prior to the Internet, society newsletters and journals became a mainstay of intermediate information distribution, and I've seen passenger lists in these publications derived from books or newspapers (twice-removed items). Along comes the Internet, and well-intentioned people who stumble across these society journals decide to transcribe the information onto a web page (three times removed), sometimes without clearly stating where they found the data.
So, with this example we've got four different views of the same information, but as you get further and further removed from the original, the likelihood of typographical and interpretational errors increases. And since the final version of the information is posted on a web page for free, it often becomes the first version that many researchers will locate.
While I cannot solve the problem of information degradation, with the Live Roots project I hope to level the playing field somewhat by orienting the results displayed in searches around the original sources involved. This will address two issues: guiding researchers toward the most accurate information, and helping them manage their genealogical budgets better by highlighting where the same information may be available for less (or no) cost.
I'm realistic, however, and realize that there's no way to launch a web site with these lofty objectives addressed from the outset. It will take months (possibly even years) to carefully review each item to determine the source. And in some cases it will require the cooperation of the data provider to answer my inquiry for source details when they aren't clearly listed on their site. To stay focused on this objective, I plan to post the percentage of fully sourced items, in addition to the total number of items captured, on the home page.
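For readers who think in code, here is a rough sketch of how source-oriented results and the "fully sourced" percentage might be modeled. It is only an illustration under my own assumptions, not a description of how Live Roots is built: the `Item` fields, the function names, and the sample catalog (titles, providers, costs) are all hypothetical. An item counts as fully sourced here only if its chain of "derived from" links can be followed all the way back to an original record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    title: str
    provider: str                        # hypothetical data provider name
    cost: float                          # 0.0 means free to access
    derived_from: Optional[str] = None   # title of the item this one was copied from
    is_original: bool = False            # True for a manifest or its microfilm


def trace(item, catalog):
    """Follow derived_from links back to an original record.

    Returns (original_title, steps_removed), or (None, None) when the
    chain is missing a link, points outside the catalog, or loops.
    """
    steps = 0
    seen = set()
    current = item
    while not current.is_original:
        parent = current.derived_from
        if parent is None or parent not in catalog or current.title in seen:
            return None, None
        seen.add(current.title)
        current = catalog[parent]
        steps += 1
    return current.title, steps


def group_by_original(catalog):
    """Group every traceable item under its original, closest and cheapest first."""
    groups = {}
    for item in catalog.values():
        root, steps = trace(item, catalog)
        if root is not None:
            groups.setdefault(root, []).append((steps, item.cost, item.title))
    for views in groups.values():
        views.sort()  # sorts by steps removed, then by cost, then by title
    return groups


def percent_fully_sourced(catalog):
    """Share of items whose source chain reaches an original record."""
    if not catalog:
        return 0.0
    sourced = sum(1 for item in catalog.values() if trace(item, catalog)[0] is not None)
    return 100.0 * sourced / len(catalog)


# Hypothetical sample data mirroring the passenger list example.
items = [
    Item("Ship manifest", "Archive", 15.0, is_original=True),
    Item("Passenger index book", "Publisher", 25.0, derived_from="Ship manifest"),
    Item("Society journal list", "Society", 10.0, derived_from="Passenger index book"),
    Item("Web page transcription", "Volunteer", 0.0, derived_from="Society journal list"),
    Item("Untitled passenger list", "Unknown site", 0.0),  # no source stated
]
catalog = {item.title: item for item in items}

groups = group_by_original(catalog)
coverage = percent_fully_sourced(catalog)
```

With this sample data, the four traceable items land under the one manifest, ordered from the original to the three-times-removed web page, while the untitled list stays out of the grouping until its source is confirmed.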