We are pleased to announce the publication of a massive new collection of 982 million names, extracted from our U.S. and Canadian historical newspaper collections.
Historical newspapers are among the most important sources of genealogical information because they are so rich in detail. They can add color and personality to the dry facts that other genealogical sources, such as census records, typically provide.
About the collection
The collection is an index of names that were extracted from existing free-text U.S. and Canadian newspaper collections on MyHeritage. The free text in these collections was generated from the scanned images of newspapers using Optical Character Recognition (OCR) technology, which converts images into text.
The new Newspaper Name Index does not replace the free-text newspaper collections; it is added on top of them as a separate collection. What’s more, this name index covers only half of our newspapers. The index for the other half is currently being generated and will be published soon, adding nearly one billion more records.
Records in the index include a person’s name, a snippet of text mentioning them in the newspaper, and the newspaper’s publication title, date, and place of publication. Each record includes a scanned image of the original newspaper article. Some records will also include additional searchable information such as the name of a spouse and the place of residence based on the information extracted by the machine learning algorithms. Year range and place coverage in this collection vary greatly.
The new Newspaper Name Index will make it much easier for you to locate exciting details about your ancestors that you may have missed in prior searches. With the addition of this huge collection, there are now 15.1 billion historical records on MyHeritage.
Why we created the Newspaper Name Index
Although the same content already existed in our newspaper collections, it was previously in free-text format, which meant that search capability was more limited. If you were looking for an ancestor with the first name William, a free-text search would not have found newspaper articles in which your ancestor was mentioned as Bill or Willie, and it would have returned irrelevant articles about people with the surname William. Following a smart extraction process, which we implemented using machine learning, the new name index is a structured collection that fully supports synonyms in searches and differentiates between first and last names. The name index even includes relationships between people, and addresses, whenever these could be extracted.
For example, a newspaper article mentioning “William and Roberta Miller” contributes to the structured index records for both William Miller and Roberta Miller, who are assumed to be spouses, and can be matched automatically to family trees using MyHeritage’s formidable Record Matching technology. Previously, even if you searched for “William Miller”, you could have missed this mention because the names “William” and “Miller” are not adjacent in the article, which results in a lower ranking in a free-text search.
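To make the difference concrete, here is a minimal Python sketch of a structured name record compared with a plain phrase search. It is only an illustration and not MyHeritage's implementation: the index described above was built with machine learning, whereas this sketch uses a simple regular expression and a tiny hand-written nickname table. All names in it (`NameRecord`, `NICKNAMES`, `extract_records`, `search`, `free_text_search`) are hypothetical. The phrase search is also a simplification, since a real free-text engine would more likely rank the article lower than miss it entirely.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical nickname table; a real synonym dictionary would be far larger.
NICKNAMES = {"bill": "william", "willie": "william", "will": "william"}


@dataclass
class NameRecord:
    first: str
    last: str
    spouse: Optional[str] = None  # assumed spouse, when one could be inferred


def canonical(first_name: str) -> str:
    """Map a first name or nickname to one canonical lowercase form."""
    key = first_name.lower()
    return NICKNAMES.get(key, key)


# Matches the pattern "First and First Last", e.g. "William and Roberta Miller".
COUPLE = re.compile(r"\b([A-Z][a-z]+) and ([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def extract_records(text: str) -> List[NameRecord]:
    """Create one record per person; the pair is assumed to be spouses."""
    records = []
    for first_a, first_b, last in COUPLE.findall(text):
        records.append(NameRecord(first_a, last, spouse=f"{first_b} {last}"))
        records.append(NameRecord(first_b, last, spouse=f"{first_a} {last}"))
    return records


def search(records: List[NameRecord], first: str, last: str) -> List[NameRecord]:
    """Structured search: synonyms apply to first names, surnames match separately."""
    return [
        r for r in records
        if canonical(r.first) == canonical(first) and r.last.lower() == last.lower()
    ]


def free_text_search(text: str, phrase: str) -> bool:
    """Naive phrase search: the words must appear side by side."""
    return phrase.lower() in text.lower()


article = "William and Roberta Miller hosted a dinner on Sunday."
records = extract_records(article)

print(free_text_search(article, "William Miller"))  # phrase is not contiguous
print(search(records, "Bill", "Miller"))            # nickname resolves to William
```

In this sketch, searching the structured records for "Bill Miller" finds William Miller together with his assumed spouse, while the phrase search for "William Miller" fails because the two words are separated in the article. Keeping first and last names in separate fields is also what prevents a first-name search for William from returning people whose surname is William.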
The Newspaper Name Index employs Global Name Translation, MyHeritage’s unique technology that automatically translates names between languages. This means that searching for a name in another alphabet, such as Hebrew or Cyrillic, will return search results from newspapers in English. MyHeritage pioneered Global Name Translation Technology to help users overcome language barriers and locate records that mention their ancestors in different languages (as well as in variations of a name in each language). Learn more about MyHeritage’s Global Name Translation Technology in this recent post.
The Newspaper Name Index contains a record about music legend Johnny Cash. The record is based on short descriptions of upcoming TV programs found in the Sarasota Herald-Tribune from April 6, 1978. Johnny Cash’s new play was set to air on TV, so the newspaper featured a short description of the play. In the free-text version of the newspaper collection, you would see only the snippet of text relating to Johnny’s name. The Newspaper Name Index, in contrast, includes Johnny’s name as well as the name of his wife, June Cash.
Also in the collection is a record about renowned architect Frank Lloyd Wright. The article is about an upcoming realtor conference where Wright will be one of the main speakers. The article also references Wright’s residence in Spring Green, Wisconsin, where his family estate was located. The Newspaper Name Index extracts Frank Lloyd Wright’s name as well as his address. If you were searching for Frank Lloyd Wright in the free-text version of the newspaper collections, you would see only the snippet related to Frank’s name and not his address.
Newspaper collections are an incredible genealogical resource as they contain rich detail, with formats that genealogists find very useful such as obituaries, wedding announcements, and birth notices. Society pages and stories of local interest contain information on activities and events in the community and often provide details about the people involved. The new name index enhances MyHeritage’s American and Canadian newspapers and opens the door to finding details about relatives that have eluded you in the past when searching the free-text version of these collections. It is our hope that with this new index, you’ll be able to more easily find family treasures in the newspapers on MyHeritage.
Searching the collections on MyHeritage is free. To view these records or to save records to your family tree, you’ll need a Data or Complete subscription. If you have a family tree on MyHeritage, our Record Matching technology will notify you automatically if records from the name index and the free-text newspaper collections match your relatives.
Enjoy the new collection!
Source: MyHeritage