
How Machine Learning and OCR Are Changing Family History

If this article caught your eye, you probably have an interest in indexing or in online historical records. Maybe you’ve made indexing a part of your weekly or monthly volunteer efforts. If so, keep up the amazing work! You’re making it possible for people around the world to discover their ancestors and learn more about their family histories.

Still, our indexing volunteers have a colossal task in front of them. The world has billions and billions of records waiting to be indexed. Although we have hundreds of thousands of people willing to help out, we’re still outnumbered, and it is clear that our volunteers will need help.

Piles of handwritten documents, hard for computers to read.

Enter optical character recognition—also called OCR, or computer-assisted indexing. Either name works—the more important thing is that the technology works. Thanks to OCR, we’re improving the quality of indexing, increasing the number of indexed records, and accelerating the speed at which historical records become available to the people who visit our website.

The result is more information for people to search and more documents to explore—in short, more opportunities to make that discovery about your family that connects you to your past.

What Is Optical Character Recognition (OCR)?

In simple terms, optical character recognition is a computer reading an image and trying to extract the information—names, dates, places, events, and other text—that it finds there. As you might expect, the computer can do this very fast—much faster than a person. In light of the many, many historical records needing to be indexed—now and in the future—optical character recognition is more than convenient. It’s miraculous.

A computer-created index of a Spanish record, as shown on FamilySearch.org.
Handwritten historical record in Spanish, showing information for Manuel M Bacalao and Matilde Avez.

The Special Case of Historical Records

Using OCR on records sounds great! You might ask, why haven’t we been using OCR to index every record out there? The problem is that a computer isn’t as precise as a human being or as good at figuring out conundrums. An unusual style of handwriting or a slight change in the structure of a printed form can throw the computer a real curve ball. The computer’s interpretation of an image is usually accurate enough to make the information available to our search engines. However, for the information to be really useful—and findable—we still need a human being to quickly review it and fix any mistakes.

How Indexers and OCR Can Work Together

Today, FamilySearch needs your help with indexing more than ever. As OCR technology develops, how you help with indexing may change slightly. Instead of indexing a record from scratch, you may review a record that the computer indexed, making sure that the information is correct and fixing any errors you encounter. At FamilySearch, indexed records have always been reviewed for accuracy, so this task is essentially what reviewers undertake when they review a batch of records that has been indexed by another volunteer.
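
For readers curious about what that review step might look like behind the scenes, here is a minimal sketch in Python. It is only an illustration, not FamilySearch's actual system: the `IndexedRecord` class, the `review` function, the field names, and the misread name in the example are all hypothetical. The idea is simply that the computer's reading is kept, a reviewer supplies fixes, and each change is noted so that nothing is lost.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedRecord:
    """One record as indexed by the computer (hypothetical structure)."""
    record_id: str
    fields: dict[str, str]                      # e.g. {"name": "...", "event": "..."}
    reviewed: bool = False
    # Maps a field name to (computer's value, reviewer's value).
    corrections: dict[str, tuple[str, str]] = field(default_factory=dict)


def review(record: IndexedRecord, fixes: dict[str, str]) -> IndexedRecord:
    """Apply a reviewer's fixes and mark the record as reviewed."""
    for name, new_value in fixes.items():
        old_value = record.fields.get(name, "")
        if new_value != old_value:
            # Remember what the computer read before overwriting it.
            record.corrections[name] = (old_value, new_value)
            record.fields[name] = new_value
    record.reviewed = True
    return record


# Made-up example: the computer misread the last letter of a surname.
record = IndexedRecord("example-1", {"name": "Manuel M Bacalac", "event": "marriage"})
review(record, {"name": "Manuel M Bacalao"})
print(record.fields["name"], record.reviewed)
```

Fields the reviewer leaves alone are kept exactly as the computer indexed them, which matches the idea of a quick review rather than indexing from scratch.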

Two women looking at web indexing interface on FamilySearch.org.

FamilySearch and Computer-Assisted Indexing

So far, FamilySearch has employed optical character recognition to index a whopping 64 million historical records. The project in question involves a collection of Spanish-language records, namely christenings, marriages, burials, and other church documents. When the project is complete, nearly 900 million records will have been indexed by computer, and all of them will need to be reviewed by an actual person.


Want to help with indexing records? Find an indexing project here.

Once you have experience indexing, you can also become an indexing reviewer.


Take Advantage of All These OCR-Indexed Records

Nine hundred million records. Almost a billion. And this number comes from only one project. If you’re wondering what you should do as a result of all this indexing, the answer is simple: take advantage of it. Continue searching for your ancestors and building your family tree on FamilySearch.org. If you can’t find what you’re looking for, don’t give up! Come back in a few weeks or months, and try again. With computer-assisted indexing, more information is coming.

And remember, the more dates and places you add about ancestors, the more record hints we can send you. With 900 million new records to draw on, you can be sure that we will have a lot more hints to send out.

Don’t miss out! The FamilySearch computer-assisted indexing team will be live on Facebook this Wednesday, October 28, at 4:00 p.m. Learn more about OCR-indexed records and ask your questions in the interactive chat.

Source: FamilySearch
