Surname Study and AI Part 3: Combining Census Data


In this series of posts about a surname study, Part 1 described the study and Part 2 included how census data was collected and formatted for use.

Census data definitely provides a backbone for research about a family. In this case, I had collected census data from both the federal censuses and the Rhode Island state censuses that were described previously. The next step was to use AI to combine the census data, then analyze it to create that backbone. I wanted ChatGPT to build backbones for the multiple families in the censuses. I uploaded the spreadsheet with the collected census data into my ChatGPT Plus chat, along with the prompt:

I would like to begin by giving you a spreadsheet with US Population Censuses and Rhode Island State Censuses between 1850 and 1900. I would like you to take a look at this data and see if it can be combined into family units, and keep the data from different censuses even though it is dissimilar. From this we will have a backbone to put together the different Gilroy families living in Rhode Island during that time so that we can add more data from different sources. Ask me questions about anything that is unclear.

ChatGPT did have some clarifying questions, then it proceeded to work with me to create a report with Provisional Family Lines.

ChatGPT enumerated the family units, analyzed the data, and identified four households with a high confidence level.

  • Timothy and Eliza Gilroy line (TE) [my known direct ancestors]
  • Lockey/Lackey and Ellen (Bristol rubber line)
  • Catharine Gilroy as head with sons Peter and James (Newport line)
  • Philip Gilroy Providence line (NY to RI step-migration pattern)

My confidence was very high as I had already identified these by my offline analysis. Different relationships had been detected after my immediate ancestors had both died and the younger children moved in with older, married siblings. Suggestions were made about how to verify these relationships. (Those records are coming in the birth/marriage/death phase.)

We had several conversations throughout the process, breaking the analysis into steps. I gave information about the family that was the main focus, which changed the order of data presentation. ChatGPT gave me insights into how others did one-name surname studies and compared our approach favorably to theirs.

In one conversation ChatGPT explained how it was “crosswalking” through the census. It explained that crosswalks are used in longitudinal population studies and archival metadata. Here, crosswalking was being used to link family units, rather than just individuals, between censuses.
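To make the idea of crosswalking family units more concrete, here is a minimal sketch in Python. It is not the method ChatGPT actually used, which I cannot see. It assumes each household has been reduced to a set of given names, and it links households between two censuses by how much those sets overlap. The household IDs, the sample rows (including "mary") and the 0.5 threshold are all made up for illustration.

```python
def overlap(a, b):
    """Share of names two households have in common (Jaccard similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def crosswalk(earlier, later, threshold=0.5):
    """Link each earlier household to its best-matching later household.

    earlier / later: dict of household ID -> set of lowercase given names.
    Returns (earlier_id, later_id, score) for matches at or above threshold.
    """
    links = []
    for hh_id, members in earlier.items():
        best_id, best_score = None, 0.0
        for other_id, other_members in later.items():
            score = overlap(members, other_members)
            if score > best_score:
                best_id, best_score = other_id, score
        if best_id is not None and best_score >= threshold:
            links.append((hh_id, best_id, round(best_score, 2)))
    return links


# Illustrative, made-up households (not real census rows).
census_a = {
    "A1": {"timothy", "eliza"},
    "A2": {"catharine", "peter", "james"},
}
census_b = {
    "B1": {"catharine", "peter", "james"},
    "B2": {"timothy", "eliza", "mary"},  # "mary" is a hypothetical new child
}

links = crosswalk(census_a, census_b)
for link in links:
    print(link)
```

A real crosswalk would also need to weigh ages, birthplaces and spelling variants, which is where the conversation with ChatGPT (and my own offline analysis) came in.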

ChatGPT was working as an assistant.

At the end of this step:

ChatGPT had compiled and organized data into a report: Gilroy Surname Study Backbone (Rhode Island, 1850-1900). The report also documented the constraints and additional information that had been provided during this phase as external proof controls.

An Excel spreadsheet had been produced to document the four family units, with tabs for the unlinked individuals and the evidence legend.

ChatGPT had built a backbone for the study. We worked together on the contents of a report that captured families, relationships, and unlinked individuals, along with recommendations for next steps and an appendix of abbreviations.

The next step would occur after I asked ChatGPT:

Would it be good to have this chat in a project?

You can search the 1950 US Census!


Searching the 1950 US Census will be an awkward and cumbersome search until every field is indexed. But you can give it a try.

Be sure to navigate to the search page:  https://1950census.archives.gov/search

The 1950 US Census NARA Search Page



The search fields are limited to name, state, county and enumeration district. You do not have to enter a search term in every field. For example, you can leave the county or the enumeration district blank.

1950 US Census Search Inputs


If you cannot see the population schedule sheet for the search result on the right, click on “Population Schedule” to see the actual census sheet.

Of course I had better luck in small towns with families having unique names. Just as in any census, try to search for unusual family names. I have even had some success searching boroughs of New York City.

How about a quick hands-on exercise to find a name on the census? I have a simple example using my favorite poet, Ogden Nash.


Name: Frederick Ogden Nash
State: Maryland

1950 US Census Search fields for Ogden Nash


The first result on the right-hand side listed Odgen Nash (rather than Frederick Ogden Nash) and showed him with his wife Frances and his children Linell and Isabel. Note: this result came up without entering a county or enumeration district.

Ogden Nash search result in 1950 US Census

On the bottom of each search result is the “Machine Learning (AI) Extracted Names” section that can help by showing you the names that appear on the same census sheet. The AI-generated indexing was surprising to me because it does try to offer alternate spellings of names.
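If you keep your own list of the names extracted from a sheet, you can do a rough version of this alternate-spelling matching yourself. The sketch below uses Python's standard `difflib` module. It is only an illustration of fuzzy name matching, not a description of how the NARA index works, and the 0.8 cutoff is my own assumption.

```python
import difflib

# Names as they appeared in the search result described above.
names_on_sheet = ["Odgen Nash", "Frances Nash", "Linell Nash", "Isabel Nash"]

# The name we are actually looking for.
target = "Ogden Nash"

# Return up to 3 names that are at least 80% similar to the target,
# best match first. Lower the cutoff to cast a wider net.
matches = difflib.get_close_matches(target, names_on_sheet, n=3, cutoff=0.8)
print(matches)
```

The transposed letters in the indexed name are close enough to be picked up, while the other family members on the sheet fall below the cutoff.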

Odgen Nash and family in 1950 US Census

To download the sheet, click on the three dots that appear under “Help Us To Transcribe Names” to see the option to download the sheet.

Option to download

Only the first entry is expanded. If your family member is in one of the other entries, click on “Population Schedule” to see the actual page of the census.

Multiple results (unexpanded)

And the population schedule for that result will expand. (Only one population schedule sheet will appear in the results on the right at one time.)

Expanded Population Schedule

I have posted a short video on our YouTube channel with the example search in action at: https://youtu.be/rLgq2nqNmbA

Let me know how you do.

This blog post is copyright ©2022 by Margaret M. McMahon, Teaching & Training Co., LLC. All rights reserved. No part of this post may be reproduced in any manner whatsoever without written permission, except in the case of brief quotations in articles and reviews. All copyrights and trademarks mentioned herein are the possession of their respective owners and the author makes no claims of ownership by mention of the products that contain these marks.

It’s Been Confidential for 72 Years: The 1950 Census


There’s been great information published about the upcoming release of the 1950 US Census. I have been collecting it and want to share with you a reference of helpful resources, along with activities that you can do to prepare for the release!

Important date: 1 April 2022

What is going to happen

The 1950 US Census will be released, 72 years after it was taken.

The National Archives and Records Administration (NARA), Amazon Web Services (AWS) and Artificial Intelligence will give us an initial index including names and locations on the day of its release. The AWS artificial intelligence/optical character recognition (AI/OCR) Textract tool is being used to create that initial index. The index will be available for the P1 Population Schedule and the P8 Indian Reservation Schedule. Since the index will probably not be perfect at first, the National Archives asks us to submit name updates to the index using a transcription tool that will be available on the 1950 Census website.

Interesting facts about the 1950 Census

5% of those responding were asked additional questions, including those about where the person lived a year ago, education, employment, marital status, military service (for males) and the country of their parents’ births.

An exciting fact about this census was that it was the first time Americans abroad were enumerated. In practice, Americans in the armed forces, US government employees and vessel crews were counted more reliably than others living abroad. Family and neighbors might report others living abroad.

It would also be the last time that enumerators went around to large multifamily dwellings. In future, the blank forms would be mailed.

What’s different from past census releases

Last release: 2 April 2012. We had to wait an extra day because 1 April fell on a Sunday!
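The date arithmetic is simple enough to sketch in a few lines of Python. This is a simplification based only on the two releases mentioned here: it assumes a 1 April census day, adds 72 years, and moves a Sunday release to the following Monday. Earlier censuses used different census days, so treat the function as an illustration rather than an official rule.

```python
from datetime import date, timedelta


def release_date(census_year):
    """Estimate the public release date for a census taken on 1 April.

    Assumption: release is 72 years after a 1 April census day, pushed to
    Monday if that day falls on a Sunday (as happened in 2012).
    """
    released = date(census_year + 72, 4, 1)
    if released.weekday() == 6:  # Monday is 0, Sunday is 6
        released += timedelta(days=1)
    return released


print(release_date(1940))  # 1940 Census
print(release_date(1950))  # 1950 Census
```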

The 1940 US Census was made available to us unindexed. Digital images are great, but without an index you had to identify a set of images to look at, then look at each image to see if your family member was on it. The process involved people figuring out the census enumeration district in which their ancestor lived, then going through the pages for that district page-by-page and line-by-line. Simultaneous with the release, volunteers and genealogical record companies began creating indexes, transcribing the census line-by-line and page-by-page. More than 163,000 volunteers were organized by FamilySearch and managed to create an index for the more than 3.8 million images in a lightning four months. (If you have not been part of a FamilySearch indexing project, please consider it. It is an amazing thing to do. Two indexers transcribe data, and a third arbitrates any differences between the two transcriptions.) This time, on the day of release we have an initial index of names and locations, which will be a good starting place.

This year, for the first time, those who have over 165 terabytes of available computer storage can download the whole census dataset in bulk.

What you can do now

1. Bookmark NARA’s 1950 Census Records webpage. That is where the link to the dedicated website will be posted.

2. View the Questions Asked on the 1950 Census and also view samples of all the Census Forms in the 1950 Census Dataset.

Census Forms in the 1950 Census Dataset

You will probably want to start with: Form P1 – Census of Population and Housing (front). The back of the page with housing information was not microfilmed, and only aggregate data exists.

3. Watch the videos at the National Archives Genealogy Series: 1950 Census and download the handouts. Previous presentations have been recorded for later viewing.

National Archives Genealogy Series: 1950 Census

4. Gather blank questionnaires and fill them in for the censuses taken during your lifetime. Imagine how glad you would have been if your ancestors had done this for you! Head over to the US Census Bureau to learn more about the censuses and download census forms at the Decennial Census of Population and Housing by Decades.

Decennial Census of Population and Housing by Decades

On each page for the decennial census, there will be a link to download that Decennial Census Questionnaire & Instructions. From that page you can download a sample questionnaire. Or you can go directly to the 1950 Census page where you can download blank forms and view the index of questions.

Census Bureau 1950 Census Page

On the Through the Decades webpage you can find a link to download “Measuring America: The Decennial Censuses From 1790 to 2000”, which includes information and questionnaires from the 1790 through the 2000 US Census in PDF format.

Through the Decades download page

On 1 April 2022

Go to NARA’s 1950 Census Records webpage, where there will be a link to the dedicated website.

Many thanks to all those at NARA who worked tirelessly throughout the pandemic to bring this data to us on time.

Genealogy and the 2020 U.S. Census

You have probably received, or are about to receive, your invitation to complete the 2020 U.S. Census online.

One thing I always recommend at census time is saving a paper (and electronic!) copy of the census after you fill it out. Since the censuses are closed for 72 years, how great would it be for researchers to have copies of our censuses for those years?

I’ve seen a lot of comments about how disappointing it is that you cannot print out all the responses when you are done completing the online forms.

With that in mind, here are two solutions:

1) Take screenshots as you fill out the forms on your computer. You can save them as images, or just cut-and-paste them into a word processing document.

– OR –

2) A better choice is probably to download and print a pdf file of the 2020 Census. Then you can fill it in and have all the answers together in one place. Of course, feel free to scan it and have it both on paper and electronically!

The 2020 Census Form can be downloaded here.

If you missed saving your previous census forms, you can find blank forms and instructions to enumerators here.

You can select the census year to locate links to blank forms. For 2000, you might want to reconstruct the long version of the form.  

The US Census Bureau website hosts a wealth of information and data, so explore it if you have a chance. Educational material about the 2020 Census can be found here.