Surname Study and AI Part 3: Combining Census Data

Posted by Dr. Mac on Apr 19, 2026 in Census Records, Irish Research, Surname Study

In this series of posts about a surname study, Part 1 described the study, and Part 2 explained how the census data was collected and formatted for use.

Census data provides a backbone for research about a family. For this study, I had collected data from both the federal censuses and the Rhode Island state censuses described previously. The next step was to use AI to combine the census data and then analyze it, building a backbone for each of the families found in the censuses. I uploaded the spreadsheet of collected census data to my ChatGPT Plus chat, along with this prompt:

I would like to begin by giving you a spreadsheet with US Population Censuses and Rhode Island State Censuses between 1850 and 1900. I would like you to take a look at this data and see if it can be combined into family units, and keep the data from different censuses even though it is dissimilar. From this we will have a backbone to put together the different Gilroy families living in Rhode Island during that time so that we can add more data from different sources. Ask me questions about anything that is unclear.

ChatGPT had some clarifying questions, and then it worked with me to create a report with Provisional Family Lines.

ChatGPT enumerated the family units, analyzed the data, and identified four households with a high confidence level:

Timothy and Eliza Gilroy line (TE) [my known direct ancestors]
Lockey/Lackey and Ellen (Bristol rubber line)
Catharine Gilroy as head with sons Peter and James (Newport line)
Philip Gilroy Providence line (NY to RI step-migration pattern)

My own confidence in these results was very high, because I had already identified the same households in my offline analysis. ChatGPT also detected the changed relationships that appeared after my immediate ancestors had both died and the younger children moved in with older, married siblings, and it suggested ways to verify these relationships. (Those records will come in the birth/marriage/death phase.)

We had several conversations throughout the process, breaking the analysis into steps. I provided information about the family that was the main focus, which changed the order in which the data was presented. ChatGPT also gave me insights into how others conduct one-name surname studies and compared our approach favorably with theirs.

In one conversation, ChatGPT explained that it was “crosswalking” through the censuses. It noted that crosswalks are used in longitudinal population studies and in archival metadata. Here, crosswalking was being used to link family units, rather than just individuals, from one census to the next.
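To make the idea of a household-level crosswalk more concrete, here is a small sketch. It is not what ChatGPT actually ran; it is a hypothetical Python example, assuming each household is recorded with a surname, a census year, and members with a given name and reported age. It links a household in one census to the household in a later census that shares the most members, treating two people as the same person when their given names match and their estimated birth years (census year minus reported age) fall within a tolerance. The records, function names, tolerance, and two-member threshold are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str      # given name as enumerated
    age: int       # age as reported on the census
    relation: str  # relationship to head, e.g. "head", "wife", "son"


@dataclass(frozen=True)
class Household:
    census_year: int
    surname: str
    members: tuple[Person, ...]


def birth_year(p: Person, census_year: int) -> int:
    """Estimate a birth year from reported age (census ages are often a year or two off)."""
    return census_year - p.age


def same_person(a: Person, year_a: int, b: Person, year_b: int, tolerance: int) -> bool:
    """Hypothetical rule: same given name and birth-year estimates within the tolerance."""
    return (a.name.strip().lower() == b.name.strip().lower()
            and abs(birth_year(a, year_a) - birth_year(b, year_b)) <= tolerance)


def household_score(h1: Household, h2: Household, tolerance: int = 2) -> int:
    """Count members of h1 who plausibly reappear in h2 (0 if surnames differ)."""
    if h1.surname != h2.surname:
        return 0
    return sum(
        1 for a in h1.members
        if any(same_person(a, h1.census_year, b, h2.census_year, tolerance) for b in h2.members)
    )


def crosswalk(earlier: list[Household], later: list[Household], min_shared: int = 2) -> dict[int, int]:
    """Map each earlier household's index to the later household sharing the most members."""
    links: dict[int, int] = {}
    for i, h1 in enumerate(earlier):
        best_j, best_score = None, 0
        for j, h2 in enumerate(later):
            score = household_score(h1, h2)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= min_shared:
            links[i] = best_j
    return links


# Hypothetical records, not from the Gilroy data.
h1870 = [Household(1870, "Doe", (Person("John", 40, "head"),
                                 Person("Mary", 38, "wife"),
                                 Person("Ann", 10, "daughter")))]
h1880 = [Household(1880, "Doe", (Person("Mary", 49, "head"),
                                 Person("Ann", 20, "daughter"))),
         Household(1880, "Doe", (Person("Peter", 30, "head"),))]

print(crosswalk(h1870, h1880))  # {0: 0}
```

Because the match is made on the household's members rather than on its head alone, the example links the 1870 household to its 1880 counterpart even though a different person heads it, which is the kind of change that happens when parents die and children move in with relatives. Real census data would also need handling for spelling variants such as Lockey/Lackey.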

ChatGPT was working as an assistant.

At the end of this step:

ChatGPT had compiled and organized the data into a report titled Gilroy Surname Study Backbone (Rhode Island, 1850-1900). The report also documented, as external proof controls, the constraints and additional information that had been provided during this phase.

An Excel spreadsheet had been produced to document the four family units, with tabs for the unlinked individuals and the evidence legend.

ChatGPT had built a backbone for the study. We worked together on the contents of a report that captured the families, relationships, and unlinked individuals, along with recommendations for next steps and an appendix of abbreviations.

The next step would occur after I asked ChatGPT:

Would it be good to have this chat in a project?
