Surname Study and AI Part 1: The Approach

This blog post begins a series of posts exploring an ongoing surname study and my recent use of artificial intelligence (AI) in it. In this post, I will describe the history of getting to this point in my efforts.
Over the course of several years, I have been working on a surname study. My goal was to find out if and how families who lived in Rhode Island from 1850-1900 were connected. Chain migration to the United States from Ireland was entirely likely, and by connecting these family units I could potentially research collateral relatives to learn more about the family unit(s) back in Ireland.
Using what I had learned from researching my direct ancestors, these were the parameters:
- Surname: Gilroy
- Place: Rhode Island, US
- Timeframe: 1850-1900
For this project, I collected both federal and census data to use as the backbone of the research. Then I built upon the intermediate years using vital records. I faced some challenges when collecting the data. At that time, Rhode Island Censuses and vital records were obtained by mailing requests to an incredibly helpful and knowledgeable staff at the Rhode Island State Archives. Copies of the records were available for modest fees, but you did require data about the record you sought. (Contrast that with the ability to search for everyone with the same or similar name in a record set through a digital database.) At the time that meant that some of the names came from index-only databases as place holders until copies of the original records could be found. An index of vital records for the state was available on Ancestry, as were a composite of indexed city directories which formed an 1890 US Census substitute.
Another challenge was correlating dissimilar data. Just as every federal census asks different questions, so does every state census. Vital records change what data is recorded over time, too. The data found in city directories is also different from the other records, containing addresses and occupations but lacking explicit family connections.
My main product was an Excel spreadsheet with tabs for the data collected from each record type by year. I worked to reconcile the different data collected from similar record types. From that spreadsheet, I extracted family units, capturing them in PowerPoint to visually show how the family units changed over time. This gave me some insights but was labor intensive. I contemplated my next steps, knowing that analyses of ages, appearances of people with the same surnames in Rhode Island, and child naming patterns, as well as mapping the neighborhoods were among them.
Fast-forward to now, when more records are available online. For example, in addition to the vital record indexes, images of the RI vital record ledgers are now online. The Rhode Island state censuses are also online. And then there is AI to help with formatting, visualizing and analyzing data.
Some challenges still exist. There were gaps in census coverage, due to the 1890 US Population Census and the 1895 Rhode Island Census no longer being available. The use of other record types will help to fill in the census gaps. A state-specific challenge is the fact that the 1885 Rhode Island Census is available as an alphabetized index of names, requiring family units to be connected using data in the “Family Number” column.
The state of AI is constantly changing, but I decided to investigate how AI could help this the collection and analysis of data.
I did try an analysis of the whole spreadsheet in ChatGPT, and I had been able to create family groups and use them to discriminate between some people who had the same name. However, the data was not combined in an efficient manner, and rather than have one large spreadsheet, I decided it would be more understandable to break the data into more manageable pieces, based on the record types. The composite spreadsheet was broken down into different spreadsheets: (1) censuses, (2) births, marriages, and deaths and (3) city directories. I also decided to use AI to help with the data collection process, the analysis and different ways to visualize the data.
At the end of this step: I had a basic plan to redo the data collection, collect additional data that had become available online, and developed ideas on how AI could support this study. The next step will be to use only census date and have AI create the backbone of a timeline for the individuals and families.

