Surname Study and AI Part 2: Collecting Census Data

blog banner - Surname Study and AI
Part 2

In the Surname Study and AI Part 1 post, I described the reasons that motivated me to undertake a surname study in Rhode Island, US, and the approach I took. The use of AI tools to help with formatting, visualizing and analyzing data is a goal in this latest iteration of the project.

Both US Population and Rhode Island State Census data were used as a backbone for the study.

My next step was to use AI to capture the transcriptions of key record information from the censuses and to normalize it. To start, I limited my search to census databases, looking for exact and similar spellings of the surname with the location set to exactly Rhode Island, USA. Although I also collected the census images, I used the data presented on each Record page to populate the columns of the spreadsheet.

My search settings were:

Last name: Gilroy; Slider: Exact and similar

Lived in: Rhode Island, USA; Slider: Exact

Focus: United States [this setting was not necessary because I searched for records specific to the United States and Rhode Island]

On the search results page, I used filters to narrow down to one census at a time so that I could collect the data.

Thanks to a great idea I learned from Jon Smith of the North Carolina Genealogical Society, I decided to use Ancestry.com in a Chrome browser with Gemini AI enabled to capture the Record pages.

Gemini in top of Chrome Browser

If you do not see Gemini on the top of Chrome:

First, be sure that you are logged into your Google account. You can do this by logging into your Gmail account in the browser.

Then, try this to enable Gemini in Chrome:

Click the three dots (More), and select Settings from the menu

In Settings, click AI innovations in the left menu, then select Gemini in Chrome.

Chrome Settings to use Gemini
Chrome Preference to open Gemini

To collect the data in the US Census, I signed into HeritageQuest in the Chrome browser. Always check your county library, as HeritageQuest may be free to access from home.

I searched for all the occurrences of the surname in Rhode Island, one census at a time, for the 1850, 1860, 1870, 1880 and 1900 US Censuses. My plan was to collect one line of data for each name that appeared in the search results.

These are example results for the search for exact and similar surnames to Gilroy.

Example HeritageQuest Search Results Page

Example Search Results Page (courtesy HeritageQuest.com)

From the 1860 US Census Search Results Page, I right-clicked on the View button to open each Record in a new tab.

Example HeritageQuest Record Page

Example Record Page (courtesy HeritageQuest.com)

Gemini in top of Chrome Browser

Some of the issues and limitations that I found may be due to the fact that I use a free version of Gemini. I had to work on my prompt to have the data captured in a Comma Separated Values (csv) format, so that I could use the data from the transcription of the record in my Excel spreadsheet. I tried to have Gemini decide what to label the columns, but it worked out better when I told it the names of the columns in the prompt.

NOTE: Later on, Gemini and I decided to format the collected data in Markdown tables. This simplified the process, because the data could be pasted directly into the Excel worksheet.

In the interest of time, I copied all the data from one Record page and asked ChatGPT to extract the data tags, using the prompt:

keep only the data tags such as Name, Age, etc and show them in a comma separated sentence on one line.

That provided me with column names which could then be used in the Gemini prompt. (This was done once for each census.) That way the line for each enumerated person in a worksheet would have the same data in the same columns.

In my type of account (free), Gemini would only look at ten open tabs in the Chrome browser as input to a prompt, so I knew that I would have to collect the data in steps. Gemini wanted to jump right in and give me analysis based on the data in those tabs, and it took some coaxing through prompt refinement to get the data in a form to put into a spreadsheet.

I added tabs using the plus sign until I had selected the Current tab and 9 others to share with Gemini. (When you select more than 10 tabs, a warning appears: “Only 10 tabs can be shared.”)
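Because only ten tabs can be shared at once, it helps to plan the batches before opening any Record pages. The following is a minimal Python sketch of that bookkeeping, not part of my actual workflow: the `batches` helper and the `record-N` labels are hypothetical stand-ins for the Record pages listed in a search result.

```python
from typing import Iterator, List


def batches(items: List[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical labels standing in for 23 Record pages from one search.
record_pages = [f"record-{n}" for n in range(1, 24)]
groups = list(batches(record_pages))

for number, group in enumerate(groups, start=1):
    print(f"Batch {number}: {group[0]} through {group[-1]} ({len(group)} tabs)")
```

Checking off each batch as it is processed makes it easier to confirm that no one in the search results was skipped.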

Select Multiple Tabs as Input to the Gemini Prompt in Chrome Browser


Prompts may need refinement, and in this case Gemini and I chatted back and forth to get the results that I wanted. Gemini warned me that it could not directly create or download an Excel (.xlsx) file for me, but that it could format the data into a standard CSV (Comma Separated Values) format.

For the 1860 US Census, this is a prompt that I used in Gemini in the Chrome browser. This was the result of refinement, and needed to be changed slightly for each census.

For all open census records, extract the data and generate the full CSV text. For each record, transcribe it into a new row of the CSV . Put the CSV text in a canvas so that I can copy it from the prompt. Structure the output so that each record (the main person detailed on the page) is a single row, and list all their household members’ names in a single column titled ‘Other Household Members (Names)’. **Only transcribe data explicitly visible in the current tab’s detail and household sections.**

Here are only the data tags, formatted as a single comma-separated line:

Name, Age, Birth Year, Gender, Race, Birth Place, Home in 1860, Post Office, Dwelling Number, Family Number, Occupation, Real Estate Value, Inferred Spouse, Household Members (Name)

**For any column field where data is not transcribed, insert a blank space to ensure all records have identical column structures.**

The response included this CSV text.

CSV from the Gemini Canvas

I used the copy icon at the top right to capture the CSV text, and pasted it into an open Notepad file. I saved the Notepad file as type “All files” with a file name ending in the extension “.csv”.

Save Notepad file as CSV

Then I opened the CSV file in Excel, and copied and pasted the lines into the Excel worksheet.
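Before pasting a batch into Excel, it can be worth confirming that every row really has the same columns that the prompt asked for. This is a minimal Python sketch of such a check, using only the standard library. The column names are the ones from my 1860 prompt; the `check_csv_text` function and the sample rows are invented for illustration and are not real census data.

```python
import csv
import io

# Column names taken from the 1860 prompt.
EXPECTED_COLUMNS = [
    "Name", "Age", "Birth Year", "Gender", "Race", "Birth Place",
    "Home in 1860", "Post Office", "Dwelling Number", "Family Number",
    "Occupation", "Real Estate Value", "Inferred Spouse",
    "Household Members (Name)",
]


def check_csv_text(csv_text, expected_columns):
    """Parse CSV text and return (data_rows, problems)."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    all_rows = [row for row in reader if row]  # skip empty lines
    if not all_rows:
        return [], ["no rows found"]
    problems = []
    header = [cell.strip() for cell in all_rows[0]]
    if header != expected_columns:
        problems.append("header does not match the expected column names")
    data_rows = []
    for row_number, row in enumerate(all_rows[1:], start=2):
        if len(row) != len(expected_columns):
            problems.append(
                f"row {row_number}: {len(row)} columns, "
                f"expected {len(expected_columns)}"
            )
        data_rows.append([cell.strip() for cell in row])
    return data_rows, problems


# Invented sample text: one complete row and one row that is too short.
sample_lines = [
    ",".join(EXPECTED_COLUMNS),
    'Example Person,30,1830,Male,White,Ireland,"Providence, Rhode Island",'
    'Providence,1,1,Laborer, ,Example Spouse,"Example Spouse; Example Child"',
    "Short Row,25,1835",
]
rows, problems = check_csv_text("\n".join(sample_lines), EXPECTED_COLUMNS)
for problem in problems:
    print(problem)
```

Any row reported here can be fixed by hand, or the tab can be sent back through Gemini, before the data goes into the worksheet.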

It seemed that when Gemini was used in the browser, it did not have a large memory, so I would have to reload the prompt during my next session. (Always save your prompts!) Sometimes Gemini wanted to use older data for the task I was giving, so I needed to modify the prompt to remind it to only work on the set of selected tabs.

Since this version of the Gemini-enabled browser only allowed me to work on 10 tabs at a time, I stepped carefully through the results to be sure that each person with a name that was Gilroy or similar was included.

In an Excel spreadsheet, I pasted the data from the 1860 census into a worksheet and labeled its tab with the year and the type of census: “1860 US Census.”

I repeated these steps for each US Population Census.

The Rhode Island state censuses are available on Ancestry.com, and I repeated the same process for each one.

Engineers do enjoy visualizing data, so using Excel, I created a graph of the number of individuals with the exact surname Gilroy or a similar surname for each type of census. Then I combined the number of individuals from both types of censuses, for all available years. Note: the 1890 US Census and the 1895 RI State Census are unavailable.

graph US Census Results for Gilroys in Rhode Island by Year
graph RI Census Results for Gilroys in Rhode Island by Year
graph Census Results for RI Gilroys by Year Combined
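I made these graphs in Excel, but the same counts could be produced with a few lines of code. Here is a minimal Python sketch of the counting and combining step. The rows are placeholders with made-up labels, not the numbers from my study, and `count_by_year` is a hypothetical helper name.

```python
from collections import Counter


def count_by_year(rows):
    """Count how many people were enumerated in each census year."""
    return Counter(year for year, _name in rows)


# Placeholder rows (census year, person label); not the study's real data.
us_rows = [(1860, "Person A"), (1860, "Person B"), (1870, "Person A")]
ri_rows = [(1865, "Person A"), (1875, "Person C"), (1875, "Person D")]

us_counts = count_by_year(us_rows)
ri_counts = count_by_year(ri_rows)

# Adding two Counters sums the counts for each year.
combined = us_counts + ri_counts

for year in sorted(combined):
    print(year, combined[year])
```

Since the federal and state censuses fall in different years, the combined count simply interleaves the two series on one timeline.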

The story that I know from my hands-on analysis involves people with the Gilroy name arriving and departing Rhode Island through immigration or moving from or to another state in the US. The number of individuals with the same surname varied by marriage, birth and death. Women would either gain the surname through marriage, or lose it when enumerated using their husband’s surname.

Even though I did collect the citations from Ancestry.com, they are not sufficient for publication, and I would have to do more work to create proper citations. There are also limits to the approach I used: the enumerators may not have visited all the people who shared that surname, and different transcription efforts may result in different spellings of the surname.

At the end of this step: I had an Excel spreadsheet, with a worksheet for each census. Each worksheet contained a line for each person who was enumerated in the census as having the exact surname Gilroy or a similar surname that was present in the online databases. Each column in a census worksheet has the same type of data, or was blank, for ease of analysis.

Excel spreadsheet, with a worksheet for each census.

Next, I can use an AI tool to analyze the data in each census, and across censuses. My goal is to identify family groups as well as individuals and track their changes through the years of interest.

Surname Study and AI Part 1: The Approach

blog banner - Blog Post Surname and AI 1

This blog post begins a series of posts exploring an ongoing surname study and my recent use of artificial intelligence (AI) in it. In this post, I will describe the history of getting to this point in my efforts.

Over the course of several years, I have been working on a surname study. My goal was to find out if and how families who lived in Rhode Island from 1850-1900 were connected. Chain migration to the United States from Ireland was entirely likely, and by connecting these family units I could potentially research collateral relatives to learn more about the family unit(s) back in Ireland.

Using what I had learned from researching my direct ancestors, these were the parameters:

  • Surname: Gilroy
  • Place: Rhode Island, US
  • Timeframe: 1850-1900

For this project, I collected both federal and state census data to use as the backbone of the research. Then I built upon the intermediate years using vital records. I faced some challenges when collecting the data. At that time, Rhode Island Censuses and vital records were obtained by mailing requests to an incredibly helpful and knowledgeable staff at the Rhode Island State Archives. Copies of the records were available for modest fees, but you needed to supply details about the record you sought. (Contrast that with the ability to search for everyone with the same or similar name in a record set through a digital database.) At the time that meant that some of the names came from index-only databases as placeholders until copies of the original records could be found. An index of vital records for the state was available on Ancestry, as was a composite of indexed city directories that formed an 1890 US Census substitute.

Another challenge was correlating dissimilar data. Just as every federal census asks different questions, so does every state census. Vital records change what data is recorded over time, too. The data found in city directories is also different from the other records, containing addresses and occupations but lacking explicit family connections.

My main product was an Excel spreadsheet with tabs for the data collected from each record type by year. I worked to reconcile the different data collected from similar record types. From that spreadsheet, I extracted family units, capturing them in PowerPoint to visually show how the family units changed over time. This gave me some insights but was labor intensive. I contemplated my next steps, knowing that analyses of ages, appearances of people with the same surnames in Rhode Island, and child naming patterns, as well as mapping the neighborhoods were among them.

Fast-forward to now, when more records are available online. For example, in addition to the vital record indexes, images of the RI vital record ledgers are now online. The Rhode Island state censuses are also online. And then there is AI to help with formatting, visualizing and analyzing data.  

Some challenges still exist. There were gaps in census coverage, due to the 1890 US Population Census and the 1895 Rhode Island Census no longer being available. The use of other record types will help to fill in the census gaps. A state-specific challenge is the fact that the 1885 Rhode Island Census is available as an alphabetized index of names, requiring family units to be connected using data in the “Family Number” column.
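To illustrate the idea of reconnecting family units from an alphabetized index, here is a minimal Python sketch that groups rows by the “Family Number” column. The names and numbers are invented, and `group_by_family` is a hypothetical helper. It assumes that a family number identifies one household within the set of rows being grouped; if the numbering restarts in each town or district, the place would need to be part of the grouping key as well.

```python
from collections import defaultdict


def group_by_family(rows, key="Family Number"):
    """Collect the names that share the same family number."""
    families = defaultdict(list)
    for row in rows:
        families[row[key].strip()].append(row["Name"])
    return dict(families)


# Invented rows in alphabetical order, as they would appear in an index.
index_rows = [
    {"Name": "Example, Ann", "Family Number": "12"},
    {"Name": "Example, Bridget", "Family Number": "47"},
    {"Name": "Example, James", "Family Number": "12"},
    {"Name": "Example, Mary", "Family Number": "47"},
    {"Name": "Example, Patrick", "Family Number": "12"},
]

families = group_by_family(index_rows)
for family_number, names in sorted(families.items()):
    print(family_number, names)
```

The grouped names are only candidates for a family unit; ages and other columns still need to be reviewed before drawing conclusions.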

The state of AI is constantly changing, but I decided to investigate how AI could help with the collection and analysis of data.

I did try an analysis of the whole spreadsheet in ChatGPT, and I had been able to create family groups and use them to discriminate between some people who had the same name. However, the data was not combined in an efficient manner, and rather than have one large spreadsheet, I decided it would be more understandable to break the data into more manageable pieces, based on the record types. The composite spreadsheet was broken down into different spreadsheets: (1) censuses, (2) births, marriages, and deaths and (3) city directories. I also decided to use AI to help with the data collection process, the analysis and different ways to visualize the data.

At the end of this step: I had a basic plan to redo the data collection and collect additional data that had become available online, and I had developed ideas on how AI could support this study. The next step will be to use only census data and have AI create the backbone of a timeline for the individuals and families.

AI: Meta Prompting

Blog Banner AI Meta Prompting

If you have attended one of my AI presentations, then you know how important it is to develop prompt engineering skills to get the most out of Large Language Models (LLMs). The good news is that we do not always have to create the perfect prompt on our own!   

There is a harsh term used in my field, GIGO, which stands for Garbage In, Garbage Out. When it comes to AIs, this applies to the fact that the LLM response (output) will only be as good as our prompts (input).  

A simple explanation of meta prompting is to have one Large Language Model (LLM) create a prompt for another one. Meta prompting is more involved than that because it builds a prompt with more specific instructions about the steps to take to realize the goal of the prompt. It is as if the LLM is translating what you want to do into LLM language!

The cinematic arts student at my home gave me some insights into his practical use of meta prompting. He was having an issue with an AI that generates video. It was not creating what he was describing, so he turned to ChatGPT to explain his vision and ask for a prompt to use for generating that image. ChatGPT dutifully responded with a prompt that did work with the AI video generator. The message is that when it comes to crafting prompts, we are not on our own.

While working to understand meta prompting, I thought of an example application to try before applying this skill to genealogy. I asked ChatGPT to create a prompt for me that I could use to have a research report generated for me about a topic. I also specified what and how I wanted to investigate the topic, as well as the fact that I wanted sources and in-text citations. Using the power of the AI to recognize patterns, I certainly wanted analysis to be part of generating the data in the report.

Prompt for a prompt to generate a world building prompt

A prompt was created, but ChatGPT had some specific questions that it included in its response about the type of citation I wanted and asked if there were other constraints, such as word count or including quotes. We had a conversation to refine the prompt, starting with a 308-word prompt and concluding with the final response which was a modular, reusable 1122-word prompt.

The prompt began with: “You are an expert in …

The prompt contained sections for FOCUS & SCOPE, RESEARCH & SOURCES, STRUCTURE OF THE REPORT, STYLE & LENGTH and FINAL OUTPUT

ChatGPT’s prompt also included some interesting anti-hallucination guidance: “If there are areas where evidence is limited (for instance, few direct author comments about a particular name), clearly indicate uncertainty and base comments on reasonable inference, not fabrication.”

I decided to use the prompt in ChatGPT, and opened a new chat. I pasted in the prompt, and it responded with a request for clarification:

ChatGPT asks for clarification

It offered me options, providing details, which are omitted for brevity:

  • Option A — Use only 100% verifiable, well-known, widely documented sources
  • Option B — Allow me to cite plausible but harder-to-verify sources
  • Option C — A blended approach

Then it asked me to respond with which option it should use:

ChatGPT asks for which option to use

After the clarification interaction, ChatGPT told me that

ChatGPT advising me of a long reply

It waited for my response before it began to generate the report:

My response to generate the report

The report was reasonable, and described patterns. ChatGPT offered me formats for downloading the report and other products based on the report, an executive summary and PowerPoint presentations. If I want to dig deeper, this report is valuable to me as a starting place.

Of course, the caveats still remain about not using this for school reports (unless the assignment calls for the use of AI) and not submitting it to a client. There can be tell-tale signs of an AI-generated report, as I know from a high school science fair project done by that same cinematic arts student, and documentation out on the web.

So, will you try meta prompting? Let me know how you do.

NCGS Fall Conference 2025

Blog Post Banner NCGS Fall Conference 2025

Recently I had the pleasure of presenting at, and attending, the North Carolina Genealogical Society Fall Conference 2025. The Conference was very well planned and organized at a wonderful venue with great food. As much as I appreciate the reach of virtual presentations to give presentations at many places far from where I am based, it was nice to be with a group of genealogists, learning and chatting.  

At the Conference, I presented sessions about Military Research and Artificial Intelligence (AI). When speaking about military research, I always customize my presentation to include finding military records for the location of the audience. North Carolina has great resources, both in person and online!

NCGS Military Presentation - Cover

With a Ph.D. in Computer Science and Engineering, I am always reaching deep into the technology of AI to learn its inner workings, and to then share an understanding of how it works and how to use it. As a graduate school professor in cybersecurity, and having tested computer code used on military aircraft for years, I also have a perspective about what we should be concerned about and what can go wrong.

Ancestors, AI and Prompt Engineering NCGS - COVER

What was also fantastic about the Conference was that people could attend the lectures virtually. The NCGS members and technical staff streamed the presentations and recorded them for attendees to watch later. I knew everything was working when questions from online viewers came during the lectures and insightful questions via email were waiting when I returned to my hotel.

Even though my research in North Carolina is limited to a few months during WWII at Camp Davis, I did attend J. Mark Lowe’s presentation, “Creating North Carolina Local and Regional Locality Guides.” (Mark’s smile is even bigger in person!) The presentation definitely had information that I will carry forward to the places where I do research. I will never look at detailed maps the same way again.

I attended another terrific presentation about using DNA to solve maternal surnames by Kate Penney Howard. Jon Smith’s workshop about using AI for creating locality guides certainly shifted my mindset from the free form text I have been using, and his tips about using Gemini in Chrome tabs were game changers. Thankfully the presentations were recorded so that I can enjoy Diane L. Richard’s presentation about Researching Your Ancestors as Kids. (Diane and I share an educational experience: Go RPI Engineers!)

The beginning-to-intermediate artificial intelligence presentation I gave on the first morning may have provided a warm-up for Steve Little’s intermediate artificial intelligence presentation. It is always interesting to see how other genealogists are using AI tools, and how its use is gaining acceptance. Promise to keep checking your output and stay sensitive to privacy concerns!

Thank you to everyone who planned and worked on making the 2025 North Carolina Genealogical Society Annual Conference such a great experience, to the audience members who shared their time with me, and all the other instructors and attendees for a rewarding and fun time!

Recent AI Developments

Blog post banner Recent AI Developments

Have you been following the latest in AI?

One thing I always guarantee during my presentations is that AI models will change! There have been changes to ChatGPT’s video generating model, Sora. As a result, I don’t see Sora anymore when I log in to my Plus account on ChatGPT. Now I have to log in separately to use Sora. Part of the change is that Sora 2 is now available! Pro users can use it now, but as a Plus user, it may be a while before I get a chance. You can read about the new video model at: https://openai.com/index/sora-2/

An AI ‘actor’ known as Tilly Norwood has been provoking Hollywood. She is a purely AI-generated character coming from Xicoia, the AI division of Particle6. You can watch her, and a cast of AI-generated characters in a sketch written by ChatGPT: AI Commissioner | Comedy Sketch | Particle6

When exploring the world of copyright and artificial intelligence, you may want to check out the U.S. Copyright Office’s 3-part Report on Copyright and Artificial Intelligence, which can be viewed and downloaded at https://www.copyright.gov/ai/. Purely AI-generated content is not protected by copyright. There has to be a human contribution.

AI copyright infringement lawsuits continue, with the latest one being Warner Bros. Discovery against Midjourney, an AI image generator. You can read about it at:  https://apnews.com/article/warner-bros-midjourney-ai-copyright-lawsuit-dc-studios-b87d80d7b4a4dfdcf0ee149d30830551 This article describes how this AI can output images that violate copyright.

Meanwhile, some lawsuits are drawing to a close. Although a judge stated that Anthropic AI training a model using authors’ material was fair use, the problem was that it used pirated versions of the books for that training. Anthropic agreed to pay $1.5 billion to settle this copyright infringement lawsuit, but the court will need to approve this settlement. If approved, the authors of over 500,000 books will each receive about $3,000. You can read an NPR article about it: “Anthropic settles with authors in first-of-its-kind AI copyright infringement lawsuit” at https://www.npr.org/2025/09/05/nx-s1-5529404/anthropic-settlement-authors-copyright-ai.

NCGS 2025 Fall Conference

NCGS Fall Genealogical Society 2025 Fall Conference ad

Will I see you there?

I am excited to be invited to present in person and online!

On Friday, I will be presenting Ancestors, AI, and Prompt Engineering.

NCGS Fall Genealogical Society 2025 Fall Conference McMahon AI

On Saturday, I will be presenting a Crash Course in Researching Ancestors in the US Military.

NCGS Fall Genealogical Society 2025 Fall Conference McMahon Military Research

There are great speakers, and great talks, Friday and Saturday. There is also an optional Beginner Day on Thursday, featuring four lectures just for beginners!

NCGS Fall Genealogical Society 2025 Fall Conference Beginner Day Ad