Surname Study and AI Part 5: Adding City Directories


In this series of posts about a surname study, Part 1 described the study, Part 2 included how census data was collected and formatted for use, and Part 3 described how to combine and analyze the census data. Part 4 showed how to create a project as part of a surname study (or any task you are doing). This part discusses collecting data from city directories and adding it to the surname study.

When I first approached this project without AI, I had gathered many records and compiled them in spreadsheets, planning to do analysis. Since I am a visual person, I had also compiled the data I collected in PowerPoint slides with graphics. That meant that I already had the collected data in a spreadsheet.

Capturing the city directory data for the four cities with directories that listed Gilroys from 1850-1900 took some thought. By their nature, the tables in the spreadsheet were not completely populated; I only had data for the years for which I could find directories, and could only fill in the table when a person was listed in the directory. There had been movement of Gilroys between cities, too.

Since the data collection had been a while ago, I had a chance to revisit Ancestry.com to see if there had been additional city directories added to the database. Spoiler alert: there had. I also used ChatGPT’s services to collect sources for the actual city directories that appear online. This helped, and in some cases I used data from House Directory and Family Address Books.

While ChatGPT and I worked on defining and refining the product, it became important for me to redesign the spreadsheet of city directory data in a different and more uniform pattern. I was careful to separate out people with the same name, treating them as different individuals until there was confirmation that their records could be combined as belonging to the same person. I experimented with filling in the blank cells of the spreadsheet with a dash, but in the end, leaving the empty cells blank worked better.

The final spreadsheet was all in one worksheet, but had four separate sections for the cities in Rhode Island where I had located Gilroys: Newport, Providence, Westerly and Bristol. There were pairs of columns for each entry, holding occupation and address. Above those headers, I merged the cells to enter the year. Above the year was a brief title for the source, with the page numbers. As you can imagine, the spreadsheet had many columns.

For me, it was important to know that ChatGPT was using all the data in the spreadsheet, so I had it create a listing based on each entry (name, year, occupation, address). I then verified that all the individuals that I had input were being seen by ChatGPT. A complication arose. Even though I uploaded the spreadsheet with the revised city directory entries into the Project, ChatGPT told me that it could not access the spreadsheet. It suggested that I post a screenshot, or the cells, to accompany my questions. So I did.

At the end of ironing out the directions for this task, based on the output, I asked ChatGPT to provide a prompt that would have created a list with all the collected data for me. In the end, I added two columns to this integrity checking spreadsheet: a number to correspond with the individual in the row of the spreadsheet, and a number referencing the source of the data. We also decided it was best for ChatGPT to take in one city at a time, then have me verify that it had the entries before doing the analysis.
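The integrity-check listing described above (one line per name, year, occupation and address, plus a person number and a source number) can also be produced outside of ChatGPT. The following is only a minimal sketch, assuming the wide city section has already been read into Python lists; the function name, the `Entry` fields and the sample values are all made up for illustration and are not the actual spreadsheet contents.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    person_no: int   # row number of the individual within the city section
    source_no: int   # position of the directory (year column) in the source list
    city: str
    name: str
    year: int
    occupation: str
    address: str


def flatten_city(city, years, sources, rows):
    """Turn one wide city section into one Entry per filled year.

    years   -- directory years, left to right
    sources -- source title for each year (same length as years)
    rows    -- list of (name, cells); cells holds one (occupation, address)
               pair per year, with ("", "") where the person is not listed
    """
    if len(years) != len(sources):
        raise ValueError("each year needs exactly one source")
    entries = []
    for person_no, (name, cells) in enumerate(rows, start=1):
        if len(cells) != len(years):
            raise ValueError(f"row {person_no}: expected {len(years)} cells")
        for source_no, (year, cell) in enumerate(zip(years, cells), start=1):
            occupation, address = cell
            if not occupation and not address:
                continue  # blank cells stay blank: no entry for that year
            entries.append(
                Entry(person_no, source_no, city, name, year, occupation, address)
            )
    return entries


# Placeholder data only (not real directory entries).
years = [1856, 1858]
sources = ["Directory A, p. 10", "Directory B, p. 12"]
rows = [
    ("Person One", [("laborer", "1 Example St"), ("", "")]),
    ("Person Two", [("", ""), ("weaver", "2 Example St")]),
]
for e in flatten_city("Newport", years, sources, rows):
    print(e.person_no, e.source_no, e.name, e.year, e.occupation, e.address)
```

Keeping the person number tied to the row, rather than to the name, means two people with the same name stay separate until they are deliberately combined.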

ChatGPT created a very detailed reusable prompt with sections describing the following subtasks:

  • Work by city section only
  • Preserve spreadsheet row order exactly
  • Extract strictly left to right
  • Use exact visible cell text only
  • Output format
  • Add source list below each city table
  • Include abbreviation notes
  • Mandatory verification stop

A report correlating the city directories with the census data was generated. Four families were identified as units, and the backbone of the migration of Gilroys was hypothesized. I then asked for additional insights. I reminded ChatGPT about the single women who came before the families, and they were also discussed. I also had lists of all the people found in the city directories printed by city to include in an appendix.

A lot of the push-pull between ChatGPT and me during the creation of the report came from the fact that it seemed to want to talk about the report more than create it. I had to guide it toward creating a product more during this task than during the previous one. Honestly, the effort to manually reformat and check the data helped me get immersed in the data in a way that telling someone, or an AI, to collect and analyze data would never do. (Being hands-on also helped me to combine data during the next phase when I was collecting vital records data.)

In its crosswalk through the city directory data and census data, ChatGPT now had five strong lines. It also saw the connections between parts of the family and addresses. There were clusters on streets: a Burn’s Court / Byrnes Court cluster in Newport, and Manton Avenue in Providence. The census that placed Timothy and Eliza in Providence was correlated with a directory entry for him on Manton Avenue. That address was also one where my great-grandfather would live after his parents’ deaths. (More on that story after vital records.) This work may be laying groundwork for chain migration!

Although I know more about the family at this point than ChatGPT (until I add vital records data), I became excited because the married women were speaking through their husbands’ entries! The power of combining different types of records by compiling them was becoming more obvious.

I knew Timothy and Eliza were married in Newport and died there. However, from the census work, I knew that there was an RI Census showing them living in Providence. After looking at the addresses in Providence, I opened up a new chat within the project and asked about the street where Timothy-Eliza family members lived before and after the immigrant couple’s deaths. This opened up understanding of the historical context of Manton Avenue in Providence, RI.

Other Gilroy families also lived on streets around this industrial area that was home to several mills. ChatGPT shared that this was a likely destination for internal migration within Rhode Island. After Timothy and Eliza’s deaths, my great-grandfather lived and worked in Providence. He lived at the same address as William Patrick Rafferty, who had married Katie Josephine Gilroy (Timothy and Eliza’s daughter) in Newport in 1889. They would later move to Long Island, NY. The Manton Avenue connection became more intriguing. In a separate conversation, I asked ChatGPT:

Tell me about Providence, RI Manton Avenue in the 1850-1900 timeframe

This conversation was illuminating, as part of the answer was: “By 1850–1900, it had become one of the city’s major mill and worker-residential corridors.” Mills in the area were named, as were nearby streets.

Since ChatGPT had already been given the Gilroy data, it unsurprisingly asked:

“Would you like me to help analyze whether any of your Providence Gilroy directory entries fall near Manton Avenue or the Olneyville mill corridor?”

Yes, look especially at city directory entries for these addresses and occupations

ChatGPT gave a listing of the addresses, and explained which streets in those addresses were within walking distance of each other. It gave me its insight that family members living within a few blocks of each other might be indicators of chain migration, sibling households, and a kin boarding network.
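A first pass at spotting street clusters like these can be done with a short script before asking for interpretation. This is a rough sketch under stated assumptions: addresses are simple strings with an optional leading house number, spelling variants are listed by hand in a hypothetical `STREET_ALIASES` table, and the sample people are placeholders. It only groups by street name; it knows nothing about which streets are within walking distance.

```python
import re
from collections import defaultdict

# Hand-made table of spelling variants (an assumption for illustration).
STREET_ALIASES = {
    "burn's court": "byrnes court",
    "burns court": "byrnes court",
}


def street_key(address):
    """Drop a leading house number and normalize known spelling variants."""
    street = re.sub(r"^\s*\d+\S*\s+", "", address).strip().lower()
    street = street.replace("’", "'")  # treat curly and straight apostrophes alike
    return STREET_ALIASES.get(street, street)


def street_clusters(entries, min_people=2):
    """Map each street to its sorted names, keeping streets with enough distinct people.

    entries -- iterable of (name, address) pairs from the directory listing
    """
    people = defaultdict(set)
    for name, address in entries:
        if address.strip():
            people[street_key(address)].add(name)
    return {
        street: sorted(names)
        for street, names in people.items()
        if len(names) >= min_people
    }


# Placeholder entries only (not real directory data).
sample = [
    ("Person A", "12 Burn's Court"),
    ("Person B", "Byrnes Court"),
    ("Person C", "5 Manton Avenue"),
]
print(street_clusters(sample))
```

Real directory addresses often carry abbreviations such as "h." or "bds.", so the `street_key` function would need more cleanup rules before it could be trusted on a full listing.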

At the end of this step:

I had created a table with the city directory entries that had been located, by year, for people named Gilroy in the Rhode Island cities of Newport, Providence, Westerly and Bristol. The spreadsheet contained the occupation and address for each person, by year as well as the name of the reference.

ChatGPT had generated a report: Gilroy City Directories Analysis Report With Timelines And Census Correlation, Rhode Island (Newport, Bristol, Providence, Westerly), 1850–1900. There were also separate sections to add to an appendix with all listings of people with the name Gilroy from the cities Newport, Providence, Westerly and Bristol.

More than that, I had a better understanding of a neighborhood in Providence that may be a nexus for my immigrant family.

The way to tie these individuals together would be vital records, so I was eager to move forward to them. I just had to review the data I had, capture data if needed, organize it logically and load it into ChatGPT.

Surname Study and AI Part 4: Making A ChatGPT Project


In this series of posts about a surname study, Part 1 described the study, Part 2 included how census data was collected and formatted for use and Part 3 described how to combine and analyze the census data. This blog post will show how to create a project in ChatGPT. Even though the example shows creating a project as part of a surname study, the steps can be used for any task you are doing.

In the work done during the previous part of my project, I asked ChatGPT:

Would it be good to have this chat in a project?

ChatGPT suggested that a Project is good for a long, multi-stage surname study. It explained the benefits of having the related chats and files grouped together for organization. It also recommended creating separate threads as I continued my effort, with tips for naming the chats. ChatGPT went on to suggest which threads and files to include.

NOTE: Although I kept detailed notes about the steps of this study, I had not written the full blog posts as I performed the steps. As a result, some details of the interface have changed, so please keep in mind that ChatGPT is always evolving!

In the menu section Projects I clicked on the + sign next to New Project.

New Project on the ChatGPT Menu

Then a dialog box opened so that I could enter the Project name.

Create Project option on ChatGPT

And the Gilroy Surname Study (RI, 1850-1900) project was created in ChatGPT. It appears in the menu, above the other chats.

New Project created on ChatGPT

Since I already had chats to add to the project, I clicked on the three dots (ellipsis, sideways snowman) next to the name of the existing chat I wanted to add (Surname study assistance). Choosing Move to project gave me two options: create a New project, or pick the already existing one by name.

Move existing chat to the Project

Next, I wanted to add the data files to the project. I clicked on the Project in the menu, where Chats was already selected.

Project Chats and Sources

I selected Sources, then + Add sources.

Project select Sources, then Add sources

The dialog box opened to allow me to add sources. In this case, the sources were my files, and I dragged and dropped them.

Add sources

ChatGPT had offered suggestions about what products to add to the Project, such as checklists it had generated.

NOTE: At this point (ChatGPT 5.2), the names of sources in the project cannot be edited. The types of files that can be added to a project have been expanded, and are: .docx, .pdf, .txt, .md, .xlsx, .csv, .jpg, .jpeg, .png, .tiff, .json, .xml, .pptx, .mp3, .wav, .mp4, .html, .mhtml

At the end of this step:

The chats and source files had been grouped together into a project.

Next step: I decided to look at the data from city directories.

Surname Study and AI Part 3: Combining Census Data


In this series of posts about a surname study, Part 1 described the study and Part 2 included how census data was collected and formatted for use.

Census data definitely provides a backbone for research about a family. In this case, I had collected census data from both the federal censuses and the Rhode Island state censuses that were described previously. The next step was to use AI to combine the census data, then analyze it to create that backbone. I wanted ChatGPT to build backbones for the multiple families in the censuses. I uploaded the spreadsheet with the collected census data into my ChatGPT Plus chat, along with the prompt:

I would like to begin by giving you a spreadsheet with US Population Censuses and Rhode Island State Censuses between 1850 and 1900. I would like you to take a look at this data and see if it can be combined into family units, and keep the data from different censuses even though it is dissimilar. From this we will have a backbone to put together the different Gilroy families living in Rhode Island during that time so that we can add more data from different sources. Ask me questions about anything that is unclear.

ChatGPT did have some clarifying questions, then it proceeded to work with me to create a report with Provisional Family Lines.

ChatGPT had enumerated the family units, then analyzed the data and had identified four households with a high confidence level.

  • Timothy and Eliza Gilroy line (TE) [my known direct ancestors]
  • Lockey/Lackey and Ellen (Bristol rubber line)
  • Catharine Gilroy as head with sons Peter and James (Newport line)
  • Philip Gilroy Providence line (NY to RI step-migration pattern)

My confidence was very high as I had already identified these by my offline analysis. Different relationships had been detected after my immediate ancestors had both died and the younger children moved in with older, married siblings. Suggestions were made about how to verify these relationships. (Those records are coming in the birth/marriage/death phase.)

We had several conversations throughout the process, breaking the analysis into steps. I gave information about the family that was the main focus, which changed the order of data presentation. ChatGPT gave me insights into how others did one-name surname studies and compared our approach favorably to theirs.

In one conversation, ChatGPT explained how it was “crosswalking” through the census. It explained that crosswalks are used in longitudinal population studies and archival metadata. Crosswalking was being used to link family units, rather than just individuals, between censuses.
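To make the idea of linking family units between censuses concrete, here is a simplified sketch. It is not how ChatGPT performed its crosswalk; it assumes each household is just a list of given names and scores a pair of households by the share of names they have in common (Jaccard similarity). The function names, the 0.5 threshold and the sample households are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Share of names two households have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def crosswalk(earlier, later, threshold=0.5):
    """Pair each household in `earlier` with its best match in `later`.

    Both arguments map a household label to a list of given names.
    Returns (earlier_label, later_label, score) for matches at or above threshold.
    """
    links = []
    for label_a, members_a in earlier.items():
        best_label, best_score = None, 0.0
        for label_b, members_b in later.items():
            score = jaccard(members_a, members_b)
            if score > best_score:
                best_label, best_score = label_b, score
        if best_label is not None and best_score >= threshold:
            links.append((label_a, best_label, best_score))
    return links


# Placeholder households only (not real census data).
census_earlier = {"H1": ["Ann", "John", "Mary"], "H2": ["Peter"]}
census_later = {"K1": ["Ann", "John", "Mary", "Thomas"], "K2": ["Ellen"]}
print(crosswalk(census_earlier, census_later))
```

A name-overlap score ignores ages, birthplaces and spelling variants, so any link it suggests is only a candidate to be checked against the records themselves.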

ChatGPT was working as an assistant.

At the end of this step:

ChatGPT had compiled and organized data into a report: Gilroy Surname Study Backbone (Rhode Island, 1850-1900). The report also documented the constraints and additional information that had been provided during this phase as external proof controls.

An Excel spreadsheet had been produced to document the four family units, with tabs for the unlinked individuals and the evidence legend.

ChatGPT had built a backbone for the study. We worked together on the contents of a report that captured families, relationships, and unlinked individuals, recommendations for next steps, and an appendix with abbreviations.

The next step would occur after I asked ChatGPT:

Would it be good to have this chat in a project?

Surname Study and AI Part 2: Collecting Census Data


In the Surname Study and AI Part 1 post, I described the reasons that motivated me to undertake a surname study in Rhode Island, US, and the approach I took. The use of AI tools to help with formatting, visualizing and analyzing data is a goal in this latest iteration of the project.

Both US Population and Rhode Island State Census data were used as a backbone for the study.

My next step was to use AI to capture the transcriptions of key record information from the censuses, and work to normalize it. For this first step, I decided to limit my search to census databases, for exact and similar spelling of the surname, using the exact location of Rhode Island, USA. Even though I collected the images of the census, I collected the data presented on the Record Page to populate the columns of the spreadsheet.

My search settings were:

Last name: Gilroy; Slider: Exact and similar

Lived in: Rhode Island, USA ; Slider: Exact

Focus: United States [this setting was not necessary because I searched for records specific to the United States and Rhode Island]

On the search results page, I used filters to narrow down to one census at a time so that I could collect the data.

Thanks to a great idea I learned from Jon Smith of the North Carolina Genealogical Society, I decided to use Ancestry.com in a Chrome browser with Gemini AI enabled to capture the Record pages.

Gemini in top of Chrome Browser

If you do not see Gemini on the top of Chrome:

First, be sure that you are logged into your Google account. You can do this by logging into your Gmail account in the browser.

Then, try this to enable Gemini in Chrome:

Click the three dots (More), and select Settings from the menu

In Settings, click AI innovations in the left menu, then select Gemini in Chrome.

Chrome Settings to use Gemini
Chrome Preference to open Gemini

To collect the data in the US Census, I signed into HeritageQuest in the Chrome browser. Always check your county library, as HeritageQuest may be free to access from home.

I searched for all the occurrences of the surname in Rhode Island, one census at a time, for the 1850, 1860, 1870, 1880 and 1900 US Censuses. My plan was to collect one line of data for each name that appeared in the search results.

These are example results for the search for exact and similar surnames to Gilroy.

Example HeritageQuest Search Results Page

Example Search Results Page (courtesy HeritageQuest.com)

From the 1860 US Census Search Results Page, I right-clicked on the View button to open each Record in a new tab.

Example HeritageQuest Record Page

Example Record Page (courtesy HeritageQuest.com)


Some of the issues and limitations that I found may be due to the fact that I use a free version of Gemini. I had to work on my prompt to have the data captured in a Comma Separated Values (csv) format, so that I could use the data from the transcription of the record in my Excel spreadsheet. I tried to have Gemini decide what to label the columns, but it worked out better when I told it the names of the columns in the prompt.

NOTE: Later on, Gemini and I decided to format the collected data in Markdown tables. This simplified the process, because the data could be pasted directly into the Excel worksheet.

In the interest of time, I copied all the data from one Record page and asked ChatGPT to extract the data tags, using the prompt:

keep only the data tags such as Name, Age, etc and show them in a comma separated sentence on one line.

That provided me with column names which could then be used in the Gemini prompt. (This was done once for each census.) That way the line for each enumerated person in a worksheet would have the same data in the same columns.

In my type of account (free), Gemini would only look at ten open tabs in the Chrome browser as input to a prompt, so I knew that I would have to collect the data in steps. Gemini wanted to jump right in and give me analysis based on the data in those tabs, and it took some coaxing through prompt refinement to get the data in a form to put into a spreadsheet.

I added tabs using the plus sign until I had selected the Current tab and 9 others to share with Gemini. (When you select more than 10 tabs, a warning appears: “Only 10 tabs can be shared.”)

Select Multiple Tabs as Input to the Gemini Prompt in Chrome Browser

Prompts may need refinement, and in this case Gemini and I chatted back and forth to get the results that I wanted. Gemini warned me that it could not directly create or download an Excel (.xlsx) file for me, but that it could format the data into a standard CSV (Comma Separated Values) format.

For the 1860 US Census, this is a prompt that I used in Gemini in the Chrome browser. This was the result of refinement, and needed to be changed slightly for each census.

For all open census records, extract the data and generate the full CSV text. For each record, transcribe it into a new row of the CSV . Put the CSV text in a canvas so that I can copy it from the prompt. Structure the output so that each record (the main person detailed on the page) is a single row, and list all their household members’ names in a single column titled ‘Other Household Members (Names)’. **Only transcribe data explicitly visible in the current tab’s detail and household sections.**

Here are only the data tags, formatted as a single comma-separated line:

Name, Age, Birth Year, Gender, Race, Birth Place, Home in 1860, Post Office, Dwelling Number, Family Number, Occupation, Real Estate Value, Inferred Spouse, Household Members (Name)

**For any column field where data is not transcribed, insert a blank space to ensure all records have identical column structures.**

The response included this CSV text.

CSV from the Gemini Canvas

I used the copy icon at the top right to capture the CSV text, and pasted it into an open Notepad file. I saved the Notepad file as type “All files” with a file name ending in the extension “.csv” (CSV = comma separated values).

Save Notepad file as CSV

Then I opened the CSV file in Excel, and copied and pasted the lines into the Excel worksheet.
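Before pasting into Excel, it can be worth confirming that every row of the copied CSV text really has the same columns as the header, since that was the point of the prompt. The sketch below uses Python's standard `csv` module; the function name `check_csv` and the three-column sample are assumptions for illustration, not the actual census output.

```python
import csv
import io


def check_csv(text, expected_columns):
    """Parse CSV text and report rows whose column count differs from the header.

    Returns (rows, problems): rows is a list of dicts keyed by column name,
    problems is a list of (line_number, column_count) for malformed rows.
    """
    # skipinitialspace drops the space that follows each comma in "Name, Age, ..."
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [h.strip() for h in next(reader)]
    if header != expected_columns:
        raise ValueError(f"unexpected header: {header}")
    rows, problems = [], []
    for record in reader:
        if not record:
            continue  # skip completely empty lines
        if len(record) != len(header):
            problems.append((reader.line_num, len(record)))
            continue
        rows.append({key: value.strip() for key, value in zip(header, record)})
    return rows, problems


# Placeholder text only (not a real transcription).
sample_text = (
    "Name, Age, Occupation\n"
    "Person One, 30, laborer\n"
    "Person Two, 25\n"
)
good_rows, bad_rows = check_csv(sample_text, ["Name", "Age", "Occupation"])
print(len(good_rows), "good rows; problems:", bad_rows)
```

Any line reported in the problems list is one to compare against the Record page by hand before it goes into the worksheet.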

It seemed that when Gemini was used in the browser, it did not have a large memory, so I would have to reload the prompt during my next session. (Always save your prompts!) Sometimes Gemini wanted to use older data for the task I was giving, so I needed to modify the prompt to remind it to only work on the set of selected tabs.

Since this version of Gemini-enabled browser only allowed me to work on 10 tabs at a time, I stepped carefully through the results to be sure that each person with a name that was Gilroy or similar was included.
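The bookkeeping for working ten tabs at a time can be sketched in a few lines. This is an illustrative sketch only: the function names are made up, the batch size of 10 reflects the tab limit described above, and the completeness check compares names as plain text, counting repeats so that two people with the same name are not mistaken for one.

```python
from collections import Counter


def batches(items, size=10):
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def missing_after_collection(search_results, collected):
    """Names in the search results not yet matched by a collected row.

    Names are compared case-insensitively; a name that appears twice in the
    search results must also appear twice in the collected rows.
    """
    def normalize(name):
        return name.strip().lower()

    remaining = Counter(map(normalize, search_results)) - Counter(map(normalize, collected))
    return sorted(remaining.elements())


# Placeholder names only.
results = ["Person A", "Person A", "Person B"]
pasted = ["person a", "Person B "]
print([len(group) for group in batches(list(range(23)))])
print(missing_after_collection(results, pasted))
```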

In an Excel spreadsheet, I pasted the data from the 1860 census in a worksheet, and labeled its tab with the year and the type of census: “1860 US Census.”

I repeated these steps for each US Population Census.

The Rhode Island state censuses are available on Ancestry.com, and I repeated the same process for each one.

Engineers do enjoy visualizing data, so using Excel, I created a graph of the number of individuals with the exact surname Gilroy or a similar surname for each type of census. Then I combined the number of individuals from both types of censuses, for all available years. Note: the 1890 US Census and the 1895 RI State Census are unavailable.

graph US Census Results for Gilroys in Rhode Island by Year
graph RI Census Results for Gilroys in Rhode Island by Year
graph Census Results for RI Gilroys by Year Combined
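The counts behind graphs like these could also be tallied with a short script instead of in Excel. The sketch below assumes one (census type, year) pair per enumerated person; the function name and the sample records are placeholders, not the actual counts from the study.

```python
from collections import Counter


def counts_by_year(records):
    """Tally people per year, both per census type and combined.

    records -- iterable of (census_type, year), one pair per enumerated person
    """
    per_type = {}
    combined = Counter()
    for census_type, year in records:
        per_type.setdefault(census_type, Counter())[year] += 1
        combined[year] += 1
    return per_type, combined


# Placeholder records only (not the real tallies).
sample_records = [("US", 1860), ("US", 1860), ("RI", 1865), ("US", 1870)]
per_type, combined = counts_by_year(sample_records)
for year in sorted(combined):
    print(year, combined[year])
```

A year with no surviving census simply never appears in the tally, which is a reminder that a gap in the graph reflects missing records rather than missing people.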

The story that I know from my hands-on analysis involves people with the Gilroy name arriving and departing Rhode Island through immigration or moving from or to another state in the US. The number of individuals with the same surname varied by marriage, birth and death. Women would either gain the surname through marriage, or lose it when enumerated using their husband’s surname.

Even though I did collect the citations from Ancestry.com, they are not sufficient for publication and I would have to do some more work to create any citations. There are limits to the approach I used. The enumerators may not have visited all the people who shared that surname, and different transcription efforts may result in different spellings of the surname.

At the end of this step: I had an Excel spreadsheet, with a worksheet for each census. Each worksheet contained a line for each person who was enumerated in the census as having the exact surname Gilroy or a similar surname that was present in the online databases. Each column in a census worksheet held the same type of data, or was blank, for ease of analysis.

Excel spreadsheet, with a worksheet for each census.

Next, I can use an AI tool to analyze the data in each census, and across censuses. My goal is to identify family groups as well as individuals and track their changes through the years of interest.

Surname Study and AI Part 1: The Approach


This blog post begins a series of posts exploring an ongoing surname study and my recent use of artificial intelligence (AI) in it. In this post, I will describe the history of getting to this point in my efforts.

Over the course of several years, I have been working on a surname study. My goal was to find out if and how families who lived in Rhode Island from 1850-1900 were connected. Chain migration to the United States from Ireland was entirely likely, and by connecting these family units I could potentially research collateral relatives to learn more about the family unit(s) back in Ireland.

Using what I had learned from researching my direct ancestors, these were the parameters:

  • Surname: Gilroy
  • Place: Rhode Island, US
  • Timeframe: 1850-1900

For this project, I collected both federal and state census data to use as the backbone of the research. Then I built upon the intermediate years using vital records. I faced some challenges when collecting the data. At that time, Rhode Island Censuses and vital records were obtained by mailing requests to an incredibly helpful and knowledgeable staff at the Rhode Island State Archives. Copies of the records were available for modest fees, but you needed to already know details about the record you sought. (Contrast that with the ability to search for everyone with the same or similar name in a record set through a digital database.) At the time, that meant that some of the names came from index-only databases as placeholders until copies of the original records could be found. An index of vital records for the state was available on Ancestry, as was a composite of indexed city directories which formed an 1890 US Census substitute.

Another challenge was correlating dissimilar data. Just as every federal census asks different questions, so does every state census. Vital records change what data is recorded over time, too. The data found in city directories is also different from the other records, containing addresses and occupations but lacking explicit family connections.

My main product was an Excel spreadsheet with tabs for the data collected from each record type by year. I worked to reconcile the different data collected from similar record types. From that spreadsheet, I extracted family units, capturing them in PowerPoint to visually show how the family units changed over time. This gave me some insights but was labor intensive. I contemplated my next steps, knowing that analyses of ages, appearances of people with the same surnames in Rhode Island, and child naming patterns, as well as mapping the neighborhoods were among them.

Fast-forward to now, when more records are available online. For example, in addition to the vital record indexes, images of the RI vital record ledgers are now online. The Rhode Island state censuses are also online. And then there is AI to help with formatting, visualizing and analyzing data.  

Some challenges still exist. There were gaps in census coverage, due to the 1890 US Population Census and the 1895 Rhode Island Census no longer being available. The use of other record types will help to fill in the census gaps. A state-specific challenge is the fact that the 1885 Rhode Island Census is available as an alphabetized index of names, requiring family units to be connected using data in the “Family Number” column.
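Reassembling households from an alphabetized index comes down to grouping rows on the “Family Number” column. Here is a minimal sketch, assuming the index rows have been loaded as dictionaries with "Name" and "Family Number" keys; the function name and the sample rows are made up for illustration. If family numbers restart in each town or enumeration district, the grouping key would also need to include that place, which this sketch does not do.

```python
from collections import defaultdict


def group_by_family_number(index_rows):
    """Group alphabetized index rows back into households.

    index_rows -- list of dicts with at least "Name" and "Family Number" keys
    Returns (families, unnumbered): families maps a family number to the
    names sharing it; unnumbered lists names with no family number recorded.
    """
    families = defaultdict(list)
    unnumbered = []
    for row in index_rows:
        number = str(row.get("Family Number", "")).strip()
        if number:
            families[number].append(row["Name"])
        else:
            unnumbered.append(row["Name"])
    return dict(families), unnumbered


# Placeholder rows only (not real index entries).
sample_rows = [
    {"Name": "Person B", "Family Number": "12"},
    {"Name": "Person A", "Family Number": "12"},
    {"Name": "Person C", "Family Number": ""},
]
families, unnumbered = group_by_family_number(sample_rows)
print(families, unnumbered)
```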

The state of AI is constantly changing, but I decided to investigate how AI could help with the collection and analysis of data.

I did try an analysis of the whole spreadsheet in ChatGPT, and I had been able to create family groups and use them to discriminate between some people who had the same name. However, the data was not combined in an efficient manner, and rather than have one large spreadsheet, I decided it would be more understandable to break the data into more manageable pieces, based on the record types. The composite spreadsheet was broken down into different spreadsheets: (1) censuses, (2) births, marriages, and deaths and (3) city directories. I also decided to use AI to help with the data collection process, the analysis and different ways to visualize the data.

At the end of this step: I had a basic plan to redo the data collection and collect additional data that had become available online, and I had developed ideas on how AI could support this study. The next step will be to use only census data and have AI create the backbone of a timeline for the individuals and families.