Surname Study and AI Part 5: Adding City Directories


In this series of posts about a surname study, Part 1 described the study, Part 2 included how census data was collected and formatted for use, and Part 3 described how to combine and analyze the census data. Part 4 showed how to create a project as part of a surname study (or any task you are doing). In this part, collecting and adding data from city directories to the surname study will be discussed.

When I was approaching this project without AI, I had gathered many records. I had compiled them in spreadsheets, planning to do analysis. Since I am a visual person, the data I collected was compiled in PowerPoint slides with graphics. That meant that I already had the collected data in a spreadsheet.

Capturing the city directory data for the four cities with directories that listed Gilroys from 1850-1900 took some thought. By their nature, the tables in the spreadsheet were not completely populated; I only had data for years in which I could find directories, and could only fill in the table when a person was listed in the directory. There had also been movement of Gilroys between cities.

Since the data collection had been a while ago, I had a chance to revisit Ancestry.com to see if there had been additional city directories added to the database. Spoiler alert: there had. I also used ChatGPT’s services to collect sources for the actual city directories that appear online. This helped, and in some cases I used data from House Directory and Family Address Books.

While ChatGPT and I worked on defining and refining the product, it became important for me to redesign the spreadsheet of city directory data in a different, more uniform pattern. I was careful to separate out people with the same name, treating them as different individuals until there was confirmation that their records could be combined as the same person. I experimented with filling in the blank cells of the spreadsheet with a dash, but in the end, leaving the empty cells blank worked better.

The final spreadsheet was all in one worksheet, but had four separate sections for the cities in Rhode Island where I had located Gilroys: Newport, Providence, Westerly and Bristol. There were pairs of columns for each entry, with occupation and address. Above those headers, I merged the cells to enter the year. Above the year was a brief title for the source, with the page numbers. As you can imagine, the spreadsheet had many columns.

For me, it was important to know that ChatGPT was using all the data in the spreadsheet, so I had it create a listing based on each entry (name, year, occupation, address). I then verified that all the individuals that I had input were being seen by ChatGPT. A complication arose. Even though I uploaded the spreadsheet with the revised city directory entries into the Project, ChatGPT told me that it could not access the spreadsheet. It suggested that I post a screenshot, or the cells, to accompany my questions. So I did.

At the end of ironing out the directions for this task, based on the output, I asked ChatGPT to provide a prompt that would have created a list with all the collected data for me. In the end, I added two columns to this integrity checking spreadsheet: a number to correspond with the individual in the row of the spreadsheet, and a number referencing the source of the data. We also decided it was best for ChatGPT to take in one city at a time, then have me verify that it had the entries before doing the analysis.
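The integrity check described above can be sketched in a few lines of Python. This is only a toy illustration, not the actual spreadsheet layout or ChatGPT's process: the field names and data shape are assumptions. The idea is to flatten a wide, per-year table into one (name, year, occupation, address) row per directory listing, skipping blank cells.

```python
# Toy sketch of flattening a wide city-directory table into one row
# per listing, mirroring the verification list described above.
# Field names and the data shape are illustrative assumptions.

def flatten_directory(rows, years):
    """Each row is a dict with a "name" key plus one key per year.
    A year maps to {"occupation": ..., "address": ...} when the person
    was listed that year, or None for a blank spreadsheet cell."""
    listings = []
    for row in rows:
        for year in years:
            entry = row.get(year)
            if entry:  # blank cell -> person not listed that year
                listings.append(
                    (row["name"], year, entry["occupation"], entry["address"])
                )
    return listings
```

Comparing a flattened list like this against what the AI reports back is a quick way to confirm that every entry was actually seen.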

ChatGPT created a very detailed, reusable prompt with sections describing the following subtasks:

  • Work by city section only
  • Preserve spreadsheet row order exactly
  • Extract strictly left to right
  • Use exact visible cell text only
  • Output format
  • Add source list below each city table
  • Include abbreviation notes
  • Mandatory verification stop

A report correlating the city directories with the census data was generated. Four families were identified as units, and the backbone of the migration of Gilroys was hypothesized. I then asked for additional insights. I reminded ChatGPT about the single women who came before the families, and they were also discussed. I also had lists of all the people found in the city directories printed by city to include in an appendix.

A lot of the push-pull between ChatGPT and me during the creation of a report came from the fact that it seemed to want to talk about the report more than create it. I had to guide it to create a product more through this task than through the previous one. Honestly, the effort to manually reformat and check the data helped me get immersed in the data in a way that telling someone, or an AI, to collect and analyze it never would. (Being hands-on also helped me to combine data during the next phase, when I was collecting vital records data.)

In its crosswalk through the city directory data and census data, ChatGPT now had five strong lines. It also saw the connections between parts of the family and addresses. There were clusters on streets: the Burn’s Court / Byrnes Court cluster in Newport, and Manton Avenue in Providence. The census showing Timothy and Eliza in Providence was correlated with a directory entry for him on Manton Avenue. That address was also one where my great-grandfather would live after his parents’ deaths. (More on that story after vital records.) This work may be laying the groundwork for a chain migration hypothesis!

Although I know more about the family at this point than ChatGPT (until I add vital records data), I became excited because the married women were speaking through their husbands’ entries! The power of combining different types of records by compiling them was becoming more obvious.

I knew Timothy and Eliza were married in Newport and died there. However, from the census work, I knew that there was a RI Census showing them living in Providence. After looking at the addresses in Providence, I opened up a new chat within the project and asked about the street where Timothy-Eliza family members lived before and after the immigrant couple’s deaths. This opened up an understanding of the historical context of Manton Avenue in Providence, RI.

Other Gilroy families also lived on streets around this industrial area, which was home to several mills. ChatGPT shared that this was a likely destination for internal migration within Rhode Island. After Timothy and Eliza’s deaths, my great-grandfather lived and worked in Providence. He lived at the same address as William Patrick Rafferty, who had married Katie Josephine Gilroy (Timothy and Eliza’s daughter) in Newport in 1889. They would later move to Long Island, NY. The Manton Avenue connection became more intriguing. In a separate conversation, I asked ChatGPT:

Tell me about Providence, RI Manton Avenue in the 1850-1900 timeframe

This conversation was illuminating, as part of the answer was: “By 1850–1900, it had become one of the city’s major mill and worker-residential corridors.” Mills in the area were named, as were streets.

Since ChatGPT had been working with the Gilroy data, unsurprisingly it asked:

“Would you like me to help analyze whether any of your Providence Gilroy directory entries fall near Manton Avenue or the Olneyville mill corridor?”

Yes, look especially at city directory entries for these addresses and occupations

ChatGPT gave a listing of the addresses, and explained that the streets in those addresses were all within walking distance of each other. It gave me its insight that family members living within a few blocks of each other might be indicators of chain migration, sibling households, and a kin boarding network.

At the end of this step:

I had created a table with the city directory entries that had been located, by year, for people named Gilroy in the Rhode Island cities of Newport, Providence, Westerly and Bristol. The spreadsheet contained the occupation and address for each person, by year as well as the name of the reference.

ChatGPT had generated a report: Gilroy City Directories Analysis Report With Timelines And Census Correlation, Rhode Island (Newport, Bristol, Providence, Westerly), 1850–1900. There were also separate sections to add to an appendix with all listings of people with the name Gilroy from the cities Newport, Providence, Westerly and Bristol.

More than that, I had a better understanding of a neighborhood in Providence that may be a nexus for my immigrant family.

The way to tie these individuals together would be vital records, so I was eager to move forward to them. I just had to review the data I had, capture data if needed, organize it logically and load it into ChatGPT.

Surname Study and AI Part 4: Making A ChatGPT Project


In this series of posts about a surname study, Part 1 described the study, Part 2 included how census data was collected and formatted for use and Part 3 described how to combine and analyze the census data. This blog post will show how to create a project in ChatGPT. Even though the example shows creating a project as part of a surname study, the steps can be used for any task you are doing.

In the work done during the previous part of my project, I asked ChatGPT:

Would it be good to have this chat in a project?

ChatGPT suggested that a Project is good for a long, multi-stage surname study. It explained the benefits of having the related chats and files grouped together for organization. It also recommended creating separate threads as I continued my effort, with tips for naming the chats. ChatGPT went on to suggest which threads and files to include.

NOTE: Although I kept detailed notes about the steps of this study, I had not written the full blog posts as I performed the steps. As a result, some details of the interface had changed, so please keep in mind, ChatGPT is always evolving!

In the menu section Projects I clicked on the + sign next to New Project.

New Project on the ChatGPT Menu

Then a dialog box opened so that I could enter the Project name.

Create Project option on ChatGPT

And the Gilroy Surname Study (RI, 1850-1900) project was created in ChatGPT. It appears in the menu, above the other chats.

New Project created on ChatGPT

Since I already had chats to add to the project, I clicked on the three dots (ellipses, sideways snowman) next to the name of the existing chat I wanted to add (Surname study assistance). Choosing Move to project gave me the option to create a New project or to choose the name of the already existing one.

Move existing chat to the Project

Next, I wanted to add the data files to the project. I clicked on the Project in the menu, where Chats was already selected.

Project Chats and Sources

I selected Sources, then + Add sources.

Project select Sources, then Add sources

The dialog box opened to allow me to add sources. In this case, the sources were my files, and I dragged and dropped them.

Add sources

ChatGPT had offered suggestions about what products to add to the Project, such as checklists it had generated.

NOTE: At this point (ChatGPT 5.2), the names of sources in the project cannot be edited. The types of files that can be added to a project have been expanded, and are: .docx, .pdf, .txt, .md, .xlsx, .csv, .jpg, .jpeg, .png, .tiff, .json, .xml, .pptx, .mp3, .wav, .mp4, .html, .mhtml

At the end of this step:

The chats and source files had been grouped together into a project.

Next step: I decided to look at the data from city directories.

Surname Study and AI Part 3: Combining Census Data


In this series of posts about a surname study, Part 1 described the study and Part 2 included how census data was collected and formatted for use.

Census data definitely provides a backbone for research about a family. In this case, I had collected census data from both the federal censuses and the Rhode Island state censuses that were described previously. The next step was to use AI to combine the census data, then analyze it to create that backbone. I wanted ChatGPT to build backbones for the multiple families in the censuses. I uploaded the spreadsheet with the collected census data into my ChatGPT Plus chat, along with the prompt:

I would like to begin by giving you a spreadsheet with US Population Censuses and Rhode Island State Censuses between 1850 and 1900. I would like you to take a look at this data and see if it can be combined into family units, and keep the data from different censuses even though it is dissimilar. From this we will have a backbone to put together the different Gilroy families living in Rhode Island during that time so that we can add more data from different sources. Ask me questions about anything that is unclear.

ChatGPT did have some clarifying questions, then it proceeded to work with me to create a report with Provisional Family Lines.

ChatGPT had enumerated the family units, then analyzed the data and had identified four households with a high confidence level.

  • Timothy and Eliza Gilroy line (TE) [my known direct ancestors]
  • Lockey/Lackey and Ellen (Bristol rubber line)
  • Catharine Gilroy as head with sons Peter and James (Newport line)
  • Philip Gilroy Providence line (NY to RI step-migration pattern)

My confidence was very high as I had already identified these by my offline analysis. Different relationships had been detected after my immediate ancestors had both died and the younger children moved in with older, married siblings. Suggestions were made about how to verify these relationships. (Those records are coming in the birth/marriage/death phase.)

We had several conversations throughout the process, breaking the analysis into steps. I gave information about the family that was the main focus, which changed the order of data presentation. ChatGPT gave me insights into how others did one-name (surname) studies and favorably compared our approach to theirs.

In one conversation ChatGPT explained how it was “crosswalking” through the census. It explained that crosswalks are used in: longitudinal population studies and archival metadata. Crosswalking was being used to link family units, rather than just individuals, between censuses.
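The idea behind crosswalking can be sketched in a few lines. This is a simplified illustration, not ChatGPT's actual method: the field names and the matching rule (same head-of-household name, birth year within a small tolerance) are assumptions made for the sketch.

```python
# Simplified illustration of crosswalking family units between two
# censuses: match on head-of-household name, allowing a small
# birth-year tolerance to absorb age-reporting drift between
# enumerations.

def crosswalk(census_a, census_b, year_tolerance=2):
    links = []
    for a in census_a:
        for b in census_b:
            if (a["head"] == b["head"]
                    and abs(a["birth_year"] - b["birth_year"]) <= year_tolerance):
                links.append((a["id"], b["id"]))
    return links
```

A real crosswalk would weigh many more signals (spouse and children's names, birthplaces, addresses), but the core step is the same: link whole household records across years rather than isolated individuals.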

ChatGPT was working as an assistant.

At the end of this step:

ChatGPT had compiled and organized data into a report: Gilroy Surname Study Backbone (Rhode Island, 1850-1900). The report also documented the constraints and additional information that had been provided during this phase as external proof controls.

An Excel spreadsheet had been produced to document the four family units, with tabs for the unlinked individuals and the evidence legend.

ChatGPT had built a backbone for the study. We worked together on the contents of a report that captured families, relationships, unlinked individuals, recommendations for next steps, and an appendix with abbreviations.

The next step would occur after I asked ChatGPT:

Would it be good to have this chat in a project?

Google NotebookLM Tutorial


This is it! You have decided to give Google’s NotebookLM a try!

Maybe you want step-by-step instructions, or just want to look over the process before diving in. Either way, this tutorial stands ready to help.

What will you do in this Notebook? One suggestion is to upload a group of documents related to a subject or ancestor. These are documents that you want to understand better or analyze. Don’t overthink it. You just need to have an idea of your subject, because once you begin to use the Notebook more ideas will probably come to you.

In this tutorial, we’ll get started with a brand new NotebookLM, add documents to it, then based on those documents generate an Audio Overview, an Infographic, a Slide Deck and a Video Overview.

NOTE: For this tutorial, keep in mind that Google may change how it looks or add/remove specific functionality and labels at any time, but the basic ideas will remain.

When you have decided on the topic for your Notebook, it’s time to get going and create it.

In my example I will add only a few documents: the homestead patents and pages from the tract books for Charles F. Gilroy.         

Here’s the link:

https://sites.google.com/view/notebook-lm/login

NotebookLM Login Page

Log in to your Google account here. If you are already logged into Google in the same browser, you may go directly to this page:

NotebookLM Welcome Page

You’re in!

Select Create new notebook to start.

After you have created a new notebook, a window pops up asking you to add media. (This is the same window that will open when you select + Add sources)

As of this writing, the Notebook supports: Google Docs, Slides, PDFs, text files, web URLs, YouTube transcripts, and audio files. When you enter a link to a YouTube video, only the transcript will be used, and the video must be public.

For best results, enter documents with text in them. There is no guarantee that images will be transcribed properly.

From this window you can drag and drop the files you want to add to your Notebook.

NotebookLM adding sources

When adding to this Notebook, I have to admit that I did not follow the text-is-best rule. That means I will need to verify the transcription that the Notebook is using was done correctly. I added Land Patents and Tract Book images. (The Tract Book images had been located by FamilySearch Full-Text Search!)

On the left, I selected one of the sources, and viewed a description of the document containing key information from it that had been extracted. 

NotebookLM Source Guide

The workspace that opens is called the Notebook, and it has three windows labeled: Sources, Chat, and Studio. The first two are self-explanatory.

The third window is the Studio Window, which is also called the Studio Panel.

There are two sections within the Studio Panel. One section is home to the buttons, called Action Tiles, where you ask the Notebook to generate complicated multimedia products. Selecting an Action Tile asks the Notebook to generate audio or visual presentations, infographics, slide decks, reports, mind maps and more. At this point, several Tiles are labeled “Beta,” which means they are almost ready to be full-fledged features but are still being evaluated. Do not let that dissuade you from trying them! Test them out for yourself.

The second section is the Generated Resource List. When you request a product, you will see it added to that list. The list is empty for a new Notebook. As you choose products, the list is populated with the generated media. Next to each resource that appears in the list there is a 3 dot menu (snowman) where you can Rename, Download, Share or Delete a resource. Renaming a resource changes only the name, not any of the media’s content.

NotebookLM three windows

After uploading the documents, a name for the notebook was automatically generated.

NotebookLM Sources Window

I renamed the Notebook.

NotebookLM after updating Notebook name

Audio Overview

First, I tried an Audio Overview based on the few documents I had uploaded. This action offers to “Generate an AI podcast based on your sources.”

NotebookLM Audio Overview Tile Detail

Documentation for the Notebook had explained that it may take some time for the Audio Overview to be generated.

NotebookLM Studio Panel Audio being generated on Generation Resource List

Within minutes, I was listening to audio in a podcast format of two people explaining and discussing the documents and their context in a pleasant, conversational presentation. It was 19 minutes, 12 seconds in length.

NotebookLM Studio Panel Audio on Generation Resource List

A clip from this audio is here:

Infographic

Next, I decided to generate an Infographic based on the documents.

NotebookLM Infographic Tile Detail

In the Generated Resource List at the bottom of the Studio Panel, there was a spinning circle to indicate that the infographic was being generated. When it was done, I could select it from the list.

I clicked on the Infographic in the list in the Studio window

NotebookLM Studio Panel Infographic on Generation Resource List

and a Viewer opened up. I had options to share, download, collapse the Viewer and close the Viewer in the upper right hand corner.  

NotebookLM Infographic Window

After I closed the Viewer, I could click on the snowman (3 dot menu) and be presented with the options: Rename, Download, Share, Delete.

This is one of the features that is in BETA, but the infographic that was generated was interesting.

Slide Deck

Another option is to generate a Slide Deck. At this time, this feature is in BETA.

NotebookLM Slide Deck Tile Detail

I selected Slide Deck and waited while it was generated

NotebookLM Studio Panel Generation Resource List Overview

When I clicked on the Slide Deck in the Resource List, a Viewer opened up where I could look at the slides, and interact with them.

NotebookLM Slide Deck overview window

I particularly liked this slide

NotebookLM Generated Slide

NotebookLM Generated Slide

I also liked the option to download the slide deck as a PDF or a PowerPoint document.

download the slide deck as a PDF or a PowerPoint document

Selecting “Revise” gives you the chance to interact and make changes to the slides. The pending changes will be generated in a few minutes (or longer).

Video Overview

I selected the Video Overview Tile

and accepted the default selections, which included the longer Explainer format.

NotebookLM Customize Video Overview Window – Explainer Format

Generating that video took a long time. When I asked Gemini whether I could find out how long it took to generate a product, I was told no, but that this task usually takes from 5 to 30+ minutes.

NotebookLM Generated Resource List

At the end of that response, Gemini asked me if generating was taking a long time, and when I said yes, Gemini recommended that I refresh the webpage because the user interface had not updated. When I followed this recommendation, it appeared that the Video Overview generation had failed.

NotebookLM Generated Resource List – Video Overview failed

I deleted the Video Overview entry on the Generated Resource List, and tried again. This time I selected the option for a Brief Format.

NotebookLM Customize Video Overview Window – Brief Format

The brief format video was generated within minutes, providing me with a video 1 minute and 50 seconds long.

NotebookLM Generated Resource List – Video Overview

When I clicked on the Video Overview in the Generated Resource List, it opened a window within the Studio Panel. The video gave the context of the Homestead Act, then dove into presenting data about the two homesteads and their patents.

An excerpt from the video:

An Experiment in the Chat Window

I have engineering experience in testing, which matches my style of pressing the buttons and trying the features. That made me want to see if I could get some general information in a Chat within the Notebook.

I asked in the Chat window of the Notebook: If I upload a Word document with newspaper clippings can you transcribe all of them?

This was answered literally, using only the data within the Notebook. (At that point, there was no Word document containing newspaper clippings in the sources.) So if you have a general question that is not based on the information loaded into the Notebook, or a question about how NotebookLM works, it would be better to ask it in Google so that Gemini can answer it.

Gemini told me that “…if the clippings are embedded as images (e.g., photos or scans of newspaper pages), NotebookLM may not automatically transcribe that visual information into searchable, readable text” reminding me that “NotebookLM is designed to work with machine-readable text. If your Word document contains photos of newspaper clippings, the AI may be unable to “read” or transcribe the text inside those images.”

Getting back to my Notebook

When you need to revisit your Notebook, or login on a different computer, you can choose it from your list of Recent notebooks.

NotebookLM Recent Notebooks

Current Limitations

According to Gemini, currently free accounts have limits of generating approximately 3 Audio/Video Overviews per day, and can only send 50 chat queries per day. The Free accounts are limited to 50 sources per notebook, and are limited to 100 notebooks. (Workaround for large projects: Try combining multiple, smaller documents into a single PDF or Google Doc before uploading.)

Google has a tutorial that provides good information in an overview, and it can be found at: https://sites.google.com/view/notebook-lm/tutorial

Give this a try and explore the Tiles and Chat. Let me know how you do.

Have You Tried Google’s NotebookLM Yet?


Trying out NotebookLM has been on my to-do list for months. I just did, and I was blown away by it. The accessibility of technologies that I knew existed, but had not seen so well integrated, was impressive. You can chat with the AI about what has been added to the Notebook, and you can generate products based on the uploaded documents. The AI-generated media and responses in the Notebook are all based on the documents that you upload to it, which should reduce the opportunity for AI hallucinations. Keep in mind that the best idea is to enter documents with text; there is no guarantee that images will be transcribed properly.

I had already identified a couple of ancestors as test cases. One is an all-time family favorite who was born and raised in Newport, Rhode Island, served in the Army during the Spanish-American War, then settled on a homestead out in Oregon. He was a poet and a raconteur who loved to travel and was always involved in social movements.

Another ancestor is one of my brick walls. He is the only German immigrant in my tree (so far), and while I have clues about his origins in Germany, I cannot pin down his arrival to the United States or from whence he came. What I have learned about him is in the U.S., and begins when he was married to an Irish woman, after he had anglicized his name. From the time of his marriage, he never lived near other German immigrants. Very knowledgeable and generous researchers in Brooklyn, New York, and in Germany have helped me follow up on the very limited clues I have developed. The ability to pull together the material and look at it from different perspectives has the potential to help with this brick wall.

If you have not had a chance to try out NotebookLM, here is the link:

https://sites.google.com/view/notebook-lm/login

NotebookLM Welcome Page

If you are interested, I have put together a step-by-step tutorial that will get you started here: Google NotebookLM Tutorial.