In this series of posts about a surname study, Part 1 described the study and Part 2 included how census data was collected and formatted for use.
Census data definitely provides a backbone for research about a family. In this case, I had collected census data from both the federal censuses and the Rhode Island state censuses that were described previously. The next step was to use AI to combine the census data, then analyze it to create that backbone. I wanted ChatGPT to build backbones for the multiple families in the censuses. I uploaded the spreadsheet with the collected census data into my ChatGPT Plus chat, along with the prompt:
I would like to begin by giving you a spreadsheet with US Population Censuses and Rhode Island State Censuses between 1850 and 1900. I would like you to take a look at this data and see if it can be combined into family units, and keep the data from different censuses even though it is dissimilar. From this we will have a backbone to put together the different Gilroy families living in Rhode Island during that time so that we can add more data from different sources. Ask me questions about anything that is unclear.
ChatGPT did have some clarifying questions, then it proceeded to work with me to create a report with Provisional Family Lines.
ChatGPT enumerated the family units, analyzed the data, and identified four households with a high confidence level:
Timothy and Eliza Gilroy line (TE) [my known direct ancestors]
Lockey/Lackey and Ellen (Bristol rubber line)
Catharine Gilroy as head with sons Peter and James (Newport line)
Philip Gilroy Providence line (NY to RI step-migration pattern)
My confidence was very high as I had already identified these by my offline analysis. Different relationships had been detected after my immediate ancestors had both died and the younger children moved in with older, married siblings. Suggestions were made about how to verify these relationships. (Those records are coming in the birth/marriage/death phase.)
We had several conversations throughout the process, breaking the analysis into steps. I gave information about the family that was the main focus, which changed the order of data presentation. ChatGPT gave me insights into how others did one-name studies and compared the approach we were taking favorably to theirs.
In one conversation ChatGPT explained how it was “crosswalking” through the censuses. It explained that crosswalks are used in longitudinal population studies and archival metadata. Here, crosswalking was being used to link family units, rather than just individuals, between censuses.
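To make the idea of crosswalking family units more concrete, here is a minimal Python sketch. It is not how ChatGPT did the work, which I cannot see; it only illustrates one simple way to link households between two censuses, by measuring how many given names they share. The `Household` class, the 0.5 threshold, and the sample names are all assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Household:
    census: str          # label for the census, e.g. "1860 US"
    head: str            # given name of the head of household
    members: frozenset   # normalized given names of everyone in the household


def given_names(*names):
    """Normalize given names so case and stray spaces do not block a match."""
    return frozenset(n.strip().lower() for n in names)


def overlap(a, b):
    """Shared given names divided by all given names (Jaccard overlap)."""
    union = a.members | b.members
    return len(a.members & b.members) / len(union) if union else 0.0


def crosswalk(earlier, later, threshold=0.5):
    """Pair each earlier household with its best later match above the threshold."""
    links = []
    for old in earlier:
        best = max(later, key=lambda new: overlap(old, new), default=None)
        if best is not None and overlap(old, best) >= threshold:
            links.append((old, best))
    return links


# Invented sample data, not taken from the study's spreadsheet.
h1860 = [Household("1860 US", "Timothy",
                   given_names("Timothy", "Eliza", "Mary", "John"))]
h1865 = [
    Household("1865 RI", "Timothy",
              given_names("Timothy", "Eliza", "Mary", "John", "Ann")),
    Household("1865 RI", "Philip", given_names("Philip", "Bridget")),
]

links = crosswalk(h1860, h1865)
for old, new in links:
    print(f"{old.census} {old.head} -> {new.census} {new.head}")
```

A real crosswalk would also need to weigh ages, birthplaces, and spelling variants, which is why the suggested links still have to be verified against other records.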
ChatGPT was working as an assistant.
At the end of this step:
ChatGPT had compiled and organized data into a report: Gilroy Surname Study Backbone (Rhode Island, 1850-1900). The report also documented the constraints and additional information that had been provided during this phase as external proof controls.
An Excel spreadsheet had been produced to document the four family units, with tabs for the unlinked individuals and the evidence legend.
ChatGPT had built a backbone for the study. We worked together on the contents of a report that captured families, relationships, and unlinked individuals, recommendations for next steps, and an appendix with abbreviations.
This is it! You have decided to give Google’s NotebookLM a try!
Maybe you want step-by-step instructions, or just want to look over the process before diving in. Either way, this tutorial stands ready to help.
What will you do in this Notebook? One suggestion is to upload a group of documents related to a subject or ancestor. These are documents that you want to understand better or analyze. Don’t overthink it. You just need to have an idea of your subject, because once you begin to use the Notebook more ideas will probably come to you.
In this tutorial, we’ll get started with a brand new NotebookLM, add documents to it, then based on those documents generate an Audio Overview, an Infographic, a Slide Deck and a Video Overview.
NOTE: For this tutorial, keep in mind that Google may change how it looks or add/remove specific functionality and labels at any time, but the basic ideas will remain.
When you have decided the topic for your Notebook, it’s time to get going and create it.
In my example I will add only a few documents: the homestead patents and pages from the tract books for Charles F. Gilroy.
Log in to your Google account here. If you are already logged into Google in the same browser, you may go directly to this page:
You’re in!
Select Create new notebook to start.
After you have created a new notebook, a window pops up asking you to add media. (This is the same window that will open when you select + Add sources.)
As of this writing, the Notebook supports Google Docs, Slides, PDFs, text files, web URLs, YouTube transcripts, and audio files. When you enter a link to a YouTube video, only the transcript will be used, and the video has to be public.
For best results, enter documents with text in them. There is no guarantee that images will be transcribed properly.
From this window you can drag and drop the files you want to add to your Notebook.
When adding to this Notebook, I have to admit that I did not follow the text-is-best rule. That means I will need to verify the transcription that the Notebook is using was done correctly. I added Land Patents and Tract Book images. (The Tract Book images had been located by FamilySearch Full-Text Search!)
On the left, I selected one of the sources, and viewed a description of the document containing key information from it that had been extracted.
The workspace that opens is called the Notebook, and it has three windows labeled: Sources, Chat, and Studio. The first two are self-explanatory.
The third window is the Studio Window, which is also called the Studio Panel.
There are two sections within the Studio Panel. One section is home to the buttons, called Action Tiles, where you ask the Notebook to generate complicated multimedia products. Selecting an Action Tile prompts the Notebook to generate audio or visual presentations, infographics, slide decks, reports, mind maps and more. At this point, several Tiles are labeled “Beta,” which means they are almost ready to be full-fledged features but are still being evaluated. Do not let that dissuade you from trying them! Test them out for yourself.
The second section is the Generated Resource List. When you request a product, you will see it added to that list. The list is empty for a new Notebook. As you choose products, the list is populated with the generated media. Next to each resource that appears in the list there is a 3 dot menu (snowman) where you can Rename, Download, Share or Delete a resource. Renaming a resource changes only its name and does not change any of the media’s content.
After uploading the documents, a name for the notebook was automatically generated.
I renamed the Notebook.
Audio Overview
First, I tried an Audio Overview based on the few documents I had uploaded. This action offers to “Generate an AI podcast based on your sources.”
Documentation for the Notebook had explained that it may take some time for the Audio Overview to be generated.
Within minutes, I was listening to audio in a podcast format of two people explaining and discussing the documents and their context in a pleasant conversation presentation. It was 19 minutes, 12 seconds in length.
A clip from this audio is here:
Infographic
Next, I decided to generate an Infographic based on the documents.
In the Generated Resource List at the bottom of the Studio Panel, there was a spinning circle to indicate that the infographic was being generated. When it was done, I could select it from the list.
I clicked on the Infographic in the list in the Studio window and a Viewer opened up. In the upper right hand corner, I had options to share, download, collapse the Viewer and close the Viewer.
After I closed the Viewer, I could click on the snowman (3 dot menu) and be presented with options: Rename, Download, Share, Delete.
This is one of the features that is in BETA, but the infographic that was generated was interesting.
Slide Deck
An option is to generate a Slide Deck. At this time, this feature is in BETA.
I selected Slide Deck and waited while it was generated.
When I clicked on the Slide Deck in the Resource List, a Viewer opened up where I could look at the slides, and interact with them.
I particularly liked this slide
NotebookLM Generated Slide
I also liked the option to download the slide deck as a PDF or a PowerPoint document.
Selecting “Revise” gives you the chance to interact and make changes to the slide. The pending changes will be generated in a few minutes (or longer).
Video Overview
I selected the Video Overview Tile and accepted the default selections, which included the longer Explainer format.
Generating that video took a long time. When I asked Gemini whether I could find out how long it took to generate a product, I was told no, but that this task usually took from 5 to 30+ minutes.
At the end of that response, Gemini asked me if generating was taking a long time, and when I said yes, Gemini recommended that I refresh the webpage because the user interface had not updated. When I followed this recommendation, it appeared that the Video Overview generation had failed.
I deleted the Video Overview entry on the Generated Resource List, and tried again. This time I selected the option for a Brief Format.
The brief format video was generated within minutes, providing me with a video 1 minute and 50 seconds long.
When I clicked on the Video Overview in the Generated Resource List, it opened a window within the Studio Panel. The video gave the context of the Homestead Act, then dove into presenting data about the two homesteads and their patents.
An excerpt from the video:
An Experiment in the Chat Window
I have engineering experience in testing, which matches my style of pressing the buttons and trying the features. That made me want to see if I could get some general information in a Chat within the Notebook.
I asked in the Chat window of the Notebook: If I upload a Word document with newspaper clippings can you transcribe all of them?
This was answered literally, using only the data within the Notebook. (At that point, there was no Word document in the sources containing newspaper clippings.) So if you have a general question that is not based on the information loaded into the Notebook, or a question about how NotebookLM works, it would be better to ask it in Google so that Gemini can answer it.
Gemini told me that “…if the clippings are embedded as images (e.g., photos or scans of newspaper pages), NotebookLM may not automatically transcribe that visual information into searchable, readable text” reminding me that “NotebookLM is designed to work with machine-readable text. If your Word document contains photos of newspaper clippings, the AI may be unable to “read” or transcribe the text inside those images.”
Getting back to my Notebook
When you need to revisit your Notebook, or log in on a different computer, you can choose it from your list of Recent notebooks.
Current Limitations
According to Gemini, free accounts are currently limited to approximately 3 Audio/Video Overviews and 50 chat queries per day. Free accounts are also limited to 50 sources per notebook and to 100 notebooks. (Workaround for large projects: Try combining multiple, smaller documents into a single PDF or Google Doc before uploading.)
Google has a tutorial that provides good information in an overview, and it can be found at: https://sites.google.com/view/notebook-lm/tutorial
Give this a try and explore the Tiles and Chat. Let me know how you do.
Trying out NotebookLM has been on my to-do list for months. I just did, and I was blown away by it. The accessibility of technologies that I knew existed but had not seen so well integrated was impressive. You can chat with the AI about what has been added to the Notebook, and you can generate products based on the uploaded documents. The AI-generated media and responses in the Notebook are all based on the documents that you upload to it, which should reduce the opportunity for AI hallucinations. Keep in mind that the best idea is to enter documents with text; there is no guarantee that images will be transcribed properly.
I had already identified a couple of ancestors as test cases. One is an all-time family favorite who was born and raised in Newport, Rhode Island, served in the Army during the Spanish-American War, then settled on a homestead out in Oregon. He was a poet and a raconteur who loved to travel and was always involved in social movements.
Another ancestor is one of my brick walls. He is the only German immigrant in my tree (so far), and while I have clues about his origins in Germany, I cannot pin down his arrival in the United States or from whence he came. What I have learned about him is in the U.S., and begins when he was married to an Irish woman, after he had anglicized his name. From the time of his marriage, he never lived near other German immigrants. Very knowledgeable and generous researchers in Brooklyn, New York, and in Germany have helped me follow up on the very limited clues I have developed. The ability to pull together the material and look at it from different perspectives has the potential to help with this brick wall.
If you have not had a chance to try out NotebookLM, here is the link:
In the Surname Study and AI Part 1 post, I described the reasons that motivated me to undertake a surname study in Rhode Island, US, and the approach I took. The use of AI tools to help with formatting, visualizing and analyzing data is a goal in this latest iteration of the project.
Both US Population and Rhode Island State Census data were used as a backbone for the study.
My next step was to use AI to capture the transcriptions of key record information from the censuses, and work to normalize it. For this first step, I decided to limit my search to census databases, for exact and similar spelling of the surname, using the exact location of Rhode Island, USA. Even though I collected the images of the census, I collected the data presented on the Record Page to populate the columns of the spreadsheet.
My search settings were:
Last name: Gilroy; Slider: Exact and similar
Lived in: Rhode Island, USA ; Slider: Exact
Focus: United States [this setting was not necessary because I searched for records specific to the United States and Rhode Island]
On the search results page, I used filters to narrow down to one census at a time so that I could collect the data.
Thanks to a great idea I learned from Jon Smith of the North Carolina Genealogical Society, I decided to use Ancestry.com in a Chrome browser with Gemini AI enabled to capture the Record pages.
If you do not see Gemini on the top of Chrome:
First, be sure that you are logged into your Google account. You can do this by logging into your Gmail account in the browser.
Then, try this to enable Gemini in Chrome:
Click the three dots (More), and select Settings from the menu
In Settings, click AI innovations in the left menu, then select Gemini in Chrome.
To collect the data in the US Census, I signed into HeritageQuest in the Chrome browser. Always check your county library, as HeritageQuest may be free to access from home.
I searched for all the occurrences of the surname in Rhode Island, one census at a time, for the 1850, 1860, 1870, 1880 and 1900 US Censuses. My plan was to collect one line of data for each name that appeared in the search results.
These are example results for the search for exact and similar surnames to Gilroy.
Example Search Results Page (courtesy HeritageQuest.com)
From the 1860 US Census Search Results Page, I right clicked on the View button to open each Record in a new tab.
Example Record Page (courtesy HeritageQuest.com)
Some of the issues and limitations that I found may be due to the fact that I use a free version of Gemini. I had to work on my prompt to have the data captured in a Comma Separated Values (csv) format, so that I could use the data from the transcription of the record in my Excel spreadsheet. I tried to have Gemini decide what to label the columns, but it worked out better when I told it the names of the columns in the prompt.
NOTE: Later on, Gemini and I decided to format the collected data in Markdown tables. This simplified the process, because the data could be pasted directly into the Excel worksheet.
In the interest of time, I copied all the data from one Record page and asked ChatGPT to extract the data tags, using the prompt:
keep only the data tags such as Name, Age, etc and show them in a comma separated sentence on one line.
That provided me with column names which could then be used in the Gemini prompt. (This was done once for each census.) That way the line for each enumerated person in a worksheet would have the same data in the same columns.
In my type of account (free), Gemini would only look at ten open tabs in the Chrome browser as input to a prompt, so I knew that I would have to collect the data in steps. Gemini wanted to jump right in and give me analysis based on the data in those tabs, and it took some coaxing through prompt refinement to get the data in a form to put into a spreadsheet.
I added tabs using the plus sign until I had selected the Current tab and 9 others to share with Gemini. (When you select more than 10 tabs a warning appears: “Only 10 tabs can be shared.”)
Select Multiple Tabs as Input to the Gemini Prompt in Chrome Browser
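Because only ten tabs could be shared at a time, it helps to plan the batches before opening anything. The short Python sketch below shows the arithmetic of splitting a list of Record pages into groups of ten. The page labels are hypothetical placeholders, and the count of 23 is made up for the example.

```python
def batches(items, size=10):
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical placeholders standing in for the Record pages of one census search.
record_pages = [f"record-{n}" for n in range(1, 24)]

groups = list(batches(record_pages))
for number, group in enumerate(groups, start=1):
    print(f"Batch {number}: {len(group)} tabs, {group[0]} to {group[-1]}")
```

Keeping a note of the first and last record in each batch makes it easier to confirm that no one was skipped between sessions.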
Prompts may need refinement, and in this case Gemini and I chatted back and forth to get the results that I wanted. Gemini warned me that it could not directly create or download an Excel (.xlsx) file for me, but that it could format the data into a standard CSV (Comma Separated Values) format.
For the 1860 US Census, this is a prompt that I used in Gemini in the Chrome browser. This was the result of refinement, and needed to be changed slightly for each census.
For all open census records, extract the data and generate the full CSV text. For each record, transcribe it into a new row of the CSV . Put the CSV text in a canvas so that I can copy it from the prompt. Structure the output so that each record (the main person detailed on the page) is a single row, and list all their household members’ names in a single column titled ‘Other Household Members (Names)’. **Only transcribe data explicitly visible in the current tab’s detail and household sections.**
Here are only the data tags, formatted as a single comma-separated line:
Name, Age, Birth Year, Gender, Race, Birth Place, Home in 1860, Post Office, Dwelling Number, Family Number, Occupation, Real Estate Value, Inferred Spouse, Household Members (Name)
**For any column field where data is not transcribed, insert a blank space to ensure all records have identical column structures.**
The response included this CSV text.
I used the copy icon at the top right to capture the CSV text, and pasted it into an open Notepad file. The Notepad file was saved as type “All files” and I created a file name ending with the extension “.csv” (CSV = comma separated values)
Then I opened the CSV file in Excel, and copied and pasted the lines into the Excel worksheet.
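Before pasting into Excel, it can be worth checking that every row really has the same columns as the header, since a missing field shifts everything to its right. Here is a minimal Python sketch of that check using the column names from the 1860 prompt. The function name `check_rows` and the two sample rows are invented for illustration; they are not records from the study.

```python
import csv
import io

# Column names from the 1860 prompt above.
COLUMNS = [
    "Name", "Age", "Birth Year", "Gender", "Race", "Birth Place",
    "Home in 1860", "Post Office", "Dwelling Number", "Family Number",
    "Occupation", "Real Estate Value", "Inferred Spouse",
    "Household Members (Name)",
]


def check_rows(csv_text, columns=COLUMNS):
    """Return (good_rows, problems); problems lists lines with the wrong shape."""
    reader = csv.reader(io.StringIO(csv_text.strip()))
    header = next(reader)
    problems = []
    if header != columns:
        problems.append((1, "header does not match the expected columns"))
    good = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(columns):
            problems.append(
                (line_no, f"{len(row)} fields, expected {len(columns)}"))
        else:
            good.append(dict(zip(columns, row)))
    return good, problems


# Invented sample text: one complete row and one row that is cut short.
sample = (
    ",".join(COLUMNS) + "\n"
    'Timothy Gilroy,40,1820,Male,White,Ireland,Providence,Providence,'
    '12,15,Laborer, ,Eliza Gilroy,"Eliza Gilroy, Mary Gilroy"\n'
    "Philip Gilroy,35,1825,Male,White,Ireland,Providence\n"
)

good, problems = check_rows(sample)
print(len(good), "good row(s)")
for line_no, message in problems:
    print(f"Line {line_no}: {message}")
```

Note that the list of household members is wrapped in quotation marks in the sample, so the commas inside it are not mistaken for column breaks.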
It seemed that when Gemini was used in the browser, it did not have a large memory, so I would have to reload the prompt during my next session. (Always save your prompts!) Sometimes Gemini wanted to use older data for the task I was giving, so I needed to modify the prompt to remind it to only work on the set of selected tabs.
Since this version of Gemini-enabled browser only allowed me to work on 10 tabs at a time, I stepped carefully through the results to be sure that each person with a name that was Gilroy or similar was included.
In an Excel spreadsheet, I pasted the data from the 1860 census in a worksheet, and labeled its tab with the year and the type of census: “1860 US Census.”
I repeated these steps for each US Population Census.
The Rhode Island state censuses are available on Ancestry.com, and I repeated the same process for each one.
Engineers do enjoy visualizing data, so using Excel, I created a graph of the number of individuals with the exact surname Gilroy or a similar surname for each type of census. Then I combined the number of individuals from both types of censuses, for all available years. Note: the US Census for 1890 and the RI State Census are unavailable.
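I built my graph in Excel, but the counting behind it is simple enough to sketch in Python for anyone who prefers a script. The rows below are invented placeholders, one per enumerated person, and do not reflect the actual counts in the study.

```python
from collections import Counter

# Invented rows: (type of census, year) for each enumerated person.
rows = [
    ("US", 1850), ("US", 1850),
    ("US", 1860),
    ("RI", 1865), ("RI", 1865), ("RI", 1865),
    ("US", 1870),
]

# Count per type of census and year, then combined per year.
per_census = Counter(rows)
per_year = Counter(year for _, year in rows)

# A plain-text bar chart, one line per census year.
for year in sorted(per_year):
    print(f"{year}: {'#' * per_year[year]} ({per_year[year]})")
```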
The story that I know from my hands-on analysis involves people with the Gilroy name arriving and departing Rhode Island through immigration or moving from or to another state in the US. The number of individuals with the same surname varied by marriage, birth and death. Women would either gain the surname through marriage, or lose it when enumerated using their husband’s surname.
Even though I did collect the citations from Ancestry.com, they are not sufficient for publication and I would have to do some more work to create any citations. There are limits to the approach I used. The enumerators may not have visited all the people who shared that surname, and different transcription efforts may result in different spellings of the surname.
At the end of this step: I had an Excel spreadsheet, with a worksheet for each census. Each worksheet contained a line for each person who was enumerated in the census as having the exact surname Gilroy or a similar surname that was present in the online databases. Each column in a census worksheet has the same type of data, or was blank, for ease of analysis.
Next, I can use an AI tool to analyze the data in each census, and across censuses. My goal is to identify family groups as well as individuals and track their changes through the years of interest.
For researching a WWI or a WWII soldier, have you considered using the rosters at NARA? They are located in the series Muster Rolls and Rosters, November 1, 1912–December 31, 1943, within Record Group 64. This blog post will show where to search for rosters, including how to use an online finding aid for WWII rosters that will make your task much easier.
The rosters are arranged in three subseries within Muster Rolls and Rosters, November 1, 1912–December 31, 1943:
Muster Rolls, November 1, 1912 – June 30, 1918 and Enlisted and Officer Rosters, July 1, 1918 – December 31, 1939,
Officer Rosters, 1920 – 1939,
Army and Army Air Force (Air Corp) Rosters, 1940 – 1943
On this page, you will find information about how to locate WWII rosters organized by:
Army enlisted service members
Army officers
Army Air Force (Air Corp) enlisted service members
Army Air Force (Air Corp) officers
Within those categories, the rosters are organized by type of reporting unit.
To use the finding aid, click on the plus sign to expand the link to locate the type of unit. There will be box numbers shown, but some entries will contain links to digitized rosters, or to a pdf that contains the National Archives Identifier (NAID) in NARA’s Catalog to use when locating the online rosters.
In this example, I am searching for the rosters for a soldier in Battery A of the 500th AAA Gun Battalion, so I clicked on the plus sign next to “Chemical and Antiaircraft Artillery” to expand the section.
I clicked on the link for “Antiaircraft Battalion – Boxes 246-348.”
The PDF file shows that the rosters are stored by increasing NAID numbers, by the number of the organization. The first page contains the column headers. (They are not repeated on subsequent pages.)
Scrolling down to the beginning of page 4 of the PDF, I find Btry C, 500th AAA Gn Bn, 1943. That means Roll Number 307 (2 of 3), which begins with Battery C of that Battalion, has NAID 371744319.
Since the soldier is in Battery A, I will want to check the previous part of the roll, listed at the bottom of page 3 of the PDF, NAID 371744318, Roll Number 307 (1 of 3). I would expect that rosters for Battery A would be closer to the end of the Roll. (Remember to use the Chrome Browser to see the images in order, as Firefox has a documented bug of showing images in reverse order.)
There is a blank page between the months, so I began by checking the image after the blank pages in the image range of 800-900.
I recommend building a list or a table with the information for the organization, to keep track of links. (Note: Organization and Link appear in the same column for readability. In my Excel worksheet, they appear in separate columns.)
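If you would rather build the tracking table with a script than by hand, a small Python sketch like the one below can generate it as CSV text for pasting into a worksheet. The roll parts and NAIDs are the ones given in this post. The catalog link pattern is an assumption on my part, so check one link in your browser before relying on it.

```python
import csv
import io

# Assumed URL pattern for a NARA Catalog record; verify it before relying on it.
CATALOG_URL = "https://catalog.archives.gov/id/{naid}"

# Roll parts and NAIDs as given in this post.
entries = [
    {"Organization": "500th AAA Gun Battalion", "Roll": "307 (1 of 3)",
     "NAID": "371744318"},
    {"Organization": "500th AAA Gun Battalion", "Roll": "307 (2 of 3)",
     "NAID": "371744319"},
]


def tracking_table(rows):
    """Return CSV text with one line per roll part and a link column added."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["Organization", "Roll", "NAID", "Link"])
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "Link": CATALOG_URL.format(naid=row["NAID"])})
    return out.getvalue()


table = tracking_table(entries)
print(table)
```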
I would want to continue to go backwards chronologically to collect the rosters for the time the soldier I am researching was in the Battery.
Another option, described in our blog post about locating WWII Morning Reports in PDF Files, can also make the task easier. Search the NARA catalog for 371745320, which is the NAID for the final part of Roll Number 307 (3 of 3).
From there, the PDF files containing groups of 125 images from the Roll can be downloaded. Images for Battery A can be found in the files Roll-0307_07.pdf and Roll-0307_08.pdf.
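Since each PDF holds 125 images, a little arithmetic tells you which file to download for a given image number. The Python sketch below assumes that image numbering starts at 1, that the files are numbered from 01 in order, and that the file names follow the Roll-0307_07.pdf pattern shown above. Those are my assumptions rather than anything NARA documents, but they agree with the image range of 800-900 falling in files 07 and 08.

```python
def pdf_for_image(image_number, roll=307, images_per_file=125):
    """Return the assumed name of the PDF holding a 1-based image number."""
    if image_number < 1:
        raise ValueError("image numbers start at 1")
    # Images 1-125 fall in part 1, images 126-250 in part 2, and so on.
    part = (image_number - 1) // images_per_file + 1
    return f"Roll-{roll:04d}_{part:02d}.pdf"


print(pdf_for_image(800))  # Roll-0307_07.pdf
print(pdf_for_image(900))  # Roll-0307_08.pdf
```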