Have You Tried Google’s NotebookLM Yet?

Blog post banner - have you tried Google's NotebookLM Yet?

Trying out NotebookLM has been on my to-do list for months. I just did, and I was blown away by it. The accessibility of technologies that I knew existed but had not seen integrated so well was impressive. You can chat with the AI about what has been added to the Notebook, and you can generate products based on the uploaded documents. The AI-generated media and responses in the Notebook are all based on the documents that you upload to it, which should reduce the opportunity for AI hallucinations. Keep in mind that it is best to upload documents that contain text; there is no guarantee that images will be transcribed properly.

I had already identified a couple of ancestors as test cases. One is an all-time family favorite who was born and raised in Newport, Rhode Island, served in the Army during the Spanish-American War, then settled on a homestead out in Oregon. He was a poet and a raconteur who loved to travel and was always involved in social movements.

Another ancestor is one of my brick walls. He is the only German immigrant in my tree (so far), and while I have clues about his origins in Germany, I cannot pin down his arrival in the United States or where he came from. Everything I have learned about him comes from U.S. records, beginning with his marriage to an Irish woman, after he had anglicized his name. From the time of his marriage, he never lived near other German immigrants. Very knowledgeable and generous researchers in Brooklyn, New York, and in Germany have helped me follow up on the very limited clues I have developed. The ability to pull together the material and look at it from different perspectives has the potential to help with this brick wall.

If you have not had a chance to try out NotebookLM, here is the link:

https://sites.google.com/view/notebook-lm/login

NotebookLM Welcome Page

If you are interested, I have put together a step-by-step tutorial that will get you started here: Google NotebookLM Tutorial.

Surname Study and AI Part 2: Collecting Census Data

blog banner - Surname Study and AI
Part 2

In the Surname Study and AI Part 1 post, I described the reasons that motivated me to undertake a surname study in Rhode Island, US, and the approach I took. The use of AI tools to help with formatting, visualizing and analyzing data is a goal in this latest iteration of the project.

Both US Population and Rhode Island State Census data were used as a backbone for the study.

My next step was to use AI to capture the transcriptions of key record information from the censuses, and work to normalize it. For this first step, I decided to limit my search to census databases, for exact and similar spelling of the surname, using the exact location of Rhode Island, USA. Even though I collected the images of the census, I collected the data presented on the Record Page to populate the columns of the spreadsheet.

My search settings were:

Last name: Gilroy; Slider: Exact and similar

Lived in: Rhode Island, USA ; Slider: Exact

Focus: United States [this setting was not necessary because I searched for records specific to the United States and Rhode Island]

On the search results page, I used filters to narrow down to one census at a time so that I could collect the data.

Thanks to a great idea I learned from Jon Smith of the North Carolina Genealogical Society, I decided to use Ancestry.com in a Chrome browser with Gemini AI enabled to capture the Record pages.

Gemini in top of Chrome Browser

If you do not see Gemini on the top of Chrome:

First, be sure that you are logged into your Google account. You can do this by logging into your Gmail account in the browser.

Then, try this to enable Gemini in Chrome:

Click the three dots (More), and select Settings from the menu

In Settings, click AI innovations in the left menu, then select Gemini in Chrome.

Chrome Settings to use Gemini
Chrome Preference to open Gemini

To collect the data in the US Census, I signed into HeritageQuest in the Chrome browser. Always check your county library, as HeritageQuest may be free to access from home.

I searched for all the occurrences of the surname in Rhode Island, one census at a time for the 1850, 1860, 1870, 1880 and 1900 US Censuses. My plan was to collect one line of data for each name that appeared in the search results.

These are example results for the search for exact and similar surnames to Gilroy.

Example HeritageQuest Search Results Page

Example Search Results Page (courtesy HeritageQuest.com)

From the 1860 US Census Search Results Page, I right-clicked on the View button to open each Record in a new tab.

Example HeritageQuest Record Page

Example Record Page (courtesy HeritageQuest.com)

Some of the issues and limitations that I found may be due to the fact that I use a free version of Gemini. I had to work on my prompt to have the data captured in a Comma Separated Values (csv) format, so that I could use the data from the transcription of the record in my Excel spreadsheet. I tried to have Gemini decide what to label the columns, but it worked out better when I told it the names of the columns in the prompt.

NOTE: Later on, Gemini and I decided to format the collected data in Markdown tables. This simplified the process, because the data could be pasted directly into the Excel worksheet.
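For readers who would rather convert a Markdown table with a script than paste it by hand, here is a minimal sketch. It assumes the table uses the common pipe-delimited layout with a dashed separator row under the header. The function names and the sample rows ("Person A", "Person B") are placeholders invented for illustration, not data from the study.

```python
import csv
import io


def markdown_table_to_rows(markdown_text):
    """Convert a simple pipe-delimited Markdown table into a list of rows."""
    rows = []
    for line in markdown_text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        # Drop the outer pipes, then split the remaining text into cells.
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        # Skip the separator row under the header, e.g. |---|:---:|
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Return the rows as CSV text (quoting is handled by the csv module)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Placeholder table for illustration only.
sample = """
| Name | Age | Birth Place |
|---|---|---|
| Person A | 30 | Ireland |
| Person B |  | Rhode Island |
"""

table_rows = markdown_table_to_rows(sample)
print(rows_to_csv(table_rows))
```

Blank cells survive the conversion as empty fields, so every row keeps the same number of columns.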

In the interest of time, I copied all the data from one Record page and asked ChatGPT to extract the data tags, using the prompt:

keep only the data tags such as Name, Age, etc and show them in a comma separated sentence on one line.

That provided me with column names which could then be used in the Gemini prompt. (This was done once for each census.) That way the line for each enumerated person in a worksheet would have the same data in the same columns.

In my type of account (free), Gemini would only look at ten open tabs in the Chrome browser as input to a prompt, so I knew that I would have to collect the data in steps. Gemini wanted to jump right in and give me analysis based on the data in those tabs, and it took some coaxing through prompt refinement to get the data in a form to put into a spreadsheet.

I added tabs using the plus sign until I had selected the Current tab and 9 others to share with Gemini. (When you select more than 10 tabs, a warning appears: “Only 10 tabs can be shared.”)

Select Multiple Tabs as Input to the Gemini Prompt in Chrome Browser

Prompts may need refinement, and in this case Gemini and I chatted back and forth to get the results that I wanted. Gemini warned me that it could not directly create or download an Excel (.xlsx) file for me, but that it could format the data into a standard CSV (Comma Separated Values) format.

For the 1860 US Census, this is a prompt that I used in Gemini in the Chrome browser. This was the result of refinement, and needed to be changed slightly for each census.

For all open census records, extract the data and generate the full CSV text. For each record, transcribe it into a new row of the CSV . Put the CSV text in a canvas so that I can copy it from the prompt. Structure the output so that each record (the main person detailed on the page) is a single row, and list all their household members’ names in a single column titled ‘Other Household Members (Names)’. **Only transcribe data explicitly visible in the current tab’s detail and household sections.**

Here are only the data tags, formatted as a single comma-separated line:

Name, Age, Birth Year, Gender, Race, Birth Place, Home in 1860, Post Office, Dwelling Number, Family Number, Occupation, Real Estate Value, Inferred Spouse, Household Members (Name)

**For any column field where data is not transcribed, insert a blank space to ensure all records have identical column structures.**

The response included this CSV text.

CSV from the Gemini Canvas

I used the copy icon at the top right to capture the CSV text, and pasted it into an open Notepad file. I saved the Notepad file as type “All files” and gave it a file name ending with the extension “.csv” (CSV = comma separated values).

Save Notepad file as CSV

Then I opened the CSV file in Excel, and copied and pasted the lines into the Excel worksheet.
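Before opening a pasted CSV in Excel, it can help to confirm that every row has the same number of fields as the header. The sketch below is one way to do that with Python's standard csv module. The column list is copied from the 1860 prompt above, while the function names, the sample rows, and the idea of running a check at all are my own illustrative assumptions rather than part of the original workflow.

```python
import csv
import io

# Column names taken from the 1860 prompt in this post.
EXPECTED_COLUMNS = [
    "Name", "Age", "Birth Year", "Gender", "Race", "Birth Place",
    "Home in 1860", "Post Office", "Dwelling Number", "Family Number",
    "Occupation", "Real Estate Value", "Inferred Spouse",
    "Household Members (Name)",
]


def check_csv_text(csv_text, expected_columns):
    """Parse CSV text and report rows whose field count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text.strip()), skipinitialspace=True)
    header = [name.strip() for name in next(reader, [])]
    problems = []
    if header != expected_columns:
        problems.append("row 1: header does not match the expected columns")
    rows = []
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(expected_columns):
            problems.append(
                f"row {row_number}: expected {len(expected_columns)} fields, "
                f"found {len(row)}"
            )
        rows.append(row)
    return rows, problems


def save_csv(header, rows, path):
    """Write the rows to a .csv file; utf-8-sig helps Excel detect the encoding."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Placeholder data: one complete row (mostly blank) and one row that is too short.
header_line = ", ".join(EXPECTED_COLUMNS)
complete_row = ",".join(["Person A"] + [""] * 13)
short_row = "Person B,40"
sample_text = "\n".join([header_line, complete_row, short_row])

rows, problems = check_csv_text(sample_text, EXPECTED_COLUMNS)
print(problems)
```

Because the csv module understands quoted fields, a household-members cell that contains commas is still read as a single column as long as it is wrapped in double quotes.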

It seemed that when Gemini was used in the browser, it did not have a large memory, so I would have to reload the prompt during my next session. (Always save your prompts!) Sometimes Gemini wanted to use older data for the task I was giving, so I needed to modify the prompt to remind it to only work on the set of selected tabs.

Since this version of Gemini-enabled browser only allowed me to work on 10 tabs at a time, I stepped carefully through the results to be sure that each person with a name that was Gilroy or similar was included.
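One way to keep track of which search results have been handled is to split the list into groups of ten ahead of time and tick off each group. This is only a sketch: the labels below are placeholders, and the limit of 10 reflects what I saw in the free account rather than a documented constant.

```python
def make_batches(items, size=10):
    """Split a list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]


def missing_items(expected, collected):
    """Return the expected items that have not been collected yet, in order."""
    done = set(collected)
    return [item for item in expected if item not in done]


# Placeholder labels standing in for 23 search results.
results = [f"Result {number}" for number in range(1, 24)]
batches = make_batches(results)

for index, group in enumerate(batches, start=1):
    print(f"Batch {index}: {group[0]} to {group[-1]} ({len(group)} tabs)")

# After collecting the first two batches, see what is still outstanding.
collected_so_far = batches[0] + batches[1]
print(missing_items(results, collected_so_far))
```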

In an Excel spreadsheet, I pasted the data from the 1860 census into a worksheet, and labeled its tab with the year and the type of census: “1860 US Census.”

I repeated these steps for each US Population Census.

The Rhode Island state censuses are available on Ancestry.com, and I repeated the same process for each one.

Engineers do enjoy visualizing data, so using Excel, I created a graph of the number of individuals with the exact surname Gilroy or a similar surname for each type of census. Then I combined the number of individuals from both types of censuses, for all available years. Note: the 1890 US Census and the 1895 RI State Census are unavailable.

graph US Census Results for Gilroys in Rhode Island by Year
graph RI Census Results for Gilroys in Rhode Island by Year
graph Census Results for RI Gilroys by Year Combined
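The counts behind graphs like these can also be tallied with a few lines of code. The sketch below assumes each worksheet has already been loaded as a list of rows, keyed by year and census type. The row contents and counts are placeholders, not the study's results.

```python
from collections import Counter


def count_by_year(worksheets):
    """Count rows per census, grouped by census type, plus a combined total per year.

    `worksheets` maps (year, census_type) to the list of rows in that worksheet.
    """
    per_type = {}
    combined = Counter()
    for (year, census_type), rows in worksheets.items():
        per_type.setdefault(census_type, {})[year] = len(rows)
        combined[year] += len(rows)
    return per_type, dict(sorted(combined.items()))


# Placeholder worksheets: the counts are invented for illustration.
worksheets = {
    (1880, "US"): ["row 1", "row 2", "row 3"],
    (1885, "RI"): ["row 1", "row 2"],
}

per_type, combined = count_by_year(worksheets)
print(per_type)
print(combined)
```

The two dictionaries correspond to the per-census graphs and the combined graph, and either can be pasted into Excel or handed to a plotting library.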

The story that I know from my hands-on analysis involves people with the Gilroy name arriving and departing Rhode Island through immigration or moving from or to another state in the US. The number of individuals with the same surname varied by marriage, birth and death. Women would either gain the surname through marriage, or lose it when enumerated using their husband’s surname.

Even though I did collect the citations from Ancestry.com, they are not sufficient for publication and I would have to do some more work to create proper citations. There are limits to the approach I used. The enumerators may not have visited all the people who shared that surname, and different transcription efforts may result in different spellings of the surname.

At the end of this step: I had an Excel spreadsheet, with a worksheet for each census. Each worksheet contained a line for each person who was enumerated in the census as having the exact surname Gilroy or a similar surname that was present in the online databases. Each column in a census worksheet has the same type of data, or was blank, for ease of analysis.

Excel spreadsheet, with a worksheet for each census.

Next, I can use an AI tool to analyze the data in each census, and across censuses. My goal is to identify family groups as well as individuals and track their changes through the years of interest.

Finding WWII Rosters Online at NARA

Blog post banner - Finding WWII Rosters Online at NARA

If you are researching a WWI or WWII soldier, have you considered using the Rosters at NARA? They are located in the Series: Muster Rolls and Rosters, November 1, 1912–December 31, 1943, within Record Group 64. This blog post will show where to search for rosters, including how to use an online finding aid for WWII rosters that will make your task much easier.

The rosters are arranged in three subseries within Muster Rolls and Rosters, November 1, 1912–December 31, 1943:

  • Muster Rolls, November 1, 1912 – June 30, 1918 and Enlisted and Officer Rosters, July 1, 1918 – December 31, 1939,
  • Officer Rosters, 1920 – 1939,
  • Army and Army Air Force (Air Corp) Rosters, 1940 – 1943

The Series is located at: https://catalog.archives.gov/id/85713803

There are 625 pages of links to view on this Series webpage, so you can browse for an organization.

First page of Series: Muster Rolls and Rosters, November 1, 1912–December 31, 1943

You can also search within the Series for a soldier’s name, military serial number, or even an organization: https://catalog.archives.gov/search-within/85713803

Search within Series: Muster Rolls and Rosters, November 1, 1912–December 31, 1943

When researching WWII soldiers, there is an online finding aid to streamline the process: https://www.archives.gov/st-louis/archival-programs/army-rosters-1940-1943

Finding Aid for Army Rosters 1940-1943 Online

On this page, you will find information about how to locate WWII rosters organized by:

  • Army enlisted service members
  • Army officers
  • Army Air Force (Air Corp) enlisted service members
  • Army Air Force (Air Corp) officers

Within those categories, the rosters are organized by type of reporting unit.

Table for RG 64, Series: Muster Rolls and Rosters, November 1, 1912–December 31, 1943 Subseries 3: Army and Army Air Force (Air Corp) Rosters, 1940 – 1943

To use the finding aid, click on the plus sign to expand the link to locate the type of unit. There will be box numbers shown, but some entries will contain links to digitized rosters, or to a pdf that contains the National Archives Identifier (NAID) in NARA’s Catalog to use when locating the online rosters.

In this example, I am searching for the rosters for a soldier in Battery A of the 500th AAA Gun Battalion, so I clicked on the plus sign next to “Chemical and Antiaircraft Artillery” to expand the section.

Finding aid webpage for AAA Battalion rosters

I clicked on the link for “Antiaircraft Battalion – Boxes 246-348.”

The link led to a PDF file with the Catalog NAID.

https://www.archives.gov/files/antiaircraft-artillery-battalion-index.pdf

The PDF file shows that the rosters are stored by increasing NAID numbers, by the number of the organization. The first page contains the column headers. (They are not repeated on subsequent pages.)

Finding aid file for AAA Battalion rosters

Scrolling down to the beginning of page 4 of the PDF, I find Btry C, 500th AAA Gn Bn, 1943. That means Roll Number 307 (2 of 3), which begins with Battery C of that Battalion, has NAID 371744319.

Finding aid file for 500th AAA Battalion rosters

Since the soldier is in Battery A, I will want to check the previous part of the roll, listed at the bottom of page 3 of the PDF, NAID 371744318, Roll Number 307 (1 of 3). I would expect that rosters for Battery A would be closer to the end of the Roll. (Remember to use the Chrome Browser to see the images in order, as Firefox has a documented bug of showing images in reverse order.)

I searched from the NARA Catalog Home Page: https://catalog.archives.gov

Roster Roll for 500th AAA Battalion

There is a blank page between the months, so I began by checking the image after the blank pages in the image range of 800-900.

I recommend building a list or a table with the information for the organization, to keep track of links. (Note: Organization and Link appear in the same column for readability. In my Excel worksheet, they appear in separate columns.)

Table for tracking Roster images, links and dates
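A tracking table like this can also be kept as a small script that builds the Catalog link from the NAID. In the sketch below, the link pattern follows the Series URL shown earlier in this post (https://catalog.archives.gov/id/ followed by the identifier). The class name and field names are my own invention, and the image number and roster date are left blank as values to fill in while browsing.

```python
import csv
import io
from dataclasses import dataclass

CATALOG_BASE = "https://catalog.archives.gov/id/"


@dataclass
class RosterEntry:
    """One row of a hypothetical tracking table for roster images."""
    organization: str
    roll: str
    naid: str
    image_number: str = ""  # fill in once the roster page is located
    roster_date: str = ""   # fill in from the roster itself

    @property
    def link(self):
        # Catalog record link built from the NAID.
        return CATALOG_BASE + self.naid


def entries_to_csv(entries):
    """Return the tracking table as CSV text, ready to open in Excel."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Organization", "Roll", "NAID", "Link", "Image", "Roster Date"])
    for entry in entries:
        writer.writerow([
            entry.organization, entry.roll, entry.naid,
            entry.link, entry.image_number, entry.roster_date,
        ])
    return buffer.getvalue()


# The roll and NAID come from the finding aid example in this post.
entries = [RosterEntry("Btry A, 500th AAA Gn Bn", "307 (1 of 3)", "371744318")]
print(entries_to_csv(entries))
```

Keeping the organization and the link in separate columns, as in my Excel worksheet, makes the table easier to sort and filter later.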

I would want to continue to go backwards chronologically to collect the rosters for the time the soldier I am researching was in the Battery.

Another option, as described in our blog post about locating WWII Morning Reports in PDF Files, can also make the task easier. Search the NARA catalog for: 371745320, which is the NAID for the final part of Roll Number 307, part 3 of 3.

PDF files of images available for download

From there, the PDF files containing groups of 125 images from the Roll can be downloaded. Images for Battery A can be found in the files Roll-0307_07.pdf and Roll-0307_08.pdf
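If you know roughly which image numbers you need, a little arithmetic tells you which PDF to download. This sketch assumes that images are numbered from 1, that each PDF holds exactly 125 consecutive images, and that the files are numbered from 01 using the naming pattern seen above. Those are assumptions based on this one roll, not a documented NARA rule.

```python
def pdf_part_for_image(image_number, images_per_file=125):
    """Return the 1-based PDF part that should contain the given image number."""
    if image_number < 1:
        raise ValueError("image numbers start at 1")
    return (image_number - 1) // images_per_file + 1


def pdf_name(roll_number, part):
    """Build a file name following the pattern Roll-0307_07.pdf."""
    return f"Roll-{roll_number:04d}_{part:02d}.pdf"


# Images 800 through 900 span two files on Roll 307.
first_part = pdf_part_for_image(800)
last_part = pdf_part_for_image(900)
needed = [pdf_name(307, part) for part in range(first_part, last_part + 1)]
print(needed)  # ['Roll-0307_07.pdf', 'Roll-0307_08.pdf']
```

Under these assumptions, file 07 covers images 751 to 875 and file 08 covers images 876 to 1000, which matches the two files named above for the 800-900 image range.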

Give it a try and let me know how you do!

Now Open: “Tracing Your New York Ancestors with the NYG&B”

Are you researching ancestors in New York State? If so, you will probably be interested in the free on-demand online course from the New York Genealogical and Biographical Society: “Tracing Your New York Ancestors with the NYG&B.”

In the six video sessions of the course, you will learn about NYG&B, its services, membership and publications. The sessions cover the use of their online collections.

When you sign up for the course, you have 60 days to complete it. During that time you can review the on-demand lectures and revisit the materials. When the 60 days expire, you can re-register for the course. When you complete the videos for all the sessions, you can enter your name to receive a personalized digital certificate.

NYG&B Certificate

The opportunity to learn about the resources of the NYG&B through the sessions given by their esteemed experts is terrific. The information that I found most interesting was about the resources available on their website, including how to navigate the site and ways to search their databases. The demonstration of how to search their collections through the New York Public Library catalog was invaluable.

It was fascinating to learn about the products created by the scholars in residence, who have created work about the resources at the NYG&B. Those products are available on the website, and might be leveraged in our own research.

Insights into the educational programs and how to use the tools were also important parts of the course. Be sure to download the handouts that accompany each class.

You can register for the online course, as well as learn about the sessions and presenters at: https://www.newyorkfamilyhistory.org/tracing-your-new-york-ancestors-nygb-registration

Surname Study and AI Part 1: The Approach

blog banner - Blog Post Surname and AI 1

This blog post begins a series of posts exploring an ongoing surname study and my recent use of artificial intelligence (AI) in it. In this post, I will describe the history of getting to this point in my efforts.

Over the course of several years, I have been working on a surname study. My goal was to find out if and how families who lived in Rhode Island from 1850-1900 were connected. Chain migration to the United States from Ireland was entirely likely, and by connecting these family units I could potentially research collateral relatives to learn more about the family unit(s) back in Ireland.

Using what I had learned from researching my direct ancestors, these were the parameters:

  • Surname: Gilroy
  • Place: Rhode Island, US
  • Timeframe: 1850-1900

For this project, I collected both federal and state census data to use as the backbone of the research. Then I built upon the intermediate years using vital records. I faced some challenges when collecting the data. At that time, Rhode Island Censuses and vital records were obtained by mailing requests to an incredibly helpful and knowledgeable staff at the Rhode Island State Archives. Copies of the records were available for modest fees, but you needed to supply details about the record you sought. (Contrast that with the ability to search for everyone with the same or similar name in a record set through a digital database.) At the time, that meant that some of the names came from index-only databases as placeholders until copies of the original records could be found. An index of vital records for the state was available on Ancestry, as was a composite of indexed city directories which formed an 1890 US Census substitute.

Another challenge was correlating dissimilar data. Just as every federal census asks different questions, so does every state census. Vital records change what data is recorded over time, too. The data found in city directories is also different from the other records, containing addresses and occupations but lacking explicit family connections.

My main product was an Excel spreadsheet with tabs for the data collected from each record type by year. I worked to reconcile the different data collected from similar record types. From that spreadsheet, I extracted family units, capturing them in PowerPoint to visually show how the family units changed over time. This gave me some insights but was labor intensive. I contemplated my next steps, knowing that analyses of ages, appearances of people with the same surnames in Rhode Island, and child naming patterns, as well as mapping the neighborhoods were among them.

Fast-forward to now, when more records are available online. For example, in addition to the vital record indexes, images of the RI vital record ledgers are now online. The Rhode Island state censuses are also online. And then there is AI to help with formatting, visualizing and analyzing data.  

Some challenges still exist. There were gaps in census coverage, due to the 1890 US Population Census and the 1895 Rhode Island Census no longer being available. The use of other record types will help to fill in the census gaps. A state-specific challenge is the fact that the 1885 Rhode Island Census is available as an alphabetized index of names, requiring family units to be connected using data in the “Family Number” column.
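Reconnecting family units from an alphabetized index comes down to grouping rows by their family number. The sketch below shows the idea. Only the “Family Number” column name comes from the index itself; the “Town” and “Name” column names and the sample rows are placeholders. Grouping by town as well as family number is my own precaution, on the assumption that family numbers may restart in each locality.

```python
from collections import defaultdict


def group_by_family(rows, key_fields=("Town", "Family Number")):
    """Group index rows into family units using the given key columns."""
    families = defaultdict(list)
    for row in rows:
        key = tuple(row[field].strip() for field in key_fields)
        families[key].append(row["Name"].strip())
    return dict(families)


# Placeholder rows, listed alphabetically by name as in the index.
index_rows = [
    {"Name": "Person A", "Town": "Town X", "Family Number": "12"},
    {"Name": "Person B", "Town": "Town Y", "Family Number": "12"},
    {"Name": "Person C", "Town": "Town X", "Family Number": "12"},
]

families = group_by_family(index_rows)
for key, members in families.items():
    print(key, members)
```

In this example, Person A and Person C end up in the same family unit, while Person B, who shares the family number but lives in a different town, is kept separate.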

The state of AI is constantly changing, but I decided to investigate how AI could help with the collection and analysis of data.

I did try an analysis of the whole spreadsheet in ChatGPT, and I was able to create family groups and use them to distinguish between some people who had the same name. However, the data was not combined in an efficient manner, and rather than have one large spreadsheet, I decided it would be more understandable to break the data into more manageable pieces, based on the record types. The composite spreadsheet was broken down into different spreadsheets: (1) censuses, (2) births, marriages, and deaths and (3) city directories. I also decided to use AI to help with the data collection process, the analysis and different ways to visualize the data.

At the end of this step: I had a basic plan to redo the data collection, collect additional data that had become available online, and develop ideas on how AI could support this study. The next step will be to use only census data and have AI create the backbone of a timeline for the individuals and families.