Surname Study and AI Part 1: The Approach

Blog banner - Blog Post Surname and AI 1

This blog post begins a series of posts exploring an ongoing surname study and my recent use of artificial intelligence (AI) in it. In this post, I will describe the history of getting to this point in my efforts.

Over the course of several years, I have been working on a surname study. My goal was to find out if and how families who lived in Rhode Island from 1850-1900 were connected. Chain migration to the United States from Ireland was entirely likely, and by connecting these family units I could potentially research collateral relatives to learn more about the family unit(s) back in Ireland.

Using what I had learned from researching my direct ancestors, these were the parameters:

  • Surname: Gilroy
  • Place: Rhode Island, US
  • Timeframe: 1850-1900

For this project, I collected both federal and state census data to use as the backbone of the research. Then I filled in the intermediate years using vital records. I faced some challenges when collecting the data. At that time, Rhode Island censuses and vital records were obtained by mailing requests to the incredibly helpful and knowledgeable staff at the Rhode Island State Archives. Copies of the records were available for modest fees, but you needed to supply details about the record you sought. (Contrast that with the ability to search for everyone with the same or similar name in a record set through a digital database.) As a result, some of the names came from index-only databases and served as placeholders until copies of the original records could be found. An index of vital records for the state was available on Ancestry, as was a composite of indexed city directories that formed an 1890 US Census substitute.

Another challenge was correlating dissimilar data. Just as every federal census asks different questions, so does every state census. Vital records change what data is recorded over time, too. The data found in city directories is also different from the other records, containing addresses and occupations but lacking explicit family connections.

My main product was an Excel spreadsheet with tabs for the data collected from each record type by year. I worked to reconcile the different data collected from similar record types. From that spreadsheet, I extracted family units, capturing them in PowerPoint to visually show how the family units changed over time. This gave me some insights but was labor intensive. I contemplated my next steps, knowing that analyses of ages, appearances of people with the same surnames in Rhode Island, and child naming patterns, as well as mapping the neighborhoods were among them.

Fast-forward to now, when more records are available online. For example, in addition to the vital record indexes, images of the RI vital record ledgers are now online. The Rhode Island state censuses are also online. And then there is AI to help with formatting, visualizing and analyzing data.  

Some challenges still exist. There are gaps in census coverage because the 1890 US Population Census and the 1895 Rhode Island Census are no longer available. Other record types will help to fill in those gaps. A state-specific challenge is that the 1885 Rhode Island Census is available as an alphabetized index of names, so family units must be reconnected using the data in the “Family Number” column.
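To illustrate the “Family Number” idea, here is a minimal Python sketch of how rows from an alphabetized index could be regrouped into family units. The column names other than the family number, and all of the people and numbers in the sample rows, are invented for illustration and are not from my actual data.

```python
from collections import defaultdict

# Hypothetical rows as they might appear in an alphabetized index.
# Names, ages, and family numbers are invented for illustration.
index_rows = [
    {"name": "Gilroy, Ann", "age": 38, "family_number": 112},
    {"name": "Gilroy, Bridget", "age": 9, "family_number": 207},
    {"name": "Gilroy, James", "age": 41, "family_number": 112},
    {"name": "Gilroy, Mary", "age": 12, "family_number": 112},
    {"name": "Gilroy, Patrick", "age": 35, "family_number": 207},
]


def group_by_family_number(rows):
    """Regroup alphabetized index rows into family units."""
    families = defaultdict(list)
    for row in rows:
        families[row["family_number"]].append(row)
    # List each family's members oldest first, families in numeric order.
    return {
        number: sorted(members, key=lambda r: r["age"], reverse=True)
        for number, members in sorted(families.items())
    }


family_units = group_by_family_number(index_rows)
for number, members in family_units.items():
    print(number, [m["name"] for m in members])
```

If family numbers turn out to repeat from one town to another (something I would check against the records first), the grouping key would need to include the town as well.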

The state of AI is constantly changing, but I decided to investigate how AI could help with the collection and analysis of data.

I did try an analysis of the whole spreadsheet in ChatGPT, and I was able to create family groups and use them to distinguish between some people who had the same name. However, the data was not combined in an efficient manner. Rather than keep one large spreadsheet, I decided it would be more understandable to break the data into more manageable pieces based on record type. The composite spreadsheet was broken down into three spreadsheets: (1) censuses, (2) births, marriages, and deaths, and (3) city directories. I also decided to use AI to help with the data collection process, the analysis, and different ways to visualize the data.
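For readers who like to see the idea in code, the split by record type could be sketched in Python as follows. This is only an illustration: the record type labels, the column names, and the sample rows are my assumptions, not the layout of the actual spreadsheet.

```python
# Assumed mapping from a row's record type to one of the three spreadsheets.
RECORD_GROUPS = {
    "federal census": "censuses",
    "state census": "censuses",
    "birth": "vital_records",
    "marriage": "vital_records",
    "death": "vital_records",
    "city directory": "city_directories",
}


def split_by_record_group(rows):
    """Sort composite rows into censuses, vital records, and city directories."""
    groups = {"censuses": [], "vital_records": [], "city_directories": []}
    for row in rows:
        record_type = row["record_type"].strip().lower()
        if record_type not in RECORD_GROUPS:
            raise ValueError(f"Unrecognized record type: {row['record_type']}")
        groups[RECORD_GROUPS[record_type]].append(row)
    return groups


# Invented sample rows standing in for the composite spreadsheet.
composite_rows = [
    {"record_type": "State Census", "year": 1885},
    {"record_type": "Marriage", "year": 1872},
    {"record_type": "City Directory", "year": 1890},
    {"record_type": "Federal Census", "year": 1880},
    {"record_type": "Death", "year": 1891},
]

split_rows = split_by_record_group(composite_rows)
for group_name, members in split_rows.items():
    print(group_name, len(members))
```

Raising an error for an unrecognized record type is a deliberate choice here, so that a mistyped label is noticed instead of silently dropped.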

At the end of this step, I had a basic plan to redo the data collection and collect additional data that had become available online, and I had developed ideas on how AI could support this study. The next step will be to use only census data and have AI create the backbone of a timeline for the individuals and families.

Book Review: Your Stripped Bare Guide

blog banner - Book Review Your Stripped Bare Guide

Having earned a Ph.D., worked as a professor, and published research, I know that citing sources is essential in academic work. Having published in multiple disciplines, I have used different styles of citations and variations of those styles. Students can usually understand why direct quotations need to be cited, but do not always grasp why the facts they use in their writing must also be attributed. The methods in technical papers are explained so that they can be reproducible. For a technical discipline, citations can be used by readers to go upstream to the authoritative sources and investigate the source material for themselves. Those sources are typically published and readily accessible.

Genealogical writing is different. Using historical sources can be far more complicated. They are evidence of past events, which supports interpretations of that past. Access to an historical source may be limited. A source may exist in one location with restrictions limiting its physical access, which means that readers may not be able to examine the source themselves. A source may have been destroyed or lost, leaving us with only an image or description of it. It may exist only in the private files of a researcher or in unpublished manuscripts. Primary sources may also be subjective, which introduces another layer of evaluation. Therefore, citations of historical sources need to convey to readers the information about the source and the implications about its reliability.

This is where Your Stripped Bare Guide to Citing & Using History Sources becomes invaluable. Ms. Mills wants us to understand why and how to do this effectively. Improving how we cite our sources will result in better products. As we evaluate our sources, our citations present our evaluation of their reliability to others. She guides us as we turn our evidence into proof.

There are sobering thoughts throughout this book. Without DNA evidence, attributing relationships between individuals in the past is built on trust in the informants. The authenticity of a tombstone image downloaded from an online cemetery website differs from that of an image we photographed on our own in-person visit to a cemetery.

All sources are not created equal, nor do all have the same weight when considered as proof. From the very start of our research, we need to not only be tracking where our sources come from but evaluating them as we collect them. We must also be wary of bad data; multiple references to it should not be mistaken for proof of its veracity. There is a good reminder to put full citations on the front page of all our notes or copies of documents.

Enjoy the guidelines for analyzing evidence. The book also includes universal templates and construction notes for our use, including templates for the daunting layered citations with an explanation of why they are important. While you might be tempted to ignore Appendix One until you need to define terms, I recommend that you review it early to make sure that you understand the language of citing and using historical sources. (The first term you look up should be “q.v.”)

Consider allowing Ms. Mills to guide you toward stronger, clearer, more reliable research!

The book can be found at: https://genealogical.com/store/your-stripped-bare-guide-to-citing-using-history-sources/

Note: A review copy was provided by the publisher.

More WWII Morning Reports in the NARA Catalog

Blog banner - More WWII Morning Reports in the NARA Catalog

It should not be a surprise to anyone who has read my most recent book, Finding and Using U.S. Army WWI and WWII Morning Reports: A Research Guide for Historians and Genealogists, or seen my presentation about Morning Reports that I periodically check the NARA catalog. Specifically, I have been checking to see if reports later than July 1944 have been uploaded. Today I searched, and success!

As a reminder, I search from the main catalog page at https://catalog.archives.gov/ so that I can benefit from the links to the search terms in the results. To see if more Morning Reports are available, I search for terms such as: “Morning Reports” AND “October 1944”

This time there were results! I kept searching, and Morning Reports up to December 1944 are available. (Search terms: “Morning Reports” AND “December 1944”) I did check for any from 1945, but there were no results (yet!).
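If you plan to repeat this check month by month, a small Python sketch like the one below can write out the search terms for you to paste into the catalog search box. It only builds text in the pattern shown above. It does not contact the NARA catalog, and the function name is my own invention.

```python
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def morning_report_queries(year, first_month, last_month):
    """Build one search string per month, following the pattern in this post."""
    return [
        f'"Morning Reports" AND "{MONTH_NAMES[month - 1]} {year}"'
        for month in range(first_month, last_month + 1)
    ]


# August through December 1944, the months after July 1944.
queries = morning_report_queries(1944, 8, 12)
for query in queries:
    print(query)
```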

Of course, I did search for my father, by name and by serial number. This time I did find a mention of him. My brother was the first to know of this find, grateful that this was not a middle-of-the-night call!

For tonight, I will share that SGT James C. McMahon appeared in a Morning Report for 13 October 1944, still in Narsarssuak, Greenland. July 1944 had left me with a cliffhanger, and this record provided data about two military organizations with which he served after the 500th AAA Gun Battalion left Greenland.

Morning Report 416th Base Hq & Air Base Squadron, 13 October 1944

I will post about my continuing research as I use Morning Reports to reconstruct my father’s WWII service. From here, I will be busy moving forward and backward in time to track what was happening in Greenland.

Thank you to NARA! Good luck searching, and let me know how you do!

AI: Meta Prompting

Blog Banner AI Meta Prompting

If you have attended one of my AI presentations, then you know how important it is to develop prompt engineering skills to get the most out of Large Language Models (LLMs). The good news is that we do not always have to create the perfect prompt on our own!   

There is a harsh term used in my field, GIGO, which stands for Garbage In, Garbage Out. When it comes to AIs, this applies to the fact that the LLM response (output) will only be as good as our prompts (input).  

A simple explanation of meta prompting is to have one Large Language Model (LLM) create a prompt for another one. Meta prompting is more involved than that because it builds a prompt with more specific instructions about the steps to take to realize the goal of the prompt. It is as if the LLM is translating what you want to do into LLM language!

The cinematic arts student at my home gave me some insights into his practical use of meta prompting. He was having an issue with an AI that generates video. It was not creating what he was describing, so he turned to ChatGPT to explain his vision and ask for a prompt to use for generating that image. ChatGPT dutifully responded with a prompt that did work with the AI video generator. The message is that when it comes to crafting prompts, we are not on our own.

While working to understand meta prompting, I thought of an example application to try before applying this skill to genealogy. I asked ChatGPT to create a prompt for me that I could use to have a research report generated for me about a topic. I also specified what and how I wanted to investigate the topic, as well as the fact that I wanted sources and in-text citations. Using the power of the AI to recognize patterns, I certainly wanted analysis to be part of generating the data in the report.

Prompt for a prompt to generate a world building prompt

A prompt was created, but ChatGPT had some specific questions that it included in its response about the type of citation I wanted and asked if there were other constraints, such as word count or including quotes. We had a conversation to refine the prompt, starting with a 308-word prompt and concluding with the final response which was a modular, reusable 1122-word prompt.

The prompt began with: “You are an expert in …”

The prompt contained sections for FOCUS & SCOPE, RESEARCH & SOURCES, STRUCTURE OF THE REPORT, STYLE & LENGTH and FINAL OUTPUT.

ChatGPT’s prompt also included some interesting anti-hallucination guidance: “If there are areas where evidence is limited (for instance, few direct author comments about a particular name), clearly indicate uncertainty and base comments on reasonable inference, not fabrication.”
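To make the shape of such a request concrete, here is a rough Python sketch that assembles a meta prompt as plain text. It is not the prompt ChatGPT wrote for me. The section headings come from the list above, the uncertainty rule is my paraphrase of the quoted guidance, and the function name, the example topic, and the citation style are assumptions made for illustration.

```python
# Section headings reported for the generated prompt.
SECTIONS = [
    "FOCUS & SCOPE",
    "RESEARCH & SOURCES",
    "STRUCTURE OF THE REPORT",
    "STYLE & LENGTH",
    "FINAL OUTPUT",
]

# Paraphrase of the anti-hallucination guidance, not the exact wording.
UNCERTAINTY_RULE = (
    "If evidence is limited, clearly indicate uncertainty and base comments "
    "on reasonable inference, not fabrication."
)


def build_meta_prompt(topic, citation_style, word_limit=None):
    """Assemble text asking an LLM to write a report-generating prompt."""
    lines = [
        "Write a reusable prompt that I can give to a large language model",
        f"to generate a research report about: {topic}.",
        "Organize the prompt you write under these headings:",
    ]
    lines += [f"- {section}" for section in SECTIONS]
    lines.append(
        f"Require sources with in-text citations in {citation_style} style."
    )
    if word_limit is not None:
        lines.append(f"Limit the report to about {word_limit} words.")
    lines.append(f"Include this instruction: {UNCERTAINTY_RULE}")
    lines.append("Before writing the prompt, ask me about anything unclear.")
    return "\n".join(lines)


# Hypothetical topic and citation style, chosen only for the example.
meta_prompt = build_meta_prompt(
    topic="child naming patterns in a surname study",
    citation_style="Chicago",
)
print(meta_prompt)
```

The text that is printed would be pasted into a chat, and the refining conversation would carry on from there.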

I decided to use the prompt in ChatGPT, and opened a new chat. I pasted in the prompt, and it responded with a request for clarification:

ChatGPT asks for clarification

It offered me options, providing details, which are omitted for brevity:

  • Option A — Use only 100% verifiable, well-known, widely documented sources
  • Option B — Allow me to cite plausible but harder-to-verify sources
  • Option C — A blended approach

Then it asked me to respond with which option it should use:

ChatGPT asks for which option to use

After the clarification interaction, ChatGPT told me that its reply would be long:

ChatGPT advising me of a long reply

It waited for my response before it began to generate the report:

My response to generate the report

The report was reasonable and described patterns. ChatGPT offered me formats for downloading the report and other products based on the report, such as an executive summary and PowerPoint presentations. If I want to dig deeper, this report is valuable to me as a starting place.

Of course, the caveats remain about not using this for school reports (unless the assignment calls for the use of AI) and not submitting it to a client. There can be tell-tale signs of an AI-generated report, as I know from a high school science fair project done by that same cinematic arts student and from documentation on the web.

So, will you try meta prompting? Let me know how you do.

Book Review: Genealogy in Reverse

Blog Banner book review Genealogy in Reverse

Have you ever wanted to look at the playbook that someone who specializes in finding living descendants has created? Have you wondered what resources a researcher would employ online and in the real world? Have you wanted to use such a playbook in your own reverse genealogy efforts? Within its 54 pages, Genealogy in Reverse: Finding the Living lets you have your own copy of concise notes written by Ms. Passey. She does this professionally and her impressive credentials include working as a subcontractor in the important genealogical work involved in repatriating the remains of WWII service members.

I have a cybersecurity certification in ethical hacking, and my favorite phase of the process is reconnaissance. Although DNA is not a part of the hacking methodology (yet), and the goals are different, the skills I have learned have assisted me in finding heirs and helping a museum find living descendants of a person that they planned to honor. These efforts made me curious about seeing that playbook, too.

The methodology and guidance for finding the living that are presented in the book are based on sound principles. Those principles are shared with the reader. There are instructions and screenshots for importing and exporting family trees between Ancestry.com and Legacy Family Tree Software, and how to split off a separate tree in Legacy Family Tree Software. The use of online databases and websites is part of the process, as is research at real-world locations. The value of newspapers is shown; helpful details may be tucked within their pages. Suggestions about contacting living people are also included. The final chapter contains a brief anonymized repatriation case. 

The book is intended to benefit all levels of genealogists. To make the most of the tools, methods and guidance introduced, I would recommend that a genealogist be comfortable with the individual parts of the process. It would be beneficial for a genealogist to be comfortable with using online family trees and/or a family tree software program of their choice, have a little experience with newspaper research, and a good understanding of DNA types and testing plans.

The book can be found at: https://genealogical.com/store/genealogy-in-reverse-finding-the-living/

Note: A review copy was provided by the publisher.