ChatGPT and GEDCOM Files

Before I was a professor, I was a flight test engineer. My love of testing systems goes back to my early days working in a lab during college. My particular gift was always finding a way to “break” hardware or software through use. My desire to investigate the use of ChatGPT in genealogy has definitely coincided with my enjoyment of testing. In this blog post, I take a look at what ChatGPT knows about GEDCOMs, how it builds one, and how it can create a narrative when given an individual’s data formatted in a GEDCOM.

The technical jargon in this paragraph is available for those who want a slightly deeper understanding. In computer science, data can be grouped together into meaningful representations of things that live in the real world. A data structure is a way to group fields in a specific order so that a program can input data, manipulate it, and output it. The way that genealogical data is formatted and shared is the GEnealogical Data COMmunication (GEDCOM) standard.

GEDCOM (Genealogical Data Communication) is a file format used to exchange genealogical data between different genealogy software programs. It is a standard format for saving family tree data, and it allows users to transfer their family tree data from one program to another.

GEDCOM files are saved with the extension “.ged” and are made up of text-based data that includes information about individuals, families, and events such as births, marriages, and deaths. The data is organized in a hierarchical format, with each record containing information about a single individual or event.
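To make the hierarchical format concrete, here is a minimal sketch in Python of how one line of a GEDCOM file can be broken into its parts: a level number, an optional cross-reference ID such as @I1@, a tag, and an optional value. The names `GedcomLine` and `parse_line` are my own for illustration and do not come from any genealogy program, and the sketch assumes simple, well-formed lines.

```python
import re
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # depth in the hierarchy; 0 starts a new record
    xref: Optional[str]   # cross-reference ID such as "@I1@", if present
    tag: str              # tag such as INDI, NAME, BIRT, DATE
    value: str            # remainder of the line, or "" when there is none


# level, optional @XREF@, tag, optional value
_LINE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into level, xref, tag, and value."""
    match = _LINE.match(text.strip())
    if match is None:
        raise ValueError(f"Not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")


sample = """0 @I1@ INDI
1 NAME James Charles /McMahon/
1 SEX M
1 BIRT
2 DATE 10 Oct 1920"""

# Indent each tag by its level to make the hierarchy visible.
for raw in sample.splitlines():
    line = parse_line(raw)
    print("  " * line.level + f"{line.tag} {line.value}".rstrip())
```

The level numbers carry the structure: the DATE line at level 2 belongs to the BIRT line at level 1, which belongs to the individual at level 0.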

GEDCOM files can be used to create family trees, research family history, and share information with other genealogists. They are widely used by genealogy software programs and online genealogy databases. For example, you can export a GEDCOM from your family tree program or download a GEDCOM from Ancestry.com.

NOTE: DO NOT ENTER PRIVATE OR SENSITIVE DATA INTO ChatGPT. Your data may be reviewed by OpenAI to verify that content complies with their policies and safety requirements, and it may be used for training purposes.

I asked ChatGPT what it knew about GEDCOMs with prompts: What is a GEDCOM file? What is the GEDCOM standard? What are the fields in the GEDCOM standard?

ChatGPT answered reasonably well, except that it confidently stated that the latest version of GEDCOM in use was 5.5.1. This is understandable because ChatGPT’s training ended in 2021. (As of the writing of this blog post, the current version is 7.0. For more information see the FamilySearch wiki entry for GEDCOM.)

Knowing that ChatGPT was using GEDCOM 5.5.1 was not a problem for these experiments.

Creating a GEDCOM

I would not choose to build a GEDCOM in this manner, but I could see how entering a narrative about ancestors into the prompt and letting ChatGPT build the relationships from written language could be helpful. Beginning a family tree or adding a separate branch could be done by ChatGPT, then imported into a family tree program.

Investigating how effective ChatGPT was at creating a simple GEDCOM, I asked it to:

Create a GEDCOM file for James Charles McMahon, born 10 Oct 1920, father Joseph Francis McMahon, mother was Ella Small.

GEDCOM file from ChatGPT

ChatGPT extracted the information from my request and filled in the fields. I only asked for a simple GEDCOM file, and had been very specific in what details to include. ChatGPT did fine with this request. You can see the button to copy the code so that I could store it in a file with a .ged extension that would be usable by a family tree program that conformed to the GEDCOM specification. In fact, it even warned me:

ChatGPT warning

By the way, the clipboard next to the response lets a user copy the whole response so that a user can paste the response into the document of their choice. When clicked, the clipboard turns into a checkmark momentarily, then returns to being a clipboard. The thumbs up and thumbs down allow a user to provide additional feedback. If the feedback is thumbs down, another version of the reply is generated and a user has the opportunity to share whether the new one or previous response is better, or if they were the same. Giving feedback is always optional.

ChatGPT feedback

NOTE: This is a representation of an individual in a GEDCOM format and is not a file that can be directly imported into a family tree program. The header and footer information is not present; however, I could give ChatGPT that information and ask it to update the GEDCOM to include it.
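For readers who want to see how the facts in a prompt like mine can map onto GEDCOM lines, here is a minimal sketch in Python. This is not the output that ChatGPT produced; it is my own illustration, and the function name `build_records` and the IDs @I1@, @I2@, @I3@ and @F1@ are assumptions chosen for the example. It links the child to a family record with FAMC, and the parents to the same family with FAMS, HUSB and WIFE.

```python
def gedcom_name(given: str, surname: str) -> str:
    """GEDCOM marks the surname by wrapping it in slashes."""
    return f"{given} /{surname}/"


def build_records(child, birth_date, father, mother):
    """Return GEDCOM lines for a child and two parents.

    child, father and mother are (given, surname) tuples.
    The IDs used here are arbitrary placeholders for illustration.
    """
    return [
        # The child, linked to the family as a child (FAMC)
        "0 @I1@ INDI",
        f"1 NAME {gedcom_name(*child)}",
        "1 BIRT",
        f"2 DATE {birth_date}",
        "1 FAMC @F1@",
        # The father, linked to the family as a spouse (FAMS)
        "0 @I2@ INDI",
        f"1 NAME {gedcom_name(*father)}",
        "1 SEX M",
        "1 FAMS @F1@",
        # The mother, linked to the family as a spouse (FAMS)
        "0 @I3@ INDI",
        f"1 NAME {gedcom_name(*mother)}",
        "1 SEX F",
        "1 FAMS @F1@",
        # The family record ties the three individuals together
        "0 @F1@ FAM",
        "1 HUSB @I2@",
        "1 WIFE @I3@",
        "1 CHIL @I1@",
    ]


records = build_records(
    child=("James Charles", "McMahon"),
    birth_date="10 Oct 1920",
    father=("Joseph Francis", "McMahon"),
    mother=("Ella", "Small"),
)
print("\n".join(records))
```

Like the fragment described above, these lines have no header or trailer, so they are not yet a complete file.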

I tried again with a new prompt that contained more details about the person’s life:

Create a GEDCOM file for James Charles McMahon, born 10 Oct 1920 in Brooklyn, Kings County, NY, father Joseph Francis McMahon, mother was Ella Small. James Charles McMahon died on 28 Nov 1987 in New York, New York, New York, US.

The response filed the additional data correctly into the GEDCOM:

Updated GEDCOM file from ChatGPT

Using the GEDCOM as input to a family tree program

I asked for the file in a couple of different ways, but ChatGPT gave me only the section of the file for an individual. RootsMagic had problems importing this and creating a family tree, but after a little experimentation, I found that was because the file was missing the header and trailer information. This was quickly remedied by editing the file.
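The same fix can be sketched in Python. This is an illustration under assumptions, not a record of the exact lines I typed: the header below is a bare-bones GEDCOM 5.5.1 header, the source name HYPOTHETICAL_TOOL is a made-up placeholder, and the function name `wrap_fragment` is my own. The specification lists more header fields than this, and a particular family tree program may want some of them.

```python
# A bare-bones 5.5.1 header; "HYPOTHETICAL_TOOL" is a placeholder value.
HEADER = [
    "0 HEAD",
    "1 SOUR HYPOTHETICAL_TOOL",
    "1 GEDC",
    "2 VERS 5.5.1",
    "2 FORM LINEAGE-LINKED",
    "1 CHAR UTF-8",
]
TRAILER = ["0 TRLR"]


def wrap_fragment(fragment: str) -> str:
    """Add a header and trailer to a GEDCOM fragment if they are missing."""
    body = [line.strip() for line in fragment.splitlines() if line.strip()]
    if body[:1] != ["0 HEAD"]:
        body = HEADER + body
    if body[-1:] != TRAILER:
        body = body + TRAILER
    return "\n".join(body) + "\n"


fragment = """0 @I1@ INDI
1 NAME James Charles /McMahon/
1 SEX M
"""
print(wrap_fragment(fragment))
```

Saving the result in a file with a .ged extension gives a family tree program something that at least begins and ends the way it expects.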

It was interesting how the placeholder text for the birth and date information for the individual’s mother and father was inserted into the GEDCOM to be interpreted by the program. Of course, this could be fixed later in the conversation by asking for an updated GEDCOM with this information. As the chat went on, I also gave ChatGPT their marriage information and asked it to update the GEDCOM.

Creating a narrative from a GEDCOM

For my next experiment, I copied the second GEDCOM that ChatGPT had generated and fed it back into the prompt, asking:

Write a narrative for James Charles McMahon given his GEDCOM information:

0 @I1@ INDI

1 NAME James Charles /McMahon/

1 SEX M

[the rest of the file is not shown for brevity]

ChatGPT had learned details from our previous conversation, and inserted details about the individual that it had learned from previous GEDCOMs. Starting the request in a new conversation reset its knowledge about the individual to nothing, and the story included only the information from the prompt.

Of course, ChatGPT only uses what I told it. In reality, this individual was not an only child. Interestingly, after it wrote that he grew up in a family of three, with himself and his parents, he was depicted as a beloved brother. This is due to large language models relying on their training to build the next part of their output.

Next, I checked if the format of the input mattered to ChatGPT, and made the GEDCOM data into one continuous stream, rather than distinct lines, in my prompt:

Write a narrative for James Charles McMahon given his GEDCOM information:

0 @I1@ INDI 1 NAME James Charles /McMahon/ 1 SEX M [the rest of the file is not shown for brevity]

ChatGPT did not need the lines of the file to be formatted; it interpreted the data correctly and then wrote a narrative. (This is also true when entering data from a table into the prompt.) Without information about the deaths of the individual’s parents, the model wrote that they survived him and, in the same sentence, that they were deceased before his passing. ChatGPT can appear to lose its mind, so always proofread any output before using it.
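One reason a continuous stream can still be interpreted is that every GEDCOM line begins with a level number followed by a tag, so the line boundaries can usually be recovered. Here is a rough sketch of that idea in Python. It is a heuristic of my own, not how ChatGPT works internally: the function name `unflatten` and the short list of tags are assumptions for this example, and a value that happens to contain a number followed by one of these tags would be split incorrectly.

```python
import re

# A small, incomplete set of tags used only for this illustration.
KNOWN_TAGS = (
    "INDI", "NAME", "SEX", "BIRT", "DEAT", "DATE", "PLAC",
    "FAMC", "FAMS", "FAM", "HUSB", "WIFE", "CHIL", "HEAD", "TRLR",
)

# Whitespace followed by: level, optional @XREF@, then a known tag.
_BOUNDARY = re.compile(
    r"\s+(?=\d{1,2} (?:@[^@\s]+@ )?(?:" + "|".join(KNOWN_TAGS) + r")\b)"
)


def unflatten(stream: str) -> list:
    """Split a one-line GEDCOM stream back into separate lines."""
    return _BOUNDARY.split(stream.strip())


stream = (
    "0 @I1@ INDI 1 NAME James Charles /McMahon/ 1 SEX M "
    "1 BIRT 2 DATE 10 Oct 1920"
)
for line in unflatten(stream):
    print(line)
```

The day of the month in the date is not mistaken for a level number because the word after it is not one of the listed tags.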

Next, I carved out the lines for this individual from a GEDCOM that had been exported from a family tree program, complete with source citations embedded in the code. I used this text as input to ChatGPT, and asked it again to write a narrative from the GEDCOM. ChatGPT was successful in capturing the details it knew. It also created some generalizations like: “Throughout his life, James was a beloved member of his family and community.” It also added context without being prompted: “Though we don’t have much information about his specific experiences, we can imagine that he lived through many significant moments in history, including World War II and the civil rights movement.”

The tales that ChatGPT weaves from a user’s input can be a combination of technically accurate and fanciful. The facts that are input can be woven into a smoother and grammatically correct output. Any additional text that ChatGPT generates, or any contextual content it adds, needs to be verified. ChatGPT is a generative language model that creates sentences without judgement, and everything it writes is presented as correct. (Always check the details that ChatGPT adds, as it may “hallucinate”!)

ChatGPT generates text with an optimistic tone. The tales do all seem to end on a positive note, reminding me of appending “and a good time was had by all” to a story.

As with any tool, how we use the output matters. ChatGPT has the flexibility to regenerate a response to our prompt, and we have the ability to edit the text as we see fit. This tool could be helpful to a genealogist trying to get started on that family history they have been planning to write. ChatGPT can help someone get around writer’s block by providing a starting place. It can also proofread what you generate. All you have to do is ask.

It was instructive to see how the narrative text that was put into the prompt was translated into lines in the GEDCOM file. I enjoyed peeking under the hood of the implementation that is at the heart of family tree programs.

Let me know how you do, and send along any questions.

ChatGPT May 3 Version was used for these experiments. Expect ChatGPT to change over time as the technology matures.

Please check out other posts about ChatGPT and Artificial Intelligence:

5 Ways to Use ChatGPT to Research an Ancestor

Getting Started with ChatGPT

Artificial Intelligence and Genealogy

Cultural Anthropology and Genealogy

Cultural Anthropology

Last semester I took a third course in anthropology. After taking courses in Archaeology and Biological Anthropology, the next for me to tackle was Cultural Anthropology. (Our local community college does not offer a course in the fourth area of anthropology, linguistic anthropology.) Due to the nature of the subject material, this class was the least rooted in hard science. Cultural Anthropology studies how a society organizes itself. This is done through its beliefs, and how people live, think, create and find meaning. It introduces the concept that cultures have an intrinsic logic in their practices.

A big part of this branch of anthropology is fieldwork. Anthropologists in the field study societies, collecting data to build ethnographies. This data is often qualitative. Originally fieldworkers studied societies as impartial and distant observers; later they shifted to coming off the veranda to be participant observers.

When we go beyond our ancestors’ birth and death dates to fill in the dashes with what they did between those two dates, we are doing something similar to the fieldwork done by anthropologists. We often wish that we could go back in time and come off the veranda to be participant observers. Lacking that option, we can use the older anthropologists’ method of building their work on others’ first-hand source material. In our pursuit, we can use published sources that were contemporary to our ancestors to learn about their culture in their time. When we research and write about our ancestors, we are building an ethnography. We can interact with the artifacts that they and their contemporaries left behind, which is like the activities of archaeologists.

Even though we cannot be participant observers in our ancestors’ society during their time, sometimes we can participate in a society that is close to theirs. This can be done through participating in ethnic crafts, cooking, dancing, clothing, reading the books they read, learning stories they told and heard, and learning about or practicing their beliefs.

Interview with Mark Hildebrand about the Annapolis Past Port Wiki

Recently we had a chance to speak with Mark Hildebrand, the Executive Director of Make Your Mark Media, Inc., in Annapolis, MD, to discuss a remarkable collaborative project that captures the memories and history of community members. In this interview you can learn about a creative and engaging approach to capturing history and how you can participate in this project.

What is the Annapolis Past Port Wiki?

Basically, Annapolis Past Port is a history wiki for stories and history in and around Annapolis, Maryland. It is free and open to the public, and it uses the same software and structure as Wikipedia. It was created in 2017 as part of a summer internship program by Make Your Mark Media – an Annapolis-based nonprofit. The interns were Science Technology Engineering and Mathematics (STEM) students enrolled in Anne Arundel County Public Schools. We wanted to capture stories about people, places, things and events as they are remembered by the community that experienced or heard about them. The focus of Annapolis Past Port is not the hard facts and statistics that are on Wikipedia already, but the memories, stories and even tall tales that should be preserved for future generations.

What motivated you to begin the Annapolis Past Port Wiki?

Several years ago, one of my board members asked me if there was a way to develop a public database of historic sites and people in the Annapolis area. She had attended a conference in New England where they showcased one developed by a local historic society. Because it was a series of web pages, it seemed a bit restrictive and reliant upon a web developer to create the content. I found that a wiki could be a great format that would make it easy for the public to upload content and share it. And unlike Facebook or Blogs, that content would not get buried under subsequent entries.

What challenges have you faced with the wiki and what surprises have you had?

One of the biggest challenges has been to get others to add content to the wiki. I have created most of the current pages, and although I have reached out to local historians and even done a few public workshops, few have taken the next step to upload their research or stories. So I was very pleasantly surprised when one of our interns from this past summer created wiki pages for a Nike missile site I had never heard of, just outside of Annapolis. And then one on Lee Airport in Edgewater. As with all of the wiki pages, they need more information and contributions from other sources, but they are a great beginning.

Where can people find out more about Past Port?

You can find the Annapolis Past Port history wiki at pastport.org. You can browse and search the wiki. The Main Page has a list of some of the recently added pages. There are images as well as audio and video. 

How can people participate in the Annapolis Past Port?

Anyone is welcome to create an account and add or edit content. We have provided links to guidelines on creating and formatting content. Through links to Make Your Mark Media (www.mym-media.org) you can contact me at mark@mym-media.org for direct assistance.

NOW AVAILABLE: Our New WWI Research Guide

Our newest book is NOW AVAILABLE!

Researching U.S. WWI Military Members, Military Organizations and Overseas Noncombatants:

A Research Guide for Historians and Genealogists

Have you been wanting to do research about the military and supporting organizations in World War I? With these 30 chapters, this book shows how you can learn about the service of a U.S. World War I military member, WWI military organizations and about noncombatants who went overseas.

Based on feedback for the popular “Researching Your U.S. WWI Army Ancestors” and questions asked during popular lectures, this book reaches beyond researching ancestors in the Army to include information about researching service members in the U.S. Navy, the U.S. Marine Corps, the U.S. Coast Guard and the Merchant Marine, along with the civilian noncombatants who went overseas to support the troops. The strategies presented can also be used in larger projects to research a military organization.

Among the topics covered are how to research the U.S. Army, the U.S. Navy, the U.S. Marine Corps and the U.S. Coast Guard. Also included are some starting places for civilian organizations who supported the troops overseas. Information about the Merchant Marine and prisoners of war is also included. Other chapters cover specific record sets. There is a chapter about researching fallen service members who died overseas. A variety of sources are presented for digging deeper, with the types of sources and where to find them. There are ideas about using social media and what to do with what you learned.

This book will lead you to use a timeline so that you can capture what you will learn during your WWI research. Learn to use a variety of resources including online records, social networking, archives and how to expand your search to other places where material from WWI can be found. It contains ideas to turn your research into works that can be shared with others.

“Researching U.S. WWI Military Members, Military Organizations and Overseas Noncombatants” can be found on Amazon.

Book Review: “History for Genealogists”

"History for Genealogists" book cover

When I envision a commercial for this book, it would have to be a full infomercial rather than a short spot between segments of a favorite program. Timelines are well known tools for genealogy, and are my go-to tool for unraveling mysteries. This book contains historical timelines and so much more. Ms. Jacobson gives context to the timelines, which in turn add context to the genealogical research of individuals and families.

Using history in our genealogy is that extra step to bring our research to a higher level by understanding our ancestors’ lives in the context of the world around them. It is common to hear others suggest going out on the web to find events to add to our timelines. How do you choose what to add? What timeline do you look at? Our ancestors made changes for a reason, and this book provides us with matter and timelines about the reasons motivating those changes.

Among the many things discussed in the book was the role of Europeans coming to the US to farm. Since my ancestors lived in cities, I had not previously investigated this topic in depth. Railroads received large grants of land from the federal government, and so set up a system for Europeans to purchase land and then travel to occupy it. More than transporting people, they had actually streamlined the process of coming to the United States.

The chapter about oral histories impressed me. It was a succinct but rich outline of how to conduct them. The author’s motivating words say it best: “Oral history can put the soul and flesh on the skeleton of a pedigree chart.” This quote applies to the intent of the whole book.

This book is a good starting place with historical timelines relevant to genealogical research. This book contains a timeline for the history of each state and the District of Columbia, from its first beginnings to the 1940s for most locations. The book expands to discuss other geographical regions around the world.

The chapters “Why Did They Leave,” “How Did They Go” and “Coming To America” were thought provoking. Brief case studies show the role of timelines in interpreting an ancestor’s life when viewing it in the context of a bigger history, or if too many events have been attributed to one individual. At all times we are reminded of the interconnection between different counties, and the fluid borders between countries, states and counties.

The 2016 Addendum by Denise Larsen is a separate part of the book, positioned after the original book’s timeline, bibliography and index. The Addendum covers the context and events of the early 20th century in the US up to post-WWII, followed by a timeline about fashion and entertainment.

I read this book cover-to-cover, and can recommend that approach to open a reader’s horizons. However, this book is structured so that it can be used by the chapter applicable to your current research question.

My recommendation is to have maps nearby when reading or using this book. Online maps would be a perfect accompaniment to use when comprehending the interactions between locations and their populations.

“History for Genealogists” provides key historical context and usable information for your research. It also lives up to its subtitle of “Using Chronological Time Lines to Find and Understand Your Ancestors.” As well as being a resource to support your research, it is a solid foundation to jump off from to dig deeper into the more detailed history of a place and time that you are researching. I can see this book being used to complement locality research, by introducing time and events to your research.

The book is available at Genealogical.com and other booksellers.

Note: A review copy was provided by the publisher.

This blog post is copyright ©2022 by Margaret M. McMahon, Teaching & Training Co., LLC. All rights reserved. No part of this post may be reproduced in any manner whatsoever without written permission, except in the case of brief quotations in articles and reviews. All copyrights and trademarks mentioned herein are the possession of their respective owners and the author makes no claims of ownership by mention of the products that contain these marks.

New Offering: Member Survey plus Class

We offer a new service!

Have you wanted to learn more about your society members’ current interests? We can help.

When booking the presentation “Creating an Individualized Genealogical Educational Plan,” we can work with your society to help you learn more about your members’ current interests.

Here’s what is included with the speaker’s fee:

  • Work with your designated society member to create a customized survey
  • Provide a link for society members to use
  • Provide a brief report, with suggestions about how to use the results

Here is a review from the Baltimore County Genealogical Society:

As always, our society meeting attendance is higher with any of Dr. McMahon’s presentations. It is a reflection of how valuable the information she has to offer is in expanding ancestral research. Her latest guide, Creating an Individualized Genealogical Education Plan provides an introspective approach to research that is deeper than the traditional “to do” list. With many societies and genealogy groups stepping up their outreach with more online content and lectures via Zoom, the Educational Plan presentation is practical and essential for targeting your research goals.

Contact us to book your society’s survey and talk!