Learning About AI

Have you wanted to learn more about Artificial Intelligence?

Recently I gave a talk, “Using AI for Genealogy,” and shared some of my sources for learning about AI. You can find out more about the talk, and if you want to learn from a genealogist who is a professor with a Ph.D. in Computer Science and Engineering, you might consider having your group book it.

There are many resources for learning how to get started with generative AI, and for finding ideas about using it in genealogy. Among them are the posts on this blog.

NOTE: DO NOT put any sensitive information into any AI tool.

The first recommendation is a paper that you can download. Genealogists need to learn about prompt engineering to use AI tools effectively, and a paper that offers a catalog of prompt patterns is a good place to start. The paper is “A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT.” The prompts presented in the paper are general in nature, but they can be applied to genealogy. Academic papers can look intimidating, but they do not have to be! You can copy and paste parts of the paper into ChatGPT (or another text-to-text AI tool) and ask it to summarize or explain them. My specific advice is to look at the tables labeled “Contextual Statements” to learn the patterns. I copied text from these tables and combined it with explanations of the patterns generated by an AI tool to create a personal cheat sheet.

If you want to dig deeper and understand more, you may want to look beyond genealogical applications and learn about the technology. Understanding what the tools are and how they work might help you be more comfortable with using them and applying them in genealogy.

In its “AI Ready” commitment, Amazon Web Services (AWS) has set a goal to train 2 million people. As part of this commitment, AWS offers free courses about AI, written for all levels of knowledge. On the AWS webpage describing the commitment, scroll down to the section “Courses for business and nontechnical audiences” and follow the links to register for the courses. A free account is needed. “Introduction to Generative Artificial Intelligence” is a good starting point, with simple and understandable explanations and no formal assessments. (That means no tests!)

AWS Courses for business and nontechnical audiences list

If you want to learn in a more structured way, there are online classes available. These are more formal, with structured lessons and activities that you have to turn in. That should not intimidate you, as these courses are designed for beginners who have little or no technical background. Nor should the beginner level dissuade people with more experience, as there is always something to learn in courses like these. From the AWS list, I enjoyed the first and simplest course, as well as the courses in the series “Generative AI Learning Plan for Decision Makers” and “Foundations of Prompt Engineering.”

Coursera

Coursera offers “Prompt Engineering for ChatGPT.” It is taught by the professor who wrote the paper that I recommended. If you take this course for free, be sure to set aside time for it each week, because after the class ends the course materials are available only to paying participants. I found this to be a very enjoyable course; the assignments were as simple as using ChatGPT 3.5 to try the patterns from the lessons and submitting the prompt and response via a text box.

Coursera Prompt Engineering for ChatGPT image

GALE Courses

Another course that I have begun is “Introduction to Artificial Intelligence” on GALE Courses (formerly known as Learn4Life). GALE Courses may be available for free through your local library’s website, or through a neighboring county’s library if it will issue you a library card. For those in Anne Arundel County, Maryland, GALE Courses are offered by the Howard County Public Library, so get a library card from them. (The course description at Howard County Public Library is here: https://education.gale.com/l-howardmain/online-courses/introduction-to-artificial-intelligence/?tab=detail) These are 6-week courses, organized into two lessons per week, with discussion boards and ungraded quizzes. To obtain a certificate for this course, you have to pass a multiple-choice final exam that appears to be based on the ungraded, optional quizzes for each lesson. Check your library’s website for an alphabetical listing of online resources, or contact a librarian.

Howard County Library System GALE Courses Introduction to Artificial Intelligence course

This is a technology course about the science of how a computer can perform tasks that usually require human intelligence. It covers the forms of AI, how AIs learn, AI applications, and ethics. It is not something that you can use immediately for genealogy, but it will provide a foundation as AI tools become more and more common.

No matter how you decide to learn, keep learning!

Let me know how you are learning about AI.

NOTE: I have no affiliation with any of the courses or services in this post.

Shake That Family Tree Event

On 14 October 2023 the Howard County Genealogical and Historical Societies and the Howard County Public Library System organized the “Shake That Family Tree” event at the Miller Library in Ellicott City, MD. This was intended as a beginner-level event, but there was certainly great information for all the genealogists who attended.

I was delighted to have been invited to host a table about military research and my books. All day long there were interesting talks, and a room full of tables with representatives from local history and genealogical societies who were eager to share information about what they were doing and offer help to genealogists at all levels.

Many of the people who stopped by did not know whether they had ancestors who served in WWI. The best place to check is the VA Master Index database on FamilySearch, which is covered in this blog’s post “Did My Ancestor Serve in WWI?”, updated to reflect the changed search interface.

The Howard County Genealogical and Historical Societies and the Miller Library hosted a wonderful event, and it is certainly my hope that this might become an annual event!

"Shake That Family Tree" tables

Back to School: Genealogy Style

When autumn comes, we think of going back to school. Genealogists are always learning, and webinars are a great way to do that. Presentations give us information, introduce us to new techniques, or provide a new way of looking at our research. The resources in this post offer great classes and more.

The Genealogy Center at the Allen County Public Library hosts the Periodical Source Index (PERSI) and has many recorded webinars available on its YouTube Channel. You can even send them an email if you have a question.

ACPL Genealogy Center Home Page

The Midwest Genealogy Center at Mid-Continent Public Library offers a variety of resources. You can even request an Appointment with a Genealogy Consultant. Be sure to check out their upcoming events and register for them on their events page. You can view their recorded talks on their YouTube channel.

The Midwest Genealogy Center at Mid-Continent Public Library Home Page

BYU’s Harold B. Lee Library offers new webinars every week. They also offer a large library of recorded webinars.

BYU Library Webinar Page

Of course, for the more adventurous, consider a class at your local community college.

When everyone around is going back to school, join them!

Artificial Intelligence: Google Bard vs. ChatGPT

It is inevitable that similar AI tools will be compared. This blog post compares OpenAI’s ChatGPT and Google Bard.

When Google Bard was asked how it was different from ChatGPT, it answered that its training data contained images, that it could access the internet, and that it was a more general AI rather than a text-generating AI. Bard also told me that while ChatGPT was creative, Bard was more creative.

Google Bard has an interesting approach to answering prompts. Unlike ChatGPT, whose training data ends in 2021, Bard can go out and get content from the web to generate its answers.

Like ChatGPT, Google Bard can also hallucinate and authoritatively state inaccurate information. The “Google it” button found under a Bard response can help you confirm that the answer is not a hallucination.

These two AI tools do have some differences.

ChatGPT offers multiple conversations so that a user’s chats can stay organized. It can also display previous conversations and pick up where it left off. Google Bard holds one conversation. Users can return to their previous prompts by selecting Bard Activity from its menu, but the responses must be saved during the session because they cannot be recalled later.

Bard responses can contain images returned from the web. The responses, without any images, can be uploaded to a document in the user’s Google Drive or into a Gmail draft. ChatGPT responses are text-only and need to be copied and pasted from the browser into a document. (Note: browser add-ons or plugins to capture responses are not discussed in this blog post. Any code you add to your browser this way should be researched thoroughly!)

Given that the content generated by Bard is not owned by the user, I will probably use ChatGPT for generating text and explaining concepts. (Of course, what ChatGPT generates should be verified!) I also prefer that ChatGPT saves responses and allows several chats to be active.

I might lean on Google Bard more for research questions, and will ask Bard for its sources. Undoubtedly the “Google it” button will be used in those efforts.

Of course, we can expect that Google Bard and other AI tools will continue to evolve at a rapid pace.

Based on the versions available when this blog post was written, below is a table comparing features of OpenAI’s ChatGPT and Google Bard that I found notable.

Genealogy and AI: Google Bard

Although Google Bard states that it removes personally identifiable information when using conversations to improve the model, DO NOT INCLUDE PERSONAL INFORMATION IN YOUR CHATS.

This week I spent some time working with Google’s challenger to OpenAI’s ChatGPT. Google Bard is built on Google’s Language Model for Dialogue Applications (LaMDA), and I happened to be working with it on the day that Bard began to bring images from Google Search into its results. Bard advertises that it helps you plan, solves complicated problems, and supports your creative process. When it quotes content, it cites where it found the material, or the computer code repository that was used.

In this article you will notice that I do not include screenshots of answers, as I did for ChatGPT. That is because Google owns the generated content. If anyone wants to publish the content, they have to get permission from Google; at best, Google and the user might share the rights to the material. (In comparison, when you generate content in ChatGPT, you own the rights to that content.)

Google Bard can be found at: https://bard.google.com.

I also recommend viewing the Frequently Asked Questions.

You can choose to save your Bard Activity, but even if that option is deselected, your activity is saved for up to 48 hours so that the feedback can be used. You can also choose whether or not your activity is auto-deleted after 18 months. In my Bard Activity I could see the prompts that were given, but could not retrieve Bard’s responses, so remember to SAVE the responses. During your session you can copy and paste them into a document or use the upload function described below.

The menu for Google Bard appears on the left hand side of the window.

At the bottom of each response were buttons to like, dislike, upload the response, or “Google it.”

bottom of Google Bard response

When you choose to upload a response, you have options to upload it to your Google Drive or draft an email in Gmail. The uploaded response will include the text but NOT the images you see in Bard.

Google Bard upload options

I decided to ask some genealogically oriented questions, as I had done when testing ChatGPT.

“What are good resources for genealogy?” The answer to this prompt was a reasonable list of record databases, websites and societies.

“How can you help me with my genealogy?” This prompt was answered with ideas about genealogy research, finding resources, interpreting data, and helping to create a family tree.

“How did someone travel from New York City to Newport, RI, in 1850?” Google Bard answered that the travel would have been by stagecoach and steamship, presenting me with images and data from the web along with its answer.

I asked what its sources were for this information, and it listed them. It also shared that it used its own knowledge to fill in the gaps.

When I asked how to find the source material, it provided links to places to buy a physical copy but did not provide a link to the source on Google Books, although it did tell me that the book was available to view and download there. That resource turned out to be a very interesting discussion of travel to Newport between 1800 and 1850.

Since the user does not own the generated content, and the web is used to help answer prompts, this AI tool may be more useful to me as a research assistant.

You can always give it a try!

ChatGPT and GEDCOM Files

Before I was a professor, I was a flight test engineer. My love of testing systems goes back to my early days working in a lab during college. My particular gift was always finding a way to “break” hardware or software through use. My desire to investigate the use of ChatGPT in genealogy has definitely coincided with my enjoyment of testing. In this blog post, I take a look at what ChatGPT knows about GEDCOMs, how it builds one, and how it can create a narrative when given an individual’s data formatted in a GEDCOM.

This paragraph offers some technical background for those who want a slightly deeper understanding. In computer science, data can be grouped into meaningful representations of things in the real world. A data structure is a way to group fields in a specific order so that a program can input data, manipulate it, and output it. The standard for formatting and sharing genealogical data is the GEnealogical Data COMmunication (GEDCOM) standard.

GEDCOM (Genealogical Data Communication) is a file format used to exchange genealogical data between different genealogy software programs. It is a standard format for saving family tree data, and it allows users to transfer their family tree data from one program to another.

GEDCOM files are saved with the extension “.ged” and are made up of text-based data that includes information about individuals, families, and events such as births, marriages, and deaths. The data is organized in a hierarchical format: each line begins with a level number, and each top-level (level 0) record holds information about a single individual, family, or source, with details such as events nested beneath it.

GEDCOM files can be used to create family trees, research family history, and share information with other genealogists. They are widely used by genealogy software programs and online genealogy databases. For example, you can export a GEDCOM from your family tree program or download a GEDCOM from Ancestry.com.
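For readers curious about what “hierarchical” means in practice, here is a minimal Python sketch of how a program might read that structure. It is an illustration, not code from any genealogy program. It assumes the basic GEDCOM line layout (a level number, an optional cross-reference ID such as @I1@, a tag, and a value) and nests each line under the nearest earlier line with a lower level number. The names GedcomNode, parse_line, and parse_gedcom are hypothetical, continuation lines (CONC/CONT) are not merged, and the sample record was written for this illustration from the details used in the prompts later in this post, not copied from ChatGPT’s output.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GedcomNode:
    """One GEDCOM line: level number, optional cross-reference ID, tag, and value."""
    level: int
    xref: Optional[str]
    tag: str
    value: str
    children: list[GedcomNode] = field(default_factory=list)


def parse_line(line: str) -> GedcomNode:
    """Split 'LEVEL [@XREF@] TAG [VALUE]' into its parts."""
    level_text, _, rest = line.partition(" ")
    xref = None
    if rest.startswith("@"):
        # Record lines such as '0 @I1@ INDI' carry an ID before the tag.
        xref, _, rest = rest.partition(" ")
    tag, _, value = rest.partition(" ")
    return GedcomNode(int(level_text), xref, tag, value)


def parse_gedcom(text: str) -> list[GedcomNode]:
    """Nest each line under the closest preceding line with a lower level number."""
    roots: list[GedcomNode] = []
    stack: list[GedcomNode] = []  # chain of open ancestors for the current line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        node = parse_line(line)
        # Lines at the same or a deeper level cannot be this line's parent.
        while stack and stack[-1].level >= node.level:
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)  # level-0 lines start new records
        stack.append(node)
    return roots


sample = """0 @I1@ INDI
1 NAME James Charles /McMahon/
1 SEX M
1 BIRT
2 DATE 10 OCT 1920
2 PLAC Brooklyn, Kings County, NY"""

record = parse_gedcom(sample)[0]
birth = next(child for child in record.children if child.tag == "BIRT")
print(record.xref, record.tag)  # @I1@ INDI
print([(child.tag, child.value) for child in birth.children])
# [('DATE', '10 OCT 1920'), ('PLAC', 'Brooklyn, Kings County, NY')]
```

The level numbers are what make the format hierarchical: the DATE and PLAC lines belong to the birth event because they sit one level below the BIRT line.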

NOTE: DO NOT ENTER PRIVATE OR SENSITIVE DATA INTO ChatGPT. Your conversations may be reviewed by OpenAI to verify that content complies with its policies and safety requirements, and they may be used for training purposes.

I asked ChatGPT what it knew about GEDCOMs with prompts: What is a GEDCOM file? What is the GEDCOM standard? What are the fields in the GEDCOM standard?

ChatGPT answered reasonably well, except that it confidently stated that the latest version of GEDCOM in use was 5.5.1. This is understandable because ChatGPT’s training ended in 2021. (As of the writing of this blog post, the current version is 7.0. For more information, see the FamilySearch wiki entry for GEDCOM.)

Knowing that ChatGPT was using GEDCOM 5.5.1 was not a problem for these experiments.

Creating a GEDCOM

I would not choose to build a GEDCOM in this manner, but I could see how entering a narrative about ancestors into the prompt and letting ChatGPT build the relationships from written language could be helpful. ChatGPT could begin a family tree or add a separate branch, which could then be imported into a family tree program.

To investigate how effectively ChatGPT could create a simple GEDCOM, I asked it to:

Create a GEDCOM file for James Charles McMahon, born 10 Oct 1920, father Joseph Francis McMahon, mother was Ella Small.

GEDCOM file from ChatGPT

ChatGPT extracted the information from my request and filled in the fields. I had asked only for a simple GEDCOM file and had been very specific about which details to include, and ChatGPT handled this request well. The response includes a button to copy the code, so I could store it in a file with a .ged extension for use by a family tree program that conforms to the GEDCOM specification. In fact, it even warned me:

ChatGPT warning
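ChatGPT’s actual response appears only as a screenshot, so it is not reproduced here. As a hedged illustration of how GEDCOM 5.5.1 usually records a father and mother, the Python sketch below builds one INDI record per person and a FAM record that points to the parents (HUSB, WIFE) and the child (CHIL), with FAMC and FAMS lines linking each person back to that family. The helper names individual and family, and the IDs @I1@, @I2@, @I3@, and @F1@, are hypothetical choices for this example.

```python
from __future__ import annotations

from typing import Optional


def individual(xref: str, given: str, surname: str, sex: str,
               birth_date: Optional[str] = None,
               famc: Optional[str] = None,
               fams: Optional[str] = None) -> list[str]:
    """Lines for one INDI record; the surname goes between slashes."""
    lines = [f"0 {xref} INDI", f"1 NAME {given} /{surname}/", f"1 SEX {sex}"]
    if birth_date:
        lines += ["1 BIRT", f"2 DATE {birth_date}"]
    if famc:
        lines.append(f"1 FAMC {famc}")  # family in which this person is a child
    if fams:
        lines.append(f"1 FAMS {fams}")  # family in which this person is a spouse
    return lines


def family(xref: str, husband: str, wife: str, children: list[str]) -> list[str]:
    """Lines for one FAM record, which points at its members by their IDs."""
    lines = [f"0 {xref} FAM", f"1 HUSB {husband}", f"1 WIFE {wife}"]
    lines += [f"1 CHIL {child}" for child in children]
    return lines


records = (
    individual("@I1@", "James Charles", "McMahon", "M", "10 OCT 1920", famc="@F1@")
    + individual("@I2@", "Joseph Francis", "McMahon", "M", fams="@F1@")
    + individual("@I3@", "Ella", "Small", "F", fams="@F1@")
    + family("@F1@", "@I2@", "@I3@", ["@I1@"])
)
print("\n".join(records))
```

In this design the relationship lives in the FAM record rather than in the child’s record, which is why a family tree program can connect the three people even though the child’s record never names the parents directly.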

By the way, the clipboard next to the response lets a user copy the whole response and paste it into the document of their choice. When clicked, the clipboard turns into a checkmark momentarily, then returns to being a clipboard. The thumbs up and thumbs down allow a user to provide additional feedback. If the feedback is thumbs down, another version of the reply is generated, and the user has the opportunity to say whether the new response or the previous one is better, or whether they are the same. Giving feedback is always optional.

ChatGPT feedback

NOTE: This is a representation of an individual in GEDCOM format, not a file that can be directly imported into a family tree program. The header and trailer information is not present; however, I could give ChatGPT that information and ask it to update the GEDCOM to include it.

I tried again with a new prompt that contained more details about the person’s life:

Create a GEDCOM file for James Charles McMahon, born 10 Oct 1920 in Brooklyn, Kings County, NY, father Joseph Francis McMahon, mother was Ella Small. James Charles McMahon died on 28 Nov 1987 in New York, New York, New York, US.

The response filed the additional data correctly into the GEDCOM:

Updated GEDCOM file from ChatGPT

Using the GEDCOM as input to a family tree program

I asked for the file in a couple of different ways, but ChatGPT gave me only the section of the file for an individual. RootsMagic had problems importing this and creating a family tree, but after a little experimentation, I found that was because the file was missing the header and trailer information. This was quickly remedied by editing the file.
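The fix in this post was a hand edit, but the missing header and trailer can also be added with a few lines of code. This Python sketch is an illustration under stated assumptions: it wraps the records in a minimal header of the kind GEDCOM 5.5.1 describes (HEAD with SOUR, GEDC with VERS and FORM, CHAR, and a SUBM pointer), adds the submitter record that pointer refers to, and ends with 0 TRLR. Version 5.5.1 is assumed because that is the version ChatGPT was using. The SOUR value MY_NOTES, the submitter name, the @SUBM1@ ID, and the function name wrap_gedcom are placeholders; GEDCOM 7.0 headers differ, and individual programs may be stricter or more lenient.

```python
from __future__ import annotations


def wrap_gedcom(records: list[str], source: str = "MY_NOTES",
                submitter_name: str = "Your Name") -> str:
    """Add a minimal GEDCOM 5.5.1-style header, submitter record, and trailer."""
    header = [
        "0 HEAD",
        f"1 SOUR {source}",      # identifies what produced the file (placeholder)
        "1 GEDC",
        "2 VERS 5.5.1",
        "2 FORM LINEAGE-LINKED",
        "1 CHAR UTF-8",
        "1 SUBM @SUBM1@",        # points to the submitter record below
    ]
    submitter = ["0 @SUBM1@ SUBM", f"1 NAME {submitter_name}"]
    trailer = ["0 TRLR"]         # marks the end of the file
    return "\n".join(header + submitter + records + trailer) + "\n"


individual_only = [
    "0 @I1@ INDI",
    "1 NAME James Charles /McMahon/",
    "1 SEX M",
]
print(wrap_gedcom(individual_only))
```

Save the printed text to a file ending in .ged before importing it; whether a particular family tree program accepts this minimal header is worth checking with a small test file first.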

It was interesting to see how placeholder text for the birth and death information of the individual’s mother and father was inserted into the GEDCOM to be interpreted by the program. Of course, this could be fixed later in the conversation by asking for an updated GEDCOM with this information. As the chat went on, I also gave ChatGPT the parents’ marriage information and asked it to update the GEDCOM.

Creating a narrative from a GEDCOM

For my next experiment, I copied the second GEDCOM that ChatGPT had generated and fed it back into the prompt, asking:

Write a narrative for James Charles McMahon given his GEDCOM information:

0 @I1@ INDI
1 NAME James Charles /McMahon/
1 SEX M
[the rest of the file is not shown for brevity]

ChatGPT had retained details from earlier in our conversation and inserted facts about the individual from the previous GEDCOMs. Starting the request in a new conversation reset its knowledge about the individual to nothing, and the story included only the information from the prompt.

Of course, ChatGPT only uses what I told it. In reality, this individual was not an only child. Interestingly, after writing that he grew up in a family of three (himself and his parents), it depicted him as a beloved brother. This happens because a large language model builds each next part of its output from patterns in its training data, not only from the facts it was given.

Next, I checked whether the format of the input mattered to ChatGPT by turning the GEDCOM data in my prompt into one continuous stream rather than distinct lines:

Write a narrative for James Charles McMahon given his GEDCOM information:

0 @I1@ INDI 1 NAME James Charles /McMahon/ 1 SEX M [the rest of the file is not shown for brevity]

ChatGPT did not need the lines of the file to be separated; it interpreted the data correctly, then wrote a narrative. (This is also true when entering data from a table into the prompt.) Without information about the deaths of the individual’s parents, the model wrote that they survived him and, in the same sentence, that they had died before him. ChatGPT can appear to lose its mind, so always proofread any output before using it.

Next, I carved out the lines for this individual from a GEDCOM that had been exported from a family tree program, complete with source citations embedded in the code. I used this text as input to ChatGPT and again asked it to write a narrative from the GEDCOM. ChatGPT successfully captured the details it was given. It also created some generalizations like: “Throughout his life, James was a beloved member of his family and community.” It also added context without being prompted: “Though we don’t have much information about his specific experiences, we can imagine that he lived through many significant moments in history, including World War II and the civil rights movement.”
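Carving one person out of an exported GEDCOM can also be scripted. Here is a minimal Python sketch, assuming the file uses the usual layout in which every record starts on a level-0 line: it keeps the lines of the record whose ID matches (for example @I1@) and stops at the next level-0 line. The function name extract_record and the sample data are hypothetical. Note that a citation such as 2 SOUR @S1@ only points to a source record stored elsewhere in the file, so that source record is not included in the extracted lines.

```python
from __future__ import annotations


def extract_record(gedcom_text: str, xref: str) -> list[str]:
    """Return the lines of the level-0 record whose ID is xref (e.g. '@I1@')."""
    record: list[str] = []
    inside = False
    # Some exports begin with a byte-order mark; drop it so the first line parses.
    for raw in gedcom_text.lstrip("\ufeff").splitlines():
        line = raw.strip()
        if line.startswith("0 "):
            # Every level-0 line starts a new record (or the trailer),
            # so it also ends whatever record came before it.
            parts = line.split()
            inside = len(parts) > 1 and parts[1] == xref
        if inside:
            record.append(line)
    return record


exported = """0 HEAD
1 GEDC
2 VERS 5.5.1
0 @I1@ INDI
1 NAME James Charles /McMahon/
1 BIRT
2 DATE 10 OCT 1920
2 SOUR @S1@
0 @I2@ INDI
1 NAME Joseph Francis /McMahon/
0 TRLR"""

print("\n".join(extract_record(exported, "@I1@")))
```

The printed lines can then be pasted into a prompt, which is roughly the carving-out step described above done in code instead of by hand.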

The tales that ChatGPT weaves from a user’s input can combine accurate facts with fanciful details. The facts that are input can be woven into smoother, grammatically correct output. Any additional text or contextual content that ChatGPT adds does need to be verified. ChatGPT is a generative language model that creates sentences without judging their accuracy, and it presents whatever it generates as correct. (Always check the details that ChatGPT adds, as it may “hallucinate”!)

ChatGPT generates text with an optimistic tone. The tales do all seem to end on a positive note, reminding me of appending “and a good time was had by all” to a story.

As with any tool, how we use the output matters. ChatGPT has the flexibility to regenerate a response to our prompt, and we have the ability to edit the text as we see fit. This tool could be helpful to a genealogist trying to get started on that family history they have been planning to write. ChatGPT can help someone get around writer’s block by providing a starting place. It can also proofread what you write. All you have to do is ask.

It was instructive to see how the narrative text in the prompt was translated into lines in the GEDCOM file. I enjoyed peeking under the hood of the implementation that is at the heart of family tree programs.

Let me know how you do, and send along any questions.

ChatGPT May 3 Version was used for these experiments. Expect ChatGPT to change over time as the technology matures.

Please check out other posts about ChatGPT and Artificial Intelligence:

5 Ways to Use ChatGPT to Research an Ancestor

Getting Started with ChatGPT

Artificial Intelligence and Genealogy