Microsoft Copilot vs ChatGPT – which one works best?

By Siobhan King

February 16, 2024

How many animals did Moses take on the ark?

Comparing how ChatGPT and Microsoft Copilot answer this question

You could be forgiven for saying “two of each species were taken on the Ark”, which is a common response to this question. The correct answer, of course, is “none”, because Noah built the Ark, while Moses liberated the Israelites from Egypt.

The so-called “Moses Illusion” highlights “shallow semantic processing,” which causes most of us to overlook errors, perhaps because our attention is drawn elsewhere, such as solving the problem of how many animals were on Noah’s Ark.

At Metataxis, we have been exploring the uses for AI, considering the practical information management applications as well as the information architecture implications. When I learned about the Moses Illusion, I just had to see whether common AI tools such as ChatGPT and Microsoft Copilot would fall for it. Of course, the answer is no, though each of the tools handled the query quite differently.

Let’s see how they compare:

ChatGPT

ChatGPT was not taken in by the Moses Illusion and, in fact, asked me which of the following two answers I would prefer:

Answer 1: “Moses is not associated with the story of the ark. The biblical account of Noah and the ark is found in the Book of Genesis, where Noah, not Moses, is the central figure. According to the Bible, God instructed Noah to build an ark to save himself, his family, and pairs of every kind of land-dwelling animal from the impending flood. The specific number of animals taken onto the ark is not specified in the Genesis account.”

or…

Answer 2: “Moses did not take any animals on the ark. The story of Noah’s Ark is found in the Bible, specifically in the book of Genesis. According to the biblical account, Noah, not Moses, was instructed by God to build an ark to save himself, his family, and two of every kind of animal from a global flood. Moses is a different biblical figure, known for leading the Israelites out of Egypt and receiving the Ten Commandments on Mount Sinai.”

What is interesting about these responses is that they try to achieve two things:

1) Answer the question as it was intended, albeit in typical ChatGPT non-answer form: “that information is not available to me, Dave”

2) Highlight the cognitive oversight inherent in the question and provide disambiguation between Moses and Noah.

So, in terms of performance there is some impressive predictive power, but the source material available to answer the question perhaps is not as great. I also must recognise that no matter how many ways I ask the question, ChatGPT has the good sense to repeat that the question cannot be definitively answered. (“It’s ineffable.”) This is fair.

Were I interested in actual attempts to calculate the number of animals in the Ark, ChatGPT isn’t going to get into the specifics. ChatGPT isn’t going to suddenly start quoting James Ussher.

The tool does try to be even-handed, pointing out whether views are theological or scientific, but I can’t help feeling a bit like it doesn’t think much of Creationism.

When you ask “Has anyone tried to calculate how many animals were on the ark?”, you receive a more nuanced but equally dismissive statement, which starts with: “The question of how many animals were on Noah’s Ark has been a topic of discussion and debate for centuries. Various scholars, theologians, and researchers have attempted to calculate the number of species or animals involved in the biblical account of the flood. However, it’s important to note that the Bible provides limited details on the exact number or species of animals taken onto the ark.”

ChatGPT still requires a series of prompts from me to get a fuller answer, say if I were interested more in the attempts people have made to answer the question of the animals in the Ark, or about the Moses Illusion itself. This means that the onus is still on the user to ask the right question and most significantly, not to give up after asking the first question. I had to formulate five questions to get the full answer I needed.

It’s not enough that it knows what I meant to ask. I still need to be more inquisitive to get my best answer. Through its often phlegmatic responses, ChatGPT has prompted me to provide more specific and thoughtful questions. Is it teaching me or am I teaching it?

Microsoft Copilot

Having completed this experiment in ChatGPT, I was now keen to test how Copilot for the web dealt with exactly the same query.

Not only was Microsoft Copilot not taken in by the Moses Illusion, it also saw me coming:

Copilot replied: “The question of how many animals Moses took on the ark is an intriguing one, but it actually involves a delightful cognitive illusion known as the Moses Illusion.

Allow me to explain: When asked, “How many animals of each kind did Moses take on the Ark?” approximately 50% of people confidently respond with “two” instead of the correct answer, which is “none.” Despite knowing that it was Noah, not Moses, who took animals onto the Ark, the illusion persists. The similarity between the names Noah and Moses plays a significant role in this cognitive mix-up. Their semantic and phonological resemblance—both being two-syllable names with similar stress patterns—leads to this intriguing error.”

To add insult to injury, this explanation of the Moses Illusion is footnoted with a link to a reputable source: Oxford Reference. (Moses illusion – Oxford Reference) Immediately I can see how Copilot outperforms ChatGPT by being up front with its receipts, giving me the opportunity to assess the quality of the sources for its responses.

And that’s not all. Of course, Copilot answers the question correctly, again citing its sources: “The correct answer: In reality, it was Noah who followed God’s instructions and took two of every kind of animal onto the Ark. This monumental task ensured the survival of various species during the great flood. So, Moses didn’t take any animals on the Ark; it was all Noah’s doing”

Unlike ChatGPT, Copilot for the web is willing to draw on theological sources to answer questions. These sources are cited so that it is left up to the individual to use judgement to evaluate answers. What is noticeable is that Copilot seems to accept the story of Noah’s Ark uncritically without any of the caveats about worldviews given to us by ChatGPT.

The icing on the cake, though, is that Copilot goes on, unbidden, to attempt to answer the actual question: “Fun Fact: If we were to estimate the number of animals based on the Moses illusion, we’d have quite a menagerie! But let’s stick to the facts: Noah’s Ark carried a diverse collection of animals, both clean and unclean. The total number of individuals could vary from 254 (127 pairs) to 889 (127 pairs multiplied by 7), depending on the species.”

The source for this calculation you might ask? I don’t know (user error), and here’s where things fall down.

Having carried out my Copilot query and copied and pasted the text of the answer, I was then interrupted, as commonly happens in our current work environments. On returning to my screen a few hours later, my query and the Copilot response history were gone. I still had the text of the answer because I thought to copy it, but apart from the bare facts of the Moses Illusion and the story of Noah and the Ark, I was unable to replicate the response I got to my first question.

Copilot for the web only operates in real time and does not store your queries. While I suppose this is better for security, this isn’t great for referring back. To my chagrin, I also lost the linked sources when I pasted my answer into Notepad, but I can confidently say that you can paste results into Word and carry over references. This ability to check back through sources is especially important when you are asking questions about contested world views, so I am particularly sore about losing the results of my first query.

The lack of consistency of answers is clearly due to the system pulling from different sources on the web on each occasion. What I did find was that on trying to replicate my first result, I consistently got answers that pulled from various Creationist websites, which shouldn’t be surprising because it’s not really a question the scientific community is worried about.

Which AI chatbot works best?

Overall, both AI systems handled the Moses Illusion even better than I expected.

Comparing the tools has been useful for understanding how they each deal with the sources used to answer questions, but it also seems to highlight some inherent bias. Perhaps ChatGPT is more equivocal because its sources are less transparent, but it does seem to be gently telling us the Bible is not to be taken literally.

Does Copilot have a fundamentalist Christian bias? I’d like to give Copilot the benefit of the doubt and say that it is egalitarian in its presentation of answers. But the fact that the tool immediately picked up on the Moses Illusion and explained it to me, but didn’t provide commentary on theological and scientific views about the Ark, makes me hesitant to grant this easily.

While both tools do the same thing, I’m learning that they do it just differently enough that I can’t assume they are interchangeable.

Our team has been testing Copilot for Microsoft 365 and learning more about the power of generative AI. Take a moment to check out their initial observations and intriguing findings.
