ChatGPT may be a modern marvel of computer engineering and a shockingly good practitioner of the English language — but don’t expect it to actually be correct.

The artificial intelligence language tool seems to get things wrong when it comes to facts on a variety of topics, including history, government finances and pop culture.

Ask ChatGPT 3.5, the current free public version, what the most popular YouTube video of 2010 was, and it says it was “Bed Intruder Song,” an early social media musical remix of a weird news clip, which it said had 62 million views that year. In fact, the Justin Bieber song, “Baby,” walked away with more than 400 million views.

Ask about the relative popularity of baby names, and it stumbles, getting the rankings wrong and sometimes saying a particular name didn’t even crack the top 1,000, when in fact it was hundreds of places higher.

Ask about the length of the wall along the U.S.-Mexico border and ChatGPT gives an answer that’s a decade old and doesn’t include any of the mileage that former President Donald Trump added.

ChatGPT is a language model artificial intelligence, meaning it was trained to engage with users by consuming a massive amount of data and then tries to deliver answers based on that data set.

But at times it seems about as accurate as the know-it-all sitting at the end of the dive bar, confidently spouting out answers with only a passing wave at the truth.

In one frustrating exchange, ChatGPT apologized six times as it tried to answer a question about the location of the 1826 duel between then-Secretary of State Henry Clay and Sen. John Randolph, which took place along the southern side of the Potomac River near the Chain Bridge.

At first, the AI said the duel was in Kentucky, then in Richmond, Virginia, then in Ashland, near Richmond. It then switched north, saying it was in Maryland, just over the line from the District of Columbia. Told that the duel was actually south of the Potomac, ChatGPT gave a succession of three more incorrect answers, never reaching the correct one.

Nathaniel Lovin, senior research associate at the Technology Policy Institute, said trivia isn’t really what language AI models do.

“I think these tools are better used as something you say, ‘Here’s five paragraphs about something, extract this data,’ or ‘rewrite this paragraph to be cleaner,’” he said. “It doesn’t have a real model of the world, so it doesn’t remember all the details of everything. It’s predicting the next of its tokens that it thinks should be the next thing to be said.”

In other words, ChatGPT isn’t going back into its memory banks and trying to spot the right answer. It’s looking at what the user typed and then trying to guess what should come next.

“It has knowledge of things because it’s read the whole internet, basically, but it doesn’t have a source it’s referring to,” Mr. Lovin said.
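The next-word guessing Mr. Lovin describes can be illustrated with a toy sketch. It is not how ChatGPT is built; it is a simple word-pair counter, and the tiny corpus and the function names (`train_bigrams`, `predict_next`, `complete`) are made up for illustration. It only shows how a system that picks the most likely next word can answer confidently without checking any source.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration, not real training data).
CORPUS = (
    "the duel was in kentucky . "
    "the duel was in virginia . "
    "the duel was in kentucky . "
    "the senator was in maryland ."
)

def train_bigrams(text):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

def complete(follows, prompt, max_words=5):
    """Extend the prompt one predicted word at a time, stopping at a period."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(follows, words[-1])
        if nxt is None:
            break
        words.append(nxt)
        if nxt == ".":
            break
    return " ".join(words)

MODEL = train_bigrams(CORPUS)
print(complete(MODEL, "the duel"))
```

In this sketch the completion ends on whichever place name appeared most often after "in" in the sample text. Nothing in the process checks whether that place is correct.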

OpenAI, the creator of ChatGPT, didn’t respond to a request for comment for this report.

Ask ChatGPT itself, and it repeatedly apologizes after being called on what it labeled “errors,” “mistakes” or “any confusion.”

“As an AI language model, I strive to provide accurate and reliable information, but I can make mistakes. I appreciate you bringing this to my attention and giving me the opportunity to correct my errors,” it said after being told of a bungle.

The promise of artificial intelligence is expansive, but so are potential errors — as one unfortunate lawyer found out.

Steven A. Schwartz used the tool to “supplement” his legal research in a case in federal court in New York. ChatGPT ended up fabricating six bogus cases that Mr. Schwartz then cited in his brief as precedent.

Mr. Schwartz said in a legal filing that he now realizes ChatGPT “has revealed itself to be unreliable.” He said he had never used it for legal research before “and therefore was unaware of the possibility that its content could be false.”

The judge is threatening sanctions on Mr. Schwartz and his law firm for submitting the bogus cases. A hearing has been set for June 8 on the matter.

The Times, in its own research, has found ChatGPT to be pretty iffy on legal matters.

At one point ChatGPT says it’s illegal to shout “fire” in a crowded theater. But that has not been considered good law since the landmark 1969 Supreme Court case Brandenburg v. Ohio.

Or take the “Lemon test,” a formula for gauging church-state entanglement that the Supreme Court laid out in a 1971 case, Lemon v. Kurtzman. ChatGPT says Lemon “is still widely used today” and even cites a 2019 case before the justices, American Legion v. American Humanist Association, where it says the justices “explicitly cite the Lemon test as a standard.”

In fact, the majority in that case specifically said the Lemon test didn’t apply.

Ask ChatGPT what the federal deficit was in 1980, and it spits back a firm declaration that it was $74.97 billion, saying it got its data from the Treasury Department. But that figure is off by more than $1 billion from the real answer: $73.8 billion.

It’s tough to figure out where ChatGPT got its clearly erroneous figure. It doesn’t seem to appear in any news reports, for example.

ChatGPT gets the American death toll in the Vietnam War correct, but bungles the question of what the projected American death toll would be if the U.S. had invaded Japan to try to end World War II.

It says the estimate of American deaths was 46,000 and Japanese casualties could reach between 1.7 million and 4 million. In fact, that 1.7 million to 4 million figure was the War Department’s estimate of American casualties, including up to 800,000 dead.

ChatGPT 4.0, the most current version, for which users pay a monthly fee, is somewhat more accurate than 3.5. It nails questions about the most-watched 2010 YouTube video, the 1980 federal deficit, the “fire” in a crowded theater test and a query about the original 12 amendments proposed to the Constitution by Congress in 1789.

But it still bungles the Lemon test question, the Clay-Randolph duel location and a question about MTV’s top video of 1996.

That evolution “shows that we’re not near the limit of these systems,” Mr. Lovin said.

He said there’s still the potential for ChatGPT and other language AIs to eventually be superaccurate search engines, but that is still far away.

“Maybe GPT 6 or GPT 7,” he said.
