General: Is AI Learning from our old poor quality code?

A while back, I wrote about my experiences with trying to see if ChatGPT could answer questions as though it were a baseball umpire. It did quite poorly, and I came to the conclusion that the problem was what it had learned from. Curiously, it got hard questions correct and easy questions wrong. It seemed to me that people who knew what they were talking about were the only ones to ever discuss the hard questions, while everyone had an opinion (often wrong) about the easy questions.

But the LLM has no idea who to learn from.

I was interviewing a friend the other day and he pointed out that the same thing applies to code that we write.

If I look back at code that I wrote a year ago, or two years ago, or 15 years ago, I’d probably cringe, as I’ve learned so much more about what to do right during that time. But if an LLM is trained on all that code, from all those years, that’s pretty scary.

Is it any wonder that it’s likely to produce sub-standard code?

Once again, the challenge is about how it would know the difference.

Mixing up dialects

A related and interesting issue concerns dialects of code. With the baseball umpiring example, I noted that it sometimes used softball rules to answer baseball questions. It hadn’t worked out what was different.

And a similar thing applies to code. I recently saw this AI-produced code for SQL Server:

 SELECT CONCAT(c.FirstName, ' ', c.LastName) AS CustomerName, 
        o.SalesOrderID, o.OrderDate
 FROM SalesLT.Customer c
 INNER JOIN dbo.GetCustomerOrders(c.CustomerID) o ON 1 = 1
 WHERE c.CustomerID = 1;

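The problem is that it’s not valid SQL Server code. I think it would work in DB2, but not in most other dialects. SQL Server doesn’t let a table-valued function on the right side of a JOIN take a column from the left-side table as its argument. That kind of correlated call needs APPLY, so in SQL Server that INNER JOIN should be a CROSS APPLY.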
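For comparison, here is a sketch of how that query could be written for SQL Server with CROSS APPLY. It assumes dbo.GetCustomerOrders is a table-valued function that takes a customer ID and returns SalesOrderID and OrderDate columns, which is what the AI-produced query implies; its definition isn’t shown here.

```sql
-- Sketch only: assumes dbo.GetCustomerOrders(@CustomerID) is a
-- table-valued function returning SalesOrderID and OrderDate.
SELECT CONCAT(c.FirstName, ' ', c.LastName) AS CustomerName,
       o.SalesOrderID, o.OrderDate
FROM SalesLT.Customer AS c
-- APPLY evaluates the function once per row of c, so the
-- reference to c.CustomerID is allowed here.
CROSS APPLY dbo.GetCustomerOrders(c.CustomerID) AS o
WHERE c.CustomerID = 1;
```

The ON 1 = 1 clause disappears because APPLY takes no join condition. If customers with no orders should still appear in the output, OUTER APPLY would be the one to use instead.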

So I asked ChatGPT what it thought. It told me it should run. I then pointed out that it would not run and asked it why it incorrectly thought it would. What it answered was:

Why I thought it might work: I was pattern-matching from SQL dialects and products where a function call in the right side of a join can behave laterally, and I overgeneralized that to SQL Server.

In other words, I mentally collapsed these two ideas into one:

“SQL Server supports joining to TVFs”
“therefore this correlated INNER JOIN … ON 1=1 should work”

That second step was wrong. So the real mistake was not checking the correlated versus uncorrelated distinction before answering. You were right to push back.

Fascinating.

2026-04-26