The Rapid Progress of AI and the Forthcoming Plateau



It's worth noting this post focuses specifically on LLMs.

Disclaimer #

I'm not an AI expert by any means. I'm just a part-time software engineer finishing my Bachelor's in computer science who has watched this unfold like the rest of you, though perhaps with more exposure to the downsides. I'd just like to share what seems to be my minority take on the progression of AI.

This is also my first ever blog post (apart from a hello world), so I'd appreciate feedback; I'm not a good writer, but I wanted to give my two cents.

The Hype #

It began as whisperings from a few friends who managed to get early access to GPT-3 around 2021; the claims they made about its capabilities and usefulness were difficult to believe.

Then GPT-3.5 was released in 2022. I was in my first semester of university, and while my friends and I were shocked, we never could have predicted the rapid rise to popularity that ensued.

Of course, GPT-4, GPT-4.5, Bard, Gemini, Claude, and more all followed, creating the landscape that exists today.

Present #

Here we are, three years on from the initial disruption. I've been hearing ever since how these models would eventually take our jobs, and for a long time I said "wait and see". But the models have progressed:

GPT-3.5 (175 billion parameters) -> GPT-4 (~1.8 trillion parameters) -> GPT-4.5 (~12 trillion parameters)

Bard/Gemini (1.56 trillion parameters) -> figures for newer models unavailable

Claude-2 (~130 billion parameters) -> Claude-3-Opus (~2 trillion parameters) -> Claude-3.7 (? parameters)

Some of the above figures are estimates where official numbers are not available, and of course the parameter count is not the whole story. Some models are more effective with fewer parameters (see the Mistral models), models have differently sized context windows, and there are other quirks.

The point is, by and large, the solution for some time now has been to raise the parameter count by an order of magnitude and hope this improves the model. This is an enormous oversimplification, but it is nonetheless a big part of the quest to improve these models.

The Plateau #

Unfathomable quantities of resources have been, and continue to be, spent training and running these models. They do improve, but thinking back to GPT-3.5 and GPT-4, the improvement of current model outputs over the earlier ones is increasingly marginal. It seems logical to me that we've reached diminishing returns on parameter count with the current GPT (generative pre-trained transformer) style of model, which, from my understanding, all of the currently popular models (not only OpenAI's) are based upon.
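To put a rough number on that intuition: the published scaling-law fits (Kaplan et al., 2020) model loss as a power law in parameter count, which implies each additional order of magnitude buys a smaller improvement than the last. The sketch below borrows that paper's fitted constants purely as an illustration; the constants and the model sizes plugged in are assumptions, not measurements of any particular system.

```python
# A back-of-the-envelope sketch of diminishing returns, assuming the
# power-law fit from Kaplan et al. (2020): loss(N) ~= (NC / N) ** ALPHA.
# The constants and model sizes below are illustrative assumptions only.
NC = 8.8e13     # fitted constant from the paper (non-embedding parameters)
ALPHA = 0.076   # fitted exponent from the paper

def loss(n_params: float) -> float:
    """Predicted pre-training loss for a model with n_params parameters."""
    return (NC / n_params) ** ALPHA

sizes = [175e9, 1.8e12, 12e12]   # the rough GPT-3.5 -> 4 -> 4.5 sizes quoted above
previous = None
for n in sizes:
    current = loss(n)
    if previous is None:
        print(f"{n:.1e} params -> loss {current:.3f}")
    else:
        gain = 100 * (previous - current) / previous
        print(f"{n:.1e} params -> loss {current:.3f} ({gain:.1f}% better than the last step)")
    previous = current
```

Under that assumed curve, each successive jump in size buys a smaller improvement than the one before it, which is the shape of diminishing returns the rest of this section is gesturing at.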

And despite the best efforts of the (no doubt very skilled) engineers working on these models, they all face the same problems: namely hallucinations, lack of consistency, and lack of generalization.

These problems don't mean the models are not useful. They are in fact incredibly useful; I use them every day, and I believe they make me more efficient while answering questions better than traditional search engines.

However, given the aforementioned problems, and how confidently the models respond even when entirely wrong, they require careful oversight from someone who already knows the answer (or can at least judge whether an answer is plausibly correct). Personally, I don't see these issues going away without a significant breakthrough.

The Trap #

Thankfully, I learned to code long before my university course and had studied computer science at lower levels, so I already had a good understanding of the theory (albeit not as in-depth as now) before I began. Many did not. Many, if not most, of the others in my cohort did not know how to code or have a meaningful understanding of the subject, and a smaller (but significant) group of them have used LLMs as a crutch to gain their degree; from what I gather, they have little understanding of how to actually code.

This, while entirely their own fault, creates a problem for employers, who will now struggle to determine which graduates actually learned the skills required to do the job and which have little practical understanding, having completed their assignments by essentially vibe coding.

Further, many higher-ups now view junior developers as negative value, reasoning that an LLM could perhaps do the same work or better at a significantly lower cost. I don't believe these LLMs will progress to senior-developer-level skill, and if this trend continues we will, of course, run out of senior developers, because no juniors will be trained.

We should also, in my opinion, be mindful, even as individuals with existing skills, not to lean too heavily on these tools, lest we become over-reliant and less skilled through insufficient practice, or even degrade our overall critical thinking by doing less of it.

Conclusion #

The points above largely illustrate where I disagree with many people who are technically skilled and understand how these models work. Many of these people, developers and executives alike, believe our jobs are on the way out, and some believe we're on track to superintelligence.

As far as I can tell, the rapid progression in AI ends here (for now). At some point there will be a breakthrough and we'll take another significant step forward, but for now I predict we'll mostly see progress in how the models are used: for example, Claude Code is cool (with careful oversight), and I enjoy AI code review on otherwise solo projects, which helps me catch my mistakes earlier.

Briefly, on a tangent: what I struggle with most is explaining how LLMs work to non-technical people, mostly friends, who don't seem to believe me when I tell them that LLMs don't have any understanding and don't try to say what's correct (how could they, with no concept of correctness?), only what token is most likely to follow what came before.
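For the curious, this is easy to see for yourself. Below is a minimal sketch of mine, using the small, open GPT-2 model via the Hugging Face transformers library, of what these models actually produce: not an answer checked for truth, just a probability distribution over which token is likely to come next.

```python
# A minimal sketch of next-token prediction using the small, open GPT-2 model
# via the Hugging Face transformers library (pip install torch transformers).
# The model never checks whether a continuation is true; it only ranks which
# tokens are statistically likely to follow the prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # a score for every vocabulary token, at every position

next_token_probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token only
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")
```

Nothing in that loop knows whether any of those continuations are true; they are simply the tokens the training data makes most likely, which is exactly why confident-sounding wrong answers are so common.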

I think this is important to remember: these models can only state what is most plausible based on their training data, which in many cases is what sounds most correct to us, even when it is not.