“AI” vs. the Web

A concise, citation-rich memo on “AI” claims from a moderate Web software perspective.

Last updated: 14 July 2025
Authored by Lincoln Russell, 9 May 2025, CC BY-SA

What do we mean by “AI” today?

  • Machine learning (ML) uses statistics-based algorithms to recognize patterns and generalize to unseen data, which allows it to perform tasks without specific instructions.
    • ML is a branch of computational learning theory, a key tenet of which is “Probably Approximately Correct” (PAC): given finite data and an uncertain future, the goal is a high-probability result, not a guaranteed-correct one.
  • A large language model (LLM) is a type of machine learning optimized for natural language processing that uses self-supervised learning on huge datasets.
    • “LLM” entered the lexicon with OpenAI’s GPT-3 in 2020 (more broadly in 2022). 
    • A prompt is input to an LLM. Prompts are added at multiple levels: the service provider (e.g. OpenAI), the app (e.g. a bot provider), and finally the end user.
  • Retrieval-augmented generation (RAG) is a way of feeding specific content into a pre-trained LLM to give it domain-specific information, or information not in its training data. It also allows the LLM to cite its sources in the response. However, it adds risks.
    • Functionally, RAG searches content first, then prompts the LLM with the user prompt combined with the documents retrieved by that search (a minimal sketch follows this list).
  • Generation-augmented retrieval (GAR) flips the script: rather than outputting conversational text based on additional documents, you leverage the LLM to enhance & add context to your “normal” search results.
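
To make the RAG flow (and the prompt layering noted above) concrete, here is a minimal Python sketch. It assumes the OpenAI Python SDK; the model name and prompts are illustrative, and search_documents() is a hypothetical stand-in for whatever retrieval backend you actually use.

  # Minimal RAG sketch: search first, then prompt the LLM with the retrieved
  # documents plus the user's question.
  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  def search_documents(query: str, limit: int = 3) -> list[str]:
      """Hypothetical retrieval step; replace with your own search backend."""
      return ["<document text 1>", "<document text 2>", "<document text 3>"][:limit]

  def answer_with_rag(user_question: str) -> str:
      context = "\n\n".join(search_documents(user_question))
      response = client.chat.completions.create(
          model="gpt-4o-mini",  # illustrative model name
          messages=[
              # Provider- and app-level prompts sit "above" the end user's prompt.
              {"role": "system", "content": "Answer only from the provided documents and cite them."},
              {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {user_question}"},
          ],
      )
      return response.choices[0].message.content

  print(answer_with_rag("How do I reset a password?"))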

AI is technically a broader term than ML (a subset of AI), but today we’re talking about LLMs, which are a subset of ML. In other words, “AI” is being used as a marketing term for “LLM” (the technical term). This matters because there is no intelligence / reasoning behind LLMs, but rather a statistical model for achieving PAC (“Probably Approximately Correct”) results.

Buzzwords explained

  • “Generative AI” — Marketing term for LLM output (vs. other ML-based tech).
  • “Hallucination” — Marketing term for when PAC (“Probably Approximately Correct”) isn’t correct at all (a feature of the technology, not something that can be eliminated).
  • “Reasoning” — Marketing term for an LLM reprocessing its own output in multiple passes / steps in an attempt to reduce incidents of “hallucination”.
  • “Prompt Engineering” — Marketing term to associate LLM prompt writing with software engineering. (By contrast, engineering has deterministic outcomes.)
  • “Agentic” — Integrating APIs with an LLM to translate PAC output into deterministic actions in other systems (sketched after this list). Much of this existed prior to LLMs using other forms of ML, such as sentiment analysis and auto-escalation of support tickets.
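
As an illustration of the “agentic” pattern, here is a minimal Python sketch: the LLM is prompted to return a constrained action name, and only a small set of deterministic functions ever touch other systems. escalate_ticket() and close_ticket() are hypothetical stand-ins for real API calls.

  # Minimal "agentic" sketch: the LLM's probabilistic (PAC) output is mapped onto
  # a fixed set of deterministic actions; anything unexpected falls through safely.
  import json

  ALLOWED_ACTIONS = {"escalate", "close", "none"}

  def escalate_ticket(ticket_id: int) -> None:
      print(f"Escalating ticket {ticket_id}")  # stand-in for a real API call

  def close_ticket(ticket_id: int) -> None:
      print(f"Closing ticket {ticket_id}")  # stand-in for a real API call

  def dispatch(llm_output: str, ticket_id: int) -> None:
      """Translate LLM output into a deterministic action, rejecting anything unexpected."""
      try:
          action = json.loads(llm_output).get("action", "none")
      except json.JSONDecodeError:
          action = "none"  # malformed or "hallucinated" output is ignored
      if action not in ALLOWED_ACTIONS:
          action = "none"
      if action == "escalate":
          escalate_ticket(ticket_id)
      elif action == "close":
          close_ticket(ticket_id)

  # Example: the LLM was prompted to reply with {"action": "escalate" | "close" | "none"}
  dispatch('{"action": "escalate"}', ticket_id=1234)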

We’re in the midst of a game of one-upmanship among LLM vendors to invent new terms and “next generation” features. To assess the landscape today, refer to the LLM Leaderboard or Zapier’s list.

Benefits & strengths of “AI” today

The current generation of LLMs was a turning point for achieving PAC results in:

  • Data trend identification
  • Prose authoring & summarizing
  • Image generation & identification
  • Audio transcription
  • Search enhancement
  • Code authoring & summarizing (in limited capacity)

These tasks are automatable, and enabling non-engineers to achieve PAC results through prompt writing is a key benefit. The benefits center on content discovery, pattern recognition, blending many inputs, and coping with unpredictable inputs (e.g. natural language).

Due to the rate of LLM releases and relative newness of mass-access, there is still much discovery to be done regarding how the above strengths can be remixed for individual business cases beyond surface-level integrations like chatbots, search, and summarization.

Barriers to realizing benefits

Tactically, the most common limitations in realizing value from LLMs are:

  1. Structured access to data
  2. Having a sufficient corpus of content to make an LLM domain-specific
  3. Structuring complex prompts to keep PAC results within a desired threshold (a minimal sketch follows this list)
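
As a rough illustration of item 3, one common tactic is to demand machine-checkable output, validate it, and retry or hand off to a human when the result misses the bar. This is a sketch, not a recipe: call_llm() is a hypothetical stand-in for your SDK of choice, and the self-reported confidence is itself PAC output, a heuristic rather than a guarantee.

  # Sketch: constrain the LLM to JSON, validate it, and fall back when it
  # misses the desired threshold.
  import json

  def call_llm(prompt: str) -> str:
      """Hypothetical stand-in; replace with a real SDK call."""
      return '{"category": "billing", "confidence": 0.91}'

  def classify_ticket(text: str, min_confidence: float = 0.8, retries: int = 2) -> str | None:
      prompt = (
          "Classify the support ticket below as one of: billing, bug, how-to.\n"
          'Respond with JSON only: {"category": "...", "confidence": 0.0-1.0}\n\n'
          f"Ticket: {text}"
      )
      for _ in range(retries + 1):
          try:
              result = json.loads(call_llm(prompt))
          except json.JSONDecodeError:
              continue  # malformed output: try again
          if result.get("category") in {"billing", "bug", "how-to"} and result.get("confidence", 0) >= min_confidence:
              return result["category"]
      return None  # below threshold: route to a human instead of guessing

  print(classify_ticket("I was charged twice this month."))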

Strategically, the primary barriers are:

  1. Unrealistic expectations rooted in magical thinking and conformity.
  2. Many AI implementations are simply marketing spin or fraud designed to generate positive buzz.
  3. The future stability of LLMs is uncertain (see below).

These distract from & impede discovery of useful implementations and threaten their future viability.

Known Risks & Costs: Product / Marketing

Known Risks & Costs: Engineering / Tooling

Analysis of risks & costs

“Genuine revolutions create genuinely irrational environments” and the best strategy under those circumstances, for both skeptics and true believers, is pessimistic adoption: “only adopt specific implementations that are well-tested and have a demonstrated effectiveness.” 

While we do not know if LLMs are the future of software engineering, we lack data supporting a significant positive impact from them in the present.

Writing process vs software engineering process

For internal use cases involving natural language (e.g. emails, meetings, presentations), LLMs are a significant advancement with trivially constrained risks in the hands of capable users.

For use cases involving coding language, benefits are often more muted (or non-existent) while the risks & costs are dramatically higher. LLMs best assist with code authoring (the act of typing code), which is typically not a noteworthy bottleneck in the software development lifecycle.

Understanding how work impacts a broader system, and having sufficient context to consistently make good decisions, is a much larger concern and far more time-consuming than typing code. Using an LLM to author your code may impede or short-circuit this process entirely, emphasizing short-term delivery capacity over long-term sustainability and cumulative capacity. This creates an environment where capacity erodes over time as your team effectively suffers “brain drain” without any employee turnover.

While an email is likely consumed once, code must be accurately re-consumed by engineers many times over its lifespan, and must also provide flexibility for unforeseen needs, which requires much deeper value judgments than LLMs are capable of making. LLMs are not effective at code review, and LLM-generated code can add friction to the process by increasing the time required for review & communication about the code when neither party truly understands it.

Alternative ML-based strategies

Many use cases for applying LLMs to code were already possible with other ML technologies. While LLMs were a big leap for natural language enablement, they were a much smaller iteration for engineering enablement. It’s likely there are more appropriate existing tools for any given challenge, which makes it important to not presuppose AI is the solution to any of them. Data analysis and quality assurance are key domains where ML tools already proliferate.
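
For example, a ticket-triage task like the one sketched earlier does not require an LLM at all: a small supervised classifier is cheap to run, deterministic once trained, and straightforward to evaluate. The sketch below uses scikit-learn with illustrative tickets and labels.

  # Pre-LLM alternative: a supervised text classifier trained on labeled tickets.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.pipeline import make_pipeline

  tickets = [
      "I was charged twice this month",
      "The export button throws an error",
      "How do I invite a teammate?",
      "Refund my last invoice please",
  ]
  labels = ["billing", "bug", "how-to", "billing"]

  model = make_pipeline(TfidfVectorizer(), LogisticRegression())
  model.fit(tickets, labels)

  print(model.predict(["My invoice total looks wrong"]))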

Future stability of LLMs

There are a number of factors threatening the stability and success of LLMs (and the world on which they rely), which we can put in three buckets:

Training LLMs is prohibitively resource-intensive

Diminishing returns on LLM investments

Lack of visibility into quality & methods