
How JP Morgan built 30,000 AI Agents

If you are an AI fast-follower, rather than an AI early-adopter, it may be time to learn from organizations that moved beyond the AI pilot.

It may be time to look past the conflicting stories on AI, AI Agents, and GenAI to focus on what is working. If your organization is in a regulated environment where compliance, auditability, and governance reign, it may be time to learn how innovative organizations navigated past these AI adoption barriers. While it’s unlikely you have an almost $20B 2026 technology budget like JP Morgan, you can learn from their experimentation with AI beyond software coding.

The conflicting AI stories last month included the warning Something Big is Happening from Matt Shumer, which received 84 million views on X. He advised readers to build up their savings and be cautious about taking on debt against their income. A Substack post from Citrini Research shook the US stock market by predicting AI will cause massive white-collar job layoffs, 10% unemployment, and a 36% drop in the S&P 500 in 2028.

This narrative contrasts with researchers finding that AI agents fail 70% of the time and that 95% of enterprise GenAI projects get zero return, and with Gartner predicting over 40% of agentic AI projects will be canceled by the end of 2027.

JP Morgan ignored the conflicting AI stories and began their GenAI journey before the release of ChatGPT in November 2022. Derek Waldron, their chief analytics officer, explained how the financial giant achieved large-scale, voluntary employee adoption of AI, GenAI and AI agents. Their experience helped them realize three core strategy principles:

  • Employees should decide AI use cases. AI technology capabilities must be distributed to the employee user base. Employees are best positioned to determine the effectiveness of AI use cases.
  • Employees should build with AI. No two job functions are exactly the same. Distribute powerful, reusable building blocks and capabilities that enable individuals to build solutions themselves, such as AI tools for research, analysis, and document prep.
  • Connections. The long-term bottleneck for driving maximum value was not the AI models; it was the connections to the existing technology applications, knowledge, analytics, and processes of the enterprise.

They realized that if AI superintelligence or Artificial General Intelligence (AGI) was ever achieved, it would provide little value without the reusable building blocks and connections.

They also determined that we are in a foundational AI arms race (i.e., OpenAI, Anthropic, Google) in which new models are released almost constantly. They determined that the top foundation models, and even smaller models, work well enough; the solution is almost entirely about the internal platform, tools, and connections and how they work within the ecosystem. Thus, they designed their enterprise AI capabilities so that they could swap foundational AI models at any time without impact to the enterprise AI ecosystem, essentially treating foundational AI models as commodities.

To create value, JP Morgan had to address:

  • LLM augmentation
  • Connections
  • Reusable building blocks
  • Guardrails
  • Evals
  • User Awareness and Adoption
  • AI Agent mindset

LLM Augmentation – For GenAI to be effective, it must leverage current, trusted, and proprietary knowledge while avoiding hallucinations. Addressing this begins with developing a Retrieval-Augmented Generation (RAG) strategy. RAG improves GenAI by retrieving external, trusted knowledge sources—like company documents or the internet—and adding them to the Large Language Model (LLM) prompt (a.k.a. the context window) before generating a response. The inference output is up-to-date, trusted information, which reduces hallucinations and doesn’t require retraining when new foundational AI models are released. JP Morgan is on their fourth generation of RAG:

  • First – basic RAG – access internal knowledge via keyword and vector searches. Keyword search is brittle: searching for “vacation” may miss knowledge that uses related terms such as PTO, holiday, or personal leave. To enable vector search, organizations create vector stores by transforming data into numerical embeddings stored in a database for semantic search, typically retrieved with approximate nearest neighbor (ANN) algorithms. The numerical embeddings capture the relationships among vacation, PTO, holiday, and personal leave. This enables users to ask questions about vacation time or recent quarterly results without navigating HR and financial systems, finding documents, clicking on links, scrolling, and searching for answers. You may have noticed that Google Search started to improve around 2017 when it incorporated vector search over numerical embeddings.
  • Second – democratized RAG – they federated it so the entire firm could create their own knowledge stores and set various access provisions around it.
  • Third – hierarchy RAG – they realized not all knowledge is equal, thus they created hierarchies of information:
    • top – precise scripted answer to various questions when you don’t want degrees of freedom
    • second – they call it evergreen: surfacing authoritative, real-time sources
    • third – knowledge and information contained in documents that stand the test of time
    • bottom – information that may be more temporal that becomes less relevant as time goes on
  • Fourth – multimodal RAG – ingest reports with graphs, images, and company pitch illustrations, like what’s used in marketing
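The basic RAG pattern described above can be sketched in a few lines. This is a minimal illustration, not JP Morgan’s implementation: the toy vectors stand in for a real embedding model, and retrieval is a brute-force cosine-similarity scan rather than an ANN index.

```python
import math

# Toy stand-in for a real embedding model (e.g., a sentence transformer).
# Real vector stores map "vacation", "PTO", and "holiday" to nearby vectors.
TOY_VECTORS = {
    "vacation policy: employees accrue 20 PTO days": [0.9, 0.1, 0.0],
    "q3 results: revenue grew 12% year over year": [0.0, 0.2, 0.9],
    "holiday and personal leave request procedure": [0.8, 0.3, 0.1],
}

def embed(text: str) -> list[float]:
    """Stand-in: look up a precomputed toy vector; unknown text (the query)
    gets a vector near the 'vacation' documents for this demo."""
    return TOY_VECTORS.get(text, [0.85, 0.2, 0.05])

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query: str, store: dict, k: int = 2) -> list[str]:
    """Semantic search: rank documents by similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(store, key=lambda doc: cosine(q, store[doc]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, store: dict) -> str:
    """Augment the context window with retrieved, trusted knowledge."""
    context = "\n".join(retrieve(query, store))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How many vacation days do I get?", TOY_VECTORS)
```

Because the embeddings place “vacation” near “PTO” and “holiday,” the retriever surfaces the leave-related documents and skips the earnings note, which keyword search on “vacation” alone would not guarantee.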

Connections – For AI agents to work, they need ubiquitous connections to what people interact with, such as structured data systems, documents, knowledge stores, and applications like HR and CRM systems. JP Morgan calls this their connected ecosystem. They built connections to their trading, financing, and risk systems, and they add new connections each month. They use a five-team rule for deciding on connections: if five teams request a connection, they prioritize implementing it via application programming interfaces (APIs) or the Model Context Protocol (MCP), which enables AI models to seamlessly connect to external data sources, tools, and databases. Once the connections are made, they become available to anyone in the enterprise with the appropriate access privileges.
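The five-team rule is simple enough to sketch. This is an illustrative model of the policy, not JP Morgan’s code: the class name, threshold constant, and team names are invented for the example.

```python
from collections import defaultdict

class ConnectionRegistry:
    """Sketch of a 'five-team rule': a requested connection (an API or an
    MCP server) gets prioritized once five distinct teams have asked for it."""

    THRESHOLD = 5  # assumed from the article's five-team rule

    def __init__(self):
        self.requests = defaultdict(set)  # connection -> set of requesting teams
        self.prioritized = set()

    def request(self, team: str, connection: str) -> bool:
        """Record a team's request; return whether the connection is prioritized."""
        self.requests[connection].add(team)  # a set, so repeat requests don't count twice
        if len(self.requests[connection]) >= self.THRESHOLD:
            self.prioritized.add(connection)
        return connection in self.prioritized

registry = ConnectionRegistry()
for team in ["equities", "risk", "hr", "ops", "credit"]:
    registry.request(team, "crm-mcp-server")
```

Using a set of requesting teams means one noisy team cannot trigger prioritization alone; only genuine cross-team demand crosses the threshold.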

Reusable building blocks – Personal AI agents call the many reusable building blocks available across the enterprise or shared by team members. The building blocks are often multi-step tasks, such as research (multiple calls to various systems, knowledge, and data) and analysis, like comparing companies within an investment strategy to determine the best value. Building blocks create presentations in PowerPoint, populate and analyze data in Excel, build pitchbooks, and prepare material in JP Morgan’s standard formats for documents, brochures, or presentations. Reusable building blocks can also be other AI agents or skills that are added to LLM prompts. A skill can be step-by-step instructions for how to do a task, added to the LLM prompt (context window) when planning the task.
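A skill, in this sense, is just reusable instructions injected into the context window when an agent plans a matching task. The sketch below illustrates that mechanic; the skill name, its steps, and the function names are invented for the example.

```python
# Illustrative 'skills' library: step-by-step instructions an agent can pull
# into its prompt. Contents are invented stand-ins, not firm procedures.
SKILLS = {
    "pitchbook": (
        "1. Pull the client profile from the CRM connection.\n"
        "2. Compare peers within the investment strategy.\n"
        "3. Format the result in the firm's standard template."
    ),
}

def plan_prompt(task: str, user_request: str) -> str:
    """Build the planning prompt, appending a matching skill if one exists."""
    parts = [f"Task: {user_request}"]
    skill = SKILLS.get(task)
    if skill:
        parts.append(f"Follow these steps:\n{skill}")
    return "\n\n".join(parts)

prompt = plan_prompt("pitchbook", "Prepare a pitchbook for Acme Corp")
```

The point of the pattern is reuse: one team writes the steps once, and every agent that plans a “pitchbook” task inherits them without retraining any model.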

Guardrails – AI agents require a major mindset shift from telling the system what to do (traditional software) to telling the system what not to do (AI agents). “The only way to truly know if AI agents work is to release them into production environments,” according to LangChain CEO Harrison Chase. Guardrails must address the many alignment challenges AI agents create. JP Morgan onboards every MCP server into the platform, which includes security testing and making sure legal agreements are in place. Once the MCP server is onboarded, it is available to the whole firm, provided the individual has access privileges.
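That “what not to do” framing can be shown concretely: a guardrail is a deny-list checked before any agent action executes, rather than a script of permitted steps. The rule names and messages below are invented for illustration.

```python
# Sketch of a deny-list guardrail: the agent plans freely, but forbidden
# actions are blocked before execution. Rules here are invented examples.
FORBIDDEN_ACTIONS = {
    "wire_transfer": "agents may not move funds without human approval",
    "delete_records": "agents may not delete audited records",
}

def check_guardrails(tool_call: str) -> str:
    """Raise if the planned tool call violates a guardrail; else allow it."""
    if tool_call in FORBIDDEN_ACTIONS:
        raise PermissionError(FORBIDDEN_ACTIONS[tool_call])
    return "allowed"
```

Traditional software would enumerate every permitted path; the guardrail inverts that, leaving the agent’s behavior open except where a rule explicitly forbids it.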

Evals – In addition to guardrails, AI agents require a robust and comprehensive set of evals to determine their effectiveness. The evals must determine:

  • overall performance of the AI agents
  • alignment with goals, risk, compliance, and governance
  • feedback to power the flywheel of learning and adapting.

Evals are a fast-emerging industry that helps train AI models and determine their performance, alignment, and learning. The AI startup Mercor hired tens of thousands of human experts—including lawyers, doctors, engineers, and researchers—to train and evaluate advanced AI models. Their evals are critical to understanding the quality and accuracy of AI inferences in specific domains and subjects. Mercor’s APEX-Agents evaluate agents on the real day-to-day work of professionals such as investment banking analysts, management consultants, and corporate lawyers. OpenAI released GDPval, a framework designed to measure AI model performance on 1,320 real-world, economically valuable tasks across 44 occupations. Guardrails, alignment, and governance of AI agents are not possible without robust enterprise evals that monitor, audit, and measure performance. JP Morgan vaulted to the top of the Evident AI Index, a global eval of AI talent, innovation, leadership, and transparency in banking.

User Awareness and Adoption – JP Morgan began by distributing LLM technology to everyone’s desktop within a hosted platform. This ensures the inputs to the LLMs stay inside the enterprise rather than being captured by OpenAI, Anthropic, or Google. The internal LLM helps prevent shadow AI: the unauthorized use of AI tools, applications, or large language models (LLMs) by employees to perform work tasks without the knowledge, approval, or security vetting of their organization.

JP Morgan displayed posters, launched “AI Made Easy” employee training, and hosted ideation workshops with thousands of people across business, operations, and technology departments. At the hosted events, they demoed the tech, brainstormed how it could be used, and identified a ton of ideas. They identified half a dozen of the most prominent patterns of asks, including being able to query and analyze digital data within LLMs. They enabled conversational questioning of structured databases, automating traditional database querying, and built document ingestion and comparison-analysis AI tools.

Without much effort, they reached 30% enterprise adoption with their early adopters. That group got busy right away building personal AI agents, which helped them reach 60% once the fast-follower population saw what they were doing.

AI Agent mindset – For AI agents to evolve requires what Waldron describes as an AI agent mindset. He often asks himself what is missing from his personal AI assistant ecosystem that he can’t ask an AI agent to do. First, they helped people use AI for questions and answers, then for research and summary activities. Then people didn’t want AI agents to solve part of a process; they wanted them to solve the whole process. For example, an investment banker must go to the news, check earnings releases, do web research about a particular client, and then create a briefing note. In 2025, Waldron believes they landed on an innovation flywheel: when they try to build AI agents, they identify and fill the gaps (maybe connections). A team surveils the gaps and has a process to solve them centrally. This addresses individuals’ problems, which expands capabilities, which creates more ideas and more uses, thus the flywheel effect.

JP Morgan uses a top-down and bottom-up approach. Bottom-up has resulted in incredibly powerful platform capabilities with connections, knowledge, and reusable building blocks that grow and scale over time, enabling more use cases and adoption. Yet the bottom-up approach won’t fully transform a company on its own. Businesses run on long processes that cross multiple teams. JP Morgan recognized that to move the needle on end-to-end processes, top-down strategies are required, such as reducing the end-to-end time to disburse credit and the end-to-end time to onboard employees. Waldron acknowledged they must strategically rethink what these processes will look like in a world of AI and AI agents.

While the 30,000 personal AI agents don’t address the more complex enterprise workflow processes that cross multiple teams, the AI platform layer created for personal AI agents is a foundation for the end-to-end enterprise processes.

For more on enterprise adoption of AI, VentureBeat and its Beyond the Pilot podcast highlight what JP Morgan and other enterprises are doing with AI. We can learn from organizations that moved beyond AI demos and pilots. The Beyond the Pilot podcast has helped me learn.


MIT Researchers Identified AI Challenges in Healthcare

MIT researchers identified AI challenges healthcare systems must overcome in a paper presented at the Neural Information Processing Systems (NeurIPS 2025) conference in December.

The best-performing AI models on chest X-rays and cancer histopathology images at one hospital were the worst-performing on up to 75 percent of patients at a second hospital. The AI model failures occurred when models were applied to data other than what they were trained on.

“We demonstrate that even when you train models on large amounts of data, and choose the best average model, in a new setting this ‘best model’ could be the worst model for 6-75 percent of the new data,” says Associate Professor Marzyeh Ghassemi.


The challenge is how to leverage AI models that work at the Mayo Clinic or MD Anderson across the more than one thousand other healthcare systems, including safety-net hospitals.


The AI Struggle With “Doing”

It’s easy to infer AI models will replace your job soon when you see these stories:

  • Roughly 95% of businesses that invested a combined $40 billion in AI failed to make money, according to an MIT study
  • A randomized controlled trial (RCT) found that when developers use AI tools, they take 19% longer than without them
  • Carnegie Mellon researchers found the best AI agents fail about 70% of the time on real-world corporate tasks
  • A McKinsey survey found that only about ten percent of respondents report scaling AI agents beyond pilots
  • Gartner predicts over 40% of agentic AI projects will be canceled by the end of 2027

As we have learned, these proxies for human intelligence don’t translate to AI doing things in real life, or what Gen Z calls “irl.”

Steven Pinker defines intelligence as “the pursuit of goals in the face of obstacles,”[1] which requires doing. Psychologists define intelligence as learning from experience, adapting to new situations, handling abstract concepts, and manipulating the environment.[2] AI struggles with these intelligent behaviors of “doing” in the real world.

While AI struggles with “doing,” it has had great success with “advising” and “assisting” with a human-in-the-loop (including explicitly-defined next-action scripts). OpenAI reports approximately 30% of people use AI chatbots for advising and assisting at work and 70% for non-work tasks. Physicians find ambient AI assists them with drafting medical notes, saving them thirty minutes per day.

The cognitive dissonance of AI is its advising and assisting performance alongside its struggles with hallucinations and doing. The inferred leap that AI’s success will translate into AI doing (with massive job layoffs) may be clouded by these human intelligence proxies. Human proxies assume you can achieve goals in the face of obstacles (Pinker), learn from experience, handle abstraction, and manipulate the environment (psychologists). AI researchers have recognized the need for new proxies, including OpenAI’s GDPval, which evaluates “doing” within 44 occupations and 1,320 specialized tasks.

A recent AI paper from Stanford and Harvard explains why most ‘Agentic AI’ systems are impressive in demos and then completely fall apart in real use. Here are some of the “doing” areas that researchers are addressing:

On-the-job training – the ability to learn a unique environment, workflow, people, tools, and goals, and to improve over time. The industry calls this recursive self-improvement. Yann LeCun cites a teenager learning to drive in 14 hours while AI-powered autonomous vehicles still struggle. Waymo provided 14 million rides without a driver in 2025, though it lost $1.23 billion on $450 million of revenue. Waymo still requires fleet response agents who view real-time feeds from the vehicles’ exterior cameras. Tesla’s robotaxi has been perpetually one year away since Elon Musk’s 2019 announcement.

Generalization – AI agents are very good at recognizing and reproducing patterns they’ve seen before, but they often fail when a situation looks new yet is conceptually similar. This makes it difficult for AI agents to make predictions in novel situations or when significant variations exist. Geoffrey Hinton has described the human brain as an analogy machine that helps us decide what to do based on analogies from the past. A toddler needs one taste of a disgusting food to generalize it to new situations. AI’s lack of generalization makes it difficult to interpret causal relationships unless someone stated them on Reddit. AI’s understanding is at the surface level, through text or pixel tokens, not at the conceptual level like humans’.

Tool Use – Agents can call tools (APIs, databases) and use browsers, but they struggle to decide when, why, and how to use tools reliably. AI models are trained via supervised examples rather than experiential trial-and-error like humans. Small errors in early tool-use steps can compound and confuse downstream AI reasoning. AI agents call the same failing tool instead of diagnosing the issue, misinterpret outputs, or assume the tool is always correct. When AI agents use tools, they are susceptible to adversarial attacks, just as humans are to social engineering and phishing.

Memory – AI lacks durable, reliable memory across interactions, sessions, and episodes. The memory embedded in pretraining and fine-tuning is expensive to update. Large Language Models (LLMs) supplement this with user prompts, Retrieval-Augmented Generation (RAG) techniques, and context windows that can process one million tokens (10 to 15 books). An AI agent doesn’t know what it should remember or how to prioritize it without explicit instructions. This leads to catastrophic forgetting of user preferences, trained knowledge, or past decisions; relearning the same facts repeatedly; and storing information but failing to retrieve it when relevant.

Context – AI agents have limited ability to track, prioritize, and reinterpret context over time. Context windows are finite, older information gets compressed, and models struggle to distinguish between “important” and “incidental” details. Humans imagine mental models of the world specific to the goal and environment, including the objects, people, places, abstractions, and analogies involved. This enables humans to mentally test strategies, beliefs, causal effects, and potential futures, and to update them as they learn. Many researchers[3] are focused on developing world models to address these struggles of LLMs, which are essentially next-token (word, pixel) predictors.

Long-horizon planning – AI agents have difficulty planning and executing goals that require many steps over time. AI training optimizes for next-token prediction, not multi-step success. AI agents lack a persistent notion of goals, subgoals, and progress. Errors early in a plan propagate, cascade, and derail later steps. AI developers address this with reinforcement learning during fine-tuning; however, unless objectives can be clearly measured (does the software code work, did you win the game, did you pass the test) and rewarded, the model doesn’t learn. AI agents are strong at planning on paper, weak at planning in action.

Ben Dickson provides an informative look at how AI researchers plan to address these issues in 2026. AI “doing” progress is something we all need to watch to get advance notice of when AI will replace our jobs.

[1] Steven Pinker, How the Mind Works (W. W. Norton, 1997), 372.

[2] Jahangir Moini, Anthony LoGolbo, and Raheleh Ahangari, “Understanding Physiological Psychology,” in Foundations of the Mind, Brain, and Behavioral Relationships (Elsevier, 2024), 211–28, https://doi.org/10.1016/B978-0-323-95975-9.00002-0.

[3] Yann LeCun, Fei-Fei Li, Mira Murati


Can You Live Without AI?

A dad, husband, author, and journalist living in New York decided to find out. For 48 hours, A.J. Jacobs would avoid all interactions with A.I. and machine learning.

He woke, picked up his phone, and entered his iPhone passcode like it was 2017. He quickly learned his iPhone would be useless: no AI-curated news, social feeds, or attention-maximizing targeted ads; no email passed through the spam filter or podcasts cleaned up with AI. He put his iPhone in a drawer.

His wife Julie turned on the lights and A.J. quickly flicked them off. “Are you kidding me?” she asked.

Con Edison uses AI to monitor four million meters to manage the grid. Jacobs thought about using rainwater to brush his teeth after realizing New York’s water system uses machine learning to monitor 1,600 sensors. He had to walk or bike to avoid traffic-flow monitoring, Ubers, and the subway. He couldn’t use weather apps, Zoom (which leverages AI for noise suppression), credit card transactions, food services, retail, or television streaming. He ended up watching Brewster McCloud on a twenty-year-old DVD player.

AI and machine learning (ML) models are used to predict, detect, and recognize patterns. The outputs of AI and ML models feed explicitly programmed software scripts that essentially run our world. While people are most familiar with AI through chatbots, like ChatGPT introduced by OpenAI in November 2022, the hidden AI in the background runs our world. It has been doing so since the 1990s, when it began detecting credit card fraud, managing retail inventory, and sorting zip codes for the U.S. Mail.

A fraud detection ML model has milliseconds after a credit card is swiped to either flag the transaction for review (and anger the customer) or clear it (and risk bank losses). These ML models have consistently improved over the decades to find the right balance of customer friction and costly losses. Kroger would make five million ML sales predictions per day, one for each item in each store, to power their supply chain. There’s not enough room in the back for extra Cheerios boxes, or tolerance for unhappy Cheerios lovers.

Jacobs wrote in his New York Times story, “What I didn’t expect was that my attempt to avoid all interactions with A.I. and machine learning would affect nearly every part of my life — what I ate, what I wore, how I got around.”

Most can go forty-eight hours without chatbots; few can go without the hidden AI and ML models that power human society. Jacobs mentioned that even a goat herder in the mountains checks weather apps.

Photo Source: The New York Times


What’s Up With AI?

As if life wasn’t complex enough, we now must make sense of AI. It hasn’t been easy, though three words may help.

An MIT study found 95% of investments in GenAI have produced zero returns, while 90% of workers surveyed reported daily use of personal GenAI tools like ChatGPT or Claude for job tasks.

A study from software company Atlassian found daily usage of AI among individual workers has doubled over the past year, while 96% of businesses “have not seen dramatic improvements in organizational efficiency, innovation, or work quality.”

A survey of 3,700 business executives found 87% said AI will “completely transform roles and responsibilities” within their organizations over the next twelve months, while only 29% said their workforces are equipped with the skills and training necessary to leverage the technology.

A Harvard economist found 92% of U.S. GDP growth in the first half of 2025 came from AI investments, yet a Center for Economic Studies (CES) paper found a 1.3% drop in productivity after implementing AI, though the authors expect productivity gains later.

It seems clear AI “is” and “will be” transformational, though it is hard to distinguish what “is” versus “will be” or whether we are in an AI bubble. OpenAI CEO Sam Altman, Amazon founder Jeff Bezos, and 54% of fund managers recently indicated that AI stocks were in bubble territory.

Railroads, electricity, and the internet were transformational innovations that created bubbles, went bust, and then faded into the background as the new normal. When the internet moved past boom and bust, it settled into new business moats such as Google (Search, Android, Chrome, Cloud), Meta (social media), TikTok, Amazon (eCommerce, Cloud), Microsoft (Windows, Office, Cloud), and Apple (macOS and iOS). The announced massive AI investments, over one trillion dollars by OpenAI alone, indicate investors expect AI to be more than companions, coders, and search tools: new moats when AI fades into the background.

To help make sense of AI, we may think in terms of advising, assisting, and doing. We must also be clear about what AI “is” versus what it “will be.” Today, AI “is” mostly “advising,” with some exciting new “assisting.” The hype is mostly about what AI “will be,” which is “doing.”

1. Advising – most AI use cases are advising. The AI takes inputs and creates inferences, such as predicting email spam, loan worthiness, what to wear to a party, or the content (a TikTok video) that will maximize your engagement. The AI inferences feed deterministically programmed actions: “if this, then do that.” Advising helps us figure out how to do things and answers our questions. The human-in-the-loop decides what to do next, or the inference result powers explicit programming such as maximizing user engagement. AI is not replacing humans here, though it should help us become more efficient and effective. It is hard to measure the productivity of advising, though if the strategies result in fewer actions (efficiency) and better outcomes (effectiveness), it must be more productive.

2. Assisting – this is essentially a tool that does stuff in the digital world, like tools (e.g., a shovel or washing machine) in the physical world. It is often called GenAI. It creates videos and software code, summarizes content, drafts letters, or does homework assignments based on user prompts. It makes us more efficient and effective at creating digital content within individual tasks. It requires the human-in-the-loop to judge the content created to avoid adverse outcomes, like the chatbot that accepted $1 for a new Chevy Tahoe with an MSRP of $58,195. While the terms “AI agents” or “agentic AI” are used to describe AI that extracts data from documents, engages customers, and curates and summarizes content, the next actions are determined by the human-in-the-loop or are predetermined and executed with explicit software logic (like Siri or Alexa). It’s logical to assume that creating digital content, like tools in the physical world, will help us become more efficient and effective.

3. Doing – this is goal achievement without a human-in-the-loop. “Doing” is typically a highly efficient, tightly synchronized flywheel of a few to millions of “inferences” and “actions” where the actions are not predetermined. Humans have a tight integration of neurons, synapses, and well-tuned perceptual, motor, learning, memory, and executive neurocognitive functions. This enables flexibility in novel environments based on mentally imagined models of the world. “Doing” is an autonomous vehicle without a human driver; as we have learned, addressing the last 1% to 5% of autonomous-driving edge cases may take a decade or longer. “Doing” is the human immune system, which makes inferences based on inputs and uses its agency to make decisions and take actions to destroy pathogens. A thermostat that automatically turns on the heat is not “doing”; it is “advising,” because a human explicitly programmed the next actions based on its inferences. Doing is difficult for AI because, according to Turing Award winner Yann LeCun, it lacks the capacity to understand the physical world, persistent memory, the ability to reason, and the ability to plan. While there is no doubt these AI challenges will be addressed, AI is not doing much today.
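The “advising” pattern, an inference feeding an explicitly programmed “if this, then do that” action, can be made concrete in a few lines. The spam scorer below is a crude keyword stand-in for a trained classifier; everything about the example is invented for illustration.

```python
# Sketch of the 'advising' pattern: the model only scores; a human-written
# rule decides the action. spam_score() stands in for a real trained model.
def spam_score(email: str) -> float:
    """Stand-in inference: a crude keyword heuristic in place of a classifier."""
    return 0.95 if "free prize" in email.lower() else 0.05

def route_email(email: str) -> str:
    # Deterministic, explicitly programmed next action: "if this, then do that".
    # The AI advises (the score); humans predetermined what happens next.
    return "spam_folder" if spam_score(email) > 0.5 else "inbox"
```

Note that the model never chooses an action: a human wrote the 0.5 threshold and both outcomes in advance, which is exactly why this is “advising” rather than “doing.”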

“Advising” and “assisting” are today’s AI reality. “Doing” gets the AI hype, with alarming headlines about how it will replace our jobs and how superintelligence will rapidly, irreversibly, and uncontrollably take over the world and render humans subservient.

When trying to make sense of AI, begin with who decides and performs the next best actions: the human-in-the-loop, actions predetermined by humans, or AI with agency?


Understanding AI by understanding humans

Are humans underrated? Anthropic (maker of Claude) CEO Dario Amodei predicted in May:

AI could wipe out half of all entry-level white-collar jobs — and spike unemployment to 10-20% in the next one to five years.

Last year, AI startup investor and author of AI Superpowers Kai-Fu Lee predicted AI will displace 50% of jobs by 2027.

Research on humans has begun to put sand in the gears of these bold predictions. Researchers are following entrepreneurs, marketing departments, and shameless blog writers by invoking AI to get attention. Yet improving our understanding of humans may be essential for our lifelong journeys living with AI.

Last week, I saw three studies that illustrate this shift.

Attention – How the Brain Filters Distractions to Stay Focused on a Goal

The Yale University study demonstrated how the human brain allocates limited perceptual resources to focus on goal-relevant information in dynamic environments. The study finds that the brain prioritizes perceptual effort based on goals, filtering out distractions. Attention shifts rapidly and flexibly in response to changing visual demands. AI struggles with non-relevant information and requires precise language to be effective, as demonstrated in a clinical diagnosis study using chatbots: if you remove physicians from filtering relevant information and describing it precisely (using long Latin-derived terms), chatbot effectiveness drops from 94 percent accuracy to 34 percent.

Attention is essentially the processing of bidirectional electrical pulses in neurons between perception and the mental models relevant to the goal-directed strategy. Agentic AI will need to learn attention: to focus on relevant inputs, shift attention rapidly, change based on perceptual inputs (learning), and infer futures without requiring precise prompts to engage LLM token-prediction machines.

Learning – Why Children Learn Language Faster Than AI

Learning (a.k.a. self-correction) may be the most important type of inference for the survival of any form of life. The Max Planck Institute study found that even the smartest machines can’t match young minds at language learning. The researchers estimated that if a human learned language at the same rate as ChatGPT, it would take 92,000 years. They introduced a new framework and cited three key areas:

  • Embodied Learning: Children use sight, sound, movement, and touch to build language in a rich, interactive world.
  • Active Exploration: Kids create learning moments by pointing, crawling, and engaging with their surroundings.
  • AI vs. Human Learning: Machines process static data; children dynamically adapt in real-time social and sensory contexts.

Next Action – Affordances in the Brain: The Human Superpower AI Hasn’t Mastered

To achieve a goal, strategy inferences such as perception, imagining, deciding, and predicting must conclude with the next best action(s). The study by University of Amsterdam scientists discovered:

Our brains automatically understand how we can move through different environments—whether it’s swimming in a lake or walking a path—without conscious thought. These “action possibilities,” or affordances, light up specific brain regions independently of what’s visually present. In contrast, AI models like ChatGPT struggle with these intuitive judgments, missing the physical context that humans naturally grasp.

There is no doubt that AI and robots will improve next-best-action inferences once they are widely deployed. For now, they must rely on token prediction machines (i.e., ChatGPT, Claude, or Gemini) built on statistical representations of words or groups of pixels.
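A toy sketch of what a “token prediction machine” does: count how often each word follows another in some text, then predict the most frequent continuation. Real LLMs use neural networks over subword tokens rather than word-count tables, but the underlying idea of predicting the next token from statistics of past text is the same:

```python
from collections import Counter, defaultdict

# Tiny corpus standing in for the trillions of words an LLM trains on
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (a bigram model)
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    # Return the most frequent continuation seen in the corpus
    return bigrams[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (seen twice, vs. "mat"/"fish" once)
```

Note that the model predicts only from word statistics; it has no notion of cats, mats, or affordances, which is the gap the Amsterdam study points at.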

Photo Credit: Neuroscience News


Are we ready for Doctor AI?

ChatGPT, Gemini, Claude, and other Large Language Models (LLMs) are impressive at medical diagnosis, with ChatGPT-4 performing better than physicians at diagnosing illness in a small study. A closer look finds that AI in medical diagnosis is another example of the cognitive dissonance of AI.

  • Thought – A paper by researchers at the University of Oxford found LLMs could correctly identify relevant conditions 94.9% of the time when directly presented with test scenarios.
  • Thought – Human participants using LLMs to diagnose the same scenarios identified the correct conditions less than 34.5% of the time.

What went wrong?

Looking back at transcripts, researchers found that participants provided incomplete information to the LLMs and that the LLMs misinterpreted their prompts. For instance, one user who was supposed to exhibit symptoms of gallstones merely told the LLM: “I get severe stomach pains lasting up to an hour. It can make me vomit and seems to coincide with a takeaway,” omitting the location of the pain, the severity, and the frequency.

It appears physicians know how to identify the relevant conditions and how to state them clearly to the chatbot. The Oxford study highlights a problem not with humans or even LLMs, but with the way we sometimes measure LLM performance.

  • Thought – LLMs can pass medical licensing tests, real estate licensing exams, or state bar exams.
  • Thought – LLMs can often provide poor personal medical, real estate, and legal advice.

The Cognitive Dissonance of AI

In psychology, cognitive dissonance is the discomfort of holding two or more contradictory thoughts. The term describes AI today. To leverage AI and thrive in our AI journeys, we need to live with the discomfort that comes with understanding the strengths and weaknesses of AI.

ChatGPT, Gemini, Claude:

Chatbots for advice:

Large Language Models:

AI Agents:

AI Reasoning Models:

Autonomous Vehicles:

Photo Credit: Author generated with ChatGPT. AI image generation is amazing, though it can be a struggle to get precisely what is wanted.


How Do You Hire a Gen AI Model?

Hilke Schellmann describes how we use AI-powered algorithms to screen resumes, process background checks, facilitate candidate online assessments, and conduct one-way interviews in the book The Algorithm: How AI Decides Who Gets Hired, Monitored, Promoted, and Fired and Why We Need to Fight Back Now.

While the AI-powered algorithms for hiring humans may not apply to selecting Large Language Models (Gen AI), we do have insights from Melanie Mitchell, one of the best explainers of AI. Her bestselling book is Artificial Intelligence: A Guide for Thinking Humans, and she explains very well what AI can and cannot do.

She recently cast doubt on LLM research that claimed: “GPT-3 appears to display an emergent ability to reason by analogy, matching or surpassing human performance across a wide range of text-based problem types.”

She replicated the experiments using counterfactual tasks to stress-test claims of reasoning in large language models. While the advances of LLMs have been amazing, we need people like Melanie Mitchell to help make sense of the hype and sensational claims. Otherwise, how are we going to know how to hire our next assistant?


A Little Earth Day Optimism

The complexity of reducing the CO2 pumped into the atmosphere can feel overwhelming and even hopeless. While we must continue engaging in the many initiatives to make this happen, it is nice to read an optimistic story that could help us improve our future.

That dose of optimism is Jessica Rawnsley’s story “The Rise of the Carbon Farmer” in Wired. She describes the revival of regenerative agriculture, which keeps carbon in the soil rather than the atmosphere, and even improves soil health and yields.

By some counts, a third of the excess CO2 in the atmosphere started life in the soil, having been released not by burning fossil fuels but by changing how the planet’s land is used.

He (Patrick Holden) is one of a growing number of farmers shaking off conventional methods and harnessing practices to rebuild soil health and fertility—cover crops, minimal tilling, managed grazing, diverse crop rotations. It is a reverse revolution in some ways, taking farming back to what it once was.

