Medicine’s AI Divide

A split-screen moment for artificial intelligence in medicine

Artificial intelligence is delivering one of its clearest signs yet that it could become a powerful tool in medicine — even as a very different use of algorithmic decision-making is being accused of making healthcare less affordable for some of the people who need it most.

In one development, a Harvard-led study found that a frontier AI model matched or exceeded physician baselines on several clinical reasoning tasks, including emergency-department cases in which speed and uncertainty can mean the difference between life and death. In another, an investigation in Kenya found that an AI-assisted system used to determine how much people should contribute for public healthcare had overstated the means of poor households and understated those of richer ones, pushing costs upward for many low-income families.

Together, the episodes capture a defining tension in healthcare’s AI era: the technology is showing real promise in narrow clinical settings, while also exposing how harmful algorithms can become when they are used to ration access, set prices or make decisions with limited transparency.

A striking result in emergency medicine

The Harvard-led research, published on April 30, tested OpenAI’s o1-preview on clinical reasoning benchmarks, including 76 real emergency-department cases at Beth Israel Deaconess Medical Center. The model’s strongest performance came at the earliest stage of care — triage — when doctors have the least information and must make high-stakes judgments quickly.

That finding has drawn attention because triage is one of the most difficult tasks in medicine: clinicians must identify which patients may be in immediate danger before the full picture is known. Researchers described the results as evidence of a profound technological shift, but they also emphasized that the study was not a green light for hospitals to turn over emergency rooms to chatbots.

The work was based on controlled evaluations, not live autonomous use on patients. And the comparison was not with practicing emergency physicians working in a real emergency department, but with internal-medicine attending physicians given the same text-based information from the medical record at each stage. That distinction matters. Emergency medicine often hinges less on naming the final diagnosis than on rapidly ruling out catastrophic threats and deciding who needs immediate intervention.

For that reason, even supporters of AI in medicine have been careful to frame the result as an argument for decision support, not replacement. The Harvard researchers called for prospective trials in real-world clinical settings, where questions of workflow, accountability and patient safety become far more difficult.

The case for AI as a medical “second opinion”

Still, enthusiasm is building among some prominent technology investors and entrepreneurs who see medicine as one of AI’s most consequential applications.

Reid Hoffman, the LinkedIn co-founder and now an investor in AI drug discovery and clinical-support ventures, has argued that doctors should treat advanced AI systems as a kind of second opinion. He has suggested that failing to consult such tools, as they improve, could itself become hard to justify.

That view reflects a wider push in Silicon Valley and among some healthcare executives: if large language models can synthesize symptoms, scan medical literature and reason through differential diagnoses faster than humans, then doctors may eventually be expected to use them much as they use imaging software, lab tests or specialist consults.

The appeal is easy to understand. Health systems are strained by clinician shortages, rising costs and growing administrative burdens. In that environment, a tool that appears able to sharpen diagnosis or reduce error is likely to be welcomed — especially in settings where expertise is scarce.

But medicine has repeatedly shown that impressive performance in controlled testing does not automatically translate into better outcomes at the bedside. A system can perform well on exam-style questions or retrospective case reviews and still fail in the messier conditions of real care, where information is incomplete, patients do not fit textbook patterns and clinicians must weigh risks, values and practical constraints in real time.

In Kenya, an algorithmic reform collides with inequality

The risks look even sharper when AI is applied not to diagnosis but to the economics of care.

Kenya’s Social Health Authority, introduced in October 2024 as part of President William Ruto’s universal-health-coverage agenda, was presented as a modern digital overhaul of the country’s older insurance system. But an investigation published on May 4 found that the AI-assisted means-testing process used to estimate what households could afford had systematically misclassified many people, often to the detriment of the poor.

According to the investigation, the system overestimated the incomes of low-income households and underestimated those of wealthier ones. The result was not a technical glitch in the abstract, but a practical burden: some of the poorest Kenyans were asked to pay more for access to healthcare than the system had led them to expect, while better-off households were assessed more lightly.

In a country where even modest increases in required contributions can deter enrollment or delay treatment, the consequences are immediate. An algorithm used in the name of efficiency and universal access can, if poorly designed or inadequately audited, end up hardening existing inequalities.

The Kenya case illustrates a broader truth about AI in healthcare: harm does not only occur when a model gets a diagnosis wrong. It can also occur when automated systems shape who can afford care, whose claims get approved, or which communities are judged too costly to serve.

The governance gap

What links the emergency-room study and the Kenyan investigation is not simply the use of AI, but the question of governance.

Who is responsible when an AI suggestion is wrong? Who checks whether a pricing or eligibility model is biased? How much transparency should governments, hospitals and vendors owe to the people whose care is being shaped by software?

These questions become more urgent as AI systems improve. Better benchmark performance can create pressure to deploy faster, especially when investors, startups and policymakers are eager for savings and scale. But the threshold for usefulness is not the same as the threshold for trust.

In clinical settings, doctors remain legally and ethically responsible for decisions, even when aided by software. Yet as AI advice becomes more persuasive, the line between assistance and de facto authority can blur. In public systems, the accountability problem can be even murkier, particularly when models are embedded in bureaucracy and their logic is difficult for patients to challenge.

The Kenyan controversy is likely to intensify demands for independent audits, greater transparency and mechanisms to correct misclassifications before people are priced out of care. And the Harvard findings are likely to intensify a separate debate over when, and under what safeguards, hospitals should begin integrating advanced models into routine practice.

Why this matters now

This moment matters because both sides of the AI healthcare story are becoming harder to dismiss.

The optimistic case no longer rests entirely on hype. In some bounded tasks, especially those involving pattern recognition and text-based clinical reasoning, frontier models are showing capabilities that skeptics would have dismissed a few years ago. That makes AI’s role in medicine feel less speculative and more imminent.

At the same time, the harms are no longer theoretical either. When algorithmic systems are used to structure payment, access and public benefits, their mistakes can widen inequity at scale. In those contexts, efficiency claims can obscure the fact that software is making value-laden judgments about who pays, who qualifies and who gets left behind.

The result is a split-screen reality. AI may help doctors think more clearly in the emergency room, while elsewhere algorithms may make healthcare systems harsher or less fair. The challenge for policymakers and health leaders is not to decide whether AI belongs in medicine. It already does. The challenge is to decide where it helps, where it harms and how to build rules strong enough to tell the difference.
