🧠 Which AI Answer is Best?

Five leading AI models tackle the same complex biotech question - which one is right?

The Prompt:

"Based on the most recent interim topline results, estimate the probability that Summit Therapeutics' HARMONi trial of Ivonescimab + chemo demonstrates a statistically significant benefit in overall survival when the trial data is mature."

Note: These responses have been copied and formatted from the original chatbots. We encourage you to ask these models the same question yourself to see their current responses.

Probability Estimate: 65-75%

Based on the most recent interim topline results from Summit Therapeutics' HARMONi trial, I can provide an estimate of the probability for achieving statistically significant overall survival benefit.

Current Status of HARMONi Trial

The HARMONi trial successfully met its progression-free survival (PFS) primary endpoint, showing a 48% reduction in risk of disease progression or death. However, overall survival showed a positive trend without achieving statistical significance, with a hazard ratio of 0.79 (95% CI: 0.62-1.01; p=0.057).

Key Factors Supporting Future Statistical Significance

Strong PFS Signal: The trial demonstrated ivonescimab in combination with chemotherapy reduces the risk of disease progression or death by 48% compared to chemotherapy alone. Research shows that hazard ratio of progression-free survival is an excellent predictor of overall survival in phase III randomized controlled trials, with correlation coefficients of 0.87-0.96 in Phase III subgroups.

Consistent Results Across Studies: The results were consistent with the single-region HARMONi-A study, which demonstrated an overall survival hazard ratio of 0.80 at 52% data maturity in a similar patient population. Additionally, HARMONi-2 showed an overall survival hazard ratio of 0.777 favoring ivonescimab at 39% data maturity.

Data Maturity Considerations: The current OS analysis appears to be at an interim stage, and additional follow-up time typically leads to increased statistical power as more events occur.

Challenges to Achieving Significance

FDA Requirements: Based on discussions with the FDA, a statistically significant overall survival benefit is necessary to support marketing authorization. The current p-value of 0.057 is just above the conventional 0.05 threshold.

Historical Precedent: Research on FDA approvals shows that fewer than a third of cancer drug indications approved with immature survival data showed a statistically significant overall survival benefit after approval, with only 32% eventually demonstrating significant OS benefit.

Confidence Interval: For statistical significance, confidence intervals for the hazard ratio cannot include unity (one). The current 95% CI of 0.62-1.01 crosses 1.0, indicating non-significance.
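As a rough cross-check of the figures quoted above, the two-sided p-value can be approximately recovered from the hazard ratio and its 95% confidence interval. This is a minimal sketch that assumes the log hazard ratio is normally distributed; the function name is hypothetical, and this is not the trial's actual analysis method.

```python
from math import log
from statistics import NormalDist

def p_value_from_ci(hr, lower, upper, level=0.95):
    """Approximate two-sided p-value from an HR and its CI.

    Assumes ln(HR) is normally distributed (an illustrative assumption).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(0.5 + level / 2)            # about 1.96 for a 95% CI
    se = (log(upper) - log(lower)) / (2 * z_crit)   # SE of ln(HR) from the CI width
    z = abs(log(hr)) / se                           # distance from HR = 1 in SE units
    return 2 * (1 - nd.cdf(z))

p = p_value_from_ci(0.79, 0.62, 1.01)
print(f"approximate p-value: {p:.3f}")
```

Under this approximation the result lands close to the reported p = 0.057, which is what one would expect from an interval that only just crosses 1.0.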

Probability Estimate

Based on the available evidence, I estimate the probability that the HARMONi trial will demonstrate statistically significant overall survival benefit when mature at approximately 65-75%.

This estimate considers:

  • Positive factors (+): Strong and consistent PFS benefit (HR ~0.5), consistent OS trends across multiple studies (HR 0.77-0.80), current p-value very close to significance (0.057), and strong biological rationale for OS benefit
  • Negative factors (-): Historical precedent showing only 32% success rate for immature OS data, FDA's explicit requirement for statistical significance, and current CI crossing 1.0

The relatively high estimate reflects the exceptional strength of the PFS signal, the consistency of OS trends across multiple studies, and the proximity to statistical significance in the current interim analysis. However, the substantial uncertainty inherent in survival analyses and the FDA's stringent requirements prevent a higher confidence estimate.

Probability Estimate: 45-55% (best point-estimate ≈50%)

Quick answer: Using standard conditional-power methods and a weighted scenario analysis, I estimate that the HARMONi trial now has ≈45-55% probability of converting its current overall-survival (OS) trend (HR = 0.79, p = 0.057) into a statistically significant OS benefit at the final read-out. Details, assumptions, and sensitivities follow.

1. What the interim topline told us

Endpoint | Hazard ratio | 95% CI | p-value | Comment
Progression-free survival (primary) | 0.52 | 0.41 – 0.66 | < 0.00001 | Highly positive
Overall survival (co-primary) | 0.79 | 0.62 – 1.01 | 0.057 | Positive trend but missed significance

No event counts were released, but the CI width implies ≈250 OS events have occurred. HARMONi's design most likely targets ~440-460 OS events for 90% power to detect HR ≈ 0.75 at a two-sided α = 0.045. That places the interim information-fraction near 55-60%.

2. Conditional-power calculation

Interpolated standard error:

SE_lnHR = [ln(1.01) − ln(0.79)] / 1.96 ≈ 0.125

Implied events so far:

E_obs ≈ 4 / SE_lnHR² ≈ 254

Assume final target E_final = 450.
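The arithmetic in this step can be sketched as follows. The helper names are hypothetical, and the 4 / SE² rule assumes 1:1 randomization, which the quoted response does not state. The sketch also computes the standard error from the full interval width as an alternative, to show how sensitive the implied event count is to that choice.

```python
from math import log

Z_95 = 1.96  # two-sided 95% critical value used in the quoted calculation

def se_from_upper(hr, upper):
    """SE of ln(HR) from the distance between the point estimate and the upper bound."""
    return (log(upper) - log(hr)) / Z_95

def se_from_width(lower, upper):
    """SE of ln(HR) from the full width of the 95% CI (alternative approach)."""
    return (log(upper) - log(lower)) / (2 * Z_95)

def implied_events(se):
    """Approximate event count, assuming Var[ln HR] is about 4 / events (1:1 randomization)."""
    return 4 / se ** 2

se_a = se_from_upper(0.79, 1.01)
se_b = se_from_width(0.62, 1.01)
events_a = implied_events(se_a)
events_b = implied_events(se_b)
info_fraction = events_a / 450  # 450 is the assumed final event target from the text

print(f"SE (upper bound): {se_a:.3f}, implied events: {events_a:.0f}")
print(f"SE (full width):  {se_b:.3f}, implied events: {events_b:.0f}")
print(f"information fraction vs. 450 events: {info_fraction:.0%}")
```

Both approaches give an implied event count in the mid-250s, so the information fraction stays in the 55-60% range under either choice.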

Conditional power if true HR = 0.79 stays constant:

CP = Φ((z₁ · √(E_final / E_obs) − z_{0.05/2}) / √(1 − E_obs / E_final)) ≈ 79%
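A minimal sketch of the formula above, evaluated under the "current trend" assumption (the observed effect continues unchanged). The function name is hypothetical, and the 254 observed and 450 final events are the response's own assumptions, not disclosed trial parameters.

```python
from math import log, sqrt
from statistics import NormalDist

def conditional_power_current_trend(z_interim, e_obs, e_final, alpha=0.05):
    """Conditional power assuming the interim trend continues to the final analysis."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)   # two-sided critical value
    t = e_obs / e_final                  # information fraction
    return nd.cdf((z_interim / sqrt(t) - z_crit) / sqrt(1 - t))

# Interim z-statistic from the reported HR and the SE derived in the text
se = (log(1.01) - log(0.79)) / 1.96
z1 = -log(0.79) / se

cp = conditional_power_current_trend(z1, e_obs=254, e_final=450)
print(f"z1 = {z1:.2f}, conditional power = {cp:.0%}")
```

With these inputs the result is roughly the 79% quoted above. The `alpha` parameter can be lowered (for example to 0.045, the level mentioned earlier in the response) to see how a stricter threshold reduces the estimate.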

3. Scenario sensitivity

Future true HR | Why it could occur | Conditional power
0.79 | Effect durable, no dilution | 79%
0.82 | Minor dilution (cross-over, subsequent PD-1 use) | 59%
0.85 | Effect wanes to levels seen with pembrolizumab + chemo in KEYNOTE-789 | 36%
0.90 | Little true survival gain | 10%
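For readers who want to run their own sensitivity analysis, here is a sketch of conditional power under an assumed future hazard ratio, using the standard B-value formulation. It assumes 1:1 randomization, 254 events observed, and 450 events at the final analysis; none of these are confirmed trial parameters, and the function name is hypothetical. Its outputs depend on these assumptions and are not guaranteed to match the percentages in the table above.

```python
from math import log, sqrt
from statistics import NormalDist

def conditional_power(z_interim, e_obs, e_final, future_hr, alpha=0.05):
    """Conditional power if the remaining events follow `future_hr`.

    B-value formulation; assumes 1:1 randomization so that the expected
    final z-statistic is |ln HR| * sqrt(events) / 2.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    t = e_obs / e_final                           # information fraction
    b_t = z_interim * sqrt(t)                     # B-value accumulated so far
    drift = -log(future_hr) * sqrt(e_final) / 2   # expected final z under future_hr
    return nd.cdf((b_t + drift * (1 - t) - z_crit) / sqrt(1 - t))

# Interim z-statistic implied by HR = 0.79 with upper 95% bound 1.01
z1 = -log(0.79) / ((log(1.01) - log(0.79)) / 1.96)

results = {hr: conditional_power(z1, 254, 450, hr) for hr in (0.79, 0.82, 0.85, 0.90)}
for hr, cp in results.items():
    print(f"future HR {hr:.2f}: conditional power {cp:.0%}")
```

Note that this formulation keeps the evidence already accumulated at the interim and applies the assumed hazard ratio only to the remaining events.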

4. External evidence to weight those scenarios

Historical precedent is tough. KEYNOTE-789 (pembro + chemo) and IMpower150 subgroup analyses all showed HR ≈ 0.84–0.90 and ultimately failed OS significance in EGFR-mutant post-TKI NSCLC.

Ivonescimab's biology could help. A bispecific PD-1×VEGF antibody may add anti-angiogenic benefit that pembro lacked; the Chinese HARMONi-2 monotherapy trial already shows an OS HR = 0.78 at only 39% maturity.

Design considerations

Weighting the four HR scenarios (30%/33%/25%/12%) yields an aggregate ≈53% chance that HARMONi's final OS p-value will cross < 0.045. Rounding for the inevitable uncertainties, a 45-55% probability is a fair point estimate.
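The weighting described above is a simple weighted average, which can be checked with a few lines. The scenario weights and conditional-power values are taken directly from the quoted response; the variable names are illustrative.

```python
# (future HR, scenario weight, conditional power) as quoted in the response
scenarios = [
    (0.79, 0.30, 0.79),
    (0.82, 0.33, 0.59),
    (0.85, 0.25, 0.36),
    (0.90, 0.12, 0.10),
]

total_weight = sum(weight for _, weight, _ in scenarios)
aggregate = sum(weight * cp for _, weight, cp in scenarios)

print(f"weights sum to {total_weight:.2f}")
print(f"weighted probability of success: {aggregate:.1%}")
```

The aggregate is only as reliable as its inputs: both the weights and the per-scenario conditional powers are the response's own estimates.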

5. Key risks & upside levers

Risk / Dilution factor | Impact on OS gap
Post-progression PD-1 therapy in control arm | Down
Higher Western enrollment (38%) where EGFR immunotherapy benefit historically smaller | Down
Longer follow-up may erode initial hazard-ratio advantage if early deaths already captured | Down
More VEGF-driven resistance biology than checkpoint biology | Up
Potentially cleaner cross-over rules vs KEYNOTE-789 | Up

6. Bottom line

That puts Summit’s ivonescimab at the knife-edge for an FDA filing strategy predicated on OS—or a push toward other registrational trials (HARMONi-3, -7) where larger patient numbers and earlier-line settings might deliver cleaner survival wins.

(All calculations are approximate and use public data; numbers will shift if Summit discloses the exact OS event target or new interim updates.)

Qualitative Estimate: "Moderate to reasonably good probability"

Based on the most recent interim topline results from Summit Therapeutics' HARMONi Phase III trial of Ivonescimab plus chemotherapy, announced in May 2025, an estimation of the probability of demonstrating a statistically significant benefit in overall survival (OS) at data maturity can be made, considering the following key points:

Current Interim OS Results

Hazard Ratio (HR): The interim analysis showed a positive trend in OS with a hazard ratio of 0.79 (95% Confidence Interval: 0.62 – 1.01). This suggests a 21% reduction in the risk of death for patients treated with Ivonescimab plus chemotherapy compared to chemotherapy alone.

P-value: The p-value for OS at the interim analysis was 0.057. This did not meet the threshold for statistical significance at this time point (typically p < 0.05 for a primary endpoint).

Data Maturity: Crucially, the company noted that the OS data is not yet mature. Specifically, "the median follow-up time for western patients was less than the median overall survival at the time of the analysis, and these patients may continue to be followed for long-term outcomes." Approximately 38% of the HARMONi trial participants were enrolled from Western countries.

Progression-Free Survival (PFS): The trial's other primary endpoint, PFS, was met with high statistical significance (HR = 0.52; 95% CI: 0.41 – 0.66; p < 0.00001). While positive PFS does not guarantee statistically significant OS, it is often considered a favorable indicator.

Regulatory Context: Summit Therapeutics has stated that "The FDA noted that a statistically significant overall survival benefit is necessary to support marketing authorization."

Consistency: The observed OS trend is consistent with the earlier single-region HARMONi-A study, which reported an OS HR of 0.80 at 52% data maturity in a similar patient population.

Estimating the Probability of Future OS Significance:

Estimating a precise numerical probability of future success is challenging without access to the full statistical analysis plan, the planned number of OS events for the final analysis, and the exact current number of OS events. However, based on the available public information, a qualitative estimate can be provided:

The current p-value of 0.057 is very close to the typical significance threshold. Given that the OS data is still maturing, particularly for the Western patient cohort, there is a reasonable expectation that as more OS events are recorded, the statistical power to detect a difference will increase.

Factors Favoring Statistical Significance for OS at Maturity

Factors Introducing Uncertainty

Qualitative Estimate

Considering these factors, particularly the borderline p-value and the immaturity of the OS data with a clear statement about ongoing follow-up for a significant patient subgroup, there is a moderate to reasonably good probability that the HARMONi trial will demonstrate a statistically significant benefit in overall survival when the data is mature.

This is not a certainty, and the outcome remains at risk. However, the current data provides a stronger basis for optimism than if the p-value were substantially higher or the HR closer to 1.0. If the observed effect size (HR ≈ 0.79) is maintained and the variability decreases with more patient follow-up (leading to a narrower confidence interval), the p-value is likely to cross the significance threshold.

Important note: This is an estimate based on publicly available interim data and not a formal statistical projection. The actual outcome will only be known when the final OS analysis is conducted and reported by Summit Therapeutics.

Probability Estimate: 35-45%

Based on the most recent HARMONi trial results, I estimate the probability of demonstrating statistically significant overall survival (OS) benefit at approximately 35-45% when the data is mature.

Here's my analysis of the key factors:

Current Trial Results

The HARMONi trial met its progression-free survival (PFS) co-primary endpoint with a hazard ratio of 0.52 (48% risk reduction), but missed statistical significance for overall survival with a hazard ratio of 0.79 (95% CI: 0.62-1.01; p=0.057).

Reasons for Moderate Probability Estimate

Positive indicators:

Challenges:

Contextual Considerations

The HARMONi-2 trial in China showed an interim OS hazard ratio of 0.777 at only 39% data maturity, suggesting that early OS data can be promising but may not translate to statistical significance at final analysis.

Given the borderline p-value and the consistency of results across regions, there's a reasonable chance that with additional follow-up and events, the confidence interval could narrow enough to achieve statistical significance. However, the probability is less than 50% because the current result already represents the primary analysis, and dramatic improvements with maturity are uncommon.

Probability Estimate: ~70% (roughly 2 in 3 chance)

Probability of Ivonescimab’s HARMONi Trial Achieving Significant OS Benefit

Interim HARMONi Results and Significance Criteria

Summit Therapeutics' Phase 3 HARMONi trial (ivonescimab + chemotherapy vs. chemo in EGFR-mutant NSCLC post-EGFR TKI) delivered positive topline results for progression-free survival (PFS) but only a trend for overall survival (OS). At the primary analysis, the ivonescimab arm showed a 21% reduction in risk of death (hazard ratio 0.79) compared to placebo, but this did not reach statistical significance (p=0.057). In other words, the OS difference just missed the conventional significance threshold (typically p < 0.05 for a “statistically significant benefit”). The FDA has explicitly told Summit that a "statistically significant overall survival benefit" (i.e. a proven OS improvement at the p<0.05 level) is required to support U.S. approval. Thus, despite a favorable OS trend, the current data are considered indeterminate for OS until more events accrue.

Importantly, these results pertain to the primary analysis population (intent-to-treat) of the global HARMONi trial. About 420 patients were enrolled (including a large subset from a prior China-only study, HARMONi-A). The interim OS analysis is based on immature data, as many patients (especially those outside Asia) had not been followed long enough for OS events. In fact, Summit noted that the median follow-up for Western patients was shorter than the median OS at this analysis, indicating that a substantial fraction of patients were still alive and being followed. Both Asian and North American subgroups showed a favorable OS trend (hazard ratios below 1), reinforcing that the benefit was observed across regions. However, with only a portion of the required OS events observed so far, the trial could not confirm a definitive survival advantage at this interim look.

Outlook for Achieving OS Significance at Maturity

Given the near-miss OS result, an important question is how likely full OS maturity will yield a statistically significant benefit. Several considerations inform this probability:

Positive Factors Supporting Future Significance

Close to Significance Already: The OS endpoint fell just short of significance (p=0.057 vs. the <0.05 target). This narrow miss (described by one analyst as a "narrow" OS miss) suggests that only slightly more events or longer follow-up could tip the result over the significance threshold. In fact, the difference was so close that a small increase (~5–10%) in the number of OS events would likely have made the p-value <0.05. This implies that if the observed hazard ratio (~0.79) persists, the statistical power to detect an OS benefit will improve substantially as additional deaths accrue. In other words, if ivonescimab’s true effect on OS is around a 20% risk reduction (HR ~0.8), the trial is highly likely to show a significant result with complete data. The interim data gave strong evidence of benefit (the entire 95% CI for OS was 0.62–1.01, almost entirely favoring the drug), so achieving p < 0.05 at final analysis is quite plausible assuming the effect does not wane.

Incomplete Data Maturity: As noted, the OS data are still maturing. Only a fraction of the planned OS events had occurred at the time of the topline readout (the company did not disclose the exact percentage in this PR, but a related Chinese trial interim was at ~39% of OS events). Summit's management emphasized that Western patients in HARMONi had not been observed for long enough; many were still alive, and their outcomes will be recorded with additional follow-up. As these remaining patients continue to be followed, more OS events will accumulate. If the trend observed so far continues, those additional events should strengthen the statistical signal. In fact, Jefferies analyst Kelly Shi ran a post-hoc analysis and concluded that the OS endpoint “still has [a] chance to reach statistical significance in the future as data on patients from outside of China mature”. This assumes that ivonescimab’s OS benefit continues in the later-maturing Western cohort – a reasonable expectation given that both geographies showed similar trends so far. Essentially, the trial may simply need to “catch up” on events to cross the significance line.

Consistent Benefit Across Populations: The primary endpoint population includes patients from Asia and new patients from Western countries (~38% of the total). Encouragingly, Summit reported that ivonescimab's effect was consistent between Asian and Western patients for both PFS and OS. This consistency suggests the drug is benefitting all subgroups, which bodes well for the effect holding up as more Western patient data mature. If anything, longer follow-up in Western patients could even improve the overall hazard ratio (for example, if their survival curves separate later). The fact that PFS was strongly positive and clinically meaningful (HR 0.52) in this hard-to-treat population provides biological plausibility that an OS benefit exists – it may simply need time to manifest fully. No new safety issues emerged, so nothing is likely to confound OS in terms of drug-related mortality.
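The "Close to Significance Already" point above claims that roughly 5–10% more OS events would have been enough to reach p < 0.05. A rough way to check this is sketched below. It assumes the hazard ratio stays exactly at its observed value and that the z-statistic grows with the square root of the event count; the function name is hypothetical.

```python
from math import log
from statistics import NormalDist

def event_inflation_for_significance(hr, lower, upper, alpha=0.05):
    """Factor by which events must grow for the same HR to reach significance.

    Assumes ln(HR) is normal and that z scales with sqrt(events).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)                      # required z for two-sided alpha
    se = (log(upper) - log(lower)) / (2 * nd.inv_cdf(0.975))  # SE from the 95% CI width
    z_now = abs(log(hr)) / se                               # current z-statistic
    return (z_crit / z_now) ** 2

factor = event_inflation_for_significance(0.79, 0.62, 1.01)
print(f"events needed: {factor:.2f}x current ({factor - 1:.0%} more)")
```

Under these assumptions the required increase falls inside the 5–10% range quoted. The calculation says nothing about whether the hazard ratio will actually hold at 0.79 as more events accrue.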

Risk Factors and Uncertainties

Historical OS Trends in Ivonescimab Trials: The HARMONi-A trial (the prior single-region Phase III in China) hints that ivonescimab's OS benefit might diminish over time. Notably, in HARMONi-A the initial interim OS analysis showed a 28% reduction in risk of death, but on a subsequent update this shrank to about a 20% reduction (hazard ratio worsened from ~0.72 to ~0.80). In fact, the HARMONi-A final OS hazard ratio was ~0.80 (and not statistically significant), quite similar to what we see now in HARMONi. This historical pattern raises some concern: as more time passed and more events occurred, the measured OS advantage became smaller in the earlier trial. It’s possible that the same could happen in the global HARMONi study – for example, if the control arm patients live longer than expected or if the initial separation in survival curves narrows with time. If the true OS benefit is only around 15–20%, the trial might just struggle to reach significance depending on total events. In short, there is no guarantee the hazard ratio will stay at 0.79 or improve – it could drift closer to 0.85–0.90 with full follow-up, which would make significance harder to attain.

Patient Overlap and Data Independence: A large portion (roughly two-thirds) of the HARMONi trial's patients were actually enrolled in HARMONi-A (the Chinese study) and carried over into this global analysis. This overlap means a lot of the OS data in HARMONi is not new – it's the extended follow-up of those original Chinese patients. Since their outcomes may already be near fully matured (and as noted, that subgroup’s OS benefit plateaued around HR ~0.8), the incremental benefit will have to come from the ~38% of patients who were new (mostly Western). Essentially, the “new” events that could push OS to significance will come largely from the Western cohort. If those Western patients end up showing a strong survival benefit (similar or better than the overall 21% risk reduction seen so far), it could improve the composite OS result. However, if their OS benefit is modest or confounded by post-trial treatments, the overall hazard ratio might not improve much from the current ~0.79. The reliance on the Western subset to “carry” the OS significance adds uncertainty, though current signals in that group are positive.

Statistical Considerations: We should note that the trial's statistical design (with co-primary endpoints of PFS and OS) might involve alpha-sharing or interim looks that could slightly adjust the final p-value threshold. The company hasn’t detailed any penalty for looking at OS now, but since no alpha was spent (OS wasn’t significant at interim), presumably the full alpha (5%) remains for final. This means final analysis will likely use p<0.05 as the bar. If an interim analysis had a very stringent alpha, final could be at a slightly different threshold, but the press release implies the usual <0.05 is needed. Therefore, the hurdle for significance is clear. Because PFS was so overwhelmingly positive (p<0.00001), we don't expect any alpha inflation issue there – OS essentially still has ~0.05 to work with. This favors the chance that with more data, the trial can cross the significance line for OS.

Probability Assessment

Best Estimate: ~70% probability of achieving statistical significance

Taking all these factors into account, we can estimate the probability of ultimately seeing a statistically significant OS benefit once the HARMONi data are mature. In our judgment, the odds are meaningfully in favor of success (greater than 50%), but it’s not a sure thing. Given the very close call at interim and the expectation of substantial additional events, one could argue the probability is quite high – perhaps on the order of ~70% (roughly a 2 in 3 chance) that OS will become significant with full follow-up. This assumes the hazard ratio remains in the same ballpark (~0.75–0.80). In a scenario where ivonescimab’s benefit actually strengthens in the maturing Western data (e.g. hazard ratio improves into the low 0.70s), the probability of hitting significance would approach ~90% (essentially almost certain) because the effect size would then easily clear the bar with more events. Conversely, if the effect fades slightly (say HR drifts toward 0.85 by final analysis), the chance of p<0.05 might drop to ~50% or below – essentially a coin flip.

In sum, our best estimate is that the HARMONi trial has a strong likelihood – on the order of roughly 70% – of demonstrating a statistically significant OS benefit when the data fully mature, given the current trend and the closeness of the interim result. This is supported by the “positive trend” observed now and expert analyses suggesting OS significance is attainable with more follow-up. However, we must acknowledge a real possibility (perhaps ~30% or so) that the OS advantage could remain marginal and fall short, especially in light of prior patterns in the Chinese trial. Investors and regulators are effectively waiting for this confirmatory OS data to either “catch up” and confirm the benefit or prove the skeptics right. The FDA’s requirement of a clear OS improvement underscores the importance of this outcome.

Bottom Line

Ivonescimab has already shown a compelling PFS benefit and a nearly significant OS benefit in HARMONi. With additional time and events, it is more likely than not that the OS difference will cross the statistical significance threshold in the primary analysis population. We estimate roughly a 70% probability of eventually seeing a statistically significant OS benefit, assuming the current hazard ratio (~0.79) holds or improves. There remains a non-trivial risk (perhaps 20–30%) that the OS effect could dilute and never quite reach significance, but overall the data and trends to date favor ultimate success in achieving a significant OS improvement.

Sources

Summit Therapeutics – HARMONi Topline Press Release (May 30, 2025): Ivonescimab + chemo improved PFS (HR 0.52, p<0.00001) and trended toward better OS (HR 0.79, p=0.057). The FDA requires a statistically significant OS benefit for approval.

FiercePharma (May 30, 2025) – Trial Analysis: Emphasized that the 21% OS risk reduction did not reach significance yet, but noted Jefferies’ analysis that OS “still has [a] chance to reach statistical significance in the future as data…mature”. Summit noted Western patient follow-up is shorter than median OS, indicating immature data.

OncologyPipeline (May 30, 2025) – Expert Commentary: Pointed out that in the earlier HARMONi-A (China) trial, an initial 28% OS risk reduction fell to ~20% at a later read (HR ~0.80) as data matured, “suggesting the OS result there is deteriorating as the data mature.” This cautions that HARMONi’s OS benefit might likewise diminish with time. Still, Evercore analysts called the HARMONi OS miss “narrow” and better than the market feared, implying optimism that significance can be reached with more data. https://www.oncologypipeline.com/apexonco/harmoni-spoils-summits-pre-asco-splash

Motley Fool/Nasdaq (May 30, 2025) – Investor Reaction: Noted that while investors were disappointed by the lack of definitive OS data, analysts highlighted that the trial likely just “hasn’t lasted long enough” to hit OS significance, “which doesn’t mean that it won’t get there”. This aligns with the expectation that time will tell, and many experts lean towards OS eventually turning significant once follow-up is complete.

🔍 Five Different Approaches, Five Different Answers

⚠️ Important Note: While ChatGPT o3 provided a confident-sounding statistical analysis, it contained mathematical errors that are difficult for non-statisticians to detect - such as using the wrong denominator in its standard error calculation and incorrectly using the point estimate (0.79) instead of the lower bound in conditional power calculations.

If prompted to check its analysis for errors, ChatGPT can identify some of these errors. However, this highlights why expert human oversight remains critical even with sophisticated AI analysis.

Key Insight: Five leading AI models reached dramatically different conclusions from the exact same prompt. Probability estimates ranged from 35% to 75% - a 40 percentage point spread. A spread that wide is almost totally uninformative for making investment decisions. ChatGPT's confident mathematical analysis contained subtle but significant errors, and the models used completely different analytical frameworks.

Despite these limitations, these models can be helpful. They provided useful references, accurately extracted data from the press release, and generally exhibited balanced reasoning. The fact that these answers are delivered almost instantly at $0 marginal cost (to the user; there is a marginal environmental cost) gives them the potential to be useful.

LLMs are useful, but they are not as smart as the best experts. Even if they do one day reach that level, human analysts remain essential — not just for gathering data, but for choosing the right analytical framework, detecting errors in AI reasoning, and weighing different types of evidence appropriately for investment decisions.