Joining Up Cancer Pathways

This is an example of an analytics project that could be conducted collaboratively by the AI Centre and ICBs on London SDE infrastructure.

AIM: To link cancer pathways (including screening, diagnosis, staging, and outcomes) across primary and secondary care, to identify areas of inequality in screening and late diagnosis, and to generate predictive insights for risk and screening recall.

SUMMARY: Cancer has long been a challenging area for data-driven initiatives. Primary care coding of cancer is often incomplete, as the majority of care following referral takes place in hospitals. The most readily available secondary care data is from Commissioning Data Sets, which are only complete for inpatient care, and may not include the majority of cancer events. Within hospitals, the largest quantity of cancer data sits within audit datasets, or within unstructured text and clinic letters, which are not easily accessible.

This results in significant limitations. First, it is difficult to reconcile screening data with diagnostic outcomes. Second, only an incomplete picture of cancer diagnoses can be obtained at the ICB level, particularly of disease severity following delayed referral or prolonged waiting times. Finally, it is difficult to understand how such pathways affect overall treatment outcomes and cancer recurrence.

The SDE programme provides opportunities to surface and link currently missing data, giving an end-to-end overview of key cancer pathways. Figure 1 shows cancer data flows within the SDE ecosystem. This work complements additional work with FLIP to make cancer imaging and digital pathology data available, supporting precision medicine research in the same patient cohorts.

flowchart TD
    d1[GP Surgeries] ==>|"Primary care
                        coded cancer data"| LDS1[(London Data Service)]
    d2[NHS Digital] ==>|"National Disease
                      Registration Service"| LDS1
    AIC1["CogStack in Trusts"] ==>|"Cancer staging, diagnosis,
                                      from unstructured text"| AIC2
    LDS1 ==> AIC1a["CogStack in LDS"]

    AIC1a ==> |"Cancer presentation, symptoms,
              from unstructured GP text"| LDS1

    d3[Hospital Trusts] ==>|"Cancer registry and 
                            audit datasets"| AIC2

    AIC2["AIC FLIP"] ==> |"Hospital cancer data
                          via federation"| LDS1
    
    LDS1 ==>|"Linked cancer data
            and federated analytics"| ICB[("ICB / AIC Analytics Teams")]

    classDef green fill:#ab9, stroke:#333, stroke-width:1px
    classDef blue fill:#89c, stroke:#333, stroke-width:1px
    classDef bluegreen fill:#9ca, stroke:#333, stroke-width:1px
    classDef lightblue fill:#ace, stroke:#333, stroke-width:1px
    classDef red fill:#c89, stroke:#333, stroke-width:1px
    classDef lightorange fill:#ca8, stroke:#333, stroke-width:1px
    classDef orange fill:#d96, stroke:#333, stroke-width:1px
    classDef purple fill:#a9c, stroke:#333, stroke-width:1px
    classDef bluegray fill:#9ab, stroke:#333, stroke-width:1px
    classDef gray fill:#fff, stroke:#333, stroke-width:1px

    class LDS1 blue
    class ICB lightblue
    class AIC1,AIC1a,AIC2 orange
    class d1,d2,d3 bluegreen
Figure 1: Summary of potential cancer data flows within the London SDE ecosystem.

It is expected that year one of the programme will consist of infrastructure work to enable these data flows, with a focus on two cancer areas: breast cancer and lung cancer. This work will include installing technology such as CogStack and FLIP, and creating pipelines to standardise data into the OMOP common data model to enable linked querying and federated analytics. Year two will enable a number of analyses for each cancer area that can support understanding of population health, health inequalities, and pathway bottlenecks, and ultimately support the use of predictive analytics to address late referrals, missed screening, and late diagnosis. Example questions include:

(1) How common is late-stage cancer diagnosis following primary care referral, and how does the incidence and prevalence of late diagnosis vary across patient groups and geographies?
(2) Which patient groups are most subject to screening delay or refusal, and what is the impact on late cancer diagnoses?
(3) In a typical longitudinal cancer “journey” (including screening/GP presentation with symptoms -> referral/investigation -> diagnosis -> treatment initiation), where are the major delays? Is there inequality in how patient groups are affected by delays?
(4) Can the unstructured and structured GP record be used to inform cancer risk and referrals through predictive analytics?
(5) Can population groups with cancer outcomes inequality be targeted to increase early diagnosis rates?
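As an illustration of how question (3) might be approached once pathway events are linked, the following is a minimal Python sketch that computes the median number of days between consecutive pathway stages for each patient group. It is not the programme's method: the stage names, the `group` field, and the records themselves are hypothetical and synthetic, and a real analysis would query linked OMOP data rather than an in-memory list.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical, simplified pathway stages in chronological order.
STAGES = ["referral", "diagnosis", "treatment_start"]

# Synthetic records for illustration only; not real patient data.
# "group" stands in for any grouping of interest (e.g. geography).
records = [
    {"patient_id": 1, "group": "A", "referral": date(2023, 1, 2),
     "diagnosis": date(2023, 1, 20), "treatment_start": date(2023, 2, 10)},
    {"patient_id": 2, "group": "A", "referral": date(2023, 3, 1),
     "diagnosis": date(2023, 3, 15), "treatment_start": date(2023, 4, 4)},
    {"patient_id": 3, "group": "B", "referral": date(2023, 1, 5),
     "diagnosis": date(2023, 2, 14), "treatment_start": date(2023, 3, 20)},
    # Treatment date not yet recorded for this patient.
    {"patient_id": 4, "group": "B", "referral": date(2023, 2, 1),
     "diagnosis": date(2023, 3, 3), "treatment_start": None},
]


def stage_intervals(record):
    """Days between each consecutive pair of stages that both have a date."""
    intervals = {}
    for start, end in zip(STAGES, STAGES[1:]):
        if record.get(start) and record.get(end):
            intervals[f"{start}_to_{end}"] = (record[end] - record[start]).days
    return intervals


def median_intervals_by_group(records):
    """Median interval (in days) per group and per pair of stages."""
    buckets = defaultdict(lambda: defaultdict(list))
    for record in records:
        for name, days in stage_intervals(record).items():
            buckets[record["group"]][name].append(days)
    return {
        group: {name: median(values) for name, values in by_interval.items()}
        for group, by_interval in buckets.items()
    }


if __name__ == "__main__":
    for group, medians in sorted(median_intervals_by_group(records).items()):
        print(group, medians)
```

Intervals with a missing date are skipped rather than imputed, so each median is based only on patients who reached both stages. Whether that is appropriate would depend on how incomplete pathways are treated in the actual analysis.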

Ultimately, a major aim of this SDE programme phase is to bring cancer data availability, linkage, and utilisation in line with other long-term conditions, and to enable systematic evaluation across the entire cancer pathway.