
October 23, 2025

Before we bet big on AI, we need evidence it drives impact

Luke Arundel

Since the launch of ChatGPT in 2022, the world has been upended, and in an explosion of technological progress, Artificial Intelligence (AI) is suddenly everywhere. It is enabling scientific breakthroughs that garner Nobel Prizes while also introducing floods of AI slop (not to mention work slop). The pace of progress and the mass rollout of AI tools - particularly large language models (LLMs) such as ChatGPT, Claude and Gemini - has been so rapid that it's hard to see the signal through the noise. Sometimes it feels like you can't move for new AI products and services: your favourite newspaper offers AI summaries of its articles, your bank encourages you to speak to its chatbot, and your employer wants you to integrate AI into your workflows.

How much of an impact is all this actually having on the outcomes we care about most? Amidst the noise, where can we look to answer the questions worth asking? Economists are trying their best to measure the macro-level impact AI is having on productivity, and to understand how the job market is adjusting (the outlook for junior-level jobs is looking grim). Beyond these big-picture questions lie many equally important but more targeted ones: does a specific AI intervention cause an improvement in outcomes? We have a familiar toolkit for answering such questions: randomised trials and other rigorous evaluation methodologies can help us understand when AI works, why, and for whom.

We already know that it's important to use these evaluation methods to test whether interventions lead to improvements in the outcomes we're interested in. Evidence-based organisations have cited 'Scared Straight' (programmes that take young people at risk of offending on visits to prisons to discourage future criminal behaviour) for years as a cautionary tale. Scared Straight caught the popular imagination and seemed to work. In reality, the evidence suggests that the programme was having a significant negative impact.

Testing the impact of AI tools is in its infancy, but we already have a comparable example that tells a similar story. In one randomised trial, experienced developers were assigned to complete coding tasks either with or without AI tools. The developers using the tools believed they were 20% more productive; the study found that they actually took 19% longer to complete their tasks. AI seemed to the developers to have helped, but it had made things worse. At the other end of the spectrum, AI tools have proved incredibly valuable: a World Bank randomised trial evaluating an after-school programme in Nigeria, where students used an AI tutor, found significant positive effects on students' scores. The authors concluded that the programme "ranks among the most cost-effective solutions for addressing learning crises".

As with any other intervention, AI tools must be rigorously evaluated to understand which ones genuinely deliver impact. The homelessness sector is not immune from the enthusiasm around AI, and a huge number of promising ideas are already being considered - indeed, the Centre for Homelessness Impact (CHI) is already well into a trial of an intervention that uses machine learning to predict who is at risk of experiencing homelessness. But without testing these ideas, and exposing them to the most rigorous standards of evidence, we won't know whether they're really making the difference we want to see.

With funding from the Cabinet Office Evaluation Task Force's Evaluation Accelerator Fund, we're testing one of these ideas. The problem we're seeking to address is important: people often don't seek advice about housing issues until they're already at a crisis point. When they do seek advice, it can be difficult to know where to start, and hard to access services with limited capacity.

We're testing the impact of an AI chatbot designed to provide personalised housing advice to people before they reach a crisis point, intervening earlier to prevent homelessness and the harms that come with it. The chatbot can assess someone's situation, offer tailored advice, and draft letters to landlords or the council. Unlike general-purpose AI tools such as ChatGPT or Claude, it draws only on trusted, authoritative sources, including Shelter, Citizens Advice and government guidance. We think it's a really promising idea that can help us address a big problem. To properly understand its impact, we're running a randomised trial to evaluate whether using the tool causes a reduction in homelessness duties owed.

This testing is crucial. Working closely with our project partners, we're testing AI just as we would test any other homelessness service. It's not just about being cautious - it's about developing our understanding of what works, so we can direct resources where they can have the biggest impact and improve people's lives. AI may well be a valuable tool in helping to prevent homelessness, and given its pace of growth and potential for scaling, we're optimistic that it will be. But before these tools become embedded in homelessness and housing services, we need to know if they actually work.

You can find out more about our trial using an AI chatbot to provide housing advice here.
