Can AI be your budget analyst?
The CEO of Anthropic, which makes the Claude large language model, recently said that AI could eliminate 50% of all entry-level white-collar jobs.
In public policy, this would mean that the analysts and junior staffers who dive into complex areas of legislation might be out of work, knocking out the first rung on an already difficult career ladder.
But this depends on whether an AI model can actually replace a good analyst. Can the latest models do the job, or will they hallucinate nonsense, causing immense problems down the road when you testify to a committee about a non-existent line item?
Running a test
To test this, I asked ChatGPT’s o3 reasoning model (its best model for research) to tell me about the differences between the Massachusetts House and Senate budgets in three areas: climate and environmental policy; transportation; and health care. The reports are in the links. I then asked ChatGPT to assess the transportation memo and identify how accurate its assessment was.
Was it accurate?
I’ll leave it to those more expert in the subject matter than me to evaluate the accuracy of every part of the report, but on a first read-through it appears to be well written. The sections cite the actual budget documents, the report compares the corresponding elements of each budget, and it supplements the research with sources from the open web, including press releases and analyses from advocacy groups.
Here’s the model’s assessment of its own work:
In summary, the policy brief appears to correctly outline the major differences between the House and Senate FY2026 budgets on transportation funding: The House pumped significantly more money into the MBTA (prioritizing urban transit improvements and agency workforce development), while the Senate steered more funding to RTAs (aiming for fare-free regional buses and greater geographic equity). Both budgets used the new Fair Share surtax revenue to boost transportation overall, but the House deployed funds more aggressively, whereas the Senate was more cautious – even setting funds aside for the future. Core areas like MassDOT/highway funding were similar, and neither chamber deviated much on smaller areas like aeronautics or maritime programs. Any discrepancies noted (such as the exact dollar figures for the MBTA or RTAs) seem to stem from different reporting conventions, but the broad strokes in the brief are likely accurate. The brief’s comprehensiveness is strong if it covered all the above categories; any omissions (for example, failing to mention the Fair Share context or the rationale of regional equity) should be remedied to fully inform readers. Overall, the House and Senate budgets shared the goal of improving transportation across Massachusetts, but they balanced priorities differently – and the brief’s job was to capture those nuances. From this analysis, it aligns well with the actual budget documents and provides a useful comparative overview, with only minor gaps that could be filled with the additional context noted above.
Overall, AI was able to give me a good sense of what was happening in a particular issue area, drawing insights out of extremely long and tedious documents, in 15 minutes. It seems like it can do the job of a junior analyst in a fraction of the time and for a tiny fraction of the salary.
But don’t fire people just yet
The AI discourse in the media tends toward a binary: AI is a fraud, or AI will replace us all. The reality is that there are certain tasks current LLMs can do very well, like software engineering, and others they cannot do at all, like being creative.
While AI did this task well, it was also the most AI-able part of a policy analyst’s job.
I gave it two long PDFs and asked it to summarize the differences between the two. That is a very literal task, requiring little more than the ability to break down the text, contextualize it, and return a synopsis. It is similar to every meeting-assistant program that emails you a summary of a Zoom call, just with a different input.
Notably, this is also a task that could draw on the previous work of other policy analysts, whose published analyses can become inputs for the model. If we were working on a single bill, or on something with far less material than the annual budget, the results would likely have been worse.
So, should we use AI or not?
My sense is that AI is going to become a copilot for the policy industry. When a budget is posted, I expect that one day analysts will load it into a model like o3 and find out what’s in it through a chat interface. It’s not that AI will replace the analyst, but that AI will replace keyword searching. I also expect that analysts may start to work by asking a model to generate a report, and will then edit and fact-check the report rather than writing everything from scratch.
Most of all, I expect this to be used at smaller firms and non-profits that don’t have a dedicated policy analyst. If you already have someone on staff, then whether they produce a brief in a few hours or a day is not much of a difference. If you don’t have anyone at all, having the budget explained quickly, versus an executive director trying to figure everything out on their own, is a tremendous difference.
Bottom line: if you work in policy, you should be familiar with what AI can do for your specific use cases, and be willing to experiment with it in low-stakes settings, like explaining a budget that you are going to read anyway. But there is still no replacement, yet, for someone who can read a legislative proposal, connect it to other issues and the political environment, and give guidance on what it truly means and whether it will pass.
At Legislata, we’re building a platform to help human analysts and journalists connect with a public policy audience, so please click the link to sign up for a demo and learn more about how to use it.