AstroCoder
Try out AstroCoder here.
At UniverseTBD, we are investigating how astrophysics research will benefit from this AI revolution. We developed AstroCoder to investigate how effectively current AI models can use existing astronomy tools. This report details how we designed this application to help users discover and work with 2,270 repos from the Astronomy Source Code Library (ASCL).
A large part of astronomy relies on custom software built for niche problems. Yet the main platforms that host information about this software, like ADS and ASCL, are not designed for specialized searches based on tool functionality, task focus, or cross-domain applications. AstroCoder helps solve this problem by providing AI-generated summaries, installation guides, and examples, along with a specialized chatbot that makes it easier to discover and work with astronomy software. This approach not only helps us better understand packages, but also drastically improves how we search for and discover packages that can solve niche astronomy problems.
The coding ability of AI is improving dramatically and we are interested in how it will affect scientific research. METR measures AI progress by the length of coding tasks (based on human expert completion time) that AI can successfully complete. While current AI reliably handles tasks taking humans ~1 hour, AI is projected to soon complete coding tasks that require ~days or ~weeks of human effort. This progress strongly points towards AI systems becoming powerful automated research assistants aiding human scientists. To prepare for this future, we are evaluating AI’s ability to work with domain-specific astronomy tools, while simultaneously building a public resource to make astronomy software easier to find and use.
This project aligns with the new wave of AI tools helping astronomers work with domain-specific code, alongside cmbagent, ChatGaia, and CAMELS Agent. While these specialized agents focus on specific domains or codebases, AstroCoder is built to help researchers find and use code across all areas of astronomy. Its broad scope could make it easy to discover techniques from different subfields and find lesser-known software.
Here's a sped-up example showing AstroCoder recommending astronomy tools for simulating orbits in the Milky Way:
Since we have not yet comprehensively reviewed the generated content or conducted benchmarking, this beta release will allow us to identify issues and prioritize improvements based on community feedback. We greatly appreciate any feedback submitted via nolan.koblischke@astro.utoronto.ca or through this Google Form.
Key Features of AstroCoder
Auto-generating documentation
Starting with GitHub entries from ASCL, we found substantial variability in existing documentation across repositories. To standardize this library of software, we generated a summary, installation guide, and examples for every codebase.
We collected the code for every repo with repomix and found that 58% of repos exceed the 200k-token context window of OpenAI's o3-mini. We address this with vector-based retrieval, using keyword queries such as ‘summary overview readme documentation’, ‘installation config setup configuration’, and ‘example demo tutorial’ to extract the relevant code segments. We then provide these segments to the language model as context, along with specific instructions to generate each type of documentation. The prompts guide the model to produce a concise technical overview for the summary, comprehensive setup steps and dependency information for the installation guide, and diverse, runnable code snippets demonstrating core functionality for the examples.
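The retrieval step can be illustrated with a minimal sketch. It is not the AstroCoder implementation: the bag-of-words `embed` function stands in for a real embedding model, the whitespace token count stands in for a real tokenizer, and the chunk contents, `select_context`, and the tiny `token_budget` are hypothetical. Only the three keyword queries come from the text above.

```python
import math
import re
from collections import Counter

# Keyword queries quoted in the post, one per documentation type.
QUERIES = {
    "summary": "summary overview readme documentation",
    "installation": "installation config setup configuration",
    "examples": "example demo tutorial",
}

def embed(text):
    """Toy bag-of-words vector (assumption: a real embedding model goes here)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_context(chunks, query, token_budget):
    """Rank chunks by similarity to the query, then keep those that fit the budget."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c["text"]), q), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = len(chunk["text"].split())  # crude whitespace token count
        if used + cost <= token_budget:
            selected.append(chunk)
            used += cost
    return selected

# Hypothetical chunks from an imaginary repository.
chunks = [
    {"path": "README.md", "text": "Overview and summary of the package documentation"},
    {"path": "setup.py", "text": "installation setup configuration for pip"},
    {"path": "docs/tutorial.md", "text": "tutorial with an example demo notebook"},
    {"path": "src/core.py", "text": "def integrate(orbit, potential): return orbit"},
]

contexts = {kind: select_context(chunks, q, token_budget=8) for kind, q in QUERIES.items()}
top_paths = {kind: picked[0]["path"] for kind, picked in contexts.items()}
```

In this toy setup each query pulls a different file to the top, which is the point of using one keyword query per documentation type: the model sees only the slice of a large repository that is likely to matter for the document being generated.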
The Chatbot
Our chatbot is built on top of language models from OpenAI and a user interface from Chainlit, an easy-to-use Python chat interface. We host the Chainlit app on Render and embed it on a website hosted on GitHub Pages.
When a user sends a query to our chatbot, GPT-4o decides whether to direct that user query to our Search assistant or Repo Agent. We separate these agents due to the constraints of context windows, since codebases can be very large.
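As a rough sketch of this handoff, the routing can be thought of as a classifier followed by a dispatch table. Everything here is hypothetical: `route_query`, the labels `search` and `repo`, and the keyword-based `stub_classify` are illustrative stand-ins, and in the real application the classification is made by GPT-4o rather than by a rule.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]

def route_query(query: str,
                classify: Callable[[str], str],
                handlers: Dict[str, Handler],
                default: str = "search") -> str:
    """Ask the classifier for a label and dispatch to the matching agent.

    Unknown labels fall back to the default handler so a malformed
    classifier reply never leaves the user without an answer.
    """
    label = classify(query).strip().lower()
    handler = handlers.get(label, handlers[default])
    return handler(query)

def stub_classify(query: str) -> str:
    """Stand-in for the LLM router (assumption: a simple keyword rule)."""
    return "repo" if "how do i" in query.lower() else "search"

# Hypothetical handlers; the real agents would call language models.
handlers: Dict[str, Handler] = {
    "search": lambda q: f"[Search Assistant] {q}",
    "repo": lambda q: f"[Repo Agent] {q}",
}

discovery = route_query("Which tools simulate orbits in the Milky Way?", stub_classify, handlers)
usage = route_query("How do I set the potential in this package?", stub_classify, handlers)
fallback = route_query("anything", lambda q: "unknown-label", handlers)
```

Keeping the two agents behind separate handlers means the search path only ever carries short summaries, while the potentially very large codebase context is loaded only when a query is routed to the Repo Agent.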
Search Assistant
We perform vector search over these AI-generated summaries of repositories using embeddings from OpenAI's text-embedding-3-small. The most relevant repository summaries are retrieved and injected into a system prompt, which is then provided to GPT-4o to suggest repositories related to the user’s query.
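A minimal sketch of this retrieve-then-inject pattern follows. The repository names, summaries, three-dimensional vectors, and prompt wording are all made up for illustration; in practice the vectors would come from the embedding model and the assembled prompt would be sent to the chat model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the k index entries whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda e: cosine(query_vec, e["vector"]), reverse=True)
    return ranked[:k]

def build_system_prompt(hits):
    """Inject the retrieved summaries into a system prompt (hypothetical wording)."""
    lines = ["You recommend astronomy software. Base your answer only on these repository summaries:"]
    for i, hit in enumerate(hits, start=1):
        lines.append(f"{i}. {hit['name']}: {hit['summary']}")
    return "\n".join(lines)

# Hypothetical index: names, summaries, and vectors are toy values.
index = [
    {"name": "orbit-toolkit", "summary": "Integrates stellar orbits in galactic potentials.", "vector": [0.9, 0.1, 0.0]},
    {"name": "spectra-fit", "summary": "Fits stellar spectra with template models.", "vector": [0.1, 0.9, 0.1]},
    {"name": "nbody-lite", "summary": "Runs small N-body simulations.", "vector": [0.7, 0.0, 0.6]},
]

query_vec = [1.0, 0.0, 0.1]  # pretend embedding of an orbit-simulation query
hits = top_k(query_vec, index, k=2)
system_prompt = build_system_prompt(hits)
```

Because only the top-ranked summaries reach the prompt, the chat model recommends from a short, relevant list instead of the full catalogue.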
Repo Agent
We experimented with Retrieval Augmented Generation, where the code with the highest relevance score for a user query (as determined by a text embedder) is retrieved and provided to the AI. However, we found these relevance scores to be unreliable for codebases. Instead, we decided to pursue a more human-like agentic approach, where the AI itself chooses which files to read. This method is used by state-of-the-art coding agents (see the LiveSWEBench leaderboard) like SWE-Agent, GitHub Copilot Agent, Cursor, and Claude Code.
We recommend trying these coding agents after using AstroCoder to find a repo of interest, as they are precursors to the automated research assistants to come. In fact, the entirety of the AstroCoder app was written with the help of Cursor.
Limitations and future work
Throughout our development of AstroCoder, we encountered several limitations related to AI reliability and context management. First, retrieval-based relevance scores were unreliable for codebases. This drove us to adopt a more agentic approach for our Repo Agent. However, we still employ Retrieval Augmented Generation (RAG) for initial summaries, installation guides, and examples. Moving forward, we intend to explore whether an agent-driven generation method may yield superior quality, especially for generating examples.
Even with plenty of code context, we find that o3-mini still hallucinates function names, argument names, and other details. This leads to frequent errors in the generated code examples, and part of this beta release is to have the community help uncover and report these issues.
Additionally, we found that o3-mini frequently forgets to call tools and instead rushes to answer the user, prompting us to rely on GPT-4o as the intermediary handoff agent. To address the same problem inside our Repo Agent, we implemented a structured scaffolding approach: o3-mini is forced to first read five files from the repo with tool_choice='required', document any lingering uncertainties, and then read another set of five files to resolve these uncertainties before providing a final answer. This structured analysis incorporates a notepad tool, allowing the agent to explicitly record its findings, open questions, and plans before reading subsequent files.
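The control flow of this scaffolding can be sketched as a fixed two-round loop. This is an illustration rather than the actual AstroCoder code: `run_repo_agent`, the three callables, and the fake repository are hypothetical, and the stub functions stand in for model calls. In a real version, the file-selection step would be an API call made with tool_choice='required' so the model cannot skip it.

```python
from typing import Callable, Dict, List

FILES_PER_ROUND = 5  # five files per round, as described in the post

def run_repo_agent(question: str,
                   repo: Dict[str, str],
                   choose_files: Callable[[str, List[str], List[str]], List[str]],
                   take_notes: Callable[[str, Dict[str, str]], str],
                   answer: Callable[[str, Dict[str, str], List[str]], dict]) -> dict:
    """Two forced reading rounds with a notepad entry in between."""
    notepad: List[str] = []
    seen: Dict[str, str] = {}
    for round_number in (1, 2):
        unread = [path for path in repo if path not in seen]
        # Forced step: the model must pick files before it is allowed to answer.
        picks = choose_files(question, unread, notepad)[:FILES_PER_ROUND]
        for path in picks:
            if path in repo and path not in seen:
                seen[path] = repo[path]
        if round_number == 1:
            # Record findings and open questions before the second round.
            notepad.append(take_notes(question, seen))
    return answer(question, seen, notepad)

# Stub "model" behaviour (assumption: real versions would be LLM calls).
def choose_files(question, unread, notepad):
    return sorted(unread)

def take_notes(question, seen):
    return f"Read {len(seen)} files; still unsure how '{question}' is handled."

def answer(question, seen, notepad):
    return {"files_read": sorted(seen), "notes": list(notepad)}

# Hypothetical repository with twelve small files.
repo = {f"src/module_{i:02d}.py": f"# contents of module {i}" for i in range(12)}
result = run_repo_agent("How do I integrate an orbit?", repo, choose_files, take_notes, answer)
```

The key design choice is that the order of operations lives in ordinary code, not in the prompt: the final answer is only requested after both reading rounds and the notepad step have run.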
Overall, we find that explicitly structuring the AI calls to require a desired behaviour is more reliable than relying solely on prompting. Looking ahead, we expect future AI models to become more autonomous, engaging more persistently with tasks and writing a final answer only when they are sufficiently confident.
Going forward, we can turn this into a reproducibility benchmark, testing whether AI can recreate the results in the paper that used a given astronomy tool. To summarize, at UniverseTBD we are preparing for a future where AI will have strong research capabilities. With this in mind, we would be thrilled to build human collaborations today to help create the AI collaborators of tomorrow.
Acknowledgements
This work was supported by the Microsoft Accelerate Foundation Models Research (AFMR) grant program. Furthermore, this project began at a hackathon at the NLP For Space Science Workshop at ESA/ESAC Madrid in September 2024. We also thank the ASCL team for their helpful comments and feedback, which improved the application.
This blog was written by Nolan Koblischke on behalf of UniverseTBD.
@misc{astrocoder-blog,
author = {Nolan Koblischke and Mugdha Polimera and Maja Jablonska and David Hendriks and Sergi Blanco-Cuaresma and Hilke Reckman and Ioana Ciuca and UniverseTBD},
title = {Astrocoder Blog},
year = {2025},
month = apr,
howpublished = {\url{https://blog.nolank.ca/astrocoder}}
}