No. 03 · Technical
Git Insights
We pointed AI at the whole of Alberta's code estate to get the ground truth. Here is how it works and what it found.
Abstract. Alberta holds its code assets in GitHub Enterprise, but the estate had grown into a mire where prototypes mixed with production and no dataset linked to the next. Leadership could not get a definitive answer about what we run or how healthy it is. So, we built Git Insights, an agentic tool that recursively scans the entire estate and reports the ground truth of every repository. Fifty agents read 466 million lines of code in roughly 20 hours, mapping the digital landscape and capabilities of government. This paper explains the problem, shows in detail how the tool works, and reveals what the scans found. The analysis returned the languages and frameworks we had never counted, the repositories with no tests or documentation, the contributor patterns that quietly degrade security, and the first real benchmark of how AI compares to humans on our own code.
Like many large organizations, Alberta uses GitHub Enterprise as its primary code repository. GitHub provides powerful analysis tools for code scanning through features like CodeQL, and integrations with NPM and Maven for dependency vulnerability scanning. Secrets are automatically flagged and blocked so they can be removed, and our cyber team has well-integrated evaluations. The trouble is that these checks are mostly deterministic, logic-based rules, so they carry a high rate of false positives, and they say nothing at all about the things that matter just as much: the absence of documentation, of automated build and deployment pipelines, and of automated unit tests. Further, it was impossible to get an overall health check on our government systems. To properly assess the current state of our technology, we needed to look deeper.

## §01 The data we could not see

Putting the whole picture together for a large organization can be challenging. In their default state, our primary systems of record could not be linked together. Dependency information from GitHub did not flow into our Configuration Management Database (CMDB). That database lived in ServiceNow and was maintained by hand, with more than 100,000 entries covering our applications. The records for those applications were scattered across GitHub, Jira and Confluence tickets, incident trackers, SharePoint documentation, and an assortment of spreadsheet inventories. Integrating this unstructured data was historically impossible, because there was no linking key to tie one dataset to the next.

CMDB entries maintained by hand: 100,000+, kept in ServiceNow with no automated flow from GitHub.

The logical structure runs from Ministry to Project or Program to Application to repository, and fans out from there into sprints, the CMDB, tickets, and documentation, none of it linked.

As a result, management at the Deputy Minister and Assistant Deputy Minister levels could not get meaningful answers from standard tools, and staff were struggling to provide insights. GitHub itself was also at risk of turning into a mire where quick prototypes sat beside production systems with nothing to tell them apart. We could not say with confidence what we ran, how healthy it was, or where the real exposure sat. So, we did something about it.

## §02 Building a scout

Using Claude Code as the coding agent, the Opus and Sonnet models for the analysis, and Google Vertex (now the Google Agent Platform) as the agentic layer, Alberta built a tool called **Git Insights**. It is an agentic-first tool that recursively scans the entire GitHub estate. It functions as a reconnaissance scout, investigating every repository to report back what is actually there.

Code forms the ground truth. Agents read 466 million lines in about 20 hours, a scan of the whole estate that no consultant engagement could match in time or cost. A third of the repositories had no documentation at all, so the agents wrote it, and every dependency was itemized.

Timing was important for this scan. Alberta has seen significant increases in the number of known vulnerabilities across the estate. On our bug trackers, the vulnerability count rises with a marked inflection point that aligns with the release of Anthropic's Mythos, the latest Opus models, and OpenAI's GPT 5.4 and 5.5. The same capability that lets a tool like Git Insights read the estate at depth is what lets attackers find the flaws in it, a point taken up in detail in the cybersecurity paper. Within four weeks of the Mythos release, we had engineered our own tools and methods to perform this analysis.

## §03 How Git Insights works

Here is what Git Insights actually does, in plain terms. It works in two layers. At the top, a fleet of agents works through the whole estate at once.
Inside each code repository, every agent runs the same fixed routine. A rules engine reviews the code base and flags known patterns for deeper investigation, while the AI agent steps in to review and provide judgment. All insights are tracked and human-auditable down to the line of code.

The agents run on Google Agent Platform. Google worked closely with Alberta to ensure we had adequate throughput, increasing our capacity to 25 million tokens per minute so we could maximize analysis speed. We also built the tool to be resilient. If one scan fails or hits a rate limit, that single job waits and retries while the rest carry on, and if the whole scan is stopped it resumes exactly where it left off. That resilience is what lets it read 466 million lines across thousands of repositories in about 20 hours, pushing the limits of our processing capacity.

A fair question is how far a government can trust what an AI reports. Three things keep it honest. First, the boring, countable work, listing files and finding the known-bad patterns, is done by ordinary code, which cannot invent a result. Second, the AI is never taken at its word: when it reports a problem it must name the exact file and line, say how serious it is and why, and say how to fix it, so a developer can open the file and confirm the claim. Third, it grades every repository against the same fixed checklist, covering security, code quality, architecture, documentation, tests, maintainability, and the health of the libraries it depends on, so a score means the same thing on the thousandth repository as it did on the first.

Every finding the AI reports can be opened and checked against the real code. Insights are stored in a database, allowing meta-analysis by yet more agents, run by the cyber and delivery teams, to detect trends and horizontal insights.
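The second safeguard, that every reported problem must name a file and line, a severity with a reason, and a fix, lends itself to a mechanical check before a finding is stored. What follows is a minimal sketch of that idea, not the Git Insights implementation: the `Finding` fields, the severity scale, and the `validate_finding` helper are all hypothetical names chosen for illustration, and the repository is modelled as a plain dictionary of file paths to text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed severity scale; the paper does not say which levels are used.
SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass(frozen=True)
class Finding:
    """One AI-reported problem (hypothetical schema for illustration)."""
    path: str          # file the agent says contains the problem
    line: int          # 1-based line number inside that file
    severity: str      # how serious the agent judges it to be
    rationale: str     # why it is a problem
    remediation: str   # how to fix it


def validate_finding(finding: Finding, repo_files: dict[str, str]) -> list[str]:
    """Return the reasons a finding cannot be checked; empty means it can."""
    problems = []
    source = repo_files.get(finding.path)
    if source is None:
        problems.append(f"file not found: {finding.path}")
    else:
        line_count = len(source.splitlines())
        if not 1 <= finding.line <= line_count:
            problems.append(f"line {finding.line} is outside 1..{line_count}")
    if finding.severity not in SEVERITIES:
        problems.append(f"unknown severity: {finding.severity}")
    if not finding.rationale.strip():
        problems.append("missing rationale")
    if not finding.remediation.strip():
        problems.append("missing remediation")
    return problems


# Toy repository snapshot: path -> file contents.
files = {"app/db.py": "import os\npassword = 'hunter2'\n"}

checkable = Finding("app/db.py", 2, "high",
                    "hard-coded credential", "load it from a secret store")
uncheckable = Finding("app/missing.py", 9, "urgent", "", "remove it")

print(validate_finding(checkable, files))
print(validate_finding(uncheckable, files))
```

A gate like this rejects a finding that points at a file or line that does not exist, which is one way a fabricated claim could be caught by ordinary code before a developer ever sees it.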
Each scan detects the old technologies hiding inside a repository, such as COBOL, classic ASP, and long-unsupported versions of .NET and Java, and infers which ministry owns it. It itemizes every dependency a system relies on, which is the exact inventory our Configuration Management Database never had and can now be fed from. It flags the repositories that hold sensitive personal information, so a privacy risk becomes visible rather than buried. And for the 1,280 repositories that had no documentation at all, the agent wrote it from what it had just read, while validating or improving the README on every other repository. That closed the gap to zero: for the first time, every single code base in the estate was documented.

The scans do double duty, revealing how each system works and what it does for citizens. The business function of each application was mapped into a business capability hierarchy, allowing us to categorize each application by function. The combined record of each repository goes far deeper than any inventory we had before. The mapping also took stock of functional redundancies, which we can target for rationalization. As noted in the next paper, we believe that in some areas we can achieve a 10-to-1 reduction of our redundant code through AI-driven standardization of common functions.

The results of this Git Insights analysis land in one database and are exposed through an executive dashboard. It offers an executive view of health and risk, a nine-box grid that plots each system by how active it is against how healthy it is, code-health cards per repository, contributor profiles, and the disposition breakdown, all exportable to a document, a spreadsheet, or raw data. The deep cross-repository rollup, where hundreds of overlapping systems in a single ministry are collapsed into a smaller set of rebuilt capabilities, is its own engine and its own paper; see Git Insights Ministry.
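To make the nine-box grid concrete, here is a minimal sketch of how a system could be placed in a cell by activity and health. The paper does not say how either axis is bucketed, so the cut points, the use of recent commits as the activity measure, and the repository names below are all assumptions made for illustration; only the health score out of ten comes from the text.

```python
from collections import defaultdict

# Assumed cut points; the paper does not state how the grid is bucketed.
ACTIVITY_CUTS = (12, 120)   # commits in the last 12 months (hypothetical measure)
HEALTH_CUTS = (4.0, 7.0)    # health score out of ten


def bucket(value, cuts):
    """Map a number to 'low', 'medium', or 'high' using two cut points."""
    low_cut, high_cut = cuts
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"


def nine_box_cell(recent_commits, health_score):
    """Return the (activity, health) cell for one repository."""
    return bucket(recent_commits, ACTIVITY_CUTS), bucket(health_score, HEALTH_CUTS)


def build_grid(repos):
    """Group repository names by their nine-box cell."""
    grid = defaultdict(list)
    for name, recent_commits, health_score in repos:
        grid[nine_box_cell(recent_commits, health_score)].append(name)
    return dict(grid)


# Hypothetical repositories: (name, recent commits, health score out of ten).
sample = [
    ("permits-api", 340, 2.5),
    ("legacy-forms", 0, 3.0),
    ("benefits-portal", 45, 8.1),
]

grid = build_grid(sample)
for cell, names in sorted(grid.items()):
    print(cell, names)
```

Under these assumed thresholds, a highly active system with a low health score lands in the ("high", "low") cell, which is the corner an executive view would most plausibly want to surface first.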
"For the first time, a Deputy Minister could ask a question about the estate and get an answer grounded in the code itself." · Paper 3 · Git Insights ## §04 What the estate actually looked like With the scan complete, we could finally measure what we had only ever estimated. The domain Git Insights mapped was wider than anyone had counted. The structural gaps were more revealing. Across roughly 3,400 repositories, the practices that make code safe to change were missing far more often than they were present. Every repository was scored out of ten on its overall health across code quality, security, tests, and documentation. Across the whole estate the average came out low, held down by the missing tests, pipelines, and documentation noted above. ## §05 Benchmarking ourselves We also measured the habits of our staff and contractors. We did not use this as a punitive or a performance measure. We measured it to find where ordinary human variance degrades our security controls and best practices, so we know where training and better tooling will help most. Across more than eight thousand one hundred contributors and eight years of history, the pattern was clear. These gaps are not the work of bad actors. They come from ordinary, hard-working people at varying skill levels. But over decades and thousands of people, clear patterns emerged which demonstrated clear structural gaps. Despite policy and clear process documentation, steps are frequently missed. Standards also change over time, and new standards like containerization and automated build pipelines have not made their way into legacy code bases. Individual contributors also demonstrated repeated steps missed in code hygiene which can be addressed through education. Underneath the risk tiers, Git Insights builds a full profile of every contributor, drawn from the record rather than from opinion. ## §06 Human + AI: comparison and partnership There is a great deal of criticism aimed at AI for the mistakes it makes. 
A lot of ink is spilled describing its potential for hallucination and bias. However, these stories reveal a strong recency bias in how we humans report on those failures. One wrong answer by AI is remembered; the ten thousand right ones are not. Such critiques of AI often make the more grievous mistake of failing to consider human performance in an equivalent domain. Our findings were clear: with the proper controls described in the harness paper, the latest AI coding tools make fewer mistakes than humans on equivalent coding and analysis tasks.

Git Insights gave us the benchmark to make this statement with evidence. The metadata harvested from GitHub produced the first large-scale, longitudinal study in Alberta of human technical performance. We measured the real work of more than eight thousand people over eight years: every commit, every pull request, and the state of the code they produced, across the entire estate. This process gave us a benchmark against which to measure our future investments in AI speed, quality, security, and process.

Humans and AI both make mistakes. And we are both creative problem solvers. The takeaway from this analysis is that technologies are changing, that skill levels between humans are more variable than between the latest models, and that working creatively with AI provides the best-of-both-worlds outcome. With AI providing greater assistance in coding, human efforts shift to architecture, strategy, creativity, design, and, crucially, supporting human interaction and change management.

Human creativity remains unmatched and integral to the process. An AI did not dream up Git Insights. That was 100 percent human-driven. But AI did make it possible to properly analyze a vast digital estate, for nearly no cost (less than two thousand dollars), and in only hours. In a properly formed partnership between the human and the AI agent, mistakes can be identified immediately and resolved rapidly, and forward velocity increases.

The next paper takes Git Insights up a level, to the ministry scale, where hundreds of overlapping systems are read together and collapsed into a smaller set of modern, rebuilt capabilities with a costed plan attached. See Git Insights Ministry.
Tags: git-insights, code-analysis, agents, technical-debt, cybersecurity