OSCON 2025 Regular Session Presentation Abstracts

Find below the abstracts for OSCON 2025's Regular Session presentations.

March 24 Morning Sessions

Three tracks in the amphitheater, room 308, and room 311.

Building Stronger Communities for Population-scale Open Source Technology

11:00am - Amphitheater

Presented by Shailiza Mayal

Population-scale technology impacting billions globally has Digital Public Infrastructure (DPI) at its core, which, when well-designed and implemented, helps countries achieve their priorities and accelerate Sustainable Development Goals. Powering DPI are Digital Public Goods (DPGs): open-source building blocks that are interoperable, modular, and scalable. However, the long-term sustainability of DPGs relies on vibrant, engaged developer communities that enhance product quality, drive innovation, ensure transparency, and support widespread adoption. Currently, DPG builder organisations are at varying stages in their journey towards community development, demonstrating different levels of maturity. Code for GovTech (C4GT) is an initiative that enables development and long-term maintenance of open-source products that are creating population-scale social impact (DPGs and beyond) by facilitating pathways for young talent to contribute to these products through community building. In addition to running multiple programs for contributors and DPG builder organisations, C4GT has launched the DPG State of the Community Report (DPG SCoRe), which provides valuable insights into successful community building. This talk will focus on sharing 1) the foundational principles that define DPGs and DPI and how they hold the potential to impact billions, 2) the different pathways through which interested contributors can support the development and long-term maintenance of these open-source tech building blocks, and 3) the community building insights and best practices from the DPG SCoRe 2024.

The Early Years of Scientific Python

11:00am - Room 308

Presented by Paul Barrett

Scientific Python began nearly 30 years ago as an effort to improve the quality and productivity of astronomical software. Most astronomical data analysis was performed using command line applications that were cumbersome to use. There was an effort in the astronomical community at this time to move away from applications that were developed using a statically compiled language toward a more agile and extendable interpreted programming language that would be easy to learn, yet powerful enough for scientific programming. A key element of the language was the inclusion of multi-dimensional arrays. This talk will present the history of how Python became that language. It will cover the formative years before the development of AstroPy in 2008, when a small group of scientists and engineers, primarily at the Space Telescope Science Institute in Baltimore, MD, began developing the fundamental set of software to do scientific data analysis. Those early developers never imagined that Scientific Python would become so successful.

surveydown: An Open-Source, Markdown-Based Platform for Interactive and Reproducible Surveys Using R, Quarto, and Shiny

11:00am - Room 311

Presented by JP Helveston and Pingfan Hu

This talk introduces the surveydown R package and survey platform. With surveydown, researchers can create reproducible, interactive surveys using markdown and code chunks, leveraging the Quarto publication platform. Unlike proprietary survey platforms that rely on graphical interfaces or spreadsheets to define survey content, surveydown uses plain text formatting for survey design, making version control and collaboration more straightforward. Built on Shiny, the package renders surveys as interactive web applications, allowing for complex features like conditional skip logic, dynamic question display, and complex randomization. The package supports diverse question types including (but not limited to) multiple choice (single/multiple selection), dropdown menus, text inputs, numeric entries, and date selectors. As an open-source tool, surveydown provides researchers full control over their survey implementation while maintaining reproducibility - a crucial feature missing from common alternatives. The package integrates seamlessly with existing R workflows for data collection and analysis, streamlining the research process from survey design to data analysis.

The Future of Work: What Open Source Teaches Us About Distributed Collaboration

11:20am - Amphitheater

Presented by Ben Balter

For decades, open source communities have embraced a radically different way of working: one where collaboration doesn’t depend on where you live, what time you log on, or even what your job title says. These communities have thrived in fully distributed, asynchronous environments, solving big problems together long before remote work became a necessity for the rest of us. What lessons can students, organizations, and academia learn from these communities as we navigate the future of collaboration in a post-pandemic, hybrid-work world? In this talk, Ben Balter dives into the core principles and practices that have made open source collaboration successful: transparent and asynchronous workflows, and the use of shared digital tools. He will explore how these methods not only make collaboration (from group projects to global corporations) possible but also make it better. From open documentation to lightweight communication processes to fostering a collaborative culture, you’ll learn how to apply these lessons to teams of any size. Whether you're leading a team, building an organization, early in your career, or preparing to enter the workforce, this session will provide actionable insights into, and real-world examples of, how embracing open source principles can reshape the way we work, making it more inclusive, productive, and enjoyable. Open source isn’t just a way to write code; it’s a blueprint for how we can all work better together in the digital age.

Sourcing Open Source: Building an Inclusive Computational Pedagogy in GW LAI's Python Camp

11:20am - Room 308

Presented by Dolsy Smith, Daphna Atias, Emily Blumenthal, Marcus Peerman, and Max Turer

The fundamental intuition behind the open-source movement – that software is better when it is the outcome of an open, collaborative, accessible, and inclusive process – implies that the professional cultures of software development ought to embody the same principles. But the barriers to access and inclusion in this profession are well-documented, as are the limitations of the pedagogies that prepare people to enter the profession. Traditional approaches to teaching programming too often embody values of competitiveness and scarcity, prizing the success of individuals with the resources to acquire high levels of skill and confidence. At GW LAI, we encounter folks from across the disciplines who need to write code: in order to complete coursework, conduct research, automate a tedious task, or get a job. How do we support these learners in ways that emphasize the principles of openness we promote in the software that we write? Through GW LAI's Python Camp, we strive to create an inclusive and accessible experience for those who want to learn to code, especially those who might struggle with or feel excluded from other opportunities. We do so by centering collaboration and active learning, providing learners with a framework in which they can work together to solve realistic problems using code. Our curriculum is open-source, and it is designed to be replicable and scalable. In this talk, members of the Python Camp team will talk about strategies we use to engage learners and to keep our work as instructors sustainable.

Fostering Student Research Engagement Using Open-Source

11:20am - Room 311

Presented by Mike Sanders

The AI revolution has led to a rise in student focus on software as a tool for learning in even the most technology-agnostic fields. However, many students lack the skills to engage with code-heavy applications, which turns many software options into off-the-shelf, black-box tools with little room for customization. New AI integrations with low-cost coding platforms, like Google Colab, have greatly reduced the learning curve for coding even advanced applications, opening software solutions to novice coders. This has also led to a rapid rise in student-led, collaborative software packages being published as open-source development projects. As part of GWIPP's LAiSER project, we engage dozens of student researchers from across disciplines to conduct real research that leverages generative AI to support advanced coding for public good. Our open-source skills extraction and mapping software is built on student ideas implemented using generative AI code. From first-semester policy students with minimal coding experience to experienced data science coders, our project demonstrates how fostering interdisciplinary research discussions can lead to advanced research. In this talk, I will discuss how we have approached integrating student research with LAiSER and what we have learned in the process. I will also suggest some "best practices" we have implemented to ensure that students receive recognition for their work even in a research environment driven by PI funding models. Finally, I will explore our plans for the future of open-source student research.

What is the Lyrasis Organizational Home?

11:40am - Amphitheater

Presented by Bridget Almas

Community-supported open-source technologies are critical components of open infrastructures for digital cultural heritage. The primary goal of the Lyrasis Organizational Home is to enable and support the communities that are working to build and sustain these technologies. In this talk, I'll discuss how the Lyrasis Organizational Home is structured, its core principles and values, and the important role of program governance and community.

Open Source Tools, Open Weight Models, and Affordable Hardware: Generative AI Without Breaking the Bank

11:40am - Room 308

Presented by Glen MacLachlan

The availability of generative AI, beyond the scope provided by subscription models, is having a large impact on the way research computing and data (RCD) facilitators equip researchers with access to resources and services and provision for the future. Proper planning requires matching current and anticipated research workloads with trends in a rapidly developing marketplace to ensure adequate provisioning of hybrid (cloud/on-prem) infrastructure. Open source and open weight software and models expand the set of solutions RCD practitioners can implement but also present their own set of challenges. Significant developments in multi-modal models, which shift more emphasis onto test-time compute (TTC) and raise the operational cost of inferencing, have shortened our horizon, obscuring the prediction of what workload requirements may look like over the next 3-5 years. This talk will focus on the new frontiers in generative AI and the solutions and challenges from an operational perspective. We’ll highlight the Fall 2024 student Noise to Nouveau Exhibition at the Corcoran School of the Arts & Design that featured displays created with the assistance of open source tools and open weight generative AI, hosted at The George Washington University.

Civic Tech DC: Open Source for Public Good

11:40am - Room 311

Presented by Michael Deeb

Michael Deeb’s talk chronicles the transformation of Civic Tech DC from a dormant meetup to a thriving, volunteer-driven nonprofit that unites technologists, designers, and policy experts to build open-source solutions for civic challenges in the DMV area and beyond. The presentation details how the organization has adopted robust open-source practices—not only in repository management but also in fostering inclusive onboarding, managing project lifecycles, establishing community governance, and implementing progressive open licensing and data sharing models. Michael highlights several case studies to illustrate these strategies in action. Attendees will learn about Mango Tree, a tool for disinformation analysis; ElectrifyDC’s Vendor Portal, which simplifies access to clean energy solutions; an election data dashboard that enhances transparency in the voting process; and an AI Vision & OCR System designed to reduce the costs of running elections. Beyond these projects, the talk examines Civic Tech DC’s role in the broader national and global civic tech movement. Michael discusses how the organization both contributes to and benefits from collaborative working groups and the exchange of best practices. Participants will leave with actionable strategies for building sustainable, community-driven open-source projects, effectively managing volunteer contributions, and fostering cross-sector collaboration.

March 24 Afternoon Sessions

Three tracks in the amphitheater, room 308, and room 311.

Survival Guide for Software Engineers

2:30pm - Amphitheater

Presented by Mio Diaz Santiago

Let's be real: navigating the world of software engineering can be interesting. While the tech industry keeps rapidly evolving, certain challenges are still present for those in underrepresented groups. Drawing from Mio's real-world experiences, this talk aims to equip any engineer who has been "the only {insert_identity_here}" on the team with the strategies needed to not just survive, but thrive.

Practical lessons from nine years of project maintaining

2:30pm - Room 308

Presented by Colin Fleming

The DARIA team has been running an all-volunteer open source Rails web application in production for nine years. In this talk, Colin (one of DARIA's core maintainers) will talk about practical lessons from the past nine years, including how we've managed contributors, made technical and scoping decisions, addressed legal concerns, and structured the project to run in the long term.

Executable science: reproducing neuroscience analyses through collaborative open source tools and practices.

2:30pm - Room 311

Presented by Kristijan Armeni

Modern science relies on computational tools. In this interdisciplinary convergence, researchers are increasingly expected to share reproducible analysis code. In this talk, we argue that open source software practices are invaluable for reproducible science. We share experience and lessons learned from a small-scale reproducibility project in an academic neuroscience lab. The goal of the project was two-fold: i) to reproduce a set of published results in computational neuroscience, and ii) to do so while adopting a set of software development practices and using a suite of open-source analysis and publishing tools. We first outline the components of our workflow (the Python scientific stack, collaborative version control, code formatting, automatic code documentation parsing, code packaging, publishing interactive reports via MyST Markdown). We then discuss the main takeaways from our reproducibility attempt. First, while we succeeded in reproducing aspects of the original results, discrepancies remained. The code and data, despite being openly available, were lacking documentation in critical parts, forming a barrier to understanding and thus reproducibility. Second, software development practices and tools, when adopted, are tightly coupled with scientific work; we frequently found ourselves reconsidering core analytical choices due to specific stages of our development workflow (e.g. deploying code documentation). Third, we find that choosing the right workflow tools, balancing the trade-off between sophistication and achievability, is an important skill. In conclusion, certain software development practices should be an integral part of scientific analysis and publication workflows to equip researchers with the ability to deliver on open science promises.

Drupal: More Than a CMS—A Community, a Career, a Cause

2:50pm - Amphitheater

Presented by Nneka Hector

Drupal is more than just a content management system; it’s an open-source powerhouse driving some of the most impactful digital experiences in higher education and government. But why should you get involved? In this session, we will explore what Drupal is, how it fosters a thriving global community, and how you can get involved. I'll share insights on how Drupal is being used and how you can be a part of shaping its future. Whether you are new to Drupal or looking to deepen your involvement, this session will provide valuable takeaways on the opportunities, resources, and connections available.

Civic Tech: Making Ballot Access More Equitable

2:50pm - Room 308

Presented by Andrew Shao

The Ballot Initiative Project is an open-source civic tech initiative sponsored by Civic Tech DC that leverages OCR (Optical Character Recognition) and fuzzy matching to automate the process of verifying ballot petitions for local jurisdictions. Ballot initiatives empower citizens to directly shape government policy by voting on new laws and amendments. In Washington, DC, successful initiatives have included minimum wage increases and, most recently, ranked-choice voting. However, in DC and many other jurisdictions, getting an initiative accepted is a costly and labor-intensive process. Names, signatures, and addresses must be manually collected and verified against voter registration data to confirm eligibility. Opponents can challenge signatures, potentially causing petitions to fail. The Ballot Initiative Project streamlines this process using OCR technology to enhance verification, reducing costs and effort. By lowering the financial barrier for successful ballot petitions, the project aims to make civic participation more accessible and diverse. More time and effort can be spent getting signatures, and less on verification, which will lead to a larger number of signatures that cannot be challenged. While the project is currently designed around DC's ballot system, its architecture is adaptable, allowing other jurisdictions to implement the technology with minimal code modifications.
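As a rough illustration of the fuzzy-matching idea mentioned in this abstract, the sketch below compares OCR'd petition lines against a voter roll using Python's standard-library difflib. It is not the Ballot Initiative Project's actual code: the function names, the 0.85 threshold, and all sample records are assumptions invented for this example.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def best_match(ocr_entry: str, voter_roll: list[str], threshold: float = 0.85):
    """Return (closest_record, score); the record is None if the best
    similarity score falls below the (assumed) threshold."""
    target = normalize(ocr_entry)
    best_record, best_score = None, 0.0
    for record in voter_roll:
        score = SequenceMatcher(None, target, normalize(record)).ratio()
        if score > best_score:
            best_record, best_score = record, score
    if best_score < threshold:
        return None, best_score
    return best_record, best_score


# Hypothetical data, invented for illustration only.
voter_roll = [
    "Jane Q. Public, 123 Example St NW",
    "John Doe, 456 Sample Ave SE",
]
ocr_lines = [
    "Jane Q. Pub1ic, 123 Example St NW",  # OCR misread "l" as "1"
    "Alex Nobody, 789 Missing Rd NE",     # not on the roll
]

results = [best_match(line, voter_roll) for line in ocr_lines]
for line, (record, score) in zip(ocr_lines, results):
    status = record if record else "needs manual review"
    print(f"{line!r} -> {status} ({score:.2f})")
```

In a sketch like this, entries below the threshold are routed to a human reviewer rather than rejected outright, so the automation reduces manual effort without making the final eligibility decision.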

Reproducible Data Science with Open Source Tools

2:50pm - Room 311

Presented by Jay Qi

Reproducibility is essential in data science. Analyses and models must be reliable and verifiable to justify the investment of time and resources. Without reproducibility, even the most sophisticated work risks being non-actionable. In this talk, Jay will discuss key principles and best practices for making data science work reproducible, from applying software development concepts to structuring data flow through an analysis. He will highlight open source tools and frameworks, such as cookiecutter-data-science, that help facilitate and codify best practices. Attendees will gain practical strategies for improving the reproducibility of their own projects, making their work more robust, shareable, and impactful.

Open-Source and Participatory AI as Civic Science

3:10pm - Amphitheater

Presented by Alexa Alice Joubin

Generative AI is not a reliable stand-alone tool for the creation of new knowledge; its outputs are not reproducible and not interoperable. This paper explores open-access and open-source participatory AI as a new strategy to enhance the trustworthiness of AI in humanistic higher education. Through the author’s pilot AI project, she argues that open culture cultivates trust by allowing all stakeholders equal access to processes. Trust is often defined as merely the verifiability of explicit outputs from AI systems. In fact, trust resides in deeper social connections. To test this hypothesis of trust and to promote open culture, the pilot AI platform offers faculty and students access to the proverbial engine room, with an opportunity to adjust the training parameters for trustworthy outputs. The scholarly significance of this project lies in its modeling of civic science and open culture, which fosters accountability and trust in ethical and sustainable education. Civic science is a partnership between stakeholders and scientists-as-citizens to co-create technologies for public good. Some universities have subscribed to enterprise versions of AI. Subscriptions to these systems do not cultivate open culture and ethical human-AI collaboration. They promote profit-driven corporate culture rather than civic science and participatory justice geared toward public interest technology. Our participatory AI does not merely deploy the frontend but also invites educators and students into the proverbial engine room to adjust the AI’s training parameters, which develops meta-cognition and AI skills.

Restoring Trust in the Digital World: A Tiered Approach to Identity

3:10pm - Room 308

Presented by Michael Deeb

The digital landscape is evolving at breakneck speed. With AI-generated content, deepfakes, and bot-driven disinformation, our ability to discern genuine identity from synthetic fabrication is under threat. Traditional identity systems force an untenable trade-off between robust verification and user privacy. Enter the Tiered Privacy and Identity Verification Framework (TPIF), an ambitious, open-source proposal designed to rival projects like Worldcoin by harnessing established standards alongside breakthrough computational advancements. TPIF leverages cutting-edge technologies such as fully homomorphic encryption (FHE) via CUDA processing, a capability that has transitioned from feasible to performant in the past six months. By integrating advanced cryptographic tools like Zero-Knowledge Proofs and Secure Multi-Party Computation, and concepts such as Decentralized & Self-Sovereign Identities, TPIF proposes a distributed, transparent, and privacy-oriented framework that avoids locking users into any centralized authority while allowing for the anonymity that is the underpinning of the internet. Crucially, TPIF’s tiered design is informed by behavioral psychology. This approach not only ensures secure data handling but also creates distinct tiers where true anonymity is respected, allowing market forces to naturally prioritize and reward genuine interactions. Whether it’s facilitating safe pseudonymous engagement on social networks or meeting the rigorous demands of compliance-critical financial environments, TPIF’s multi-layered strategy aligns technical innovation with human behavior, cultivating an ecosystem where trust is built organically. Join Michael Deeb for a fast-paced exploration of how TPIF can restore digital trust, empowering individuals to control their online identities without sacrificing privacy or authenticity, and fostering an environment where genuine, community-driven interactions thrive.

Positioning the R in Open Source: RStudio + GitHub + renv workflows

3:10pm - Room 311

Presented by Daniel Kerchner

In this session, you'll learn how to take your R coding work and transform it into an open, reproducible project that other researchers can smoothly clone from GitHub, run, and get the same results. Your published paper will then have a "Source Code" link you can be proud of. We'll use available features in RStudio for working with Git, as well as the "renv" package. Anyone working in R who hasn't yet integrated Git and/or renv into their workflow can benefit from this session.

March 25 Sessions

Two tracks in the amphitheater and room 309.

LaborLedger: A Decentralized Solution for Fair Work and Supply Chain Transparency

1:00pm - Amphitheater

Presented by Emmanuel Teitelbaum and Tejaaswini Narendran

Millions of workers face exploitative labor conditions due to a lack of transparency and weak regulatory enforcement. Issues such as unpaid wages, forced labor, child labor, and unsafe working conditions persist because work arrangements are often undocumented and unenforceable. In response, multinational companies concerned about reputational and regulatory risks, along with governments and international organizations focused on labor rights and sustainable development, have sought to improve transparency in global supply chains. LaborLedger is an open-source blockchain-based decentralized application (dApp) designed to support these efforts by providing secure, automated payments linked to verified work completion. The system reduces the risk of wage theft by leveraging smart contracts to guarantee that wages are disbursed instantly and transparently. Oracles for GPS, image, weight, and time-based verification ensure accountability by providing objective proof of work completion. A dual DAO system (WorkerDAO and EmployerDAO) governs arbitration, compliance enforcement, and industry standards, while a reputation-based trust system promotes ethical labor practices. In addition to payments, LaborLedger includes a grievance system, an audit-driven compliance framework, and a privacy-preserving survey mechanism for assessing workplace conditions. LaborLedger is an evolving project focused on building a decentralized framework for fair labor practices. Our talk will showcase key features of our MVP, discuss future development plans, and explore potential applications. We will also examine how the platform can be adopted in contexts where labor regulations and enforcement mechanisms are weak or absent, highlighting the role of international labor NGOs, multinational corporations, and initiatives such as Global Framework Agreements in ensuring accountability.

Supporting Access to Public Benefits via Open-Source AI Tools

1:00pm - Room 309

Presented by Martelle Esposito

Nava Public Benefit Corporation will share its approach and results from prototyping and testing open-source, generative-AI-enabled tools for various use cases to help families navigate and enroll in public benefits. Tools include: an assistive chatbot that provides plain-language answers to questions about complex policy rules, a document analyzer that identifies and extracts key information from uploaded PDFs, and a tool that summarizes call notes and generates a list of next steps. All of the use cases are focused on assisting and empowering staff such as caseworkers, call-center specialists, and community health workers, reducing administrative burdens on them and allowing them to focus on the human aspects of their jobs. Nava will highlight its principles for responsible AI product development, and how we are collaborating with partners to identify sustainable adoption models that maximize access across settings and technical capabilities.

Mustering Open Source Volunteers to Tackle Online Disinformation Campaigns

1:20pm - Amphitheater

Presented by Ben Sando and Helen Glover

The CIB Mango Tree project will showcase open source tools to uncover online information manipulation. The project was founded in summer 2024 out of the Civic Tech DC open source community, and aims to empower non-technical disinformation researchers with the tools to uncover inauthentic behavior in social media datasets. Using CIB Mango Tree tools, researchers can detect the "low-hanging fruit" of inauthentic behavior, such as copy-and-pasted text. Successful open source projects must be structured so that short-term volunteers, from engineers to designers and networkers, can make individualized impacts. CIB Mango Tree organizers will explain how they build a modular workflow so that volunteers can cycle in and out of the project. We will also share how we draw from the resources provided by the Civic Tech DC open source community, including the Civic Tech DC project night events. We encourage students of any background to get involved in the CIB Mango Tree and the Civic Tech DC community.
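To make the "copy-and-pasted text" signal concrete, here is a minimal sketch of one way such a check could work: normalize each post and flag any text that appears under more than one account. This is an illustrative assumption, not the CIB Mango Tree implementation; the function names, the `min_accounts` parameter, and the sample posts are all hypothetical.

```python
from collections import defaultdict
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def find_copy_paste(posts, min_accounts=2):
    """Map each normalized text to the accounts that posted it, keeping
    only texts shared by at least `min_accounts` distinct accounts."""
    accounts_by_text = defaultdict(set)
    for account, text in posts:
        accounts_by_text[normalize(text)].add(account)
    return {
        text: sorted(accounts)
        for text, accounts in accounts_by_text.items()
        if len(accounts) >= min_accounts
    }


# Hypothetical (account, text) pairs, invented for illustration only.
posts = [
    ("user_a", "Vote NO on the new proposal!!"),
    ("user_b", "vote no on the new proposal"),
    ("user_c", "Lovely weather at the farmers market today."),
    ("user_a", "Vote NO on the new proposal!!"),
]

flagged = find_copy_paste(posts)
print(flagged)
```

Repeated posts from a single account are not flagged here, because the sketch counts distinct accounts rather than raw repetitions; a real analysis would likely also consider timing and near-duplicates.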

Snapshot of research software development practices in the NIST Materials Science and Engineering Division

1:20pm - Room 309

Presented by Jonathan Guyer

The Materials Science and Engineering Division (MSED) of the National Institute of Standards and Technology (NIST), like many similar research organizations, develops software that ranges from internal analysis and plotting scripts to formal software products for release to the public. We conducted interviews with the 30 projects that compose MSED to better understand the role that software development plays in different parts of the Division. These interviews are a first step to understanding how we can raise the quality of software products, encourage the release of research scripts and codes, and foster software development collaboration within the Division and with outside partners. When approached to discuss their research software practices, many demurred. Many in MSED do not see themselves as software developers; they build and operate experimental measurement apparatus or they run simulations on software developed elsewhere. Even so, the majority have a collection of scripts for data acquisition, analysis, and plotting. To engage as many as possible, the emphasis of the interviews was shifted to focus on workflow. Whether implemented in software or as a written list of instructions, every research process involves workflow. This talk will discuss commonalities and pain points in existing workflows. The hope is to identify opportunities to automate and share workflows, via software, both within MSED and with external stakeholders.

Freezing Saddles: Open Source DC Winter cycling game

1:40pm - Amphitheater

Presented by Richard Bullington-McGuire

You might consider folks who bicycle outdoors in the winter to be obsessive - but the DC area has a unique group of cold-weather cycling enthusiasts who rely on Open Source software to facilitate their Reindeer Games: the Freezing Saddles Winter Cycling competition. Richard Bullington-McGuire did not write the software originally, but 8 years ago he volunteered to take it over from the original author, and has since maintained and extended it with the help of a merry band of volunteer programmers. In this talk, Richard will explain the history of the site, the technologies involved including Python and Docker, and how adopting DevOps practices has radically accelerated the evolution of the site. Interested people will get a look at how the site is built and deployed and how they can contribute.

Econ-ARK: Open-Source Tools for Cutting-Edge Economic Research and Industry Applications

1:40pm - Room 309

Presented by Alan Lujan

Dr. Alan Lujan will present Econ-ARK, an open-source software project designed to accelerate and democratize computational economics. This presentation will introduce HARK (Heterogeneous Agents Resources and toolKit), the core Python toolkit within Econ-ARK, and demonstrate its utility for both academic researchers and industry practitioners. The session will explore how HARK simplifies the process of building, solving, and analyzing complex economic models, particularly those featuring heterogeneous agents making dynamic choices. Key functionalities to be highlighted include HARK's pre-built modules for handling numerical methods, its capacity for modeling diverse economic agents, and its focus on creating reproducible research through the "REMARKs" framework. Dr. Lujan will illustrate the practical applications of Econ-ARK and HARK across various domains. For academic research, he will showcase examples of using HARK to investigate wealth inequality, household finance, and macroeconomic policy impacts. For industry, the presentation will demonstrate how Econ-ARK can be applied to solve real-world problems such as portfolio optimization, risk management, and economic forecasting. The talk will also emphasize the advantages of Econ-ARK's open-source nature, promoting transparency, collaboration, and community-driven development in economic modeling. Attendees will leave with a clear understanding of how Econ-ARK and HARK can empower them to conduct cutting-edge economic analysis and address pressing economic questions in both research and industry settings. The broader implications of open science and reproducible research in economics will also be discussed.