Gov R Conference

Thanks for attending the 7th annual Government & Public Sector R Conference!

Stay tuned for the talk recordings, presentation slides, and photos from the event.

Agenda

Monday, Oct 28

08:15 AM - 08:50 AM

Registration & Breakfast
08:50 AM - 05:00 PM

Workshop: Better Development Practices with Large Language Models (LLMs)

Abigail Haddad & Benjy Braun

In recent years, data scientists have increasingly adopted best practices from software engineering to improve code quality and project management. These practices are ideal candidates for leveraging Large Language Models (LLMs), as they are well-documented online and often involve tasks performed infrequently enough that memorization is impractical. This workshop will guide you through key software development practices tailored for data science. Participants will learn how to use LLMs to enhance their documentation, version control, and other essential tasks. The goal is to produce code that's easier to run, build upon, and understand, ultimately leading to more efficient and reproducible data science projects.

Workshop Highlights:
1. Writing Cleaner, More Readable Code: Learn techniques to improve code readability and maintainability, with LLMs assisting in generating clearer syntax and structure.
2. Improving Documentation: Discover how LLMs can help create comprehensive and understandable documentation, making your projects easier to use and collaborate on.
3. Using Git for Version Control: Gain proficiency in using git for version control, with LLMs offering support in managing branches, resolving conflicts, and maintaining a clean commit history.
4. Docker/Virtual Environments: Understand the benefits of containerization and virtual environments in development, and how LLMs can assist in setting up and managing these environments.
5. Debugging and Error Handling: Learn effective debugging techniques and use LLMs to interpret error messages and suggest fixes.

Participants will engage in practical examples using either Python or R, exploring how LLMs can be integrated into their development processes. While LLMs do not produce perfect code instantly, they are invaluable for iterative development, particularly in data pipelines and analyses. We will practice effective prompting strategies to guide LLMs towards better solutions and explore their ability to interpret error messages and suggest fixes in data science contexts.

Requirements:
- Comfortable writing a function in either Python or R
- Laptop with Python or R and git installed

By the end of this workshop, participants will have a toolkit of best practices and the skills to utilize LLMs for enhancing their development workflows, leading to more efficient and error-resistant coding practices. This workshop is ideal for developers, data scientists, and analysts looking to integrate advanced AI tools into their everyday coding routines.

(In-Person & Virtual Ticket Options Available)
08:50 AM - 05:00 PM

Workshop: Dashboards and CRUD Apps: Managing Data For Your Organization

Maxine Drake

This class focuses on working with your organization’s data, from data collection to data management to data visualization. We will learn how to build a dashboard with Shiny, including dynamic calendars perfect for large-scale event tracking. We will also build a CRUD (create, read, update, delete) application that allows users to manage data themselves. In addition to these technical skills, we will cover concepts such as multi-tiered architectures, modularizing code, clear data visualizations, and managing user permissions in your Shiny apps.

(In-Person & Virtual Ticket Options Available)

Workshop tickets sold separately

Tuesday, Oct 29

08:00 AM - 08:50 AM

Registration & Breakfast
08:50 AM - 09:00 AM

Opening Remarks
09:00 AM - 09:20 AM

Text Explorer

William E. J. Doane

Assistant Director @ IDA's Science & Technology Policy Institute

Textual data constitutes a large, often unstructured data source that can be mined for insights about common themes or emerging topics of interest. Making tools available to non- or novice-coders that support data analysis best practices is a core goal of mine. This is challenging when your data set might contain confidential or sensitive information, or may involve dynamically changing content that's challenging to move around your network. Text Explorer was designed as a locally-running Shiny app to reduce the technical burden for users and allow them to quickly explore their data.
09:25 AM - 09:45 AM

Liberate the Summer Interns onto More Interesting Terrain: Harnessing NLP in R to Create the Concept of Operations for a Large Organization

Selen Stromgren & Evgeny Kiselev

U.S. Food and Drug Administration

Large organizations can have large, byzantine document spaces which include standard operating procedures, work instructions, strategic plans, RACI charts, infographics, fact sheets, and workflows. Even relatively well-constrained document subspaces, such as quality system documents, can have multiple templates and be stored in hard-to-access quality information systems. When it comes to updating the concept of operations document for such an organization, it is crucial that all functional aspects of organizational operation are harvested from the dispersed document space and included in the concept of operations. We found it useful to harness natural language processing (NLP) methods in R. In this talk, we will present our experience using NLP in R to ingest, analyze, and distill essential functional areas from almost 100 relevant organizational documents. Our approach required the flexibility to read several document templates and disregard redundant and ‘uninteresting’ language, plus OCR to capture text information from workflow diagrams and illustrative figures. Once the documents were ingested, word frequency and uniqueness were computed, and PCA and other clustering methods were used to understand the similarity of language used in each document. Through NLP analysis we extracted main topics and keywords across organization documents, prioritized them, and captured them in the concept of operations document as main functional areas. “Reading” the document space with R was much more efficient than having human readers (summer interns?) digest the documents and subjectively extract primary functional areas. NLP using R provided a much more objective way to distill core information from a large document space and can be used by organizations to filter critical highlights and institutional practices from a repertoire of legacy documents.
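As a rough illustration of the "word frequency and uniqueness" step described above, the sketch below scores words with a plain TF-IDF weighting. This is an assumption about what "uniqueness" could mean, not the speakers' method, and it is written in Python rather than R. The document names and texts are hypothetical stand-ins.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return {doc_name: {word: score}} using a plain TF-IDF weighting."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(counts)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for c in counts.values() for word in c)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in c.items()
        }
    return scores


# Hypothetical stand-ins for organizational documents.
docs = {
    "sop_sampling": "the lab shall log each sample and the lab shall store "
                    "the sample with a sample label",
    "plan_training": "the office shall train staff and the office shall "
                     "track training",
}
scores = tf_idf(docs)
# The highest-scoring word is frequent in one document and absent elsewhere.
top_word = max(scores["sop_sampling"], key=scores["sop_sampling"].get)
```

Words shared by every document (such as "the" and "shall" here) score zero, which is one simple way to discount redundant template language.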
09:50 AM - 10:10 AM

Finding Your Next Federal Data Job

Abigail Haddad

Machine learning engineer -- AI Corps @ Department of Homeland Security

Hiring data scientists and other data/AI workers is a priority for the federal government, but that doesn't mean we make the application process easy. I'm going to talk about how to search for data jobs and write a resume that gives the hiring folks the information they need to accurately assess you. I'll also talk about the new DHS AI Corps, why I'm there, what's worked about our hiring process – and what I think you should be doing if you're on the hiring side in order to attract and hire great candidates.
10:10 AM - 10:40 AM

Break & Networking
10:40 AM - 11:00 AM

Slowly Scaling Per-Record Differential Privacy

Mikaela Meyer

Senior Data Scientist @ The MITRE Corporation

The goal of differential privacy is to hide the presence, absence, or value of any particular record in a dataset. To do this, random noise is added to published statistics so that an attacker looking at these statistics will not know if a given record is in the dataset. If a statistic's distribution changes little with the addition, deletion, or alteration of a single record in the underlying dataset, the record's privacy is preserved. More influential records - those whose absence, presence, or alteration would change the statistics’ distribution more - typically suffer greater privacy loss. The per-record differential privacy framework quantifies record-specific privacy guarantees, but existing per-record mechanisms let these guarantees degrade rapidly (linearly or quadratically) with influence. While this may be acceptable in cases with some moderately influential records, it results in unacceptably high privacy losses when records’ influences vary widely, as is common in economic data. We develop formal per-record differential privacy mechanisms for releasing statistics from data with many outlying values, such as income data. These mechanisms ensure that a per-record differential privacy guarantee degrades slowly in the protected records’ influence on the statistics being released: the guarantees degrade as slowly as logarithmically with influence. These mechanisms allow for the accurate, unbiased release of statistics, while providing meaningful protection for highly influential records. As an example, we consider the private release of sums of unbounded establishment data such as payroll, where our mechanisms extend meaningful privacy protection even to very large establishments. We evaluate these mechanisms empirically and demonstrate their utility.
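For readers new to the idea of adding noise to a published statistic, here is a minimal sketch of the standard Laplace mechanism applied to a clipped sum. It is the textbook baseline, not the per-record mechanisms developed in this talk, and the payroll values, clipping bound, and epsilon are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(values, clip, epsilon, rng):
    """Release a clipped sum with Laplace noise of scale clip / epsilon."""
    # Clipping to [0, clip] bounds how much any single record can move the sum.
    clipped = [min(max(v, 0.0), clip) for v in values]
    return sum(clipped) + laplace_noise(clip / epsilon, rng)


rng = random.Random(0)
payrolls = [12.0, 30.0, 45.0, 80.0, 5000.0]  # hypothetical; the last is an outlier
true_clipped_sum = sum(min(p, 100.0) for p in payrolls)
release = private_sum(payrolls, clip=100.0, epsilon=1.0, rng=rng)
```

Note the trade-off this baseline forces: clipping the outlier at 100 biases the sum, while raising the bound to cover it would inflate the noise for everyone. That tension with highly influential records is the setting the abstract describes.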
11:05 AM - 11:25 AM

Mapping Ever Larger Data with PostGIS, DuckDB, GeoArrow and deck.gl

Jared P. Lander

Chief Data Scientist @ Lander Analytics

The volume of spatial data available to analyze is getting larger and larger every year. Fortunately, the tools used to analyze these data are improving at a faster pace. During this talk we will look at four key aspects of the geospatial pipeline. We start with storing the data efficiently using Postgres with the PostGIS and TimescaleDB extensions installed for smart partitioning. Then we perform various spatial queries using the DuckDB query engine while the data are still in Postgres. After that we use DuckDB to quickly extract the data from Postgres into GeoArrow to enable columnar operations. Finally, we visualize large-scale data with the high-performance deck.gl library, including filtering and aggregating data on the fly with Arquero. All those steps together make for a high-performance geo workflow on large data.
11:30 AM - 11:50 AM

Wrangling Data with DuckDB

Will Angel

Data Analytics Capability Lead @ Excella

Learn how to accelerate data processing in your R code with DuckDB, a fast, open source, in-process analytical database! This talk will provide an overview of DuckDB and Duckplyr, explore when and how you can speed up your data processing with DuckDB, and benchmark the performance improvements you should expect compared to other popular data processing methods in R! This talk will also briefly explore the 'shrinking size' of big data and make the argument that you may not need to adopt distributed processing technologies to scale your data.
11:50 AM - 01:00 PM

Lunch & Networking
01:00 PM - 01:20 PM

Falling for the Exception: Understanding and Avoiding the Base Rate Fallacy

Benjy Braun

VP Data Solutions and Innovation @ Capital Technology Group

In decision-making, we often get swept up by exceptional cases, ignoring the broader context in which they exist. This tendency, known as the base rate fallacy, can lead us to make poor judgments, whether evaluating NBA draft picks, hiring employees, or investing in startups. In this talk, we'll explore how neglecting base rates distorts our thinking, drawing from examples in sports, business, and everyday life. By understanding and accounting for base rates, we can make better, more informed decisions. Together we'll learn how avoiding this common fallacy can improve both your professional and personal choices.
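The arithmetic behind the base rate fallacy can be sketched with Bayes' theorem. The numbers below are hypothetical and chosen only for illustration: 1% of prospects become stars, and a scouting signal flags 90% of future stars but also 10% of everyone else.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(condition | positive signal) via Bayes' theorem."""
    true_pos = hit_rate * base_rate
    false_pos = false_alarm_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical rates, not taken from the talk.
p = posterior(base_rate=0.01, hit_rate=0.90, false_alarm_rate=0.10)
print(f"{p:.3f}")  # prints 0.083
```

Even with a seemingly strong signal, only about 1 in 12 flagged prospects is a future star, because the flagged group is dominated by the much larger pool of non-stars.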
01:25 PM - 01:45 PM

R and Python - A Love Story

Marck Vaisman

Sr. Cloud Solutions Architect @ Microsoft

In my life as a data professional, I started with and mastered R (I even ran R on Hadoop back in the 2010-2012 timeframe). Over time, especially at work, I have been using Python more and more and really trying to understand its mental model and best programming practices. There are many things that drive me bananas. This talk is NOT about making an argument for which one is better, or trash talking about either (well, maybe a little for fun's sake). It's about the gripes I've had when trying to do tasks in Python that are, in my opinion, much easier to do in R, and how to effectively love both and use them together.
01:45 PM - 02:15 PM

Break & Networking
02:15 PM - 02:35 PM

What is the Best Data Format for Your Shiny Project?

Richard Schwinn

Financial Analyst @ U.S. Securities and Exchange Commission

What is the best data file format for your Shiny project? I compare the performances of several popular file formats and explore their pros and cons. Choosing the optimal file format is more complex than simply prioritizing speed or file size. You will learn how additional factors, such as system architecture, CPU availability, and more, can impact performance.
02:40 PM - 03:00 PM

Incorporating Local LLMs into Your Workflow

David Meza

Head of Analytics – Human Capital, Branch Chief People Analytics @ NASA

In this talk we will walk through how to add local LLMs into your workflow. We will demonstrate how to install local LLMs on your laptop and, using Positron, Posit’s new IDE, add an extension to help with your development. Then we will create chatbots in Shiny (R) and Streamlit (Python) to ask questions of your data.
03:05 PM - 03:25 PM

Scaling Environmental Insights with Earth Observation Across Time, Foundation Models, UIs, and the Future of Environmental Monitoring

Refael Lav

AI Specialist Leader @ Deloitte Consulting

The way the world, and the US Government, use geospatial technology is about to change. Earth Observation Foundation Models (EOFMs) are advanced AI models trained on large and diverse datasets capturing various aspects of the Earth's surface, allowing them to generalize well across different tasks without needing re-training for each new application. They eliminate the need for multiple specialized AI models, reducing computational resources and costs while enhancing performance. An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings can be used to analyze environmental changes and man-made events like mining, coastal damage, and land cover. The model can be fine-tuned for specific tasks or use cases. We will cover EO across time and the new model, Clay. Clay is a computer vision foundation model that turns satellite imagery from a variety of sources into searchable, actionable insights by generating embeddings from Earth observation satellite images and metadata, much like ChatGPT and other generative AI models. Clay and others, without the need for coding skills, have the potential to fundamentally change the way we can monitor, report, and verify dimensions of sustainability, climate, and equity.
03:25 PM - 03:55 PM

Break & Networking
03:55 PM - 04:15 PM

Serving Your Own Local LLM for Internal, Secure GenAI

Travis Knoche

Senior Data Scientist @ Lander Analytics

You’re likely hearing a lot about generative AI and large language models (LLMs) and how they can assist in day-to-day data science and analysis, but have you considered running one that is free, self-hosted, and customizable? In this talk, you’ll see how you can serve your own local LLM for internal GenAI use, leveraging Ollama alongside R packages like chattr and httr2. We’ll walk through the process of deploying an LLM on your own infrastructure, giving you control over data privacy and security. By the end, you’ll have a clear understanding of how to set up and interact with an LLM using these tools, and how your organization can benefit from internal AI solutions without relying on external cloud services.
04:20 PM - 04:40 PM

Everything You Never Wanted to Know About Auth

Alex Gold

Director of Solutions Engineering @ Posit

You use auth constantly -- to log into your computer, access Instagram, and pull data into R. But unless you've been forced to, you've probably never thought about how that auth works or the concerns your organization might have about auth. In this talk, you'll learn about how different auth technologies work and the ways your organization might manage auth.
04:40 PM - 04:50 PM

Closing Remarks
05:00 PM - 07:00 PM

Happy Hour at Clubhouse

Hosted by Data Science DC

Come socialize and network with fellow data scientists, analysts, software engineers, and other data enthusiasts. Chat about your latest project, your job search, or what you're learning about. Clubhouse: Beer, Billiards & Cocktails is located at 1070 Wisconsin Ave NW, Washington, DC 20007.

Wednesday, Oct 30

09:00 AM - 09:50 AM

Registration & Breakfast
09:50 AM - 10:00 AM

Opening Remarks
10:00 AM - 10:20 AM

Using Visual Perception to Find Patterns in Data and Drive Insight

Alex Gurvich

Senior Graphics Designer & Data Visualization Specialist @ NASA's Science Visualization Studio

At the Data Visualization Society’s 2024 Outlier conference, communications and cognitive science researcher Dr. Steven Franconeri gave a keynote presentation on how the human brain is a super-charged pattern recognition machine. He demonstrated through interactive exercises how using principles of visual perception can reveal patterns in data almost instantly, while ignoring them can make those patterns almost impossible to find. But my story begins earlier, almost 10 years ago, when I first saw a version of that presentation. It changed the way I think about data and the course of my life, inspiring me to become a data visualizer. In this talk, I'll summarize Dr. Franconeri's work and guide you through what I’ve learned about the core principles of using visual perception in data visualization.
10:25 AM - 10:45 AM

Quarto, AI, and the Art of Getting Your Life Back

Tyler Morgan-Wall

Research Staff @ Institute for Defense Analyses

Tired of endless server issues and maintenance headaches? Want to reclaim your time for coding, writing, and creating? Join me as I share my journey of switching from the server-based headaches of WordPress to Quarto, with a little help from AI. In this talk, I’ll describe the simple trick I used to convert an existing WordPress blog (complete with custom scripts, styles, and beautiful 3D dataviz content) into a slick Quarto site. I'll then demonstrate some lesser-known features of Quarto to automate deploying a website entirely from a Quarto project file. Finally, I’ll show you how I used AI to customize and style my new Quarto site, and provide several useful strategies to employ if you decide to get some help from AI on your own Quarto journey.
10:45 AM - 11:15 AM

Break & Networking
11:15 AM - 11:35 AM

The Good, the Bad, and the Shiny: A Data Scientist's Guide to Choosing Python Web App Frameworks

Alan Feder

Staff LLM Data Scientist @ Magnifi

Shiny has long been the go-to for data scientists creating web applications in R, but Python offers a plethora of alternatives that can be overwhelming to choose from. This talk will demonstrate the creation of a single application using three popular Python tools: PyShiny, Streamlit, and Gradio. We'll explore the strengths and weaknesses of each framework through a live demonstration. This comparison will provide you with a practical roadmap for selecting the ideal Python web app framework to suit your specific data science projects.
11:40 AM - 12:00 PM

The Role of R in Census Bureau Data Reporting

Jessica Klein

Data Scientist @ United States Census Bureau

The Census Bureau collects vast amounts of data on America's population, places, and economy. In this presentation, we will highlight how R has been adopted by staff to enhance data analysis processes and manage these large datasets. We'll discuss how R supports data work across the Bureau, the growing R user community, and ongoing training efforts to develop staff skills. We'll also showcase several innovative use cases of R from different departments. Finally, we will look ahead to the future of R at the Census Bureau, exploring upcoming initiatives to further integrate R into our data science operations.
12:05 PM - 12:25 PM

An Introduction to Estimation and Comparison of Discrete Variate Time Processes

Rachel Gidaro

Assistant Professor @ United States Military Academy

This talk delves into the parameter estimation of discrete time series, focusing on the comparison between integer-autoregressive processes and traditional Gaussian autoregressive models. We will explore the theoretical underpinnings of these discrete models, emphasizing their relevance in various applications. This talk will largely be an overview of the relevant background in time series analysis, setting the stage for the integer-autoregressive processes.
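For orientation, a common integer-autoregressive model is the Poisson INAR(1) process, where each count is a binomially thinned copy of the previous count plus a Poisson innovation. The sketch below simulates one and recovers the thinning parameter from the lag-1 autocorrelation. It is a minimal illustration under those assumptions, not the estimators covered in the talk; the parameter values are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_inar1(n, alpha, lam, rng):
    """Simulate X_t = alpha o X_{t-1} + e_t, where 'o' is binomial thinning."""
    x = poisson(lam / (1.0 - alpha), rng)  # start from the stationary law
    series = []
    for _ in range(n):
        # Thinning: each of the x previous units survives with probability alpha.
        survivors = sum(rng.random() < alpha for _ in range(x))
        x = survivors + poisson(lam, rng)
        series.append(x)
    return series


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation, a moment-based estimate of alpha."""
    mean = sum(series) / len(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((a - mean) ** 2 for a in series)
    return num / den


rng = random.Random(1)
counts = simulate_inar1(n=5000, alpha=0.5, lam=2.0, rng=rng)
alpha_hat = lag1_autocorrelation(counts)
```

Unlike a Gaussian autoregressive model, every simulated value here is a non-negative integer, yet the autocorrelation structure matches that of an AR(1) process with coefficient alpha.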
12:25 PM - 01:35 PM

Lunch & Networking
01:35 PM - 01:55 PM

Who Are Your Consumers? Understanding Selection Bias Into Government Programs

Travis Riddle

Senior Research Fellow @ Consumer Financial Protection Bureau

Many government programs and initiatives serve an unknown subset of the population. There are many reasons for this: for example, individuals may need to take action to enroll, or an unknown portion of the population may be eligible. The lack of insight into how people select into or out of government programs makes it difficult to target those individuals who could benefit from them and complicates using data from government programs to generalize about broader trends. In this talk, I will describe a project undertaken at the CFPB whose goal is to understand selection into our complaint program. We use a case-control survey methodology to understand this selection bias along several dimensions. I will describe our findings and provide some guidance for those looking for cost-effective ways to understand how people select into their programs.
02:00 PM - 02:20 PM

Detecting Automotive Quality & Safety Issues from Consumer Complaints

Tommy Jones

CEO @ Foundation

70% of the cars on the road are between 6 and 14 years old. Most aren't equipped with sensors allowing manufacturers, regulators, and the public to see when a vehicle platform develops a quality issue, but owners do complain about their cars online. Tommy will tell you about the process of detecting automotive quality issues from consumer complaints, demonstrate the application Foundation is building to unlock these latent issues for automotive companies, regulators, and the public, and showcase the tech stack they used to pull it all together.
02:25 PM - 02:45 PM

SEC Board Diversity Requirements: Are NASDAQ Companies Disclosing Their Data?

Princess Onyiri & Brittany Long

Bloomberg Law

In 2021, the US Securities and Exchange Commission (SEC) approved a rule which requires companies listed on Nasdaq to have, or publicly disclose why they don’t have, at least two diverse board members. However, there is a lack of uniformity in the way companies file the Board Diversity Matrix requirement in their proxy statements or 10-K filings. This presents a series of challenges when trying to pull and analyze the data. Bloomberg Law will discuss methodologies that, with a little finessing, uncovered certain data trends in corporate board diversity (or lack thereof) despite the SEC’s requirement.
02:45 PM - 03:15 PM

Break & Networking
03:15 PM - 03:35 PM

Defending Your Data: When Best Practices Don’t Apply

Frederick Thayer

Data Scientist @ NAVAIR Proposal Analysis Team

Data analytics best practices are important guidelines to follow whenever possible, but what do you do when you cannot apply them? This talk will cover how to determine when they don’t apply, what to do in that case, and how to explain and defend your analysis to stakeholders.
03:40 PM - 04:00 PM

Making Things Difficult: The Role of Disfluency in Science Communication

Laura Gast

Data Science & Analytics Manager @ USO

This talk explores how disfluency, both in font choice and in speech, impacts memory retention, comprehension, and decision-making. We'll examine research showing how introducing slight difficulty in reading or speech can improve recall and encourage deeper cognitive processing. By understanding these effects, you can make more informed choices when crafting both written and spoken messages to maximize audience engagement and understanding.
04:00 PM - 04:10 PM

Closing Remarks

Workshops

Better Development Practices with Large Language Models (LLMs)

Hosted by Abigail Haddad & Benjy Braun

Monday, Oct 28 | 9:00am - 5:00pm


Dashboards and CRUD Apps: Managing Data For Your Organization

Hosted by Maxine Drake

Monday, Oct 28 | 9:00am - 5:00pm


Speakers

Tyler Morgan-Wall

Research Staff

Institute for Defense Analyses

@tylermorganwall

Talk: Quarto, AI, and the Art of Getting Your Life Back

Selen Stromgren

Associate Director

U.S. Food and Drug Administration

@US_FDA

Talk: Liberate the Summer Interns onto More Interesting Terrain: Harnessing NLP in R to Create the Concept of Operations for a Large Organization (Joint Talk with Evgeny Kiselev)

Alex Gold

Director of Solutions Engineering

Posit

@alexkgold

Talk: Everything You Never Wanted to Know About Auth

Abigail Haddad

Machine learning engineer -- AI Corps

Department of Homeland Security

@abbystat

Talk: Finding Your Next Federal Data Job

Princess Onyiri

Senior Data Scientist

Bloomberg Law

@BLaw

Talk: SEC Board Diversity Requirements: Are NASDAQ Companies Disclosing Their Data? (Joint Talk with Brittany Long)

David Meza

Head of Analytics – Human Capital, Branch Chief People Analytics

NASA

@davidmeza1

Talk: Incorporating Local LLMs into Your Workflow

Rachel Gidaro

Assistant Professor

United States Military Academy

@WestPoint_USMA

Talk: An Introduction to Estimation and Comparison of Discrete Variate Time Processes

Will Angel

Data Analytics Capability Lead

Excella

@datadrivenangel

Talk: Wrangling Data with DuckDB

Jared P. Lander

Chief Data Scientist

Lander Analytics

@jaredlander

Talk: Mapping Ever Larger Data with PostGIS, DuckDB, GeoArrow and deck.gl

Jessica Klein

Data Scientist

United States Census Bureau

@uscensusbureau

Talk: The Role of R in Census Bureau Data Reporting

Frederick Thayer

Data Scientist

NAVAIR Proposal Analysis Team

@NAVAIRNews

Talk: Defending Your Data: When Best Practices Don’t Apply

Laura Gast

Data Science & Analytics Manager

USO

@The_USO

Talk: Making Things Difficult: The Role of Disfluency in Science Communication

Brittany Long

Assistant Team Lead, Data & Surveys

Bloomberg Law

@BLaw

Talk: SEC Board Diversity Requirements: Are NASDAQ Companies Disclosing Their Data? (Joint Talk with Princess Onyiri)

Marck Vaisman

Sr. Cloud Solutions Architect

Microsoft

@wahalulu

Talk: R and Python - A Love Story

Mikaela Meyer

Senior Data Scientist

The MITRE Corporation

@mmeyer717

Talk: Slowly Scaling Per-Record Differential Privacy

Alex Gurvich

Senior Graphics Designer & Data Visualization Specialist

NASA's Science Visualization Studio

@alexbgurvich

Talk: Using Visual Perception to Find Patterns in Data and Drive Insight

Travis Riddle

Senior Research Fellow

Consumer Financial Protection Bureau

@CFPB

Talk: Who Are Your Consumers? Understanding Selection Bias Into Government Programs

Refael Lav

AI Specialist Leader

Deloitte Consulting

@RefaelLav

Talk: Scaling Environmental Insights with Earth Observation Across Time, Foundation Models, UIs, and the Future of Environmental Monitoring

William E. J. Doane

Assistant Director

IDA's Science & Technology Policy Institute

@WilDoane

Talk: Text Explorer

Richard Schwinn

Financial Analyst

U.S. Securities and Exchange Commission

@SECGov

Talk: What is the Best Data Format for Your Shiny Project?

Benjy Braun

VP Data Solutions and Innovation

Capital Technology Group

@Ben_G_Braun

Talk: Falling for the Exception: Understanding and Avoiding the Base Rate Fallacy

Tommy Jones

CEO

Foundation

@thos_jones

Talk: Detecting Automotive Quality & Safety Issues from Consumer Complaints

Alan Feder

Staff LLM Data Scientist

Magnifi

@AlanFeder

Talk: The Good, the Bad, and the Shiny: A Data Scientist's Guide to Choosing Python Web App Frameworks

Evgeny Kiselev

Chemist/Scientific Coordinator

U.S. Food and Drug Administration

@US_FDA

Talk: Liberate the Summer Interns onto More Interesting Terrain: Harnessing NLP in R to Create the Concept of Operations for a Large Organization (Joint Talk with Selen Stromgren)

Travis Knoche

Senior Data Scientist

Lander Analytics

@LanderAnalytics

Talk: Serving Your Own Local LLM for Internal, Secure GenAI

