
Gov R Conference


Workshops: October 18   |   Location: Georgetown University

Conference: October 19-20   |   Location: Georgetown University




Speakers

David Shor

Head of Data Science

Blue Rose Research

@davidshor

Talk: Data, Surveys, and US Politics

Abigail Haddad

Lead Data Scientist

Capital Technology Group

@abbystat

Talk: What Job Is This, Anyway?: Using LLMs to Classify USAJobs Data Scientist Listings

Alex Gold

Solutions Engineer

Posit

@alexkgold

Talk: Learn to Love Logging

Melissa Albino Hegeman

Marine Fisheries Data Manager

NYSDEC

@mo1590

Talk: It Works on My Machine (Reproducibility in R for Small Teams)

Selen Stromgren

Associate Director

U.S. Food and Drug Administration

@US_FDA

Talk: Deterministic Extraction vs. Probabilistic Extrapolation: A Pilot for R-Enabled Augmentation of Information Retrieval by Humans (Joint Talk with Danielle Larese)

David Meza

Head of Analytics – Human Capital, Branch Chief People Analytics

NASA

@davidmeza1

Irena Papst

Senior Scientist

Public Health Agency of Canada

@irenapapst

Talk: From Scripts to Pipelines with Targets

Gary Harki

Investigations Editor

Bloomberg Industry Group

@GaryHarki

Talk: Using Open Records Laws to get Data from the Government (and When to Sue)

Jared P. Lander

Chief Data Scientist

Lander Analytics

@jaredlander

Talk: I Wrote this Talk with an LLM

Vivian Peng

Lead Data Scientist, Innovation

The Rockefeller Foundation

@create_self

Talk: Using Large Language Models in Production: Hype vs Reality (Joint Talk with David Cyprian)

George Perrett

Director of Research and Data Analysis

New York University

@NYU_PRIISM

Talk: stan4bart: Harnessing the Power of Stan and the Flexibility of Machine Learning

Danielle Larese

Scientific Coordinator (Chemist)

U.S. Food and Drug Administration

@US_FDA

Talk: Deterministic Extraction vs. Probabilistic Extrapolation: A Pilot for R-Enabled Augmentation of Information Retrieval by Humans (Joint Talk with Selen Stromgren)

Alex Gurvich

Senior Graphics Designer & Data Visualization Specialist

NASA's Science Visualization Studio

@alexbgurvich

Talk: Storytelling with Data at NASA's Earth Information Center

Soubhik Barari

Quantitative Social Scientist

NORC

@SoubhikBarari

Talk: LocalView: Scaling up the Analytics of Local Politics with R

Dusty Turner

Major

United States Army & Baylor

@dtdusty

Talk: World Leaders, Military Service, and Their Propensity for War

Rhys O'Neill

Innovations and Technology Lead - AIRA

World Health Organization

@WHO

Talk: Democratizing Misinformation Management

Jake Dyal

President

Certus Group

@GroupCertus

Talk: Organizational Effects Driven by Ontologies

Tommy Jones

CEO

Foundation

@thos_jones

Talk: R-Squared for Multidimensional Outcomes

Benjy Braun

Vice President

Data Solutions and Innovation

@Ben_G_Braun

Talk: You Don't See with Your Eyes, You Perceive with Your Mind: Sight, Psychology, and Data Visualization

Marck Vaisman

Sr. Cloud Solutions Architect

Microsoft

@wahalulu

Talk: Rockin' R with VSCode

David Cyprian

Partner

Rootwise

Talk: Using Large Language Models in Production: Hype vs Reality (Joint Talk with Vivian Peng)

More speakers coming soon



Workshops


Introduction to Natural Language Processing

Hosted by William E J Doane
Wednesday, Oct 18 | 9:00am - 5:00pm
(In-person & Virtual Ticket Options) Unstructured and loosely structured textual data is commonly used in public policy analyses to wrangle the vast amount of information available from open (and not so open) sources. This workshop will use R to acquire data from various sources, clean and standardize the data, and explore it for insights that can inform public policy discussions. Basic visualizations will be considered to help communicate stories about the collected data.

Dr. Doane is a science and technology policy researcher in Washington DC with a background teaching computer science and information science at University. https://DrDoane.com/about/cv

Generative AI for Better Code

Hosted by Abigail Haddad & Benjy Braun
Wednesday, Oct 18 | 9:00am - 5:00pm
(In-person & Virtual Ticket Options) This one-day workshop focuses on how GPT-4 can help reduce technical debt in your R projects, whether you’re doing analysis, automation, or data science. Technical debt is the 'debt' you accumulate when you write code and build tools quickly, which later slows you down when you try to add functionality. We'll go from writing "code that runs" to "code you can build on": code that’s modular, documented, and on GitHub.

The workshop is structured into four parts:

  • Code Refactoring: We take code and show you how to make it modular by putting it in functions and structuring it to be easier to run, debug, and build on.
  • Documentation: The next step is about making your code easy to understand and use. We will show you how to create clear and thorough documentation, at both the project and function level.
  • Unit Testing: We’ll guide you through creating unit tests. These formal checks ensure your functions operate as expected, so when you modify your code you’re less reliant on ad hoc testing.
  • Version Control: The final step involves using git for local version control and GitHub for collaboration. Even if you’re already a git user, ChatGPT can help you write commands for less-used tasks and debug your error messages.

If GitHub Copilot is available in RStudio, we will also discuss how you can use this tool to generate code.

At the end of the workshop, you will have transformed your code that runs into modular, well-documented code that's stored in a Git repository. You will understand how large language models like GPT-4 can help you create code that not only does what it's supposed to do but is also easy to work with and build on. This knowledge will be useful for your ongoing analytical work and future development projects.

Causal Inference + BART

Hosted by George Perrett
Wednesday, Oct 18 | 9:00am - 5:00pm
(In-person Ticket Option Only) This workshop will introduce using Bayesian Additive Regression Trees (BART) as a tool for causal inference. BART is a machine learning algorithm with applications in both randomized and observational studies. No prior experience with causal inference or machine learning is expected. By the end of this workshop you will have hands-on experience fitting BART models for causal inference. You will be able to articulate the main ideas of BART and communicate the advantages of utilizing BART models for causal inference and the underlying assumptions of these models.

This workshop will begin with an introduction to causal inference and BART. You'll learn the basics of what causal inference is and why it matters. We'll then cover the intuition of BART and why it is a desirable tool for causal inference.

After this introduction, I'll cover the applications of BART in randomized studies. Randomized studies are the "gold standard" of causal inference, but the choice of which model to use remains consequential. I will compare BART to other causal inference strategies in randomized studies, and you will learn about utilizing BART to uncover treatment effect moderators and heterogeneous treatment effects.

Randomized studies are not always practical or even possible. We'll extend the use of BART for causal inference to observational studies. Participants will learn about the advantages of BART for observational studies and gain hands-on experience working with observational data where individuals have self-selected into or out of the treatment in question.

In public policy and governmental settings, data can be clustered and non-independent. Individuals in a dataset may share a Congressional district or state, or attend the same hospital or school. The workshop will end with extensions of the BART method for working with non-independent data to address these settings.

This course is for you if you are interested in learning more about causal inference and implementing a cutting-edge machine learning method used by experts in causal inference. This course assumes familiarity with and a basic understanding of R.



Agenda

Wednesday, Oct 18

  • 08:00 AM - 09:00 AM

    Registration & Breakfast

  • 09:00 AM - 05:00 PM

    Workshop: William E J Doane Research Staff Member @ IDA Science & Technology Policy Institute

    Introduction to Natural Language Processing

  • 09:00 AM - 05:00 PM

    Workshop: George Perrett Director of Research and Data Analysis @ New York University

    Causal Inference + BART

  • 09:00 AM - 05:00 PM

    Workshop: Abigail Haddad & Benjy Braun

    Generative AI for Better Code

Workshop tickets sold separately

Thursday, Oct 19

  • 08:00 AM - 08:50 AM

    Registration & Breakfast

  • 08:50 AM - 09:00 AM

    Opening Remarks

  • 09:00 AM - 09:20 AM

    Irena Papst Senior Scientist @ Public Health Agency of Canada

    From Scripts to Pipelines with Targets

    Do you ever find yourself starting with a simple analysis script only to end up wrangling a thousand-line behemoth? Are you sick of wasting time re-running long scripts from start to finish, just to make sure everything is up to date? Are you haphazardly saving objects to file because they take a long time to generate? There’s got to be a better way! Enter targets, an R package used to build reproducible, efficient, and scalable pipelines. In this talk, I’ll introduce the targets package and share how I’ve used it to streamline my work modelling infectious disease spread at the Public Health Agency of Canada.
  • 09:25 AM - 09:45 AM

    Gary Harki Investigations Editor @ Bloomberg Industry Group

    Using Open Records Laws to get Data from the Government (and When to Sue)

    Gary will break down how to effectively use open records laws to get data from local, state and federal agencies. He'll talk about the hurdles you encounter and how to overcome them.
  • 09:50 AM - 10:10 AM

    Jon Schwabish Founder and CEO @ PolicyViz

  • 10:10 AM - 10:40 AM

    Break & Networking

  • 10:40 AM - 11:00 AM

    Abigail Haddad Lead Data Scientist @ Capital Technology Group

    What Job Is This, Anyway?: Using LLMs to Classify USAJobs Data Scientist Listings

    Navigating the federal job market begins with finding appropriate job listings. But for data professionals, discrepancies often arise between the content of the listing - that is, the duties of the job - and either the job title or the occupational code, making this step more difficult. In this presentation, I discuss using a Large Language Model (LLM) to generate new job titles for listings in occupational code 1560, Data Science. I'll show examples of listings with mismatches between the official job title and the one generated by GPT-3.5 and discuss the potential uses of this for applicants and agencies. I'll also highlight the advantages of using Marvin, a library that lets you use LLMs to solve Natural Language Processing problems by just writing documentation rather than code.
  • 11:05 AM - 11:25 AM

    Jared P. Lander Chief Data Scientist @ Lander Analytics

    I Wrote this Talk with an LLM

    We have all seen LLMs do data analysis; I even gave a talk about using an LLM to write an R package. Now I have used an LLM to write these slides: everything from creating the outline, to fleshing out ideas, to writing the actual markdown. Let's see how it goes.
  • 11:30 AM - 11:50 AM

    Soubhik Barari Quantitative Social Scientist @ NORC

    LocalView: Scaling up the Analytics of Local Politics with R

    Never before have there been more tools and resources for political data science, yet in 2023, there are shockingly few resources for analysts of local politics - one of the central pillars of American democracy. In this talk, I introduce LocalView, a database of over 100,000 local government public meetings with a dashboard that enables real-time text analytics on issues such as climate change and LGBTQ rights. I show how this database (and accompanying dashboard) was built drawing on tools such as the tidyverse, R Shiny, quanteda, and duckdb. Finally, I show how LocalView can be useful for social science and journalistic applications such as measuring political polarization in local politics and tracking shifts in public health attention across geography.
  • 11:50 AM - 01:00 PM

    Lunch & Networking

  • 01:00 PM - 01:20 PM

    David Meza Head of Analytics – Human Capital, Branch Chief People Analytics @ NASA

  • 01:25 PM - 01:45 PM

    David Shor Head of Data Science @ Blue Rose Research

    Data, Surveys, and US Politics

    A walkthrough of the state of the art in data science and US politics.
  • 01:45 PM - 02:15 PM

    Break & Networking

  • 02:15 PM - 02:35 PM

    Marck Vaisman Sr. Cloud Solutions Architect @ Microsoft

    Rockin' R with VSCode

    Learn how to set up Visual Studio Code for use with R on both your local workstation and Azure Machine Learning. We’ll show which R packages you need in your R environment, which VSCode extensions to install, and additional configuration options, and we’ll walk through an end-to-end example using R, VSCode, and Azure Machine Learning.
  • 02:40 PM - 03:00 PM

    Melissa Albino Hegeman Marine Fisheries Data Manager @ NYSDEC

    It Works on My Machine (Reproducibility in R for Small Teams)

    Working collaboratively in R can be a lot of fun, but it can also be tricky to get started. A combination of GitHub, renv, and custom packages can help improve reproducibility, reduce stress, and lighten everyone's workload. I've made mistakes and hit roadblocks when implementing these tools within a team. But I've also learned a lot along the way. I'll share my experiences and tips so you can avoid the same mistakes and start on the right foot.
  • 03:05 PM - 03:25 PM

    Dusty Turner Major @ United States Army & Baylor

    World Leaders, Military Service, and Their Propensity for War

  • 03:25 PM - 03:55 PM

    Break & Networking

  • 03:55 PM - 04:15 PM

    TBD

  • 04:20 PM - 04:40 PM

    TBD

  • 04:40 PM - 04:50 PM

    Closing Remarks

  • 05:00 PM - 07:00 PM

    Happy Hour at Clubhouse

    Data Happy Hour at Clubhouse - Hosted by Data Science DC

    Take a break from your keyboard and join us at Clubhouse in Georgetown for this Data Science DC Happy Hour. Come socialize and network with fellow data scientists, analysts, software engineers, and other data enthusiasts. A range of non-alcoholic drinks will be supplied, with alcoholic beverages available for purchase.

Friday, Oct 20

  • 09:00 AM - 09:50 AM

    Registration & Breakfast

  • 09:50 AM - 10:00 AM

    Opening Remarks

  • 10:00 AM - 10:20 AM

    TBD

  • 10:25 AM - 10:45 AM

    George Perrett Director of Research and Data Analysis @ New York University

    stan4bart: Harnessing the Power of Stan and the Flexibility of Machine Learning

    Data is often organized within social systems: people live in cities that are in counties that are in states. Nested data often violates the independence assumptions inherent to most statistical and machine learning methods. Multilevel models are a popular solution for accounting for these dependencies, but they make rigid parametric assumptions about the linearity of data. stan4bart is a new type of multilevel model that combines the flexibility of machine learning with the robust inference of traditional multilevel models. stan4bart has applications for both prediction and inference problems, and my talk will introduce the method and its utility in educational and public policy domains.
  • 10:45 AM - 11:15 AM

    Break & Networking

  • 11:15 AM - 11:35 AM

    Benjy Braun Vice President @ Data Solutions and Innovation

    You Don't See with Your Eyes, You Perceive with Your Mind: Sight, Psychology, and Data Visualization

    Inspired by Stephen Few's "Show Me the Numbers," this talk delves into the psychology of data visualization. We'll start by briefly exploring how the eye-brain interaction affects what we 'see' in a graph. The focus then shifts to the key Gestalt principles of design (proximity, similarity, enclosure, closure, continuity, and connection) that serve as the backbone of effective data visualization. We'll wrap up by critiquing poorly executed visualizations and discussing how to improve them using these principles. Attendees will leave with practical insights into making their data not just viewable, but truly 'seen'.
  • 11:40 AM - 12:00 PM

    Selen Stromgren & Danielle Larese U.S. Food and Drug Administration

    Deterministic Extraction vs. Probabilistic Extrapolation: A Pilot for R-Enabled Augmentation of Information Retrieval by Humans

    Large language model AI systems are taking off at a dizzying speed, and end users are trying to ascertain which output can be trusted and to what degree. More recently, machine learning experts have pivoted to “refining” large language models with focused sets of data, training the AI tool on a topic-specific corpus to increase the accuracy and reliability of the output. Examples of such “subject matter expert” AI systems are pharmaGPT, bioGPT, etc. However, the AI system itself still remains a black box to the end user. In this talk, we will present a pilot idea where we explore a very deterministic approach to extracting information from a well-defined corpus using R. Our approach is completely transparent to the end user, does not include any extrapolation or probability-based guessing, and produces an output only if the specific answer to the question posed is present in the reference corpus. If successful, such an approach can allow users to create their own R code using different corpus inputs, with the ultimate goal of automating and expediting information retrieval on the go with full accuracy.
  • 12:05 PM - 12:25 PM

    TBD

  • 12:25 PM - 01:35 PM

    Lunch & Networking

  • 01:35 PM - 01:55 PM

    Alex Gold Solutions Engineer @ Posit

    Learn to Love Logging

    Good logging practice makes the software development parts of Data Science easier and more fun. Learn about how to add logging to your apps, projects, and reports.
  • 02:00 PM - 02:20 PM

    Tommy Jones CEO @ Foundation

    R-Squared for Multidimensional Outcomes

    The coefficient of determination, R-squared, is the most popular goodness-of-fit metric for linear models. Its appeal is so strong that nearly all statistical software reports it by default when fitting linear models. While several pseudo R-squared measures have been developed for other use cases, to our knowledge our research is the first to propose a variation of R-squared for models predicting an outcome in multiple dimensions. Multidimensional outcomes occur in settings such as modeling simultaneous equations, modeling multivariate distributions, or topic modeling of text. Our R-squared relies on a geometric interpretation of the standard definition of R-squared and is thus an extension of the goodness-of-fit metric we all know and love.
  • 02:25 PM - 02:45 PM

    Jake Dyal President @ Certus Group

    Organizational Effects Driven by Ontologies

    How organizations can bake their goals into data structures to enable decision-making for more aligned organizational impact.
  • 02:45 PM - 03:15 PM

    Break & Networking

  • 03:15 PM - 03:35 PM

    Alex Gurvich Senior Graphics Designer & Data Visualization Specialist @ NASA's Science Visualization Studio

    Storytelling with Data at NASA's Earth Information Center

    Exploratory data visualization is a crucial element for building intuition for complex datasets. Numerous tools and approaches exist for efficiently summarizing data in order to extract key insights. However, these visualizations are not always optimized for communicating the final results. In this talk, I will share my experience as a data visualization specialist at NASA's Science Visualization Studio developing content for the new Earth Information Center and discuss the key differences between exploratory and explanatory data visualization. I will also provide helpful tips for making effective explanatory visualizations and share resources to continue learning about best practices in data storytelling, a new approach to data visualization and communication.
  • 03:40 PM - 04:00 PM

    TBD

  • 04:00 PM - 04:10 PM

    Closing Remarks



Sponsors

Gold

Georgetown University

Silver

Posit
R Consortium

Bronze

PolicyViz

Supporting

Pearson
Springer
Chapman & Hall/CRC, Taylor & Francis Group