Click here to purchase tickets!
In-person & virtual tickets are available.
Workshops: Wednesday, November 30, 2022
Conference: Thursday, December 1 & Friday, December 2, 2022
Venue: Georgetown University in the ICC Auditorium, located inside the Edward B. Bunn, S.J. Intercultural Center (ICC) at 1501 Tondorf Rd, Washington, DC 20007
& Virtually Online
Unstructured and loosely structured textual data are commonly used in public policy analyses to wrangle the vast amount of information available from open (and not-so-open) sources. This workshop will use R to acquire data from various sources, clean and standardize the data, and explore it for insights that can inform public policy discussions. Basic visualizations will be used to help communicate stories about the collected data.
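As a sketch of the acquire–clean–explore workflow the workshop covers, the snippet below scrapes a hypothetical HTML table and plots it; the URL and column names are placeholders, not materials from the workshop:

```r
library(rvest)
library(dplyr)
library(ggplot2)

# Acquire: read a table from a (hypothetical) public web page
page <- read_html("https://example.com/policy-data")  # placeholder URL
raw  <- page |>
  html_element("table") |>
  html_table()

# Clean: standardize column names and types (columns are illustrative)
clean <- raw |>
  rename_with(tolower) |>
  mutate(year = as.integer(year))

# Explore and visualize: count records per year
clean |>
  count(year) |>
  ggplot(aes(year, n)) +
  geom_col()
```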
Are you responsible for producing a report that uses the same analyses from one iteration to the next? Are you spending your time manually updating tables, figures, and report text to reflect changes in data? This session will focus on how you can use Quarto, the next generation of RMarkdown, to write and update reports. Learn how Quarto can help facilitate your workflow.
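A minimal Quarto report might look like the sketch below; re-rendering it regenerates every table and figure from the current data (the title and chunk contents are illustrative, not from the session):

````
---
title: "Monthly Report"
format: html
---

```{r}
#| echo: false
summary(mtcars$mpg)
```
````

Running `quarto render report.qmd` rebuilds the report whenever the underlying data change, so tables, figures, and inline numbers stay in sync without manual updates.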
This session will describe how NASA People Analytics incorporates the RStudio/Posit ecosystem into our analytic workflow. We will explore data ingestion, API creation with plumber, MLOps with vetiver, apps on Connect, and Tableau extensions created with R.
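To illustrate plumber's annotation-based API style (this endpoint is a generic sketch, not NASA's actual code):

```r
# plumber.R
library(plumber)

#* Return summary statistics for a random numeric sample
#* @param n Number of random draws
#* @get /summary
function(n = 100) {
  x <- rnorm(as.integer(n))
  list(mean = mean(x), sd = sd(x))
}
```

Serving the file with `plumber::plumb("plumber.R")$run(port = 8000)` exposes the function as a JSON endpoint at `/summary`.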
USAID is a unique organization that operates across countries and supports areas from education to humanitarian aid. This session focuses on how USAID now uses data and a set of R-based analytical tooling to model the demand for its workforce across the world and in Washington. We will cover the business need, the technical architecture, the R code involved, and the impact on the workforce.
The deaths from the U.S. opioid epidemic have reached a new record, totaling 108,000 in 2021 according to the CDC. The number of drug overdose deaths has quadrupled since 1999. Curbing this unrelenting crisis is at the heart of many interventions by the government, public health experts, providers, and community activists. PursueCare is a company offering comprehensive care for substance use disorders and other mental health conditions through telehealth technology and in-person treatments. Asmae Toumi, the director of analytics and research at PursueCare, will talk about how data and R/RStudio’s public and professional tools are being used to deliver evidence-based care and monitor outcomes.
Installing the data science stack, including the RStudio suite of products, poses its own challenges even in ordinary environments, but this gets significantly more difficult in locked-down and air-gapped environments. We have successfully installed tools in a variety of environments in different states of isolation across industry and government. This talk will go over lessons learned in setting up even the most locked-down servers.
Transforming data into information requires versatile and accessible tools such as RStudio and is a key step in supporting decision-making via accurate trend identification and fact-finding. This talk will demonstrate applications of R to perform data analysis and visualization on various data sets at FDA’s Office of Regulatory Affairs and discuss the agency’s current focus on building a workforce proficient in data science tools such as R.
Many (if not most) R users learn R and use it in interactive sessions. However, the resulting scripts may require optimization and modularization in order to scale or run in an automated way. This talk explores best practices in R programming that allow you to write cleaner, modular, and repeatable code. We discuss functional programming, metaprogramming, and MLOps with R.
It’s all too easy to write an app or report, or add an update, and suddenly, a cold bead of sweat running down your back, realize everything is broken. In this talk you’ll learn how to avoid this moment with good code promotion practices in R. You’ll learn about a general framework for code promotion, as well as specific tools you can use to make deployments low-risk and easy.
The best-known security posture is to “assume breach” and build your policy from there. Most likely, that assumption is correct; if you are lucky, it might be wrong. Let’s take a look at the numbers.
Fantasy football is a widely enjoyed activity where you compete against friends, colleagues, or strangers to see whose real-life football players perform best from week to week. While player performance is largely random, you can improve your chances of winning with deft analysis. In particular, the initial draft at the beginning of the season can get your fantasy football team started on the right track. In this presentation, Dusty will show how you can use the Tidyverse, web scraping, integer programming, and Shiny to start your fantasy football season off right.
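The integer-programming piece can be sketched with the lpSolve package; the projected points, salaries, and constraints below are made-up numbers for illustration, not the talk's actual model:

```r
library(lpSolve)

# Hypothetical projected points and salary costs for five players
points <- c(22, 19, 15, 12, 9)
cost   <- c(40, 35, 20, 15, 10)

# Maximize projected points subject to a budget of 60
# and a roster limit of 2 players, with binary pick/no-pick decisions
sol <- lp(
  direction    = "max",
  objective.in = points,
  const.mat    = rbind(cost, rep(1, 5)),
  const.dir    = c("<=", "<="),
  const.rhs    = c(60, 2),
  all.bin      = TRUE
)

# Which players were drafted?
which(sol$solution == 1)
```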
Because language is one of the most abundant and information rich data sources that exists, Tommy dreams of a world where we can measure ideas and narratives with the same level of scientific rigor that we use to measure the economy, effects of medical interventions, and other quantitative phenomena. In this talk, Tommy will talk about his work with Latent Dirichlet Allocation to make measurement of language more scientific and how you can access the results of his research through a tidyverse adjacent package, tidylda.
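A minimal tidylda sketch looks roughly like the following; the toy corpus is invented for illustration, and the workflow assumes a document-term matrix built with tidytext:

```r
library(dplyr)
library(tidytext)
library(tidylda)

# Toy corpus: one row per (document, word) occurrence
docs <- tibble(
  doc  = c(1, 1, 2, 2, 3, 3),
  word = c("budget", "policy", "budget", "tax", "health", "policy")
)

# Cast to a sparse document-term matrix and fit a 2-topic LDA model
dtm   <- docs |> count(doc, word) |> cast_sparse(doc, word, n)
model <- tidylda(data = dtm, k = 2, iterations = 100)

# Inspect per-topic word probabilities in tidy form
tidy(model, matrix = "beta")
```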
Ever feel like you’re the only person on your team who “gets” data? Have data everywhere, but no insights? Merav Yuravlivker, CEO of Data Society, will share her experiences and best practices for developing a data-driven culture and a common data vocabulary. You’ll walk away with an understanding of what it really means to be data-driven and concrete steps that you can take to make a big impact in your organization.
The Certus team helped national security analysts drowning in data by augmenting their data systems with human expertise and automated data engineering solutions. These helped them tackle four disparate data sources, over two billion records, and 40+ terabytes of data, while discovering along the way that enhanced knowledge beats rudimentary data and poorly trained AI.
A gentle intro to dbt from some folks who have worked extensively in R over the years. It would be a light intro to analytics engineering, testing, and other prerequisites for good production ML in R.
Exploring how Bloomberg Law uses Shiny dashboards to make our proprietary data accessible to everyone in the business unit, not just the data scientists.
Getting data remains one of the major challenges facing data scientists and data analysts. Studies consistently show that data scientists and analysts spend most of their time simply getting and cleaning data. Apache Drill is an open source query engine that can radically simplify this process. In this talk, Charles will demonstrate how you can connect Apache Drill to a variety of data sources, including traditional tabular data, APIs, and even binary data sources, and query them using standard SQL. You’ll also learn how to seamlessly get this data into a dataframe for further analysis.
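With the sergeant package, the pattern looks roughly like this (assuming a Drill instance running on localhost; `cp.employee.json` is sample data bundled with Drill):

```r
library(sergeant)

# Connect to a local Apache Drill instance (default REST port)
dc <- drill_connection("localhost")

# Query Drill's bundled sample data with standard SQL;
# the result comes back as a data frame for further analysis
df <- drill_query(dc, "SELECT full_name, position_title
                       FROM cp.`employee.json` LIMIT 5")
```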
With its adoption of the Minneapolis 2040 comprehensive plan, Minneapolis became the first major city in the U.S. to eliminate zoning regulations that ban the construction of duplexes and triplexes. The changes are part of a strategy to expand housing supply, increase housing affordability, and encourage equitable development. As the City implements the Minneapolis 2040 plan and related housing policies, many want to know whether the changes will have the desired effect on housing access and affordability relative to what would have occurred without the 2040 plan. We evaluate the impact of the plan using the Synthetic Control Method on various housing-related indicators. We continue to track and monitor these indicators as more data become available. Learn more at: https://minneapolisfed.shinyapps.io/Minneapolis-Indicators/.
When working with textual data, the capability for multiple researchers to independently mark up the source text is vital. It enables persistent identification of critical information and discussion about agreement and disagreement over what counts as evidence, resulting in traceability and transparency.
While there are a number of commercial and open source tools available, finding the right feature set and price point for use in isolated networks with rapidly changing resources has been difficult.
Qual is an early-stage Shiny app to meet that need.
This talk describes a few examples of how Army organizations are managing their data and how we use R to support them. First, the talk will cover, in generalities, progress in building databases and cloud environments, and the limitations that organizations face. Then we will discuss examples of Shiny apps indicative of what these organizations use, as well as Shiny code that streamlines large applications.
We live in a 3D world filled with 3D data: why limit yourself to 2D data visualizations when your data are three dimensional? In this talk, I will show how you can visualize an entire city in R, building a 3D model of every single building in Washington DC directly from open geospatial data sources. Using the rayverse universe of packages and the sf package, we will recreate a Washington Post visualization showing the 3D path of a helicopter descending on the 2020 Black Lives Matter protests in only a few lines of code. Additionally, we will use the rayrender package to visualize the point-of-view of the helicopter itself, and show how you can interactively fly through and visualize massive 3D scenes entirely in your R session.
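As a taste of the rayverse workflow, the `montereybay` elevation matrix that ships with rayshader can be rendered in 3D in a few lines (this is a generic rayshader example, not the DC helicopter pipeline from the talk):

```r
library(rayshader)

# montereybay is an elevation matrix bundled with rayshader;
# shade it, then extrude it into an interactive 3D scene
montereybay |>
  sphere_shade(texture = "imhof1") |>
  plot_3d(montereybay, zscale = 50, water = TRUE)

# Capture the current 3D view as a static image
render_snapshot()
```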
In an attempt to deliver solutions as quickly as possible, non-technical co-workers and customers often butt heads with data scientists. A “let’s get it built as quickly as possible to meet this customer need” mindset may look like an asset, but the “tech debt” (the backlog created when writing unsustainable, non-reusable code) incurred to deliver so quickly is a hidden mission risk. Like racking up debt on a credit card, organizations don’t just have to pay back their tech debt but often pay interest in the process, with sneaky creditors calling the debt due when you least expect it. That’s why in this talk, Benjy Braun, Chief Architect of the DC-based tech start-up 202 Group, will explain everything data scientists need to know about tech debt so that their non-technical colleagues and customers will listen. Participants can expect to walk away with a better understanding of what tech debt is and why it is a business and mission issue, not just a tech team issue; tips for balancing the need to deliver quickly with the need to scale sustainably (spoiler: not all tech debt is bad); frameworks for how data scientists can better communicate in support of shared goals; and strategies for paying back the debt you have without sacrificing mission success in the process. The bad news is that tech debt is one of the biggest hidden mission risks affecting organizations of all sizes. The good news is that, once you make a mindset shift, you can accelerate strategies for meeting mission needs sustainably.
The Occupational Requirements Survey (ORS) collects information on the requirements related to the critical tasks of a well-defined, specified job. Field economists collect this information as open text. While these critical tasks are essential to the primary purpose of the job and are collected in the ORS survey process, the text is not fit to publish. Our goal is to classify tasks to further the extensive research into task data and to publish task data for public consumption. Another data source can support this classification: the Occupational Information Network (“ONET”) contains occupation-level rather than job-level data, but it can be leveraged as a taxonomy for classifying ORS tasks. Topic modeling is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents; in our case, the documents are ONET generalized work activities. We create LDA models to describe and fit the data, use them to classify the tasks, and then evaluate the results. We will present our findings from comparing models of ONET data to ORS data, which we hope will lead to publishing task data for public consumption.
If you are interested in being a sponsor for the 2022 Government & Public Sector R Conference, please contact us at email@example.com