
NY R Conference


Workshops: July 11-12   |   Location: Columbia University

Conference: July 13-14   |   Location: FIAF Manhattan




Speakers

Rob Hyndman

Professor

Monash University, Australia

@robjhyndman

Talk: Being Open to Being Open

Asmae Toumi

Director of Analytics and Research

PursueCare

@asmae_toumi

Wes McKinney

CTO & Co-founder

Voltron Data

@wesmckinn

Talk: Leveling Up the Data Stack: Thoughts on the Last 15 Years

Caitlin Hudon

Data Scientist

Figma

@beeonaposy

Ayanthi Gunawardana

Senior Data Analyst

1-800-FLOWERS.COM, Inc.

@HoneyBadger88

Talk: CaRtography: Creating Accurate and Beautiful Maps in R

Max Kuhn

Scientist

Posit

@topepos

Talk: The Post-Modeling Model to Fix the Model

Molly Huie

Team Lead, Data Analysis & Surveys

Bloomberg Industry Group

@mollyhuie

Talk: How to Interrogate Data Like a Journalist (Joint talk with Andrew Wallender)

Jared P. Lander

Chief Data Scientist

Lander Analytics

@jaredlander

Talk: Building an R Package with LLMs

Bob Rudis

V.P. Research & Data Science

GreyNoise Intelligence

@hrbrmstr

Talk: Into the WebR-Verse

Emily Riederer

Senior Manager of Data Science & Analytics

Capital One

@EmilyRiederer

Hamdan Azhar

Founder

PRISMOJI

@hamdanazhar

Jessica Duncan

Marketing Data Scientist

Greenlight

Talk: Give Credit Where Credit Is Due: Data-Driven Approach to Marketing Channel Attribution

Matt Dupree

Founder

EXORVA

@philosohacker

Talk: OpenAI's Embeddings are Cooler than ChatGPT: An Intro to using OpenAI's Embeddings API

Caterina Constantinescu

Principal Consultant

GlobalLogic

@c__constantine

Talk: Deconstructing LLM Application: Key Considerations to Deliver Custom Solutions

Mike Band

Sr. Manager, Research & Analytics

NFL Next Gen Stats

@MBandNFL

Talk: The Many Models in Production at NFL Next Gen Stats

Ryan Klein

Principal IT Data Scientist

Continental Resources

@KleinR_1980

Talk: Using Plumber to Expose Models In Excel

Daniel Chen

Post-Doc Research and Teaching Fellow & Data Science Educator

University of British Columbia & Lander Analytics

@chendaniely

George Perrett

Director of Research and Data Analysis

New York University

@NYU_PRIISM

Talk: Bayesian Boosting

Andrew Wallender

Investigative Data Reporter

Bloomberg Industry Group

@andrewwallender

Talk: How to Interrogate Data Like a Journalist (Joint talk with Molly Huie)


More speakers coming soon!




Workshops


Tidy Time Series and Forecasting in R

Hosted by Rob Hyndman
Tue, Jul 11 - Wed, Jul 12 | 9:00am - 5:00pm
It is common for organizations to collect huge amounts of data over time, and existing time series analysis tools are not always suitable to handle the scale, frequency and structure of the data collected. In this workshop, we will look at some packages and methods that have been developed to handle the analysis of large collections of time series. On day 1, we will look at the tsibble data structure for flexibly managing collections of related time series. We will look at data wrangling, data visualization and exploratory data analysis. We will use feature-based methods to explore time series data in high dimensions. A similar feature-based approach can be used to identify anomalous time series within a collection of time series, or to cluster or classify time series. Primary packages for day 1 will be tsibble, lubridate and feasts (along with the tidyverse, of course). Day 2 will be about forecasting. We will look at some classical time series models and how they are automated in the fable package, and we will explore the creation of ensemble forecasts and hybrid forecasts. Best practices for evaluating forecast accuracy will also be covered. Finally, we will look at forecast reconciliation, allowing millions of time series to be forecast in a relatively short time while accounting for constraints on how the series are related. (In-Person Only)

Machine Learning in R

Hosted by Max Kuhn
Tue, Jul 11 - Wed, Jul 12 | 9:00am - 5:00pm
Join Max Kuhn on a tour through Machine Learning in R, with an emphasis on using the software rather than on general explanations of model building. This workshop is an abbreviated introduction to the tidymodels framework for modeling. You'll learn about data preparation, model fitting, model assessment and predictions. The focus will be on data splitting and resampling, data pre-processing and feature engineering, model creation, evaluation, and tuning. This is not a deep learning course and will focus on tabular data. Prerequisites: some experience with modeling in R and the tidyverse (you don't need to be an expert); prior experience with lm is enough to get started and learn advanced modeling techniques. If participants can’t install the packages on their machines, RStudio Server Pro instances pre-loaded with the appropriate packages and GitHub repository will be available. (In-Person & Virtual)

Bayesian Data Analysis and Stan

Hosted by Jonah Gabry
Tue, Jul 11 - Wed, Jul 12 | 9:00am - 5:00pm
This workshop will introduce the basics of applied Bayesian data analysis, the Stan modeling language, and how to interface with Stan from R. Participants will learn to write their own models in the Stan language, run them in R, and use a variety of R packages to work with the results. (In-Person & Virtual)

Causal Inference in R

Hosted by Malcolm Barrett & Lucy D'Agostino McGowan
Tue, Jul 11 - Wed, Jul 12 | 9:00am - 5:00pm
In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams and causal modeling techniques such as propensity scores and inverse probability weighting. In both data science and academic research, prediction modeling is often not enough; to answer many questions, we need to approach them causally. We’ll also show that by distinguishing predictive models from causal models, we can better take advantage of both tools. You’ll be able to use the tools you already know (the tidyverse, regression models, and more) to answer the questions that are important to your work. This course is for you if you know how to fit a linear regression model in R, have a basic understanding of data manipulation and visualization using tidyverse tools, and are interested in understanding the fundamentals of moving from estimating correlations to estimating causal relationships. (In-Person & Virtual)



Agenda

Tuesday, Jul 11

  • 08:00 AM - 09:00 AM

    Registration & Breakfast

  • 09:00 AM - 05:00 PM

    Workshop: Rob Hyndman

    Tidy Time Series and Forecasting in R
  • 09:00 AM - 05:00 PM

    Workshop: Max Kuhn

    Machine Learning in R
  • 09:00 AM - 05:00 PM

    Workshop: Jonah Gabry

    Bayesian Data Analysis and Stan
  • 09:00 AM - 05:00 PM

    Workshop: Malcolm Barrett & Lucy D'Agostino McGowan

    Causal Inference in R
Workshop tickets sold separately

Wednesday, Jul 12

  • 08:00 AM - 09:00 AM

    Registration & Breakfast

  • 09:00 AM - 05:00 PM

    Workshop: Rob Hyndman

    Tidy Time Series and Forecasting in R
  • 09:00 AM - 05:00 PM

    Workshop: Max Kuhn

    Machine Learning in R
  • 09:00 AM - 05:00 PM

    Workshop: Jonah Gabry

    Bayesian Data Analysis and Stan
  • 09:00 AM - 05:00 PM

    Workshop: Malcolm Barrett & Lucy D'Agostino McGowan

    Causal Inference in R
Workshop tickets sold separately

Thursday, Jul 13

  • 08:00 AM - 08:50 AM

    Registration & Breakfast

  • 08:50 AM - 09:00 AM

    Opening Remarks

  • 09:00 AM - 09:20 AM

    Emily Riederer

  • 09:25 AM - 09:45 AM

    Matt Dupree

    OpenAI's Embeddings are Cooler than ChatGPT: An Intro to using OpenAI's Embeddings API

    There's been a lot of talk about ChatGPT, but not enough talk about OpenAI's embedding models. Embeddings are a language model's representation of the meaning of text, and in this talk, we'll cover how to use OpenAI's embeddings API to solve classification and recommendation problems. We'll also cover how embeddings can be used to intelligently search through documents and any other kind of text. I'll end with a quick demo showing how I'm using embeddings to search across application copy to help users find out how to do things within the software they use.
  • 09:50 AM - 10:10 AM

    Jessica Duncan

    Give Credit Where Credit Is Due: Data-Driven Approach to Marketing Channel Attribution

    Knowing which step in the customer journey is most influential in driving a conversion is crucial to optimizing marketing efficiency. Traditional approaches assign all credit to the first touch or last touch; newer rule-based approaches, such as linear, U-shaped, and time-decay, apply formulaic assignment of credit to touchpoints depending on the order they occur in. Using Markov chain modeling techniques, we can arrive at a more robust, algorithmic understanding of the steps in our customer journey.
  • 10:10 AM - 10:40 AM

    Break

  • 10:40 AM - 11:00 AM

    Asmae Toumi

  • 11:05 AM - 11:25 AM

    Jared P. Lander

    Building an R Package with LLMs

    Can an LLM build an entire R package? We are going to prompt engineer our way to a working package. We will use a series of prompts to first build functions and write the roxygen documentation. After that, we'll ask it to provide the steps for creating the package scaffolding, such as the DESCRIPTION file and folder structure. Then we will have the LLM write unit tests, something that often falls by the wayside. We'll see how quickly the LLM can do all this for us compared with using the standard package-building tools.
  • 11:30 AM - 11:50 AM

    Daniel Chen

  • 11:50 AM - 01:00 PM

    Lunch

  • 01:00 PM - 01:20 PM

    Mike Band

    The Many Models in Production at NFL Next Gen Stats

  • 01:25 PM - 01:45 PM

    Rob Hyndman

    Being Open to Being Open

    I will reflect on 30+ years of experience in producing open-source software and open-access resources. We'll explore the many benefits of working openly and publicly, including academic, commercial, and social good advantages. Discover how adopting an open mindset can lead to increased collaboration and innovation, as developers and users work together to enhance software and other resources to meet their needs. Open-source software is also more secure and reliable, thanks to the collective review of code by many eyes. We'll also explore the benefits of open-access resources, such as educational materials, research papers, and datasets. By making these resources openly available, we can promote access to knowledge and encourage collaboration among researchers and educators. Move beyond using open-source materials to be a developer of open resources, and help make the world more collaborative, innovative, and equitable.
  • 01:45 PM - 02:15 PM

    Break

  • 02:15 PM - 02:35 PM

    Bob Rudis

    Into the WebR-Verse

    In early 2022, intrepid scientist Dr. George Stagg created the first WebAssembly (WASM) version of R — dubbed "WebR" — and captured the imagination of scores of RStats enthusiasts. One year later, WebR 0.1.x has been unleashed, and has expanded the R universe to every browser on every device across the galaxy. In this session, we'll take a WebR-slinging journey into and through the WebR-Verse, explaining what it is, the heroic efforts taken to bring it to life, why it is a game-changer for R, and show you practical examples of how to tap into the potential of this amazing new technology, and sling WebR apps of your own.
  • 02:40 PM - 03:00 PM

    Molly Huie & Andrew Wallender

    How to Interrogate Data Like a Journalist

    This talk will explore how R can be used to produce data-driven news stories and graphics. We’ll explore best practices, important questions to ask of data, and how to better communicate complicated topics for a mass audience.
  • 03:05 PM - 03:25 PM

    TBD

  • 03:25 PM - 03:55 PM

    Break

  • 03:55 PM - 04:15 PM

    Ayanthi Gunawardana

    CaRtography: Creating Accurate and Beautiful Maps in R

    One of the more niche areas of data science is geographic data science, or the art of using geographic information to derive and present location-specific insights. This talk will cover basic geospatial concepts and data formats, the essential elements of a map, how to import geospatial data into R, the types of geospatial packages used to manipulate this data, and how to accurately present this data on a static map for exploratory and presentation purposes. Participants will learn what makes a map misleading and how to ensure their analysis shows accurate insights and is easy for users to understand.
  • 04:20 PM - 04:40 PM

    TBD

  • 04:40 PM - 04:50 PM

    Closing Remarks

  • 04:50 PM - 06:30 PM

    Happy Hour

Friday, Jul 14

  • 09:00 AM - 09:50 AM

    Registration & Breakfast

  • 09:50 AM - 10:00 AM

    Opening Remarks

  • 10:00 AM - 10:20 AM

    George Perrett

    Bayesian Boosting

    Bayesian Additive Regression Trees (BART) is a powerful machine learning algorithm that combines boosted regression trees with Bayesian inference. BART is well-suited for both causal inference and prediction problems. In this talk, I will provide an overview of BART, explain how it works, and discuss its benefits for various applications. BART requires almost no tuning, includes built-in prediction and credible intervals, and can be extended to account for non-independent data structures. BART is implemented in the dbarts family of R packages and is included in the tidymodels framework, and I'll discuss how this powerful class of models can easily be used in R!
  • 10:25 AM - 10:45 AM

    Hamdan Azhar

  • 10:45 AM - 11:15 AM

    Break

  • 11:15 AM - 11:35 AM

    Ryan Klein

    Using Plumber to Expose Models In Excel

    Getting useful models into users' hands can be a real game-changer for many organizations. While Shiny is a great option for user interactivity, sometimes users want to work in a spreadsheet. This talk will guide the audience through linking a Microsoft Excel workbook to a plumber R script via an API to allow for the analysis of different combinations of variables and outputs.
  • 11:40 AM - 12:00 PM

    Caterina Constantinescu

    Deconstructing LLM Application: Key Considerations to Deliver Custom Solutions

    The rate of advancement in AI research (and LLMs specifically) won't have escaped anybody's attention by now. This degree of progress naturally lends itself to hot takes, memes and dramatisation, when reality is much more nuanced. This talk will explore why AI won't 'take away our jobs' just yet, by discussing some of the real-world constraints and customisations still required. For instance, licensing, privacy and data ownership are legitimate talking points before large-scale adoption in industry. In addition, data collection/scraping for model fine-tuning might be non-trivial to implement depending on the specific scenario at hand, and experience/UI design may similarly require considerable thought before LLM deployment or integration. In this talk, I will discuss these issues (and more!) to highlight that LLM use involves a high degree of subtlety and customisation, as a counterweight to the now-common hyperbole on the topic.
  • 12:05 PM - 12:25 PM

    TBD

  • 12:25 PM - 01:35 PM

    Lunch

  • 01:35 PM - 01:55 PM

    Caitlin Hudon

  • 02:00 PM - 02:20 PM

    Wes McKinney

    Leveling Up the Data Stack: Thoughts on the Last 15 Years

    In this talk, I will discuss some of my observations about data science tools and related computing infrastructure, both where we have come from and where we may be going in the coming years. I will connect these trends to different projects I’ve been involved with, such as pandas, Apache Arrow, Apache Parquet, Ibis, Substrait, and others. A particular focus will be on the themes of modularity and composability of system components. I will also touch on the rapid evolution of storage and computing hardware and how that may direct future development efforts in open source data software.
  • 02:25 PM - 02:45 PM

    Max Kuhn

    The Post-Modeling Model to Fix the Model

    It's possible to get a model that has good numerical performance but has predictions that are not really consistent with the data. Model calibration is a tool that can fix this. We'll show some examples of poor predictions and how different calibration tools can re-align them to the data.
  • 02:45 PM - 03:15 PM

    Break

  • 03:15 PM - 04:15 PM

    Jon Krohn & Chris Wiggins

    SuperDataScience Podcast Live

  • 04:15 PM - 04:25 PM

    Closing Remarks