The New York Data Science & AI Conference Presented by Lander Analytics
Workshops: Monday, August 25
Conference: Tuesday, August 26 & Wednesday, August 27
Location: New York City
Immerse yourself in the evolving world of data science and AI at The New York Data Science & AI Conference Presented by Lander Analytics—an intimate, single-track conference designed to connect data professionals and showcase world-class speakers. It will take place August 26 & 27, with hands-on workshops on August 25.
For over a decade, the New York R Conference has been the go-to event for R enthusiasts and data professionals. Now, as the field evolves, so does our conference. We continue to bring together data professionals from diverse industries such as technology, finance, healthcare, sports, retail, and more—fostering a space for exceptional content and unparalleled networking.
Attend in New York City or virtually to explore the latest advancements, share insights, and shape the future of data science and AI.
Agenda
Monday, Aug 25
-
08:30 AM - 09:15 AM
Registration & Breakfast
-
09:15 AM - 05:00 PM
Workshop: Machine Learning in R
Max Kuhn
Scientist @ Posit
Join Max Kuhn on a tour through Machine Learning in R, with emphasis on using the software as opposed to general explanations of model building. This workshop is an abbreviated introduction to the tidymodels framework for modeling.
You'll learn about data preparation, model fitting, model assessment and predictions. The focus will be on data splitting and resampling, data pre-processing and feature engineering, model creation, evaluation, and tuning. This is not a deep learning course and will focus on tabular data.
Prerequisites: some experience with modeling in R and the tidyverse (you don't need to be an expert); prior experience with lm is enough to get started and learn advanced modeling techniques. If participants can’t install the packages on their machines, RStudio Server Pro instances pre-loaded with the appropriate packages and GitHub repository will be available.
(In-Person & Virtual Ticket Options Available)
-
09:15 AM - 05:00 PM
Workshop: Introduction of LLMs/AI
Daniel Chen
Post-Doc Research and Teaching Fellow & Data Science Educator @ University of British Columbia
There's a lot of hype around AI, its many use cases, and the amazing things it can do. This workshop aims to demystify LLMs and give you a practical understanding of how they work and how to use them beyond the desktop application.
We will code with LLMs using an API and introduce two packages, chatlas (Python) and ellmer (R), that make it easier to interact with LLMs programmatically. We'll also see how we can use LLMs in Shiny dashboards to create a user interface for your own chatbots. We'll then expand on these basics to learn about RAG (retrieval-augmented generation) and tool calling to give our bots more context and the ability to work as "agents". Finally, we'll see how we can use LLMs to help us with our data science projects.
(In-Person & Virtual Ticket Options Available)
Tuesday, Aug 26
-
09:00 AM - 09:50 AM
Registration & Breakfast
-
09:50 AM - 10:00 AM
Opening Remarks
-
10:00 AM - 10:20 AM
Generating New Data Through Simulating an NFL Game
Ally Blake
Senior Coordinator, Football Data & Analytics @ NFL
A play-level game simulation model can precisely quantify the impact and any intended and unintended consequences of potential rules changes. The goal of this project is to mimic a real NFL game, review the results of simulated games, compare to what one expects from a real game, and evaluate the results based on points per game and plays per game.
-
10:25 AM - 10:45 AM
How We Built It: An Offseason of Development at NFL Next Gen Stats
Mike Band
Sr. Manager, Research & Analytics @ NFL Next Gen Stats
-
10:45 AM - 11:15 AM
Break
-
11:15 AM - 11:35 AM
From Hype to Value: Mastering Gen AI Outcomes Through Effective Evaluations
Bill Gold
Head of AI @ Citizens Bank
Large Language Models (LLMs) can produce results of varying quality, necessitating effective evaluations. Understanding quality drivers like lossy models and reinforcement learning from human feedback (RLHF) is crucial. The presentation reviews evaluation approaches, including benchmarks, LLM as a judge, crowdsourcing, and human experts. Each method has trade-offs regarding scale, cost, and alignment with specific use cases. Developing strong intuitions about LLM behavior is vital for discerning impactful applications. Better practices involve early evaluation, aligning approaches with use cases, and leveraging human experts for gold standards.
-
11:40 AM - 12:00 PM
How I Learned to Stop Worrying and Love Vibe Coding
Jared P. Lander
Chief Data Scientist @ Lander Analytics
-
12:05 PM - 12:25 PM
LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language
Daniel Chen
Post-Doc Research and Teaching Fellow & Data Science Educator @ University of British Columbia
-
12:25 PM - 01:25 PM
Lunch
-
01:25 PM - 01:45 PM
TBD
-
01:50 PM - 02:30 PM
What's Going On In There? Bayesian Tools for Understanding a Fitted Model
Andrew Gelman
Professor @ Department of Statistics and Department of Political Science, Columbia University
A fitted model is a mapping from data (including information encoded in the model specification and the prior) to inferences. We present Bayesian tools for model understanding, generalizing existing analytical methods such as R-squared, graphical approaches such as influence plots, and workflow procedures such as sensitivity analysis. Our goal is to better understand the weaknesses of our fitted models and ultimately to learn more from data.
-
02:30 PM - 03:10 PM
Break
-
03:10 PM - 03:30 PM
Narratives in Data from the First Seven Months of Congestion Pricing
Gayan Seneviratna
Senior Data Scientist @ MTA Data and Analytics
On January 5th, 2025, New York began an ambitious effort to reshape how we share our city streets. Under the Central Business District Tolling Program (CBDTP), drivers were charged on entry into southern Manhattan. By pricing the negative externality of driving, the program aimed to reduce vehicle entries, improve bus speeds in the CBD, and curtail polluting emissions. Did the program succeed in these goals? To answer that question, the MTA turned to its Data and Analytics team. My talk covers our work over the first months of CBDTP: the ingestion of tolling data with Apache Airflow, the building of a counterfactual for vehicle entries, and the storytelling needed for powerful, data-driven narratives.
-
03:35 PM - 03:55 PM
Using Quarto to Create Reports for Hospital Quality Improvement: Safely Lowering Cesarean Rates
Xilin Chen
Analytics Manager @ Michigan Medicine
Cesarean births can be lifesaving, but when overused, they carry significant health risks and financial consequences. At the Obstetrics Initiative, we work with over 65 Michigan hospitals to reduce unnecessary cesareans and improve maternal care. As the analytics manager, I lead the development of performance reports that inform both individual hospitals and our internal clinical teams. These reports are based on near real-time data from each hospital's EHR system and are used to monitor trends, identify areas for improvement, and guide targeted support. We rely heavily on Quarto to generate these reports. Its flexibility and scalability allow us to create daily, monthly, and annual reporting pipelines, as well as custom one-off reports tailored to specific needs. By leveraging Quarto’s and R’s powerful analytics capabilities, we produce clear and impactful reports that our hospital partners rely on—and love. In this talk, I’ll share how we’ve structured our reporting system with Quarto, the benefits it’s brought to our quality improvement work, and practical tips for building similar tools in your own healthcare or analytics environment.
-
04:00 PM - 04:20 PM
TBD
-
04:20 PM - 04:30 PM
Closing Remarks
Wednesday, Aug 27
-
09:00 AM - 09:50 AM
Registration & Breakfast
-
09:50 AM - 10:00 AM
Opening Remarks
-
10:00 AM - 10:20 AM
How to Use Free, Open-Source Text Embeddings to Accomplish Advanced Textual Analysis
Andrew Wallender
Data Editor @ Bloomberg Industry Group
Have you ever found yourself overwhelmed by a mountain of documents to analyze? Discover how to easily find insights using a powerful, open-source text embedding model. Convert your text into meaningful numerical representations to go beyond keyword matching and uncover thematic clusters, find conceptually similar documents, and build semantic search applications. This session will show you how to leverage free tools to perform sophisticated textual analysis at a fraction of the cost of LLMs.
-
10:25 AM - 10:45 AM
Processing Document Collections with LLMs: A Practical Workflow
Abigail Haddad
Data Scientist/Machine Learning Engineer @ Freelance
Every organization has stacks of similar documents - customer complaints, resumes, error logs - that need the same questions answered about each one. This talk walks through a systematic workflow for processing these document collections with LLMs, covering the full pipeline from messy input to polished results. I'll share real examples and the tools I built to automate the repetitive parts across different projects, including wrangling LLM outputs and creating modular display components.
-
10:45 AM - 11:15 AM
Break
-
11:15 AM - 11:35 AM
AI with ROI: How to Use ML to Cut Your Snowflake Bill in Half
Ben Lerner
CEO & Co-Founder @ Espresso AI
Espresso AI uses two main techniques to run workloads substantially faster and cheaper on data warehouses: better job scheduling and automatically incrementalizing queries. This talk will dive into the technical details behind both approaches.
-
11:40 AM - 12:00 PM
Fine-Tuning LLMs to Automate Energy Savings
Danya Murali
Lead Data Scientist @ Arbor
In this talk, I’ll show how we turned our operations team’s deep energy expertise into curated training data to fine-tune LLMs that extract key information from notoriously inconsistent electricity bills, whose formats, field names, and terminology shift across utilities, rate plans, and energy sources. This pipeline now powers Arbor’s ability to give customers a clear view of their electricity costs and automatically broker them onto lower-cost suppliers, cutting manual work by over 90% and enabling us to scale efficiently. Our entire system rests on meticulous data engineering combined with human judgment and context, which remain essential for a reliable AI-driven solution.
-
12:05 PM - 12:25 PM
TBD
-
12:25 PM - 01:25 PM
Lunch
-
01:25 PM - 01:45 PM
Rethinking A/B Tests for Connected Users and Teams
Chiraag Kala
Lead Data Scientist @ Airbnb
Imagine running experiments that assume everyone acts independently—only to realize that, in practice, people collaborate. On platforms like Airbnb, for example, hosts can partner with other co-hosts, forming networks where behavior is interdependent. Traditional A/B tests that randomize individuals ignore these relationships, leading to biased results. In this talk, we introduce a new experimentation design that treats entire teams or networks as the unit of analysis. By accounting for collaboration and spillover effects, this approach yields more accurate results and leads to a better user experience. Whether you're building an e-commerce platform, a social network, or a financial product where users operate in groups, network-aware experiments will help you make smarter, more reliable decisions.
-
01:50 PM - 02:10 PM
Dealing with Duplicate Data (in R)
Erin Grand
Senior Data Scientist @ TRAILS to Wellness
Maintaining high data quality is essential for accurate analyses and decision-making. Unfortunately, high data quality is often hard to come by (especially for non-profits). This talk will focus on some "how-tos" of cleaning data and removing duplicates to enhance data integrity. We'll go over common causes of duplicates, how to use the {janitor} package to identify and remove duplicates, and business practices that can help prevent these data issues from happening in the first place.
-
02:15 PM - 02:35 PM
Measuring LLM Effectiveness
Max Kuhn
Scientist @ Posit, PBC
How can we quantify how accurately LLMs perform? In late 2024, Anthropic released a preprint of a manuscript about statistically analyzing model evaluations. The concepts are on target, but the statistical tactics have narrow applicability. A simpler statistical framework can be used to quantify LLM performance in many more scenarios and experimental designs. We'll describe these methods and show an example.
-
02:35 PM - 03:15 PM
Break
-
03:15 PM - 03:35 PM
From Prediction to Foundation: Deep Learning Models for Patient Care Optimization
Jon Sege & Vincent Pan
White Plains Hospital
Accurately predicting medical specialties and follow-up appointment needs from electronic medical records (EMR) can enhance personalized care and resource allocation. We present a neural network pipeline using LSTM architecture and diagnosis codes to predict specialties and flag follow-ups, addressing challenges like class imbalance and interpretability. We also introduce an approach to learning general-purpose medical code embeddings from EMR sequences, using Masked Code Modeling (MCM) and Graph Convolutional Transformers (GCT). Functioning as a clinical foundation model, these embeddings encode relationships among medical codes and can be leveraged across diverse downstream applications in healthcare analytics. Finally, we will discuss an application that leverages these models to provide actionable decision-points for our quality and coding teams.
-
03:40 PM - 04:00 PM
Understanding Artificial General Intelligence Futures: Toward a Shared Vocabulary for Policy Planning
Swaptik Chowdhury
Assistant Policy Researcher @ RAND Corporation
The absence of shared terminology for describing artificial general intelligence (AGI) futures has created persistent misunderstandings in policy discussions. This talk presents a classification framework that makes explicit the assumptions underlying different AGI scenarios. It introduces six descriptive axes: locus of control, governance primacy, alignment level, takeoff speed, human-AGI relationship, and AGI volition. These axes enable policymakers and researchers to describe a comprehensive range of plausible AGI futures, clarify disagreements, and assess the robustness of policy responses across various scenarios. The framework supports more informed and structured dialogue in AI governance and foresight planning.
-
04:05 PM - 04:25 PM
Tiering Teams and Predicting Attendance with R
Kelsey McDonald
Senior Manager, Strategy & Business Intelligence @ New York Yankees
-
04:25 PM - 04:35 PM
Closing Remarks
Speakers

Andrew Gelman
Professor
Department of Statistics and Department of Political Science, Columbia University
Talk: What's Going On In There? Bayesian Tools for Understanding a Fitted Model


Ben Lerner
CEO & Co-Founder
Espresso AI
Talk: AI with ROI: How to Use ML to Cut Your Snowflake Bill in Half


Kelsey McDonald
Senior Manager, Strategy & Business Intelligence
New York Yankees
Talk: Tiering Teams and Predicting Attendance with R
Andrew Wallender
Data Editor
Bloomberg Industry Group
Talk: How to Use Free, Open-Source Text Embeddings to Accomplish Advanced Textual Analysis

Gayan Seneviratna
Senior Data Scientist
MTA Data and Analytics
Talk: Narratives in Data from the First Seven Months of Congestion Pricing

Ally Blake
Senior Coordinator, Football Data & Analytics
NFL
Talk: Generating New Data Through Simulating an NFL Game

Jared P. Lander
Chief Data Scientist
Lander Analytics
Talk: How I Learned to Stop Worrying and Love Vibe Coding

Jon Sege
AVP, Data Management & Analytics
White Plains Hospital
Talk: From Prediction to Foundation: Deep Learning Models for Patient Care Optimization (Joint talk with Vincent Pan)

Xilin Chen
Analytics Manager
Michigan Medicine
Talk: Using Quarto to Create Reports for Hospital Quality Improvement: Safely Lowering Cesarean Rates

Chiraag Kala
Lead Data Scientist
Airbnb
Talk: Rethinking A/B Tests for Connected Users and Teams

Bill Gold
Head of AI
Citizens Bank
Talk: From Hype to Value: Mastering Gen AI Outcomes Through Effective Evaluations

Abigail Haddad
Data Scientist/Machine Learning Engineer
Freelance
Talk: Processing Document Collections with LLMs: A Practical Workflow

Mike Band
Sr. Manager, Research & Analytics
NFL Next Gen Stats
Talk: How We Built It: An Offseason of Development at NFL Next Gen Stats

Daniel Chen
Post-Doc Research and Teaching Fellow & Data Science Educator
University of British Columbia
Talk: LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language

Swaptik Chowdhury
Assistant Policy Researcher
RAND Corporation
Talk: Understanding Artificial General Intelligence Futures: Toward a Shared Vocabulary for Policy Planning

Vincent Pan
Data Scientist
White Plains Hospital
Talk: From Prediction to Foundation: Deep Learning Models for Patient Care Optimization (Joint talk with Jon Sege)

Erin Grand
Senior Data Scientist
TRAILS to Wellness
Talk: Dealing with Duplicate Data (in R)
More speakers coming soon…
Workshops

Machine Learning in R
Hosted by Max Kuhn
Monday, Aug 25 | 9:15am - 5:00pm
Join Max Kuhn on a tour through Machine Learning in R, with emphasis on using the software as opposed to general explanations of model building. This workshop is an abbreviated introduction to the tidymodels framework for modeling.
You'll learn about data preparation, model fitting, model assessment and predictions. The focus will be on data splitting and resampling, data pre-processing and feature engineering, model creation, evaluation, and tuning. This is not a deep learning course and will focus on tabular data.
Prerequisites: some experience with modeling in R and the tidyverse (you don't need to be an expert); prior experience with lm is enough to get started and learn advanced modeling techniques. If participants can’t install the packages on their machines, RStudio Server Pro instances pre-loaded with the appropriate packages and GitHub repository will be available.
(In-Person & Virtual Ticket Options Available)

Introduction of LLMs/AI
Hosted by Daniel Chen
Monday, Aug 25 | 9:15am - 5:00pm
There's a lot of hype around AI, its many use cases, and the amazing things it can do. This workshop aims to demystify LLMs and give you a practical understanding of how they work and how to use them beyond the desktop application.
We will code with LLMs using an API and introduce two packages, chatlas (Python) and ellmer (R), that make it easier to interact with LLMs programmatically. We'll also see how we can use LLMs in Shiny dashboards to create a user interface for your own chatbots. We'll then expand on these basics to learn about RAG (retrieval-augmented generation) and tool calling to give our bots more context and the ability to work as "agents". Finally, we'll see how we can use LLMs to help us with our data science projects.
(In-Person & Virtual Ticket Options Available)
Sponsors
© Lander Analytics 2025