The New York Data Science & AI Conference Presented by Lander Analytics
Workshops: Monday, August 25
Conference: Tuesday, August 26 & Wednesday, August 27
Location: Microsoft Office (Times Square)
Immerse yourself in the evolving world of data science and AI at The New York Data Science & AI Conference Presented by Lander Analytics—an intimate, single-track conference designed to connect data professionals and showcase world-class speakers. It will take place August 26 & 27, with hands-on workshops on August 25.
For over a decade, the New York R Conference has been the go-to event for R enthusiasts and data professionals. Now, as the field evolves, so does our conference. We continue to bring together data professionals from diverse industries such as technology, finance, healthcare, sports, retail, and more—fostering a space for exceptional content and unparalleled networking.
Attend in New York City or virtually to explore the latest advancements, share insights, and shape the future of data science and AI.
Agenda
Monday, Aug 25
-
08:30 AM - 09:15 AM
Registration & Breakfast
-
09:15 AM - 05:00 PM
Workshop: Machine Learning in R
Max Kuhn
Scientist @ Posit
Join Max Kuhn on a tour through Machine Learning in R, with emphasis on using the software as opposed to general explanations of model building. This workshop is an abbreviated introduction to the tidymodels framework for modeling.
You'll learn about data preparation, model fitting, model assessment and predictions. The focus will be on data splitting and resampling, data pre-processing and feature engineering, model creation, evaluation, and tuning. This is not a deep learning course and will focus on tabular data.
Pre-requisites: some experience with modeling in R and the tidyverse (you don't need to be an expert); prior experience with lm is enough to get started and learn advanced modeling techniques. If participants can’t install the packages on their machines, RStudio Server Pro instances will be available that are pre-loaded with the appropriate packages and GitHub repository.
(In-Person & Virtual Ticket Options Available)
-
09:15 AM - 05:00 PM
Workshop: Introduction of LLMs/AI
Daniel Chen
Post-Doc Research and Teaching Fellow & Data Science Educator @ University of British Columbia
There's a lot of hype around AI tools, their use cases, and the amazing things they can do. This workshop aims to demystify LLMs and give you a practical understanding of how they work and how to use them beyond the desktop application.
We will code with LLMs using an API and introduce two packages, chatlas (Python) and ellmer (R), that make it easier to interact with LLMs programmatically. We'll also see how we can use LLMs in Shiny dashboards to create a user interface for your own chatbots. We'll then expand on these basics to learn about RAG (retrieval-augmented generation) and tool calling to give our bots more context and the ability to work as "agents". Finally, we'll see how LLMs can help us with our data science projects.
(In-Person & Virtual Ticket Options Available)
Tuesday, Aug 26
-
09:00 AM - 09:50 AM
Registration & Breakfast
-
09:50 AM - 10:00 AM
Opening Remarks
-
10:00 AM - 10:20 AM
Generating New Data Through Simulating an NFL Game
Ally Blake
Senior Coordinator, Football Data & Analytics @ NFL
A play-level game simulation model can precisely quantify the impact of potential rule changes, including any intended and unintended consequences. The goal of this project is to mimic a real NFL game, review the results of simulated games, compare them to what one expects from a real game, and evaluate the results based on points per game and plays per game.
-
10:25 AM - 10:45 AM
How We Built It: An Offseason of Development at NFL Next Gen Stats
Mike Band
Sr. Manager, Research & Analytics @ NFL Next Gen Stats
Since 2015, the NFL's Next Gen Stats group has logged every inch of on-field movement, marrying RFID tracking with high-frequency telemetry to craft the league’s definitive data set. Yet the real sprint begins once the Super Bowl confetti settles: from February through August, a lean crew of engineers and research analysts dives headfirst into model retraining, tech-debt triage, product development, and new-metric creation. This offseason, in partnership with Amazon Web Services, we rolled out deep-learning models that pinpoint coverage responsibilities and quantify performance with film-room granularity. We also overhauled QA tooling, launched fan-facing dashboards ahead of the Combine and NFL Draft, and spun up AI assistants that accelerate research workflows while powering new fan experiences. The result is a pipeline that turns offseason experimentation into broadcast-ready insights and app features by Week 1. In this talk, I’ll pull back the curtain on that offseason playbook, sharing the wins, the challenges, and the lessons learned along the way.
-
10:45 AM - 11:15 AM
Break
-
11:15 AM - 11:35 AM
From Hype to Value: Mastering Gen AI Outcomes Through Effective Evaluations
Bill Gold
Head of AI @ Citizens Bank
Large Language Models (LLMs) can produce results of varying quality, which makes effective evaluation essential. Understanding quality drivers such as lossy models and reinforcement learning from human feedback (RLHF) is crucial. The presentation reviews evaluation approaches, including benchmarks, LLM as a judge, crowdsourcing, and human experts. Each method has trade-offs regarding scale, cost, and alignment with specific use cases. Developing strong intuitions about LLM behavior is vital for discerning impactful applications. Better practices involve early evaluation, aligning approaches with use cases, and leveraging human experts for gold standards.
-
11:40 AM - 12:00 PM
How I Learned to Stop Worrying and Love Vibe Coding
Jared P. Lander
Chief Data Scientist @ Lander Analytics
A couple of years ago I gave a talk showing how I used ChatGPT to perform data analysis using R, which left me with medium feelings about the tool. Over the past few months I focused on setting up a complex application on Kubernetes, a technology new to me personally, so I took the opportunity to see how much coding agents could help me with an unfamiliar endeavor. During this talk we'll see what I learned about vibe coding, what worked, and what didn't, and I'll share my thoughts about its utility.
-
12:05 PM - 12:25 PM
LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language
Daniel Chen
Post-Doc Research and Teaching Fellow & Data Science Educator @ University of British Columbia
LLMs have a lot of hype around them these days. Let's demystify how they work and see how we can put them in context for data science use. As data scientists, we want to make sure our results are inspectable, reliable, reproducible, and replicable. We already have many tools to help us on this front. However, LLMs present a new challenge: we may not always get the same results back from a query. This means working out the areas where LLMs excel and using those behaviors in our data science artifacts. This talk will introduce you to LLMs and to the ellmer and chatlas packages for R and Python, and show how they can be integrated into a Shiny app to create an AI-powered dashboard. We'll see how we can leverage the tasks LLMs are good at to better our data science products.
-
12:25 PM - 01:25 PM
Lunch
-
01:25 PM - 01:45 PM
Understanding Artificial General Intelligence Futures: Toward a Shared Vocabulary for Policy Planning
Swaptik Chowdhury
Assistant Policy Researcher @ RAND Corporation
The absence of shared terminology for describing artificial general intelligence (AGI) futures has created persistent misunderstandings in policy discussions. This talk presents a classification framework that makes explicit the assumptions underlying different AGI scenarios. It introduces six descriptive axes: locus of control, governance primacy, alignment level, takeoff speed, human-AGI relationship, and AGI volition. These axes enable policymakers and researchers to describe a comprehensive range of plausible AGI futures, clarify disagreements, and assess the robustness of policy responses across various scenarios. The framework supports more informed and structured dialogue in AI governance and foresight planning.
-
01:50 PM - 02:10 PM
From Prediction to Foundation: Deep Learning Models for Patient Care Optimization
Jon Sege & Vincent Pan
-
02:15 PM - 02:35 PM
Dealing with Duplicate Data (in R)
Erin Grand
Senior Data Scientist @ TRAILS to Wellness
Maintaining high data quality is essential for accurate analyses and decision-making. Unfortunately, high data quality is often hard to come by (especially for non-profits). This talk will focus on some "how-tos" of cleaning data and removing duplicates to enhance data integrity. We'll go over common causes of duplicates, how to use the {janitor} package to identify and remove duplicates, and business practices that can help prevent these data issues from happening in the first place.
-
02:35 PM - 03:15 PM
Break
-
03:15 PM - 03:35 PM
Narratives in Data from the First Seven Months of Congestion Pricing
Gayan Seneviratna
Senior Data Scientist @ MTA Data and Analytics
On January 5th, 2025, New York began an ambitious effort to reshape how we share our city streets. Under the Central Business District Tolling Program (CBDTP), drivers were charged on entry into southern Manhattan. By pricing the negative externality of driving, the program aimed to reduce vehicle entries, improve bus speeds in the CBD, and curtail polluting emissions. Did the program succeed in these goals? To answer that question, the MTA turned to its Data and Analytics team. My talk covers our work over the first months of CBDTP: the ingestion of tolling data with Apache Airflow, the building of a counterfactual for vehicle entries, and the storytelling needed for powerful, data-driven narratives.
-
03:40 PM - 04:00 PM
Using Quarto to Create Reports for Hospital Quality Improvement: Safely Lowering Cesarean Rates
Xilin Chen
Analytics Manager @ Michigan Medicine
Cesarean births can be lifesaving, but when overused, they carry significant health risks and financial consequences. At the Obstetrics Initiative, we work with over 65 Michigan hospitals to reduce unnecessary cesareans and improve maternal care. As the analytics manager, I lead the development of performance reports that inform both individual hospitals and our internal clinical teams. These reports are based on near real-time data from each hospital's EHR system and are used to monitor trends, identify areas for improvement, and guide targeted support. We rely heavily on Quarto to generate these reports. Its flexibility and scalability allow us to create daily, monthly, and annual reporting pipelines, as well as custom one-off reports tailored to specific needs. By leveraging Quarto’s and R’s powerful analytics capabilities, we produce clear and impactful reports that our hospital partners rely on and love. In this talk, I’ll share how we’ve structured our reporting system with Quarto, the benefits it’s brought to our quality improvement work, and practical tips for building similar tools in your own healthcare or analytics environment.
-
04:05 PM - 04:25 PM
TBD
-
04:25 PM - 04:35 PM
Closing Remarks
Wednesday, Aug 27
-
09:00 AM - 09:50 AM
Registration & Breakfast
-
09:50 AM - 10:00 AM
Opening Remarks
-
10:00 AM - 10:20 AM
How to Use Free, Open-Source Text Embeddings to Accomplish Advanced Textual Analysis
Andrew Wallender
Data Editor @ Bloomberg Industry Group
Have you ever found yourself overwhelmed by a mountain of documents to analyze? Discover how to easily find insights using a powerful, open-source text embedding model. Convert your text into meaningful numerical representations to go beyond keyword matching and uncover thematic clusters, find conceptually similar documents, and build semantic search applications. This session will show you how to leverage free tools to perform sophisticated textual analysis at a fraction of the cost of LLMs.
-
10:25 AM - 10:45 AM
Processing Document Collections with LLMs: A Practical Workflow
Abigail Haddad
Data Scientist/Machine Learning Engineer @ Freelance
Every organization has stacks of similar documents - customer complaints, resumes, error logs - that need the same questions answered about each one. This talk walks through a systematic workflow for processing these document collections with LLMs, covering the full pipeline from messy input to polished results. I'll share real examples and the tools I built to automate the repetitive parts across different projects, including wrangling LLM outputs and creating modular display components.
-
10:45 AM - 11:15 AM
Break
-
11:15 AM - 11:35 AM
AI with ROI: How to Use ML to Cut Your Snowflake Bill in Half
Ben Lerner
CEO & Co-Founder @ Espresso AI
Espresso AI uses two main techniques to run workloads substantially faster and cheaper on data warehouses: better job scheduling and automatically incrementalizing queries. This talk will dive into the technical details behind both approaches.
-
11:40 AM - 12:00 PM
TBD
-
12:05 PM - 12:25 PM
Reshaping Enterprises During Era of Agentic AI - Becoming Frontier Firm
Jay Sen & Vikas Sawhney
Microsoft
As organizations accelerate their adoption of generative AI, a transformative shift is underway: from task automation to intelligent, autonomous systems. This session introduces the concept of Agentic AI, where AI agents evolve from passive assistants to proactive collaborators capable of executing complex workflows, making contextual decisions, and adapting dynamically to enterprise needs. We will explore the strategic journey toward becoming a “frontier organization,” one that integrates human judgment with AI agents to scale operations, enhance productivity, and unlock new business value. The presentation outlines three phases of enterprise AI maturity, supported by industry trends and data, including the rise of multi-model experimentation, trust and governance challenges, and the emergence of agentic pilots across sectors. Through real-world use cases in customer service, sales, IT operations, security, and insurance, attendees will gain insight into how agentic systems are already delivering measurable impact. The session also addresses the technical, organizational, and ethical considerations critical to successful deployment, including hybrid human–AI workflows, governance-by-design, and workforce enablement. Participants will leave with a comprehensive blueprint for building agentic enterprises, an understanding of Microsoft’s Agentic AI platform capabilities, and actionable strategies to navigate the next wave of AI-driven transformation.
-
12:25 PM - 01:25 PM
Lunch
-
01:25 PM - 01:45 PM
Karen Moon
Founding CCO @ Spangle AI
-
01:50 PM - 02:30 PM
What's Going On In There? Bayesian Tools for Understanding a Fitted Model
Andrew Gelman
Professor @ Department of Statistics and Department of Political Science, Columbia University
A fitted model is a mapping from data (including information encoded in the model specification and the prior) to inferences. We present Bayesian tools for model understanding, generalizing existing analytical methods such as R-squared, graphical approaches such as influence plots, and workflow procedures such as sensitivity analysis. Our goal is to better understand the weaknesses of our fitted models and ultimately to learn more from data.
-
02:30 PM - 03:10 PM
Break
-
03:10 PM - 03:30 PM
Measuring LLM Effectiveness
Max Kuhn
Scientist @ Posit, PBC
How can we quantify how accurately LLMs perform? In late 2024, Anthropic released a preprint of a manuscript about statistically analyzing model evaluations. The concepts are on target, but the statistical tactics have narrow applicability. A simpler statistical framework can quantify LLM performance and applies to many more scenarios and experimental designs. We'll describe these methods and show an example.
-
03:35 PM - 03:55 PM
Rethinking A/B Tests for Connected Users and Teams
Chiraag Kala
Lead Data Scientist @ Airbnb
Imagine running experiments that assume everyone acts independently, only to realize that, in practice, people collaborate. On platforms like Airbnb, for example, hosts can partner with co-hosts, forming networks where behavior is interdependent. Traditional A/B tests that randomize individuals ignore these relationships, leading to biased results. In this talk, we introduce a new experimentation design that treats entire teams or networks as the unit of analysis. By accounting for collaboration and spillover effects, this approach yields more accurate results and leads to a better user experience. Whether you're building an e-commerce platform, a social network, or a financial product where users operate in groups, network-aware experiments will help you make smarter, more reliable decisions.
-
04:00 PM - 04:20 PM
Tiering Teams and Predicting Attendance with R
Kelsey McDonald
Senior Manager, Strategy & Business Intelligence @ New York Yankees
This talk discusses how professional sports teams use linear regression to predict attendance and revenue, then apply k-means clustering to those outputs to create opponent tiers based on each season's schedule.
-
04:20 PM - 04:30 PM
Closing Remarks
-
04:30 PM - 05:30 PM
Happy Hour at Beer Authority
300 W 40th St, New York, NY
Join us after the event for a casual Happy Hour just down the street at Beer Authority. This is a great opportunity to unwind, connect with fellow attendees and continue the conversation!
Speakers

Andrew Gelman
Professor
Department of Statistics and Department of Political Science, Columbia University
Talk: What's Going On In There? Bayesian Tools for Understanding a Fitted Model

Ben Lerner
CEO & Co-Founder
Espresso AI
Talk: AI with ROI: How to Use ML to Cut Your Snowflake Bill in Half

Kelsey McDonald
Senior Manager, Strategy & Business Intelligence
New York Yankees
Talk: Tiering Teams and Predicting Attendance with R

Andrew Wallender
Data Editor
Bloomberg Industry Group
Talk: How to Use Free, Open-Source Text Embeddings to Accomplish Advanced Textual Analysis

Ally Blake
Senior Coordinator, Football Data & Analytics
NFL
Talk: Generating New Data Through Simulating an NFL Game

Gayan Seneviratna
Senior Data Scientist
MTA Data and Analytics
Talk: Narratives in Data from the First Seven Months of Congestion Pricing


Jay Sen
Principal Solution Engineer, Azure AI Apps Global Black Belt
Microsoft
Talk: Reshaping Enterprises During Era of Agentic AI - Becoming Frontier Firm (Joint talk with Vikas Sawhney)

Xilin Chen
Analytics Manager
Michigan Medicine
Talk: Using Quarto to Create Reports for Hospital Quality Improvement: Safely Lowering Cesarean Rates

Jared P. Lander
Chief Data Scientist
Lander Analytics
Talk: How I Learned to Stop Worrying and Love Vibe Coding

Jon Sege
AVP, Data Management & Analytics
White Plains Hospital
Talk: From Prediction to Foundation: Deep Learning Models for Patient Care Optimization (Joint talk with Vincent Pan)

Chiraag Kala
Lead Data Scientist
Airbnb
Talk: Rethinking A/B Tests for Connected Users and Teams

Bill Gold
Head of AI
Citizens Bank
Talk: From Hype to Value: Mastering Gen AI Outcomes Through Effective Evaluations

Mike Band
Sr. Manager, Research & Analytics
NFL Next Gen Stats
Talk: How We Built It: An Offseason of Development at NFL Next Gen Stats

Abigail Haddad
Data Scientist/Machine Learning Engineer
Freelance
Talk: Processing Document Collections with LLMs: A Practical Workflow

Swaptik Chowdhury
Assistant Policy Researcher
RAND Corporation
Talk: Understanding Artificial General Intelligence Futures: Toward a Shared Vocabulary for Policy Planning

Vikas Sawhney
Azure Data & Analytics Product Specialist
Microsoft
Talk: Reshaping Enterprises During Era of Agentic AI - Becoming Frontier Firm (Joint talk with Jay Sen)

Erin Grand
Senior Data Scientist
TRAILS to Wellness
Talk: Dealing with Duplicate Data (in R)

Vincent Pan
Data Scientist
White Plains Hospital
Talk: From Prediction to Foundation: Deep Learning Models for Patient Care Optimization (Joint talk with Jon Sege)

Daniel Chen
Post-Doc Research and Teaching Fellow & Data Science Educator
University of British Columbia
Talk: LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language
More speakers coming soon…
Workshops

Machine Learning in R
Hosted by Max Kuhn
Monday, Aug 25 | 9:15am - 5:00pm
Join Max Kuhn on a tour through Machine Learning in R, with emphasis on using the software as opposed to general explanations of model building. This workshop is an abbreviated introduction to the tidymodels framework for modeling.
You'll learn about data preparation, model fitting, model assessment and predictions. The focus will be on data splitting and resampling, data pre-processing and feature engineering, model creation, evaluation, and tuning. This is not a deep learning course and will focus on tabular data.
Pre-requisites: some experience with modeling in R and the tidyverse (you don't need to be an expert); prior experience with lm is enough to get started and learn advanced modeling techniques. If participants can’t install the packages on their machines, RStudio Server Pro instances will be available that are pre-loaded with the appropriate packages and GitHub repository.
(In-Person & Virtual Ticket Options Available)

Introduction of LLMs/AI
Hosted by Daniel Chen
Monday, Aug 25 | 9:15am - 5:00pm
There's a lot of hype around AI tools, their use cases, and the amazing things they can do. This workshop aims to demystify LLMs and give you a practical understanding of how they work and how to use them beyond the desktop application.
We will code with LLMs using an API and introduce two packages, chatlas (Python) and ellmer (R), that make it easier to interact with LLMs programmatically. We'll also see how we can use LLMs in Shiny dashboards to create a user interface for your own chatbots. We'll then expand on these basics to learn about RAG (retrieval-augmented generation) and tool calling to give our bots more context and the ability to work as "agents". Finally, we'll see how LLMs can help us with our data science projects.
(In-Person & Virtual Ticket Options Available)
Sponsors
© Lander Analytics 2025