The New York Data Science & AI Conference Presented by Lander Analytics

Thank you for attending The New York Data Science & AI Conference Presented by Lander Analytics!

View the talk videos & photos from the event.

Agenda

Tuesday, Aug 26

09:00 AM - 09:50 AM

Registration & Breakfast
09:50 AM - 10:00 AM

Opening Remarks
10:00 AM - 10:20 AM

Generating New Data Through Simulating an NFL Game

Ally Blake

Senior Coordinator, Football Data & Analytics @ NFL

A play-level game simulation model can precisely quantify the impact and any intended and unintended consequences of potential rules changes. The goal of this project is to mimic a real NFL game, review the results of simulated games, compare to what one expects from a real game, and evaluate the results based on points per game and plays per game.
10:25 AM - 10:45 AM

Bloomberg Law/Fenwick’s Silicon Valley Top 150 Companies by Revenue

Princess Onyiri

Senior Data Scientist @ Bloomberg Law

An overview of the making of the Bloomberg Law - Fenwick SV150 list. This talk walks you through the lifecycle of the project, how discrepancies in the data are navigated, and the key components that are considered when compiling the ranked list.
10:45 AM - 11:15 AM

Break
11:15 AM - 11:35 AM

From Hype to Value: Mastering Gen AI Outcomes Through Effective Evaluations

Bill Gold

Head of AI @ Citizens Bank

Large Language Models (LLMs) can produce varied quality results, necessitating effective evaluations. Understanding quality drivers like lossy models and reinforcement learning from human feedback (RLHF) is crucial. The presentation reviews evaluation approaches, including benchmarks, LLM as a judge, crowdsourcing, and human experts. Each method has trade-offs regarding scale, cost, and alignment with specific use cases. Developing strong intuitions about LLM behavior is vital for discerning impactful applications. Better practices involve early evaluation, aligning approaches with use cases, and leveraging human experts for gold standards.
11:40 AM - 12:00 PM

How I Learned to Stop Worrying and Love Vibe Coding

Jared P. Lander

Chief Data Scientist @ Lander Analytics

A couple of years ago I gave a talk showing how I used ChatGPT to perform data analysis using R, which left me with medium feelings about the tool. Over the past few months I focused on setting up a complex application on Kubernetes, a technology new to me personally, so I took the opportunity to see how much coding agents could help me with an unfamiliar endeavor. During this talk we'll see what I learned about vibe coding, what worked and what didn't, and I'll share my thoughts about its utility.
12:05 PM - 12:25 PM

LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language

Daniel Chen

Post-Doc Research and Teaching Fellow & Data Science Educator @ University of British Columbia

LLMs have a lot of hype around them these days. Let's demystify how they work and see how we can put them in context for data science use. As data scientists, we want to make sure our results are inspectable, reliable, reproducible, and replicable. We already have many tools to help us on this front. However, LLMs pose a new challenge: we may not always get the same results back from a query. This means working out where LLMs excel and using those behaviors in our data science artifacts. This talk will introduce you to LLMs, the ellmer and chatlas packages for R and Python, and how they can be integrated into a Shiny app to create an AI-powered dashboard. We'll see how we can leverage the tasks LLMs are good at to better our data science products.
12:25 PM - 01:25 PM

Lunch
01:25 PM - 01:45 PM

Understanding Artificial General Intelligence Futures: Toward a Shared Vocabulary for Policy Planning

Swaptik Chowdhury

Assistant Policy Researcher @ RAND Corporation

The absence of shared terminology for describing artificial general intelligence (AGI) futures has created persistent misunderstandings in policy discussions. This talk presents a classification framework that makes explicit the assumptions underlying different AGI scenarios. It introduces six descriptive axes: locus of control, governance primacy, alignment level, takeoff speed, human-AGI relationship, and AGI volition. These axes enable policymakers and researchers to describe a comprehensive range of plausible AGI futures, clarify disagreements, and assess the robustness of policy responses across various scenarios. The framework supports more informed and structured dialogue in AI governance and foresight planning.
01:50 PM - 02:10 PM

From Prediction to Foundation: Deep Learning Models for Patient Care Optimization

Jon Sege & Vincent Pan

White Plains Hospital

Accurately predicting medical specialties and follow-up appointment needs from electronic medical records (EMR) can enhance personalized care and resource allocation. We present a neural network pipeline using LSTM architecture and diagnosis codes to predict specialties and flag follow-ups, addressing challenges like class imbalance and interpretability. We also introduce an approach to learning general-purpose medical code embeddings from EMR sequences, using Masked Code Modeling (MCM) and Graph Convolutional Transformers (GCT). Functioning as a clinical foundation model, these embeddings encode relationships among medical codes and can be leveraged across diverse downstream applications in healthcare analytics. Finally, we will discuss an application that leverages these models to provide actionable decision-points for our quality and coding teams.
02:15 PM - 02:35 PM

Dealing with Duplicate Data (in R)

Erin Grand

Senior Data Scientist @ TRAILS to Wellness

Maintaining high data quality is essential for accurate analyses and decision-making. Unfortunately, high data quality is often hard to come by (especially for non-profits). This talk will focus on some "how-tos" of cleaning data and removing duplicates to enhance data integrity. We'll go over common causes of duplicates, how to use the {janitor} package to identify and remove duplicates, and business practices that can help prevent these data issues from happening in the first place.
02:35 PM - 03:15 PM

Break
03:15 PM - 03:35 PM

Narratives in Data from the First Seven Months of Congestion Pricing

Gayan Seneviratna

Senior Data Scientist @ MTA Data and Analytics

On January 5th, 2025, New York began an ambitious effort to reshape how we share our city streets. Under the Central Business District Tolling Program (CBDTP), drivers were charged on entry into southern Manhattan. By pricing the negative externality of driving, the program aimed to reduce vehicle entries, improve bus speeds in the CBD, and curtail polluting emissions. Did the program succeed in these goals? To answer that question, the MTA turned to its Data and Analytics team. My talk covers our work over the first months of CBDTP: the ingestion of tolling data with Apache Airflow, the building of a counterfactual for vehicle entries, and the storytelling needed for powerful, data-driven narratives.
03:40 PM - 04:00 PM

Using Quarto to Create Reports for Hospital Quality Improvement: Safely Lowering Cesarean Rates

Xilin Chen

Analytics Manager @ Michigan Medicine

Cesarean births can be lifesaving, but when overused, they carry significant health risks and financial consequences. At the Obstetrics Initiative, we work with over 65 Michigan hospitals to reduce unnecessary cesareans and improve maternal care. As the analytics manager, I lead the development of performance reports that inform both individual hospitals and our internal clinical teams. These reports are based on near real-time data from each hospital's EHR system and are used to monitor trends, identify areas for improvement, and guide targeted support. We rely heavily on Quarto to generate these reports. Its flexibility and scalability allow us to create daily, monthly, and annual reporting pipelines, as well as custom one-off reports tailored to specific needs. By leveraging Quarto’s and R’s powerful analytics capabilities, we produce clear and impactful reports that our hospital partners rely on and love. In this talk, I’ll share how we’ve structured our reporting system with Quarto, the benefits it’s brought to our quality improvement work, and practical tips for building similar tools in your own healthcare or analytics environment.
04:00 PM - 04:10 PM

Closing Remarks

Wednesday, Aug 27

09:00 AM - 09:50 AM

Registration & Breakfast
09:50 AM - 10:00 AM

Opening Remarks
10:00 AM - 10:20 AM

How to Use Free, Open-Source Text Embeddings to Accomplish Advanced Textual Analysis

Andrew Wallender

Data Editor @ Bloomberg Industry Group

Have you ever found yourself overwhelmed by a mountain of documents to analyze? Discover how to easily find insights using a powerful, open-source text embedding model. Convert your text into meaningful numerical representations to go beyond keyword matching and uncover thematic clusters, find conceptually similar documents, and build semantic search applications. This session will show you how to leverage free tools to perform sophisticated textual analysis at a fraction of the cost of LLMs.
10:25 AM - 10:45 AM

Processing Document Collections with LLMs: A Practical Workflow

Abigail Haddad

Data Scientist/Machine Learning Engineer @ Freelance

Every organization has stacks of similar documents - customer complaints, resumes, error logs - that need the same questions answered about each one. This talk walks through a systematic workflow for processing these document collections with LLMs, covering the full pipeline from messy input to polished results. I'll share real examples and the tools I built to automate the repetitive parts across different projects, including wrangling LLM outputs and creating modular display components.
10:45 AM - 11:15 AM

Break
11:15 AM - 11:35 AM

AI with ROI: How to Use ML to Cut Your Snowflake Bill in Half

Ben Lerner

CEO & Co-Founder @ Espresso AI

Espresso AI uses two main techniques to run workloads substantially faster and cheaper on data warehouses: better job scheduling and automatically incrementalizing queries. This talk will dive into the technical details behind both approaches.
11:40 AM - 12:00 PM

The Future of Data Work: Leading Your Own Career Growth

Rick Saporta

Founder @ OSiLLiA

The nature of data work is rapidly evolving. Data practitioners are in high demand, but we also need to navigate a hiring landscape that at times feels self-contradictory and overall just hard to make sense of. What does career growth look like? How should directors and departments organize their teams? In this talk, we’ll explore the future of data, which skills will be crucial, new directions that have been made possible, and of course how to set and lead your own career growth. Drawing on lessons from building and advising data teams across industries, Rick will share practical ways to not only future-proof your career but also to encourage you to be the one to lead it.
12:05 PM - 12:25 PM

Reshaping Enterprises During Era of Agentic AI - Becoming Frontier Firm

Jay Sen & Vikas Sawhney

Microsoft

As organizations accelerate their adoption of generative AI, a transformative shift is underway—from task automation to intelligent, autonomous systems. This session introduces the concept of Agentic AI, where AI agents evolve from passive assistants to proactive collaborators capable of executing complex workflows, making contextual decisions, and adapting dynamically to enterprise needs. We will explore the strategic journey toward becoming a “frontier organization,” one that integrates human judgment with AI agents to scale operations, enhance productivity, and unlock new business value. The presentation outlines three phases of enterprise AI maturity, supported by industry trends and data, including the rise of multi-model experimentation, trust and governance challenges, and the emergence of agentic pilots across sectors. Through real-world use cases in customer service, sales, IT operations, security, and insurance, attendees will gain insight into how agentic systems are already delivering measurable impact. The session also addresses the technical, organizational, and ethical considerations critical to successful deployment—including hybrid human–AI workflows, governance-by-design, and workforce enablement. Participants will leave with a comprehensive blueprint for building agentic enterprises, an understanding of Microsoft’s Agentic AI platform capabilities, and actionable strategies to navigate the next wave of AI-driven transformation.
12:25 PM - 01:25 PM

Lunch
01:25 PM - 01:45 PM

AI-Enabled Lean Teams: The Evolving Playbook

Karen Moon

Founding CCO @ Spangle AI

Startups today are scaling to $20M, $100M, even billion-dollar valuations with remarkably few employees, powered by copilots and agents embedded across every function. In this interactive session, we’ll cut through what’s working, what’s hype, and what’s next—while exploring how agentic AI is reshaping org structures, leadership, influence, and career growth. Our goal is for you to walk away with a new hack, fresh insights, a new lens on your career, or maybe even your next startup idea.
01:50 PM - 02:30 PM

What's Going On In There? Bayesian Tools for Understanding a Fitted Model

Andrew Gelman

Professor @ Department of Statistics and Department of Political Science, Columbia University

A fitted model is a mapping from data (including information encoded in the model specification and the prior) to inferences. We present Bayesian tools for model understanding, generalizing existing analytical methods such as R-squared, graphical approaches such as influence plots, and workflow procedures such as sensitivity analysis. Our goal is to better understand the weaknesses of our fitted models and ultimately to learn more from data.
02:30 PM - 03:10 PM

Break
03:10 PM - 03:30 PM

Measuring LLM Effectiveness

Max Kuhn

Scientist @ Posit, PBC

How can we quantify how accurately LLMs perform? In late 2024, Anthropic released a preprint of a manuscript about statistically analyzing model evaluations. The concepts are on target, but the statistical tactics have narrow applicability. A simpler statistical framework can quantify LLM performance and applies to many more scenarios and experimental designs. We'll describe these methods and show an example.
03:35 PM - 03:55 PM

Rethinking A/B Tests for Connected Users and Teams

Chiraag Kala

Lead Data Scientist @ Airbnb

Imagine running experiments that assume everyone acts independently—only to realize that, in practice, people collaborate. On platforms like Airbnb, for example, hosts can partner with other co-hosts, forming networks where behavior is interdependent. Traditional A/B tests that randomize individuals ignore these relationships, leading to biased results. In this talk, we introduce a new experimentation design that treats entire teams or networks as the unit of analysis. By accounting for collaboration and spillover effects, this approach yields more accurate results and leads to a better user experience. Whether you're building an e-commerce platform, a social network, or a financial product where users operate in groups, network-aware experiments will help you make smarter, more reliable decisions.
04:00 PM - 04:20 PM

Tiering Teams and Predicting Attendance with R

Kelsey McDonald

Senior Manager, Strategy & Business Intelligence @ New York Yankees

Discussing how professional sports teams use linear regression to predict attendance and revenue, then perform k-means clustering on those outputs to create opponent tiers based on the schedule each season.
04:20 PM - 04:30 PM

Closing Remarks
04:30 PM - 05:30 PM

Happy Hour at Beer Authority

300 W 40th St, New York, NY

Join us after the event for a casual Happy Hour just down the street at Beer Authority. This is a great opportunity to unwind, connect with fellow attendees and continue the conversation!

Speakers

Andrew Gelman

Professor

Department of Statistics and Department of Political Science, Columbia University

@StatModeling

Talk: What's Going On In There? Bayesian Tools for Understanding a Fitted Model

Ben Lerner

CEO & Co-Founder

Espresso AI

@ben_lern

Talk: AI with ROI: How to Use ML to Cut Your Snowflake Bill in Half

Kelsey McDonald

Senior Manager, Strategy & Business Intelligence

New York Yankees

@Yankees

Talk: Tiering Teams and Predicting Attendance with R

Max Kuhn

Scientist

Posit, PBC

@topepo.bsky.social

Talk: Measuring LLM Effectiveness

Andrew Wallender

Data Editor

Bloomberg Industry Group

@BBGIndustry

Talk: How to Use Free, Open-Source Text Embeddings to Accomplish Advanced Textual Analysis

Ally Blake

Senior Coordinator, Football Data & Analytics

NFL

@Ally_Blake3

Talk: Generating New Data Through Simulating an NFL Game

Gayan Seneviratna

Senior Data Scientist

MTA Data and Analytics

@MTA

Talk: Narratives in Data from the First Seven Months of Congestion Pricing

Karen Moon

Founding CCO

Spangle AI

@geekchicmoon

Talk: AI-Enabled Lean Teams: The Evolving Playbook

Jay Sen

Principal Solution Engineer, Azure AI Apps Global Black Belt

Microsoft

@Microsoft

Talk: Reshaping Enterprises During Era of Agentic AI - Becoming Frontier Firm (Joint talk with Vikas Sawhney)

Xilin Chen

Analytics Manager

Michigan Medicine

@xilinch

Talk: Using Quarto to Create Reports for Hospital Quality Improvement: Safely Lowering Cesarean Rates

Jared P. Lander

Chief Data Scientist

Lander Analytics

@jaredlander

Talk: How I Learned to Stop Worrying and Love Vibe Coding

Jon Sege

AVP, Data Management & Analytics

White Plains Hospital

@WPHospital

Talk: From Prediction to Foundation: Deep Learning Models for Patient Care Optimization (Joint talk with Vincent Pan)

Chiraag Kala

Lead Data Scientist

Airbnb

@Airbnb

Talk: Rethinking A/B Tests for Connected Users and Teams

Princess Onyiri

Senior Data Scientist

Bloomberg Law

@BLaw

Talk: Bloomberg Law/Fenwick’s Silicon Valley Top 150 Companies by Revenue

Bill Gold

Head of AI

Citizens Bank

@BillCGold

Talk: From Hype to Value: Mastering Gen AI Outcomes Through Effective Evaluations

Abigail Haddad

Data Scientist/Machine Learning Engineer

Freelance

@presentofcoding

Talk: Processing Document Collections with LLMs: A Practical Workflow

Swaptik Chowdhury

Assistant Policy Researcher

RAND Corporation

@RANDCorporation

Talk: Understanding Artificial General Intelligence Futures: Toward a Shared Vocabulary for Policy Planning

Vikas Sawhney

Azure Data & Analytics Product Specialist

Microsoft

@Microsoft

Talk: Reshaping Enterprises During Era of Agentic AI - Becoming Frontier Firm (Joint talk with Jay Sen)

Erin Grand

Senior Data Scientist

TRAILS to Wellness

@astroeringrand

Talk: Dealing with Duplicate Data (in R)

Vincent Pan

Data Scientist

White Plains Hospital

@WPHospital

Talk: From Prediction to Foundation: Deep Learning Models for Patient Care Optimization (Joint talk with Jon Sege)

Daniel Chen

Post-Doc Research and Teaching Fellow & Data Science Educator

University of British Columbia

@chendaniely

Talk: LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language

Rick Saporta

Founder

OSiLLiA

@ricksaporta

Talk: The Future of Data Work: Leading Your Own Career Growth
