Livestream
apply(risk)

Join us for a free virtual event on data engineering and systems architecture for building machine learning risk and fraud detection systems! 

apply() is a community conference series for machine learning and data teams to discuss the practical data engineering challenges faced when building real-time machine learning systems. Participants learn from industry experts and share best practices with the community.

apply(risk) focuses on the specific challenges of building risk and fraud detection systems, from low-latency streaming and real-time data pipelines, to deploying features to production quickly and reliably.  

Join us at this event to discuss best practices, tools of choice, and emerging architectures to successfully build and manage production risk and fraud detection applications.

Speakers
Jake Weholt
Engineering Manager, Machine Learning - Fraud Detection @ Remitly
Francisco Arceo
Engineering Manager @ Affirm
Devvret Rishi
Co-founder & Chief Product Officer @ Predibase
Mike Del Balso
Co-founder & CEO @ Tecton
Tocho Tochev
Lead ML Engineer @ Tide
Aravind Maguluri
Lead Data Scientist @ Tide
Louis Brandy
VP Engineering @ Rockset
Dat Ngo
Data Scientist & ML Engineer @ Arize
Cooper Stimson
Software Engineer, Machine Learning Platform @ Block
Vince Houdebine
Sr. Solution Architect @ Tecton
Demetrios Brinkmann
Founder @ MLOps Community
Agenda
4:30 PM
4:35 PM
Opening / Closing

Welcome

Demetrios Brinkmann
4:35 PM
5:00 PM
Presentation

Powering ML Fraud Detection Models With Advanced Aggregations

ML models are an essential tool in combating fraud. They can improve fraud detection rates, reduce false positives, and be re-trained to identify new fraudulent behavior as fraudsters adapt.

However, fraud models require high-quality data that can be difficult to process and serve in production. Features typically require aggregations on streaming and real-time data, which are complex to build, compute intensive, and difficult to process at low latency.

In this talk, Mike will walk through a sample use case and show how aggregations are typically processed. He’ll then show how feature engineering frameworks, like the one offered by Tecton, can simplify the development of these features. He’ll explain how these frameworks are orchestrated under the hood to process data in under 1 second, serve it with under 10 ms latency, and reduce processing costs, all while keeping offline and online data consistent to improve model accuracy.
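
As a rough illustration of the kind of aggregation described above, here is a minimal sketch of a trailing time-window count and sum per user. It is not Tecton's implementation or API; the class name, the feature names, and the assumption that events arrive in timestamp order per key are all hypothetical choices made for this example.

```python
from collections import deque


class SlidingWindowAggregator:
    """Keeps a per-key count and sum of amounts over a trailing time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = {}  # key -> deque of (timestamp, amount), oldest first
        self._sums = {}    # key -> running sum of amounts still in the window

    def _evict(self, key, now):
        """Drop events that are at least `window_seconds` older than `now`."""
        events = self._events.get(key)
        if not events:
            return
        cutoff = now - self.window_seconds
        while events and events[0][0] <= cutoff:
            _, amount = events.popleft()
            self._sums[key] -= amount

    def add(self, key, timestamp, amount):
        """Record one event (assumes in-order timestamps per key)."""
        self._events.setdefault(key, deque()).append((timestamp, amount))
        self._sums[key] = self._sums.get(key, 0.0) + amount
        self._evict(key, timestamp)

    def features(self, key, now):
        """Return the window features for `key` as of time `now`."""
        self._evict(key, now)
        events = self._events.get(key, ())
        return {"txn_count": len(events), "txn_sum": self._sums.get(key, 0.0)}


agg = SlidingWindowAggregator(window_seconds=60)
agg.add("user_1", timestamp=0, amount=20.0)
agg.add("user_1", timestamp=30, amount=5.0)
agg.add("user_1", timestamp=70, amount=100.0)
result = agg.features("user_1", now=75)  # the event at t=0 has expired
```

Keeping a running sum means each lookup costs only the eviction work rather than a rescan of the window, which is one simple way to keep read latency low. Out-of-order events and offline/online consistency are deliberately left out of this sketch.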

Mike Del Balso
5:00 PM
5:30 PM
Presentation

A Decade of Risk Machine Learning: Some Lessons

Over the last decade, Francisco has built software for machine learning models, data engineering, and risk at Affirm, Fast, Goldman Sachs, the Commonwealth Bank of Australia, and AIG. In this session, he'll discuss common pitfalls and some simple approaches to avoiding them.

He'll cover the importance of:

  • Treating model development as software
  • Viewing data through the lens of Data Producers and Consumers
  • Common mistakes in ML
  • Understanding the lineage of your data
  • Having a deep understanding of how your model interacts with the product experience and other software
Francisco Arceo
5:35 PM
6:05 PM
Presentation

Solving Twitter's Bot Problem With Less Than 10 Lines of Code

While working as the product lead for Kaggle, one challenge we needed to face was how to handle fraudulent user accounts—specifically ones that were being created to host pirated content (and abuse our free memory tier) or ones that would do crypto mining (and abuse our free compute tier).

Fraud detection has a hard set of challenges. In this particular case, two problems really stood out:

1) The data was imbalanced. There were at least 10 legitimate accounts for every fraudulent new account.

2) The data was multi-modal—for each user that had a Kaggle profile, we not only had the metadata about them (how long they had an account, how many times they had logged in) but also unstructured text data that was in their bios (which at times hosted links to pirated content), profile pictures, etc. This was harder to feed into a machine learning model.

In this talk, we’ll walk through how to build a fraud detection model for bot accounts using a public dataset of Twitter bot profiles that mirrors my challenge on Kaggle. We’ll cover:

  • How to set up the problem
  • How to build your first model, comparing both GBMs and Neural Networks
  • How to handle class imbalance with common techniques such as upsampling and class weights
  • How to deploy a model for fraud, which needs low-latency, high-throughput serving
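
Two of the imbalance techniques named above, class weights and upsampling, can be sketched in a few lines. This is an illustrative example only, not the code from the talk: the function names and the toy data are hypothetical. The weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def upsample_minority(rows, labels, seed=0):
    """Resample every smaller class with replacement up to the majority size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data with a 10:1 ratio of legitimate (0) to fraudulent (1) accounts.
labels = [0] * 10 + [1]
rows = [{"account_id": i} for i in range(len(labels))]

weights = balanced_class_weights(labels)
up_rows, up_labels = upsample_minority(rows, labels)
```

Upsampling is normally applied to the training split only, so that evaluation still reflects the real class ratio.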
Devvret Rishi
6:10 PM
6:20 PM
Break

Break

6:20 PM
6:30 PM
Lightning Talk

Access Control in ML Feature Platforms

Contemporary machine learning platforms connect a variety of data sources, tools, and applications, each with their own data governance requirements, constraints, and capabilities. In this talk, we’ll explore the role of the machine learning platform in integrating these elements, empowering data producers and feature engineers to own access control for their data and features.

Cooper Stimson
6:35 PM
6:45 PM
Lightning Talk

Fraud Prevention—Best Practices In ML Observability & Emerging Approaches for Multivariate Drift Detection

Fraud takes many forms across industries and is constantly evolving. Data scientists and MLOps professionals must similarly evolve in real time, staying a step ahead of model performance degradation and new attack vectors. In this 10-minute talk, we will cover best practices in ML observability for detecting and preventing fraud across industries. We will also discuss a novel approach to anomaly and drift detection using embeddings, UMAP dimensionality reduction, non-parametric clustering, and data visualization.
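
To make the idea of embedding drift concrete, here is a deliberately simplified sketch. It does not implement the UMAP and clustering approach mentioned above. Instead it swaps in a much cruder signal, the Euclidean distance between the mean embedding of a baseline sample and that of a production sample. The function names and toy vectors are hypothetical, and any alerting threshold would need to be chosen for the data at hand.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def centroid_drift(baseline, production):
    """Euclidean distance between the mean embeddings of two samples."""
    return math.dist(centroid(baseline), centroid(production))


baseline_embeddings = [[0.0, 0.0], [2.0, 0.0]]    # centroid (1, 0)
production_embeddings = [[3.0, 4.0], [5.0, 4.0]]  # centroid (4, 4)

drift = centroid_drift(baseline_embeddings, production_embeddings)
no_drift = centroid_drift(baseline_embeddings, baseline_embeddings)
```

A single centroid distance can miss drift that changes the shape of the distribution without moving its mean, which is one reason richer approaches such as dimensionality reduction and clustering are used in practice.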

Dat Ngo
6:50 PM
7:20 PM
Presentation

Fighting Fraud with Machine Learning at Remitly

“The rise of digital transactions and online platforms has brought about new challenges in combating fraud. Traditional rule-based systems often fall short in identifying sophisticated fraud patterns, leading to significant financial losses and compromised user trust. This necessitates the adoption of advanced technologies such as machine learning for effective fraud detection and prevention. This conference talk aims to explore the utilization of machine learning algorithms and techniques to combat fraud across various domains.”

Sounds boring, right? While all true, fighting fraud with machine learning requires so much more than simply “adopting advanced technologies such as machine learning for effective fraud detection and prevention.” You can blame ChatGPT for that paragraph (and for also oversimplifying our heavily nuanced and complicated domain).

In this talk, we will explore what it actually takes to develop, launch, and maintain ML models in production in the highly dynamic and adversarial environment of FinTech risk. I will share how Remitly thinks about risk/fraud tradeoffs, how we frame problems to build successful ML products, and how this influences our team structures. I will also share some lessons (and mistakes) learned from years of developing valuable, robust, and customer-centric fraud models that score transactions every second of every day, to transform the lives of millions of immigrants and their families by providing the most trusted financial services on the planet.

Jake Weholt
7:25 PM
7:35 PM
Break

Break

7:35 PM
8:05 PM
Presentation

Challenges at the Intersection of ML & Real-Time Data: Lessons Learned Spam Fighting at Facebook

Spam fighting at scale occupies a unique niche at the intersection between real-time data infrastructure and high-powered anomaly detection and machine learning. When these disciplines collide, each presents a whole host of interesting new challenges to the other.

This talk draws on my experience building spam-fighting infrastructure at Facebook and real-time data experience at Rockset. I'll talk through some of these challenges and explore some of the mistakes engineers make when coming from one side into the other. Challenges to be discussed include:

  • Spam-fighting tends to require low-latency everything. Every aspect of a data system designed to support ML needs to account for latency.
  • Large volumes of continuously arriving data need to be queryable quickly. This requires streaming data to be indexed to power ML features. Spammers act quickly, and their previous actions need to show up in the current classification.
  • Fast queries. Most spam is best stopped synchronously before it’s ever written to any system. Classifications must be quick. Features need to be generated quickly or pre-computed. This runs into the classic “materialized view” problems of traditional databases, except in an ML context.
  • Hybrid queries. The most valuable queries tend to combine ML or anomaly detection techniques (e.g., vector search) with traditional SQL database techniques (e.g., WHERE clauses).
  • Development loop. It’s always a good idea to make your development loop as tight as possible, but this is even more crucial in adversarial or time-critical situations. Every aspect of the orchestration and training of ML workflows becomes latency sensitive as well.
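
The hybrid query pattern in the list above can be illustrated with a small in-memory sketch: apply a WHERE-style filter first, then rank the surviving rows by vector similarity. This is not how Rockset or any particular database executes such queries; the function names, field names, and toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_query(rows, query_vector, predicate, k):
    """Apply a WHERE-style predicate, then return the k most similar rows."""
    candidates = [row for row in rows if predicate(row)]
    candidates.sort(
        key=lambda row: cosine_similarity(row["embedding"], query_vector),
        reverse=True,
    )
    return candidates[:k]


posts = [
    {"id": 1, "account_age_days": 2, "embedding": [1.0, 0.0]},
    {"id": 2, "account_age_days": 400, "embedding": [1.0, 0.1]},
    {"id": 3, "account_age_days": 1, "embedding": [0.0, 1.0]},
    {"id": 4, "account_age_days": 5, "embedding": [0.9, 0.1]},
]
known_spam_vector = [1.0, 0.0]

# "WHERE account_age_days < 7", then nearest neighbours of a known spam post.
matches = hybrid_query(
    posts, known_spam_vector, lambda row: row["account_age_days"] < 7, k=2
)
match_ids = [row["id"] for row in matches]
```

A real system would rely on indexes for both the filter and the similarity search rather than scanning and sorting every row.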
Louis Brandy
8:10 PM
8:40 PM
Presentation

Fighting Financial Crime With Machine Learning at Tide

Tide offers business accounts to SMEs (small and medium enterprises) and is on a mission to save them time and money so they can get back to doing what they love. Our goal at Tide is to become the world’s leading business financial platform for business owners who are burdened by the numerous financial tasks required to run a successful business.

Tide uses data-driven decision-making to manage risk at different stages of the customer journey. We will be focusing on FinCrime risk management at Tide and the technical architecture associated with training, hosting, and running ML models to facilitate this.

Aravind Maguluri
Tocho Tochev
8:50 PM
9:00 PM
Break

Break

9:00 PM
10:00 PM
Workshop

Workshop: How to Build a Real-Time Fraud Detection Application With a Feature Platform

In this workshop, we'll show how to build a real-time fraud detection system using Tecton’s Feature Platform. We'll walk through the process of building, deploying, and serving real-time data pipelines.

We’ll present common architectural patterns and explore three categories of features you’ll typically need in your real-time fraud model:

  • Batch pre-computed features
  • Streaming-based features
  • Real-time features
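
One way to picture how the three feature categories above come together at scoring time is sketched below. This is not Tecton's API. The stores are plain dictionaries standing in for an online store, and the function name, feature names, and default values are hypothetical.

```python
# Hypothetical stand-ins for an online feature store.
batch_store = {"user_1": {"avg_amount_30d": 50.0}}   # refreshed by a batch job
stream_store = {"user_1": {"txn_count_10m": 3}}      # updated from a stream


def build_feature_vector(user_id, request, batch_store, stream_store):
    """Merge batch, streaming, and request-time features for one request."""
    batch = batch_store.get(user_id, {"avg_amount_30d": 0.0})
    stream = stream_store.get(user_id, {"txn_count_10m": 0})
    # Real-time feature: needs the request payload, so it cannot be precomputed.
    avg = batch["avg_amount_30d"]
    realtime = {"amount_vs_avg": request["amount"] / avg if avg else 0.0}
    return {**batch, **stream, **realtime}


features = build_feature_vector(
    "user_1", {"amount": 200.0}, batch_store, stream_store
)
new_user_features = build_feature_vector(
    "user_2", {"amount": 10.0}, batch_store, stream_store
)
```

The defaults for unseen users are a design choice. A production system would pick them deliberately, since a missing history is itself a signal in fraud models.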

You will learn how to:

  • Build new features
  • Automate the transformation of batch data
  • Automate the transformation of streaming and real-time data
  • Create training datasets
  • Serve data online using DynamoDB or Redis
  • Build a fraud detection system using Tecton
Vince Houdebine
Sponsors
Rockset
RTInsights
O'Reilly Media
TFiR
Event has finished
May 30, 4:30 PM, GMT
Online
Organized by
Tecton