apply() is a community conference series for machine learning and data teams to discuss the practical data engineering challenges faced when building real-time machine learning systems. Participants learn from industry experts and share best practices with the community.
apply(risk) focuses on the specific challenges of building risk and fraud detection systems, from low-latency streaming and real-time data pipelines to deploying features to production quickly and reliably.
Join us at this event to discuss best practices, tools of choice, and emerging architectures to successfully build and manage production risk and fraud detection applications.
ML models are an essential tool in combating fraud. They can improve fraud detection rates, reduce false positives, and be re-trained to identify new fraudulent behavior as fraudsters adapt.
However, fraud models require high-quality data that can be difficult to process and serve in production. Features typically require aggregations on streaming and real-time data, which are complex to build, compute intensive, and difficult to process at low latency.
In this talk, Mike will walk through a sample use case and show how aggregations are typically processed. He’ll then show how feature engineering frameworks, like the one offered by Tecton, can simplify the development of these features. He’ll explain how these frameworks are orchestrated under the hood to process data in under 1 second, serve data with under 10 ms latency, and reduce processing costs, all while ensuring consistency between offline and online data to improve model accuracy.
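To make "aggregations on streaming data" concrete, here is a minimal in-memory sketch of a trailing-window aggregation: a user's transaction count and total amount over the last 10 minutes. The names `Transaction` and `SlidingWindowAggregator`, the window size, and the event fields are hypothetical. This is not Tecton's API, and it assumes events for one user arrive in timestamp order; a production system would keep this state in a stream processor.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical event shape: epoch seconds and transaction amount.
    timestamp: float
    amount: float


class SlidingWindowAggregator:
    """Trailing-window count and sum for a single entity (e.g., one user).

    Hypothetical illustration only; all state is held in memory.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: deque[Transaction] = deque()
        self._total = 0.0

    def add(self, txn: Transaction) -> None:
        # Assumes events arrive in timestamp order.
        self._events.append(txn)
        self._total += txn.amount
        self._evict(txn.timestamp)

    def _evict(self, now: float) -> None:
        # Drop events outside the window (now - window_seconds, now].
        cutoff = now - self.window_seconds
        while self._events and self._events[0].timestamp <= cutoff:
            old = self._events.popleft()
            self._total -= old.amount

    def features(self, now: float) -> dict[str, float]:
        # Evict first so features reflect the window ending at `now`.
        self._evict(now)
        return {"txn_count": len(self._events), "txn_sum": self._total}


agg = SlidingWindowAggregator(window_seconds=600)  # 10-minute window
agg.add(Transaction(timestamp=0, amount=20.0))
agg.add(Transaction(timestamp=300, amount=30.0))
agg.add(Transaction(timestamp=650, amount=50.0))  # evicts the t=0 event
print(agg.features(now=650))  # {'txn_count': 2, 'txn_sum': 80.0}
```

Keeping a running sum makes each update amortized O(1), but the state must be maintained for every entity and kept fresh at low latency, which is part of why these features are costly to compute and serve in production.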
Over the last decade, Francisco has built software for machine learning models, data engineering, and risk at Affirm, Fast, Goldman Sachs, the Commonwealth Bank of Australia, and AIG. In this session, he'll discuss common pitfalls and some simple approaches to avoiding them.
He'll cover the importance of:
While I was working as the product lead for Kaggle, one challenge we faced was how to handle fraudulent user accounts: specifically, accounts created to host pirated content (abusing our free memory tier) or to mine cryptocurrency (abusing our free compute tier).
Fraud detection poses a hard set of challenges. In this particular case, two problems really stood out:
1) The data was imbalanced. There were at least 10 legitimate accounts for every fraudulent new account.
2) The data was multimodal. For each user with a Kaggle profile, we had not only metadata (how long they had held an account, how many times they had logged in) but also unstructured data, such as the text in their bios (which at times contained links to pirated content) and their profile pictures. This unstructured data was harder to feed into a machine learning model than the metadata.
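One common mitigation for a 10:1 imbalance is to weight each class inversely to its frequency during training. The sketch below computes such weights with the formula scikit-learn uses for `class_weight="balanced"`, namely n_samples / (n_classes × class_count). It is an illustration of the general technique, not necessarily what the Kaggle team did, and the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 10 legitimate accounts (0) for every fraudulent account (1).
labels = [0] * 10 + [1]
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.55, 1: 5.5}
```

With these weights, the ten legitimate examples and the single fraudulent one contribute the same total weight (10 × 0.55 = 5.5), so the training loss is no longer dominated by the majority class.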
In this talk, we’ll walk through how to build a fraud detection model for bot accounts using a public dataset of Twitter bot profiles that mirrors the challenge I faced at Kaggle. We’ll cover:
Contemporary machine learning platforms connect a variety of data sources, tools, and applications, each with its own data governance requirements, constraints, and capabilities. In this talk, we’ll explore the role of the machine learning platform in integrating these elements, empowering data producers and feature engineers to own access control for their data and features.
Fraud takes many forms across industries and is constantly evolving. Data scientists and MLOps professionals must similarly evolve in real time, staying a step ahead of model performance degradation and new attack vectors. In this 10-minute talk, we will cover best practices in ML observability for detecting and preventing fraud across industries. We will also discuss a novel approach to anomaly and drift detection using embeddings, UMAP dimensionality reduction, non-parametric clustering, and data visualization.
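The UMAP and clustering approach described in this talk is more involved than a short example can show. As a simpler point of comparison, a common baseline for embedding drift is the Euclidean distance between the mean (centroid) embedding of a reference window and that of a production window. The sketch below implements only that baseline, not the talk's method; the sample vectors and `DRIFT_THRESHOLD` are hypothetical.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length embedding vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def centroid_drift(reference: list[list[float]], production: list[list[float]]) -> float:
    """Euclidean distance between the reference and production centroids."""
    return math.dist(centroid(reference), centroid(production))


# Hypothetical 2-D embeddings; real embeddings are usually much higher-dimensional.
reference = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # centroid (1, 1)
production = [[3.0, 4.0], [5.0, 6.0]]  # centroid (4, 5)

DRIFT_THRESHOLD = 1.0  # hypothetical; would be tuned on historical data
drift = centroid_drift(reference, production)
print(drift, drift > DRIFT_THRESHOLD)  # 5.0 True
```

A centroid shift catches wholesale movement of the distribution but can miss changes such as a new cluster of behavior appearing alongside an otherwise unchanged population, which is the kind of gap that clustering-based approaches aim to address.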
“The rise of digital transactions and online platforms has brought about new challenges in combating fraud. Traditional rule-based systems often fall short in identifying sophisticated fraud patterns, leading to significant financial losses and compromised user trust. This necessitates the adoption of advanced technologies such as machine learning for effective fraud detection and prevention. This conference talk aims to explore the utilization of machine learning algorithms and techniques to combat fraud across various domains.”
Sounds boring, right? While all true, fighting fraud with machine learning requires so much more than simply “adopting advanced technologies such as machine learning for effective fraud detection and prevention.” You can blame ChatGPT for that paragraph (and also for oversimplifying our heavily nuanced and complicated domain).
In this talk, we will explore what it actually takes to develop, launch, and maintain ML models in production in the highly dynamic and adversarial environment of FinTech risk. I will share how Remitly thinks about risk/fraud tradeoffs, how we frame problems to build successful ML products, and how this framing influences our team structures. I will also share lessons (and mistakes) from years of developing valuable, robust, and customer-centric fraud models that score transactions every second of every day, in support of Remitly’s mission to transform the lives of millions of immigrants and their families by providing the most trusted financial services on the planet.
Spam fighting at scale occupies a unique niche at the intersection of real-time data infrastructure, high-powered anomaly detection, and machine learning. When these disciplines collide, each presents the other with a host of interesting new challenges.
This talk draws on my experience building spam-fighting infrastructure at Facebook and working on real-time data at Rockset. I'll talk through some of these challenges and explore common mistakes engineers make when moving from one discipline into the other. Challenges to be discussed include:
Tide offers business accounts to SMEs (small and medium enterprises) and is on a mission to save them time and money so they can get back to doing what they love. Our goal at Tide is to become the world’s leading business financial platform for owners burdened by the many financial tasks required to run a successful business.
Tide uses data-driven decision-making to manage risk at different stages of the customer journey. This session focuses on FinCrime (financial crime) risk management at Tide and on the technical architecture for training, hosting, and running the ML models that support it.
In this workshop, we'll show how to build a real-time fraud detection system using Tecton’s Feature Platform. We'll walk through the process of building, deploying, and serving real-time data pipelines.
We’ll present common architectural patterns and explore three categories of features you’ll typically need in your real-time fraud model:
You will learn how to: