Low-Latency Engineering Tech Talks

Browse the full library of P99 CONF tech talks and decks. Discover how experts tackle low-latency, high-performance distributed computing challenges from a wide range of perspectives.

Browse our library of talks on low-latency engineering strategies.

Patterns of Low Latency

Pekka Enberg

Founder & CTO at Turso

Building for low latency is important, but the tips and tricks are often part of developer folklore and hard to…

DTrace at 21: Reflections on Fully-grown Software

Bryan Cantrill

CTO of Oxide Computer Company

Twenty-one years ago, DTrace was integrated into the operating system. By any measure, the software is now fully-grown: it…

Rust + io_uring + ktls: How Fast Can We Make HTTP?

Amos Wenger

Writer & Video Maker aka @fasterthanlime

Working on Fluke: async Rust HTTP1+2 with io_uring & kTLS, sponsored by fly.io & Shopify. Unlike others, Fluke is built…

The Next Chapter in the Sordid Love/Hate Relationship Between DBs and OSes

Andy Pavlo

Associate Professor at Carnegie Mellon University

DBMSs struggle with OS constraints, but new tech like eBPF can change the game. Join us to explore “user-bypass” designs…

Zero-overhead Container Networking with eBPF and Netkit

Liz Rice

Chief Open Source Officer, Isovalent at Cisco

Introducing Netkit: a new eBPF enhancement replacing veth connections in container networking. Say goodbye to the overhead slowing down container…

Noisy Neighbor Detection with eBPF

Jose Fernandez

Senior Software Engineer at Netflix

Tackling “noisy neighbor” issues in multi-tenant setups! At Netflix, we use eBPF to monitor and mitigate excessive CPU usage in…

Rust: A Productive Language for Writing Database Applications

Carl Lerche

Principal Engineer at AWS

Think Rust is just about performance and safety? Let’s talk productivity. Last year, Rust’s library ecosystem needed work. What’s changed?…

Designing a Query Queue for ScyllaDB

Avi Kivity

CTO and Co-Founder of ScyllaDB

Database queries vary widely—from milliseconds to hours. Optimizing concurrency is a delicate balance of CPU, memory, and stability. Bad design…

You’re Doing It All Wrong

Michael Stonebraker

CTO & Co-founder of DBOS

Historically, business apps use a three-tier architecture. Now, cloud-native architectures and DBMS can be combined, allowing for resilient, cost-effective, and…

1BRC – Nerd Sniping the Java Community

Gunnar Morling

Principal Software Engineer at Decodable

Gunnar Morling dives into the tricks that the fastest 1BRC solutions used to process the challenge’s 13 GB input file…

Overcoming Distributed Databases Scaling Challenges with Tablets

Dor Laor

CEO of ScyllaDB

Maximizing performance goes beyond server-level tweaks. Even with low-level code, scaling requires more. In this session, learn about “tablets”—a dynamic…

The Performance Engineer’s Toolkit: A Case Study on Data Analytics with Rust

Will Crichton

Assistant Professor at Brown University

I optimized a Python data analytics pipeline, making it 180,000x faster with Rust! Using compiler optimizations, data structures, vectorization, parallelization,…

Using Sketching Technology to Optimize Services with Fewer Resources

Yichen Wei

Senior Software Engineer at Disney+/Hulu

Optimize your services with cost-efficient observability using high-performance sketching tools. Dive into creating sketching tech for various scenarios, making the…

Using eBPF Off-CPU Sampling to See What Your DBs are Really Waiting For

Tanel Poder

Performance Nerd at PoderC Consulting

At last year’s P99 CONF, Tanel introduced using eBPF Task State Arrays to track Linux apps’ thread states/activity without built-in…

Java Heap Memory Optimization to Improve P99 Query Latency at LinkedIn Scale

Vivek Iyer Vaidyanathan

Staff Software Engineer at LinkedIn

Discover how LinkedIn optimized Apache Pinot’s performance! By using FALF Interning, a home-grown, lock-free method, they cut JVM heap usage…

Just In Time LSM Compaction

Aleksei Kladov

Staff Software Engineer at TigerBeetle

Matklad dives into the implementation of TigerBeetle’s JIT compaction algorithm for LSM, which is highly concurrent and uses all available…

Redis Alternatives Compared

Peter Zaitsev

Founder of Percona, Coroot, FerretDB

Join Peter as he dives into Redis alternatives like Valkey, DragonflyDB, and Microsoft Garnet. He’ll cover licensing, features, community support,…

Detecting Memory Leaks in Android A/B Tests: A Production-Focused Approach

Pavlo Stavytskyi

Google Developer Expert

Discover how to detect subtle memory leaks and regressions in Android apps with a production-focused approach. Learn the key metrics…

One Billion Row Challenge in Golang

Shraddha Agrawal

Senior Software Engineer, Ceph, IBM

Join us as we tackle Gunnar Morling’s One Billion Rows Challenge in Golang! We’ll walk through optimizing a 16GB file…

Taming Discard Latency Spikes

Patryk Wróbel

Software Engineer at ScyllaDB

Learned a crucial lesson on read/write latency when fixing a real ScyllaDB issue! Discover how TRIM requests impact NVMe SSDs…

Why Databases Cache, but Caches Go to Disk

Felipe Cardeneti Mendes

Technical Director at ScyllaDB

Alan Kasindorf

Founder of Cache Forge

ScyllaDB teamed up with Memcached to compare how caches and databases handle storage and memory across different scenarios. We’ll dive…

Primitive Pursuits: Slaying Latency with Low-Level Primitives and Instructions

Ravi A Giri

Senior Principal Engineer at Intel

Harshad S Sane

Principal Software Engineer at Intel

This talk showcases a methodology with examples to break down applications to low-level primitives and identify optimizations on existing compute…

How to Improve Your Ability to Solve Complex Performance Problems: Part 2

Kerry Osborne

Google Database Black Belt Team Lead at Google

In Part 2 of my P99 2023 talk, I’ll dive into practical strategies to enhance our problem-solving skills in the…

Database Drivers: Performance Perspectives

Piotr Sarna

Founding Engineer at poolside

Unlock the full potential of database drivers! Dive deep into their design, uncover how they work under the hood, and…

Low-Latency Mesh Services Using Actors

Nikita Lapkov

Senior Software Engineer

We’re transforming elfo, our Rust actor system, into a distributed mesh of services. Learn how we tackled message serialization, compression,…

Minimizing Request Latency of Self-Hosted ML Models

Julia Kroll

Applied Engineer at Deepgram

Join our session on minimizing latency in self-hosted ML models in cloud environments. Learn strategies for deploying Deepgram’s speech-to-text models…

Using Change Point Detection to Fight Noisy Benchmark Results

Matt Fleming

Co-Founder & CTO at Nyrkiö Oy

Discovering performance regressions in modern systems is tough due to inevitable noise. Change Point Detection (CPD) algorithms are gaining traction…

Enhancing P99 Latency: Strategies for Doubling/Tripling Performance in Third-Party APIs

Cristian Velazquez

Staff Site Reliability Engineer at Uber

Sharing our journey to improve P99 latency in third-party APIs. From optimizing network configs to fine-tuning connection management, we aimed…

Understanding Request Latency with Wallclock Profiling

Richard Startin

Senior Software Engineer at Datadog

Analyzing request latency is tough since it’s not always CPU-bound. Many devs give up on CPU profiling, but sampling profilers…

Fast, Secure and Dense: Finally Serverless with WebAssembly

Thorsten Hans

Sr. Cloud Advocate at Fermyon Technologies

Discover how WebAssembly is revolutionizing cloud computing. Join Thorsten Hans to learn about building serverless apps with Spin, achieving true…

Latency, Throughput & Fault Tolerance: Designing the Arroyo Streaming Engine

Micah Wylde

Co-Founder at Arroyo

Arroyo is a Rust-based, distributed stream processing engine offering millisecond latency and high throughput. It achieves fault tolerance and exactly-once processing via…

Get Low (Latency)

Benjamin Cane

Distinguished Engineer at American Express

Tyler Wedin

Vice President, Global Payments Network SRE at American Express

Building a real-time, low-latency card payments system is a challenge. Join the Amex Payments Network team to learn about their…

Reliable Data Replication

Cameron Morgan

Staff Infrastructure Engineer at Shopify

Data replication ensures high availability—reliable, consistent, and timely access. Dive into the tough problems often skipped: reliable backfills, schema changes,…

Scheduler Tracing With ftrace + eBPF

Jason Rahman

Principal Software Engineer at Microsoft

Dive into understanding app latency by exploring the Linux scheduler with ftrace, eBPF, and Perfetto for visualization. Uncover quirks in…

Aiding the CUDA Compiler for Fun and Profit

Joe Rowell

Founding Engineer at poolside

Get the most out of your CUDA code by understanding how the compiler works.

Building a Cloud Native LSM on Object Storage

Chris Riccomini

Creator of Materialized View

Rohan Desai

Co-Founder of Responsive

Excited to introduce SlateDB, an open-source, cloud-native storage engine. Built as an LSM on object stores like S3/GCS/ABS, it leverages…

Cheating the Cloud: 50% Savings with Compression Dictionaries

Łukasz Paszkowski

Software Engineer Team Lead at ScyllaDB

Faced with high networking costs, we tackled insufficient compression with a custom RPC compressor using ZSTD and external dictionary support.…

Internet-Scale Semantic, Structural, and Text Search in Real Time

Ash Vardanian

Founder of Unum Cloud

Discover powerful search algorithms and their SIMD- and GPU-accelerated implementations for AI-powered semantic search, structure search, or exact & fuzzy…

Writing a Kernel in Rust: Code Quality and Performance

Luc Lenôtre

Site Reliability Engineer at Clever Cloud

Maestro kernel began as a C-based school project and transitioned to Rust for better code quality. Now, it’s in a…

Running Low-Latency Workloads on Kubernetes

Jimmy Zelinskie

Co-Founder of AuthZed

Configuring Kubernetes for optimal workload performance is a continuous journey. Best practices can sometimes harm performance. Join us as we…

Distributed Async Await: A New Programming Model for the Cloud

Dominik Tornow

Founder & CEO at Resonate HQ

Dive into the future of cloud dev with Distributed Async Await. Simplify your code and conquer the chaos of distributed…

Feature Store Evolution Under Cost Constraints: When Cost is Part of the Architecture

David Malinge

Senior Staff Software Engineer at ShareChat

Ivan Burmistrov

Principal Software Engineer at ShareChat

Scaling ShareChat’s ML Feature Store to handle 1B features/sec was just the start. Next challenge: cutting costs while keeping quality.…

WebAssembly on the Edge: Sandboxing AND Performance

Brian Sletten

President at Bosatsu Consulting, Inc.

Ramnivas Laddad

Co-Founder of Exograph, Inc

Moving apps to the Edge can complicate performance due to security constraints. Learn how WebAssembly bridges the gap, enabling both…

Queues, Hockey Sticks and Performance

David Collier-Brown

Staff Engineer

Queues: both a blessing and a curse in computer science. They help predict performance but also signal overload. This talk…

Taming Tail Latencies in Apache Pinot with Generational ZGC

Christopher Peck

Senior Software Engineer at Uber

Discover how Generational ZGC slashed Java app pause times in real-world use! Learn how Apache Pinot tackled scatter-gather tail latencies…

Measuring and Diagnosing Performance Shouldn’t Require Magic

Cary Millsap

Distinguished Product Manager at Oracle

Struggling with performance issues despite all green dashboards? Experts say you need special skills, but we’ll show you how to…

Remote CAD that Feels Local

Adam Chalmers

Systems Engineer at Zoo

Adam Sunderland

Lead Cloud Infrastructure Engineer at Zoo

Zoo is creating a CAD suite that runs in the cloud but feels like it’s local. How? Regional deployment, WebRTC…

Profiling your Go Service with pprof

Miriah Peterson

Lead Engineer at Soypete Tech

Optimize your Go code with the powerful pprof tool. Learn how to integrate, access, and interpret pprof metrics, plus best…

Performance Pitfalls of Rust Async Function Pointers (And Why It Might Not Matter)

Byron Wasti

Founder & CEO

An in-depth analysis of asynchronous function pointers in Rust, why they aren’t a real thing (compared to normal function pointers)…

Elevating PostgreSQL: Benchmarking Vector Search Performance

Daniel Seybold

Co-Founder at benchANT

PostgreSQL continues to evolve with vector search extensions like pgvector and pgvecto.rs. We’ll explore recent benchmarks comparing vector search performance…

Sight Beyond Sight: See it All Through Observability

Leandro Melendez

Developer Advocate at Grafana Labs

Observability is more than metrics and logs—it’s knowing your system’s status without checking under the hood. From QA processes to…

Time-Series and Analytical Databases Walk Into a Bar…

Andrei Pechkurov

Core Engineer at QuestDB

In this talk, we share our journey in making QuestDB, an open-source time-series database, a much faster analytical database, featuring…

Profile-Guided Optimization (PGO): (Ab)using it for Fun and Profit

Aliaksandr Zaitsau

Solution Architect

Discover how to boost your software with lesser-known compiler flags and Profile-Guided Optimization (PGO). Learn what PGO is, how it…

How a Failed Experiment Helped Me Understand the Go Runtime in More Depth

Aadhav Vignesh

Software Engineer

In 2022, I began crafting a tool to visualize Go’s GC in real-time. I’ll dive into the hurdles of extracting…

What C and C++ Can Do and When Do You Need Assembly?

Alexander Krizhanovsky

CEO at Tempesta Technologies

Join us to dive into GCC and Clang optimizations for C/C++! We’ll explore how x86-64 executes code, use assembly for…

Low Latency Gal Presents: Low Latency Stuff

Sonia Kolasinska

Low Latency Gal

Lock-free programming and precise ultra low latency pipelining between CPU cores.