In Episode 15 of netstack.fm, Glen sits down with Edward and Noah from Cloudflare to unpack the design of Pingora, the Rust-based proxy framework that now powers Cloudflare’s origin-facing traffic. The discussion covers why Cloudflare moved away from NGINX, how Pingora differs from Oxy, and what it takes to operate a high-performance global proxy at massive scale. Listeners will learn about connection reuse strategies, dynamic traffic handling, gRPC and protocol translation, custom HTTP implementations, TLS backend choices, and the practical trade-offs of Rust, Tokio, and work stealing in real production systems. It is an episode full of deep technical insights into building and operating modern networking infrastructure.
Learn more :
If you like this podcast you might also like our modular network framework in Rust: https://ramaproxy.org
00:00 Intro
00:37 A bit of background on the episode and our guests
03:18 The Evolution of Proxy Frameworks: Oxy vs. Pingora
14:59 The Philosophy Behind Pingora's Design
20:53 Understanding Pingora's Bare Bones Structure
27:50 Metrics and Observability in Pingora
39:19 Caching Strategies and Backend Structures
42:56 Usage of OnceCell
45:39 TLS Implementations and Their Importance
50:51 Dynamic Traffic Management and gRPC Support
01:02:10 Optimizing Connection Reuse with Pingora
01:07:10 The Importance of Layer 7 Processing
01:11:16 The Shift from Static to Dynamic Web Traffic
01:18:48 Performance Improvements with Rust and Tokio
01:26:00 Memory Management and Allocation Strategies
01:37:59 Outro
Music for this episode was composed by Dj Mailbox. Listen to his music at https://on.soundcloud.com/4MRyPSNj8FZoVGpytj
Elizabeth (Plabayo)
0:13
This is Netstack.FM, your weekly podcast about networking, Rust, and everything in between. You are listening to episode 15, recorded on the 14th of November, 2025, where Glen unpacks Pingora and proxies at Cloudflare together with Edward and Noah.
Welcome to another week of Netstack.FM. This week with me are Edward and Noah from Cloudflare. I am very excited about this one, as they are building a lot of nice infrastructure at Cloudflare. The one in particular we will discuss today is Pingora, which they open sourced, I believe, on the 28th of February 2024. They also recently had a recording on Rust in Production, which was a fantastic episode as well. There we learned that a lot of Pingora's history comes from the lessons they learned with NGINX. We can also talk a bit about that. Now, welcome Noah and Edward.
Thank you, it's good to be on.
Thank you. Yeah, thank you, Glen.
So as we have two guests today in our virtual studio, we're going to have two introductions. I would say, Edward, you can start, as you are first in the alphabet.
Great. Yeah, so my name is Edward Wang. I've been an engineer at Cloudflare for almost five years now, I think. Prior to that I was a software engineer at a game studio for a few years before deciding that I wanted to learn more about networking. And boy, did I jump into the deep end with Cloudflare. I very much drank from the fire hose in terms of learning. There's a lot of fantastic expertise at Cloudflare when it comes to all sorts of networking topics in HTTP. That's where I was also able to start working on Pingora; I think the very earliest inklings of that started in 2020. I've only technically been working really closely with Pingora for the last few years, but in that time it's been really rewarding, and it's been really great to see the open source journey. So that's a little bit about myself.
Noah and I are actually on the same team at Cloudflare, working on one of the components, built in Pingora, of our content delivery network. I'll hand it over to Noah.
Okay, very cool.
Yeah, cool. Well, I've already been introduced. So I've been at Cloudflare for a bit over four years. Prior to that, I was in college. I have a bit of a history with Tokio and Hyper, especially the h2 crate, and I'm involved, more or less depending on what time of year or what year it is, with both projects to some extent. I've been involved with Tokio since before I was at Cloudflare. I have only been on the Pingora team for about a year at this point. Prior to that at Cloudflare, I used to work on Oxy, which is the other Rust proxy framework that we have at Cloudflare, and the one that is not open sourced, largely because Oxy has historically been coupled to Cloudflare's infrastructure and so wasn't actually open sourceable, but that has been changing as of late. So I'm the only person right now who has actually worked on both proxy implementations at Cloudflare.
Okay, very interesting. There are already some things to unpack there, which I will do soon. But I mostly know you from Tokio and from Hyper, so I wasn't even aware that you work at Cloudflare. So Tokio started before Cloudflare; that was, I guess, when you were still in college then. Very cool. And then the Hyper work, did that start because of...
Yeah, so Tokio and Hyper have historically been kind of sister projects, and there's a lot of cross-pollination between the maintainer groups for both. I got really involved with Hyper through Rapid Reset and other h2-related DoS things that hit basically every HTTP/2 stack at once. I ended up finding my way into Cloudflare's response to Rapid Reset, and then in particular into getting the h2 crate hardened: looking through different DoS vulnerabilities in the h2 crate and making that a much more secure and much more hardened stack. So a lot of my involvement with h2 and Hyper has been from the security and DoS side of things, finding ways to make it very, very hardened. The nice thing was that it actually was already pretty hardened, and it's a fairly well built stack when it comes to DoS, when it comes to trying to find weird ways to use the HTTP protocol to mess with the stack and get it to blow out some resource.
And do you think it was just sheer luck, or do you think there are specific choices that they made deliberately that resulted in it?
There's some stuff in there where people were definitely being very deliberate and careful, choosing to use data structures that might be slower in the general expected case but would always be constant time over any input you gave them. For example, with certain things around how HPACK is handled in that crate, there are some very deliberate decisions to use constant-time data structures, even if they're not usually going to be the fastest, because then there's no worst case that someone can take advantage of and exploit to make a server handle things in the worst case and blow out its resource usage.
But a lot of it, I think, is actually just down to writing idiomatic Rust. It turns out that if you do that you get certain protections, not even necessarily for memory safety; you're just not trying to take certain types of shortcuts that you might take if you're writing C, for example, which is a lot less ergonomic of a language. Some of the ways that Rust programmers tend to do things, with channels for example, lend themselves a bit better to not having these weird split-brain cases that you can take advantage of and exploit to blow something out. They also lend themselves to a slight amount of batch processing, which happens in parts of the h2 crate stack, and which allows you to debounce certain types of normally malicious operations and make it so that they're not really a problem. So for example, with Rapid Reset, one of the things that Hyper would do is take those header frames and the reset frames and, in the parser, basically read a whole batch of operations and then just cancel the request right there in the parser. It wouldn't try to do any of this sort of "I'm going to queue up a request task in the background, and then, wait, I need to cancel that, but it's already started." It wouldn't do that, because it would just read through everything and say, wait, this is already canceled, I don't actually have to do anything. If you do certain types of idiomatic Rust things, it ends up resulting in code that is a bit better and a bit more well behaved from a DoS perspective.
I see. Yeah, that makes total sense. Thank you for that.
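The batching idea described above can be illustrated with a small sketch. This is not the h2 crate's actual code: the `Frame` enum and `streams_to_dispatch` function are hypothetical names made up for this example, and real HTTP/2 frames carry far more state. It only shows why reading a whole batch first means a stream that is opened and reset within the same read never needs a background task.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily simplified HTTP/2 frame type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Frame {
    Headers { stream_id: u32 },
    RstStream { stream_id: u32 },
}

/// Given one batch of frames read from the socket, return the stream IDs
/// that still need a request task. Streams that are opened and then reset
/// within the same batch are dropped before any work is scheduled.
pub fn streams_to_dispatch(batch: &[Frame]) -> Vec<u32> {
    // First pass: collect every stream that was reset in this batch.
    let reset: HashSet<u32> = batch
        .iter()
        .filter_map(|f| match f {
            Frame::RstStream { stream_id } => Some(*stream_id),
            _ => None,
        })
        .collect();

    // Second pass: keep only HEADERS frames whose stream was not reset.
    batch
        .iter()
        .filter_map(|f| match f {
            Frame::Headers { stream_id } if !reset.contains(stream_id) => Some(*stream_id),
            _ => None,
        })
        .collect()
}
```

With a batch that opens streams 1, 3 and 5 and resets 1 and 5, only stream 3 would be handed to the request-handling layer; the cancelled streams cost a parse and nothing more.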
And so, one question I want to ask, which is already partially answered: I think you were first blogging about Oxy way before the name Pingora was ever mentioned, even in a blog post, or maybe I'm wrong there. I didn't even know you were developing two different proxy frameworks. You were blogging about this kind of work, maybe without a name; I thought there was maybe Oxy, but maybe not. Then Pingora came out, and I thought that was the only one you were working on. But now you mention that Oxy is still there, so my question is: how did that come to be, and why still have these two isolated from each other?
So this is one where I can give a bit of the history. They come out of two different projects that needed proxies that were very differently shaped, for different things. So where Oxy originally comes from: Cloudflare has a thing called Warp, which is a VPN-esque thing, and our entire Zero Trust product line is built on that. Warp was not originally written in Oxy. It was its own thing that was pre-async Rust, and so it has that sort of hand-rolled event loop structure where you have callbacks that you are dispatching based on the epoll events you see. Warp required folks to try to solve a lot of problems on Cloudflare's infrastructure that we hadn't historically solved and that are kind of difficult to solve in a CDN environment. And then a couple of years later, Cloudflare publicly blogged about our involvement with a thing called iCloud Private Relay. We were involved with that, and we had to apply the solutions that we came up with for Warp to a very similar space.
At that point we were also building out more things around Warp that required us to pull a bunch of those solution code paths out of Warp and into other things. This was for our Zero Trust infrastructure. And so Oxy basically comes out of: okay, we need a proxy framework that's good for handling tunneled traffic and allows you to handle things like geo egress. You don't want someone who's ordering a pizza to see, say, and I'm not actually in Dallas physically, I should say, but you might be like, "I live in Dallas, but it says that I'm in Ashburn, Virginia. How strange." And all of the pizza stores are flagged for Ashburn, Virginia. That's not what you want. You want to have IPs that are geolocated and spread around, so that wherever their ISP decides to land them, which can often be somewhat variable (in theory this should be our Dallas site, but maybe it's Ashburn), folks just see the stuff that's nearby to them, regardless of where that is. So a lot of these problems around spreading out IPs, how to handle that, and how to route traffic to the correct IPs were originally pulled out into Oxy. And now there's a whole mesh of systems that this has been pulled out of Oxy and into, that are themselves built in Oxy. Oxy originally came out of folks looking at how we can handle this sort of tunneled traffic that's from end users, or eyeballs as we tend to call them in the CDN world, rather than doing normal reverse proxying. So Oxy is more used for forward proxying, and that was what it was originally built for.
But it can also do reverse proxying. Oxy ended up being philosophically a fair bit different from Pingora. Oxy is super generic, in the sense of using Rust generics very heavily. It's very built around generics and around handling as much in the type system as possible. Oxy was also historically a huge mono-framework: a massive thing that basically had a solution to every problem baked into it somewhere. You just had to have a machine that could handle the compilation time. Whereas I think Edward can speak a lot more to the design philosophy of Pingora and the problem space that it was originally built for.
Yeah. I mean, Pingora really came out of around 2020, when folks on the Pingora team, or I guess it was pre-Pingora at the time, we were all just in the cache content delivery team, were looking at the problem of NGINX independently. I think we had talked a little bit about this on the Rust in Production podcast, but things had gotten to a point with maintaining our particular NGINX instance where it was really difficult to reason about a lot of the different custom code additions that we had made inside of the NGINX fork. Maintaining that alongside all of the upstream changes that were coming in from NGINX was really difficult, because we were perhaps doing all sorts of strange things in order to optimize our caching system that maybe we shouldn't have been doing, where we were not aware that, say, the lifetime of a particular object we were working with was not actually guaranteed to be that long. That was about the time when...
A couple of folks, for example Yuchen Wu, our tech lead, had decided to go off into a corner and really try to work on a brand new NGINX replacement. That, I think, was the main guiding problem space there. It was very much: what are the things that NGINX is maybe not as good at that we can learn from? I believe he was also working a little bit with Oxy from time to time, on some of the projects there, and asking what some of the different learnings were that we could pull into a new framework. That was how Pingora started. I think I said on the other podcast that it started under the name, and you can still see this in some of the source code, OpenRusty, which is a play on OpenResty, a popular Lua framework on top of NGINX. So it really comes out of solving a much more particular problem. I mean, Oxy was made to solve really particular problems as well, all the things that Noah was describing regarding tunneled traffic and such. Whereas Pingora was really: how do we replace our reverse proxy with caching, and do it in a way that optimizes for a lot of our use cases, for example origin connection reuse and things like that, which NGINX perhaps didn't have as a top-of-mind priority when it started. I believe that the folks at OpenResty were actually working with Cloudflare for a decent portion of time as well, and some of the older folks on the team were able to work with them too. And I think a lot of the goal behind it shows in how Pingora describes itself: as a programmable framework, a modular framework.
I think a lot of that comes out of OpenResty's philosophies. At the end of the day, I think it really came to be: here is a framework, mostly having to do with reverse proxying, and here is the engine for that. How do we make the engine as performant or ideal as possible, and then allow the user to swap whatever filtering logic they want in and out? So I think of it as a sort of blank canvas on which people can inject whatever logic they wish. It is pretty bare bones in that sense, I think, at least as it exists today. Because we have a lot of customized logic, we work with all sorts of weird HTTP traffic, and we need to be really careful about which headers we add, modify, and transform when proxying from, say, our internal stack's protocol to either H1 or H2 or whatever the origin supports. Having that fine-grained control was also really important in terms of how the final product ended up being.
Thank you very much. You say bare bones, and in a way I understand why, but at the same time it does seem like you have the foundations there, and you do kind of hold the user's hand, as in it's not like they have to work out step by step how to put it all together. The engine is there, and in a way it's not that difficult to make something yourself. I mean, you do have to read the guidelines, but in the end it's just a matter of implementing a certain trait, and then you can choose which methods you want to override. Maybe you want to add a header or modify a header. That is very different from, let's say, building a proxy with Hyper, where there's nothing like that and you have to figure it out on your own. So I wouldn't call it bare bones, but I get what you mean.
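The "implement a trait and override only the methods you care about" pattern mentioned above can be sketched with plain Rust default methods. This is deliberately not Pingora's real API: the `ProxyHooks` trait, its method names, the `handle` function, and the upstream address are all hypothetical stand-ins, and everything is synchronous to keep the example self-contained.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a set of HTTP headers.
pub type Headers = BTreeMap<String, String>;

/// Hypothetical hook trait: one required method, the rest default to no-ops.
pub trait ProxyHooks {
    /// Required: pick the upstream address for this request path.
    fn upstream_for(&self, path: &str) -> String;

    /// Optional: adjust request headers before they go upstream.
    fn request_filter(&self, _headers: &mut Headers) {}

    /// Optional: adjust response headers before they go downstream.
    fn response_filter(&self, _headers: &mut Headers) {}
}

pub struct MyProxy;

impl ProxyHooks for MyProxy {
    fn upstream_for(&self, _path: &str) -> String {
        // Assumed address, purely for illustration.
        "127.0.0.1:8080".to_string()
    }

    // Override only the hook we care about; request_filter stays a no-op.
    fn response_filter(&self, headers: &mut Headers) {
        headers.insert("x-served-by".to_string(), "my-proxy".to_string());
    }
}

/// Stand-in for the framework "engine": it calls the hooks in a fixed order.
pub fn handle<P: ProxyHooks>(proxy: &P, path: &str, mut req: Headers) -> (String, Headers, Headers) {
    proxy.request_filter(&mut req);
    let upstream = proxy.upstream_for(path);
    // A real engine would forward the request here; we fake an empty response.
    let mut resp = Headers::new();
    proxy.response_filter(&mut resp);
    (upstream, req, resp)
}
```

The design point is that the engine owns the control flow while the user supplies only the decisions, which is why adding or modifying a header is a few lines rather than a hand-built proxy loop.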
It's not something like NGINX or OpenResty, where maybe everything is already figured out. I don't know.
Yes. Yeah, it's something in between, right? It's something in the middle. The reason why I think of it as more bare bones is maybe because we tend to get people who do try to think of it as a wholesale NGINX replacement, a drop-in, ready-to-go, batteries-included binary. As you mentioned, it doesn't take that much to create a Pingora app of your own. A lot of the traits are already available, and a lot of that engine is already pretty much ready to go. However, as soon as you start doing things that are slightly more complicated, such as routing, you find that people really want things like location configs within NGINX, which makes sense. As soon as you start getting into things like that, I think people realize it's really a blank space in that sense, in which you maybe have to do a lot of that yourself as it is today. So this is where, and the lines are slightly blurred, we generally want to defer to people, some great community folks, who build things on top of Pingora in order to create the more user-friendly routing configs and such.
Okay, and how would people discover these, for example? Because I couldn't find anything like community crates or external resources anywhere on the GitHub.
I don't know if we actively list them. For a while there was a project with some of the memory safety org folks that we were working with, which was kind of the advocated batteries-included proxy built on top of Pingora, called River. At this point, I believe progress on that project is mostly paused.
There were some other crates that I know of, though I'm not sure how well known they are. Perhaps we could give them a shout-out in the README at some point. I'm thinking about, for example, Pingap. That is the one that comes to mind right now. I think there are a few others, and I'm also happy to drop those into the podcast notes.
Another thing for me with bare bones versus batteries included is that mentally I'm comparing Pingora a lot to Oxy. Oxy built its own tracing framework. Originally it was hard-coded, specifically for Jaeger, and nowadays it uses OTel. Oxy basically built out its own entire, very opinionated telemetry ecosystem. That part ended up getting split out of Oxy eventually and open sourced as a crate, foundations. Pingora tended to go with the approach of taking a lot of what was out there in the Rust ecosystem and saying: okay, you can just use this with your Pingora proxy. You can use tracing to do logging, you can use the metrics crate to do Prometheus metrics, or whatever you want. Whereas Oxy built out its own entire framework that is not even built around tracing, for example, or even the log crate. It has its own macros and stuff, and it pushed folks who were using Oxy to use this sort of all-encompassing ecosystem. In some ways it's kind of the difference between Linux and macOS, where with Oxy you need to be wholly invested in the whole ecosystem that it built. Things tend to work fairly nicely if you're within that, but once you start wanting to step outside of it, you find that it's actually built not to be used this way.
Whereas with Pingora, there's a lot more flexibility and control given to the user. In Oxy, there's a flow that things go through by default. With Oxy, you basically have a whole binary ready when you just have a default app, and then what you do is specify via hooks: okay, but here's what I actually want my service to do in all of these different ways. Whereas Pingora is a lot closer to how folks who are building web services typically tend to think about things and expect stuff to work, where you have a smaller set of handlers that you are invoking.
Yeah, that's something I was going to ask about, logging and metrics. We can link to the foundations crate for Oxy, but as far as I know that's not used for Pingora. So I was wondering what the story is there on metrics and tracing and everything. For example, how do you use Pingora and do the tracing and those things?
So we actually implement that generally at the level above. In user space, so to speak, we implement things like tracing. There are some metrics and stuff built into Pingora, but with Pingora we also have a lot of other metrics we just collect outside of that. Pingora uses the Prometheus client crate, and we use basically what's fairly typical in the Rust ecosystem. Cloudflare uses Prometheus extremely heavily, and Pingora is no different from most of the rest of the company in this respect. We just tend to do a lot of this stuff in user space rather than in the framework, so to speak.
And so, yeah, a lot of developers like Prometheus. At the same time, OpenTelemetry kind of became a thing, and I wouldn't say it's a world standard, but things are definitely converging to that point.
Is there an incentive at Cloudflare for projects like, I don't know, Oxy to move?
So we actually use OTel for a lot of stuff. I mentioned Jaeger earlier. Cloudflare has Jaeger stood up internally, but what we're actually collecting these days is OTel traces. So we use OTel a lot, in particular for tracing. We don't use it super heavily for things like metrics at the moment. And to be honest, I don't see Cloudflare moving off of Prometheus ever, or really not for a very long time, because it does generally work quite well for what we do. But the big thing that's driven us to adopt OTel a lot more is tracing and that type of observability. With just straight-up metrics, Prometheus works quite well. It's when you want to dig into more fine-grained things with requests: you want to look at what paths requests are going through within your system, how long they're spending at different stages, what values and flags were set for a request in different places, what control headers were present (basically internal IPC stuff that we have that directs how to treat this request), and what control directives we had set in different places. All of that is where distributed tracing is super useful to us, and a lot of our internal work to onboard and adopt OTel has come from distributed tracing.
Yeah, because I was also going to ask: what if, in something like Pingora, a request fails for some unknown reason, or a connection suddenly gets closed? How would you typically start diagnosing this kind of issue? I mean, not like it is [unclear], but still I wonder what the flow is.
Yeah. Well, there are a few ways. We have logging, we have metrics, and we have distributed traces as our general tools. And Cloudflare is also a huge user of ClickHouse, and we have these internal ClickHouse clusters for very, very aggregate statistics around requests, where you can, with very high cardinality, slice by things like CDN zone. So we have all four of those things, and depending on the type of issue, we'll look at different things. Say there's a customer escalation, and a customer says, "Hey, there's a problem with my connections, they are closing in this way that doesn't make sense." Generally, a lot of weird, super anomalous things are going to get flagged into Kibana. If we're getting protocol errors, HTTP protocol errors or something like that, you're going to want to look in Kibana, and you're going to see that it was when connecting to this particular origin, and okay, that's what's going on. One of the most commonly used tools when we get these types of escalations, though, is traces. We have systems internally where we can adjust the volume of traces, what proportion of requests we'll generate traces for. And then we can also just create a request, effectively the equivalent of a curl command. We can say we're going to do this on this particular server, or any server in this particular site, and run what is effectively a curl command, and it's going to give us a guaranteed trace that we can then go and look through. So you can use distributed tracing to really understand, at a very fine-grained level, what exactly is going on in different parts of a request. It's kind of a combination of those, where we pick which ones we're using depending on what the problem is.
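The combination of an adjustable trace proportion and a guaranteed trace for a hand-made diagnostic request can be sketched as a single decision function. This is an illustrative assumption, not Cloudflare's implementation: `should_trace`, its parameters, and the per-10,000 granularity are all made up for the example, and hashing the request ID with the standard library's hasher is just one simple way to get a stable per-request decision.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Decide whether to record a trace for a request.
///
/// * `sample_per_10k`: how many out of every 10,000 requests to trace.
/// * `force`: set for a diagnostic request that must always produce a trace.
pub fn should_trace(request_id: &str, sample_per_10k: u64, force: bool) -> bool {
    if force {
        return true;
    }
    // Hash the request ID so the decision is stable for a given request
    // rather than re-rolled at every hop.
    let mut hasher = DefaultHasher::new();
    request_id.hash(&mut hasher);
    hasher.finish() % 10_000 < sample_per_10k
}
```

Turning the sampling rate up or down changes only the threshold, while the forced path is what makes a curl-style probe useful: the operator knows a trace will exist before sending the request.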
A lot of the sorts of issues that I often end up looking at, and that I've kind of specialized in debugging, tend to be very resource-related issues or traffic-management-related things. You know, we're lensing a whole bunch of traffic into this small set of servers or this one site. What's going on? How can we spread that out more? For that type of stuff, I tend to use Prometheus extremely heavily, so the number one thing I tend to look at myself is generally Prometheus. But Edward, for example, is a much bigger user of things like ClickHouse than I am, because Edward often looks at a lot of these "what's going on with this zone or this type of traffic" issues.
Yeah. We have a lot of different metrics and ways to slice and dice things, because when it comes to analyzing issues at Cloudflare, there are often heavy hitters that are causing those issues. So we want to figure out exactly what kind of traffic is causing them, and we have different ways to slice and dice that via the different data sources that Noah is talking about. The other thing, and this is really top of mind in terms of debugging and observability: I think we were very much more in the dark, I would say, with NGINX. There are cases where we were deliberately trying to be a lot more verbose and trying to inject hooks for what errors, connection errors as you mentioned, would come up either on the downstream side or the upstream side, either on the client or the server, I suppose you can think of it. Because either one of those causes problems, and everyone turns to us and asks which remote end hung up, and we have to be able to answer that.
Those error hooks, for example the error-while-proxy hooks, were a very deliberate decision.
Okay, makes sense. I was first thinking, let's get an overview of the different crates, because in Pingora there are many crates, but then I noticed the README already does quite a good job of describing the general overview, so I'm going to link to that instead. But there are a couple of specific crates where I have some questions, for example pingora-ketama. That one I would have no idea what it is about.
There are a lot of different ones. I think the shape of the repo and the kinds of crates that are in there give you some hints as to the particular things that our team cares about. For example, probably the place where we're most opinionated inside of the proxy framework is all of the different caching hooks and filters, and I mean that within the actual pingora-proxy crate itself. We tend to be a lot more opinionated about things in those contexts because HTTP caching is really specific, and it is something that our team in particular really cares about. To that end, as you mentioned, Ketama: as part of switching over from NGINX to using Pingora, there are actually a lot of different components inside of the CDN proxy stack, you could call it. So we wanted gradual deployments and gradual migrations. We wanted to be able to swap these components in and out. One of the things that is maybe a problem: we do consistent hashing when it comes to selecting where we should go; for a certain URL, maybe we should go to a different server backend. And if you change that in the process of migrating from an NGINX component to a Pingora component, that's not great, right?
Because all of a sudden, if you are doing that migration, maybe you're going to change which backend you choose for a URL, which will cause you to have a cache miss, essentially, because you're suddenly going to a different backend than the one you were going to before. So Ketama is one example of a consistent hashing algorithm that we re-implemented in Rust in order to make sure we had parity with the logic in NGINX. All of that, I think, comes out of the shape of the things that we tended to care about: caching, caching backends, consistent hashing, and things like that. Okay, well, yeah, I see a lot of things around caching and memory caching. But I guess you cannot keep everything in memory. So if it's not in memory, where do you store it then? I suppose not in a file, but maybe on something else. I guess what I'm wondering is, what is the typical structure at Cloudflare for the layers of storage? Sure, yeah. A lot of the caching occurs on disks. When it comes to actually holding people's assets, we can tend to cache really large files, right? We're dealing with videos, not just dinky little HTML files anymore. Maybe AI models, right? So obviously a lot of that stuff gets put on disk. The memory caching crates that you were alluding to within Pingora are actually for things where we're, say, caching certain configurations that we need to look up, because maybe some other team's service is responsible for those configurations. Say, how we authenticate with the origin, for example: what kinds of TLS certificates should we be pulling?
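As an editor's illustration of the consistent-hashing idea discussed above, here is a minimal hash-ring sketch. It is not Ketama and not the pingora-ketama API: the `Ring` type, the FNV-1a hash, and the backend names are all assumptions chosen to keep the example small and deterministic. What it shows is the property the speakers care about: when the set of backends changes, a key either keeps its old backend or moves to the newly added one, so most cached content stays where it was.

```rust
use std::collections::BTreeMap;

/// FNV-1a, used here only because it is tiny and deterministic.
/// (Assumption for the sketch: Ketama derives its ring points differently.)
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical hash ring: each backend owns `replicas` points on a circle.
pub struct Ring {
    points: BTreeMap<u64, String>,
    replicas: usize,
}

impl Ring {
    pub fn new(replicas: usize) -> Self {
        Ring { points: BTreeMap::new(), replicas }
    }

    /// Place `replicas` points for this backend on the ring.
    pub fn add(&mut self, backend: &str) {
        for i in 0..self.replicas {
            let point = fnv1a(format!("{backend}-{i}").as_bytes());
            self.points.insert(point, backend.to_string());
        }
    }

    /// Pick the backend owning the first point at or after the key's hash,
    /// wrapping around to the start of the ring if there is none.
    pub fn pick(&self, key: &str) -> Option<&str> {
        let hash = fnv1a(key.as_bytes());
        self.points
            .range(hash..)
            .next()
            .or_else(|| self.points.iter().next())
            .map(|(_, backend)| backend.as_str())
    }
}
```

With a ring of three backends, calling `ring.pick("/some/url")` before and after `ring.add("backend-d")` returns either the same backend or `backend-d`. Two implementations only agree on placement if they hash and place points identically, which is why parity with the existing logic mattered for the migration described here.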
That kind of stuff is what the memory caching is for, along with those little tools for caching certain look-aside things that we need to do in order to serve traffic. It's definitely not for the actual assets. There's the HTTP caching proxy built into the pingora-proxy crate, and then there are all of the various look-aside operations that you need to do, which is where the memory caching comes in. Okay, and the HTTP caching, I guess that's all based on the HTTP headers, like ETag and expiration dates and all that kind of mess. Yeah, if you look in there, there's actually quite an extensive default implementation of Cache-Control parsing, for example, within the Pingora crates. It literally came as a result of me studying RFCs for a few months, and it was one of the first things that I did on Pingora. As I mentioned, those parts tend to be really opinionated, a lot more so than the rest of the proxy filters, which are just: what do you want to do with the HTTP request? Do whatever you want to it, right? Yeah, very cool. And then I saw that a lot of these crates are very small, and a lot of the logic seems to live in pingora-core. While I was going through the code several times over the last years, I also noticed some patterns, and one pattern I sometimes see is that you use OnceCell a lot, where you have to get some information from a socket and then you put it in a OnceCell. I have not seen this pattern that often in other libraries. Is it something where there were performance issues and you noticed this helps a lot, or what is the reason behind it? You might be thinking about certain APIs to grab, say, the local address of a socket.
I think the reason why we do some of those things is that, as much as possible, we try to make it so that the user doesn't have to pay for things that they aren't necessarily using. If they're grabbing, for example, maybe it's the peer address, maybe it's the local address: one of them is already available when Tokio accepts the connection. Whichever one isn't is probably the one that we're thinking of. I think it's probably local. Yeah, the local address is what we'd be grabbing. So that one is lazily instantiated. I think NGINX actually does the same thing, where the first time you actually try to look it up, you will then essentially cache it within whatever struct you're using. Taking inspiration from that, we did a lot of the same things in that situation. Because if you don't actually care about that address, we'll try not to make you do the extra syscall in order to get it. Yeah, makes sense. Okay. And then as you said, there are a lot of strong opinions there. Around caching, I get it. But what surprised me is how many TLS-related crates there are as well, because you have BoringSSL, OpenSSL, rustls. Those are the obvious ones. But the one that surprised me is s2n. Why that one? That one was actually a request from some folks at AWS. It's a recent development. Yeah. That is the TLS implementation that is apparently very heavily used in parts of AWS's stack. They were using Pingora for some things, and they really wanted to be able to use their s2n-tls implementation with Pingora, so they contributed that support in a pretty sizable PR. So that one, as well as rustls, were both user contributions.
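The lazily cached local address described earlier in this exchange can be sketched with the standard library alone. This is an illustration under assumptions, not Pingora's code: `LazyLocalAddr` is a hypothetical type, and it uses `std::sync::OnceLock` rather than the once_cell crate mentioned in the conversation. The lookup closure stands in for the syscall-backed call (for a real socket it would be something like `move || stream.local_addr()`).

```rust
use std::io;
use std::net::SocketAddr;
use std::sync::OnceLock;

/// Hypothetical wrapper: the address lookup runs at most once, and only
/// if somebody actually asks for the address.
pub struct LazyLocalAddr<F>
where
    F: Fn() -> io::Result<SocketAddr>,
{
    lookup: F,
    cached: OnceLock<Option<SocketAddr>>,
}

impl<F> LazyLocalAddr<F>
where
    F: Fn() -> io::Result<SocketAddr>,
{
    pub fn new(lookup: F) -> Self {
        LazyLocalAddr { lookup, cached: OnceLock::new() }
    }

    /// The first call performs the lookup and stores the result (errors are
    /// stored as `None`); later calls return the stored value.
    pub fn get(&self) -> Option<SocketAddr> {
        *self.cached.get_or_init(|| (self.lookup)().ok())
    }
}
```

A caller that never asks for the address never pays for the lookup, and a caller that asks repeatedly pays once. Storing a failed lookup as `None` is a simplification made for this sketch.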
Within Cloudflare we use BoringSSL across the company, because basically we have to be compliant with a standard called FIPS, and BoringSSL does have FIPS certification. So that is kind of the thing that you generally have to use if you want FIPS, and we use BoringSSL very, very heavily within the company. In most Cloudflare stuff, you're generally going to see a BoringSSL support option. Yeah, but then again, maybe it was just my wild dream, but I was under the impression that FIPS was kind of disappearing and would no longer be a thing in the near future. I really wish it weren't a thing. To be honest, I really wish FIPS went away. That's me not speaking for Cloudflare at all; that is just me personally. I don't really like FIPS. I don't think it's going away, unfortunately. I don't think that it's leaving federal contract requirements, which then flow downstream to people who provide services to the US federal government and who also need FIPS from their vendors. I don't think that is changing anytime soon, unfortunately. But they are kind of stuck on very old standards, no? Very old cryptographic algorithms? Yeah, the thing with FIPS is that it just means you have to have FIPS algorithms in there. For something like a library, you need to have a FIPS stamp. Let's pretend there's a magic stamp that says this is FIPS. You can add additional things that aren't FIPS, and traffic is not FIPS when you use those, but you can still have them in the library. With BoringSSL, that's the case: there are FIPS algorithms in there. So we can speak non-FIPS ciphers, but that's not FIPS when folks speak those to us.
But if a client speaks to us with FIPS ciphers, then we're now talking FIPS, and it is a certified FIPS connection. So you can use just the FIPS subset, or you can use non-FIPS stuff. I will also say FIPS is not insecure the way that it used to be; they do roll new cipher algorithms in there. If you're using FIPS these days, you are hopefully, I would really hope so, using the stuff in there that's very recent and very cryptographically secure, and that is effectively normal TLS: very bog-standard TLS stuff that's all FIPS. So you're not really stuck using old ciphers anymore. And then even with post-quantum, you can still do that with FIPS. It turns out that the approach that we and everyone else are mostly taking with post-quantum is a sort of onion thing, where either you do a second layer of post-quantum after the normal layer, or you do post-quantum first. Because one of the two layers here is FIPS, that approach is actually completely fine, so you can totally do post-quantum with FIPS. What happens is just that FIPS doesn't acknowledge any security guarantees from the post-quantum bit that's not FIPS. But in the real world, we know that we've actually just made this post-quantum safe by doing this. So that's another thing that folks are pretty interested in doing: having FIPS and post-quantum together. Okay, that makes total sense.
But then I wonder: okay, those users contributed rustls and s2n, but there are plenty of other TLS libraries, like Schannel, LibreSSL, wolfSSL, whatever. If a user wants those, do they have to contribute upstream, or is there a way for them to hook it in on their own? They would have to contribute it upstream. Part of this is that there is no standard SSL API. I want to throw a massive asterisk on this, which is that in some ways, if you squint, and depending on how you're using TLS, BoringSSL is sometimes the closest thing we have to a standard for how SSL should look, at least for certain things. Many, many asterisks are attached there. There are a ton of different SSL implementations that all do a lot of things differently, and especially if you're trying to cover every case, there is not a standard TLS library API. So people have to contribute it upstream, because there's no way that we can talk to every single one of those; they have completely different APIs for how you do all sorts of operations. I understand. It's basically because you have a set of operations that you want to do, and they are configurable in Pingora. That's why I said it's not really bare bones, in my opinion. It's very opinionated, and it's like an engine: if you're on the happy path it works for you, because you have all these hooks and all these ways to use it. But that means that as a user you cannot just add whatever you want, because it has to hook into the system, right?
And so it makes it easier to use, but it also limits those use cases, I guess, which is fine if you're AWS: you can just upstream it, and I guess you're more than happy to accept it. But I suppose there is also a limit. I don't think you will want to have all SSL implementations in there. That would be a nightmare. Probably not, yeah. We have it such that these SSL backends are feature-configurable. You're generally not using more than one; I don't think it really makes sense to use more than one TLS backend at a time. So they are compile-time configurable. There will probably be other TLS backends that we're interested in accepting. If it's of significant interest to someone, I think we would be fine with continuing to accept these things. We do want to make it such that if you want to use something different, you are able to swap it out, because we see that kind of usefulness and that kind of freedom for ourselves too. Okay. And then what is also not clear to me: you need tight control. You're a proxy, so you have to be very careful with the traffic, because you could easily break it. So I wonder what you use for the HTTP server and client logic. Is it hyper? Is it the core crates? What is it? Okay, so for HTTP/2 we use a fork of the h2 crate. That fork is pretty close to standard. The difference is that we have to be able to speak to things that speak, to be honest and blunt, horrifically broken, extremely non-standards-compliant, "h2 should not support this" HTTP/2. Yeah, and this is the thing, it turns out that there are a lot of... Yeah, that's the life of a proxy, of course.
clients out there, or really origins in our case, that are horrifically broken in their support of HTTP/2. And HTTP/2 is a lot better than HTTP/1. With HTTP/1, for something like us, you need to be maximally permissive because of just how much stuff there is out there that makes you say, this is not how you should be doing this, but it's been how many decades at this point? There are just so many different weird things that it turns out you have to handle, because some giant customer is complaining that you can't talk to their mail server or something. For HTTP/2 there are a lot fewer of those, but there are still a lot. We still regularly see some weird thing popping up where you're looking at it and going, okay, the RFC straight up says you should reject all of these with a protocol error. But some enterprise customer will say, yeah, but, and I'm going to make up an example, Microsoft's mail server or something does this. And you're like, great, lovely, I guess we have to accept that. For the h2 crate, that doesn't really make sense. The h2 crate is targeted at much more typical use cases, and for that sort of thing you actually want to be a bit stricter in what you will accept and what you will work with. Whereas something like us needs to handle a lot of these weird, stupid cases. So we have modifications in the h2 crate, but we upstream stuff that we do, and we pull in changes from upstream. We at Cloudflare actually have a pretty good relationship with Sean McArthur, and we work with him a lot. That's actually how I got involved with DoS-related stuff in the h2 crate. Whereas h1 is a lot more custom. Edward, you can speak to that a lot more. No worries. Yeah.
Yeah, as Noah mentioned. And publicly, I think we don't have a patched h2, and it's used in Pingora. It is just some of the h2 primitives that we're sending and receiving with. The H1 implementation is very closely tied in at the connection level and the protocol level, where we're oftentimes working with the raw bytes themselves. And, at least when we first developed Pingora, it also didn't seem that complicated to consider implementing that. It turns out H1 is actually very much a minefield, by the way. It looks simple. It's a textual protocol. It looks like plain text, but there are all sorts of footguns you can run into because of its connection state management and things like that. But we do actually want that sort of control, or we want to be able to see and control the behavior of what's happening on the connection level, if possible. Also, because of the customizations that Noah was talking about, we sometimes would actually prefer a lower-level view of what's going on on the H2 connection than what the h2 primitives surface to us. So whatever we do with H3 in the future, we would probably opt to work with the protocol on a more primitive level if possible. There we will probably, honestly, be using some of the open source tokio-quiche crates that the protocols team at Cloudflare has worked very hard on, and rightly so. So the answer is, it depends. We have H1 that we handwrote in a lot of cases. And for H2, we piggybacked off of a lot of the hard work that Sean did. But quite honestly, having that control on the protocol level is important to us.
So we'll see how the implementations change in the future. But yeah. Okay, thank you for that. And then I also saw that you have gRPC support. I wonder why that is. Well, we have gRPC support because that's actually a thing we have to proxy, not infrequently. There are a lot of people who put gRPC things behind Cloudflare, so we have to be able to speak gRPC, proxy it through, and make that work. There's actually a big blog post on that too, which describes the strange ways in which we've had to transform our internal traffic into something called gRPC-Web in order to be able to proxy that to origins. Okay, very nice. Well, we also had an episode about gRPC, and it also covered gRPC-Web and all those things, so listeners can go back to that past episode. But okay, that makes total sense. Is caching also required for that kind of traffic? No, almost certainly not. gRPC is one of those things that is pretty much always what we call dynamic traffic. Dynamic traffic is a fun thing, because it is stuff that's not cacheable. It turns out that if you know ahead of time that something isn't cacheable, there are actually a lot of interesting ways you might want to treat it. The first application of that so far has been: how do we fan in and lens in traffic to a given origin to the smallest set of machines we can within a site or a metro? Because the idea is, if we want to give you really good connection reuse, we don't want to be creating connections from every single machine.
And so what this basically does is, for dynamic traffic, we can say: okay, we never need to take a hop to another machine to check whether it's in cache and incur the latency hit there. Let's take that little spot and instead do a hop to a machine that we're pretty confident is going to have a warm connection pool. So one of the big traffic things going on here is that we're looking across our machines and saying, okay, let's assign the origin connection pool to this machine, or to this set of machines if it's a giant zone. Some popular zone that's going to have a lot of traffic is going to need to be spread across a number of machines, for load balancing and for keeping that zone's slice of resource utilization manageable. But you can basically lens things in to get really good connection reuse. A lot of the work that we've done over the years on dynamic traffic tends to go along the lines of: okay, for this traffic, we're actually extremely sensitive to connection reuse. There are even some interesting historic hacks that we had back when we were using NGINX for improving the performance of dynamic traffic by improving its connection reuse. For example, NGINX has this interesting behavior whereby you have a separate connection pool per worker. So if you have 16 cores, for example, that's 16 different workers, each of which has a completely distinct connection pool. So you're often going to get really bad connection reuse.
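To make the per-worker pool problem above concrete, here is a small counting model. Everything in it is an assumption made for illustration: requests are spread round-robin over workers, a connection is returned to its pool as soon as the request finishes, connections never time out, and the function names are hypothetical. It only counts how many requests find an already-open connection to their origin.

```rust
use std::collections::HashSet;

/// Build `n` requests as (worker, origin) pairs, spread round-robin.
/// (Assumption: real traffic is not this regular.)
pub fn round_robin_requests(n: usize, workers: usize, origins: usize) -> Vec<(usize, usize)> {
    (0..n).map(|i| (i % workers, i % origins)).collect()
}

/// Count requests that can reuse an open origin connection when the
/// workers are split over `pools` independent connection pools.
/// `pools == 1` models one shared pool; `pools == workers` models one
/// pool per worker.
pub fn reused_connections(requests: &[(usize, usize)], pools: usize) -> usize {
    assert!(pools > 0, "need at least one pool");
    let mut open: Vec<HashSet<usize>> = vec![HashSet::new(); pools];
    let mut reused = 0;
    for &(worker, origin) in requests {
        // `insert` returns false when this pool already holds a connection
        // to the origin, which is the reuse case.
        if !open[worker % pools].insert(origin) {
            reused += 1;
        }
    }
    reused
}
```

With 64 requests to 4 origins across 16 workers, the shared pool opens 4 connections and reuses one for the other 60 requests, while 16 separate pools open 16 connections and reuse only 48 times. In a real deployment the gap depends on traffic mix and idle timeouts, which this model ignores.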
A way that we used to improve this, back when we were doing origin requests from NGINX, was that we would have the big NGINX caching instance, and then we'd have this tiny instance that only handled the uncacheable traffic. Those uncacheable requests we would basically send to the smaller NGINX instance, which only had a fraction of the number of worker processes that the bigger one had. And so that smaller one would give way better connection reuse. This is the sort of hack that is not necessary when you're using Tokio. Because if you're using Tokio, it's like, great, I have this nice work-stealing thing. I don't need to worry about a lot of the performance issues that you have when you're trying to do shared nothing, where all of my caches, or all of my connection pools, are split between all of my isolated instances. So we could just make this sort of thing work really nicely and easily. And when we switched origin fetch over from NGINX to Pingora, we just had this massive overnight improvement in connection reuse that really positively impacted a lot of these dynamic-traffic-heavy workloads like gRPC. And now with Snowball, with fanning things in more, we have another huge improvement. One of the big things that's been nice about Pingora and Rust versus trying to do everything on top of NGINX is that we can bake a lot of this sort of smart logic into our proxies, which we couldn't really do with NGINX that easily. And so we can ship these really impressive, really interesting features and bake...
all sorts of cool stuff into our services with Rust and Pingora that we just could never do historically. Okay, I get it. But I cannot even count how many times you used the words connection reuse, and all of that is around the transport layer. So I wonder, why do you even have to care about the application layer? Can you not do it purely on the transport layer? Maybe you have to peek sometimes at what kind of traffic it is, but I wonder what the importance is of actually decoding and re-encoding the gRPC data. I mean, what's the cost of that? It seems like a lot of work just to proxy it anyway. Well, because a lot of the time what we're doing, as with any other traffic at Cloudflare, is security related. We're doing security-related checks, we're providing DoS protection, or we're doing web application firewall type stuff in there. But also, with gRPC and dynamic traffic, it turns out we're in a really good spot to accelerate performance and make things faster just by sitting there and doing things at layer seven. We have this giant edge network. Eyeballs, so end users, or services that are reaching out to some origin, will hit us from around the world at all of these different sites, and then we can find the fastest paths through our network and substantially improve the origin fetch performance, because public internet routing is not necessarily very fast. There's a lot of weird tromboning and things that happen just because of how BGP works.
And so by doing a lot of this sort of stuff internally, we can accelerate things a lot, even for these dynamic traffic use cases. But then it also turns out that with this sort of connection pooling, from somewhere that's really close to the eyeball all the way through to the origin, and the way that we can hide round-trip times by improving connection reuse, we can also just get a massive speedup. So the reason that folks use this is often security or all sorts of more typical product use cases, but people also get a giant speed improvement. And there are some customers where that is specifically what they want: accelerating their performance in one way or another. Dynamic traffic is also interesting because it's increasingly a large share of what people really care about. Think about how the web historically operated. You had a lot of static assets. In the Web 1.0 era, it was almost all static assets. Then with Web 2.0, things started shifting a lot more towards being dynamic, and this is a trend that we've seen picking up more and more over time, especially with AI traffic on the rise. AI traffic, as it turns out, is pretty much entirely dynamic. And a lot of growing use cases, not just AI but generally, are really dynamic-traffic heavy and involve a lot of API traffic and a lot of stuff that's all POST requests, or GET requests that are never going to be cached. So a lot of the things that our users are increasingly doing are less and less cacheable. And so
dynamic traffic has become a growing focus over time, and there's a lot of work that we've done around, okay, what are ways that we can make this type of traffic a lot faster and more performant than it normally is? Yeah. I also wanted to get at your point about encoding and decoding on the application layer. Cloudflare has a really impressive network that we're leveraging in order to do a lot of this acceleration, and that customers pay us for. But one of the reasons why gRPC needed to be converted to gRPC-Web to begin with is that a lot of legacy components didn't necessarily speak it. And so when we're proxying between different components, or really different services, within our network, we sometimes need to do these sorts of translations. If you were simply L4 proxying it, then in theory you would get similar benefits from utilizing the network, right? But when you're talking to different components, or sending that request to different parts of the stack that maybe only speak HTTP/1, you need to be able to do these sorts of things at L7. Maybe unfortunately, right? Okay. Yeah, I just wondered, because it's nice you can do it for gRPC, but there are a lot of protocols, and there is no way you're ever going to support all of them. So in the end there are limits, but if it works out for gRPC, why not? It can only be good for the user. Sure. Yeah. And I mean, there are other sorts of L3 and L4 proxying. As I mentioned, we're really focused on traffic that ends up going through, well, HTTP traffic, essentially, on our
team, really. Our infrastructure is really important to Cloudflare, but our product, those services, and the traffic that we work with are certainly not the whole of it. Yeah, okay. And then as far as I know, it's purely built on Tokio for the async runtime, and Tokio is quite known for the fact that by default it's often used in work-stealing mode. But then I saw in pingora-runtime that you can configure something like steal and no-steal, where no-steal probably means you have a runtime per thread or something. Is that something that you utilize in different ways? Do you always use one of them? We always use work stealing, except for... actually, go ahead, I know the specific case you're about to talk about. Yeah, I think the no-steal flag that you're talking about in particular was really something that we were experimenting with for a bit. The motivation behind it is that work stealing inherently, by design, comes with some overhead in terms of thread coordination, right? But we have various ways we can deploy things to various sites and test things out, and oftentimes what we found was that the no-steal runtime wasn't actually giving us any of the performance gains that we might have expected, or any reduction in that overhead. It turns out that Tokio is actually very, very well optimized in order to make that overhead as low as possible.
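The trade-off in this exchange, one runtime per thread versus work stealing when requests have unequal cost, can be illustrated with a toy scheduling model. This is a sketch under stated assumptions and not how Tokio schedules tasks: `pinned_makespan` assumes each task is bound round-robin to a worker and can never move, and `balanced_makespan` assumes each task goes to whichever worker frees up first, which is roughly the outcome that work stealing aims for. Both function names are hypothetical, and coordination overhead is ignored entirely.

```rust
/// Time until all tasks finish when each task is pinned to a worker
/// round-robin and cannot migrate (a rough model of no-steal).
pub fn pinned_makespan(costs: &[u64], workers: usize) -> u64 {
    assert!(workers > 0, "need at least one worker");
    let mut busy_until = vec![0u64; workers];
    for (i, cost) in costs.iter().enumerate() {
        busy_until[i % workers] += cost;
    }
    busy_until.into_iter().max().unwrap_or(0)
}

/// Time until all tasks finish when each task is given to the worker that
/// becomes idle first (a rough model of well-balanced work stealing).
pub fn balanced_makespan(costs: &[u64], workers: usize) -> u64 {
    assert!(workers > 0, "need at least one worker");
    let mut busy_until = vec![0u64; workers];
    for cost in costs {
        // Earliest-idle worker; the lowest index wins ties.
        let idlest = (0..workers)
            .min_by_key(|&w| busy_until[w])
            .expect("workers > 0");
        busy_until[idlest] += cost;
    }
    busy_until.into_iter().max().unwrap_or(0)
}
```

For the task costs `[8, 1, 1, 1, 8, 1, 1, 1]` on 4 workers, the pinned model finishes at time 16 because both expensive tasks land on the same worker, while the balanced model finishes at time 9. The model says nothing about the coordination cost that the no-steal experiment was trying to remove.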
And when not all requests are equal, maybe even requests on the same connection, you often do want to be spreading that around to different threads in order to make their load more equal, which is the whole purpose behind that. Yeah, that makes sense. But then I noticed, for example, in one of your crates, pingora-header-serde, which I think you hinted at in a previous explanation when you were telling the life-of-a-request story, without mentioning the name, I saw usage of things like thread-local. And I wonder how that is useful if you're also doing work stealing, because you're no longer in control of which thread you're on, right? So how is it used in there? Hmm, I believe the thread-local in that case is very particular. It's not often that we use that sort of thing. I think it was really specific to Zstandard compression and decompression, and a thread-local buffer that we use in that case, because I don't believe we have a single await point in between there either. In those situations it doesn't necessarily make sense to have, for example, a shared buffer or a shared workspace between those threads to do compression and decompression. I think having that per-thread workspace makes sense in that situation.
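A per-thread workspace of the kind described above can be sketched with the standard `thread_local!` macro. This is an illustration only: the run-length encoder is a stand-in chosen so the example needs no external crates, it is not Zstandard and not what pingora-header-serde does, and `SCRATCH` and `rle_encode` are hypothetical names. The point it shows is that the scratch buffer is borrowed and released inside one synchronous function with no await point in between, so it does not matter which thread a work-stealing scheduler happens to run the task on.

```rust
use std::cell::RefCell;

thread_local! {
    // One reusable scratch buffer per thread; no locking is needed because
    // only the owning thread can ever touch it.
    static SCRATCH: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(4096));
}

/// Run-length encode `input` as (count, byte) pairs, building the output
/// in the thread-local scratch buffer before copying it out.
pub fn rle_encode(input: &[u8]) -> Vec<u8> {
    SCRATCH.with(|cell| {
        let mut buf = cell.borrow_mut();
        buf.clear(); // Keeps the allocation, drops the old contents.
        let mut bytes = input.iter().copied().peekable();
        while let Some(byte) = bytes.next() {
            let mut run: u8 = 1;
            while run < u8::MAX && bytes.peek() == Some(&byte) {
                bytes.next();
                run += 1;
            }
            buf.push(run);
            buf.push(byte);
        }
        buf.to_vec()
    })
}
```

Calling `rle_encode(b"aaabcc")` yields the pairs (3, `a`), (1, `b`), (2, `c`). If the borrow were held across an await point in async code, the task could resume on a different thread with a different buffer, which is why the synchronous scope matters here.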
The really nice thing with Pingora and Tokio generally is that we don't often have to think about that, and in cases where we actually want to be coordinating things between threads, it's really easy to have a shared cache that isn't sharded, for all the reasons that we mentioned before. So I think that's a very particular, localized function there. Okay, very cool. In that case, I think I asked most of the things I wanted to ask. I wonder if there is anything that we didn't discuss yet that you think we should plug, or that you still want to discuss. I was going to say, and you kind of touched on it with the discussion of work stealing at the end, that one of the biggest performance improvements, not just with Pingora but across Cloudflare's stack, that we found with switching a ton of things to Rust has actually been Tokio and work stealing. At Cloudflare, we have a lot of institutional experience with shared nothing and with trying to do performance engineering in shared-nothing environments. One of the things that we've learned very heavily over time is that shared nothing is actually a really terrible fit for most use cases. One of the big interesting things that we found here is shared caches, or things where there's some sort of performance affinity, where if this thing is pooled across a whole bunch of units or instances, things get faster. That's a common case.
But one of the biggest things is that trying to balance load across a bunch of shared-nothing instances, at the sort of micro scale at which latency stalls can be introduced, is really hard, to the point that, depending on what type of traffic you're serving, it's often just not possible to do within the bounds where it will become a problem. And so one of the big things that we've found increasingly is that moving to Tokio means we can do things like, for example, run a single instance a lot hotter before we start to see tail latency go up, because you're not really bound by the worst-performing couple of cores at any moment in time. Work will be spread out. And so you can handle things like asymmetry in how expensive requests are quite a bit better with work stealing. There's an interesting thing where oftentimes it doesn't really matter if you make things slightly slower on average, if you can better spread traffic out and better distribute traffic, because you actually do just get a lot of savings from doing that. It turns out that work stealing makes it easier to do a lot of things that have a very real impact on how fast people's traffic is. Loading config, for example. There's a lot of per-CDN-zone config that we have. If you have a per-NGINX-worker cache, it turns out that you are going to have a very high miss rate.
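The miss-rate point about per-worker caches can be made concrete with a toy model. This is only a sketch under stated assumptions: the function `simulate` is hypothetical, caches are unbounded and never evict, and requests for each zone are deliberately rotated across workers, which is an unfavourable but not unrealistic spread. It compares the number of cache misses when every worker keeps a private cache against a single cache shared by all workers.

```rust
use std::collections::HashSet;

/// Returns (per_worker_misses, shared_misses) for a synthetic request stream.
/// Hypothetical model: unbounded caches, no eviction, `workers` must be > 0.
fn simulate(workers: usize, zones: usize, rounds: usize) -> (usize, usize) {
    let mut per_worker: Vec<HashSet<usize>> = vec![HashSet::new(); workers];
    let mut shared: HashSet<usize> = HashSet::new();
    let (mut per_worker_misses, mut shared_misses) = (0, 0);

    for round in 0..rounds {
        for zone in 0..zones {
            // Rotate which worker sees each zone, as if connections landed
            // on an arbitrary worker each time.
            let worker = (round + zone) % workers;
            // `insert` returns true when the key was absent, i.e. a cache miss.
            if per_worker[worker].insert(zone) {
                per_worker_misses += 1;
            }
            if shared.insert(zone) {
                shared_misses += 1;
            }
        }
    }
    (per_worker_misses, shared_misses)
}

fn main() {
    let (private, shared) = simulate(8, 100, 16);
    println!("per-worker misses: {private}, shared misses: {shared}");
}
```

With 8 workers, 100 zones and 16 rounds, each zone's config is loaded once per worker in the private case (800 misses out of 1,600 requests) but only once overall in the shared case (100 misses).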
And if you make that not a per-worker cache, and you make that a shared cache that's in memory for everything, things get a lot faster, magically, and you also get to save a lot of memory, because you don't have to massively oversize all of these caches to try to eke out a bit more cache performance. So there are a lot of these sorts of things where we found that just writing idiomatic Rust and just using Tokio makes a lot of our stuff so much faster and more effective than when we were dealing with NGINX.
Yeah, I mean, you don't have to convince me. That's also been my experience. But I ask mostly because I've been using Rust for so many years already, and every now and then you see these things flare up: work stealing, non-work stealing, you have all these other things.
Yeah. And that's part of why I bring this up, because Edward and I both maintain one of the highest-traffic Rust language services out there with our Pingora services. Going through and reading those discussions can often be annoying, and I've wanted for a long time to write up a Cloudflare blog post on what we see, for example what types of things we notice performance-wise, and do some sort of A/B testing in corners of our fleet. But...
One of the things that always comes to mind is that I see and regularly interact with so much data around how these sorts of things tend to perform at scale, and also under all sorts of radically different traffic profiles. I think a lot of that sort of discussion is often what's missing: what do real traffic patterns actually tend to look like, and what are the different real traffic patterns that pop up? What happens when you start mixing them together on the same multi-tenant systems? What happens when they're all isolated and individualized? And how do these sorts of differences in usage lead to different performance outcomes? A lot of the shared-nothing versus work-stealing conversations are about how this works and what's better or not, and they happen extremely in a vacuum, often very divorced from: what does traffic tend to look like? What does load tend to look like? How are those things shaped? What costs do they incur on a system? And how do these sorts of things manifest as tail latency? I think that is what's frequently missing from these conversations. A lot of the time the much more interesting story is: okay, what are the cases where shared nothing actually performs better, and what is special about those cases? Because what I've found is that there are cases where shared nothing is better, but they're all really interesting, really special little use cases that are super cool.
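To illustrate why the shape of the traffic matters in this comparison, here is a deliberately simplified model, not a description of Tokio's scheduler. Both functions are hypothetical: `shared_nothing_makespan` pins request `i` to worker `i % workers`, while `shared_queue_makespan` hands each request to whichever worker frees up first, which roughly approximates the effect of work stealing. All requests are assumed to arrive at time zero, and costs are in arbitrary time units.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Time at which the last request finishes when request `i` is pinned to
/// worker `i % workers` (shared nothing). Assumes `workers > 0`.
fn shared_nothing_makespan(costs: &[u64], workers: usize) -> u64 {
    let mut busy_until = vec![0u64; workers];
    for (i, cost) in costs.iter().enumerate() {
        busy_until[i % workers] += cost;
    }
    busy_until.into_iter().max().unwrap_or(0)
}

/// Same, but each request goes to the worker that becomes idle first,
/// a rough stand-in for work stealing. Assumes `workers > 0`.
fn shared_queue_makespan(costs: &[u64], workers: usize) -> u64 {
    // Min-heap of the times at which each worker becomes free.
    let mut free_at: BinaryHeap<Reverse<u64>> = (0..workers).map(|_| Reverse(0)).collect();
    let mut latest = 0;
    for cost in costs {
        let Reverse(start) = free_at.pop().expect("at least one worker");
        let done = start + cost;
        latest = latest.max(done);
        free_at.push(Reverse(done));
    }
    latest
}

fn main() {
    // Every fourth request is ten times more expensive than the rest.
    let costs: Vec<u64> = (0..16).map(|i| if i % 4 == 0 { 10 } else { 1 }).collect();
    println!("shared nothing: {}", shared_nothing_makespan(&costs, 4));
    println!("shared queue:   {}", shared_queue_makespan(&costs, 4));
}
```

In this contrived stream every expensive request lands on the same pinned worker, so the shared-nothing case finishes at time 40 while the shared-queue case finishes at time 16. With uniform request costs the two models finish at the same time, which is one way to see why the answer depends on what the traffic actually looks like.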
And a lot of times the interesting story there is: okay, what's unique about this that causes it to work a lot better? Which is always super cool and interesting.
Yeah, and even then I guess people can also mix the two a bit. They could just dedicate a thread to just doing that. I mean, that's what I would do if I needed to mix those two. But I also think our experience matches because we're both doing proxies, and I think proxies especially are definitely somewhere work stealing helps a lot. At least, what you say makes sense, where it makes a lot more sense to share all that kind of stuff. Okay, by the way, is there something you also do around reuse of memory to avoid allocations, as you are dealing with a lot of traffic? I mean, do you use things like ring buffers or bump allocators, whatever?
It depends. We use a lot of different things in a lot of different places. But one thing that is pretty consistently done across most of Cloudflare's stack is using jemalloc, if you're familiar with that allocator. One of the interesting things about allocators is that a smart, modern, optimized allocator is already automatically doing for you a lot of the tricks that people often think of and try to do to optimize their own performance. For example, some people will say: well, I have a thread local where I store buffers, and I look up against a table which of these standard-size buffers I'm going for, and I pull from that. And it's like, congratulations, a lot of allocators actually do that for you. So when you call the allocate function, that's just happening for you right there.
And using a non-default allocator, rather than the default glibc one, is, I would say, by far the most impactful memory allocation performance improvement in general. jemalloc is again what we tend to mostly use at Cloudflare, but there are a lot of allocators out there, and a number of them are very well adapted to specific use cases. So this is one where I think it's often a really good idea to experiment with a few and evaluate how each performs on your production workloads or in your particular use cases, because different allocators can be optimized for very different use cases. So yeah, using a good memory allocator that is not the default glibc one can often give you a pretty significant performance improvement. We do have some things where we do ring-buffer-type things. There are also a lot of BPF-related cases, where when you're writing eBPF, how you manage a lot of things is very different. In eBPF you're writing very bare-bones code. It's kind of like embedded systems, but even more so: you don't have a heap, and you have a very limited set of stuff at your disposal. So with BPF, how you interact with buffers is often quite a bit different. But ring buffers specifically, there are cases where sometimes you want to use that. I would say across the board, though, the biggest thing that we do is just use jemalloc or another modern allocator that handles, in particular, concurrent workloads very, very well. There are other useful things with memory that we do as well.
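In Rust, swapping the process-wide allocator is done through the `#[global_allocator]` attribute. The sketch below shows that mechanism using only the standard library: `CountingAlloc` is a hypothetical wrapper that forwards to the system allocator and counts calls, so it changes nothing about performance. In a real service, the static would instead hold an allocator type from a crate such as `tikv-jemallocator`; which crate Cloudflare uses is not stated in the episode, so treat that as an assumption.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator wrapper: forwards to the system allocator and
/// counts how many allocations were requested.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This one static is the whole "switch allocators" step. Every Box, Vec and
// String in the program now goes through `CountingAlloc`.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations requested so far, process-wide.
fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    let buffer: Vec<u8> = Vec::with_capacity(1024);
    std::hint::black_box(&buffer); // keep the allocation from being optimized away
    println!("allocations during main: {}", allocation_count() - before);
}
```

Because the change is a single static, trying several allocators against a production-like workload, as suggested in the answer, is mostly a matter of swapping the type behind that static and measuring.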
There are a number of cases where, I would say, I am a person who uses small string optimization and Arc<str> all over the place a lot of the time, depending on what I'm doing. Those sorts of heap allocation reduction tricks are used a lot.
Yeah, that was what I was going to mention. That seems to be the main thing that we control at our layer, aside from being smart about allocations generally. We're trying to be smart about not copying a bunch of bytes around when we don't need to. A lot of that just boils down to essentially a glorified Arc in a lot of cases.
Yeah, and in a lot of my cases, when you're doing distributed load balancing and trying to schedule traffic around, you often find that you care about the compute cost of whatever underlying operation you're doing, if you're doing some sort of giant distributed scheduling problem. In a lot of those cases, I actually care a lot more about the number of reads that are not going to fall in the same cache line than I do about the cost of a single allocation, because a lot of times my allocations are very nicely amortized anyway. When do I allocate? When this particular hash map or slab or something grows every once in a while. That's when I allocate. The memory-related concern that often pops up is: okay, I'm now doing a new read, following a pointer around or something. And that, for me at least, is often the more expensive thing. So there are a lot of cases where I'm just like, yeah, I'm just going to copy more data around.
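As a small illustration of the `Arc<str>` trick mentioned in this exchange: the string bytes are allocated once, and every clone only bumps a reference count instead of copying the bytes. The `Route` struct and `fan_out` function are hypothetical names made up for this sketch.

```rust
use std::sync::Arc;

/// Hypothetical record that many requests or tasks might each hold a copy of.
#[derive(Clone)]
struct Route {
    host: Arc<str>, // shared, immutable text; cloning bumps a refcount
    weight: u32,
}

/// Builds `n` routes that all share one heap allocation for the host name.
fn fan_out(host: &str, n: usize) -> Vec<Route> {
    let shared: Arc<str> = Arc::from(host); // the only copy of the bytes
    (0..n)
        .map(|i| Route {
            host: Arc::clone(&shared),
            weight: i as u32,
        })
        .collect()
}

fn main() {
    let routes = fan_out("example.com", 3);
    for route in &routes {
        println!("{} (weight {})", route.host, route.weight);
    }
    // All three routes point at the same bytes.
    println!("same allocation: {}", Arc::ptr_eq(&routes[0].host, &routes[2].host));
}
```

A `String` field would instead copy the host name into a fresh allocation on every clone. The trade-off raised in the same answer still applies: reading through the `Arc` is a pointer hop, so for small, hot data, copying it inline can be the cheaper choice.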
Hold more data here, because that way, even if I'm allocating more memory, I am jumping around a lot less. That can often be pretty useful to do.
Okay. And maybe a last question I have: for now, Pingora only supports HTTP/1 and HTTP/2, but I know you said that soon you will probably do something based on the other work you do in that area. In the meanwhile, as far as I know, and maybe I'm wrong, a browser will do H3 anyway. Is there something within Cloudflare that is terminating the H3?
So, the other interesting difference between Oxy and Pingora is that Pingora is a lot more origin facing. Indeed, once you get to the cache and origin fetch and the whole origin-facing part of the stack, that's all Pingora. But Oxy was dealing with these tunneled connections, and was dealing very early on with things like QUIC, using MASQUE to do that type of process. So Oxy ended up building out a bunch of H3 ingress-facing support. Oxy has an H3 server, and this has actually been open sourced in the form of tokio-quiche, which grew out of Oxy and is built on top of quiche as the H3 implementation. That is what we are using under the hood for the ingress-facing part of the stack. Right now, quiche with NGINX is what's still servicing most of our ingress-facing traffic. That part of the stack is not as oxidized and rusted as the origin-facing part yet.
That's cool. Okay, that answered that question.
So then maybe the actual last question is: Noah, you have a very cool t-shirt about Pingora. Our listeners cannot see it. Where can we get a nice t-shirt like that?
I don't know if we have a channel to merchandise those things. I was about to say, we should do that. Honestly, I think we are a team that has a lot of different responsibilities at Cloudflare. We really love the Pingora community, and I think the response has been both overwhelming and also maybe not super overwhelming for a standard open source contributor who has really dedicated focus time. But I think we just wish that we had a little bit more bandwidth to get to open source things from time to time. To that end, we would probably want to be fostering more of that community, and selling t-shirts, I suppose. Or just getting in touch with more folks using Pingora. Because when people talk about using Pingora, oftentimes it very much comes to us as a surprise, right? Maybe we want to do more with that in the future if we have the capacity to do so. We would certainly love to, but I guess we'll see. For now, I know that only we have these shirts, unfortunately.
Maybe I should get a stack of them and take them to TokioConf. Maybe that will happen next year.
Yeah, that would be cool. Okay, so we all heard it: Noah is gonna bring a stack of t-shirts to the Tokio conference. That is definitely promised. Anyway, on a more serious note, I do want to thank you for your time. I'm sure you're both very busy, so I'm very grateful. I learned a lot, and I'm sure our listeners will learn a lot. We will link to a lot of the resources we discussed today.
And of course, if you both have resources you want me to share, definitely send them my way after the recording and we will do that.
Thank you very much for having us.
Yep. Thank you so much. It's been a pleasure to be on.
Elizabeth (Plabayo)
1:37:59 | 🔗
Netstack.fm is brought to you by Plabayo, building secure, open, and resilient infrastructure with Rust, protocols, and purpose. This show is also made possible by Rama, the open source networking framework. Plabayo offers service contracts and welcomes sponsorships to keep building and supporting its ecosystem. The theme music of this podcast was composed by DJ Mailbox. If you enjoyed this episode, don't forget to subscribe on your favorite podcast platform and leave a five-star review. It really helps others discover the show. Thanks for tuning in. We'll see you next time for the next handshake.