episode 37: dial9: from black box to insight in Tokio.
In this episode of Netstack.fm, Glen talks with Jess Izen and Russell Cohen from Amazon about Dial9, a new tool for understanding what's happening inside Rust and Tokio applications. They explain how it captures events from different layers of your system and puts them into a single timeline, making it much easier to debug tricky performance issues and unexpected behavior.
They also share how Dial9 came out of real debugging challenges at Amazon, where engineers often had to rely on complex, low level tools. The goal with Dial9 is to make those kinds of insights more accessible, so developers can diagnose problems faster and with less guesswork, while the tool continues to evolve with new features and improvements.
If you like this podcast you might also like our modular network framework in Rust: https://ramaproxy.org
00:00 Intro
02:01 Meet Russell and Jess
05:27 The Mission of the Rust Team at Amazon
11:47 Integration with OpenTelemetry and Tracing
13:49 The Evolution of Dial 9
17:17 Comparing Dial 9 with Existing Tools
20:07 Continue exploring history and development and UX of Dial9
33:47 Building Self-Serve Solutions
36:22 perf sched
38:30 Task Dumps
41:17 Dial 9: almost a free lunch
43:00 Cross-Platform Considerations for Dial 9
46:20 Future Features and Improvements for Dial 9
50:20 Dial 9 usage today
54:31 Considerations before using Dial 9
01:03:05 Getting Started with Dial 9
01:05:30 Outro
More information: https://netstack.fm/#episode-37
Join our Discord: https://discord.gg/29EetaSYCD
Reach out to us: hello@netstack.fm
Music for this episode was composed by Dj Mailbox. Listen to his music at https://on.soundcloud.com/4MRyPSNj8FZoVGpytj
Elizabeth (Plabayo)
0:13
This is netstack.fm, your weekly podcast about networking, Rust, and everything in between. You are listening to episode 37, recorded on April 28th, 2026. In this episode, we welcome Jess Izen and Russell Cohen from Amazon to discuss Dial9. We talk about how Dial9 captures runtime, kernel, and application events together, and how this approach helps you better understand what is really happening inside your system, especially when debugging complex problems. Let's begin.
Glen (Plabayo)
Welcome to another week of netstack.fm. This week we will talk about Dial9. It's a very exciting new tool that hooks into the Rust ecosystem, and Tokio specifically. It describes itself as low-overhead runtime telemetry for Tokio. There is a lot to unpack here. It's a very new tool, and I bet there is a very exciting horizon for it. But before all that, I want to introduce the guests: Jess Izen and Russell Cohen. Both work at Amazon, and they are the creators of Dial9. So welcome, both of you.
Russell Cohen (Amazon)
1:36
Thank you very much.
Jess Izen (Amazon)
1:37
Thanks, good to be here.
Glen (Plabayo)
Yes, and of course I hope I pronounced your names correctly. That's always a difficult one. Okay, that's a good start. So with that said, it's the first time that we meet the two of you. Some in the community might know you already. Still, I would like to get to know both of you a bit better.
Jess Izen (Amazon)
1:44
Perfect. Go for it, Russell.
Russell Cohen (Amazon)
2:01
Sounds good. Yeah, so my journey to being here kind of started during the pandemic. I was freelancing, and as I was freelancing, I started getting involved in Rust, both the Rust compiler and some semi-popular open source Rust libraries that I worked on. Being a freelancer during the pandemic got pretty boring. I decided I wanted to find a real job, and on Reddit I found a job writing the Rust SDK for AWS, and kind of the rest is history. I joined to start the Rust SDK and worked on that for about four years until it went 1.0, and then I switched. Amazon has a central Rust team where our job is really to make Rust builders at Amazon successful. And I switched to really start leading the efforts focused on internal Rust builders at Amazon in about 2024. And I've been there ever since. And what about you, Jess? How did you get here?
Jess Izen (Amazon)
3:06
All right. Jess Izen. So I have a bit of a non-traditional background. I don't have a CS degree. I used to be a bicycle courier, and we needed an online ordering platform to compete with Grubhub. In the process of helping to build that, I hopped over to the company that did build it, and then I was a project manager, and then a backend engineer, and bada-bing bada-boom, I wound up in software. I didn't know Rust when I started working at Amazon. I joined a team that was doing advanced bot control stuff, similar to Cloudflare's bot protections; similar problem space. And they were using Rust for a mix of things, both a pretty high-performance data plane, but also control plane and things higher up the stack. I love the language. Then a year ago, I joined Russell's team, working with Niko and Carl and a bunch of folks in the Rust community. I'm really excited to be able to take things that we were doing and figure out how to scale them to make them useful for Rust in general. More broadly, it's something I get a lot of excitement about in my job. Right now I'm pretty focused on the telemetry and metrics story in network services. You know, when you're using a Java network stack, you see packets in, packets out. I want that for Rust. That's me.
Glen (Plabayo)
Very cool, thank you for that. And did I understand correctly? So this is a team specifically focused on open sourcing software and working on tooling, or what is the context of this team specifically?
Russell Cohen (Amazon)
4:40
Our team is really focused on whatever it takes to make Rust successful at Amazon; all of that is sort of in our wheelhouse. So some days that's working on some internal package, some days that's upstreaming contributions to Cargo or the Rust compiler or Tokio, and some days that's working on external libraries. We try as much as possible to build software that's open source first. So to that extent, we work on open source stuff. But our ultimate mission is mostly to make Rust successful at Amazon, whatever that needs to be.
Jess Izen (Amazon)
5:20
And our position is that Rust being successful at Amazon is Rust being successful in general, right? It's two sides of the same coin.
Glen (Plabayo)
Yeah, that's very cool and very exciting. Of course, Amazon started long ago, and it is not one of those newer companies where Rust was there from the first days. It has gone through a huge journey, and Amazon is investing a lot in Rust. Do you still need to convince people at Amazon that Rust is there for their benefit, or what is the story at this point?
Russell Cohen (Amazon)
5:54
Good question. I think we're seeing a lot of new projects choose Rust. Ultimately, today I think we help people make informed choices about Rust much more than we try to convince people to use Rust. The worst thing that can happen is that some team picks Rust and it's not a good fit, and one year later they deeply regret it. So we're really all about giving people the tools they need to make educated choices.
Glen (Plabayo)
Yeah, I couldn't agree more, because of course it's just a tool. It's one of many great languages, and depending on what you want to produce or what the context is, you might or might not use Rust, so I can appreciate that very much. Now, something I also think I understood from earlier is that you look at what you are producing internally, what is working internally, the things you're finding internally, and you want to bring that into the open. Is that also the case for Dial9? Was there some predecessor that you were using, or some bunch of tools, and Dial9 is kind of an evolved version?
Russell Cohen (Amazon)
7:03
I think in some sense, there was no tool that was Dial9 before. There are two things internally that led to Dial9. One is this idea I see a lot inside of Amazon, but not so much outside: rather than thinking about metrics as the unit of things that you're producing, you think about producing events and deriving the metrics from the events, which means that when you are trying to debug, you actually have the exact events available. This is a pattern that teams at Amazon have used to make it much easier to debug complex problems. So that's the first thing, and we'll talk about how that fits into Dial9. The second one is that I saw very skilled engineers debugging really hard problems, and they had a fluency with deep system tools, like changing kernel scheduling settings, or running their workload under perf sched to look at the scheduling trace and back-solving what must have been the problem, or running profilers in esoteric probing modes to look for certain problems. And for a lot of Rust problems at Amazon, this seemed to be what it took. Dial9 is kind of the combination of those two ideas: one, if you can record all the events, it's much easier to debug; and two, making it possible for anybody to get access to those data streams, which exist and are not that hard to use, but can be quite tricky to get in practice.
Glen (Plabayo)
I see. A listener mentioned that he also had a conversation with you, Russell, about Dial9, where you mentioned that access to production systems at AWS is highly restricted. Does this constraint shape in any way how the tool is built? And let's say you were allowed to access production systems, what would be unlocked by that that you can't do now?
Russell Cohen (Amazon)
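As an illustration of the events-first idea Russell describes in this turn (record raw events, derive metrics from them afterwards), here is a minimal Rust sketch. It is not Dial9 or Amazon code: `RequestEvent`, `Summary`, `summarize`, and `slower_than` are hypothetical names made up for this example.

```rust
use std::time::Duration;

/// Hypothetical per-request event (not a Dial9 type).
#[derive(Debug, Clone)]
struct RequestEvent {
    request_id: u64,
    latency: Duration,
    ok: bool,
}

/// Metrics computed after the fact from the recorded events.
#[derive(Debug, PartialEq)]
struct Summary {
    count: usize,
    errors: usize,
    max_latency: Duration,
}

fn summarize(events: &[RequestEvent]) -> Summary {
    Summary {
        count: events.len(),
        errors: events.iter().filter(|e| !e.ok).count(),
        max_latency: events.iter().map(|e| e.latency).max().unwrap_or_default(),
    }
}

/// Because the raw events are kept, we can still ask which requests were slow.
fn slower_than(events: &[RequestEvent], threshold: Duration) -> Vec<u64> {
    events
        .iter()
        .filter(|e| e.latency > threshold)
        .map(|e| e.request_id)
        .collect()
}

fn main() {
    let events = vec![
        RequestEvent { request_id: 1, latency: Duration::from_millis(3), ok: true },
        RequestEvent { request_id: 2, latency: Duration::from_millis(12), ok: false },
        RequestEvent { request_id: 3, latency: Duration::from_millis(4), ok: true },
    ];
    println!("{:?}", summarize(&events));
    println!("slow requests: {:?}", slower_than(&events, Duration::from_millis(10)));
}
```

A pre-aggregated counter could only report the maximum latency. Keeping the events means the same data can also answer which request caused it.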
9:06
Uh, yes. So for example, you cannot simply SSH into a production host at AWS. I think this should be not too surprising. And a lot of those tools, like getting perf event data, you simply can't use without SSHing into the host. If you operate in an environment where you can, I think some of the benefit of Dial9 is negated, but still, simply being able to get all the data into a trace is quite powerful. I think if you operated in an environment where you had very few restrictions on what was available in production, then you could imagine that we build some modules for Dial9 that access extremely privileged APIs and run with "dangerous" (those are scare quotes) levels of kernel permissions, beyond what people at AWS use. Is that the question you're asking?
Glen (Plabayo)
Yeah, yeah, that's perfect. And I can imagine even if you have that access, there are still things you would want to do anyway. So that makes total sense. Thank you for that. Now, you mentioned events and wanting metrics derived from them. And, well, you didn't mention it here, but the README says it wants to be low-overhead runtime telemetry. So we hear the word telemetry there. You also mentioned the word span. All of this makes me think a lot about OpenTelemetry. Is that what you are building upon, maybe as the transport vehicle or medium or whatever you call it, or is it just something very closely related?
Russell Cohen (Amazon)
10:51
I think they're closely related. I think there's a world where Dial9 and OpenTelemetry eventually sort of fit together. At least at AWS, very few people are using OpenTelemetry, so having a standalone system was pretty important. The way that Dial9 data generally gets off the host is that Dial9 writes it to disk, and then a background thread uploads it to S3. So it's not using an OpenTelemetry collector or anything like that. It more borrows concepts, I would say. And when we talk about the Dial9 trace format itself, I think it will become relevant why it is its own format and not an OpenTelemetry-specific format. But there's absolutely a lot of overlap. Personally, I'm not a deep OTel expert, but...
Glen (Plabayo)
Okay. Yeah, well, I think very few of us are, even though we might admire it. Now, in Tokio there's also the tracing ecosystem, which for example is compatible with OpenTelemetry but not specific to it. It is also used, I believe, within Tokio itself, adding a lot of events, logs, and all these things. Is that something you also make use of within Dial9 as a source of information, or is that not yet a thing?
Jess Izen (Amazon)
12:16
Yeah, I recently added a tracing layer. We have the ability to emit custom events inside the trace, which is really nice because then you have a single place to go and correlate a given request ID with the behavior on the... And so the tracing layer will essentially inject some field-value pairs per span enter and exit, which is really nice because then you can see inside a given future: if it has a lot of individual sub-futures that go forever and result in a long poll, you can see all the spans entering and exiting. It does have somewhat high overhead. Tracing in general is not the cheapest, so you don't want to instrument every possible future. We're still figuring out a better model for really low-overhead custom event emission. Probably that will look, at least in the short term, like just a nicer API to dump your own events in, and then... We have a really low-overhead metrics library called metrique that we'll probably... But today, yes, you can set up your tracing layer and then, you know, it'll just magically dump all your field-value pairs into your Dial9 traces.
Glen (Plabayo)
Okay, very cool. So most of us learned about Dial9, I think, because Carl asked you to write a guest post on the Tokio blog, and that was on the 18th of March. I imagine you were developing this tool already a lot earlier. At what point did the development start? What problem specifically did you have in mind at that point, and what goals? Maybe give us a rough timeline from there.
Russell Cohen (Amazon)
13:49
Yeah, absolutely. So the ideas for Dial9 started about two years ago. Someone building an AWS service said to me, "I wish I could just understand what was happening inside of Tokio," and at the time I really didn't have a good answer for them. It was sort of ruminating and ruminating. I was also seeing people debug these complex problems with these tools. Then, in February of this year, so what is that, about three months ago, I got a DM from someone who was trying to ramp up traffic to a Rust service at Amazon. And the performance was really weird. Once it got to 90% CPU, the latency went crazy. They had actually read my previous blog post, an internal Amazon blog post about recording Tokio metrics, and the metrics that they had didn't really make any sense, because it kind of looked like the runtime was just sitting idle even though it was also very busy. And it became pretty clear we weren't really going to be able to understand and give them a satisfying answer unless we were actually able to record all of the events in order, you know, poll start, poll stop, the worker parked, the worker unparked, and be able to actually look at them afterwards and try to understand what ordering of actual events led to these very confusing metrics. So that was February. The first version of Dial9 probably only took like a week because it was quite simple. We'll talk about the Dial9 format that it is today. The format back then was still very compact, but it was much simpler. It wasn't open-ended. It didn't support arbitrary events. It really just supported Tokio events. So it subscribes to the existing Tokio runtime hooks, and we just record events into the trace each time we get one. And what we eventually saw in that case was that this particular application had terrible kernel scheduling delay.
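To make "record all of the events in order" concrete, the sketch below keeps an ordered list of runtime events and computes, per worker, how long it spent polling versus parked. This is only an illustration under stated assumptions: the `Kind`, `Event`, and `Trace` types are hypothetical and are neither Dial9's data model nor Tokio's hook API, and events are assumed to arrive in timestamp order.

```rust
/// Hypothetical runtime event kinds, modeled on the ones named in the episode.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    PollStart,
    PollEnd,
    WorkerPark,
    WorkerUnpark,
}

#[derive(Debug, Clone, Copy)]
struct Event {
    at_micros: u64,
    worker: u32,
    kind: Kind,
}

/// An append-only, ordered list of events (a stand-in for a trace file).
#[derive(Default)]
struct Trace {
    events: Vec<Event>,
}

impl Trace {
    fn record(&mut self, at_micros: u64, worker: u32, kind: Kind) {
        self.events.push(Event { at_micros, worker, kind });
    }

    /// Returns (microseconds spent polling, microseconds spent parked) for one worker.
    fn busy_and_parked(&self, worker: u32) -> (u64, u64) {
        let (mut busy, mut parked) = (0u64, 0u64);
        let mut poll_start: Option<u64> = None;
        let mut park_start: Option<u64> = None;
        for e in self.events.iter().filter(|e| e.worker == worker) {
            match e.kind {
                Kind::PollStart => poll_start = Some(e.at_micros),
                Kind::PollEnd => {
                    if let Some(start) = poll_start.take() {
                        busy += e.at_micros - start;
                    }
                }
                Kind::WorkerPark => park_start = Some(e.at_micros),
                Kind::WorkerUnpark => {
                    if let Some(start) = park_start.take() {
                        parked += e.at_micros - start;
                    }
                }
            }
        }
        (busy, parked)
    }
}

fn main() {
    let mut trace = Trace::default();
    trace.record(0, 0, Kind::PollStart);
    trace.record(150, 0, Kind::PollEnd);
    trace.record(150, 0, Kind::WorkerPark);
    trace.record(1_000, 0, Kind::WorkerUnpark);
    let (busy, parked) = trace.busy_and_parked(0);
    println!("worker 0: busy {} us, parked {} us", busy, parked);
}
```

With the ordered events in hand, a confusing aggregate such as "idle but busy" can be traced back to the exact sequence that produced it.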
So just like Tokio, the Linux kernel also needs to schedule user space threads on a limited number of CPUs in response to them being ready to run. And in this particular case, because the Rust code was competing with a Java process with 16,000 threads, it unsurprisingly got delayed. "Terrible" here is relative: the service P99 is 10 milliseconds, so when you are delayed for 10 milliseconds by kernel scheduling delay, you're immediately in a bad position. And from there, it really started to show that this was going to be a useful tool. From then to now, it's really been about expanding the scope of the data that comes into it, productionizing it so that it's something that we can give to people with confidence that they can run in production, and doing the work that Jess is doing to start to bring application-level events into the trace as well. Because we very quickly saw... you know, I had a friend try this at their company, and they sent me a screenshot of their trace file being like, "What is happening? I have literally no idea." And it's because if you don't have application events to correlate to the Tokio events, it's very difficult to see what's going on.
Glen (Plabayo)
I see. You mentioned you started two years ago, and there were these people coming to you with these problems. Now, at that point there was already this tool called Tokio console. I think it's more of an official Tokio project, and it is not exactly like Dial9, actually it's a very different tool, but it also hooks into these runtime hooks, and it does show you something.
Russell Cohen (Amazon)
17:29
Yes.
Glen (Plabayo)
Was that ever useful in those kinds of cases, or are we talking about a subtly different problem space?
Russell Cohen (Amazon)
17:52
I think it's a little bit of a different problem space. We have seen teams at Amazon try to use Tokio console, and there are two main problems. One is that Tokio console is really for the problem space of "I'm literally looking at it, having a problem right now, help me figure out what's going on," especially in the case where you have handwritten futures and a wake is missing, or a deadlock, things like that. It can show the pattern, but because it uses the runtime tracing inside of Tokio, and for a few other reasons, you really can't run it safely in production. It simply is not designed in a way that it can be run safely in production, at least not inside of AWS. I'm sure people are running it in production elsewhere. And the other is that it doesn't provide an easy way to retroactively look back and be like, okay, something bad happened, now let me sit down and really try to digest the exact timeline that led to this problem.
Glen (Plabayo)
Yeah. It makes me think that the use case of Tokio console actually is fairly limited. I mean, it's a very specific problem, while something like Dial9, to me at least, shows a lot more potential. So that's cool, yeah.
Jess Izen (Amazon)
19:07
I mean, you know, it's an evolution, right? Didn't Tokio console come out of our team in the first place? Didn't Jack Wrenn work on that? That was when people were really learning the basics around troubleshooting async systems and just fundamental mistakes in how they were designing futures. Whereas Dial9 is for when, in user space, what you have looks right, but things are still going wrong and you need to understand that more deeply. They're just solving a different layer of problem. Tokio console is still great as an entry point. Think of it like REPL-style debugging.
Russell Cohen (Amazon)
19:12
Yeah.
Glen (Plabayo)
Yeah, yeah, correct. But at the same time, if I think back on some war stories where I really was stuck and thought, okay, now I really need this kind of tool, rarely did I have the luxury of being able to say, okay, I can exactly reproduce it, or I can look at exactly where the problem is. Those almost seem like simple problems, even though I'm sure they were difficult ones in that space, but it seems I didn't really encounter them. But yeah, fair point, Jess, that's definitely true as well. So two years ago you started, people had issues. The first version took around a week, you said. You had a pretty simple thing that worked and already showed value. At that point, what was missing, and how did you move forward from there?
Russell Cohen (Amazon)
20:25
Yeah, so from the original version, which just had Tokio events, I kind of hit the point where I kept trying to add things, and I would add something and it would kind of work, but I knew it wasn't quite the right fit. And there were also things we just knew it needed. I think the first version would just record files to disk and the operator had to go retrieve the files, which is obviously not scalable, it's very annoying, and the operator still has to access the host in some way. So we worked on basic things like getting the files to S3, but the real work happened in splitting out a separate Dial9 trace format that events would be serialized into. That would allow us to have a format that still had the benefits of the original format, which is that it's very compact and very efficient to write, but would also allow us to have open-ended events. So you could, at runtime or compile time, define custom events in your application and emit those events.
Jess Izen (Amazon)
21:26
You're skipping a little bit. Wasn't looking at syscalls one of the first big things we added after the runtime? And that was where we started to see the limitations of the scheme as we needed to add new event types. It was this game of adding more and more wire.
Russell Cohen (Amazon)
21:33
Oh yeah, yeah. Yeah, absolutely. Yeah, exactly. Once we had the basic format, then, well, it would really be nice if this could also be a profiling tool. So it was like, well, how hard is it to actually subscribe to Linux performance events? And the answer is kind of hard, but not that hard. And we're definitely still in the era of, well, it works on this instance type, but if you're on Fargate, you need a different API, and we're continuing to work through those issues. But yeah, exactly, we started adding more and more different types of data into the trace, which started as hard-coded schema stuff, and then eventually we were able to move out into our open-ended format.
Glen (Plabayo)
Okay, and so what does open-ended mean? Because you mentioned it already a couple of times, but I still cannot really figure out what you mean by it.
Russell Cohen (Amazon)
22:31
What I mean is that it's self-describing. It means you can say, "I have my own event, the fields are these," and you can create the event and say, "Dial9, record event," and we will record that event, and then you will be able to extract it later from your trace file. The original version had four events that it knew about, and this was the full set of events.
Jess Izen (Amazon)
22:32
Self-describing. That's the word you're looking for.
Russell Cohen (Amazon)
23:00
There was no way to add additional events to the format.
Jess Izen (Amazon)
23:03
Concretely, there's a header that describes fields and values per event type, and then it's still fairly compact on the wire because individual event records do not contain the full schema. But the header lets us do schema evolution without updating the format for every new field.
Glen (Plabayo)
Okay, got you. And so as a user of Dial9, does it feel the same as adding tracing events, or is it different?
Russell Cohen (Amazon)
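The header-plus-compact-records idea Jess describes can be sketched as follows. This is a toy encoding invented for illustration, not the actual Dial9 trace format: the `Schema` type, the byte layout, and every function name here are assumptions. The point it shows is that field names appear once in the header, while each record carries only a schema id and raw values.

```rust
/// Hypothetical schema for one event type; written once, in the header.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    id: u8,
    name: String,
    fields: Vec<String>,
}

fn write_str(out: &mut Vec<u8>, s: &str) {
    out.push(s.len() as u8); // sketch only: assumes names shorter than 256 bytes
    out.extend_from_slice(s.as_bytes());
}

fn read_str(buf: &[u8], pos: &mut usize) -> String {
    let len = buf[*pos] as usize;
    let start = *pos + 1;
    *pos = start + len;
    String::from_utf8_lossy(&buf[start..start + len]).into_owned()
}

/// Header: number of schemas, then each schema's id, name, and field names.
fn encode_header(schemas: &[Schema]) -> Vec<u8> {
    let mut out = vec![schemas.len() as u8];
    for s in schemas {
        out.push(s.id);
        write_str(&mut out, &s.name);
        out.push(s.fields.len() as u8);
        for f in &s.fields {
            write_str(&mut out, f);
        }
    }
    out
}

fn decode_header(buf: &[u8], pos: &mut usize) -> Vec<Schema> {
    let count = buf[*pos];
    *pos += 1;
    let mut schemas = Vec::new();
    for _ in 0..count {
        let id = buf[*pos];
        *pos += 1;
        let name = read_str(buf, pos);
        let field_count = buf[*pos];
        *pos += 1;
        let mut fields = Vec::new();
        for _ in 0..field_count {
            fields.push(read_str(buf, pos));
        }
        schemas.push(Schema { id, name, fields });
    }
    schemas
}

/// Record: schema id plus one little-endian u64 per field. No field names.
fn encode_record(schema: &Schema, values: &[u64]) -> Vec<u8> {
    assert_eq!(values.len(), schema.fields.len());
    let mut out = vec![schema.id];
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Decode one record by looking its field names up in the header's schemas.
fn decode_record(schemas: &[Schema], buf: &[u8], pos: &mut usize) -> Vec<(String, u64)> {
    let id = buf[*pos];
    *pos += 1;
    let schema = schemas.iter().find(|s| s.id == id).expect("unknown schema id");
    let mut out = Vec::new();
    for field in &schema.fields {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&buf[*pos..*pos + 8]);
        *pos += 8;
        out.push((field.clone(), u64::from_le_bytes(raw)));
    }
    out
}

fn main() {
    let poll = Schema {
        id: 1,
        name: "poll".to_string(),
        fields: vec!["task_id".to_string(), "duration_us".to_string()],
    };
    let mut bytes = encode_header(&[poll.clone()]);
    bytes.extend(encode_record(&poll, &[7, 150]));

    let mut pos = 0;
    let schemas = decode_header(&bytes, &mut pos);
    println!("{:?}", decode_record(&schemas, &bytes, &mut pos));
}
```

A reader that has never seen the `poll` event can still decode it, because the header carries the field names. Adding a new event type means adding a header entry, not changing the format.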
23:28
I would say it is pretty similar. The difference is that in tracing, the way the macro works today is that it basically magically defines some compile-time constructs when you use the macro. With Dial9 today, you have to make a struct, derive a trace event on it, and then emit it at the moment that you want it. I suspect we will eventually just make it so that there is a full... We already have span subscriptions. There's no reason why we can't get tracing events into the Dial9 trace as well. And you could literally use tracing, subscribe with Dial9, and you'll just have your tracing events in your Dial9 trace as well.
Jess Izen (Amazon)
23:57
Sure. Yep.
Glen (Plabayo)
Okay. What I also seem to understand from this history is that from very early on it was clear to you that you wanted something that hooks into your process, something that's embedded in your process. So it's a dependency you import and enable, and the way you achieve security is that nothing can access it, because it is just a producer of files, which it uploads to somewhere it has access to. The other side doesn't need access in the other direction; it's only producing the files, so it's kind of controlled. And you mention S3 because I can imagine it's a natural fit for AWS, but I imagine that you can bring your own exporters and do whatever you want, I suppose?
Russell Cohen (Amazon)
25:02
Yeah, that API doesn't literally exist today, but... this week, I think, we will add the ability to add your own export destinations. Exactly. There's nothing...
Jess Izen (Amazon)
25:07
This week though, we're working on it. And we might offer some turnkey stuff with Google Cloud, but also just bring your own I/O destination.
Russell Cohen (Amazon)
25:17
Yeah, exactly.
Glen (Plabayo)
And if you're running locally...
Jess Izen (Amazon)
25:19
I'd love to see it integrate with OTLP, for what it's worth. I think that'll be pretty useful, just given the sheer amount of tooling around OTLP, OTel's wire protocol.
Glen (Plabayo)
Yeah, I mean, in general, there are also things that I don't like about OTel, but it definitely has great potential. And if you can make use of some of this tooling or protocol, that could be cool. Now, if you use it locally, is it just going to write files to disk, or how does that work?
Russell Cohen (Amazon)
25:46
Yep, you can leave it writing files to disk and you can go look at those files. For our own internal testing, we just run a local S3, so you could also use that locally. There are people who have discussed some clever ways of collecting locally to live-view files, which I think is a good idea. I haven't explored that deeply, but yeah, exactly: locally, we just push the files to disk and then you can read them.
Glen (Plabayo)
Okay, very cool. It seems that quite quickly you got to something that is very recognizable from the tool as it is today. Or, at that point in the story, if you saw the tool, what would you be missing that you have today?
Russell Cohen (Amazon)
26:28
Yeah, I think very early on, probably about a month in... I think I wrote the Tokio blog post about one month after I started coding in earnest on the tool. And I think at that point, all of the major pieces were pretty much fully formed. I think we had the format, and S3 was working. There's a lot of nuance around making it actually work well in production, and that was basically the next two months, but the core pieces were there.
Jess Izen (Amazon)
26:59
Custom events came later. To me, the biggest thing, I think... day one was Tokio instrumentation. Day 10 was capturing kernel scheduling delay that was delaying runtime worker threads. I feel like that was the moment where it became net new functionality.
Russell Cohen (Amazon)
27:01
Yeah. We're very lucky that the Go ecosystem has multiple versions of their flight recorder, which is essentially the exact same thing. If we can make ours as good as Go's, that would be great. And that was a lot of inspiration for the design, because I could look at what they did in V1, I could look at what they did in V2, and I could look at how they rewrote it and all the mistakes that they made in V1, and basically say, aha, I will not go down that path, I will go down this path instead, and probably it will work. And the design has definitely been slowly converging more and more to basically be exactly what the Go telemetry recorder does internally.
Glen (Plabayo)
Okay. You mentioned already kernel events and syscalls. A lot of services in production are running within a Kubernetes cluster and within containers. Do you also gather events from those? Because, let's say, the cluster or the permissions, there are all kinds of things that influence your app.
Russell Cohen (Amazon)
28:20
Yeah, exactly. It's more limited. We don't have a great story there. Running in containerized environments, we can still basically get profiling data, so we can see what your code is doing on the CPU. But unless you have fairly elevated kernel permissions, which I suspect most Kubernetes clusters do not have, you're not going to be able to get the really detailed data. Everything is sort of additive: we can get this data stream and this data stream and this data stream. So at the absolute minimum, we can use the timer_create API to basically get the kernel to send us signals at a particular interval. And when we get those signals, we can walk the stack and basically make you a flame graph. Slightly better than that, the kernel can do the stack walking for us, which is faster and more reliable. And then the next level of events is that we can actually get the kernel to send us a stack exactly when it stops running the worker on the CPU. We can talk about that more later, but this turns out to be extremely useful for Tokio applications specifically: if your code is polling, if your future is actually running, and the kernel decides to stop running you on the CPU, that is basically 100% of the time bad. You know, something wrong has happened.
Glen (Plabayo)
I see. Okay, very cool. I learned quite a lot about this tool already. Now, today there is also a dashboard. Is that something that also came quite early, or how did you consume the data in the beginning?
Russell Cohen (Amazon)
29:54
Yeah, the dashboard came really early. This is where modern AI tooling makes things possible that wouldn't have been possible a year ago. I think the first version of the dashboard probably took 15 minutes: I said, OK, here's the trace file, I want it to look like this. And I was like, OK, I don't know. And all of a sudden...
Jess Izen (Amazon)
30:15
Yikes.
Russell Cohen (Amazon)
30:17
A pretty crappy but functional version of the dashboard worked.
Jess Izen (Amazon)
30:20
Yep. Day three, I'm like, okay, now make it keyboard accessible. It's pure technical debt. We expect to fully rewrite it, but it's functional.
Russell Cohen (Amazon)
30:24
Yeah, exactly.
Glen (Plabayo)
Yeah, but at least you can start playing with it, and otherwise who knows how long it might have taken.
Russell Cohen (Amazon)
30:36
Yeah, absolutely.
Jess Izen (Amazon)
30:37
Yeah, in practice, you know, it's neat for us debugging and it's neat for orienting yourself, but we expect the CLI experience to be more impactful for day-to-day debugging, and we expect the viewer to become more of a "jump to this spot in the viewer to see something you've discovered out of band." I don't think ultimately it's going to be the primary mode of access. It'll be more for the dumb human to keep up with what the agent is doing.
Glen (Plabayo)
Yeah, so that would be my next question. Okay, you have all these files. I suspect if you use some agents or LLMs, whatever you want to call them, they will just inspect these and be able to find at least something, or give you some clue about where the problem might be.
Russell Cohen (Amazon)
31:21
Yeah. We took the same algorithms that power the dashboard and we basically put them in a folder, and there's a decent amount of prompting to explain to the agent: this is the file format, this is how you parse it, this is how Tokio works, this is how the Tokio events are persisted, here are some common problems. And yeah, it works really well to be able to ask an agent to go...
Jess Izen (Amazon)
31:38
Your red flags.
Russell Cohen (Amazon)
31:48
...analyze, and you can also start to do things that are very bespoke, that I think would be hard to do in any other mode, to ask questions like: for this specific event, for events where it's greater than the P99, what exactly was running on the CPU during these events, you know, in these anomalous cases, and diff that against the CPU in the general case, et cetera, et cetera. These are very clever analyses that I think would be really hard to do without it. It's really the ability to run arbitrary code against the trace files.
Glen (Plabayo)
Very cool. And do you offer some kind of, I wouldn't say prompts, but some kind of commands, where if you run this command you get a kind of manual for an LLM: this is what the format looks like and this is how you look at it?
Jess Izen (Amazon)
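The kind of bespoke question Russell mentions (what was on the CPU for events above the P99, compared with the general case) might look roughly like this once a trace has been parsed. This is a sketch on made-up data: the `Sample` type and the `"handler"` and `"alloc"` labels are hypothetical, and it uses a simple nearest-rank percentile rather than whatever the Dial9 toolkit actually does.

```rust
use std::collections::BTreeMap;

/// Hypothetical parsed trace sample: a latency plus what was on the CPU.
#[derive(Debug, Clone)]
struct Sample {
    latency_us: u64,
    on_cpu: &'static str,
}

/// Nearest-rank percentile for an integer percent. Panics on empty input.
fn percentile(latencies: &[u64], pct: usize) -> u64 {
    let mut sorted = latencies.to_vec();
    sorted.sort_unstable();
    let rank = (pct * sorted.len() + 99) / 100; // ceil(pct * n / 100)
    sorted[rank.max(1) - 1]
}

/// Count how often each on-CPU label appears in the given samples.
fn on_cpu_counts<'a, I>(samples: I) -> BTreeMap<&'static str, usize>
where
    I: Iterator<Item = &'a Sample>,
{
    let mut counts = BTreeMap::new();
    for s in samples {
        *counts.entry(s.on_cpu).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Made-up data: 100 samples, only the slowest one was in "alloc".
    let samples: Vec<Sample> = (1..=100u64)
        .map(|l| Sample {
            latency_us: l,
            on_cpu: if l == 100 { "alloc" } else { "handler" },
        })
        .collect();
    let latencies: Vec<u64> = samples.iter().map(|s| s.latency_us).collect();
    let p99 = percentile(&latencies, 99);

    let slow = on_cpu_counts(samples.iter().filter(|s| s.latency_us > p99));
    let all = on_cpu_counts(samples.iter());
    println!("p99 = {} us", p99);
    println!("on CPU above p99: {:?}", slow);
    println!("on CPU overall:   {:?}", all);
}
```

Comparing the two maps is the "diff" step: a label that dominates the slow set but is rare overall is a good place to look first.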
32:35 | π
There's a binary crate with both the viewer and a toolkit. And then the toolkit can be used by a human. It has prompts for agents. It's got helper functions. It's all currently JavaScript. We find that's been nice for iterating and tweaking. And it's convenient to be able to share the same utilities from the viewer to the CLI. But yeah, you can generally point an agent at it and tell it to call cargo, what is it? cargo dial9-viewer agent. Russell Cohen (Amazon)
32:35 | π
Yeah, exactly. Jess Izen (Amazon)
33:01 | π
or something like that. Russell Cohen (Amazon)
33:02 | π
Exactly. Very cool. Is it also part of the repo or is it like some separate repo story? Jess Izen (Amazon)
33:06 | π
Yep, it's in the tree, and it's published as a standalone crate you can install or binstall. So you can find it in the... Very cool. And so, okay, we learned a bit about the history, we learned a bit about you, we learned a bit about how we got here. Now, as it's been in development, you say, close to two years, yeah, maybe the idea, maybe the idea was like, yeah, okay. Russell Cohen (Amazon)
33:26 | π
Sorry, the original conception was the idea. Yeah, yeah. Jess Izen (Amazon)
33:30 | π
People have been solving these problems with a collection of tools for two years. Okay, okay, now I get it. Okay, and then you said, like... Jess Izen (Amazon)
33:36 | π
But it's been, what, a few months, right? That's all for the Dial9 itself. Yeah, yeah, because you mentioned that you only started like a month before you wrote a blog post. So yeah, that's gonna be too long. Jess Izen (Amazon)
33:45 | π
Yeah, so I think Russell gave a little bit of background, but basically we've seen people solve these sorts of problems by hand, and it came up again and we said, we're not going to have another expert parachute in and solve these problems by hand. Let's build something that's self-serve. I mean, so it was a good motivating factor to figure out how to abstract this stuff into tooling. Okay, actually, that is pretty interesting though. I think there's a lot we can learn from this. Can you maybe think of one interesting example that is relatively small in scope, but still shows how this person solved it, like you mentioned, by hand? Because in the end, there are tools, there are the Tokio hooks, and they were manually probing all of these. Can you tell a bit about that process? Jess Izen (Amazon)
34:27 | π
They were forking Tokio to toss things in. They were doing a lot of out-of-band joining of kernel-side stuff with Tokio, with just scripts, layers and layers of Python scripts and visualization. What else, Russell? What were they doing? Russell Cohen (Amazon)
34:42 | π
Yeah. I know, for example, someone was running into Tokio scheduling delays. What they ended up doing was they loaded up eBPF probes against Tokio to try to see exactly when Tokio was running certain methods. And then they ran the service under a load test. Obviously this is not in prod. And from this, they were able to see that, yes, a kernel scheduling delay was preventing Tokio from waking up the next worker when one worker was unable to keep up with the load. The second worker, because of the kernel scheduling delay, didn't wake up for another 20 whatever. And then what ended up causing the scheduling... Another story that was inspirational for this is a kernel scheduling story. Their P99 was bad and they couldn't really figure out why. And then they ran their program, I think locally, or either that or in a pre-prod environment, with perf sched, which records way more data than Dial9 records. It records every single context switch in the kernel. And from there, what they realized is that when the Tokio worker was delayed in waking up, it was always because of tracing-appender. So they moved tracing-appender to its own thread, they pinned their Tokio runtime to a separate set of cores, and then their P99 got way better. So a lesson for all listeners is to put tracing-appender on its own thread. Russell Cohen (Amazon)
36:19 | π
Well, it's always on some thread, it's on its own core. Yeah, exactly. Jess Izen (Amazon)
36:20 | π
Or use non-blocking, yeah. Yeah, and you were mentioning that they got more information than Dial9 provides. What were they using specifically? Russell Cohen (Amazon)
36:33 | π
Yeah, there's a command called perf s-c-h-e-d, which, when you run it, will cause the kernel, or perf, to record every single context switch. So anytime that an operating system level thread is run or de-scheduled by the kernel. And from there, you can create this timeline of all your CPUs and all of your threads and see what is actually active at any given time to understand the causes of these problems. That must be a massive amount of information. Russell Cohen (Amazon)
37:04 | π
Yes, it's a crazy amount of data. I have ideas for how we can get that much information into Dial9, but I haven't solved the problem yet, so... Jess Izen (Amazon)
37:13 | π
That's going to need tail sampling or something for it to be, yeah. Yeah. Russell Cohen (Amazon)
37:14 | π
Well, we'll have to do something clever to be able to do that usefully in Dial9, but this is an example of a story of watching someone do something very clever that normal people, I think, would really struggle to figure out how to accomplish. And how do they do that? Like, how do they make sense of all this data? Are there some techniques they use for that? I mean... Russell Cohen (Amazon)
37:34 | π
Yeah, there's a perf command that basically lets you scroll through the events. I did eventually try it at one point. It's not easy, but it is possible to filter down to a very small number of threads that you actually care about, namely the Tokio runtime workers. So it allows you to reduce the crazy amount of events to a much smaller amount of events where the thread that comes onto the CPU is a Tokio runtime worker, and you can further filter to when it was delayed. Or if you know the very specific instant when you had a high P99, you can go focus on that instant. Yeah, detailed techniques that require a lot of understanding of internals to use. Jess Izen (Amazon)
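A rough sketch of the filtering Russell describes: from a large stream of context-switch records, keep only the ones where a Tokio worker thread came onto the CPU late. The `SwitchIn` record is a simplified, hypothetical shape and not perf's output format. The prefix `tokio-runtime-w` assumes Tokio's default worker thread name, which Linux truncates to 15 characters.

```rust
/// One "thread was switched onto a CPU" record.
/// Simplified, hypothetical shape; not perf's actual output format.
#[derive(Debug, Clone, PartialEq)]
pub struct SwitchIn {
    pub thread_name: String,
    /// When the thread became runnable (was woken).
    pub woken_at_ns: u64,
    /// When the kernel actually put it on a CPU.
    pub running_at_ns: u64,
}

impl SwitchIn {
    /// Time the thread spent runnable but not running.
    pub fn delay_ns(&self) -> u64 {
        self.running_at_ns.saturating_sub(self.woken_at_ns)
    }
}

/// Keep only wake-ups of threads whose name starts with `worker_prefix`
/// and whose scheduling delay is at least `min_delay_ns`.
pub fn delayed_worker_wakeups<'a>(
    events: &'a [SwitchIn],
    worker_prefix: &str,
    min_delay_ns: u64,
) -> Vec<&'a SwitchIn> {
    events
        .iter()
        .filter(|e| e.thread_name.starts_with(worker_prefix))
        .filter(|e| e.delay_ns() >= min_delay_ns)
        .collect()
}
```

With a 1 ms threshold, a call like `delayed_worker_wakeups(&events, "tokio-runtime-w", 1_000_000)` reduces the full context-switch stream to the handful of late worker wake-ups worth inspecting by hand.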
38:23 | π
And more data to identify problems than you really need. They're so noisy compared to being able to instrument the right parts to start with. Okay, and so you've been at this now a couple of months where you iterate on the... Was there something missing in Tokio that you, for example, had to contribute to Tokio itself to make it work? And can you give some examples of that? Russell Cohen (Amazon)
38:45 | π
Yeah, largely all of the hooks are there. We want to be able to add task dumps. So that means when your task goes idle, being able to capture a stack trace of the actual future itself when it goes idle, because this allows you to see what was actually happening when your code was not running. If you look at a Dial9 trace, one thing that is very surprising when you first look at it is that only a tiny amount of the time is actually your code running. Most of the time it is waiting for something. And task dumps would let us see that. We still have not actually landed task dumps in Dial9 because the feature that landed in Tokio to make it possible to take user space task dumps is quite new, so it requires the latest version of Tokio. But I think we'll probably try to land task dumps. Jess Izen (Amazon)
39:38 | π
And explain why we needed to land user space task dumps. It's a fun Dial9 story. Russell Cohen (Amazon)
39:41 | π
Yeah, exactly. Yeah, so the first time that we tried to add task dumps, which was quite early in the development, I added them. Tokio does have an API to capture a task dump. But immediately, the performance of the application that I was testing was terrible. The overhead went from like 3% to 100%. And in the Dial9 trace, I could see why, which is that it turns out calling backtrace::trace actually takes a global lock. Which is bad, obviously. There are good reasons why it exists. Namely, it's because of the race between symbolization and unwinding. So to avoid having to deal with that race, they simply have a global lock on everything. But every time we took a task dump, it would try to acquire this global lock. And of course, it's a heavily contended mutex. Your thread gets descheduled, and your whole application performance gets quite bad. So our upstream work, which we will eventually build on, to fully extract task dumps from Tokio itself and basically make them a library, makes it possible for Dial9 to have its own unwinder. Because Dial9 already has the machinery to do frame-pointer unwinding and already has the machinery to do symbolization. So we don't actually need the full complexity of what the backtrace crate is doing. We can do a much more efficient stack unwind when we do a task dump, which should make it possible, maybe not 100% of the time, but close to 100% of the time for most applications, to take a task dump every single time your future goes idle. That does sound expensive to me too, but I guess maybe it's less than my intuition says. Russell Cohen (Amazon)
41:22 | π
It's not free, but also, many applications spend very little of their time on CPU. Most of the time they're actually just waiting for I/O. So if taking a task dump takes, let's say, 200 nanoseconds before you go idle, and then you're going to wait five milliseconds for some networking request, it doesn't end up being a complete blocker for your application. Jess Izen (Amazon)
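To put Russell's illustrative numbers side by side: a 200 ns dump ahead of a 5 ms wait is roughly 0.004% of that idle cycle. A tiny sketch of the arithmetic, using only the figures quoted above (they are his "let's say" numbers, not measurements):

```rust
/// Fraction of one "dump, then wait" cycle that is spent taking the dump.
pub fn dump_overhead_fraction(dump_ns: u64, wait_ns: u64) -> f64 {
    dump_ns as f64 / (dump_ns + wait_ns) as f64
}

/// The example from the conversation: 200 ns dump, 5 ms network wait.
pub fn example_overhead() -> f64 {
    dump_overhead_fraction(200, 5_000_000)
}
```

The same dump cost becomes far more visible for a task that only waits a few microseconds between polls, which is why the cost depends heavily on the workload.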
41:39 | π
About the same as the tracing layer. Okay, and so as you said, Dial9, you can kind of think of it as layers of sources of information. Tokio being one of them, does that mean that Dial9 will still be useful to people that don't use Tokio? Russell Cohen (Amazon)
42:06 | π
Today, not exactly, but probably in the next few weeks, we will fully separate out the Dial9 core, which has nothing to do with Tokio. And then the Tokio instrumentation is just, you could think of it as a plugin or a layer feeding data into Dial9. Yeah. So already, I suspect we will have internal customers at Amazon that don't actually want the Tokio stuff. At least they don't want it all the time. They really just want profiling data, and they're just using Dial9 as basically a bus to pull in profiling data from the Linux kernel and enrich it with their custom events. So yeah, exactly. Dial9 started for Tokio and there are still some tendrils in there, but I'm working on extracting all of the tendrils to make it so that Tokio is not special in terms of how it interacts with Dial9. Jess Izen (Amazon)
42:40 | π
And enrich it with custom events. Of course, given it's mostly focused on these kinds of hard problems that often happen in production infrastructure, we usually think about Linux, but some people are on Windows. Is that something that is currently supported, or is it out of scope? I don't know what the story is there. Jess Izen (Amazon)
43:14 | π
Russell Cohen (Amazon)
43:19 | π
Yeah, the only things that are Linux specific in Dial9 today are these Linux operating system events. Presumably there is a Windows equivalent of it, and presumably somebody, if sufficiently motivated, could basically write a Windows stack trace sampler for Dial9 and those things would flow in just fine. Everything else is not operating system specific. And let's say someone has their own source of information and they would like to somehow tie it Jess Izen (Amazon)
43:50 | π
in. It's not like an event because it's not in their process, but it's from some other source. Somehow they got information. Can they easily hook it into the system or what's the story there? Russell Cohen (Amazon)
44:02 | π
The best way, if it's not, you know, for example, if they have external logs or something, is to have something like a request ID in their logs and then emit an event in Dial9 that has the same request ID, which will then allow them to correlate between the Dial9 events and their external events. This is somewhere where being able to analyze these things with code, I think, is really powerful, because you can write some code that first looks at these logs and finds request IDs that are bad in some way, and then goes to Dial9 and filters the Dial9 data. You know the timestamps, you find the right Dial9 file, and then you search that Dial9 file for the request ID. And then, because a request ID is probably only running on one Tokio task normally, you can recreate the entire life cycle of that request ID in your Dial9 trace and figure out, aha, the problem was, you know, it got stuck behind this task that had a very long poll, and that's why this request ID was slow. Yeah, that does show the power of just operating on raw files, which I guess is a huge flexibility that you do miss if you want to just have some kind of visual interface or something. And I guess that's also why you were kind of saying that it's nice to have this dashboard and it's nice to see it, but it's definitely not going to be the main interface to really solve the difficult issues, because most of the time you will probably need more flexibility. Russell Cohen (Amazon)
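The correlation workflow described above can be sketched as two small steps: pick out the bad request IDs from external logs, then pull every trace event carrying that ID and order it by time. The `LogLine` and `TraceEvent` structs are hypothetical stand-ins for whatever your log format and decoded trace events look like; they are not Dial9 types.

```rust
use std::collections::HashSet;

/// One line of an external application log (hypothetical shape).
pub struct LogLine {
    pub request_id: String,
    pub latency_ms: u64,
}

/// One decoded trace event (hypothetical shape, not the Dial9 format).
pub struct TraceEvent {
    pub timestamp_ns: u64,
    /// Set only on events the application tagged with a request ID.
    pub request_id: Option<String>,
    pub kind: String,
}

/// Step 1: find the request IDs that were "bad", here meaning slow.
pub fn slow_request_ids(logs: &[LogLine], threshold_ms: u64) -> HashSet<&str> {
    logs.iter()
        .filter(|line| line.latency_ms > threshold_ms)
        .map(|line| line.request_id.as_str())
        .collect()
}

/// Step 2: rebuild the life cycle of one request from the trace,
/// in timestamp order.
pub fn lifecycle<'a>(events: &'a [TraceEvent], request_id: &str) -> Vec<&'a TraceEvent> {
    let mut hits: Vec<&TraceEvent> = events
        .iter()
        .filter(|e| e.request_id.as_deref() == Some(request_id))
        .collect();
    hits.sort_by_key(|e| e.timestamp_ns);
    hits
}
```

From the ordered events for one slow request, the next step would be to look at what else ran on the same worker between its wake and its poll, which is where a long poll from another task would show up.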
45:31 | π
Yeah, or you need to aggregate bigger. Jess Izen (Amazon)
45:31 | π
I think there's probably a story here about wrapping this into your Honeycomb and having the right visualizations to wrap the timeline in. It's a larger problem. We're focused on getting the internals and the CLI right because it's a simpler problem. In the fullness of time, I imagine you will be able to have visualization that ties into your other visualization. Clearly, the writing on the wall here is that you want to drill into these things. We're just taking one problem at a time. Yeah, of course, and the community can of course also help with this. Maybe some people are more visually inclined and they might have ideas on how to do that. Okay, very cool. So you already mentioned that you were working on this task stack unwinding and that it's a feature that was recently added to Tokio. Are there some other features that you would like to see added or stabilized in Tokio that would meaningfully improve your capabilities? Russell Cohen (Amazon)
46:31 | π
Yeah, the biggest one, and there's no easy way to do this. I was lucky enough to chat with Alice at length about this at Tokio... To get the most value out of Dial9, you really need to be running our Future. So our Future wraps your Future, because that allows us to instrument the Waker so that we know exactly when the actual wake event happened, so that we can measure scheduling delay. And it allows us to capture a task dump for you. Those are really the two main things, and I think they are quite important. And, sorry, the last thing, which is that right now the poll start / poll stop callbacks use dynamic dispatch on like a Box<dyn Fn>, so for applications that have a very large number of polls, simply having those callbacks creates a lot of performance overhead that we wouldn't have if you basically ended up statically compiling your Future inside of our Future. So figuring out a way to have Tokio always wrap the caller-provided Future in a given generic Future is kind of the holy grail, I think, for this sort of observability. There are a lot of clever ideas that would make this tractable, but they're not very Rusty, if you know what I mean. You know, for example, internally in Amazon, where we have the luxury of being able to unilaterally patch all crates, we can simply patch Tokio to wrap your Future in this, you know, magic Dial9 Future, but that does not work externally. There are crazy proc macro approaches where you late expand, but... I suspect eventually someone will come up with a very clever idea to solve this problem, because it's something that is generally needed for a lot of different use cases where you're trying to understand exactly what's happening with your tasks and with your Future. 
Yeah, I mean, in some language like Python I guess you would just monkey patch it, and I guess you wish you could do that here. And the way I would do those things in C or C++, I would just provide my own definitions against the same kind of header files, but of course that's also not possible here, and Rust is pretty tricky. I will be very curious to see what you come up with because Russell Cohen (Amazon)
48:33 | π
Yes, exactly. One thing you could say is, okay, when you register the Future in your runtime, at that point you could of course wrap it, but not all Futures get registered, because it's only for, I guess, the outer future, right? Which is the task. Okay. Russell Cohen (Amazon)
49:06 | π
And we only need to wrap the outer Future, to be clear. We really just need to wrap the call to tokio::spawn so that tokio::spawn, at that moment, wraps it in some sort of Dial9 Future. Okay. That does make it more doable, I guess. I mean, it should be possible somehow, no? To let your runtime somehow provide some kind of optional callback or something, I don't know. Russell Cohen (Amazon)
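A minimal, std-only sketch of the general technique being discussed: an outer Future that wraps the caller's Future, counts polls, and swaps in its own Waker so that it sees the exact moment a wake happens. This is not Dial9's implementation. `Stats`, `Instrumented`, `YieldOnce` and `drive` are hypothetical names, the counters stand in for real trace events, and the `Box` plus the per-poll `Arc` are exactly the kind of indirection a production version would want to avoid.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Counters standing in for "poll started" and "task woken" trace events.
#[derive(Default)]
pub struct Stats {
    pub polls: AtomicU64,
    pub wakes: AtomicU64,
}

/// Records the wake, then forwards it to the runtime's real waker.
struct RecordingWaker {
    inner: Waker,
    stats: Arc<Stats>,
}

impl Wake for RecordingWaker {
    fn wake(self: Arc<Self>) {
        Wake::wake_by_ref(&self);
    }
    fn wake_by_ref(self: &Arc<Self>) {
        self.stats.wakes.fetch_add(1, Ordering::Relaxed);
        self.inner.wake_by_ref();
    }
}

/// The outer Future that wraps the caller-provided one.
pub struct Instrumented<F> {
    inner: Pin<Box<F>>, // boxed only to keep the sketch free of unsafe pinning code
    stats: Arc<Stats>,
}

impl<F: Future> Instrumented<F> {
    pub fn new(inner: F, stats: Arc<Stats>) -> Self {
        Self { inner: Box::pin(inner), stats }
    }
}

impl<F: Future> Future for Instrumented<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        self.stats.polls.fetch_add(1, Ordering::Relaxed);
        // Hand the inner future our waker instead of the runtime's.
        let waker = Waker::from(Arc::new(RecordingWaker {
            inner: cx.waker().clone(),
            stats: Arc::clone(&self.stats),
        }));
        let mut wrapped_cx = Context::from_waker(&waker);
        self.inner.as_mut().poll(&mut wrapped_cx)
    }
}

/// Demo future: wakes itself once, returns Pending, then completes with 7.
#[derive(Default)]
pub struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.yielded {
            Poll::Ready(7)
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling driver, only good enough for this demo (not a real executor).
pub fn drive<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Calling `drive(Instrumented::new(YieldOnce::default(), stats.clone()))` polls the inner future twice and observes one wake. With timestamps recorded at the wake and at the next poll start, the difference between the two is the scheduling delay for that task.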
49:31 | π
Yeah, the problem you run into is that it is possible, but by the time you resolve all of the generics, you're like three boxes deep inside the future, which would technically work, but I suspect it would kind of defeat the performance point. Yeah, because of course you mentioned somewhere, I forgot where it was mentioned, but I seem to recall you wrote somewhere, maybe it's in your blog post, that you want Dial9 to be something that people can just enable in any production system and they don't have to worry about it. It shouldn't be something that they explicitly enable. They should just configure it once and then it just lives. But for that, of course, given how performant the systems you are dealing with are, I mean, that's a high bar, right? Russell Cohen (Amazon)
50:19 | π
Yeah, exactly. Yeah, very cool, very exciting. Now these people do have this tool. Especially those kinds of people, the ones who actually are very good, who have the tools and could do it the hard way: do they now also grab Dial9 and find ways to use it? Russell Cohen (Amazon)
50:36 | π
Yeah, I think Dial9 has allowed them to actually, instead of having to simulate production, just look at it in production. For example, and this is not a happy story or a successful story, the original customer, the one with 16,000 operating-system threads who is also trying to run Rust at the same time, tried making the nice level on Rust much lower so that the kernel would prioritize it. And unfortunately, Dial9 showed that it did not help, which is not too surprising because of the way that the clock works inside of the kernel. But these sorts of things previously would be difficult even to validate. Now they can see, "yep, we changed the nice level and it did not help the kernel scheduling delay at all." It's still fairly early days with other services at Amazon, so I don't have a ton of real production stories yet, but we're getting much closer to seeing people start to solve problems in production with it. Yeah, and I guess given how you also hook into all the other things, I suspect that Dial9 can be a tool in general at Amazon, even for non-Rust teams, I suppose. Russell Cohen (Amazon)
52:00 | π
Yeah, you could imagine that you build Python bindings to the, like, emission APIs or something and start emitting events that way. Yeah, exactly. Is that something that you're planning or just a wild idea? Russell Cohen (Amazon)
52:12 | π
Wild idea. This path has crossed my mind, but it's definitely not that high on the next set of things that I'm gonna try. Jess Izen (Amazon)
52:23 | π
I would expect that to probably come from a team that is using Rust and Python and has this use case, and then, you know, we'll start shimming it in and we'll help them facilitate contributing that upstream and so forth. Yeah, for sure. And so, you mentioned, okay, we are now in this moment where LLMs can help us with these kinds of things that we're maybe not good at ourselves, or maybe have no time for. One example was the dashboard. What else would have been missing in Dial9 that you can now actually develop? Russell Cohen (Amazon)
52:55 | π
Yeah, definitely. I'll say a few things. Even the ability to analyze the trace files is something that people can absolutely write the code to do. No question. This is not complicated code, but this is exactly the kind of code where I would sit down and it would take me an hour to look at the docs for the trace file and assemble a stack trace and deduplicate the stack trace and whatever. And an LLM is so good at it. It's very small. It's very scoped. They're not trying to refactor some complicated project. They're literally trying to solve an interview question, basically, to analyze this trace file. And it's so powerful. The other thing as well, especially at the beginning, it did allow me to... right? Once I had the idea in mind for exactly what shape I wanted the back end, the core, to be, it was able to really accelerate the process, especially in things like: I want an integration test that spins up a local S3, runs a Dial9 trace, uploads all the files to local S3, downloads all the files from local S3, and validates that every single event is there. You know, of course, yes, I could write that test. It would take me 20 minutes, or, you know, an LLM could write the test in 20 seconds. I think because we had so much prior art, there was not so much wandering in the design, and that was useful even with the back end as well. Okay, and then before we look a bit at the horizon, maybe we can understand a bit better what the Dial9 code base exactly is. Because we already know, okay, you have your format for how to write these events. You have some kind of exporters. You have a dashboard. You have this other tool to help you with some, I don't know, docs or manuals or prompts or whatever. I suppose you have some utilities to make it easier to integrate Dial9 into your code base. Is there some other critical component I'm missing in the Dial9 code base that people should discover, or that they might not... Russell Cohen (Amazon)
55:03 | π
Yeah. Yeah, so I would say, if you sort of start at the bottom, the format is probably the most... It has a lot of load-bearing design decisions. The fact that it's self-describing, it's very compact, it's very fast to encode. It also allows us to have nanosecond precision timestamps, which is quite useful for correlation, obviously. And the fact that it's quite easy to... I think a lot of the other efforts, like JFR, the Java Flight Recorder format, or even the Go format, are extremely complicated to parse and completely undocumented, which makes it really hard to have an ecosystem of tools around them. From there, there are the buffers where all your data goes. So all the data is going into a thread-local buffer, which allows us to record events without any concurrency. So you get the event, and we encode it into its final binary format right then, when we get it into the thread-local buffer. And the problem with thread-local buffers is, of course, how do you actually get the data out of the local buffer? So for the thread-local buffers, there's an epoch counter that ticks up, I think, about every 30 seconds, depending on how you have it configured. And basically, when it ticks, the next event that the buffer gets will actually flush the buffer to the central queue. But because these events are happening relatively rarely, it doesn't create a lot of contention. There's a little bit of other cleverness to also be able to flush buffers even when they aren't producing events, but eventually the buffers make it to a central queue, which eventually goes to disk. And then once it gets to disk, yeah, as Jess was mentioning, any time that we are recording stack traces in Dial9, we're just recording addresses, because that's basically all we have time to do at the moment that we're capturing them. 
So in the background thread, we look at all the addresses that you have and we turn those into symbols, so that when you look at a FlameGraph in Dial9, it's actually useful and understandable. And then from there, that goes to S3 or wherever you have it configured. Basically all that we just discussed right there, the core part, is in dial9-tokio-telemetry. The Linux stuff is in a crate called dial9-perf-self-profile. And then the format is in a crate, dial9-trace-format, I think, all in the same GitHub repo. And then the analysis stuff is all in the dial9-viewer crate. Yeah, the format, that's the Dial9 trace format, I guess. That's your specification, basically, your data format. And then what is the Dial9 macro about? Russell Cohen (Amazon)
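The thread-local buffering Russell describes in this answer can be sketched with the standard library alone: each thread appends already-encoded events to its own buffer without taking a lock, and only when a shared epoch counter has ticked does the next event hand the filled buffer over to a central queue. This is a simplified illustration under those assumptions, not Dial9's code. All names are hypothetical, and it leaves out the extra logic he mentions for flushing buffers on threads that have stopped producing events.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Global epoch; a timer thread would bump this periodically
/// (the conversation mentions roughly every 30 seconds).
static EPOCH: AtomicU64 = AtomicU64::new(0);

/// Central queue of filled buffers, drained by a background writer.
static CENTRAL_QUEUE: Mutex<Vec<Vec<u8>>> = Mutex::new(Vec::new());

struct LocalBuffer {
    bytes: Vec<u8>,
    seen_epoch: u64,
}

thread_local! {
    static BUFFER: RefCell<LocalBuffer> = RefCell::new(LocalBuffer {
        bytes: Vec::new(),
        seen_epoch: 0,
    });
}

/// Append an already-encoded event to this thread's buffer.
/// The shared lock is taken only on the first event after an epoch tick.
pub fn record(encoded_event: &[u8]) {
    BUFFER.with(|cell| {
        let mut buf = cell.borrow_mut();
        let now = EPOCH.load(Ordering::Relaxed);
        if now != buf.seen_epoch {
            buf.seen_epoch = now;
            if !buf.bytes.is_empty() {
                // Hand the old contents to the central queue and start fresh.
                let full = std::mem::take(&mut buf.bytes);
                CENTRAL_QUEUE.lock().unwrap().push(full);
            }
        }
        buf.bytes.extend_from_slice(encoded_event);
    });
}

/// Advance the epoch; in a real system a timer would call this.
pub fn tick_epoch() {
    EPOCH.fetch_add(1, Ordering::Relaxed);
}

/// Take everything flushed so far, as the background writer would.
pub fn drain_central_queue() -> Vec<Vec<u8>> {
    std::mem::take(&mut *CENTRAL_QUEUE.lock().unwrap())
}
```

Because the hot path is a thread-local append plus one relaxed atomic load, contention on the central queue is limited to one lock acquisition per thread per epoch.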
58:05 | π
Yeah, exactly. Yes. So we recently added a new way to onboard to Dial9 that's much simpler, where you can basically just replace tokio::main with dial9::main, effectively. And we will automatically turn on Dial9 for you, and you can choose how to configure it, via environment variables or however fits your use case. It is not adding a ton of code. It's just some boilerplate that is kind of annoying to add, so it makes it much easier to switch to it. Okay, right now we have Dial9. What's coming next for the tool? What are some features you are working on that you're most excited about near term, and how do you prioritize what's on the roadmap? Russell Cohen (Amazon)
58:54 | π
The top thing on the roadmap for me: recently a lot of big AWS services have started using it, so I'm actually integrating Shuttle, the deterministic simulation testing runtime, with Dial9 to try to really tease out any weird bugs. So correctness and getting it as stable as possible is my number one priority right now. To be clear, I have no reason to suspect that it is not stable, but I would much rather find these problems before our customers do. Separate from that, task dumps and memory profiling are probably the next two biggest customer-facing features. We're also working on getting metric events, so that's events from our wide event metrics library, directly into your Dial9 traces. Jess Izen (Amazon)
59:42 | π
The idea being that the path of least resistance should be: whatever metadata you already have should go in the trace. And tracing is a fairly expensive way to do that. So Metrique is much lower overhead. You don't have hash maps and so forth in the picture. Russell Cohen (Amazon)
59:54 | π
After that, I think we'll probably try to do tail sampling, so that by default, Dial9 is recording the data but basically not doing anything with it. And in your app, there will be an application-level API to basically say something bad has happened, and then Dial9 will dump all of the data that it has in its buffer to S3 or whatever your output is. This is because, in general application usage, the main overhead is actually just simply the amount of data that Dial9 is producing. So being able to constrain how much data you're literally uploading to S3 is kind of the main problem. And beyond that, trying to make the viewer have a better experience, especially when you're trying to use both AI to analyze your traces and the viewer to analyze your traces, and making those fit together better. Glen (Plabayo)
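A small sketch of the tail-sampling idea as described: keep recording into a bounded in-memory buffer, discard the oldest events when it fills up, and only ship the contents when the application signals that something went wrong. The feature is described as planned, so `TailBuffer` and its methods are purely hypothetical names for illustration.

```rust
use std::collections::VecDeque;

/// Bounded buffer of encoded events; oldest events are dropped first.
pub struct TailBuffer {
    events: VecDeque<Vec<u8>>,
    bytes: usize,
    capacity_bytes: usize,
}

impl TailBuffer {
    pub fn new(capacity_bytes: usize) -> Self {
        Self { events: VecDeque::new(), bytes: 0, capacity_bytes }
    }

    /// Always record; evict from the front once over capacity.
    pub fn record(&mut self, event: Vec<u8>) {
        self.bytes += event.len();
        self.events.push_back(event);
        while self.bytes > self.capacity_bytes {
            match self.events.pop_front() {
                Some(old) => self.bytes -= old.len(),
                None => break,
            }
        }
    }

    /// Called when the application reports that something bad happened:
    /// hand back everything currently buffered so it can be uploaded.
    pub fn dump(&mut self) -> Vec<Vec<u8>> {
        self.bytes = 0;
        self.events.drain(..).collect()
    }
}
```

In the steady state nothing leaves the process, so the upload volume is bounded by how often `dump` is triggered rather than by how many events the runtime produces.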
1:00:48 | π
The last one sounds like something you can just lose an endless amount of time on. I bet you will keep finding... Now, given all the information that people have about Dial9, is there something that we didn't talk about yet that they should know? Russell Cohen (Amazon)
1:01:03 | π
Hmm. I think we covered a lot. We should probably, okay, here the, write down a couple. It would probably be good to talk about considerations for enabling Dial9 in production. That would probably be a useful topic. That's the main one. Do you have anything in mind, Jess? Jess Izen (Amazon)
1:01:25 | π
Well, I mean, if you're going to use tracing, don't show the AWS SDK traces. They're very noisy. Be careful about how much you instrument. Probably mentioning using Handle::spawn is important so you can capture wake times. Russell Cohen (Amazon)
1:01:40 | π
Yeah. Jess Izen (Amazon)
1:01:41 | π
You want to give the rep? Russell Cohen (Amazon)
1:01:43 | π
Yeah. So if you are considering enabling Dial9 in production, obviously, the most important thing is to try it with your app outside of production, or in production in a very limited environment, before you do so. It does require using tokio_unstable. tokio_unstable refers to API instability and not runtime instability. So the main problem here is that consuming a new Tokio version could cause your code to stop compiling. Dial9 does increase the memory usage of your application because we have these thread-local buffers that we're recording events into. So if you have a huge number of threads, all of which are producing Dial9 events (this is not really how most Tokio-based applications are structured, but it does happen), then Dial9 could end up using a lot more memory. Currently, there's a one megabyte buffer allocated for each thread, which is not crazy. But if you have 10,000 threads or something, it would probably be a problem. To get the most useful data out of Dial9 you do need to spawn via our SpawnWrapper, which allows us to wrap the Future for you. Other than that, I think those are the main considerations. Glen (Plabayo)
1:03:05 | π
Okay, let's say people think they can use it. Where would you guide them? Like, what would be the starting point? Russell Cohen (Amazon)
1:03:14 | π
Yeah, the README should actually be a great place to start. There are a few examples that show specifically how to enable it based on runtime configuration. So, checking some environment variable and then, based on that environment variable, turning Dial9 on or not turning Dial9 on. That would be the best place to start. And then once you have a trace file, which you can just push to disk, I would definitely take a look at the trace file and see what you're getting before you ship it, if possible. There are a number of small configuration things like, "Aha, my perf event setting is not right, so I don't have any FlameGraphs". Or "this setting is not right, so symbolization didn't work", et cetera, et cetera. And ironing out those issues locally is much easier than ironing them out in production. But the README is the best place to start. Glen (Plabayo)
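The "check an environment variable, then turn telemetry on or off" pattern mentioned here looks roughly like the sketch below. It only shows the decision itself; the variable name `APP_RUNTIME_TELEMETRY` is made up for this example and is not something Dial9 reads, and the actual setup call should come from the project's README.

```rust
use std::env;

/// Hypothetical variable name chosen for this example only.
pub const TELEMETRY_ENV_VAR: &str = "APP_RUNTIME_TELEMETRY";

/// Interpret the raw value: "1", "true" or "on" (any case) means enabled.
/// Anything else, including an unset variable, means disabled.
pub fn parse_enabled(raw: Option<&str>) -> bool {
    match raw.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) => matches!(v.as_str(), "1" | "true" | "on"),
        None => false,
    }
}

/// Read the process environment and decide whether to enable telemetry.
pub fn telemetry_enabled() -> bool {
    parse_enabled(env::var(TELEMETRY_ENV_VAR).ok().as_deref())
}
```

Defaulting to "off" when the variable is unset or unrecognized keeps a typo in deployment configuration from silently enabling tracing in production.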
1:04:07 | π
Okay, very cool. So with that in mind, I thank you both for your time. It was very insightful. I came into this episode not knowing too much about Dial9. I said to myself, I really need to play with it before this recording. Well, we know how these things go, that didn't happen. Russell Cohen (Amazon)
1:04:25 | π
Hahaha Glen (Plabayo)
1:04:27 | π
I'm still convinced it's a very useful tool and I really do want to start playing with it. I'm very excited about it, given the things you said about it and what I learned about it. Some things I had a wrong conception about up front, but now I understand it better, and so I really want to start using it. So I'm very grateful to both of you for explaining to us where this tool came from, how you're using it, and how people did it before they had Dial9. I did also already watch your demo, so that demo is definitely something I will link in the show notes. I'm also very much looking forward to seeing your lightning talk from TokioConf, the first ever Tokio conference... two weeks, but definitely more than a week, I... So that's very cool. So thank you, both of you. And I wish you a lot of luck with the continued development of Dial9 and all the other exciting stuff that you both end up doing at Amazon. Russell Cohen (Amazon)
1:05:25 | π
Thank you so much. Jess Izen (Amazon)
1:05:26 | π
Yeah, thanks for having us. Let us know what you think. PRs are very welcome. Elizabeth (Plabayo)
1:05:31 | π
Netstack.fm is brought to you by Rama, an open source framework for moving and transforming network packets. Rama is built and maintained by Plabayo, a company focused on secure, open, and resilient infrastructure with Rust, protocols, and purpose. The theme music of this podcast was composed by DJ Mailbox. For more conversations like this, subscribe so you don't miss what's coming next. And if you know someone who could benefit from this episode, share it with them. They might appreciate it. If you have experience in protocols, networking, or infrastructure, and want to share your work, your ideas, or experience, we would love to hear from you. Reach out at hello@netstack.fm. Thank you for being here. See you next time for the next handshake.