Mostly Human Show | Episode 1

Oleksii_ · ‎12-06-2025

In this first episode of the Mostly Human Show, Shereen Bellamy and Adrian Iliesiu explore the intersection of AI and networking, particularly focusing on fine-tuning language models for networking tasks. They discuss how to leverage open-source models to enhance network management and troubleshooting processes.

Key Takeaways

Model Sourcing: You can find and download various language models from HuggingFace.co, which serves as a hub for open-source AI models.
Fine-Tuning: Fine-tuning smaller models tailored to specific networking tasks can significantly enhance performance compared to general models.
Data Preparation: Quality data is crucial; focus on gathering structured examples and cleaning your data to ensure effective model training.
Iterative Process: Model training is an iterative process that requires testing, evaluating, and fine-tuning to achieve optimal results.

Video transcript:

Hey, everyone, and welcome to the first episode of the Mostly Human show, where we talk about how AI can be applied in the networking world. My name is Shereen Bellamy. I'm an AI Developer Advocate at Cisco, and joining me...

Hey, everyone. My name is Adrian Iliesiu. I'm also a Developer Advocate here at Cisco. And it is Friday. We've got these nice fall backgrounds, both of us. And in this episode, we're actually going to talk about something that's kind of bridging both our worlds, Shereen, right? It's about fine-tuning language models for networking.

Exactly. And a quick intro on myself. I come from the AI, ML side of things. I've been working with LLMs for a while, but I'm still relatively new to the networking world. Cisco syntax still gives me a little bit of a headache sometimes.

On the other side, I'm the networking person. I've been doing networking for many, many years, as you might be able to see. And I work with everything from Cisco to Juniper to Citrix to F5s to load balancers to cloud. But with the LLMs, right, I'm kind of getting started. So we're just starting to dive deeper this year. So Shereen, what about you?

Yeah, we're literally figuring this out together, kind of bridging both of our expertises and sharing it with you all. But with no polished demos, just real conversation about what works and what doesn't.

Exactly. So hopefully by the end, right, you will see that AI is actually useful for network folks and you'll be able to apply what we're talking about in your day-to-day job.

Okay. I'm going to start with a question I get asked by most folks on the networking side. Where do you even find these models? Do you just download them?

Yeah, that was actually my question too. I thought this stuff was all locked down behind OpenAI APIs, but is that not the case?

It's not.
You know, even though GPT-4 is technically like that, the coolest thing about the open source LLM movement is that there's this whole ecosystem of models that you can literally download and run yourself. And the main hub that you can use to do that is HuggingFace.co. And you can think of that as like GitHub, but for AI models.

That's cool. I actually set up an account last month and it was pretty easy, and they had something like 2 million models from what I've seen, so there's a lot of them.

Yeah, and they have options like Qwen from Alibaba, Llama from Meta, Mistral, a bunch on there. For today, I feel like we should just focus on the smaller models for our conversation, and that would look like something around 7 billion parameters or less.

Okay, so please explain a bit about parameters in these models for network engineering folks. Because I always thought that, you know, more parameters means the model is better. Is that not the case?

Well, you would think so, but not always. So you can think of parameters like, I don't know, like the size of maybe a routing table.

Okay, so I like that analogy. So basically a bigger table has more routes, but takes more memory and processing, right?

Exactly. So if you look at something like a 7 billion parameter model, it's way smaller than GPT-4, which is probably over like a trillion, and now it's GPT-5, who knows, more than that, growing every day. But it's way more manageable. You can run a 7 billion parameter model on a single GPU locally and you can train it to understand specialized tasks like understanding Cisco configs, right, for your networking project. And with that kind of focus, that's how it gains the traction to eventually maybe outperform the giants.

That makes sense. It's actually specialized knowledge versus general knowledge, right? You want to be focused on a specific domain.

Yeah, yeah.
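As a rough, back-of-the-envelope illustration of the routing-table analogy above, the sketch below estimates how much memory a model's weights alone occupy at different numeric precisions. The function name and the precision table are illustrative assumptions, not something from the episode, and real memory use is higher once activations and training state are included.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: each parameter is stored at a fixed size given by its precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return the approximate size of the weights in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Compare a 7 billion parameter model across precisions.
for precision in BYTES_PER_PARAM:
    print(f"7B parameters @ {precision}: {weight_memory_gb(7e9, precision):.1f} GB")
```

Like a routing table, the count of entries times the size of each entry sets the memory floor; training needs additional headroom on top of this figure.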
And then again, to get started on that, it just starts with going to this website, HuggingFace.co, and then you just choose a model and start training on that, because they're all free and they're all open source.

So that just blows my mind. Is it that easy, right? So let me explain why I even got interested in this, right? In my world, we're drowning in trouble tickets. Every day, hundreds of them: interface down, BGP neighbor flapping, can't reach VLAN 50, right? It's all that fun stuff.

Yeah. It sounds painful.

It is. And here's the thing, right? Our junior engineers spend hours just triaging these, figuring out which ones are urgent, which are duplicates, which device logs to pull. I thought there's got to be a better way.

And then you went to ChatGPT, right? You tried that first?

Yes, I did. And it's okay for general stuff. But when I paste in a Cisco error message or a show command output, right? It hallucinates most of the time. So it'll suggest commands that don't exist, it'll misinterpret SNMP traps, or it'll suggest configuration changes that don't make sense.

Yeah. And the reason for that is, like we were saying before, these general models like GPT weren't trained on networking-specific data. So they've seen some Cisco configs probably in their training, but it's a very small fraction of their knowledge.

Exactly, right. And in networking, you can't have 80% correct; that 20% is going to bring down your production network instantaneously.

Yeah. And that is where the fine-tuning comes in. So you take a smaller model like this Qwen with 7 billion parameters, and then you train it specifically on your data. So your tickets, your configs, your general environment.

Which, you know, sounds great in theory, but have you actually done this before?

Actually, I have. So I've done this fine-tuning across various different domains.
I've done this in customer support, with legal docs, code generation, but with networking, like I said earlier, I'm new at this. And that's why we're doing this here today together, right?

Oh, definitely. So we're learning together right here. Okay, so walk me through it. I want a model that understands Cisco configs and can help with ticket triaging. Where do I start?

Well, the first thing that you're going to want to do is to prepare your data.

Like what, just dump all our configs into it?

No, not exactly. So what you're going to want is structured examples. And what that would look like is something like an instruction-response format. So for your use case, the instruction would be something like "Analyze this BGP configuration and identify any issues." Then after that, you're going to want your input, and the input would be the actual router config. And then after that, you'd want the response. So the response would be what's configured correctly, potential problems, and potential recommendations.

Okay, so I would actually need to create these pairs. Where do I get them?

That would depend on what you have access to.

Well, we've got years of historical trouble tickets with resolutions, right? We have sanitized device configs, we have command outputs, we have internal documentation on our network standards. Would that be it?

Well, yeah, I think so. I mean, it sounds perfect. You'd want to pull from all of it, really. The key is quality over quantity. So like, if you have a ticket where the resolution was wrong, you wouldn't want to include that.

Yeah, we had some of those. And yeah, some did not necessarily get resolved right away. So how much data are we talking about here, Shereen?

Well, for a decent fine-tune, we're talking maybe like a few hundred high-quality examples, ideally in the thousands, but even a couple hundred can make a really big difference if they're really good.

Okay.
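The instruction, input, and response structure described above can be sketched as one JSON record per training example. This is a minimal illustration only: the field names, helper functions, sample config, and sample response are all assumptions made up for the sketch, and the exact schema depends on the training script you follow.

```python
import json


def make_example(instruction: str, config: str, response: str) -> dict:
    """Bundle one training example in an instruction / input / response layout."""
    return {"instruction": instruction, "input": config, "response": response}


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(example) for example in examples)


# Hypothetical example; the config and the analysis text are invented for illustration.
example = make_example(
    instruction="Analyze this BGP configuration and identify any issues.",
    config=(
        "router bgp 65001\n"
        " neighbor 192.0.2.2 remote-as 65002\n"
        " neighbor 192.0.2.2 shutdown\n"
    ),
    response=(
        "Configured correctly: neighbor 192.0.2.2 is defined with remote AS 65002. "
        "Potential problem: the neighbor is administratively shut down, so the "
        "session will not establish. Recommendation: remove the shutdown statement "
        "once the peer is ready."
    ),
)

print(to_jsonl([example]))
```

Keeping every record in the same three-field shape is what makes the later consistency checks and deduplication straightforward.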
And I'm guessing I need to clean that data, right? Like strip out customer names, sensitive IPs, credentials, that type of stuff.

Absolutely. Data cleaning is huge. You're also going to want to remove duplicates, fix any formatting issues, and make sure that all of your examples are consistent.

So building this data, this input data, sounds pretty tedious.

It is, but ultimately what you're going to get is a really good model. It's the difference between a model that works and one that doesn't.

All right. So I've got my data cleaned and formatted. Now what? Do I need to buy some crazy expensive hardware?

No, not necessarily. You have a bunch of options nowadays. So if we're focusing on a 7 billion parameter model, you can run it locally on a decent GPU, like an NVIDIA RTX 4090. You can also run it in the cloud on something like GCP or AWS.

We've actually got some GPU servers on-prem. Would that work?

It could work if it meets those requirements, like an NVIDIA GPU with at least 16 GB of VRAM, then yeah. I don't see why not.

Okay. Okay. So I can make that happen. So once I have the hardware, what next?

Well, once you have the hardware, then you'd have to look at what framework you want to use for the training. Hugging Face Transformers is actually really popular already, so you don't have to stray too far from the site that you already got your model from. It can handle all the complicated stuff like loading the model, processing the data, and the full training loop.

So do I need to know a lot of Python for this?

Well again, Hugging Face Transformers handles most of it for you, so, I mean, basic Python helps. But honestly, there are so many tutorials and scripts already out there that you can just follow along and you're fine.

Okay. Okay. That's less intimidating than I thought.

Yeah. The hardest part is truly the data prep. Once that's done, the actual training is pretty automated.
So you're basically telling the framework: here's my base model, here's my data that's already cleaned, now please go train it. And then you'll test it later.

Okay. So how long does training usually take for these things?

Depends on your data size and your hardware. So again, you know, if you're doing something relatively small, like just a couple hundred examples on a single GPU, maybe a few hours. But if you're looking at thousands, could be overnight, could be a couple of days. It depends on your setup.

Gotcha. So what are the gotchas? What should I watch out for?

Oh man, where do I start? I would say the first one is overfitting, especially if you're new to the process of training these models. This is basically when your model memorizes your training examples instead of learning the general patterns that it's supposed to.

How do you avoid that?

Well, in order to do that, you'd have to hold some data out for validation. That way the model never sees it during its training. And then you watch the metrics. If the training performance keeps getting better while the validation performance gets worse, then you're overfitting. If the validation performance keeps improving too, then you're not.

That makes sense. What else?

Well, hyperparameters would be another one. Things like the learning rate, the batch size, the number of epochs. These are some things that control how the training works, and honestly, they can make or break your model.

Okay, so you have all these levers and parameters that you can adjust. So it's like tuning OSPF metrics basically, right? There's no one-size-fits-all.

Yeah, I guess like that. Could you remind me what OSPF metrics are?

Oh yes. So OSPF is basically Open Shortest Path First. It is a routing protocol that's used by routers to advertise and learn routes.
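The validation hold-out and the overfitting check described above can be sketched with plain Python. The function names, the 10% split, and the `patience` threshold are illustrative assumptions, and the loss values are made-up numbers; a training framework would normally report these metrics for you.

```python
import random


def split_holdout(examples: list, val_fraction: float = 0.1, seed: int = 0):
    """Shuffle and split examples into (train, validation) lists."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def overfitting_epoch(train_loss: list[float], val_loss: list[float], patience: int = 2):
    """Return the index of the first epoch in a run of `patience` epochs where
    training loss fell while validation loss rose, or None if no such run exists.
    Assumes both lists hold one value per epoch and have equal length."""
    streak = 0
    for epoch in range(1, len(val_loss)):
        train_improving = train_loss[epoch] < train_loss[epoch - 1]
        val_worsening = val_loss[epoch] > val_loss[epoch - 1]
        streak = streak + 1 if (train_improving and val_worsening) else 0
        if streak >= patience:
            return epoch - patience + 1
    return None


# Made-up loss curves: training keeps improving, validation turns upward.
train = [2.0, 1.5, 1.1, 0.8, 0.6, 0.4]
val = [2.1, 1.7, 1.5, 1.6, 1.8, 2.0]
print(overfitting_epoch(train, val))
```

Requiring the divergence to persist for a couple of epochs avoids reacting to a single noisy validation reading.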
So think of it as just a way for a router to advertise to its neighbors what routes it can reach so that they can build routing tables.

Okay, yeah. So I guess, yeah, in that case, it is exactly like that. You can start with your recommended defaults and then you can experiment. There's no one-size-fits-all. Also, different models can behave differently. So if you're going to use Qwen, it might need different settings than if you decided to go with Llama.

And have you found Qwen to be good for this kind of structured data?

From what I've seen, yeah, it has been. It tends to be really good with technical content and structured formats, but honestly, you might want to test a few. There are actually ways that you can do that on the Hugging Face website. Qwen, Llama, Mistral, you can see which is the best for Cisco configs or whatever your use case may be.

That's actually my plan. I want to run some experiments with these different models, see which one performs better.

Great. And then, you know, when you're running your experiments, make sure to document your experience. That way you can come back and we can all learn from it.

Will do, for sure. Okay, last question. Once I fine-tune a model, how do I know if it's actually good?

You know if it's good by testing it. So you have to hold out a real test set, examples that it's never seen, and then you run through them manually and evaluate the outputs.

So for my ticket triage use case, I give it tickets and see if it categorizes them correctly, right?

Yeah. And then after that, you compare it to what your team would actually do. So you want to track the accuracy and then see where it falls short.

What if it's not good enough?

If it's not good enough, then you're going to need to iterate: run more cycles with more data, try out different models, tune your hyperparameters. It's not like just one process and you're done. It's not a one-and-done scenario.

All right.
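The evaluation idea above, comparing the model's triage categories with what the team would have chosen on held-out tickets, can be sketched as a simple accuracy count. The category labels and the function name are hypothetical, invented for this illustration.

```python
from collections import Counter


def evaluate(predictions: list[str], expected: list[str]):
    """Return (accuracy, misses), where misses counts (expected, predicted) pairs
    for every ticket the model categorized differently from the team."""
    if not expected or len(predictions) != len(expected):
        raise ValueError("predictions and expected must be non-empty and equal length")
    correct = sum(p == e for p, e in zip(predictions, expected))
    misses = Counter((e, p) for p, e in zip(predictions, expected) if p != e)
    return correct / len(expected), misses


# Hypothetical held-out tickets: the team's labels versus the model's labels.
team_labels = ["urgent", "urgent", "duplicate", "routine"]
model_labels = ["urgent", "routine", "duplicate", "routine"]

accuracy, misses = evaluate(model_labels, team_labels)
print(f"accuracy: {accuracy:.2f}")
for (want, got), count in misses.items():
    print(f"expected {want!r} but model said {got!r}: {count}")
```

Listing the mismatched pairs alongside the overall accuracy shows where the model falls short, for example urgent tickets being marked routine, which is the kind of error that matters most in triage.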
I see it's going to be a lot of work, but I'm really excited to try this out now.

What did you learn that you're ready to implement?

Yes. Yes. Let's go over it. So first I'm going to download different large language models with 7 billion parameters or less from HuggingFace.co, right? Those will be my base models. Next I'm going to sanitize and prepare the input data that contains my Cisco information, my tickets, all my device configurations. I've already got an on-prem server with the GPU in it, ready to go. Then I'm going to use the Hugging Face Transformers framework to bring everything together and start training my custom fine-tuned model for my ticket triage use case. Right?

Exactly. So that's all the steps that you're going to want to do when focusing on training on your data, but we also can't forget to watch out for those issues that we said might arise, like overfitting, right? That's a really, really common obstacle. I can't stress that enough. In order to avoid it, you want to focus on looking at your hyperparameters when you're fine-tuning. They're very, very important. Remember that iteration is really important to do
