One of the many challenges of testing in the telecoms sector is the threat of “The Killer Packet”.
What is this monster of which I speak? It’s the danger of something or someone sending an IP packet over the network, whether innocently or maliciously, that causes your application to crash.
This problem isn’t unique to telecoms but it’s certainly something we have to worry about. Two factors combine to make it such an issue: how exposed the software is, and how severe the consequences of a crash can be.
In order to do anything useful, your telecoms software is often exposed to the public internet. Yes, there’s probably some kind of firewall or Session Border Controller but they typically won’t do much to police the internal contents of the packets being sent to you.
One fairly innocent source of killer packets is other devices trying to inter-operate with your application (and these may even be in a trusted network rather than out on the internet). Someone could have a really cheap, nasty, buggy SIP soft phone that’s sending your system all kinds of nonsense!
Another potential source of killer packets is malicious hackers. It’s not rocket science to set up a call and then deliberately change some of the signalling or media packets.
The real curse of the killer packet, though, is in the potential consequences. You might have a server hosting hundreds of thousands of subscribers. If one nasty packet causes your system to crash, that could impact service for a whole city.
One of the particularly problematic things about killer packets is that redundant systems don’t necessarily protect you. Most carrier-grade telecoms equipment has either 1+1 or N+1 redundancy. But if someone tries to make a call and that packet causes your primary server to crash, what happens next? The call doesn’t go through, so the user immediately tries to make the same call again. This time it hits your backup server and BANG, now you’ve got a total outage while both servers restart.
What does this mean for testing?
In order to answer that we first need to ask what all this means for our software architecture. You really want to design software that isn’t vulnerable to this kind of crash in the first place. For example, it’s preferable to handle signalling traffic in a context where if a software error is hit, the exception is caught and only that single call is affected rather than the whole server going down.
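To make that idea concrete, here is a minimal sketch of per-call isolation. Everything in it is an assumption made for illustration: the toy "METHOD call-id" packet format, and the names parse_packet, handle_packet and active_calls. A real signalling stack would be far more involved, but the shape is the same: the parsing error is caught at the boundary of a single packet, logged, and the server carries on.

```python
import logging

logger = logging.getLogger("signalling")


def parse_packet(raw: bytes) -> dict:
    """Hypothetical parser for a toy 'METHOD call-id' text format."""
    text = raw.decode("ascii")            # raises UnicodeDecodeError on junk bytes
    method, call_id = text.split(" ", 1)  # raises ValueError if there is no space
    return {"method": method, "call_id": call_id}


def handle_packet(raw: bytes, active_calls: dict) -> bool:
    """Process one packet; a failure affects only this packet's call.

    Returns True if the packet was handled, False if it was dropped.
    """
    try:
        msg = parse_packet(raw)
        active_calls[msg["call_id"]] = msg["method"]
        return True
    except Exception:
        # Log enough to reproduce the problem later, then carry on serving
        # everyone else instead of letting the error take the process down.
        logger.exception("Dropping unhandled packet: %r", raw[:64])
        return False
```

The deliberately broad except is the point of the sketch rather than a general recommendation: at this one boundary you would rather drop a call and log the packet than lose the server.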
There’s clearly lots more to say about testing in this context than I can fit in a single post, but here are a few thoughts for starters.
Don’t assume anything about external interfaces
If an interface is externally visible then you will probably receive rubbish on it. Make sure you test for this as well as the nice packets you were expecting to receive. Fuzzing is a great technique to use for this. This is where you start with a reasonable looking set of packets and then randomly change parts of them to try and upset the system. Most fuzzing tools do this intelligently rather than purely randomly to maximise their chances of finding an issue. Unfortunately these smart fuzzing tools can be hard to come by for some of the less mainstream telecoms protocols.
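As a rough illustration of the basic idea, here is a sketch of the purely random kind of fuzzer, not one of the smart tools mentioned above. The function names mutate and fuzz, and the target callable standing in for your packet handler, are all assumptions for the example. It takes one sensible-looking packet, makes a handful of random byte-level changes, and records every input that makes the target raise.

```python
import random


def mutate(packet: bytes, rng: random.Random, max_changes: int = 4) -> bytes:
    """Return a copy of packet with a few random byte-level changes."""
    data = bytearray(packet)
    for _ in range(rng.randint(1, max_changes)):
        choice = rng.choice(("flip", "insert", "delete"))
        if choice == "flip" and data:
            # Flip a single random bit in a random byte.
            data[rng.randrange(len(data))] ^= 1 << rng.randrange(8)
        elif choice == "insert":
            # Insert a random byte at a random position.
            data.insert(rng.randint(0, len(data)), rng.randrange(256))
        elif choice == "delete" and data:
            # Remove a random byte.
            del data[rng.randrange(len(data))]
    return bytes(data)


def fuzz(target, seed_packet: bytes, iterations: int = 1000, seed: int = 0):
    """Feed mutated packets to target and collect the ones that make it raise."""
    rng = random.Random(seed)  # fixed seed so that any failure is reproducible
    failures = []
    for _ in range(iterations):
        candidate = mutate(seed_packet, rng)
        try:
            target(candidate)
        except Exception as exc:
            failures.append((candidate, exc))
    return failures
```

Seeding the random number generator matters more than it looks: when a mutated packet does bring something down, you want to be able to replay exactly the same sequence of packets.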
Don’t just fix the bugs – fix the underlying vulnerability
If you do manage to find a bug where a particular packet can crash the whole system, take a moment to celebrate your testing success. Then make sure that, as well as fixing the bug, you also tackle the underlying structure of the code that allowed it to be hit. You have no chance of finding all the packets that might cause a problem, so you have to use the ones you do find to make wide-ranging fixes, not just apply sticking plasters to one particular symptom.
Build testability into your code
Over its lifetime your software will likely receive packets with many orders of magnitude more variety than you can simulate in the test lab. As well as trying to expose bugs using fuzzing, it can be useful to build hooks into the product code that allow you to deliberately trigger errors. For example, you could add code to trigger an assert or exception in some packet parsing code, which will allow you to check that this doesn’t have a detrimental impact on the wider system.
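One possible shape for such a hook is sketched below. The FaultInjector class, the "parse_header" fault point and the process_calls loop are all hypothetical names invented for the example, and in a real product you would want the hook compiled out or locked down outside the test lab. The test arms a fault at a named point, pushes some traffic through, and checks that only the call that hit the fault is affected.

```python
class FaultInjector:
    """Test hook: raise an error at a named point a set number of times."""

    def __init__(self):
        self._remaining = {}

    def arm(self, point: str, times: int = 1) -> None:
        self._remaining[point] = times

    def check(self, point: str) -> None:
        left = self._remaining.get(point, 0)
        if left > 0:
            self._remaining[point] = left - 1
            raise RuntimeError(f"injected fault at {point}")


faults = FaultInjector()


def parse_header(raw: bytes) -> str:
    faults.check("parse_header")  # does nothing unless a test has armed it
    return raw.decode("ascii").split(" ", 1)[0]


def process_calls(packets):
    """Handle each packet on its own; a failure yields None for that packet only."""
    results = []
    for raw in packets:
        try:
            results.append(parse_header(raw))
        except Exception:
            results.append(None)
    return results


# Arm a single fault: the first packet fails, the second is unaffected.
faults.arm("parse_header", times=1)
outcome = process_calls([b"INVITE a", b"BYE a"])
```

After that run, outcome is [None, "BYE"], which is exactly the behaviour you are hoping to confirm: one call lost, everything else untouched.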
Happy hunting! I’d love to hear any Killer Packet stories you have.