I’ve been doing some more experimentation with Linux QoS configurations using my ping-exp utility. Today I noticed that whenever I add an SFQ to the configuration there are large latency spikes. After a bit of digging it appears that these spikes happen when the SFQ changes its flow hash. The hash changes once every perturb interval, a value that is set when the SFQ is created.
Below are the results from a couple of experiments which show this behavior. For both experiments I had two outbound ping floods of MTU-sized packets. This saturated the outbound link. The experiment itself pinged three other hosts. I made sure to use four distinct hosts (one for flood, three for the experiment) to avoid collisions in the SFQ’s flow hash.
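To make the idea of a perturbed flow hash concrete, here is a minimal toy sketch. It is not the kernel's hash function: SHA-256 is a stand-in, the divisor of 1024 buckets is an assumption, and the addresses are made-up documentation addresses. It only illustrates that each new perturbation value reshuffles which bucket every flow lands in, so collisions between flows can come and go over time.

```python
import hashlib

def bucket(src, dst, perturbation, divisor=1024):
    """Toy stand-in for a perturbed flow hash (NOT the kernel's algorithm).

    Maps a (src, dst) flow to one of `divisor` buckets. Changing
    `perturbation` changes the mapping for every flow.
    """
    key = f"{src}|{dst}|{perturbation}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % divisor

def buckets_for(flows, perturbation, divisor=1024):
    """Return {flow: bucket} for one perturbation value."""
    return {flow: bucket(flow[0], flow[1], perturbation, divisor) for flow in flows}

def has_collision(mapping):
    """True if at least two flows share a bucket."""
    return len(set(mapping.values())) < len(mapping)

# Hypothetical flows: one flood target and three measurement targets.
flows = [
    ("192.0.2.10", "198.51.100.1"),
    ("192.0.2.10", "198.51.100.2"),
    ("192.0.2.10", "198.51.100.3"),
    ("192.0.2.10", "198.51.100.4"),
]

# Each perturbation value stands for one perturb interval.
for perturbation in range(3):
    mapping = buckets_for(flows, perturbation)
    status = "collision" if has_collision(mapping) else "no collision"
    print(perturbation, sorted(mapping.values()), status)
```

With only four flows and many buckets, collisions should be rare, which is why using distinct hosts keeps the flood and the measurement pings in separate queues most of the time.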
The PNGs below are not ideal for detailed inspection of the graphs. However, you can also download the data files from the experiment and load them using ping-exp. This allows zooming in on the graph. See the links at the end.
The above graph is based on an experiment where the perturb value was set to five seconds. Although the large latency spikes do not occur at every five second interval, when they do occur they are on the five second grid.
The second experiment used a perturb time of twenty seconds. Again, the latency spikes do not occur every twenty seconds but they do occur on the twenty second grid.
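One way to check the "on the grid" observation numerically is sketched below. This is a hypothetical helper, not part of ping-exp, and the sample data is made up for illustration. It assumes the samples are (seconds since start, round-trip time in ms) pairs. Because the grid's origin depends on when the SFQ was created rather than when the experiment began, the check compares each spike's offset within the perturb period against the first spike's offset instead of against zero.

```python
def spike_times(samples, threshold_ms):
    """Return timestamps of samples whose RTT is at or above the threshold."""
    return [t for t, rtt in samples if rtt >= threshold_ms]

def circular_distance(a, b, period):
    """Distance between two offsets on a circle of length `period`."""
    d = abs(a - b) % period
    return min(d, period - d)

def on_common_grid(times, perturb, tolerance=0.5):
    """True if all spike times share (roughly) the same offset modulo `perturb`."""
    if not times:
        return True
    ref = times[0] % perturb
    return all(
        circular_distance(t % perturb, ref, perturb) <= tolerance
        for t in times
    )

# Made-up samples: (seconds since start, RTT in ms).
samples = [(1.0, 40), (7.1, 900), (12.0, 45), (17.0, 850), (32.2, 920)]
spikes = spike_times(samples, threshold_ms=500)
print(spikes, on_common_grid(spikes, perturb=5))
```

Spikes that skip some intervals still pass this check, which matches the behavior described above: not every interval produces a spike, but every spike falls on the interval grid.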
During the experiment I ran a packet capture to make sure there wasn’t any activity that might skew the results. The amount of captured traffic was very small.
The network I performed this experiment on consists of a P3-450 Linux gateway where the QoS configuration is applied to the ppp0 device. The kernel version is 2.6.27.24-170.2.68.fc10.i686. A host behind the gateway was used to generate the ping floods and run ping-exp.
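For readers who want a starting point, the sketch below prints tc commands for an HTB root with a single class and an SFQ leaf on ppp0. It is not the script used in these experiments (that one is linked below). The rate of 512kbit, the handles and the class IDs are placeholder assumptions; only the device name and the SFQ limit and perturb values come from this post.

```python
def tc_commands(dev="ppp0", rate="512kbit", limit=10, perturb=5):
    """Build tc commands for HTB with one class and an SFQ leaf.

    `rate`, the handles and the class IDs are illustrative placeholders.
    """
    return [
        # Root HTB qdisc; unclassified traffic goes to class 1:10.
        f"tc qdisc add dev {dev} root handle 1: htb default 10",
        # Single HTB class that shapes to the assumed uplink rate.
        f"tc class add dev {dev} parent 1: classid 1:10 htb rate {rate}",
        # SFQ leaf with the queue limit and perturb interval under test.
        f"tc qdisc add dev {dev} parent 1:10 handle 10: sfq limit {limit} perturb {perturb}",
    ]

for cmd in tc_commands():
    print(cmd)
```

Calling tc_commands(perturb=20) gives the equivalent commands for the second experiment's perturb time.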
Configuration and data files
HTB SFQ limit 10 perturb 5 script
HTB SFQ limit 10 perturb 5 ping-exp data file
Any thoughts on an optimal perturb time? Or perhaps pfifo with a lower packet limit is the best solution if we want to avoid large spikes in latency?
I’m currently working to compile a custom firmware for WRT54GL routers, using some suggestions you have tested to reduce bufferbloat.
Any input would be helpful.
I never figured out the source of those spikes for certain but I’m pretty sure it wasn’t due to the size of the queue. There have been some fixes to SFQ in recent kernels. I should try running these tests again to see if it is fixed.
I doubt there is an optimal perturb time but if the latency spikes I observed have been fixed then there probably isn’t much downside to a short interval.
Hmmm. Yeah, today I spent a lot of time working with the Tomato 2.6.22 kernel base, and it seems back-porting newer kernel schedulers is a waste of time.
I gave pfifo a shot at back-porting, and even that depends on pervasive kernel changes.
DD-WRT and OpenWRT use 2.6.24/25 kernels, but those aren’t that different from the 2.6.22 base I have.
Sooooo, with that in mind, all I can really do is tweak the perturb and limits.
I’ve set perturb to 15 seconds, and limits to 2 for sfq. That generally works out well for 1Mbit or less uplinks.
I would appreciate a test to see if the newer SFQ fixes the spikes.
It may be the case that these little SOHO routers are just not the best thing to work with for scheduling, since they are often stuck on old kernels. It would take significant work to update their kernels.
The best solution, it seems, is some type of small Linux computer (wall plug server) that can easily be updated with newer versions.
I use an old PC. Kinda power hungry but it makes it a lot easier to test new things.
I’ve been having some PMTU troubles with newer kernels and IPIP tunnels. Once that’s figured out I’ll hopefully do some more experimentation.
Related to the above post, if you are interested in where packets can be queued in the Linux kernel, take a look at the URL below.
http://www.coverfire.com/articles/queueing-in-the-linux-network-stack/