Recently I’ve been playing around with the Linux flow classifier on my gateway. The flow classifier provides the ability to group network flows by configuring which parts of the packet headers (referred to as keys) are used in a hash calculation which chooses the output queue.
All of my Internet traffic travels over an IPIP tunnel to another Linux box. I do this so I have control of the QoS in both the upstream and the downstream. A result of this configuration is that from the perspective of the output interface there is only a single network flow.
I configured the flow classifier to use the src,dst,proto,proto-src,proto-dst keys which aims to provide 5-tuple flow fairness. Here’s the simple tc script I used. Due to the IPIP tunnel I expected to see that all traffic would be placed into the same queue. Strangely, the below is what my little ping-exp utility showed when running at the same time as an SCP upload.
Coincidentally I ran ping-exp configured to send three different streams of ICMP traffic with different IP TOS values. Note that SCP automatically sets the IP TOS to the equivalent of the “Low” stream in the test.
Notice that the pings using the high and default TOS values appear to be unaffected by low priority ping and SCP traffic. This was unexpected because none of src,dst,proto,proto-src or proto-dst keys should be affected by the TOS value.
After a bit of experimentation I determined that the proto-dst key was the source of the problem. If you spend a bit of time with the flow_get_proto_dst() function in cls_flow.c you’ll see that if the protocol is ICMP or IPIP, as it is in my test, then the following value is returned:
return addr_fold(skb_dst(skb)) ^ (__force u16)skb->protocol;
skb_dst() returns a pointer to a dst_entry structure. Since Linux maintains separate dst_entry structures for each destination,TOS pair the source of the unexpected behaviour is obvious.
I’m not knowledgeable enough about the Linux network stack to be certain but I don’t see any value in returning a value for proto-dst which is random with respect to the actual traffic on the wire. At the very least this is not intuitive behaviour.
If you look at flow_get_proto_src() you’ll see something similar:
return addr_fold(skb->sk);
In this case a pointer to the local socket structure is used as a fallback. Again, this has no relation to the actual packets on the wire and if the packet does not originate at the local machine then no socket exists which causes this value to be zero anyway.
It seems to me that the most intuitive behaviour would be to have the proto-src and proto-dst keys return zero when they are applied to traffic that doesn’t have the notion of transport layer ports.
I’ll post to Netdev about this and see what the kernel devs have to say.
Related to this, I have a patch to the flow classifier that adds tunnel awareness which I plan post to Netdev this weekend as well.
Here is the URL for the related Netdev thread. As of this writing there haven’t been any replies.
http://www.spinics.net/lists/netdev/msg177054.html
The flow classifier patch I mentioned at the end of this article can be found at:
http://www.coverfire.com/archives/2011/10/16/making-the-linux-flow-classifier-tunnel-aware/