AMQP and Twisted

In one of my pet projects I’ve been using Twisted and txamqp. I use Twisted’s twistd to launch the service but unfortunately txamqp doesn’t come with any examples of how to use it with twistd. So I wrote a factory and protocol which make it trivial to use txamqp with twistd. I haven’t tested it extensively but it appears to survive losing the connection to the AMQP server and reconnecting. I’m far from a Twisted expert though, so please let me know if this isn’t the way it is supposed to be done.

You can find the code in my Git repository.

ejabberd default permissions

I upgraded my ejabberd to 2.1.0-rc1 today and while doing so decided to start with a fresh ejabberd.cfg. This reminded me of something I noticed when I first switched to ejabberd but forgot to blog about. The default permissions in ejabberd are a bit surprising.

Before I go into details, I’m not arguing any of these problems are the end of the world but I think it would make a lot of sense for ejabberd to ship with a safer configuration and allow administrators to open things up if desired.

MUC permissions

The default MUC (XEP-0045) access list is:

{access, muc, [{allow, all}]}.

This access list allows all JIDs, even those on remote servers. The default MUC configuration uses this access list for all operations.

{mod_muc, [
    %%{host, "conference.@HOST@"},
    {access, muc},
    {access_create, muc},
    {access_persistent, muc},
    {access_admin, muc_admin}
]},

As a result, the default configuration allows users on other XMPP servers to create rooms on the local MUC server. Probably not that big of a deal but I don’t see a good reason why my server should be hosting channels for external users. Worse, would I be responsible if the channel was used for some nefarious purpose?

I created a new access list which only allows local JIDs and used this access list for access_create and access_persistent.

{access, muc_create, [{allow, local}]}.
{mod_muc, [
    %%{host, "conference.@HOST@"},
    {access, muc},
    {access_create, muc_create},
    {access_persistent, muc_create},
    {access_admin, muc_admin}
]},

Pubsub permissions

The default Pubsub (XEP-0060) permissions are:

{access, pubsub_createnode, [{allow, all}]}.

Again, this allows all JIDs, even remote ones, to create nodes on the Pubsub server. I changed this to the following.

{access, pubsub_createnode, [{allow, local}]}.

In-band registration

This really amazes me. In-band registration (XEP-0077) allows users to connect to an XMPP server and create new accounts. This is enabled in the default configuration that ships with ejabberd.

{access, register, [{allow, all}]}.

I wonder how many ejabberd servers there are with unexpected users?

The solution is documented immediately above the register access list definition.

{access, register, [{deny, all}]}.
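To find rules like these in an existing configuration, a small script can list every access rule that allows the built-in all ACL. The following is only a rough sketch: the function name and the sample text are made up for illustration, and the regular expression only understands single-clause rules written on one line in the old ejabberd.cfg style shown above.

```python
import re

# Matches single-clause, single-line access rules such as:
#   {access, register, [{allow, all}]}.
# Lines commented out with %% do not match because the rule must start the line.
ACCESS_RULE = re.compile(
    r"^\s*\{access,\s*(\w+),\s*\[\{(allow|deny),\s*(\w+)\}\]\}\.",
    re.MULTILINE,
)


def open_access_rules(config_text):
    """Return the names of access rules that allow the 'all' ACL."""
    return [
        name
        for name, action, acl in ACCESS_RULE.findall(config_text)
        if action == "allow" and acl == "all"
    ]


# Hypothetical configuration excerpt, for illustration only.
sample = """\
{access, muc, [{allow, all}]}.
{access, muc_create, [{allow, local}]}.
{access, pubsub_createnode, [{allow, all}]}.
%% {access, register, [{deny, all}]}.
{access, register, [{allow, all}]}.
"""

print(open_access_rules(sample))
```

Any rule it reports is worth a second look, although some of them (such as general MUC access) may be open on purpose.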

Create your own economy

I just finished reading Create Your Own Economy by Tyler Cowen. The overall theme of the book revolves around autistic thinking and framing effects. The author posits that autistic thinking has benefits that we can all learn from. The discussion of framing effects is scattered less coherently throughout the book but suggests that people can decide what is important and improve their lives by choosing how to look at the world.

While I found the autistic meme to be stretched, there are many valuable insights into Internet communications, economics and psychology. There is also a strong defence of modern bite-sized culture which is really worth thinking about if you pine for the glory days of traditional culture.

Linux SFQ experimentation

I’ve been doing some more experimentation with Linux QoS configurations using my ping-exp utility. Today I noticed that whenever I add an SFQ to the configuration there are large latency spikes. After a bit of digging it appears that these spikes happen when the SFQ changes its flow hash. This occurs every perturb interval, as configured when the SFQ is created.

Below are the results from a couple of experiments which show this behavior. For both experiments I had two outbound ping floods of MTU-sized packets. This saturated the outbound link. The experiment itself pinged three other hosts. I made sure to use four distinct hosts (one for the flood, three for the experiment) to avoid collisions in the SFQ’s flow hash.

The PNGs below are not ideal for detailed inspection of the graphs. However, you can also download the data files from the experiment and load them using ping-exp. This allows zooming in on the graph. See the links at the end.

HTB SFQ limit 10 perturb 5

The above graph is based on an experiment where the perturb value was set to five seconds. Although the large latency spikes do not occur at every five second interval, when they do occur they are on the five second grid.

HTB SFQ limit 10 perturb 20

The second experiment used a perturb time of twenty seconds. Again, the latency spikes do not occur every twenty seconds but they do occur on the twenty second grid.
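One way to check the "on the grid" observation without squinting at a graph is to take every sample above some latency threshold and look at its timestamp modulo the perturb interval. The sketch below assumes the results are available as (seconds since start, RTT in ms) pairs. That layout, the function name and the sample numbers are all made up for illustration and are not ping-exp’s actual data format.

```python
def spike_phases(samples, threshold_ms, perturb_s):
    """Return (time, phase) for every sample whose RTT exceeds threshold_ms.

    samples is an iterable of (seconds_since_start, rtt_ms) pairs and
    phase is the sample time modulo the perturb interval.
    """
    return [
        (t, t % perturb_s)
        for t, rtt in samples
        if rtt > threshold_ms
    ]


# Made-up measurements, for illustration only.
samples = [
    (1.0, 40.0),
    (5.0, 900.0),
    (7.5, 45.0),
    (15.0, 850.0),
    (16.0, 42.0),
    (30.0, 910.0),
]

for t, phase in spike_phases(samples, threshold_ms=500.0, perturb_s=5):
    print(t, phase)
```

If the spikes really follow the perturb timer, the phases should cluster around a single value. That value does not have to be zero because the timer starts when the SFQ is created, not when the experiment starts.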

During the experiment I ran a packet capture to make sure there wasn’t any activity that might skew the results. The amount of captured traffic was very small.

The network I performed this experiment on consists of a P3-450 Linux gateway where the QoS configuration is applied to the ppp0 device. The kernel version is 2.6.27.24-170.2.68.fc10.i686. A host behind the gateway was used to generate the ping floods and run ping-exp.

Configuration and data files

HTB SFQ limit 10 perturb 5 script

HTB SFQ limit 10 perturb 5 ping-exp data file

HTB SFQ limit 10 perturb 20 script

HTB SFQ limit 10 perturb 20 ping-exp data file

Some infrastructure links for Canada 3.0

Tomorrow the Canada 3.0 conference starts. Since I am attending the infrastructure track I thought it might be useful to collect a bunch of links relating to the Internet as infrastructure.

http://www.linuxjournal.com/content/why-internet-infrastructure-need-be-fields-study

http://hakpaksak.wordpress.com/2008/09/22/the-etymology-of-infrastructure-and-the-infrastructure-of-the-internet/

http://lafayetteprofiber.com/FactCheck/OpenSystems.html

http://news.cnet.com/Fixing-our-fraying-Internet-infrastructure/2010-1034_3-6212819.html

http://www.interesting-people.org/archives/interesting-people/200904/msg00168.html

http://www.interesting-people.org/archives/interesting-people/200904/msg00175.html

http://cis471.blogspot.com/2009/04/why-is-connectivty-in-stockholm-so-much.html

http://www.linuxjournal.com/xstatic/suitwatch/2006/suitwatch19.html

http://publius.cc/2008/05/16/doc-searls-framing-the-net

http://free-fiber-to-the-home.blogspot.com/

http://communityfiber.org/cringely.html

http://www.linuxjournal.com/article/10033

ping-exp: Ping experiment utility

Recently I’ve been playing with Linux’s QoS features in order to make my home Internet service a little better. Since I’m primarily interested in latency I used ping to benchmark the various configurations. This works reasonably well but it quickly becomes hard to compare the results.

So I decided to build a tool to perform several ping experiments, store the results and graph them. The result of this work is ping-exp.

At present ping-exp can vary the destination host name as well as the TOS field. The interval between pings and total number of pings is globally configurable. The results can be written to a file to be loaded later, output to a PNG or both. Line and scatter plots are supported. When not writing the image to a file ping-exp displays the graph using Matplotlib’s default graph viewer. This allows zooming in on interesting parts of the graph. In the future I’d like to add the ability to specify the ping packet size.
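For anyone curious what the measurement step of a tool like this might look like, here is a minimal sketch. It is not ping-exp’s actual code. It assumes the Linux iputils ping (its -c, -i, -n and -Q options and its "time=... ms" output format), and the function names are made up for illustration.

```python
import re
import subprocess

# iputils ping reports each reply as "... time=12.3 ms".
RTT = re.compile(r"time=([\d.]+) ms")


def parse_rtts(ping_output):
    """Extract the round-trip times in milliseconds from ping's output."""
    return [float(m.group(1)) for m in RTT.finditer(ping_output)]


def run_ping(host, count=10, interval=1.0, tos=None):
    """Run the system ping once and return the list of RTTs in milliseconds."""
    cmd = ["ping", "-n", "-c", str(count), "-i", str(interval)]
    if tos is not None:
        cmd += ["-Q", str(tos)]  # set the TOS field on outgoing packets
    cmd.append(host)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_rtts(result.stdout)


# Parsing a canned reply, so the example runs without network access.
sample_output = (
    "64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=12.3 ms\n"
    "64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.456 ms\n"
    "2 packets transmitted, 2 received, 0% packet loss, time 1001ms\n"
)
print(parse_rtts(sample_output))
```

Calling run_ping once per destination and TOS value gives one list of RTTs per experiment, which can then be handed to Matplotlib for plotting.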

As an aside, Python and Matplotlib make this kind of stuff so much fun.

Below are a few graphs created by ping-exp.

ping-exp example #1

ping-exp example #2

ping-exp example #3

Linux/Fedora PPPoE problems and solutions

This weekend I’ve been doing some network experimentation on my little DSL connection. I’ve learned a couple of things the hard way so I figured a quick blog post is in order in the hopes that it will save someone else time.

PPP interface errors

Over the last while my Internet connection has been a little slow. I noticed that there were occasionally packet drops but I didn’t take the time to figure out where they were occurring. The testing I was doing this weekend was very sensitive to packet loss so I had to get to the bottom of this.

There were two symptoms. The first was a bunch of log entries like the following.

Apr 19 12:03:21 titan pppoe[26690]: Bad TCP checksum 109c
Apr 19 12:10:35 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:10:35 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:10:36 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:10:36 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:24:50 titan pppoe[26690]: Bad TCP checksum 3821
Apr 19 12:31:54 titan pppoe[26690]: Bad TCP checksum 9aeb
Apr 19 12:33:22 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:33:49 titan pppd[26689]: Protocol-Reject for unsupported protocol 0xb00
Apr 19 12:33:57 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x2fe5
Apr 19 12:33:58 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:01 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:02 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:12 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x58e6
Apr 19 12:34:14 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:17 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:27 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:29 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:30 titan pppd[26689]: Protocol-Reject for unsupported protocol 0xb00
Apr 19 12:34:31 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x800
Apr 19 12:34:33 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:36 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x7768

The bad TCP checksum entries hinted at some kind of packet corruption. However, I didn’t know if this was coming from packets being transmitted or received. Since I don’t know the inner workings of PPP as well as I’d like, the Protocol-Reject messages were harder to get a handle on. I grabbed a capture on the Ethernet interface underlying ppp0 so I could look at the PPP messages in Wireshark.
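With this many log entries it helps to tally which protocol values are being rejected. The following is a small sketch with a made-up function name. It assumes the message text is exactly as in the log excerpt above.

```python
import re
from collections import Counter

REJECT = re.compile(r"Protocol-Reject for unsupported protocol (0x[0-9a-fA-F]+)")


def tally_rejected_protocols(log_text):
    """Count Protocol-Reject log entries per rejected protocol value."""
    return Counter(REJECT.findall(log_text))


# A few lines copied from the log excerpt above.
log_excerpt = """\
Apr 19 12:33:22 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:33:49 titan pppd[26689]: Protocol-Reject for unsupported protocol 0xb00
Apr 19 12:33:57 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x2fe5
Apr 19 12:33:58 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:24:50 titan pppoe[26690]: Bad TCP checksum 3821
"""

for protocol, count in tally_rejected_protocols(log_excerpt).most_common():
    print(protocol, count)
```

A tally dominated by 0x0 with a scattering of seemingly random values would be consistent with corrupted frames rather than a peer that genuinely speaks an unsupported protocol.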

PPP Unknown protocol

Suspect PPP message

My PPPoE client sent a message with the protocol field set to 0. Wireshark doesn’t know what 0 is supposed to mean.

PPP reject

PPP rejection message

And the remote PPPoE device is sending a message back rejecting the transmitted message. It’s even nice enough to return the entire payload, thereby wasting download bandwidth as well. From this packet capture I became pretty confident that the problem was on my end, not the ISP’s. After this I wasted a bunch of time playing around with the clamp TCP MSS PPP option because the data size in the above messages (1412) matched the clamp TCP MSS setting in my PPP interface configuration file.

The second symptom was a large number of receive errors on the ppp0 interface; the underlying Ethernet interface did not have any errors. In contrast to the PPP errors above, the receive errors made it look like the problem was in the PPP messages being received by my PPPoE client.

After several unsuccessful theories I finally figured out what the problem was. The PPPoE implementation on Linux has two modes: synchronous and asynchronous. Synchronous mode uses less CPU but requires a fast computer. I guess the P3-450 that I use as a gateway doesn’t qualify as fast because as soon as I switched to asynchronous mode all of the errors went away.

Fixing the problem was good but this still didn’t make sense to me because I’ve been using this computer as a gateway for years. Then I discovered this Fedora bug. It turns out that Fedora 10 shipped with a version of system-config-network which contained a bug that defaulted all PPPoE connections to synchronous mode. This bug has since been fixed and pushed out to all Fedora users but that didn’t fix the problem for me because the PPP connection configuration was already generated.

In summary, this was a real pain but I did learn more about PPP than I’ve ever had reason to in the past.

Dropping PPP connections

Some of the experimentation I’ve been doing this weekend required completely congesting the upload channel of my DSL connection. I don’t just mean a bunch of TCP uploads; this doesn’t cause any problems. What I was doing is running three copies of the following.

ping -f -s 1450 alpha.coverfire.com

This generates significantly more traffic than my little 768 Kbps upload channel can handle. During these tests I noticed that occasionally the PPPoE connection would die and reconnect. Examples of the log entries associated with these events are below.

Apr 19 20:02:31 titan pppd[15627]: No response to 3 echo-requests
Apr 19 20:02:31 titan pppd[15627]: Serial link appears to be disconnected.

Since I had already been looking at PPP packet captures in Wireshark I recognized the following.

PPP echo

It appears that too much upload traffic causes enough congestion that the PPP echoes fail and the PPP connection is dropped after a timeout. I would have thought the PPP daemon would prioritize something like this over upper layer packets but nevertheless this appears to be the case. For the purposes of my testing this problem was easy to avoid by modifying the following lines in /etc/sysconfig/network-scripts/ifcfg-INTERFACE. I increased the failure count from 3 to 10.

LCP_FAILURE=10
LCP_INTERVAL=20
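As a rough back-of-the-envelope check, the time pppd waits before declaring the link dead is approximately the failure count multiplied by the echo interval. The sketch below assumes that LCP_FAILURE and LCP_INTERVAL are passed through to pppd’s lcp-echo-failure and lcp-echo-interval options, and that the interval was 20 seconds both before and after the change. The function name is made up for illustration.

```python
def lcp_dead_link_timeout(failure_count, interval_s):
    """Approximate seconds of unanswered echo-requests before pppd gives up."""
    return failure_count * interval_s


print(lcp_dead_link_timeout(3, 20))   # original failure count
print(lcp_dead_link_timeout(10, 20))  # raised failure count
```

Under those assumptions the change stretches the window from about one minute to a little over three minutes, which gives the echoes many more chances to get through the congestion.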

A little IPv6 experiment

I’ve been running IPv6 on my home network for a while now. Since my provider doesn’t offer native IPv6, all external traffic goes via 6to4.

Last week I set up 6to4 on my server which lives inside a local ISP’s colocation facilities. This provided IPv6 connectivity between my home network and the server. The only changes required were a couple of ACL modifications and configuring sendmail to listen on an IPv6 socket. Sadly, I discovered that ejabberd cannot listen on both IPv4 and IPv6 addresses on the same port.

For a little experiment I decided to add an AAAA record to www.coverfire.com and see how much IPv6 traffic arrives. I know that the IPv6 Internet is vastly smaller than the IPv4 Internet so I didn’t expect a huge amount of traffic. In order to analyze whatever traffic arrived I captured all IPv6 port 80 traffic for the duration of the experiment.

The results of this experiment were disappointing. Over about 1.5 days only five IPv6 hosts visited the site. One of the five hosts wasn’t even able to establish a TCP connection. From the capture file it looks like the ACKs from my server never arrived at the remote host. Of the five addresses, four were 6to4 addresses. I found this a little surprising. Also interesting is the fact that there was no traffic from Teredo hosts.
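Sorting source addresses into 6to4, Teredo and everything else is straightforward with the ipaddress module in Python 3’s standard library. The sketch below is not the script used for this experiment. The function name is made up and the sample addresses are illustrative values, not addresses from the capture.

```python
import ipaddress
from collections import Counter


def classify(addr):
    """Label an IPv6 address as '6to4', 'teredo' or 'other'."""
    ip = ipaddress.IPv6Address(addr)
    if ip.sixtofour is not None:   # inside 2002::/16
        return "6to4"
    if ip.teredo is not None:      # inside 2001::/32
        return "teredo"
    return "other"


# Illustrative addresses only.
addresses = [
    "2002:c000:204::1",
    "2001:0:4136:e378:8000:63bf:3fff:fdd2",
    "2001:db8::1",
]

print(Counter(classify(a) for a in addresses))
```

The same function can be applied to the source address of every packet in a capture once the addresses have been extracted.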

A more interesting question is whether or not adding an AAAA record has caused troubles for people visiting the site via IPv4. See this article for one of the reasons why AAAA records can cause IPv4 users trouble.

For anyone who is interested I have uploaded the quick Python hack I used to analyze the capture file.

ip6.py