Mobile networks, like most others, rely on the Transmission Control Protocol (TCP) for data transfer. But this decision often produces poor results. TCP is a protocol meant for stable networks -- and all too often, mobile is anything but stable, with unreliable and often bottlenecked connections that conflict with TCP mechanisms such as slow start.
But while TCP looks inevitable from some angles, outperforming it is not as difficult as it looks. The problem is certainly complex, with many moving pieces involved. But if we break it down into digestible chunks and apply observations from the real world, improvement becomes achievable.
To explain, let's start with a picture of how the internet works today. In simple terms, the internet is made up of links of different capacities, connected by routers. The goal is to achieve maximum possible throughput in this system. However, the links all possess different capacities, and we don't know which will present bottlenecks to bandwidth ahead of time.
I often visualize this system as a freeway. Imagine we have to move trucks from a start to end point. At the start point, the freeway has six lanes -- but at the end, it has only two lanes. When the trucks are received at the end, an "acknowledgement" is sent back (think of this as a motorcycle messenger if, like me, you have a penchant for two-wheeled transportation). Each acknowledgement improves the knowledge of road conditions, allowing trucks to be sent more efficiently.
The goal is for the maximum number of trucks to reach the destination as fast as possible. You could, of course, send six lanes' worth of trucks to start -- but if the four extra lanes suddenly end, the extra trucks must be taken off the road and resent. Given the unknown road conditions at the start of the journey, TCP Shipping, Inc. sensibly starts out with just one lane's worth of trucks.
This is the "slow start" phase of TCP, and it's one of the protocol's unsolved problems on mobile. TCP starts with the same amount of data for all kinds of channels, and thus often ends up with an initial sending rate that's far smaller than the available bandwidth -- or, if we were to start off with too much data, the rate would be larger than the available bandwidth and require resending.
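To make the cost of a low starting point concrete, here is a minimal sketch of an idealized slow start: the window doubles every round trip and nothing is lost. The function name and the 100-segment target are illustrative assumptions, not measurements, and real TCP stacks differ in many details.

```python
def round_trips_to_reach(target_segments: int, initial_window: int) -> int:
    """Count round trips until the window first reaches target_segments.

    Idealized model (an assumption for illustration): the window doubles
    once per round trip and no packets are lost.
    """
    if target_segments < 1 or initial_window < 1:
        raise ValueError("window sizes must be at least 1 segment")
    window = initial_window
    trips = 0
    while window < target_segments:
        window *= 2  # exponential growth during slow start
        trips += 1
    return trips


# Hypothetical channel that can carry 100 segments per round trip.
for start in (3, 10, 100):
    print(start, round_trips_to_reach(100, start))
```

Under these assumptions, a sender starting at 3 segments needs 6 round trips to reach a 100-segment window, one starting at 10 needs 4, and one that already starts at 100 needs none. On a high-latency mobile link, each of those round trips is time spent below the available bandwidth.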
However, it is possible to be smarter: we could choose the initial estimate based on channel properties discernible from the mobile context. In other words, we could start off with the ideal 2 lanes of trucks, shown below in frame C.
Of course, it's impossible to have perfect knowledge ahead of time. What we can do is get a much better estimate based on prior knowledge of network conditions. If we manage to get our estimates close to correct, we'll enormously improve on the TCP slow start problem.
The contention here is not that TCP has never improved in any use case. Traditionally, TCP's initial congestion window was set to about 3 MSS (maximum segment size, the largest payload a single TCP segment can carry, which is derived from the MTU of the path between the two endpoints). As networks improved, this was raised to 10 MSS; then Google's Quick UDP Internet Connections (QUIC) protocol, in use in the Chrome browser, started with 32 MSS.
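To put those window sizes in bytes, here is a rough calculation. It assumes a 1500-byte MTU (typical for Ethernet) and 20-byte IPv4 and TCP headers with no options; these numbers are assumptions for illustration, and real paths vary.

```python
IPV4_HEADER_BYTES = 20  # assumed: IPv4 header without options
TCP_HEADER_BYTES = 20   # assumed: TCP header without options


def mss_from_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def initial_window_bytes(segments: int, mss: int) -> int:
    """Bytes a sender may have in flight during the first round trip."""
    return segments * mss


mss = mss_from_mtu(1500)  # 1460 bytes under the assumptions above
for segments in (3, 10, 32):
    print(segments, initial_window_bytes(segments, mss))
```

With a 1460-byte MSS, the three starting points come to 4,380, 14,600 and 46,720 bytes in the first round trip. Each is a fixed guess applied to every connection, whatever the network underneath.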
But mobile has largely been passed by, because browser traffic is a minority of mobile traffic. A larger fixed initial window, like QUIC's, is also not a solution, given the vast range of conditions between a 2G network in India and an ultra-fast Wi-Fi connection at Google's Mountain View campus.
Today, a very common case is mobile devices accessing content via wireless networks where the bandwidth bottleneck is the last mobile mile. And that path has very specific characteristics based on the type of network, carrier and location. For instance, 3G/T-Mobile in New York behaves differently than LTE/AT&T in San Francisco. Even the latter is quite different from LTE/Verizon at a packed Giants game at AT&T Park.
From the real-world data we've collected at PacketZoom, we have observed initial window values ranging from 3 MSS to over 100 MSS for different network conditions. Historical knowledge of these conditions is what allows us to avoid slow starts -- to have instead a head start.
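One way to picture this head start is a lookup keyed by network type, carrier and location. The sketch below is my own illustration, not a description of any production system: the class name, the use of a median, the fallback of 10 segments and the clamping bounds of 3 and 100 are all assumptions.

```python
from statistics import median
from typing import Dict, List, Tuple

# (network_type, carrier, location), e.g. ("LTE", "AT&T", "San Francisco")
ContextKey = Tuple[str, str, str]


class InitialWindowEstimator:
    """Hypothetical store of window sizes (in segments) seen per context."""

    def __init__(self, default: int = 10, floor: int = 3, ceiling: int = 100):
        self.default = default  # used when a context has no history yet
        self.floor = floor      # never start below this many segments
        self.ceiling = ceiling  # never start above this many segments
        self._history: Dict[ContextKey, List[int]] = {}

    def record(self, key: ContextKey, observed_window: int) -> None:
        """Remember a window size that worked well for this context."""
        self._history.setdefault(key, []).append(observed_window)

    def estimate(self, key: ContextKey) -> int:
        """Pick a starting window: clamped median of past observations."""
        samples = self._history.get(key)
        if not samples:
            return self.default
        return max(self.floor, min(self.ceiling, int(median(samples))))


estimator = InitialWindowEstimator()
sf_lte = ("LTE", "AT&T", "San Francisco")
for window in (40, 60, 50):  # made-up observations for illustration
    estimator.record(sf_lte, window)

print(estimator.estimate(sf_lte))
print(estimator.estimate(("3G", "T-Mobile", "New York")))
```

A context with history starts from what has worked there before, while an unseen context falls back to a conservative default. The median is used here only because it shrugs off the occasional outlier; other summaries would work too.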
Crunching a giant stream of performance data for worldwide networks to constantly update bandwidth estimates is not a trivial problem. But it's not an intractable problem either, given the computing power available to us today. In a typical scenario, if a connection starts with a good estimate and performs a couple of round trips across the network, it can very quickly find an accurate estimate of available bandwidth. Consider the following Wireshark graph, which shows how quickly a 4 MB file was transferred with no slow start (red line) versus TCP slow start (blue line).
TCP started with a very low value and took 3 seconds to fully utilize the bandwidth. In the controlled experiment shown by the red line, full bandwidth was in use nearly from the start. The blue line also shows some pretty aggressive backoff that's typical of TCP.
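The claim that a good starting estimate converges within a couple of round trips can be illustrated with a simple smoothing rule. This sketch uses an exponentially weighted moving average, which is my own choice for illustration rather than the method behind the graph; the gain of 0.5 and the bandwidth figures are made-up assumptions.

```python
from typing import List


def refine_estimate(estimate_bps: float, measured_bps: float,
                    gain: float = 0.5) -> float:
    """Move the estimate a fraction `gain` of the way toward a measurement."""
    if not 0.0 < gain <= 1.0:
        raise ValueError("gain must be in (0, 1]")
    return estimate_bps + gain * (measured_bps - estimate_bps)


def estimates_over_round_trips(initial_bps: float, measured_bps: float,
                               round_trips: int, gain: float = 0.5) -> List[float]:
    """Estimate after each round trip, assuming a steady measured rate."""
    estimates = [initial_bps]
    for _ in range(round_trips):
        estimates.append(refine_estimate(estimates[-1], measured_bps, gain))
    return estimates


# Hypothetical link that actually delivers 10 Mbit/s.
print(estimates_over_round_trips(8e6, 10e6, 2))  # informed start
print(estimates_over_round_trips(1e6, 10e6, 2))  # uninformed start
```

In this toy model, the informed start at 8 Mbit/s reaches 9.5 Mbit/s after two round trips, while the uninformed start at 1 Mbit/s is only at 7.75 Mbit/s. The closer the first guess, the less there is left to correct.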
In cricket, we often say that you need a good start to pace your innings well, and to strike on the loose deliveries to win matches. In our case, the initial knowledge of bottlenecks gets us the good start. And beyond the start, there's even more that we can do to "pace" the traffic and look for "loose deliveries." The payoff from these changes would be a truly big win: a huge increase in the efficiency, and speed, of mobile data transfer.