Or: How long does it really take to access your Mobile API?
Many app developers use mobile APM (application performance monitoring) solutions such as New Relic or Apteligent to better understand the performance of their applications. Because these solutions were designed with a web mindset (i.e. HTTP traffic only), they overlook a crucial performance area when used in modern applications that rely on an advanced mobile networking stack: network disconnects and the impact those disconnects have on the end user.
In an Ideal World (i.e. best case scenario)
Network conditions remain stable and perform close to perfect: users do not experience packet loss, nor do they suffer from disconnects. When the user hits the search button the Search API is called and takes 20 seconds to return the search results.
Timeline calculation: 20 seconds
In the Real World (i.e. mobile networks)
Mobile users are not always stationary, which can cause serious problems for network conditions. For instance, users traveling on public transportation often move through dead zones, and sometimes between different types of networks (WiFi-->LTE, LTE-->3G, 3G-->2G, etc.), which forces their IP address to change. If they hit the search button while on the move, a disconnect will take place, and since most developers set a fixed timeout (e.g. 20 seconds), it might take an additional 20 seconds to retry the search call.
In the example below we assume that the disconnect takes place after 10 seconds, so the search activity takes a total of 10 + 20 + 20 = 50 seconds. A traditional APM solution would mark this call as a failed transaction that lasted 20 seconds and would ignore the additional (10 + 20) seconds of wait time. In other words, all that switching between networks and changing IP addresses caused the transaction to fail unnecessarily!
Timeline calculation: 10 + 20 + 20 = 50 seconds
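The arithmetic for this scenario can be sketched in a few lines of Python. This is only an illustration of the example's numbers; the names below (`user_perceived_seconds`, `FIXED_TIMEOUT`, and so on) are hypothetical and do not come from any APM product.

```python
# Segment lengths in seconds, taken from the example (inflated for demonstration).
TIME_BEFORE_DISCONNECT = 10  # request in flight before the network drops
FIXED_TIMEOUT = 20           # the app waits out its fixed timeout
RETRY_DURATION = 20          # the retried search call on a stable network


def user_perceived_seconds(before_disconnect, timeout, retry):
    """Total wait from hitting the search button to seeing results."""
    return before_disconnect + timeout + retry


total = user_perceived_seconds(TIME_BEFORE_DISCONNECT, FIXED_TIMEOUT, RETRY_DURATION)

# Assumption from the example: the APM reports only a 20-second failed transaction.
apm_recorded = FIXED_TIMEOUT
unaccounted = total - apm_recorded

print(f"user waited {total} s, APM recorded {apm_recorded} s, "
      f"{unaccounted} s unaccounted for")
```

With the example's numbers, the user waits 50 seconds while 30 of those seconds never show up in the failed transaction's reported duration.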
What if there was a modern mobile protocol that could rescue connections?
Let's review the previous scenario again. What if there was a smart way to use big data from millions of devices and billions of data points to better handle disconnects? Such data could be used to decide when to create connections and when to terminate them, using the proper timeouts. Waiting another second to avoid a connection drop, for example, could save the many seconds it takes to establish a new connection.
In the example below we show that a disconnect 10 seconds into the search activity could be rescued by waiting a bit longer. In this corner case it would only take 10+21+10 = 41 seconds to complete the search. A true mobile APM solution would measure such a connection rescue as a successful transaction that lasted 41 seconds.
In the last scenario we have seen how a connection rescue resulted in a performance gain of nine seconds (i.e. about 18% faster response time).
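To make the comparison between the two timelines explicit, here is a minimal sketch of the calculation. The function name `response_time_gain` is hypothetical, and the inputs are simply the inflated numbers used in the two scenarios.

```python
def response_time_gain(baseline_seconds, rescued_seconds):
    """Return (seconds saved, percentage of the baseline wait that was saved)."""
    saved = baseline_seconds - rescued_seconds
    return saved, saved / baseline_seconds * 100


retry_total = 10 + 20 + 20   # disconnect, fixed timeout, then a full retry
rescue_total = 10 + 21 + 10  # disconnect rescued by waiting a bit longer

saved, percent = response_time_gain(retry_total, rescue_total)
print(f"{saved} s saved, {percent:.0f}% less waiting")
```

The percentage is measured against the 50-second retry timeline, so nine seconds saved comes to 18% of the original wait.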
* NOTE: numbers used in the examples are inflated for demonstration purposes.
Mobile APM solutions such as New Relic and Apteligent simplify the story by categorizing requests into failures and retries. However, from the user's perspective, connection failure and the subsequent retry could actually be a single successful transfer (even if a bit slower).
A successful transfer is much more valuable to the end user because it spares them the need to retry manually: the overall user experience flows and feels "right." This may ultimately translate into higher engagement, increased retention, and improved conversion rates.
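The gap between the two views can be sketched as follows. This is a simplified illustration, not how any particular APM product works; the `Attempt` class, both functions, and the attempt durations are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    duration_s: float  # how long this attempt lasted, in seconds
    succeeded: bool    # whether this attempt returned a result


def per_request_view(attempts):
    """Report each attempt on its own, as a failure or a success."""
    return [("success" if a.succeeded else "failure", a.duration_s)
            for a in attempts]


def user_view(attempts):
    """Report one transaction: the outcome of the last attempt and the total wait."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    total = sum(a.duration_s for a in attempts)
    outcome = "success" if attempts[-1].succeeded else "failure"
    return outcome, total


# Illustrative numbers: a failed call (10 s + 20 s timeout) followed by a 20 s retry.
attempts = [Attempt(30, False), Attempt(20, True)]

print(per_request_view(attempts))  # one failure and one success, reported separately
print(user_view(attempts))         # a single successful transfer with the full wait
```

Grouping attempts this way keeps the full wait time attached to the outcome the user actually saw, instead of splitting it across unrelated records.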
Network disconnects are very common, as we see in PacketZoom's Mobile Application Performance Benchmark, where disconnect rates range from 4% to 13%. PacketZoom is able to consistently rescue 60-90% of these disconnects for our customers across the board.