Here’s the experimental data analysis, what I call a domestic route yield/PRASM performance comparison targeting UA, as it takes some work to set my analysis up for the particularities of each carrier:
And here’s the (quite long explanation) of what it is and how it works:
Back when I worked at an airport air service development office I had access to the airport’s subscription to Diio Mi by Cirium. This was (and probably still is) an amazing tool for analyzing route performance including yield and RASM comparisons across a carrier’s entire network, or an airport’s entire range of routes.
(An old DiioMi report)
Now Cirium is so expensive that its pretty much only available to people actually in the industry. I don’t have it anymore. But, being still just as interested in questions of the relative performance of different routes, I’ve been taking it upon myself to experiment with the data sources to which I do have access to see how far I can get to manually. My goal in particular is to replicate something of their “Hub Grader” feature, comparing the relative financial performance of say DEN-LGA for UA versus DEN-ORD or DEN-RIW.
Cirium uses a lot of data sources that are not publicly available for free, but for the United States domestic market basics of T100 (onboard traffic) and DB1B (itinerary origins and destinations and revenue) are free from the USDOT but in relatively incomprehensible format without lots of work and recombinatory effort.
All that follows is my experimental attempt to use these data sources to build up to this kind of analysis, with my example being a stage length adjusted segment RASM (revenue per available seat mile) comparison of about fifteen UA routes out of DEN.
First, an explanation on the sources themselves.
T100 gives traffic data on a monthly basis such as the number of onboard passengers, available seats, number of departures, and aircraft type for each segment. This means that it tells how many people are on each flight WITHOUT telling where their journeys actually started or ended.
DB1B comes in three forms, market, ticket, and coupon, and is a random 10% airline-provided sample of domestic ticketed itineraries giving such information as where passenger journeys began and ended, who sold them the ticket, what airline operated each segment of their itinerary, and what fare they paid, and comes out on a quarterly basis.
From DB1B data, assuming it is a fair sample, I calculated on a one-way origin and destination (O&D) basis passengers daily each way carried by UA between each domestic origin airport and each destination, as well as the average circuity, average fare, and average yield of each of these possible itineraries that UA sold tickets to and carried passengers on in Q3 2022.
Note that most of these are city pairs not served nonstop by UA.
Next, I also calculated United’s average fare and yield for each domestic city it serves. I placed these on a scatterplot plotted against the average itinerary distance for each city to find via regression analysis the trendline of United’s domestic passenger yields. (DATA here: https://docs.google.com/spreadsheets/d/1PpLdM3vCm3fIZaS40iEQSd0EjVbn6exJC_YskRURjy8/edit?usp=sharing)
The trendline (here I chose a power series), is, I am assuming (and note that this is an important assumption) is the median yield for UA at a given domestic itinerary length.
The reason that finding this trendline from a scatterplot is important is because fares are not directly proportional to itinerary length, which is what would happen if a carrier’s average yield were the same at different itinerary lengths. Rather, yield tends to decrease at higher itinerary/stage lengths, the reason being that, most of the time, the cost per available seat mile (CASM) also goes down at higher itinerary/stage lengths as the fixed costs of operating a flight are spread out over longer flights.
This analysis, however, gives me a trendline equation relating itinerary/stage length to the median yield at the length.
Itineraries and stations with yields that are above this trendline are more profitable, broadly than those below it, but, more importantly, the existence of this trendline allows me to more things in regard to my chosen subject of United, specifically, “stage length adjusted yields” which are really just a way of making apples to apples comparisons of the relative revenue generating ability of different itineraries for United in relation to their median domestic yield.
Since I have UA’s median domestic yield, median itinerary length and this trendline, its a simple matter to calculate from all this “stage length adjusted yield” for each station/itinerary, normalizing each itinerary’s revenue performance to how it would perform at UA’s system average itinerary length.
This data (for stations only) is again linked here: https://docs.google.com/spreadsheets/d/1PpLdM3vCm3fIZaS40iEQSd0EjVbn6exJC_YskRURjy8/edit?usp=sharing
It is important to note that this is not yet route performance but is merely station revenue performance for domestic passengers beginning or ending their journeys in each station. DEN data, for example, in this above spreadsheet, does not account for the revenue of passengers connecting through DEN. It only includes data for how UA is performing in revenue generation from DEN originating or terminating passengers versus its other stations.
To go to the next step of route-level analysis, we need to unravel another puzzle, what is known as “proration”.
Suppose you are flying from Cincinnati to Salt Lake City on United by connecting through DEN. Suppose also you pay a $250 one-way fare.
That’s $250 to fly 1460 miles, 1069 from CVG-DEN and 391 from DEN-SLC.
It would seem to be most simple to allocate how the fare goes to support the two segments proportionally. (391/1460 of it to support DEN-SLC and 1069/1460 to support CVG-DEN or $183 and $67, respectively.
But, this isn’t how it’s done. Imagine flying BOS-EWR-PSP for $250. Less than $25 would go toward the BOS-EWR segment if we divided the fare proportionally. Not even enough to cover airport fees at BOS and EWR. Again, shorter routes are more expensive to operate on a per mile basis.
Rather carriers divide out fares between segments proportionally between the square root of their length.
Exactly how I turned this into a formula is bit too complicated to explain easily, but this was my next analytical step, and follows again, what I have learned is industry standard practice for this process of proration.
Next, I started creating what are called passenger segment profiles, or summaries of the itinerary origins and destinations of the passengers of a particular individual flight segment
Here’s an example of SYR-DEN on UA in Q3 2022: https://docs.google.com/spreadsheets/d/1z0MHF7ZH7InR44QA86JEgYVsFpHE5iRNfAfchK3YCbw/edit?usp=sharing
This spreadsheet includes for the particular flight segment SYR-DEN, the origins, destinations, number of daily passengers, total itinerary fares and yields, prorated fare amounts and yields allocated to this segment from out of their overall itineraries.
Summing up the data gives us daily total revenue and yield information for this flight segment, including several percentage-based columns that give the relative performance of this segment versus the UA system average: “Segment Yield as % of Stage Length Adjusted Carrier Median”, “Beyond Traffic Yield as Stage Length Adjusted % of Carrier Median”, and “Local Traffic Yield as Stage Length Adjusted % of Carrier Median”.
The first of these gives the yield (from the parts of the fare allocated to this segment in comparison to UA’s domestic system stage length adjusted average), the next gives the yield of the connecting traffic to or from this flight to other UA flights, and the third gives the yield of the local traffic on this flight segment in comparison to the UA system average. The numbers for this example, should broadly, make sense. Local traffic is typically more lucrative than connecting traffic as the latter has to, typically, be sold at a discount.
Finally, we’re close. I performed this analysis for about fifteen different routes, placed these route totals into another spreadsheet to compare them to each other, and then added one more piece to this analysis, load factor, in order to allow us to fully compare routes with differing yields and load factor on an apples to apples basis by providing a percentage based proxy for the revenue per available seat mile (RASM) as it compares to UA’s system average.
Here’s the ultimate data so far: https://docs.google.com/spreadsheets/d/1TZvORy0NMARd6Jt2_StRCuEcDtfhov2hqW7MoGi33ho/edit?usp=sharing
This final column is, I purport, is, as best as I can with the current data available to me, a relative measure of domestic route revenue performance to UA in Q3 2022.
Again, there are still a lot of caveats. Given UA’s widely varying fleet with very different cost per available seat miles between on a CRJ200 versus a B737-900ER, a higher yielding route is not necessarily more profitable than a lower yielding one that operates on a larger aircraft than the first. Again, this analysis excludes international traffic. My regression analysis based formula for UA’s median domestic yield could take some work as it only has an R-squared value of 0.436 (worryingly low for those professionally into data analysis).
But I think it’s something and I’m confident in the basic logic of each step, as well as most of the results as I’ve been checking them as I can against the work of a paid site which offers paid proxy info for route performance on a classification scale of (Weak-Great) and my numbers are so far broadly similiar. It’s a puzzle I’m working to continue to perfect. Applying a similar analysis to other U.S. carriers is next, as well as figuring out how to automate the last stage of the analysis, combining data from different routes, which is currently quite manual and takes a lot of effort.
For the moment, however, I’m set up to process UA routes quite easily and have 25 examples “graded” so far for their relative performance in Q3 2022. Reply with a domestic UA route and I’ll add it to this spreadsheet.
Again, this data is quite experimental and is also only as good as what the carriers provide to the USDOT.
Thanks in advance for any feedback!