14 Mar 2014

chai is a Python command line tool I wrote to ease some of the pain of booking railway tickets online. You can get it from Github. python --help should give you an overview.

It has two commands right now: avail and optimize. avail just prints out the availibility between two stations for a given train and date. optimize is interesting because it actually increases your chances of being able to travel. It does this by breaking up the journey into segments, for which tickets are available individually. So by booking multiple tickets you can make journeys which would otherwise have been difficult or impossible.

There are two observations that are important here. To help explain them, I will use a train route A -> B -> C -> … -> Z, where A, B, …, Z are all stations.

The first key observation is that if A -> B is available and B -> C is available, A -> C may not be available. This can happen if there is no seat that is empty all the way from A -> C, but there is a seat empty from A -> B and a different seat empty from B -> C. So if we book a ticket from A -> B, and a different one from B -> C, and change our seat at B, we can travel from A -> C.

The second observation is that if a ticket is available from A -> C, there may not be any available from A -> B and B -> C. This can happen for two reasons. There may be some seats reserved for journeys beginning from A, or ending at B. Or we may simply not be allowed to book tickets for journeys as short as A -> B. We can utilize this by choosing the ticket destination further down the route than our actual destination, and simply getting off, or by choosing the ticket source earlier up the route than our actual source, and getting on the train at the actual source.

Now, suppose we want to travel from C to F, but no tickets are available. chai can find a way. If C -> D and D -> F are available, we can book our journey in two segments. Or the train might reserve seats for G, so we might find C -> G available, and get off at F. Or even combinations of the two.

Implementing this required some thought. We can get all possible availabilities between any two stations in \( n \choose 2 \) requests. For 20 stations that’s 190 requests.

$ time python avail -t 22812 -s NDLS -d KGP -D 13 -m 4
[. . .]
python avail -t 22812 -s NDLS -d KGP -D 13 -m 4  0.15s user 0.04s system 38% cpu 0.501 total

That’s 0.5 seconds per request in the middle of the night, when there’s no load. When it’s bad, it can be tens of seconds per request. Even in the best case, that’s 95 seconds to get all availabilities. Then, once we have them, how do we figure out the best route? A simple brute force approach that considered all possible segmentations of the entire route would be \(O(2^n)\), quite impractical for n = 20.

To solve the first problem, instead of sending requests serially, I used the grequests library to make connections concurrently, hoping for a speedup.

$ time python optimize -t 22812 -s NDLS -d KGP -D 13 -m 4
[. . .]
python optimize -t 22812 -s NDLS -d KGP -D 13 -m 4  3.88s user 0.11s system 89% cpu 4.472 total

Only 4.47 seconds instead of 95 seconds - so the concurrency worked. Sending more than 100 requests at a time starts throwing connection errors sometimes, I’m not sure why.

To solve the second problem, I represent the entire route as an directed acyclic graph. For every station between the source and upto the last station, except for the destination station, there is a weighted edge between it and every station after it. Then, add all the stations before the source to the graph, and connect each with a weighted edge to every station after the source station. Add an edge from the source station to every station before it. Finally, connect every station after the destination with a weighted edge to the destination.

To illustrate, suppose there is a route A -> B -> C -> D -> E -> F, and we want to travel from C -> D. Apply the above method, and the graph ends up looking like this:


Once the graph is made, we want to find the shortest path between the source and the destination, which can be done in \(O(E) = O(n^2)\), perfectly practical. Now we have to decide how to assign weights to edges.

I use a 3-tuple to denote weight. It is a representation of a 3 digit number in some large base, with decimal coded digits. This allows me to compare two weights by comparing the first element, then the second, and then the third. This weight is a representation of the cost of travelling between two vertices. So if no ticket is available between two vertices, I could set the cost as \((p, 0, 0)\) and if only RAC is available, I could set the cost as \((0, q, 0)\), where \(p\) and \(q\) are positive integers, and be sure that the first cost would always be greater than the second.

We create edges for stations before the source and after the destination so that we can include the possibility of booking longer than required journeys, to ensure we get a ticket.

The way in which the cost tuple is calculated and the relative weight given to different scenarios affects the optimum path. For example, in the graph above, if the edge from E to D has a cost of (0, 0, 10) and C -> D is RAC 10 and we assign that a cost of (0, 1, 0), we would always prefer booking to E and getting off at D than travelling RAC to D.

This sounds fine in theory, but how does it work in practice? I will deal with the full answer to that question in a future post, but based on my own usage, it does work well and is often able to find ways to make a journey possible. The example given below is taken from a real trial, and shows it in action.

$ python avail -t 22812 -s NDLS -d KGP -D 13 -m 4

New Delhi to Kharagpur on the BBS Rajdhani (22812) on 13 April - shows an availability of waitlist 21. Uh oh.

$ python optimize -t 22812 -s NDLS -d KGP -D 13 -m 4
Fetching stations on route... done.
Using up to 100 concurrent connections.
Fetching availibility... 100%
Optimum plan is:
NDLS  -->  BHC ( 9  stations ) : RAC2/RAC 2
BHC  -->  KGP ( -2  stations ) : Get off at KGP

chai found that booking from New Delhi to Bhadrack, which is two stations after Kharagpur, shows an availability of RAC 2. So all you have to do, as the output suggests, is book this slightly longer journey, and get off at Kharagpur. Cool.

comments powered by Disqus