Process Mining: Desire Lines or Cow Paths?



The term desire line originates from urban planning, where it has been used for decades. A desire line shows where people naturally walk. The width and degree of erosion of such an informal path indicate how frequently the path is used. Often the desire line is very different from the formal pathway. Therefore, some planners simply let erosion tell where the paths need to be. For example, the paths across Central Park in New York were reconstructed using this approach. Before becoming the 34th president of the United States, Dwight Eisenhower was the president of Columbia University. He used the same approach to optimize the network of walkways on Columbia’s campus. The places where the grass was most worn by people's footsteps were turned into sidewalks.

Process mining aims to exploit desire lines in event logs. The motivation for doing this is twofold. On the one hand, event data are already omnipresent and the total volume will continue to grow exponentially. On the other hand, many organizations have problems managing their processes, partly because stakeholders lack detailed insights into how processes really run. In a recent study, the McKinsey Global Institute (MGI) estimates that enterprises globally stored more than 7 exabytes of new data on disk drives in 2010, while consumers stored more than 6 exabytes of new data on devices such as PCs and notebooks. Process managers can no longer shy away from this torrent of event data showing how people and organizations really work.


Good information systems do not show signs of erosion like grassy areas. However, they often contain a wealth of event data indicating the paths followed by the users of the system. Therefore, it is possible to determine desire lines in organizations, systems, and products. Besides visualizing such desire lines, we can also investigate how these desire lines change over time, characterize the people following a particular desire line, etc. There may also be desire lines that are "undesirable" (unsafe, inefficient, unfair, etc.). Uncovering such phenomena is a prerequisite for process and product improvement.
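To make this concrete, here is a minimal sketch of how desire lines could be read from an event log. The log, its (case, activity, timestamp) layout, and the function name are invented for illustration; counting how often one activity directly follows another is just one simple way to measure how "worn" a path is, not a full discovery algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, timestamp) rows, invented for illustration.
event_log = [
    ("c1", "register", 1), ("c1", "check", 2), ("c1", "pay", 3),
    ("c2", "register", 1), ("c2", "pay", 2),
    ("c3", "register", 1), ("c3", "check", 2), ("c3", "pay", 4),
]

def directly_follows_counts(log):
    """Count how often activity b directly follows activity a within the same case."""
    traces = defaultdict(list)
    # Order events per case by timestamp so each trace reflects the path actually taken.
    for case_id, activity, _timestamp in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts

desire_lines = directly_follows_counts(event_log)
for (source, target), frequency in desire_lines.most_common():
    print(f"{source} -> {target}: {frequency}")
```

The higher the count, the more worn the path. In this toy log, one case skips the check step, which is exactly the kind of informal shortcut a desire line reveals.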


The potential value of desire lines in big data is enormous. The identification of such information can be used to redesign procedures and systems (reconstructing the formal pathways), to recommend that people take the right path (adding signposts where needed), or to build in safeguards (building fences to avoid dangerous situations).



Some people argue that desire lines are like "cow paths" and one should avoid "paving the cow paths". This is the mantra of Business Process Reengineering. Processes need to be reengineered to make them faster, more efficient, more reliable, cheaper, etc.  This requires people to "think out of the box" to allow for dramatic changes. I fully agree that the goal of process mining should not be to maintain a status quo, but to improve processes wherever possible. However, if one does not know where the cow paths are, then it is very likely that one will not succeed in improving the existing situation. Moreover, process mining can also be used for operational support, i.e., combining historic information with information about running cases in order to make predictions and provide recommendations.

To clarify this view, I often use metaphors related to cartography and navigation. Process models can be viewed as electronic maps, running process instances can be viewed as traffic trying to get from A to B, and information systems can be viewed as navigation systems.


Process mining will provide an organization with maps showing how processes are really executed. How detailed the map should be and what aspects it should cover depend on the purpose. Some people may argue that there is no need to produce a map if you already know the real processes. This is true, but (a) few organizations have detailed knowledge of their actual processes (often only misleadingly simple PowerPoint and Visio diagrams are available) and (b) the coupling of event logs to discovered process models allows for the projection of additional information based on facts (e.g., bottlenecks). The coupling of event data to maps is very powerful. It is comparable to the widespread use of mashups based on Google Maps. It is possible to project event data on process models just like one can project restaurants, traffic jams, vacant apartments, Wikipedia entries, etc. onto geographic maps.
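As a rough illustration of such a projection, the sketch below annotates each step of the "map" with the average time that passes between two consecutive activities, so the slowest connection stands out as a candidate bottleneck. The log, the hour-based timestamps, and all names are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log: (case id, activity, hours since the case started), invented for illustration.
timed_log = [
    ("c1", "register", 0), ("c1", "check", 2), ("c1", "pay", 10),
    ("c2", "register", 0), ("c2", "check", 4), ("c2", "pay", 10),
]

def mean_waiting_times(log):
    """Average elapsed time between each pair of directly consecutive activities."""
    traces = defaultdict(list)
    for case_id, activity, timestamp in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case_id].append((activity, timestamp))
    waits = defaultdict(list)
    for events in traces.values():
        for (source, t_source), (target, t_target) in zip(events, events[1:]):
            waits[(source, target)].append(t_target - t_source)
    return {edge: mean(durations) for edge, durations in waits.items()}

edge_delays = mean_waiting_times(timed_log)
# The edge with the largest average delay is a candidate bottleneck.
bottleneck = max(edge_delays, key=edge_delays.get)
print(bottleneck, edge_delays[bottleneck])
```

The same pattern works for any other attribute recorded in the log (costs, resources, rework counts): compute it per edge or per activity and draw it on the model.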


Other people may argue that there is no point in making maps as processes change all the time (we call this concept drift). In my view, frequent changes make process mining even more important because people will easily get lost without up-to-date maps. Process mining enables organizations to closely monitor changing processes. This is much better than producing piles of Visio diagrams that are neither looked at nor maintained.


The link to navigation is obvious. A navigation device provides directions and guidance rather than enforcing a particular route. Moreover, it makes predictions, such as estimating the arrival time. This is the functionality any BPM system should provide. Unfortunately, today’s BPM systems force people to work in a particular way and do not provide operational support (estimating the remaining flow time of a case or suggesting an alternative route to avoid congestion). Nowadays many people talk about adaptive case management (although the ideas have been around for more than a decade). It is obvious that navigation-like capabilities are essential for case management. However, these capabilities require the learning abilities provided by process mining technology.
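To give an impression of what "estimating the arrival time" could look like for a running case, here is a deliberately naive sketch: it learns from completed cases how much time typically remained after each activity, and uses that average as the prediction. The historic log, the hour-based timestamps, and the function names are hypothetical; real operational support would use richer models.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical completed cases: (case id, activity, hours since the case started).
history = [
    ("h1", "register", 0), ("h1", "check", 2), ("h1", "pay", 10),
    ("h2", "register", 0), ("h2", "check", 4), ("h2", "pay", 12),
]

def learn_remaining_times(log):
    """For each activity, average the time from that activity until the case ended."""
    traces = defaultdict(list)
    for case_id, activity, timestamp in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case_id].append((activity, timestamp))
    remaining = defaultdict(list)
    for events in traces.values():
        case_end = events[-1][1]
        for activity, timestamp in events:
            remaining[activity].append(case_end - timestamp)
    return {activity: mean(values) for activity, values in remaining.items()}

def predict_remaining(model, last_activity):
    """Predicted remaining hours for a running case, or None for an unseen activity."""
    return model.get(last_activity)

model = learn_remaining_times(history)
print(predict_remaining(model, "check"))
```

Like a navigation device, the prediction is guidance rather than enforcement: the running case is free to take another route, and the estimate simply gets recomputed from its latest position.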


Hence, process mining is not about "paving the cow paths", but about the clever use of desire lines. Moreover, never trust a business process reengineer, manager, or consultant who does not know where the cow paths are.


Wil van der Aalst


P.S. For more information, read the Process Mining Book or the Process Mining Manifesto.