The SQL Miner
Flexible business processes can often be modelled more easily using a declarative rather than a procedural modelling approach. Process mining aims at automating the discovery of business process models. Existing declarative process mining approaches either suffer from performance issues with real-life event logs or limit their expressiveness to a specific set of constaint types. Lately, RelationalXES, a relational database architecture for storing event log data, has been introduced. We introduce a mining approach that directly works on relational event data by querying the log with conventional SQL. By leveraging database performance technology, the mining procedure is fast without limiting itself to detecting certain control-flow constraints. Queries can be customised and cover process perspectives beyond control-flow, e.g., organisational aspects. We evaluated the performance and the capabilities of our approach with regard to several real-life event logs.
SQL Miner Functionality
Process mining is the area of research that embraces the automated discovery, conformance checking and enhancement of business process models. All involved techniques are evidence-based, as the input is always event logs that comprise a collection of computer recorded information that track the executions of process instances. Declarative languages like Declare, Dynamic Condition Response (DCR) graphs or Declarative Process Intermediate Language (DPIL) can be used to represent these models, and tools like DeclareMiner, MINERful or DPILMiner offer capabilities to automatically discover such models from event logs. Existing declarative process mining approaches either suffer from performance issues with real-life event logs or limit their search space to a specific and fixed set of constaints to be able to cope with the size of real-life event logs. Both issues are highlighted in current literature. To the best of our knowledge an approach that is fast and customisable does not exist.
We fill this research gap by introducing a declarative mining approach that works on event data that is stored in relational databases by querying the log with conventional SQL. Process mining by means of SQL queries turns out to be an integrated and language overspanning solution to process discovery. By leveraging relational database performance technology, e.g., indexes on data columns it is fast without limiting itself to certain predefined constraints. Queries can be tailored to arbitrary aspects of a process, e.g., control-flow as well as organisational issues. Furthermore, the results can be transformed to each of the mentioned process modelling languages.
The Structured Query Language (SQL) is a declarative language designed for managing data held in a relational database. It is based upon relational algebra. We show the approach with an example, namely, the Response(A,B) constraint template, which aims at discovering all the activity combinations (a,b) where b is forced to be executed if a was completed at some point before. The query for discovering Response constraints is organised as follows:
SELECT `response', A, B, [Support], [Confidence] FROM Log l1, [ActivityCombinations] c WHERE [Constraint Expressions Subqueries] GROUP BY c.A, c.B HAVING [Support] > minSupp AND [Confidence] > minConf
Other constraint templates of the Declare language, like AlternateResponse, Precedence or NotSuccession can be mapped in an analogous manner. Subsequently, the result set can be post-processed with existing pruning methods for declarative process mining, like pruning based on constraint hierarchies. Furthermore, resource assignment contraints can be discovered by means of SQL queries. Resource assignment in business processes is extensively discussed by the workflow resource patterns. These patterns capture the various ways in which resources are represented and utilised in processes. As an example, we explain the SQL query to extract role-based resource assignment constraints, i.e., that a certain activity can only be executed by resources assigned to a certain role. This constraint type is captured by the RoleBasedAllocation(A,R) template in the DPIL language and consists of parameters for activities A and roles R. Here, we assume organisational information to be available in a relational table Relations. Without loss of generality Relations has the attributes Resource, RelationType and Group. The FROM, WHERE and GROUP BY clauses of the query are then given as follows:
FROM Log l1, Relations r1, [ActivityCombinations] c WHERE l.Activity = c.A AND r1.RelationType = 'role' AND l.Resource = r1.Resource AND NOT EXISTS(SELECT * FROM Log l2, Relation r2 WHERE l2.Resource = r2.Resource AND r2.RelationType = 'role' AND l2.Activity = l1.Activity AND NOT r2.Group = r1.Group) GROUP BY c.A, r1.Group
Material for CAiSE Paper:>>> Download Business Trip example event log
|Declare Control-Flow Constraint||Download SQL Query|
|Resource Assignment Constraint||Download SQL Query|
|Binding of Duties||BindingOfDuties.sql|
|Separation of Duties||SeparationOfDuties.sql|
|Cross-Perspective Constraint||Download SQL Query|