Recursive CTEs | SQLite SQL Tutorial Link Search Menu Expand Document

Recursive CTE structure vs. While/Repeat loop

By design, SQL delegates statement-level flow control to the database engine, so the standard SQL grammar does not include any flow control structures. The only exception is expression-level branching control. This control structure has operator and function representations, so it naturally integrates within an expression (similar to other functions/operators). Adding statement-level flow controls, such as branching or loops, necessitates grammar extension or special conventions. Recursive CTEs (RCTEs) follow the latter approach and implement the while/repeat loop structure using the standard SELECT grammar/syntax (except for the self-reference) and a special convention. The result is quite contrived, so comprehending and mastering RCTEs may not be straightforward.

A general loop structure has two essential code blocks - initialization and loop body. Conventionally, the RCTE analogous blocks must fit within a single WITH clause member block. Because the latter only supports SELECT-type members, an RCTE must also represent a valid SELECT statement. At the same time, RCTEs, like SQL in general, focus on row sets/views manipulation. The idea is that the initialization query generates a starting view used as one of the sources by the first loop cycle. Each execution of the loop body yields a new view used as one of the sources by the following loop cycle. Accordingly, both code blocks must represent complete SELECT statements. And because the two SELECT blocks must form a valid SELECT query, they should be glued together via a set operator, such as UNION.

By convention, the resulting view of an RCTE query is the compound consisting of the row set produced by the initialization query combined with all row sets returned by individual loop cycles using the specified set operator (the operator separating the initialization and the body blocks). This convention permits, for example, retrieving a subtree from a hierarchy modeled with the parent reference pattern.

Yet another convention is related to the meaning of the RCTE name. Outside the RCTE, its name refers to the resulting view above. This meaning is the same for both ordinary and recursive CTEs. An ordinary CTE cannot use its name as one of its sources. An RCTE, on the other hand, must use its name. Within an RCTE, its name refers to the processing frame/view, which, to a first approximation, contains the row set returned by the previous cycle or the initialization query for the first cycle. In other words, the inner and outer references to the RCTE name mean two completely different things.

Another essential feature of a loop structure is loop body start/end markers. By convention, the RCTE loop body starts with the first SELECT query portion that uses the RCTE name as one of its sources. The loop body must incorporate all code through the end of the RCTE code block, with all preceding code forming the initialization query. The body part may be a compound SELECT query combined using the same set operator as the one that combines the body and the initialization code.

There are two possible termination conditions. If the loop body includes the LIMIT clause, the loop terminates after the specified number of rows is processed. Alternatively, the loop terminates after processing all previously returned rows, with the loop body returning empty row sets.

Processing frame/queue

Both Initial SELECT and Loop Body SELECT generate row sets, which are appended to the processing queue, as schematically illustrated in the table below.

Processing Stage   Processing Queue
Initial SELECT ==> row 1
row 2
row 3
Loop Cycle #1 ==> row 1.1
row 1.2
row 1.3
row 2.1
row 3.1
Loop Cycle #2 ==> row 1.1.1
row 1.1.2
row 1.3.1
row 1.3.2
row 3.1.1
row 3.1.2
row 3.1.3

This table represents a simplified processing protocol. It assumes that each loop cycle retrieves a row set generated by the previous step from the processing queue, processes it, and produces a new row set. Each processed row is, in turn, moved from the processing queue to the result view. Note how rows are labeled in the processing queue above. Because individual rows from source row sets or tables are processed independently, the processing order is irrelevant, and we can say, e.g., that row 1 from the Initial SELECT (parent row) yields row 1.### after processing by the Loop Cycle #1 (child rows). Therefore, the output of an RCTE is essentially a row set resulting from traversing a tree/forest structure.

Also, note that because a processed row is immediately moved from the queue to the result view, the processing queue cannot contain rows from different levels of the same ancestral line. Even though the table above shows the processing queue filled with three ancestral lines starting from row # at the top, row 1 is removed from the processing queue before row 1.# are inserted, as illustrated in the table below. In other words, all rows in the processing queue are completely independent whether they belong to the same row set or not. For this reason, rows in the processing queue may be processed in any order. In fact, according to the documentation, each loop cycle extracts a single (top) row from the processing queue, and the order of rows in the processing queue may be controlled by adding the ORDER BY clause to the loop body. We can say that the processing frame always contains the processing queue top row (or points to it), and this processing frame is the actual meaning of the RCTE name used with the loop body.

The table below shows a proper representation of the processing queue. This table contains actual snapshots of the processing queue (default ordering) after several loops. The top row in each column forms the processing frame for the next loop cycle. Note, for example, Loop Cycle #1 above consists of three cycles, one for each row generated by the Initial SELECT.

Initial SELECT Cycle #1 Cycle #2 Cycle #3
row 1 row 2 row 3 row 1.1
row 2 row 3 row 1.1 row 1.2
row 3 row 1.1 row 1.2 row 1.3
  row 1.2 row 1.3 row 2.1
  row 1.3 row 2.1 row 3.1