Fork It: Supporting Stateful Alternatives in Computational Notebooks
An opinionated summary
This paper introduces two new features in Jupyter Notebook: forking and backtracking. These features enable multiple states in each notebook (instead of just one) and facilitate exploring and comparing alternatives (such as different Machine Learning algorithms). Forking is for creating new (independent) states in a side-by-side layout. Backtracking allows checking and restoring previous states.
- Authors: Nathaniel Weinman, Titus Barik, Steven M. Drucker, Robert DeLine
- Year: 2021
- Paper
- Presentation
Takeaways
- In this context, "state", "interpreter session", "execution state", and "Python kernel" are used interchangeably. In practice, a state is a set of variables and their values, and it is associated with a cell.
- A fork consists of two or more paths, each with an independent state/Python kernel. By default, two paths are visible in the viewport, and we can scroll horizontally to see the rest.
- We can only have one fork at a time. Also, there are no individual cells below the fork.
- Each path in a fork is like a mini-notebook we can use in isolation.
- When we fork, the original kernel ("above-fork" kernel) is also maintained, in addition to the new ones for each path.
- We can fork from the current state or from a previous state. In other words, we can create a fork from the most recent cell or from an earlier one. In the second case, one of the paths will contain the cells that followed the chosen cell. Thus, backtracking is combined with forking.
- The authors provide an "Example Usage Scenario" (like a small case study) that helps to better understand these features before moving on to the details. They also added a section for "Threats to Validity".
- Five new toolbar buttons support these features. For example, we can use them to add or delete paths.
- `dill` as an alternative to the `pickle` module for serialization in Python.
- For the usability study, the authors adapted the Creativity Support Index (CSI) in the final, Likert-based questionnaire.
- One of the participants mentioned that forking allows a smooth transition from main code to experimental code.
- Forking helps document decisions and compare alternatives. So, forking seems more interesting for exploration, not communication.
- Only two participants used backtracking, and only to a limited extent. Based on the feedback, a simpler way to return to previous states, such as an "undo cell execution" feature, may be sufficient.
- For some participants, backtracking competes with the habit of rerunning cells to recreate states manually.
- The cell execution counter (to the left of each code cell) is adapted to consider paths/Python kernels.
- When a fork is active, it is still possible to run code in the individual cells above it, for example to import a new package and then use it inside the fork ("above-fork", out-of-order execution). It is unclear how this works.
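
To make the idea of a state as "a set of variables and their values" more concrete, here is a minimal sketch of snapshotting, forking, and backtracking. It is not the authors' implementation. `run_cell` and `restore` are hypothetical helpers, a plain dictionary stands in for a Python kernel, and the code falls back to `pickle` when `dill` is not installed (both expose `dumps` and `loads`).

```python
try:
    import dill as serializer  # third-party; serializes more object types than pickle
except ImportError:
    import pickle as serializer  # stdlib fallback with the same dumps/loads interface


def run_cell(source, namespace):
    """Run one cell in `namespace` and return a serialized snapshot of its state.

    Hypothetical helper: the dict `namespace` stands in for a Python kernel.
    """
    exec(source, namespace)
    # exec() injects __builtins__ into the namespace; keep only user variables.
    user_vars = {k: v for k, v in namespace.items() if not k.startswith("__")}
    return serializer.dumps(user_vars)


def restore(snapshot):
    """Rebuild a fresh, independent namespace from a snapshot (hypothetical helper)."""
    return serializer.loads(snapshot)


# The main notebook: each executed cell produces a snapshot tied to that cell.
main = {}
snap_1 = run_cell("data = [3, 1, 2]", main)
snap_2 = run_cell("data.sort()", main)

# Forking: two paths start from the same state and then evolve independently.
path_a = restore(snap_2)
path_b = restore(snap_2)
run_cell("result = sum(data)", path_a)
run_cell("result = max(data)", path_b)

# Backtracking: restore the state as it was after the first cell.
before_sort = restore(snap_1)

print(path_a["result"], path_b["result"], before_sort["data"])
```

Because each path is deserialized into its own dictionary, a change in one path does not leak into the other or into the main state, which mirrors the "mini-notebook" behavior described above.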