Dataflow

How marimo notebooks run

Reactive execution is based on a single rule: when a cell is run, all other cells that reference any of the global variables it defines run automatically.

To provide reactive execution, marimo creates a dataflow graph out of your cells.

## References and definitions

A marimo notebook is a directed acyclic graph in which nodes represent cells and edges represent data dependencies. marimo creates this graph by analyzing each cell (without running it) to determine its

  • references (“refs”), the global variables it reads but doesn’t define;
  • definitions (“defs”), the global variables it defines.

There is an edge from one cell to another if the latter cell references any global variables defined by the former cell.
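The static analysis described above can be approximated with Python's standard `symtable` module, which inspects code without running it. The following is a minimal sketch, not marimo's implementation: `refs_and_defs` is a hypothetical helper, the example cell is a simplified stand-in, and the sketch ignores details such as underscore-prefixed names and `global` declarations inside functions.

```python
import builtins
import symtable


def refs_and_defs(code: str) -> tuple[set[str], set[str]]:
    """Approximate a cell's refs and defs without executing the code."""
    top = symtable.symtable(code, "<cell>", "exec")

    # Defs: names bound at the top level of the cell
    # (assignments, imports, function and class definitions).
    defs = {
        sym.get_name()
        for sym in top.get_symbols()
        if sym.is_assigned() or sym.is_imported()
    }

    # Candidate refs: names read at the top level, plus names that nested
    # scopes (function bodies, for example) look up as globals.
    used = {sym.get_name() for sym in top.get_symbols() if sym.is_referenced()}
    pending = list(top.get_children())
    while pending:
        table = pending.pop()
        used |= {
            sym.get_name()
            for sym in table.get_symbols()
            if sym.is_global() and sym.is_referenced()
        }
        pending.extend(table.get_children())

    # Refs are read but not defined here; builtins such as print are excluded.
    refs = used - defs - set(dir(builtins))
    return refs, defs


# A hypothetical, simplified cell body; it is parsed but never executed.
cell = """
def plot_wave(amplitude, period):
    x = np.linspace(0, 2 * np.pi, 256)
    plt.plot(x, amplitude * np.sin(2 * np.pi / period * x))
"""
refs, defs = refs_and_defs(cell)
print(sorted(refs), sorted(defs))
```

Parameters and local variables of `plot_wave` are not reported, because they are not global variables of the cell.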

The rule for reactive execution can be restated in terms of the graph: when a cell is run, its descendants are run automatically.

### Example

The next four cells plot a sine wave with a given period and amplitude. Each cell is labeled with its refs and defs.

Use mo.refs() and mo.defs() to inspect the refs and defs of any given cell. This can help with debugging complex notebooks. For example, here are the refs and defs of this cell:
  • refs: ('amplitude', 'mo', 'period', 'plot_wave')
  • defs: ()
  • refs: ('mo',)
  • defs: ('period',)
  • refs: ('mo',)
  • defs: ('amplitude',)
  • refs: ('matplotlib_installed', 'mo', 'np', 'numpy_installed', 'plt')
  • defs: ('plot_wave',)

🌊 Try it! In the above cells, try changing the value of period or amplitude, then click the run button ( ▷ ) to register your changes. See what happens to the sine wave.

Here is the dataflow graph for the cells that make the sine wave plot, plus the cells that import libraries. Each cell is labeled with its defs.

                   +------+               +-----------+
       +-----------| {mo} |-----------+   | {np, plt} |
       |           +---+--+           |   +----+------+
       |               |              |        |
       |               |              |        |
       v               v              v        v
  +----------+   +-------------+   +--+----------+
  | {period} |   | {amplitude} |   | {plot_wave} |
  +---+------+   +-----+-------+   +------+------+
      |                |                  |
      |                v                  |
      |              +----+               |
      +------------> | {} | <-------------+
                     +----+
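To make the graph concrete, here is a rough sketch that rebuilds it from refs and defs and computes which cells rerun when one cell runs. The cell labels and the helper names `build_parents` and `descendants` are hypothetical, and the refs are trimmed to the names that appear in the diagram; this illustrates the rule and is not marimo's code.

```python
from graphlib import TopologicalSorter

# Refs and defs per cell; the keys are hypothetical labels for the cells.
cells = {
    "imports_mo": {"refs": set(), "defs": {"mo"}},
    "imports_np_plt": {"refs": set(), "defs": {"np", "plt"}},
    "period": {"refs": {"mo"}, "defs": {"period"}},
    "amplitude": {"refs": {"mo"}, "defs": {"amplitude"}},
    "plot_wave": {"refs": {"mo", "np", "plt"}, "defs": {"plot_wave"}},
    "plot": {"refs": {"amplitude", "mo", "period", "plot_wave"}, "defs": set()},
}


def build_parents(cells):
    """Map each cell to the set of cells that define one of its refs."""
    owner = {name: cell for cell, info in cells.items() for name in info["defs"]}
    return {
        cell: {owner[name] for name in info["refs"] if name in owner}
        for cell, info in cells.items()
    }


def descendants(parents, start):
    """Return every cell reachable from start by following edges forward."""
    children = {cell: set() for cell in parents}
    for cell, cell_parents in parents.items():
        for parent in cell_parents:
            children[parent].add(cell)
    seen, stack = set(), [start]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


parents = build_parents(cells)
to_rerun = descendants(parents, "period")
# Rerun the affected cells in an order that respects the dependencies.
run_order = [c for c in TopologicalSorter(parents).static_order() if c in to_rerun]
print(run_order)
```

Running the `period` cell only affects the plotting cell, while running the cell that imports `mo` would affect every cell that reads `mo`, directly or through another cell.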

The last cell, which doesn’t define anything, produces the plot.

## Dataflow programming

marimo’s runtime rule has some important consequences that may seem surprising if you are not used to dataflow programming. We list these below.

### Execution order is not cell order

The order in which cells are executed is determined entirely by the dataflow graph. This makes marimo notebooks more reproducible than traditional notebooks. It also lets you place boilerplate, like imports or long markdown strings, at the bottom of the editor.

### Global variable names must be unique

Every global variable can be defined by only one cell. Without this constraint, there would be no way for marimo to know which order to execute cells in.

If you violate this constraint, marimo provides a helpful error message, like below:

({'name': 'planet', 'cells': ('Xref',), 'type': 'multiple-defs'},)
({'name': 'planet', 'cells': ('PKri',), 'type': 'multiple-defs'},)

🌊 Try it! In the previous cell, change the name planet to home, then run the cell.

Because defs must be unique, global variables cannot be modified with operators like += or -= in cells other than the one that created them; these operators count as redefinitions of a name.

🌊 Try it! Get rid of the following errors by merging the next two cells into a single cell.

({'name': 'count', 'cells': ('BYtC',), 'type': 'multiple-defs'},)
({'name': 'count', 'cells': ('SFPL',), 'type': 'multiple-defs'},)
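The uniqueness check can be sketched with the standard `symtable` module. This is an illustration under assumptions, not marimo's implementation: the helper names, the cell ids `a` and `b`, and the shape of the returned dictionary are all hypothetical. Note that the augmented assignment `count += 1` binds the name `count`, which is why it counts as a second definition.

```python
import symtable
from collections import defaultdict


def top_level_defs(code: str) -> set[str]:
    """Names bound at the top level of a cell, found without running it."""
    table = symtable.symtable(code, "<cell>", "exec")
    return {
        sym.get_name()
        for sym in table.get_symbols()
        if sym.is_assigned() or sym.is_imported()
    }


def multiple_defs(cells: dict[str, str]) -> dict[str, list[str]]:
    """Map each name defined by more than one cell to the cells defining it."""
    owners = defaultdict(list)
    for cell_id, code in cells.items():
        for name in top_level_defs(code):
            owners[name].append(cell_id)
    return {name: ids for name, ids in owners.items() if len(ids) > 1}


# Two hypothetical cells: the second one modifies `count` with +=.
cells = {"a": "count = 0", "b": "count += 1"}
conflicts = multiple_defs(cells)
print(conflicts)

# Merging both statements into one cell removes the conflict.
merged = {"a": "count = 0\ncount += 1"}
print(multiple_defs(merged))
```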

### Underscore-prefixed variables are local to cells

Global variables prefixed with an underscore are “private” to the cells that define them. This means that multiple cells can define the same underscore-prefixed name, and one cell’s private variables won’t be made available to other cells.

Example.

[{'msg': "This cell raised an exception: NameError('name '_private_variable' is not defined')", 'exception_type': 'NameError', 'raising_cell': None, 'type': 'exception'}]
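The visible behavior can be modeled in plain Python. The sketch below is an assumption-laden toy, not how marimo implements private variables: the hypothetical `run_cell` helper runs each cell against a shared dictionary of globals and then drops any single-underscore names, so a later cell cannot see them.

```python
def run_cell(code: str, shared: dict) -> None:
    """Run cell code against shared globals, then drop its private names."""
    exec(code, shared)
    for name in list(shared):
        # Keep dunder entries such as __builtins__, which exec adds itself.
        if name.startswith("_") and not name.startswith("__"):
            del shared[name]


shared_globals: dict = {}

# First cell: uses a private variable to compute a public one.
run_cell(
    "_private_variable = 1\npublic_variable = _private_variable + 1",
    shared_globals,
)

# Second cell: the private name from the first cell is not available.
try:
    run_cell("print(_private_variable)", shared_globals)
    error = None
except NameError as exc:
    error = exc

print(shared_globals["public_variable"], type(error).__name__)
```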

### Deleting a cell deletes its variables

Deleting a cell deletes its global variables and then runs all cells that reference them. This prevents severe bugs that can arise when state has been deleted from the editor but not from the program memory.

🌊 Try it! Delete this cell by clicking the trash bin icon.
variable still exists

### Cycles are not allowed

Cycles among cells are not allowed. For example:

({'edges_with_vars': (('iLit', ['one'], 'ZHCJ'), ('ZHCJ', ['two'], 'iLit')), 'type': 'cycle'},)
({'edges_with_vars': (('iLit', ['one'], 'ZHCJ'), ('ZHCJ', ['two'], 'iLit')), 'type': 'cycle'},)
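A cycle like this one can be detected with the standard library's `graphlib` module. The sketch assumes that each edge in the error above reads as (defining cell, variables, referencing cell), so that cell iLit defines `one` and reads `two`, while cell ZHCJ defines `two` and reads `one`. It shows the general technique only; marimo's own check may work differently.

```python
from graphlib import CycleError, TopologicalSorter

# Each cell maps to the cells it depends on (its parents in the graph).
parents = {
    "iLit": {"ZHCJ"},  # iLit reads `two`, which ZHCJ defines
    "ZHCJ": {"iLit"},  # ZHCJ reads `one`, which iLit defines
}

try:
    order = list(TopologicalSorter(parents).static_order())
    cycle = None
except CycleError as err:
    # No valid execution order exists; args[1] lists the cells in the cycle.
    order = None
    cycle = err.args[1]

print(order, cycle)
```

Because neither cell can run before the other, there is no valid execution order, which is why such notebooks are rejected.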

### marimo doesn’t track attributes

marimo only tracks global variables. Writing object attributes does not trigger reactive execution.

🌊 Example. Change the value of state.number in the next cell, then run the cell. Notice how the subsequent cell isn’t updated.

0
marimo can't reliably trace attributes to cells that define them. For example, attributes are routinely created or modified by library code.

### marimo doesn’t track mutations

In Python, it’s impossible to know whether code will mutate an object without running it. So: mutations (such as appending to a list) will not trigger reactive execution.
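One way to see why attribute writes and mutations are invisible to a name-based analysis: statically, neither binds a global name. The sketch below uses the standard `symtable` module with a hypothetical helper `top_level_defs`; it illustrates the idea and is not marimo's analysis.

```python
import symtable


def top_level_defs(code: str) -> set[str]:
    """Names bound at the top level of a cell, found without running it."""
    table = symtable.symtable(code, "<cell>", "exec")
    return {
        sym.get_name()
        for sym in table.get_symbols()
        if sym.is_assigned() or sym.is_imported()
    }


# An assignment binds the name `numbers`, so the cell defines it.
assignment_defs = top_level_defs("numbers = [1, 2, 3]")

# Writing an attribute only reads the name `state`; no name is bound.
attribute_defs = top_level_defs("state.number = 1")

# Calling a mutating method only reads the name `numbers`.
mutation_defs = top_level_defs("numbers.append(4)")

print(assignment_defs, attribute_defs, mutation_defs)
```

Since the last two cells define nothing, no other cell depends on them through the graph, and running them triggers no downstream cells.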

You can use the fact that marimo does not track attributes or mutations to implement mutable state in marimo. An example of this is shown in the ui tutorial.

## Best practices

The constraints marimo puts on your notebooks are all natural consequences of the fact that marimo programs are directed acyclic graphs. As long as you keep this fact in mind, you’ll quickly adapt to the marimo way of writing notebooks.

Ultimately, these constraints will enable you to create powerful notebooks and apps, and they’ll encourage you to write clean, reproducible code.

Follow these tips to stay on the marimo way:

Keep the number of global variables in your program small to avoid name collisions across cells.
Keep the number of global variables defined by any one cell small to make sure that the units of reactive execution are small.
Use descriptive variable names, especially for global variables. This will help you minimize name clashes, and will also result in better code.
Encapsulate logic into functions to avoid polluting the global namespace with temporary or intermediate variables.
We saw earlier that marimo cannot track object mutations. So try to only mutate an object in the cell that creates it, or create new objects instead of mutating existing ones. For example, don't do this:
# a cell
numbers = [1, 2, 3]
# another cell
numbers.append(4)
Instead, prefer
# a cell
numbers = [1, 2, 3]
numbers.append(4)
or
# a cell
numbers = [1, 2, 3]
# another cell
more_numbers = numbers + [4]
Write cells whose outputs and behavior are the same when given the same inputs (refs); such cells are called idempotent. This will help you avoid bugs, and let you cache expensive intermediate computations (see the next tip).
Use Python's builtin functools library to cache expensive intermediate computations. You can do this if you abstract complex logic into idempotent functions, following earlier tips. For example:
import functools
@functools.cache
def compute_predictions(problem_parameters):
  ...
Whenever compute_predictions is called with a value of problem_parameters it has not seen, it will compute the predictions and store them in a cache. The next time it is called with the same parameters, instead of recomputing the predictions, it will just fetch the previously computed ones from the cache.
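A runnable version of the caching tip might look like the following. The body of `compute_predictions` is a placeholder computation chosen for illustration. Arguments to a function wrapped in `functools.cache` must be hashable, so the parameters are passed as a tuple rather than a list.

```python
import functools


@functools.cache
def compute_predictions(problem_parameters):
    # Placeholder standing in for an expensive computation.
    return sum(p * p for p in problem_parameters)


first = compute_predictions((1, 2, 3))   # computed, then stored in the cache
second = compute_predictions((1, 2, 3))  # same parameters: fetched from the cache

# cache_info() reports how many calls were served from the cache.
info = compute_predictions.cache_info()
print(first, second, info.hits, info.misses)
```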

## What’s next?

Check out the tutorial on interactivity for a tour of UI elements:

marimo tutorial ui