py-typedlogic tutorial
This tutorial walks you through some of the basic features of the typedlogic
library, and how to define logical formulae for Python data structures and programs.
The tutorial makes use of a simple database of Link
structures, which represent direct steps in a path through different US states. We will define a predicate Path
that represents the transitive closure of the Link predicate.
For example, if we have Link
s:
CA -> NV
NV -> AZ
AZ -> UT
Then have 3 paths of length 1 (corresponding to the links), as well as paths of length 2 from CA
to AZ
and NV
to UT
, and a path of length 3 from CA
to UT
.
Installation
To install:
pip install typedlogic
This tutorial makes use of the Snakelog and Z3 integrations, so you would need the Z3 extra:
pip install 'typedlogic[snakelog,z3]'
This tutorial is also a Jupyter notebook, which can be executed interactively.
Defining our core data structures
You can define your data model through a different mechanisms. Currently the recommended ways are
- Pydantic classes
- Python dataclasses
Support for other mechanisms (e.g SQLModels, Pandera) is planned.
For this tutorial, we will use a ready-made pydantic file containing the definition of our data model, which will consist of two classes:
Link
, representing a direct edge between two entitiesPath
, representing a path of one or more edges
(For the purposes of this tutorial, we will use a show
function to display the contents of our program,
but this step is not necessary for everyday use).
We will use this to show the contents of the paths/model.py file.
# convenience function for running in a notebook
from utils import show
show("paths/model.py")
We can use this just as we would any other Pydantic schema. Here we will import it, and create instances of the Link
class,
representing direct connections between US states. For our purposes here, we will consider these directed.
from paths.model import Link
data = [("CA", "NV"), ("NV", "AZ"), ("AZ", "UT"), ("CA", "OR"), ("OR", "WA"), ("WA", "ID"), ("ID", "MT")]
links = [Link(source=source, target=target) for source, target in data]
links[0:2]
Because we chose to use Pydantic, we get excellent runtime validation on object creation. Our schema so far has minimal constraints, but we have declared that the types of the two fields are strings
. This means that we will get a validation error if we try to create a Link
with non-string values.
from pydantic import ValidationError
try:
new_link = Link(1,2)
except ValidationError as e:
print("We got a validation error!")
print("** Don't worry, this is expected **")
print(e)
So far everything we have done in standard Pydantic! Now, onto so unique typedlogic features:
Adding axioms
Next we want to define logical axioms that determine what inferences to expect from our data model. In particular, we want to define axioms (in this case, simple rules) that
- Derives one-hop
Path
s from theLink
predicate. - Derives multi-hop
Path
s from the transitive closure of the one-hopPath
predicate.
We have a ready made program for doing this:
show("paths/axioms.py")
Note that typedlogic is based in Python, we can take advantage of python constructs like import
to modularize our theory.
This uses the @axiom
decorator to define the rules. These may look a little unusual at first - they are not intended to be executed directly,
instead they provide a declarative specification of the rules of our system.
If you are familiar with first-order logic, you can read each decorated function as a "forall" (universal quantifier) statement, where the arguments are the (typed) variables in the quantifier. If you are not, don't worry for now. You can just read each of the axioms as a simple rule.
Solvers
The rules are not executed directly - instead they are used as inputs for solvers. Different solvers will operate differently, but broadly they can be thought of as fulfilling two roles:
- Finding new facts: Given a set of rules and a set of facts, the solver can derive new facts that are logically entailed by the rules and the facts.
- Checking whether a set of facts is consistent with the rules, or whether the rules themselves are consistent.
Not all solvers are capable of fulfilling both roles, and solvers may vary in what expressions they understand and how fast they run. py-typedlogic provides as easy bridge onto a number of different solvers, through its integrations mechanism.
We will start with a simple pure python builtin solver called SnakeLog. This implements only a subset of logic (formally, it is a small subset of datalog), but it has the advantage of requiring no additional installation steps, and being fast for large datasets.
First we'll create the solver object, and load the axioms into it.
import paths.axioms as paths_axioms
from typedlogic.integrations.solvers.snakelog import SnakeLogSolver
solver = SnakeLogSolver()
solver.load(paths_axioms)
Next we'll add our data to the solver. In this case, we'll add the Link
objects we created earlier.
for link in links:
solver.add(link)
Now we will ask the solver for a model.
Note the term "model" is a bit ambiguous. If you are coming from a data modeling and Pydantic background, "model" means schema or data model. The use in formal systems and "model theory" is a bit different. For now you can think of it as representing a universe of things that might possibly be true.
We'll get onto what it would mean for there to be multiple models later, but our simple snakelog solver (like many datalog systems) will just yield a single model, so we'll get that.
model = solver.model()
We can ask the model to retrieve all ground terms (or facts) that can be proved and are consistent in that model.
Here we use iter_retrieve
to get all the Path
objects that can be derived from the axioms
for path in model.iter_retrieve("Path"):
print(path)
We can see it has given us all transitive / multi-hop paths that can be derived from the starting facts.
We can also derive the one-hop links, but this is not very interesting, corresponding to the initial database (because we provided no axioms to entail new links).
Let's check this:
original_links = {(link.source, link.target) for link in links}
entailed_links = {term.values for term in model.iter_retrieve("Link")}
assert original_links == entailed_links
We can encode more complex inference rules, using predicates with more arguments, encoding the complexity of our domain. We will return to some examples later.
For now, let's switch to a different use case, checking our data and our data models.
Satisfiability Checking
For this we're going to switch to a more expressive solver, the Z3 Prover. You will need to install this as an extra, either:
pip install 'typedlogic[z3]'
We will set up the solver in the same way.
This time we won't ask for a model, we will instead run check
which checks for satisfiability.
from typedlogic.integrations.solvers.z3 import Z3Solver
solver = Z3Solver()
solver.load(paths_axioms)
for link in links:
solver.add(link)
solver.check()
So far so good! Our data and theory is consistent and satisfiable. But this is not surprising as we have not specified any axioms involving negation.
Let's add an axiom that says that there are no cycles, i.e there is no path that leads back to itself. We can write this in FOL as:
$$ \forall x. \neg Path(x, x) $$
But here we will write this in Python syntax using the @axiom
decorator:
@axiom
def acyclicity(x: ID, y: ID):
assert not (Path(source=y, target=x) and Path(source=x, target=y))
Here is what our modified program looks like:
show("paths/axioms_strict.py")
Now let's set up the solver with the new axioms and check for satisfiability again. We will also add a link that creates a cycle.
solver = Z3Solver()
import paths.axioms_strict as paths_axioms_strict
solver.load(paths_axioms_strict)
for link in links:
solver.add(link)
solver.add(Link(source="MT", target="CA"))
solver.check().satisfiable
Hurrah! As expected, the solver has found that the data is not satisfiable, because we have added a cycle.
Currently there is no single easy way to see an explanation, but this will be provided in the future.
Multiple Models
Next we will explore what happens when we add sentences that lead to multiple possible worlds in which all sentences are true. These separate "worlds" are typically called "models" in formal logic.
We will use the Clingo solver, which is based on Answer Set Programming (ASP) which allows for enumeration of these models.
We will set up our base knowledge base as before:
from typedlogic.integrations.solvers.clingo import ClingoSolver
solver = ClingoSolver()
import paths.axioms_strict as paths_axioms_strict
solver.load(paths_axioms_strict)
for link in links:
solver.add(link)
Next we'll add uncertainly using the ExactlyOne
boolean sentence type:
from typedlogic import ExactlyOne
solver.add(ExactlyOne(Link("MT", "ND"), Link("MT", "SD")))
We'll check this gives two worlds:
models = list(solver.models())
assert len(models) == 2
Next we will explore which paths are present in each world:
for n, model in enumerate(models):
print(f"Model: {n}")
for t in model.iter_retrieve("Path", "CA", "ND"):
print(f" has path CA to ND: {t}")
for t in model.iter_retrieve("Path", "CA", "SD"):
print(f" has path CA to SD: {t}")
As expected there is one world with a path from CA to SD, and another with a path from CA to ND.
We can collapse the number of universes by ruling out a path. So far we have only asserted the base one-hop links, but there is nothing to stop us making assertions about multi-hop paths. We can do this by adding a negation of a path to the solver, here stating there is no path from CA to ND:
from paths.model import Path
solver.add(~Path("CA", "ND"))
This will collapse the number of models to one:
models = list(solver.models())
assert len(models) == 1