Synthetic Biology Design Checks with TypedLogic
This gallery tutorial turns a small synthetic biology design problem into composable logical checks. The same TypedLogic theory covers assembly structure, GO-driven pathway feasibility, and GO-CAM curation consistency.
The running example is sucrose utilization in E. coli: a design needs a transporter (cscB) to bring sucrose into the cell and an invertase (cscA) to convert intracellular sucrose into glucose and fructose.
Installation
This notebook uses the Clingo solver because the pathway rule is recursive and is most naturally evaluated as a Datalog-style fixpoint, where rules are reapplied until no new facts can be derived.
pip install 'typedlogic[clingo]'
Setup
The formal predicates and axioms live in synbio_theory.py next to this notebook. Keeping the theory in a normal Python module makes it reusable from tests, scripts, and notebooks.
from __future__ import annotations
from contextlib import contextmanager
from pathlib import Path
from typing import Iterable, Sequence
import os
import sys
from IPython.display import Markdown, display
from IPython.lib.display import Code
from typedlogic.integrations.solvers.clingo import ClingoSolver
EXAMPLE_DIR = Path.cwd()
if not (EXAMPLE_DIR / "synbio_theory.py").exists():
    EXAMPLE_DIR = Path.cwd() / "docs" / "examples"
THEORY_PATH = EXAMPLE_DIR / "synbio_theory.py"
sys.path.insert(0, str(EXAMPLE_DIR))
from synbio_theory import (
    AvailableMetabolite,
    CausalEdge,
    CausalRelation,
    DesignContains,
    EncodesProtein,
    FunctionCatalyzes,
    GOAspect,
    GOCAMIndividual,
    HasMolecularFunction,
    Part,
    RequiredMetabolite,
)
@contextmanager
def silence_clingo_stderr():
    """Suppress harmless Clingo messages about predicates that only appear as facts."""
    saved = os.dup(2)
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, 2)
    try:
        yield
    finally:
        os.dup2(saved, 2)
        os.close(saved)
        os.close(devnull)
def infer(facts: Iterable[object], *predicate_names: str) -> dict[str, list[tuple[object, ...]]]:
    """Load the synbio theory, add facts, and return selected derived predicates."""
    solver = ClingoSolver()
    solver.load(THEORY_PATH)
    for fact in facts:
        solver.add_fact(fact)
    with silence_clingo_stderr():
        model = solver.model()
    rows = {predicate_name: [] for predicate_name in predicate_names}
    for term in model.ground_terms:
        if term.predicate in rows:
            rows[term.predicate].append(term.values)
    return {predicate: sorted(values) for predicate, values in rows.items()}
def markdown_table(headers: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render a small Markdown table without adding a pandas dependency."""
    lines = ["| " + " | ".join(headers) + " |"]
    lines.append("| " + " | ".join(["---"] * len(headers)) + " |")
    for row in rows:
        lines.append("| " + " | ".join(str(value) for value in row) + " |")
    return "\n".join(lines)

def pairs(values: Sequence[tuple[object, ...]]) -> str:
    """Format binary or assembly-scoped predicate values as compact edges."""
    formatted = []
    for value in values:
        if len(value) == 2:
            left, right = value
            formatted.append(f"{left}->{right}")
        elif len(value) == 3:
            assembly, left, right = value
            formatted.append(f"{assembly}:{left}->{right}")
        else:
            formatted.append("->".join(str(part) for part in value))
    return ", ".join(formatted) or "-"

def csv(values: Sequence[object]) -> str:
    """Format a short sequence for a Markdown table cell."""
    return ", ".join(str(value) for value in values) or "-"
Code(filename=str(THEORY_PATH), language="python")
Layer 1: Assembly Structure
The structural layer treats overhangs as assembly-scoped typed facts. From those facts it derives which parts can ligate, which parts are intended neighbors, whether an intended neighboring pair fails to ligate, and whether any compatible ligation skips the intended next position.
safe_assembly = [
    Part("clean", "p_lac", "promoter", "GGAG", "TACT", 0),
    Part("clean", "rbs_b0034", "rbs", "TACT", "AATG", 1),
    Part("clean", "cscB", "cds", "AATG", "GCTT", 2),
    Part("clean", "term_b0015", "terminator", "GCTT", "CGCT", 3),
]
risky_assembly = [
    Part("risky", "p_lac", "promoter", "GGAG", "TACT", 0),
    Part("risky", "rbs_b0034", "rbs", "TACT", "AATG", 1),
    Part("risky", "cscB_skip_rbs", "cds", "TACT", "GCTT", 2),
    Part("risky", "term_b0015", "terminator", "GCTT", "CGCT", 3),
]

def assembly_report(label: str, facts: Sequence[Part]) -> tuple[str, str, str, str, str]:
    results = infer(facts, "CanLigate", "IntendedAdjacent", "IntendedLigationGap", "MisligationRisk")
    gaps = results["IntendedLigationGap"]
    risks = results["MisligationRisk"]
    return (
        label,
        pairs(results["CanLigate"]),
        pairs(results["IntendedAdjacent"]),
        pairs(gaps) if gaps else "none",
        pairs(risks) if risks else "none",
    )

assembly_rows = [
    assembly_report("clean overhang grammar", safe_assembly),
    assembly_report("CDS can skip RBS", risky_assembly),
]
display(
    Markdown(
        markdown_table(
            ["Assembly", "Can ligate", "Intended adjacency", "Intended gap", "Skip risk"],
            assembly_rows,
        )
    )
)
| Assembly | Can ligate | Intended adjacency | Intended gap | Skip risk |
|---|---|---|---|---|
| clean overhang grammar | clean:cscB->term_b0015, clean:p_lac->rbs_b0034, clean:rbs_b0034->cscB | clean:cscB->term_b0015, clean:p_lac->rbs_b0034, clean:rbs_b0034->cscB | none | none |
| CDS can skip RBS | risky:cscB_skip_rbs->term_b0015, risky:p_lac->cscB_skip_rbs, risky:p_lac->rbs_b0034 | risky:cscB_skip_rbs->term_b0015, risky:p_lac->rbs_b0034, risky:rbs_b0034->cscB_skip_rbs | risky:rbs_b0034->cscB_skip_rbs | risky:p_lac->cscB_skip_rbs |
In the second assembly, rbs_b0034 and cscB_skip_rbs are intended neighbors but have incompatible overhangs. The same assembly also lets p_lac ligate directly into cscB_skip_rbs, so the theory reports both the intended ligation gap and the skip risk before considering pathway function.
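The four structural predicates can also be approximated in plain Python, which may help when reading the table above. This is only a sketch under stated assumptions: `PartRow` and `structural_checks` are hypothetical names, the fourth and fifth fields are assumed to be the left and right overhangs, and the conditions are inferred from the prose rather than copied from `synbio_theory.py`.

```python
from typing import NamedTuple


class PartRow(NamedTuple):
    """Hypothetical stand-in for the notebook's Part fact (same field order)."""
    assembly: str
    name: str
    role: str
    left_overhang: str   # assumed meaning of the fourth field
    right_overhang: str  # assumed meaning of the fifth field
    position: int


def structural_checks(parts):
    """Return (can_ligate, intended, gaps, skip_risks) for the parts of one assembly."""
    position = {part.name: part.position for part in parts}
    # A -> B can ligate when A's right overhang equals B's left overhang.
    can_ligate = {
        (a.name, b.name)
        for a in parts
        for b in parts
        if a.name != b.name and a.right_overhang == b.left_overhang
    }
    # Intended neighbors occupy consecutive positions.
    intended = {
        (a.name, b.name)
        for a in parts
        for b in parts
        if b.position == a.position + 1
    }
    # A gap is an intended pair that cannot ligate.
    gaps = intended - can_ligate
    # A skip risk is a compatible pair that jumps past the next position.
    skip_risks = {(a, b) for a, b in can_ligate if position[b] > position[a] + 1}
    return can_ligate, intended, gaps, skip_risks


risky = [
    PartRow("risky", "p_lac", "promoter", "GGAG", "TACT", 0),
    PartRow("risky", "rbs_b0034", "rbs", "TACT", "AATG", 1),
    PartRow("risky", "cscB_skip_rbs", "cds", "TACT", "GCTT", 2),
    PartRow("risky", "term_b0015", "terminator", "GCTT", "CGCT", 3),
]
_, _, gaps, skip_risks = structural_checks(risky)
print(sorted(gaps))
print(sorted(skip_risks))
```

For the risky assembly, the sketch reports the same gap (`rbs_b0034` to `cscB_skip_rbs`) and the same skip risk (`p_lac` to `cscB_skip_rbs`) as the solver output in the table.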
Layer 2: GO-Driven Pathway Feasibility
The functional layer connects parts to proteins, proteins to GO molecular functions, and GO functions to metabolite transformations. A recursive rule derives every metabolite reachable from the growth medium.
BIOLOGY = [
    EncodesProtein("cscA", "P10000_cscA"),
    EncodesProtein("cscB", "P30000_cscB"),
    HasMolecularFunction("P10000_cscA", "GO:0004575"),  # sucrose alpha-glucosidase activity
    HasMolecularFunction("P30000_cscB", "GO:0008515"),  # sucrose transmembrane transporter activity (cscB acts as a sucrose:H+ symporter)
    FunctionCatalyzes("GO:0008515", "sucrose_out", "sucrose_in"),
    FunctionCatalyzes("GO:0004575", "sucrose_in", "glucose_in"),
    FunctionCatalyzes("GO:0004575", "sucrose_in", "fructose_in"),
]
def pathway_report(label: str, design_parts: Sequence[str]) -> tuple[str, str, str, str]:
    facts = [
        *BIOLOGY,
        *(DesignContains("sucrose_design", part) for part in design_parts),
        AvailableMetabolite("sucrose_design", "sucrose_out"),
        RequiredMetabolite("sucrose_design", "glucose_in"),
    ]
    results = infer(facts, "AvailableMetabolite", "RequiredMetabolite")
    available = sorted({metabolite for _, metabolite in results["AvailableMetabolite"]})
    required = sorted({metabolite for _, metabolite in results["RequiredMetabolite"]})
    missing = [metabolite for metabolite in required if metabolite not in available]
    status = "complete" if not missing else f"gap: {csv(missing)}"
    return label, csv(design_parts), csv(available), status

pathway_rows = [
    pathway_report("full sucrose pathway", ["cscA", "cscB"]),
    pathway_report("permease only", ["cscB"]),
    pathway_report("invertase only", ["cscA"]),
    pathway_report("empty design", []),
]
display(Markdown(markdown_table(["Design", "Parts", "Reachable metabolites", "Verdict"], pathway_rows)))
| Design | Parts | Reachable metabolites | Verdict |
|---|---|---|---|
| full sucrose pathway | cscA, cscB | fructose_in, glucose_in, sucrose_in, sucrose_out | complete |
| permease only | cscB | sucrose_in, sucrose_out | gap: glucose_in |
| invertase only | cscA | sucrose_out | gap: glucose_in |
| empty design | - | sucrose_out | gap: glucose_in |
The full design reaches intracellular glucose from extracellular sucrose. The permease-only design imports sucrose but cannot cleave it. The invertase-only design has the enzyme, but no route for sucrose to enter the cell. The empty design contains no parts, so only the supplied extracellular sucrose is available.
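The fixpoint behind the reachable-metabolite column can be illustrated without a solver. The sketch below is not how TypedLogic or Clingo evaluates the rule; `reachable_metabolites` and the three lookup dictionaries are hypothetical names that restate the `BIOLOGY` facts, and the loop simply reapplies the enabled transformations until nothing new is added.

```python
ENCODES = {"cscA": ["P10000_cscA"], "cscB": ["P30000_cscB"]}
FUNCTIONS = {"P10000_cscA": ["GO:0004575"], "P30000_cscB": ["GO:0008515"]}
CATALYZES = {
    "GO:0008515": [("sucrose_out", "sucrose_in")],
    "GO:0004575": [("sucrose_in", "glucose_in"), ("sucrose_in", "fructose_in")],
}


def reachable_metabolites(design_parts, seeds):
    """Return every metabolite derivable from the seeds given the design's parts."""
    # Follow part -> protein -> GO function -> (substrate, product).
    enabled = {
        (substrate, product)
        for part in design_parts
        for protein in ENCODES.get(part, ())
        for go_term in FUNCTIONS.get(protein, ())
        for substrate, product in CATALYZES.get(go_term, ())
    }
    available = set(seeds)
    changed = True
    while changed:  # stop once a full pass adds nothing new
        changed = False
        for substrate, product in enabled:
            if substrate in available and product not in available:
                available.add(product)
                changed = True
    return available


for parts in (["cscA", "cscB"], ["cscB"], ["cscA"], []):
    reached = reachable_metabolites(parts, ["sucrose_out"])
    verdict = "complete" if "glucose_in" in reached else "gap: glucose_in"
    print(parts, sorted(reached), verdict)
```

The loop terminates because each pass either adds a metabolite or stops, and the set of metabolites is finite.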
Layer 3: GO-CAM Curation Consistency
The curation layer checks a common GO-CAM modeling rule: declared causal relations such as RO:0002413 should connect molecular function individuals, not biological process individuals.
GO_ASPECTS = [
    GOAspect("GO:0004575", "MF"),  # sucrose alpha-glucosidase activity
    GOAspect("GO:0008515", "MF"),  # sucrose transmembrane transporter activity
    GOAspect("GO:0005992", "BP"),  # trehalose biosynthetic process
    GOAspect("GO:0006012", "BP"),  # galactose metabolic process
]
CAUSAL_RELATIONS = [
    CausalRelation("RO:0002411"),  # causally upstream of
    CausalRelation("RO:0002413"),  # provides direct input for
]
def gocam_report(
    label: str,
    individuals: Sequence[tuple[str, str]],
    edges: Sequence[tuple[str, str, str]],
) -> tuple[str, str, str, str]:
    facts = [
        *GO_ASPECTS,
        *CAUSAL_RELATIONS,
        *(GOCAMIndividual(iri, go_class) for iri, go_class in individuals),
        *(CausalEdge(upstream, downstream, relation) for upstream, downstream, relation in edges),
    ]
    violations = infer(facts, "GOCAMViolation")["GOCAMViolation"]
    violation_text = ", ".join(f"{iri}: {rule}" for iri, rule in violations) or "none"
    return (
        label,
        csv([f"{iri}={go_class}" for iri, go_class in individuals]),
        pairs([(upstream, downstream) for upstream, downstream, _ in edges]),
        violation_text,
    )

gocam_rows = [
    gocam_report(
        "valid MF to MF causal edge",
        individuals=[("ind:1", "GO:0008515"), ("ind:2", "GO:0004575")],
        edges=[("ind:1", "ind:2", "RO:0002413")],
    ),
    gocam_report(
        "invalid MF to BP causal edge",
        individuals=[("ind:3", "GO:0008515"), ("ind:4", "GO:0005992")],
        edges=[("ind:3", "ind:4", "RO:0002413")],
    ),
    gocam_report(
        "invalid BP to BP causal edge",
        individuals=[("ind:5", "GO:0006012"), ("ind:6", "GO:0005992")],
        edges=[("ind:5", "ind:6", "RO:0002413")],
    ),
    gocam_report(
        "non-causal relation to BP",
        individuals=[("ind:7", "GO:0008515"), ("ind:8", "GO:0005992")],
        edges=[("ind:7", "ind:8", "BFO:0000050")],
    ),
]
display(Markdown(markdown_table(["GO-CAM model", "Individuals", "Causal edge", "Violations"], gocam_rows)))
| GO-CAM model | Individuals | Causal edge | Violations |
|---|---|---|---|
| valid MF to MF causal edge | ind:1=GO:0008515, ind:2=GO:0004575 | ind:1->ind:2 | none |
| invalid MF to BP causal edge | ind:3=GO:0008515, ind:4=GO:0005992 | ind:3->ind:4 | ind:4: causal_downstream_not_MF |
| invalid BP to BP causal edge | ind:5=GO:0006012, ind:6=GO:0005992 | ind:5->ind:6 | ind:5: causal_upstream_not_MF, ind:6: causal_downstream_not_MF |
| non-causal relation to BP | ind:7=GO:0008515, ind:8=GO:0005992 | ind:7->ind:8 | none |
The same solver workflow handles design-time checks and curation-time checks. In a production pipeline, the facts could come from a parts registry, GO annotations, Rhea mappings, and GO-CAM RDF exports rather than hand-authored lists.
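The curation rule in this layer amounts to a filter over edges, which the following plain-Python sketch tries to make concrete. It is an approximation, not the axiom from `synbio_theory.py`: `gocam_violations` is a hypothetical name, and the sketch assumes that an individual is flagged only when its GO class has a recorded aspect other than `"MF"`.

```python
ASPECTS = {
    "GO:0004575": "MF",
    "GO:0008515": "MF",
    "GO:0005992": "BP",
    "GO:0006012": "BP",
}
CAUSAL = {"RO:0002411", "RO:0002413"}


def gocam_violations(individuals, edges):
    """Return sorted (iri, rule) pairs for causal edges that touch non-MF individuals."""
    go_class = dict(individuals)
    found = set()
    for upstream, downstream, relation in edges:
        if relation not in CAUSAL:
            continue  # relations not declared causal are out of scope for this rule
        endpoints = (
            (upstream, "causal_upstream_not_MF"),
            (downstream, "causal_downstream_not_MF"),
        )
        for iri, rule in endpoints:
            aspect = ASPECTS.get(go_class.get(iri))
            # Assumption: an unknown aspect is left unflagged.
            if aspect is not None and aspect != "MF":
                found.add((iri, rule))
    return sorted(found)


print(gocam_violations(
    [("ind:5", "GO:0006012"), ("ind:6", "GO:0005992")],
    [("ind:5", "ind:6", "RO:0002413")],
))
```

Whether an individual with no recorded aspect should count as a violation is a modeling decision; the sketch takes the permissive option.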
Inspect the Compiled Rules
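TypedLogic parses the Python axioms into a logical theory, and the Clingo backend renders that theory as an answer set programming (ASP) program. The recursive pathway rule is the key fixpoint rule. The cell below prints only the compiled lines that begin with one of the derived predicate names.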
solver = ClingoSolver()
solver.load(THEORY_PATH)
compiled_rules = solver.dump().splitlines()
for rule in compiled_rules:
    if rule.startswith(("availablemetabolite", "intendedligationgap", "misligationrisk", "gocamviolation")):
        print(rule)
Next Steps
This pattern scales by swapping the toy facts for real sources: a parts library for assembly facts, GO and Rhea mappings for function-to-reaction facts, and GO-CAM exports for curation facts. Violations can then be reported back as design review comments or curator feedback.
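As one possible starting point for swapping in real sources, a parts library exported as CSV could be turned into rows in the same field order as the `Part` facts used earlier. The column names and the `part_rows` helper below are assumptions for illustration, not an existing registry format.

```python
import csv as csvlib  # aliased because this notebook already defines a helper named csv
import io

# Hypothetical export; a real parts registry will use its own column names.
PARTS_CSV = """assembly,name,role,left_overhang,right_overhang,position
clean,p_lac,promoter,GGAG,TACT,0
clean,rbs_b0034,rbs,TACT,AATG,1
"""


def part_rows(text):
    """Parse CSV text into tuples ordered like the Part(...) calls in this notebook."""
    rows = []
    for record in csvlib.DictReader(io.StringIO(text)):
        rows.append(
            (
                record["assembly"],
                record["name"],
                record["role"],
                record["left_overhang"],
                record["right_overhang"],
                int(record["position"]),  # positions are compared numerically
            )
        )
    return rows


rows = part_rows(PARTS_CSV)
print(rows)
# In the notebook, these rows could be wrapped as facts, e.g. [Part(*row) for row in rows].
```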