LANTERN: an interpretable genotype-phenotype landscape model
What is LANTERN?
LANTERN is a tool for learning interpretable models of genotype-phenotype landscape (GPL) data.
Installation
LANTERN must currently be installed from source. We recommend installing it into a virtual environment (e.g. venv or conda):
python -m pip install git+https://github.com/usnistgov/lantern.git
Quickstart
LANTERN provides a straightforward interface for training models:
import pandas as pd
from torch.optim import Adam
from lantern.dataset import Dataset
from lantern.model import Model
from lantern.model.basis import VariationalBasis
from lantern.model.surface import Phenotype
# create a dataframe containing GPL data
df = pd.DataFrame(
    {"substitutions": ["", "+a", "+b", "+a:+b"], "phenotype": [0.0, 1.0, 1.0, 0.8]},
)
# convert the data to a LANTERN dataset
ds = Dataset(df)
# build a LANTERN model based on the dataset, using an upper-bound
# of 8 latent dimensions
model = Model(
    VariationalBasis.fromDataset(ds, 8),
    Phenotype.fromDataset(ds, 8),
)
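# build the loss for a dataset of N observations
loss = model.loss(N=len(ds))
# use the full dataset as a single batch
X, y = ds[:len(ds)]
optimizer = Adam(loss.parameters(), lr=0.01)
# train for 100 gradient steps
for i in range(100):
    optimizer.zero_grad()
    yhat = model(X)
    # the loss returns several named terms; sum them into one scalar
    lss = loss(yhat, y)
    total = sum(lss.values())
    total.backward()
    optimizer.step()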
For a more thorough introduction, see the Tutorial.
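The `substitutions` column in the example above holds one string per variant, with mutation labels such as `+a` and `+b` joined by a colon. As a minimal sketch, assuming that the colon-delimited form shown in the example is the expected encoding and that the empty string stands for the unmutated variant, such a column could be built from per-variant lists of mutations as follows. The helper `encode_substitutions` and the `variants` list are hypothetical and are not part of LANTERN; sorting the labels is only a convenience for getting a consistent string.

```python
def encode_substitutions(mutations):
    """Join mutation labels into one colon-delimited string.

    Hypothetical helper, not part of LANTERN. An empty collection maps
    to "", matching the first row of the Quickstart dataframe.
    """
    # drop duplicate labels and sort so the same variant always yields the same string
    return ":".join(sorted(set(mutations)))


# hypothetical variants, each given as a list of mutation labels
variants = [[], ["+a"], ["+b"], ["+b", "+a"]]
phenotypes = [0.0, 1.0, 1.0, 0.8]

# column-oriented records in the same layout as the Quickstart dataframe
records = {
    "substitutions": [encode_substitutions(v) for v in variants],
    "phenotype": phenotypes,
}
```

The resulting `records` dictionary has the same two columns as the Quickstart data and could be passed to `pd.DataFrame(records)` before constructing the `Dataset`.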
Citation
LANTERN can be cited as: <insert biorxiv link>
The workflow used to generate the results in the manuscript is available at github.com/ptonner/lantern/manuscript