API Reference

Surface

class lantern.model.surface.Phenotype(D, K, mean, kernel, variational_strategy)

A phenotype surface, learned with an approximate GP.

Return type

None

Method generated by attrs for class Phenotype.

property Kbasis

The number of dimensions provided by the basis

classmethod build(D, K, Ni=800, inducScale=10, distribution=<class 'gpytorch.variational.cholesky_variational_distribution.CholeskyVariationalDistribution'>, mean=None, kernel=None, learn_inducing_locations=True, *args, **kwargs)

Build a phenotype surface object.

Parameters
  • D (int) – Number of dimensions of the (output) phenotype

  • K (int) – Number of latent dimensions

  • Ni (int, optional) – Number of inducing points

  • inducScale (float, optional) – Range to initialize inducing points over (uniform from [-inducScale, inducScale])

  • distribution (gpytorch.VariationalDistribution) – The distribution of the variational approximation

  • mean (gpytorch.means.Mean, optional) – Mean function of the GP

  • kernel (gpytorch.kernels.Kernel, optional) – The kernel of the GP

  • learn_inducing_locations (bool, optional) – Whether to learn the locations of the inducing points
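
The inducScale description above can be read as the initialization sketched below. This is a minimal illustration using only the standard library, assuming the inducing points form an Ni-by-K collection of latent positions. The helper name init_inducing_points and its seed argument are hypothetical and not part of lantern, which presumably stores these points as torch tensors.

```python
import random


def init_inducing_points(Ni=800, K=2, inducScale=10.0, seed=0):
    """Draw Ni points in K latent dimensions, uniform on [-inducScale, inducScale].

    Hypothetical helper for illustration only; not part of lantern.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        [rng.uniform(-inducScale, inducScale) for _ in range(K)]
        for _ in range(Ni)
    ]


# Small example: 5 inducing points in a 3-dimensional latent space.
points = init_inducing_points(Ni=5, K=3, inducScale=10.0)
```

A larger inducScale spreads the initial points over a wider region of latent space; whether they move afterwards depends on learn_inducing_locations.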

forward(z)

The forward prediction of the phenotype for a position in latent phenotype space.

classmethod fromDataset(ds, *args, **kwargs)

Build a phenotype surface matching a dataset

Basis

class lantern.model.basis.Basis

A dimension-reducing basis for mutational data.

Parameters
  • p (int) – Input dimension, e.g. the number of mutations

  • K (int) – Output dimension, e.g. the number of latent directions

Return type

None

Method generated by attrs for class Basis.

property order

The rank order of latent dimensions

class lantern.model.basis.VariationalBasis(W_mu, W_log_sigma, log_alpha, log_beta, alpha_prior)

A variational basis for reducing mutational data.

Method generated by attrs for class VariationalBasis.

Parameters
  • W_mu (torch.nn.parameter.Parameter) –

  • W_log_sigma (torch.nn.parameter.Parameter) –

  • log_alpha (torch.nn.parameter.Parameter) –

  • log_beta (torch.nn.parameter.Parameter) –

  • alpha_prior (torch.distributions.gamma.Gamma) –

Return type

None

property order

The rank order of latent dimensions

Loss

class lantern.loss.ELBO_GP(mll)

The variational ELBO objective for GPs

Method generated by attrs for class ELBO_GP.

Return type

None

Dataset

class lantern.dataset.tokenizer.Tokenizer(lookup, tokens, sites, mutations, delim=':')

A class for tokenizing strings representing genetic variants.

Parameters
  • lookup (Dict[str, int]) – A lookup from token to index

  • tokens (List[str]) – A lookup from index to token

  • sites (List[int]) – A site number for each token, if valid

  • mutations (List[Union[None, str]]) – A mutation value for each token, if valid

  • delim (str) – The delimiter for this tokenizer

Return type

None

Method generated by attrs for class Tokenizer.

detokenize(t)

Convert a binarized token tensor into a mutation string

classmethod fromVariants(substitutions, delim=':', regex='(?P<wt>[a-zA-Z*])(?P<site>\\d+)(?P<mut>[a-zA-Z*])')

Construct a tokenizer from a list of variants.
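
To see what the default regex and delim accept, the sketch below applies the same pattern with Python's re module. The signature shows the pattern as a string repr, so `\\d` there corresponds to `\d` in a raw string. The helper parse_variant is hypothetical, and matching each delimited piece with fullmatch is an assumption about how the pattern is applied, not a statement about lantern's internals.

```python
import re

# Default pattern from the fromVariants signature, written as a raw string.
VARIANT_RE = re.compile(r"(?P<wt>[a-zA-Z*])(?P<site>\d+)(?P<mut>[a-zA-Z*])")


def parse_variant(variant, delim=":"):
    """Split a variant string into (wild type, site, mutation) triples.

    Hypothetical helper for illustration only; not part of lantern.
    """
    if not variant:  # treat an empty string as "no substitutions"
        return []
    parsed = []
    for piece in variant.split(delim):
        match = VARIANT_RE.fullmatch(piece)
        if match is None:
            raise ValueError(f"unrecognized substitution: {piece!r}")
        parsed.append((match["wt"], int(match["site"]), match["mut"]))
    return parsed


example = parse_variant("A12G:T45*")
```

Here `example` holds one triple per substitution, with the site converted to an integer; `*` is accepted in either residue position because it appears in both character classes.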

property p

Total number of tokens

tokenize(*s)

Convert a mutation string (or strings) into a binarized tensor
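
The round trip between tokenize and detokenize can be pictured with plain lists. This is a sketch under stated assumptions: a "binarized" encoding is taken to mean one 0/1 entry per known token, lookup and tokens play the roles described in the constructor parameters, and the functions binarize and unbinarize are hypothetical stand-ins that return Python lists rather than tensors.

```python
def binarize(variant, lookup, delim=":"):
    """Encode a mutation string as a 0/1 list with one slot per known token."""
    vec = [0] * len(lookup)
    pieces = variant.split(delim) if variant else []
    for piece in pieces:
        vec[lookup[piece]] = 1  # raises KeyError for an unknown token
    return vec


def unbinarize(vec, tokens, delim=":"):
    """Decode a 0/1 list back into a delimited mutation string."""
    return delim.join(tokens[i] for i, bit in enumerate(vec) if bit)


# Illustrative vocabulary: index -> token and token -> index.
tokens = ["A12G", "C7T", "T45*"]
lookup = {token: index for index, token in enumerate(tokens)}

encoded = binarize("A12G:T45*", lookup)
decoded = unbinarize(encoded, tokens)
```

In this sketch the decoded string lists substitutions in vocabulary order, so the round trip reproduces the input only when the input already follows that order.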

class lantern.dataset.dataset._Base(substitutions='substitutions', phenotypes=['phenotype'], errors=None, tokenizer=None)

Base genotype-phenotype dataset class, shuttling a pandas dataframe to a TensorDataset.

Parameters
  • substitutions (str) – The column containing raw mutation data for each variant.

  • phenotypes (list[str]) – The columns of observed phenotypes for each variant

  • errors (list[str], optional) – The error columns associated with each phenotype, assumed to be variance (\(\sigma^2_y\))

  • tokenizer (lantern.dataset.tokenizer.Tokenizer) – The tokenizer converting raw mutations into one-hot encoded tensors

Method generated by attrs for class _Base.

property D

The number of dimensions of the phenotype

_errors_correct_length(attribute, value)

Check for correct length between errors and phenotypes

meanEffects()

The mean effects of each mutation against each phenotype, returned as a (p x D) tensor

property p

The number of mutations in the dataset

to(device)

Send to device

class lantern.dataset.Dataset(df, substitutions='substitutions', phenotypes=['phenotype'], errors=None, tokenizer=None)

The runtime option for datasets, taking a dataframe as the first argument.

Method generated by attrs for class Dataset.

classmethod from_sequences(df, wildtype, sequence_column='sequence', substitutions='substitutions', *args, **kwargs)

Build a dataset from a DataFrame of full sequences, converting each sequence to a compressed substitution string.

Parameters
  • wildtype (str) –

  • sequence_column (str) –
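
The conversion from a full sequence to a compressed substitution string can be sketched as a position-by-position comparison against the wild type. The function name sequence_to_substitutions, the 1-based site numbering, the ':' delimiter, and the requirement that both sequences have equal length are assumptions made for illustration; the library's actual conventions may differ.

```python
def sequence_to_substitutions(sequence, wildtype, delim=":"):
    """Describe `sequence` as the substitutions that turn `wildtype` into it.

    Hypothetical helper for illustration only; not part of lantern.
    """
    if len(sequence) != len(wildtype):
        raise ValueError("sequence and wildtype must have the same length")
    substitutions = [
        f"{wt}{site}{mut}"
        # Sites are numbered from 1 here, which is an assumption.
        for site, (wt, mut) in enumerate(zip(wildtype, sequence), start=1)
        if wt != mut
    ]
    return delim.join(substitutions)


single = sequence_to_substitutions("MKTA", wildtype="MATA")
double = sequence_to_substitutions("MKTG", wildtype="MATA")
unchanged = sequence_to_substitutions("MATA", wildtype="MATA")
```

Under these assumptions a sequence identical to the wild type maps to the empty string, and each differing position contributes one wild type, site, mutation token.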

class lantern.dataset.CsvDataset(pth, substitutions='substitutions', phenotypes=['phenotype'], errors=None, tokenizer=None)

Method generated by attrs for class CsvDataset.