
Boltzmann machine learning and regularization methods for inferring evolutionary fields and couplings from a multiple sequence alignment

IEEE/ACM Transactions on Computational Biology & Bioinformatics (TCBB), 2019
Abstract

The inverse Potts problem, which is to infer a Boltzmann distribution for homologous protein sequences from their single-site and pairwise amino acid frequencies, has recently attracted a great deal of attention in studies of protein structure and evolution. We study regularization and learning methods, and how to tune regularization parameters, in order to correctly infer interactions in Boltzmann machine learning. With $L_2$ regularization for the fields, group $L_1$ regularization for the couplings is shown to be very effective for sparse couplings in comparison with $L_2$ and $L_1$. Two regularization parameters are tuned to yield equal values for the sample and ensemble averages of evolutionary energy. Both averages change smoothly and converge, but their learning profiles differ greatly between learning methods. The Adam method is modified to make the step size proportional to the gradient for sparse couplings and to use a soft-thresholding function for group $L_1$. By first inferring interactions from protein sequences and then from Monte Carlo samples, it is shown that the fields and couplings can be well recovered, but that recovering the pairwise correlations at the resolution of a total energy is harder for the natural proteins than for the protein-like sequences. The selective temperature for folding/structural constraints in protein evolution is also estimated.
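The soft-thresholding step mentioned for group $L_1$ can be illustrated with a minimal sketch. It assumes, as is common for Potts models, that each group is the q x q coupling block of one site pair and that the block is shrunk by its Frobenius norm; the abstract does not spell out the grouping, so this is an assumption. The names `group_soft_threshold`, `J`, and `threshold` are hypothetical, and the sketch does not reproduce the paper's modified Adam update.

```python
import numpy as np

def group_soft_threshold(J, threshold):
    """Group soft-thresholding (proximal operator of a group L1 penalty).

    J: array of shape (L, L, q, q); J[i, j] is the coupling block of site
       pair (i, j). Treating each block as one group is an assumption.
    threshold: non-negative shrinkage amount (e.g. step size times the
       regularization weight; that pairing is also an assumption).
    """
    # Frobenius norm of every q x q block, kept broadcastable against J.
    norms = np.sqrt((J ** 2).sum(axis=(2, 3), keepdims=True))
    # Shrink each block toward zero; blocks with norm <= threshold become 0.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return J * scale
```

Because whole blocks are zeroed at once, this kind of update yields sparsity at the level of site pairs rather than individual amino acid pairs, which is the usual motivation for a group penalty over plain $L_1$.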
