PROTEIN FOLDING

We can begin with a simplified version of the 2D HP model in the square lattice.

2D: real proteins fold in 3D. Alternatives (3D):
- simple cubic lattice
- body centered cubic lattice
- face centered cubic lattice
Square lattice: real proteins also don't always fold in 90º angles. Alternatives:
- Triangular lattice

This simplified model has only two types of aminoacids, H and P. H stands for hydrophobic and P for polar. As the name states it, the H is afraid of water. In our body, proteins are suspended in water (or mostly water), however, some of the aminoacids must not be in contact with it (they react), while others can. When the protein folds, the H aminoacids must be on the inside, that is why the score is measured by the number of H molecules that are bonded. The more H molecules are bonded, the more grouped are, and the less surface touching the outside, where it can be the nasty water.

Modelization

Elements:
- H: Hydrophobic
- P: Hydrophili (polar)
- H-H edge
- H-P edge
- H-H bond
Score: Number of H-H bonds

Problem

Maximize the number of H-H bonds . Technically, the H-H edge is also a H-H bond, but they are not mutable, as we would be breaking the structure of the protein, so these are ignored.

Notes and observations of the problem

In this first 2D squared lattice, a typical node (except the first and the last) has 2 edge neighbors and 2 other neighbors.
The optimal folding has the maximum score
Considdering the string of molecules, only even and odd H's can possibly bond together, (because of the 2D squared lettuce).
OPT <= 2·min{#even H's, #odd H's}. Explanation: + Every bond defines an even and an odd + Every even and odd can only get hit up to twice + If there is more even than odds its not going to help, as there is + way to use more even than odds (excludig first and last)

Hardness

This problem is already NP-hard. To find the optimal folding of a given HP string is NP-Hard. First it was proved that 3D HP was NP-Hard, later was proved that 2D also.

Approximations

1/3 - approximations: this approximation is based on the fact that the optimal number of H bonds is OPT <=2·min{#even H's, #odd H's}. The goal is to get 1/3 of OTP. To reach the 1/3, a set of stages are followed. Following them the chance of finding relatively quickli a 1/3 of the OPT is high.

Implementation schema

Basic, implementation schema idea storm.

Iteration 0.1
- A file stores a secuence of H and P letters and other parametters
- Seed parametter, to be able to track random walks
- Maximum number of steps, in order not to get stuck searching the optimum
- The program will check for all the combinations, storing the one with better score
- We will think about representation later
- First implementation will be in python
Iteration 0.2
- Iteration 0.1 almost completed. Jumping to 0.2
- Fix different usages with arguments
- It will have a benchmark mode, this mode will give the time that has taken to make a given number of folds. To make the bench more even, the number of folds and the protein will be given.
Iteration 1
- Iteration 1 reached. Functional and it begins to be not that slow
- There is a lot of symetry to study and filter, probably a +60% faster
  searching for the absolute solution

TODOS

TODO: Write use in README and --help in fold.py
Fix vertical excess padding when drawing

Folding program annotations

There are a lot of symetries easily avoidables. Check out if half of them (only up to k/2+1) are necessary, it could be a massive improvement

Other stuff

The hydrophobicity of a protein is a parametter heavily researched.
The natural folding of proteins is a very unknown field, it is supposed that all information of how to fold a protein lies in its aminoacids, so there is no such thing as an Ikea guide that the cell has. So the combinatory problem is so large that it is already not know how it is thone for the cell to acquire the lowest energy configuration for that protein, wich is what makes it functional, (otherwise is considered "dead").
To know the folding configuration maded naturally by the cells it is needed a process that can take a lot of research, money and time. To throw some light, it is needed to cristalise a proteine (wich can take years of research). From that you get one 3D image that is collected in a protein data bank.

TMP

correct output protein string HPHHPHHPPHHH

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
outputs		outputs
README.md		README.md
SOURCES.md		SOURCES.md
archive_bench.sh		archive_bench.sh
benchs.dat		benchs.dat
cyton_test.py		cyton_test.py
folder.py		folder.py
params.conf		params.conf
test.sh		test.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PROTEIN FOLDING

Modelization

Problem

Notes and observations of the problem

Hardness

Approximations

Implementation schema

TODOS

Folding program annotations

Other stuff

TMP

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PROTEIN FOLDING

Modelization

Problem

Notes and observations of the problem

Hardness

Approximations

Implementation schema

TODOS

Folding program annotations

Other stuff

TMP

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages