Bio-Informatics
Time Limit: 1 Second Memory Limit: 32768 KB
Bio-informatics is an exciting new field of science, in which computer science
techniques are applied to solving biological problems. The search for genetic
drugs is one of the central problems of bio-informatics. In tackling this problem,
genes from various organisms are compared.
A gene is characterized by the sequence of amino acids that can be derived from
it.
There are altogether 20 amino acids. Each amino acid is identified by a one-letter
abbreviation of its full chemical name.
(The upper case letters of the alphabet, except B, J, O, U, X, and Z, are used
to identify amino acids.)
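A minimal Python sketch of the 20-letter alphabet described above. The names `AMINO_ACIDS` and `is_amino_acid` are hypothetical and not part of the problem.

```python
import string

# The 20 one-letter amino acid codes: every upper-case letter
# except B, J, O, U, X and Z.
AMINO_ACIDS = frozenset(string.ascii_uppercase) - frozenset("BJOUXZ")


def is_amino_acid(code: str) -> bool:
    """Return True if `code` is one of the 20 one-letter abbreviations."""
    return len(code) == 1 and code in AMINO_ACIDS


print(len(AMINO_ACIDS))                        # 20
print(is_amino_acid("M"), is_amino_acid("B"))  # True False
```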
Input
Each logical column of the input specifies a particular gene from a different
organism. The number of organisms is at least three but not greater than eight.
The first line specifies the names of the organisms. Each name consists of at
least one but not more than eight lower case characters and is right-justified
in a field of width 9 characters.
Each of the remaining lines specifies an amino acid for each of the organisms
listed on the first line. Each amino acid is represented by its one-letter abbreviation,
right-justified in a field of width 9 characters, under the name of the organism
with which it is associated. Thus, in the example shown, the amino acid sequence
for the particular yeast gene is M, E, S, L, D, A, N, C, T, M.
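One way to read this layout, sketched in Python under the assumption that each line is exactly a sequence of 9-character fields: cut every line into fields, strip the padding, and transpose the position rows into per-organism sequences. The helpers `split_fields` and `read_genes` are hypothetical names.

```python
FIELD_WIDTH = 9  # each name / amino acid is right-justified in 9 columns


def split_fields(line: str) -> list[str]:
    """Cut one fixed-width line into 9-character fields and drop the padding."""
    line = line.rstrip()  # right-justification means the last field ends the line
    return [line[i:i + FIELD_WIDTH].strip() for i in range(0, len(line), FIELD_WIDTH)]


def read_genes(lines: list[str]) -> tuple[list[str], list[str]]:
    """Return (names, sequences), where sequences[k] belongs to names[k]."""
    names = split_fields(lines[0])
    rows = [split_fields(line) for line in lines[1:] if line.strip()]
    # Each row is one position; zip(*rows) turns rows into per-organism columns.
    sequences = ["".join(column) for column in zip(*rows)]
    return names, sequences


# Rebuild the sample input with 9-column right-justified fields.
tokens = ["human fruitfly nematode yeast bacteria", "M M M M M", "E E E E E",
          "C S S S S", "L L L L W", "D D D D D", "A A A A A", "K Q G N G",
          "C A A C K", "T T T T T", "S H E M R"]
lines = ["".join(t.rjust(FIELD_WIDTH) for t in row.split()) for row in tokens]
names, sequences = read_genes(lines)
print(names[3], sequences[3])  # yeast MESLDANCTM
```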
The amino acid sequences of all organisms represented in a given input
will have the same length (in this example, 10).
The minimum length of the amino acid sequences in the input is 10, the
maximum length is 9999.
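The input is guaranteed to respect these limits, so a solution need not check them. For testing a parser, though, a hypothetical `check_input` helper like the one below (a sketch, not part of the problem) can confirm that parsed data matches the stated bounds.

```python
def check_input(names: list[str], sequences: list[str]) -> None:
    """Raise ValueError if the parsed input breaks a limit stated in the problem."""
    if not 3 <= len(names) <= 8:
        raise ValueError("expected 3 to 8 organisms")
    if len(sequences) != len(names):
        raise ValueError("one sequence per organism expected")
    for name in names:
        # Names are 1 to 8 lower-case letters.
        if not (1 <= len(name) <= 8 and name.isascii() and name.isalpha() and name.islower()):
            raise ValueError(f"bad organism name: {name!r}")
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    length = lengths.pop()
    if not 10 <= length <= 9999:
        raise ValueError("sequence length must be between 10 and 9999")
```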
Each amino acid in the amino acid sequence of a particular gene occupies a certain
position. The positions are numbered starting at 1 and they increase sequentially.
Thus, the yeast sequence in the example shown has M in positions 1 and 10, E
in position 2, A in position 6, etc.
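Positions are 1-based, so in Python `enumerate(..., start=1)` yields them directly. A small sketch using a hypothetical `positions_of` helper:

```python
def positions_of(sequence: str, acid: str) -> list[int]:
    """1-based positions at which `acid` occurs in `sequence`."""
    return [pos for pos, a in enumerate(sequence, start=1) if a == acid]


yeast = "MESLDANCTM"
print(positions_of(yeast, "M"))  # [1, 10]
print(positions_of(yeast, "E"))  # [2]
print(positions_of(yeast, "A"))  # [6]
```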
After re-displaying the names of the organisms (in the same order as in the
input), your program will look for discrepancies among the amino acid sequences
of the given organisms.
Output
For those positions in which all organisms have the same amino acid (positions
1, 2, 5, 6, and 9 in the example shown), no output will be produced.
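A position needs output exactly when its column of amino acids contains more than one distinct letter. The sketch below (hypothetical `split_positions`) takes the five sample sequences, one string per organism, and separates agreeing positions from discrepant ones.

```python
def split_positions(sequences: list[str]) -> tuple[list[int], list[int]]:
    """Return (positions where all organisms agree, positions where they differ)."""
    agree: list[int] = []
    differ: list[int] = []
    # zip(*sequences) yields one column (one letter per organism) per position.
    for pos, column in enumerate(zip(*sequences), start=1):
        (agree if len(set(column)) == 1 else differ).append(pos)
    return agree, differ


sample = ["MECLDAKCTS", "MESLDAQATH", "MESLDAGATE", "MESLDANCTM", "MESWDAGKTR"]
print(split_positions(sample))  # ([1, 2, 5, 6, 9], [3, 4, 7, 8, 10])
```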
In those positions in which not all organisms have the same amino acid (positions
3, 4, 7, 8, and 10 in the example shown), your program will:
Print the position number.
Identify by an asterisk those organisms that deviate (in that particular position)
from the most frequently occurring amino acid (in that particular position).
In position 3 of the given example, S is certainly the most frequently occurring
amino acid, and human is the only organism that does not have S in position
3.
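Once the most frequent amino acid of a position is known, the organisms to mark are simply those whose letter differs from it. This sketch uses `collections.Counter` to find the majority for position 3; note that `Counter.most_common` breaks ties by first occurrence, which is not the rule this problem uses, so it is only safe here because S is the unique maximum. The `deviants` helper is a hypothetical name.

```python
from collections import Counter


def deviants(names: list[str], column: str, majority: str) -> list[str]:
    """Names of organisms whose amino acid differs from `majority`."""
    return [name for name, acid in zip(names, column) if acid != majority]


names = ["human", "fruitfly", "nematode", "yeast", "bacteria"]
column = "CSSSS"  # position 3 of the sample, one letter per organism
# Safe only because S is the unique maximum in this column.
majority, _ = Counter(column).most_common(1)[0]
print(majority, deviants(names, column, majority))  # S ['human']
```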
In case of a tie for the most frequently occurring amino acid in a particular
position, the amino acid that has the rightmost occurrence (among those involved
in the tie) will be chosen as the most frequent one.
For instance, in position 8 of the given example, both C and A occur twice. We
choose C as the most frequent amino acid in position 8, because its rightmost
occurrence is under yeast, which is further right than the rightmost occurrence
of A (nematode) in this position.
In the given example, there is an extreme case of a tie in position 10: all
five organisms have different amino acids. Therefore, the amino acid having the rightmost
occurrence, namely R, will be designated as the most frequent one.
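The tie rule amounts to comparing candidates by the pair (count, index of rightmost occurrence). A minimal sketch with a hypothetical `most_frequent` function, where `column` holds one letter per organism in input order:

```python
def most_frequent(column: str) -> str:
    """Most frequent letter; a tie goes to the letter whose rightmost occurrence is furthest right."""
    counts: dict[str, int] = {}
    rightmost: dict[str, int] = {}
    for i, acid in enumerate(column):
        counts[acid] = counts.get(acid, 0) + 1
        rightmost[acid] = i  # overwritten each time, so it ends as the last index
    return max(counts, key=lambda a: (counts[a], rightmost[a]))


print(most_frequent("CAACK"))  # C  (position 8: C and A tie, C is further right)
print(most_frequent("SHEMR"))  # R  (position 10: five-way tie)
print(most_frequent("KQGNG"))  # G  (position 7: G occurs twice)
```

Because distinct letters always have distinct rightmost indices, the key tuples never tie, so `max` returns a unique answer regardless of dictionary order.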
Two lines of the above output are reproduced here with a formatting template:
         1         2         3         4         5         6         7
12345678901234567890123456789012345678901234567890123456789012345678901234567890
    human fruitfly nematode    yeast bacteria
3       *
Sample Input
    human fruitfly nematode    yeast bacteria
        M        M        M        M        M
        E        E        E        E        E
        C        S        S        S        S
        L        L        L        L        W
        D        D        D        D        D
        A        A        A        A        A
        K        Q        G        N        G
        C        A        A        C        K
        T        T        T        T        T
        S        H        E        M        R
Sample Output
Program 7 by team X
    human fruitfly nematode    yeast bacteria
3       *
4                                           *
7       *        *                 *
8                *        *                 *
10      *        *        *        *
End of program 7 by team X
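An end-to-end sketch in Python that reads the whole input from standard input and prints the report. The statement does not spell out the exact column of the position number, so this code assumes that the number starts in column 1 and that each asterisk is right-justified in its organism's 9-character field; the `Program 7 by team X` banner lines are copied from the sample output. The names `solve` and `most_frequent` are illustrative, not prescribed by the problem.

```python
import sys

FIELD = 9  # every name and amino acid sits right-justified in 9 columns


def most_frequent(column: list[str]) -> str:
    """Highest count wins; a tie goes to the acid whose rightmost occurrence is furthest right."""
    counts: dict[str, int] = {}
    rightmost: dict[str, int] = {}
    for i, acid in enumerate(column):
        counts[acid] = counts.get(acid, 0) + 1
        rightmost[acid] = i
    return max(counts, key=lambda a: (counts[a], rightmost[a]))


def solve(text: str) -> str:
    """Return the names line followed by one line per discrepant position."""
    # Fields never contain spaces, so whitespace splitting recovers them.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    names, positions = rows[0], rows[1:]
    out = ["".join(name.rjust(FIELD) for name in names)]
    for pos, column in enumerate(positions, start=1):
        if len(set(column)) == 1:
            continue  # all organisms agree: no output for this position
        majority = most_frequent(column)
        marks = "".join(("*" if acid != majority else "").rjust(FIELD) for acid in column)
        # ASSUMPTION: the position number starts in column 1 and overwrites the
        # blank padding of the first field (at most 4 digits; padding is 8 wide).
        label = str(pos)
        out.append((label + marks[len(label):]).rstrip())
    return "\n".join(out)


if __name__ == "__main__":
    # Banner lines copied from the sample output.
    print("Program 7 by team X")
    print(solve(sys.stdin.read()))
    print("End of program 7 by team X")
```

Running it on the sample input marks positions 3, 4, 7, 8, and 10, with asterisks under the same organisms as in the sample output.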
Source: Rocky Mountain 2000