Statistical Trouble
Time Limit: 1 Second Memory Limit: 32768 KB
Special Judge
Your team was hired by the international corporation ACM (Analytical Calculation
Maxims). Every year ACM creates and conducts various surveys. Surveys themselves
are simple forms with a list of questions and a list of possible answers for
every question. Surveys are distributed around the globe, where field agents
question the target group of people. All the answers are gathered in the ICPC
(International Computation and Processing Center), where teams of well-paid
analysts mine raw data in search of relevant correlations. The raw data for
each individual survey consists of many lines of answers. Each line corresponds
to one questioned person and lists, for every question, the answer that the
person gave in that particular survey.
The first step of analysis that your team was hired to automate is to create
cross tables that correlate answers to interesting pairs of questions. In its
simplest form, given a pair of questions, a cross table has a row for every
possible answer to the first question and a column for every possible answer
to the second question. Each cell of the cross table contains the number of lines
in the raw data that have both corresponding answers at the
same time.
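As an illustration of the cell definition above, here is a minimal Python sketch that counts the lines for every answer pair. The function name cross_counts and its parameters are hypothetical and not part of the problem statement; the sample data is taken from the Sample Input of this problem.

```python
from collections import Counter

def cross_counts(lines, i, j, answers_i, answers_j):
    """Count, for every answer pair, the raw-data lines containing both answers.

    lines     -- raw answer strings, one character per question
    i, j      -- 0-based positions of the two questions (hypothetical parameters)
    answers_i -- answer codes of the first question, in input order
    answers_j -- answer codes of the second question, in input order
    """
    pairs = Counter((line[i], line[j]) for line in lines)
    # One row per answer of the first question, one column per answer of the second.
    return [[pairs[(a, b)] for b in answers_j] for a in answers_i]

# Raw data from the sample; Q01 is at position 0 and Q02 at position 1.
sample_lines = [".@.", "HH@", ".@.", "YFY", "HQ*", "H@.", "YYY", ".H@", "HFY", "HH@"]
sample_counts = cross_counts(sample_lines, 0, 1, "HY*.@", "HYFQ@")
```

A Counter returns 0 for pairs that never occur, so answers that nobody gave still produce a full row or column of zeros.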
However, your task is complicated by the fact that you are to compute and output
not only the simple cross table values, but also total values for every row and
column of the cross table (the sum of the values in the corresponding row
or column), which are placed in an additional last column and last row, as well
as percentage distributions for every row and column. The percentage distribution
for a row is an additional number in every cell of that row that shows the percent
ratio of the value in that cell to the total value for that row, unless the
total value is zero (in that case the percentage distribution for this row is not
defined). The same applies to the percentage distributions of columns. Thus,
the cross table in your output will have at most three values in every cell
(the value itself, the row-wise percent, and the column-wise percent). Please note
that percentage distributions also apply to totals. For example, in the total
column the row-wise percent of every row is always 100%, unless the total
value for the row is zero (in that case row-wise percents are not defined),
and the column-wise percent shows the percent ratio of the total value for this row
to the total number of lines in the raw data (which is the value
found in the last column of the last row).
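The totals described above can be sketched as follows. This is only an illustration; the function name with_totals is hypothetical, and the input is assumed to be a grid of plain cross table values with one list per row.

```python
def with_totals(counts):
    """Append a total column to every row, then a total row at the bottom."""
    rows = [row + [sum(row)] for row in counts]
    # Summing the extended rows column by column also yields the grand total
    # (the number of raw-data lines) in the last column of the last row.
    rows.append([sum(col) for col in zip(*rows)])
    return rows

grid = with_totals([[2, 0, 1], [0, 0, 0], [1, 3, 0]])
```

A row or column whose total is 0 is exactly the case in which the corresponding percentage distribution is not defined.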
Percents are rounded to integers on output. A percent that has a non-zero fractional
part is rounded either to the smallest integer greater than the exact
percent or to the largest integer smaller than the exact percent, in
such a way that the sum of the row-wise percents in every row (without
row totals) and the sum of the column-wise percents in every column (without column totals) are equal
to 100%, unless they are undefined. There are various rounding algorithms that
produce results satisfying the above constraints. You are free to use any rounding
algorithm as long as the above constraints are satisfied.
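One rounding scheme that appears to satisfy these constraints is the largest remainder method: round every percent down, then add 1 to the cells with the largest fractional parts until the sum reaches 100. The sketch below is an assumption about a workable approach, not the method used by the judges; the function name round_percents is hypothetical.

```python
def round_percents(values):
    """Integer percents of each value relative to sum(values).

    Returns None when the sum is zero (distribution not defined).
    Uses integer arithmetic only, so there are no floating point issues.
    """
    total = sum(values)
    if total == 0:
        return None
    percents = [100 * v // total for v in values]     # rounded down
    remainders = [100 * v % total for v in values]    # scaled fractional parts
    deficit = 100 - sum(percents)
    # Indices by descending remainder; the stable sort keeps input order on ties.
    order = sorted(range(len(values)), key=lambda k: -remainders[k])
    for k in order[:deficit]:
        percents[k] += 1                              # round these cells up
    return percents
```

The deficit is always smaller than the number of cells with a non-zero remainder, so a percent that is already an integer is never changed. Note that the result may differ from the Sample Output (for example 67% and 33% where the sample shows 66% and 34%); since any rounding that satisfies the constraints is accepted, both are valid.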
Input
The input file consists of 3 sections: survey description, survey results, and
cross table descriptions.
The first line of the input file contains the name of the survey, which is at
most 100 characters long. Subsequent lines describe all the questions in the
survey. On the first line of every question there is a 3-character question
code (capital letters and digits only) followed by a space, and followed by
the question name, which is at most 80 characters long. Each subsequent line
for a question describes one possible answer on the question and starts with
a space, followed by a single-character code for the answer (capital letter,
digit, or character '.', '*', or '@'), followed by a space and followed by an
answer description, which is at most 40 characters long. The list of questions
is terminated by the line with a single character '#'. All answer codes are
unique within the question, and all question codes are unique within the input
file. There are at least 2 and at most 10 possible answers per question and
at least 2 and at most 100 questions.
Next lines in the input file describe survey results. Every line contains a
character per question (in the order they appear in the input file) that gives
the answer code for the corresponding question. The characters follow one another
without any delimiters. This section is terminated by the line with a single
character '#'. There is at least one line with answers in the section and at
most 10000 answers in total (the number of lines times the number of questions).
Next lines in the input file describe cross tables that are to be created. Each
cross table description occupies one line. That line contains the code of the
first question, followed by a space, followed by the code of the second
question (which differs from the first), followed by a space, and followed by the cross table name,
which is at most 100 characters long. This section is terminated by the line
with a single character '#'. There are at most 100 cross table descriptions
in the input file.
The input file has no trailing spaces on any line. No name starts or
ends with a space, but names may contain spaces.
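The three sections of one input block could be parsed along the following lines. This is a sketch under the assumption that the block is already available as a list of lines without newline characters; the function name parse_block and the dictionary keys are hypothetical.

```python
def parse_block(lines):
    """Parse one input block into (survey name, questions, results, tables)."""
    it = iter(lines)
    survey_name = next(it)

    questions = []
    for line in it:
        if line == "#":
            break
        if line.startswith(" "):
            # Answer line: space, one-character code, space, description.
            questions[-1]["answers"].append(line[1])
        else:
            # Question line: 3-character code, space, question name.
            questions.append({"code": line[:3], "answers": [], "raw": []})
        # Keep the original lines, because the output echoes them verbatim.
        questions[-1]["raw"].append(line)

    results = []
    for line in it:            # continues right after the first "#"
        if line == "#":
            break
        results.append(line)

    tables = []
    for line in it:            # continues right after the second "#"
        if line == "#":
            break
        # Codes are exactly 3 characters, so fixed positions are sufficient.
        tables.append((line[:3], line[4:7], line[8:]))

    return survey_name, questions, results, tables
```

Slicing by fixed positions rather than splitting on spaces keeps names that contain spaces intact.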
This problem contains multiple test cases!
The first line of a multiple input is an integer N, then a blank line followed by N input blocks. Each input block is in the format indicated in the problem description. There is a blank line between input blocks.
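A possible way to separate the N input blocks is sketched below. It assumes that every block ends at its third line consisting of a single '#' and that a survey name is neither empty nor equal to '#'; these are assumptions, not guarantees from the statement. The function name split_blocks is hypothetical.

```python
def split_blocks(text):
    """Split the whole input text into a list of blocks (lists of lines)."""
    lines = text.splitlines()
    n = int(lines[0])
    blocks, pos = [], 1
    for _ in range(n):
        while lines[pos] == "":      # skip the blank separator line(s)
            pos += 1
        start, hashes = pos, 0
        while hashes < 3:            # a block ends at its third "#" line
            if lines[pos] == "#":
                hashes += 1
            pos += 1
        blocks.append(lines[start:pos])
    return blocks
```

The outputs of the individual blocks can then be joined with one blank line between them.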
Output
Write to the output file a cross table for every cross table description in
the input file in the order they appear in the input file. On the first line
of the cross table write the survey name, followed by a space, followed by a
'-' (dash) character, followed by a space, followed by the cross table name.
Then write the description of the first question, and the description of the
second question exactly as they appear in the input file and in the same format.
Then write an empty line, followed by the table itself.
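The lines that precede each table can be assembled as in this small sketch. The function name table_preamble is hypothetical; first_raw and second_raw are assumed to hold the question lines exactly as they were read from the input.

```python
def table_preamble(survey_name, table_name, first_raw, second_raw):
    """Title line, both question descriptions verbatim, then an empty line."""
    return [f"{survey_name} - {table_name}", *first_raw, *second_raw, ""]
```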
The table contains exactly 1+3*(N1+1) lines and exactly 6*(N2+2) characters
on every line, where N1 is the number of possible answers for the first question,
and N2 is the number of possible answers for the second question. The table
has one line for column headings, and N1+1 rows (3 lines per row). The first
N1 of these rows correspond to the answers on the first question in the order
they appear in the input file, and the last row is for column totals. The table
also has N2+2 columns, where each column is 6 characters wide. The first column
is for row headings; the subsequent N2 columns correspond to the answers on
the second question in the order they appear in the input file, and the last
column is for row totals. All information in the cells (including headings)
is aligned to the right and is padded on the left with spaces to become 6 characters
wide.
The heading for the first column is empty. The headings for the subsequent N2
columns are composed from the second question code, followed by a ':' (colon)
character, and followed by the corresponding answer code. The heading for the
last column is the string "TOTAL" (without quotes). The headings for
the first N1 3-line rows of the cross table are composed from the first question
code, followed by a ':' (colon) character, and followed by the corresponding
answer code. The heading for the last row is the string "TOTAL" (without
quotes). Row headings are situated on the first line of the corresponding row.
The subsequent 2 lines in the heading column of every row must be blank.
All non-heading cells in the table contain computed values and percents. The
first line of every cell contains the corresponding cross table integer value.
The second line contains the row-wise percent, properly rounded to an integer,
with a mandatory trailing '%' (percent) character, or a single '-' (dash) character
if the corresponding row-wise percent is not defined. The third line contains
the column-wise percent in the same format. All cross tables in the output file
must be separated by a single empty line.
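The table layout can be sketched as follows, assuming the values and the already rounded percents are given as grids of N1+1 rows and N2+1 columns (totals included), with None marking a percent that is not defined. The function name render_table and its parameters are hypothetical.

```python
def render_table(code1, answers1, code2, answers2, values, row_pct, col_pct):
    """Return the 1+3*(N1+1) table lines, each 6*(N2+2) characters long."""
    def cell(text):
        # Right-align and pad on the left with spaces to 6 characters.
        return str(text).rjust(6)

    def pct(p):
        return cell("-" if p is None else f"{p}%")

    # Column headings: empty first column, one per answer, then TOTAL.
    out = [cell("") + "".join(cell(f"{code2}:{a}") for a in answers2) + cell("TOTAL")]
    headings = [f"{code1}:{a}" for a in answers1] + ["TOTAL"]
    for r, heading in enumerate(headings):
        out.append(cell(heading) + "".join(cell(v) for v in values[r]))
        out.append(cell("") + "".join(pct(p) for p in row_pct[r]))  # row-wise
        out.append(cell("") + "".join(pct(p) for p in col_pct[r]))  # column-wise
    return out
```

Since a heading such as Q01:H is 5 characters long and a value is at most 5 digits, every cell fits into its 6-character column.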
The output format consists of N output blocks. There is a blank line between output blocks.
Sample Input
1

New Year Phone Survey for ACM ICPC
Q01 Hello!
 H Hello!
 Y Yes!
 * Uhm...
 . (silence)
 @ (other)
Q02 How are you?
 H Hello!
 Y Yes!
 F Fine!
 Q Who are you?
 @ (other)
BYE Happy New Year!
 Y You too.
 * (censored)
 @ (other)
 . (hang up)
#
.@.
HH@
.@.
YFY
HQ*
H@.
YYY
.H@
HFY
HH@
#
Q01 Q02 Health vs greeting style
Q02 BYE Politeness matrix
#
Sample Output
New Year Phone Survey for ACM ICPC - Health vs greeting style
Q01 Hello!
 H Hello!
 Y Yes!
 * Uhm...
 . (silence)
 @ (other)
Q02 How are you?
 H Hello!
 Y Yes!
 F Fine!
 Q Who are you?
 @ (other)

       Q02:H Q02:Y Q02:F Q02:Q Q02:@ TOTAL
 Q01:H     2     0     1     1     1     5
         40%    0%   20%   20%   20%  100%
         66%    0%   50%  100%   33%   50%
 Q01:Y     0     1     1     0     0     2
          0%   50%   50%    0%    0%  100%
          0%  100%   50%    0%    0%   20%
 Q01:*     0     0     0     0     0     0
           -     -     -     -     -     -
          0%    0%    0%    0%    0%    0%
 Q01:.     1     0     0     0     2     3
         33%    0%    0%    0%   67%  100%
         34%    0%    0%    0%   67%   30%
 Q01:@     0     0     0     0     0     0
           -     -     -     -     -     -
          0%    0%    0%    0%    0%    0%
 TOTAL     3     1     2     1     3    10
         30%   10%   20%   10%   30%  100%
        100%  100%  100%  100%  100%  100%

New Year Phone Survey for ACM ICPC - Politeness matrix
Q02 How are you?
 H Hello!
 Y Yes!
 F Fine!
 Q Who are you?
 @ (other)
BYE Happy New Year!
 Y You too.
 * (censored)
 @ (other)
 . (hang up)

       BYE:Y BYE:* BYE:@ BYE:. TOTAL
 Q02:H     0     0     3     0     3
          0%    0%  100%    0%  100%
          0%    0%  100%    0%   30%
 Q02:Y     1     0     0     0     1
        100%    0%    0%    0%  100%
         33%    0%    0%    0%   10%
 Q02:F     2     0     0     0     2
        100%    0%    0%    0%  100%
         67%    0%    0%    0%   20%
 Q02:Q     0     1     0     0     1
          0%  100%    0%    0%  100%
          0%  100%    0%    0%   10%
 Q02:@     0     0     0     3     3
          0%    0%    0%  100%  100%
          0%    0%    0%  100%   30%
 TOTAL     3     1     3     3    10
         30%   10%   30%   30%  100%
        100%  100%  100%  100%  100%
Source: Northeastern Europe 2001