CS50x Readability
From problem set 2
Readability applies the Coleman-Liau index formula to a text. The Coleman-Liau index is designed to output the (U.S.) grade level needed to understand a text. The formula uses the counts of letters, words and sentences to compute that index.
index = 0.0588 * L - 0.296 * S - 15.8
Here, L is the average number of letters per 100 words in the text, and S is the average number of sentences per 100 words in the text.
Here’s an example output:
$ ./readability
Text: Harry Potter was a highly unusual boy in many ways. For one thing, he hated the summer holidays more than any other time of year. For another, he really wanted to do his homework, but was forced to do it in secret, in the dead of the night. And he also happened to be a wizard.
Grade 5
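To see the formula at work, count the sample text above using the rules listed below (letters are alphabetic characters, words are separated by spaces, and sentences end in “.”, “!” or “?”). By my count it has 214 letters, 56 words and 4 sentences, which gives:

```latex
L = \frac{214}{56} \times 100 \approx 382.14 \qquad S = \frac{4}{56} \times 100 \approx 7.14

\text{index} = 0.0588 \times 382.14 - 0.296 \times 7.14 - 15.8 \approx 4.56 \;\xrightarrow{\text{round}}\; 5
```

Rounded to the nearest integer, that is the “Grade 5” shown in the output.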
We can see that our code will have to take an input, calculate and store those values, plug them into the formula and print its result.
There are also some specifications given by the course for this problem:
Letters can be any uppercase or lowercase alphabetic characters, but shouldn’t include any punctuation, digits, or other symbols.
For the purpose of this problem, we’ll consider any sequence of characters separated by a space to be a word (so a hyphenated word like “sister-in-law” should be considered one word, not three). You may assume that a sentence will not start or end with a space, and you may assume that a sentence will not have multiple spaces in a row.
You should consider any sequence of characters that ends with a “.” or a “!” or a “?” to be a sentence. In practice, sentence boundary detection needs to be a little more intelligent to handle cases such as abbreviations (“Mr.” ends with a period but doesn’t end a sentence), but we’ll not worry about that for now.
If the resulting index number is 16 or higher (equivalent to or greater than a senior undergraduate reading level), your program should output “Grade 16+” instead of giving the exact index number. If the index number is less than 1, your program should output “Before Grade 1”.
Since this is week two, I’ll skip explaining how to create a directory and file and start with the pseudocode.
// Ask user for text input
// Calculate and store the number of letters
// Calculate and store the number of words
// Calculate and store the number of sentences
// Apply the formula
// Print grade as specified
With that in mind, let’s start coding! The formula returns a floating-point number, but the grade we print is an integer, so we’ll need math.h (for rounding) together with the usual headers, and ctype.h will help with counting letters. Can you guess which functions we might create by reading the pseudocode? Here’s my code, ready to do nothing:
#include <stdio.h>
#include <cs50.h>
#include <math.h>
#include <ctype.h>

// Prototype functions
int countLetters(string text);
int countWords(string text);
int countSentences(string text);
int colemanIndex(int l, int w, int s);
void printGrade(int index);

int main(void)
{
    string text = get_string("Text: ");
    int l = countLetters(text);
    int w = countWords(text);
    int s = countSentences(text);
    int index = colemanIndex(l, w, s);
    printGrade(index);
}

int countLetters(string text)
{
    return 0;
}

int countWords(string text)
{
    return 0;
}

int countSentences(string text)
{
    return 0;
}

int colemanIndex(int l, int w, int s)
{
    return 0;
}

void printGrade(int index)
{
}
Later in the course we get starter code exactly like this, with a //TODO comment in each function. For now, it’s enough that it mirrors the pseudocode. Next step: building the functions!
Starting with letters: every text we get from get_string is just a biiiiiiiiiig sequence of chars that always ends with '\0'. That means we can walk through it, increasing a counter by 1 for each letter, until we reach '\0'. What about spaces and other non-alphabetical chars? That’s why we included ctype.h, with its isalpha function.
int countLetters(string text)
{
    int count = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        // isalpha returns nonzero for letters; the cast keeps non-ASCII chars safe
        if (isalpha((unsigned char) text[i]))
        {
            count++;
        }
    }
    printf("%d Letters\n", count);
    return count;
}
New syntax: text[i] is the char at index i of the string (text itself points to the first char); i starts at 0 and the loop stops when text[i] is '\0'. The function isalpha returns a nonzero value when the char is a letter and 0 otherwise, so our if only counts letters. If we tested for 0 instead, it would count everything that is not a letter. We’re printing the count just for testing and will remove it before submitting. Here’s a test I ran:
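Here is a minimal standalone sketch of the same letter counter, assuming plain `const char *` in place of CS50’s `string` (a typedef for `char *`) so it compiles without cs50.h; the name `count_letters` is hypothetical. It casts each char to `unsigned char` before calling `isalpha`, because the C standard only defines `isalpha` for values representable as `unsigned char` (or `EOF`), and it tests the result for nonzero rather than `> 0`.

```c
#include <ctype.h>

// Count alphabetic characters in a '\0'-terminated string.
// text[i] is the char at index i; the loop stops at the terminating '\0'.
int count_letters(const char *text)
{
    int count = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        // isalpha returns nonzero (not necessarily 1) for letters.
        // The cast avoids undefined behavior for negative char values,
        // such as the bytes of non-ASCII UTF-8 text.
        if (isalpha((unsigned char) text[i]))
        {
            count++;
        }
    }
    return count;
}
```

With the default “C” locale, only A-Z and a-z count, so hyphens, digits and punctuation are skipped: `count_letters("sister-in-law, 42!")` counts 11 letters.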
~/pset2/readability/ $ ./readability
Text: abcd efgh!
8 Letters
Seems to work fine.
The next function counts words. That should be a little trickier… ok, no, it isn’t. The specification says words are separated by a single space, and a sentence never starts or ends with a space, so we’re really just counting spaces. Since the last word isn’t followed by a space, our count starts at 1.
int countWords(string text)
{
    int count = 1;
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (text[i] == ' ')
        {
            count++;
        }
    }
    printf("%d Words\n", count);
    return count;
}
Nothing really different to note. Instead of calling a function inside the if, we compare the char at position i to a space.
FINALLY, SENTENCES! That’s got to be a pain… sorry, not yet. They really made it easy for us: by specification, every sentence ends in “.”, “!” or “?”. We just change the contents of our if again, using || to mean “or”.
int countSentences(string text)
{
    int count = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (text[i] == '.' || text[i] == '!' || text[i] == '?')
        {
            count++;
        }
    }
    printf("%d Sentences\n", count);
    return count;
}
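Counting spaces only works because of the spec’s promises (no leading, trailing or repeated spaces, and a non-empty text). If you ever want to drop those assumptions, one alternative, which the problem does not require, is to count the places where a word starts: a non-space char at the very beginning or right after a space. The function name `count_words_robust` is hypothetical, and like the original it treats only the space character as a separator.

```c
// Count words as runs of non-space characters separated by ' '.
// Unlike "spaces + 1", this tolerates leading, trailing or repeated
// spaces, and returns 0 for an empty string.
int count_words_robust(const char *text)
{
    int count = 0;
    int in_word = 0;  // 1 while we are inside a word, 0 otherwise
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (text[i] == ' ')
        {
            in_word = 0;
        }
        else if (!in_word)
        {
            // First character of a new word
            in_word = 1;
            count++;
        }
    }
    return count;
}
```

A hyphenated word like “sister-in-law” still counts as one word, as the spec requires.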
Now let’s test all our counters and hope everything is OK:
Text: abcd efgh! ijkl mnop. qrst uvxw?
24 Letters
6 Words
3 Sentences
Pretty good. All functions learned basic math and can count. Seems perfect… But, hey! Notice how all the for loops are copies of each other? We’re iterating 3 times over the whole text when we could do it only once! It’s linear time either way, but imagine reading an 800-page book once instead of three times. Let’s merge all our counter functions into one.
#include <stdio.h>
#include <cs50.h>
#include <math.h>
#include <ctype.h>

// Prototype functions
void counters(string text);
void printGrade(int);
int colemanIndex(int, int, int);

// Global variables, shared by all functions
int l, w, s, index;

int main(void)
{
    string text = get_string("Text: ");
    counters(text);

    // Temporary test output, removed before submitting
    printf("%d Letters\n", l);
    printf("%d Words\n", w);
    printf("%d Sentences\n", s);

    colemanIndex(l, w, s);
    printGrade(index);
}

void counters(string text)
{
    int letters = 0;
    int words = 1;
    int sentences = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (isalpha((unsigned char) text[i]))
        {
            letters++;
        }
        if (text[i] == ' ')
        {
            words++;
        }
        if (text[i] == '.' || text[i] == '!' || text[i] == '?')
        {
            sentences++;
        }
    }
    // Store the results in the globals
    l = letters;
    w = words;
    s = sentences;
}

int colemanIndex(int a, int b, int c)
{
    return 0;
}

void printGrade(int grade)
{
}
I’m printing the whole code again so we can see all the changes that had to be made. Of course, the big one is that we now have only one counter function. Second, l, w and s are now global variables, declared before main, so every function can use them. For that to work we had to make some changes to the colemanIndex function. If you look at its prototype, you’ll notice it only states that it takes 3 ints as parameters (not naming the parameters in the prototype is actually a good idea that I don’t apply everywhere, yet). We also had to rename the parameters in the function definition: parameters named l, w and s would shadow the globals, and CS50’s compiler settings treat that as an error, so the code wouldn’t compile. And you know what?! I liked it and moved index up there too, which is why printGrade’s parameter is now called grade.
The 3 printf’s are there so we can test and see that the output is the same! Oh, and they’re in main; don’t worry, they’re going away soon.
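Globals work here, but another common way to get three results out of a single pass is to return a small struct. This is an alternative design sketch, not what I submitted; the names `text_counts` and `count_text` are hypothetical, and `const char *` stands in for CS50’s `string`.

```c
#include <ctype.h>

// All three counts from a single pass, returned together.
typedef struct
{
    int letters;
    int words;
    int sentences;
} text_counts;

text_counts count_text(const char *text)
{
    // Words start at 1: the spec guarantees single spaces between words
    // and no leading or trailing spaces, so words = spaces + 1.
    text_counts c = {0, 1, 0};
    for (int i = 0; text[i] != '\0'; i++)
    {
        char ch = text[i];
        if (isalpha((unsigned char) ch))
        {
            c.letters++;
        }
        else if (ch == ' ')
        {
            c.words++;
        }
        else if (ch == '.' || ch == '!' || ch == '?')
        {
            c.sentences++;
        }
    }
    return c;
}
```

The three categories never overlap, so `else if` is safe. In main, `text_counts c = count_text(text);` followed by `c.letters`, `c.words` and `c.sentences` would replace the globals l, w and s.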
Now for colemanIndex. We just need to plug the variables into the formula. It’s easier to take a look before explaining.
return index = round(0.0588 * (a / ((float) b / 100))
                     - 0.296 * (c / ((float) b / 100))
                     - 15.8);
It’s actually a single line in my code, but I figured it’s easier to read here formatted like this.
All the numbers come from the formula, and a, b, c replace l, w, s respectively. (We could use the global variables directly, but I thought that would make the function block harder to understand.) We use (float) to cast an integer into a float, which tells the compiler to treat that integer as a float. That matters because b / 100 with two ints is integer division: for a text under 100 words it gives 0, and integer division by that 0 is undefined behavior that typically crashes the program. Writing (float) b / 100 makes sure everything inside the parentheses ends up as a float. Last, but not least, we round the formula result to the nearest integer. YEAH. :)
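To make the cast concrete, here is a small sketch of the unrounded formula with the division split into named steps; the function name `coleman_liau_unrounded` is hypothetical, and it leaves out `round` so it needs nothing from math.h.

```c
// Unrounded Coleman-Liau index from raw counts.
// (float) words / 100 is floating-point division: 56 words -> 0.56.
// Without the cast, words / 100 is integer division, which is 0 for any
// text under 100 words, and dividing an int by that 0 is undefined behavior.
float coleman_liau_unrounded(int letters, int words, int sentences)
{
    float hundreds = (float) words / 100;  // how many "100 words" the text has
    float L = letters / hundreds;          // letters per 100 words
    float S = sentences / hundreds;        // sentences per 100 words
    return 0.0588f * L - 0.296f * S - 15.8f;
}
```

For the Harry Potter sample (214 letters, 56 words, 4 sentences) this returns about 4.556, and passing that to `round` from math.h, as colemanIndex does, gives 5.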
Last function, printGrade:
void printGrade(int grade)
{
    if (grade < 1)
    {
        printf("Before Grade 1\n");
    }
    else if (grade >= 16)
    {
        printf("Grade 16+\n");
    }
    else
    {
        printf("Grade %i\n", grade);
    }
}
Really simple, just following the specification. No numbers over 15, just “Grade 16+”. No numbers under 1, just “Before Grade 1”. Any other grade is printed as is.
This was a nice program to code. I hope you felt the same. Staff tests and full code with comments below. See ya!
Results generated by style50 v2.7.4
Looks good!
Results for cs50/problems/2020/x/readability generated by check50 v3.1.2
:) readability.c exists
:) readability.c compiles
:) handles single sentence with multiple words
:) handles punctuation within a single sentence
:) handles more complex single sentence
:) handles multiple sentences
:) handles multiple more complex sentences
:) handles longer passages
:) handles questions in passage
:) handles reading level before Grade 1
:) handles reading level at Grade 16+