you will create a spell checker. The program will take three command line arguments: number of words in the dictionary, a dictionary file name, and a text file name. The program will first create a hash table. The number buckets of the hash table should be about twice the number of words in the dictionary. Then, it will read the dictionary from the file, insert the words into the hash table, and report collision statistics. After reading the dictionary, the spelling checker will read a list of words from a text file. Each word will be looked up in the dictionary. If it is incorrect, it will be written to the standard output together with a list of suggested corrections. The algorithm for generating corrections is given below. Hash Table The hash table programs, QuadraticProbing.h and QuadraticProbing.cpp, are posted on Canvas. The programs use quadratic probing to deal with collisions. You should carefully study these programs and make some changes to collect the required statistics. The following is the sample statement to create a dictionary hash table: HashTable & dictionary = HashTable (2 * numWords); Or HashTable dictionary(2 * numWords); Hash Table Statistics To help you understand how the hashing table works, you should track and report the following statistics:  The number of words in the hash table.  The hash table size (the array size)  The load factor in the table.  The number of insertions that encountered a collision.  The average length of all collision chains (Note, when there is no collision, the collision chain length is 1).  The length of the longest collision chain. We only need to collect these statistics during the insertions. Note that the last three statistics will need to be reinitialized whenever you expand the table. After you read the dictionary, you should report the above statistics to cerr. Spell Checker Once the dictionary has been created, your program reads a list of words from the text file. If a word is found in the dictionary, your program should produce no output. Otherwise, you should generate suggested corrections and write them, together with the original word and the line number on which it occurs, as a single output line. For example, suppose the input word was "wrod". The output might be: wrod(66): brod prod trod whod wood wrox rod wod wro word Generating Corrections The easiest way to generate corrections in a spell checker is a trial-and-error method. If we assume that the misspelled word contains only a single error, we can try all possible corrections and look each up in the dictionary. Traditionally, spelling checkers have looked for four possible near misses: a wrong letter ("wird"), an inserted letter ("woprd"), a deleted letter ("wrd"), or a pair of adjacent transposed letters ("wrod"). When a word isn't found in the dictionary, you will need to look up all near misses. Whenever you find a match in the dictionary, you should add it to your output line. You will have to generate every possible near miss and look for it in the dictionary. Specifically:  Construct every string that can be made by deleting one letter from the word. (n possibilities, where n is the length of the word)  Construct every string that can be made by inserting any letter of the alphabet at any position in the string. (26*(n+1) possibilities)  Construct every string that can be made by swapping two neighboring characters in the string. (n-1 possibilities)  Construct every string that can be made by replacing each letter in the word with some letter of the alphabet. (26*n possibilities (including the original word n times, which is probably easier than avoiding it)) Upper-case letters Note that both the words in the dictionary and the text file can contain upper case letters. For a given word that contains upper case letters, your program should first look it up in the dictionary. If not found, the program will convert all its letters to lower cases, and then look it up again in the dictionary. If the lower case word is found, the program should consider it as a correctly spelled word. Input Format The dictionary file is a list of words with one word per line. The file to be spell-checked is normal text, which contains punctuation, sentences, paragraphs and so on. After read a line of the text file, you should first remove all the punctuations and then split it into a list of words. To remove the punctuations and other control characters, you can trim the input line as follows: for(int i=0; i

you will create a spell checker. The program will take three command line arguments: number of words in the dictionary, a dictionary file name, and a text file name. The program will first create a hash table. The number buckets of the hash table should be about twice the number of words in the dictionary. Then, it will read the dictionary from the file, insert the words into the hash table, and report collision statistics. After reading the dictionary, the spelling checker will read a list of words from a text file. Each word will be looked up in the dictionary. If it is incorrect, it will be written to the standard output together with a list of suggested corrections. The algorithm for generating corrections is given below. Hash Table The hash table programs, QuadraticProbing.h and QuadraticProbing.cpp, are posted on Canvas. The programs use quadratic probing to deal with collisions. You should carefully study these programs and make some changes to collect the required statistics. The following is the sample statement to create a dictionary hash table: HashTable & dictionary = HashTable (2 * numWords); Or HashTable dictionary(2 * numWords); Hash Table Statistics To help you understand how the hashing table works, you should track and report the following statistics:  The number of words in the hash table.  The hash table size (the array size)  The load factor in the table.  The number of insertions that encountered a collision.  The average length of all collision chains (Note, when there is no collision, the collision chain length is 1).  The length of the longest collision chain. We only need to collect these statistics during the insertions. Note that the last three statistics will need to be reinitialized whenever you expand the table. After you read the dictionary, you should report the above statistics to cerr. Spell Checker Once the dictionary has been created, your program reads a list of words from the text file. If a word is found in the dictionary, your program should produce no output. Otherwise, you should generate suggested corrections and write them, together with the original word and the line number on which it occurs, as a single output line. For example, suppose the input word was "wrod". The output might be: wrod(66): brod prod trod whod wood wrox rod wod wro word Generating Corrections The easiest way to generate corrections in a spell checker is a trial-and-error method. If we assume that the misspelled word contains only a single error, we can try all possible corrections and look each up in the dictionary. Traditionally, spelling checkers have looked for four possible near misses: a wrong letter ("wird"), an inserted letter ("woprd"), a deleted letter ("wrd"), or a pair of adjacent transposed letters ("wrod"). When a word isn't found in the dictionary, you will need to look up all near misses. Whenever you find a match in the dictionary, you should add it to your output line. You will have to generate every possible near miss and look for it in the dictionary. Specifically:  Construct every string that can be made by deleting one letter from the word. (n possibilities, where n is the length of the word)  Construct every string that can be made by inserting any letter of the alphabet at any position in the string. (26(n+1) possibilities)  Construct every string that can be made by swapping two neighboring characters in the string. (n-1 possibilities)  Construct every string that can be made by replacing each letter in the word with some letter of the alphabet. (26n possibilities (including the original word n times, which is probably easier than avoiding it)) Upper-case letters Note that both the words in the dictionary and the text file can contain upper case letters. For a given word that contains upper case letters, your program should first look it up in the dictionary. If not found, the program will convert all its letters to lower cases, and then look it up again in the dictionary. If the lower case word is found, the program should consider it as a correctly spelled word. Input Format The dictionary file is a list of words with one word per line. The file to be spell-checked is normal text, which contains punctuation, sentences, paragraphs and so on. After read a line of the text file, you should first remove all the punctuations and then split it into a list of words. To remove the punctuations and other control characters, you can trim the input line as follows: for(int i=0; i

Database System Concepts

7th Edition

ISBN:9780078022159

Author:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan

Publisher:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan

Chapter1: Introduction

Section: Chapter Questions

Problem 1PE

See similar textbooks

Related questions

Question

We only need to collect these statistics during the insertions. Note that the last three statistics will

need to be reinitialized whenever you expand the table. After you read the dictionary, you should
report the above statistics to cerr.

Spell Checker

Once the dictionary has been created, your program reads a list of words from the text file. If a
word is found in the dictionary, your program should produce no output. Otherwise, you should
generate suggested corrections and write them, together with the original word and the line
number on which it occurs, as a single output line. For example, suppose the input word was
"wrod". The output might be:
wrod(66): brod prod trod whod wood wrox rod wod wro word
Generating Corrections
The easiest way to generate corrections in a spell checker is a trial-and-error method. If we
assume that the misspelled word contains only a single error, we can try all possible corrections
and look each up in the dictionary.
Traditionally, spelling checkers have looked for four possible near misses: a wrong letter
("wird"), an inserted letter ("woprd"), a deleted letter ("wrd"), or a pair of adjacent transposed
letters ("wrod"). When a word isn't found in the dictionary, you will need to look up all near
misses. Whenever you find a match in the dictionary, you should add it to your output line. You
will have to generate every possible near miss and look for it in the dictionary. Specifically:
 Construct every string that can be made by deleting one letter from the word. (n possibilities,
where n is the length of the word)
 Construct every string that can be made by inserting any letter of the alphabet at any position
in the string. (26*(n+1) possibilities)
 Construct every string that can be made by swapping two neighboring characters in the string.
(n-1 possibilities)
 Construct every string that can be made by replacing each letter in the word with some letter
of the alphabet. (26*n possibilities (including the original word n times, which is probably
easier than avoiding it))
Upper-case letters
Note that both the words in the dictionary and the text file can contain upper case letters. For a
given word that contains upper case letters, your program should first look it up in the dictionary.
If not found, the program will convert all its letters to lower cases, and then look it up again in the
dictionary. If the lower case word is found, the program should consider it as a correctly spelled
word.
Input Format
The dictionary file is a list of words with one word per line. The file to be spell-checked is
normal text, which contains punctuation, sentences, paragraphs and so on.
After read a line of the text file, you should first remove all the punctuations and then split it into
a list of words. To remove the punctuations and other control characters, you can trim the input
line as follows:
for(int i=0; i <lineStr.length(); i++)
{
if (!isalpha(lineStr[i]))

lineStr[i] = ' ';

}
Sample Dictionaries
A small dictionary, simple_dict.txt, of 341 words should help you to get most of your bugs out.

Expert Solution

Trending now

This is a popular solution!

Step by step

Solved in 2 steps

SEE SOLUTION Check out a sample Q&A here

Knowledge Booster

Learn more about

Need a deep-dive on the concept behind this application? Look no further. Learn more about this topic, computer-science and related others by exploring similar questions and additional content below.