you will create a spell checker. The program will take three command line arguments: number of words in the dictionary, a dictionary file name, and a text file name. The program will first create a hash table. The number buckets of the hash table should be about twice the number of words in the dictionary. Then, it will read the dictionary from the file, insert the words into the hash table, and report collision statistics. After reading the dictionary, the spelling checker will read a list of words from a text file. Each word will be looked up in the dictionary. If it is incorrect, it will be written to the standard output together with a list of suggested corrections. The algorithm for generating corrections is given below. Hash Table The hash table programs, QuadraticProbing.h and QuadraticProbing.cpp, are posted on Canvas. The programs use quadratic probing to deal with collisions. You should carefully study these programs and make some changes to collect the required statistics. The following is the sample statement to create a dictionary hash table: HashTable & dictionary = HashTable (2 * numWords); Or HashTable dictionary(2 * numWords); Hash Table Statistics To help you understand how the hashing table works, you should track and report the following statistics:  The number of words in the hash table.  The hash table size (the array size)  The load factor in the table.  The number of insertions that encountered a collision.  The average length of all collision chains (Note, when there is no collision, the collision chain length is 1).  The length of the longest collision chain. We only need to collect these statistics during the insertions. Note that the last three statistics will need to be reinitialized whenever you expand the table. After you read the dictionary, you should report the above statistics to cerr. Spell Checker Once the dictionary has been created, your program reads a list of words from the text file. If a word is found in the dictionary, your program should produce no output. Otherwise, you should generate suggested corrections and write them, together with the original word and the line number on which it occurs, as a single output line. For example, suppose the input word was "wrod". The output might be: wrod(66): brod prod trod whod wood wrox rod wod wro word Generating Corrections The easiest way to generate corrections in a spell checker is a trial-and-error method. If we assume that the misspelled word contains only a single error, we can try all possible corrections and look each up in the dictionary. Traditionally, spelling checkers have looked for four possible near misses: a wrong letter ("wird"), an inserted letter ("woprd"), a deleted letter ("wrd"), or a pair of adjacent transposed letters ("wrod"). When a word isn't found in the dictionary, you will need to look up all near misses. Whenever you find a match in the dictionary, you should add it to your output line. You will have to generate every possible near miss and look for it in the dictionary. Specifically:  Construct every string that can be made by deleting one letter from the word. (n possibilities, where n is the length of the word)  Construct every string that can be made by inserting any letter of the alphabet at any position in the string. (26*(n+1) possibilities)  Construct every string that can be made by swapping two neighboring characters in the string. (n-1 possibilities)  Construct every string that can be made by replacing each letter in the word with some letter of the alphabet. (26*n possibilities (including the original word n times, which is probably easier than avoiding it)) Upper-case letters Note that both the words in the dictionary and the text file can contain upper case letters. For a given word that contains upper case letters, your program should first look it up in the dictionary. If not found, the program will convert all its letters to lower cases, and then look it up again in the dictionary. If the lower case word is found, the program should consider it as a correctly spelled word. Input Format The dictionary file is a list of words with one word per line. The file to be spell-checked is normal text, which contains punctuation, sentences, paragraphs and so on. After read a line of the text file, you should first remove all the punctuations and then split it into a list of words. To remove the punctuations and other control characters, you can trim the input line as follows: for(int i=0; i

Database System Concepts
7th Edition
ISBN:9780078022159
Author:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Chapter1: Introduction
Section: Chapter Questions
Problem 1PE
icon
Related questions
Question
you will create a spell checker. The program will take three command line
arguments: number of words in the dictionary, a dictionary file name, and a text file name.
The program will first create a hash table. The number buckets of the hash table should be about
twice the number of words in the dictionary. Then, it will read the dictionary from the file, insert
the words into the hash table, and report collision statistics.
After reading the dictionary, the spelling checker will read a list of words from a text file. Each
word will be looked up in the dictionary. If it is incorrect, it will be written to the standard output
together with a list of suggested corrections. The algorithm for generating corrections is given
below.
Hash Table
The hash table programs, QuadraticProbing.h and QuadraticProbing.cpp, are posted on Canvas.
The programs use quadratic probing to deal with collisions. You should carefully study these
programs and make some changes to collect the required statistics. The following is the sample
statement to create a dictionary hash table:
HashTable<string> & dictionary = HashTable <string> (2 * numWords);
Or
HashTable<string> dictionary(2 * numWords);
Hash Table Statistics
To help you understand how the hashing table works, you should track and report the following
statistics:
 The number of words in the hash table.
 The hash table size (the array size)
 The load factor in the table.
 The number of insertions that encountered a collision.
 The average length of all collision chains (Note, when there is no collision, the collision
chain length is 1).
 The length of the longest collision chain.
We only need to collect these statistics during the insertions. Note that the last three statistics will
need to be reinitialized whenever you expand the table. After you read the dictionary, you should
report the above statistics to cerr.
Spell Checker
Once the dictionary has been created, your program reads a list of words from the text file. If a
word is found in the dictionary, your program should produce no output. Otherwise, you should
generate suggested corrections and write them, together with the original word and the line
number on which it occurs, as a single output line. For example, suppose the input word was
"wrod". The output might be:
wrod(66): brod prod trod whod wood wrox rod wod wro word
Generating Corrections
The easiest way to generate corrections in a spell checker is a trial-and-error method. If we
assume that the misspelled word contains only a single error, we can try all possible corrections
and look each up in the dictionary.
Traditionally, spelling checkers have looked for four possible near misses: a wrong letter
("wird"), an inserted letter ("woprd"), a deleted letter ("wrd"), or a pair of adjacent transposed
letters ("wrod"). When a word isn't found in the dictionary, you will need to look up all near
misses. Whenever you find a match in the dictionary, you should add it to your output line. You
will have to generate every possible near miss and look for it in the dictionary. Specifically:
 Construct every string that can be made by deleting one letter from the word. (n possibilities,
where n is the length of the word)
 Construct every string that can be made by inserting any letter of the alphabet at any position
in the string. (26*(n+1) possibilities)
 Construct every string that can be made by swapping two neighboring characters in the string.
(n-1 possibilities)
 Construct every string that can be made by replacing each letter in the word with some letter
of the alphabet. (26*n possibilities (including the original word n times, which is probably
easier than avoiding it))
Upper-case letters
Note that both the words in the dictionary and the text file can contain upper case letters. For a
given word that contains upper case letters, your program should first look it up in the dictionary.
If not found, the program will convert all its letters to lower cases, and then look it up again in the
dictionary. If the lower case word is found, the program should consider it as a correctly spelled
word.
Input Format
The dictionary file is a list of words with one word per line. The file to be spell-checked is
normal text, which contains punctuation, sentences, paragraphs and so on.
After read a line of the text file, you should first remove all the punctuations and then split it into
a list of words. To remove the punctuations and other control characters, you can trim the input
line as follows:
for(int i=0; i <lineStr.length(); i++)
{
if (!isalpha(lineStr[i]))
lineStr[i] = ' ';
}
Sample Dictionaries
A small dictionary, simple_dict.txt, of 341 words should help you to get most of your bugs out.
Expert Solution
trending now

Trending now

This is a popular solution!

steps

Step by step

Solved in 2 steps

Blurred answer
Knowledge Booster
Files and Directory
Learn more about
Need a deep-dive on the concept behind this application? Look no further. Learn more about this topic, computer-science and related others by exploring similar questions and additional content below.
Similar questions
  • SEE MORE QUESTIONS
Recommended textbooks for you
Database System Concepts
Database System Concepts
Computer Science
ISBN:
9780078022159
Author:
Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:
McGraw-Hill Education
Starting Out with Python (4th Edition)
Starting Out with Python (4th Edition)
Computer Science
ISBN:
9780134444321
Author:
Tony Gaddis
Publisher:
PEARSON
Digital Fundamentals (11th Edition)
Digital Fundamentals (11th Edition)
Computer Science
ISBN:
9780132737968
Author:
Thomas L. Floyd
Publisher:
PEARSON
C How to Program (8th Edition)
C How to Program (8th Edition)
Computer Science
ISBN:
9780133976892
Author:
Paul J. Deitel, Harvey Deitel
Publisher:
PEARSON
Database Systems: Design, Implementation, & Manag…
Database Systems: Design, Implementation, & Manag…
Computer Science
ISBN:
9781337627900
Author:
Carlos Coronel, Steven Morris
Publisher:
Cengage Learning
Programmable Logic Controllers
Programmable Logic Controllers
Computer Science
ISBN:
9780073373843
Author:
Frank D. Petruzella
Publisher:
McGraw-Hill Education