CMSC723 - HW3 Description
Neural Machine Translation

Part 1: Understanding the provided MT model

Prerequisite: Test out the installation by training the provided MT model on a small training set by running:

    python3 main.py

This trains an RNN encoder-decoder model with attention, with default parameters defined in main.py, for French-to-English translation. Using default settings, the training loss is printed every 1000 iterations so you can track progress. When training is done, the script prints the translations generated by the model for a random sample of the training data, as a sanity check. There are 3 output files:

    <output-dir>/args.pkl   -> Stores the model parameters
    <output-dir>/encoder.pt -> Stores the trained encoder model
    <output-dir>/decoder.pt -> Stores the trained decoder model

The same command run with the "--eval" flag (i.e. python3 main.py --eval) loads the trained model and runs it on the test file. The script translates examples from the provided test set and evaluates the performance of the model by computing the BLEU score. The script also stores the output and reference translations to the file <output-dir>/output.pkl. On a CPU, running this to completion takes about 5-6 minutes on our machines.
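If you want to inspect the saved artifacts outside of main.py, something like the sketch below can work. This is only an illustration under assumptions: it assumes args.pkl and output.pkl are plain pickles and the .pt files were written with torch.save (the exact formats are defined in main.py, so adapt the calls accordingly); the output directory name here is hypothetical.

    import pickle
    import torch

    output_dir = "output"  # hypothetical <output-dir>; use whatever you passed to main.py

    # Model/training parameters saved by the training run (assumed to be a plain pickle).
    with open(f"{output_dir}/args.pkl", "rb") as f:
        args = pickle.load(f)

    # Trained encoder/decoder (assumed saved via torch.save; may be state dicts or full modules).
    encoder = torch.load(f"{output_dir}/encoder.pt", map_location="cpu")
    decoder = torch.load(f"{output_dir}/decoder.pt", map_location="cpu")

    # Output and reference translations written by the --eval run.
    with open(f"{output_dir}/output.pkl", "rb") as f:
        outputs = pickle.load(f)

    print(args)
    print(type(encoder), type(decoder))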
Q1 Understanding the Training Data [12 pts]

Two training sets are provided: eng-fra.train.small.txt and eng-fra.train.large.txt. Data files are read in and processed by the prepareData function in data.py.

Compute and report the following training data statistics (after processing by prepareData):

- the number of training samples
- the number of English word types and tokens
- the number of French word types and tokens
- the average English sentence length
- the average French sentence length

Based on these statistics only (no knowledge of French needed), describe one way in which English and French are different, and how it might impact machine translation performance.
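One way to gather these numbers is to call prepareData yourself and count directly over its output. The sketch below is illustrative only: it assumes prepareData returns (input_lang, output_lang, pairs) with each pair being a [source_sentence, target_sentence] list, in the style of the PyTorch seq2seq tutorial; the call signature shown is hypothetical, so check data.py for the actual arguments and pair order.

    from data import prepareData  # signature below is assumed; check data.py

    input_lang, output_lang, pairs = prepareData("eng-fra.train.small.txt")  # hypothetical call

    def stats(sentences):
        token_lists = [s.split() for s in sentences]
        n_tokens = sum(len(toks) for toks in token_lists)          # total word tokens
        n_types = len({w for toks in token_lists for w in toks})   # distinct word types
        avg_len = n_tokens / len(token_lists)                      # average sentence length
        return n_types, n_tokens, avg_len

    # Assumes each pair is [source_sentence, target_sentence]; swap indices if reversed.
    print("training samples:", len(pairs))
    print("source types/tokens/avg_len:", stats([p[0] for p in pairs]))
    print("target types/tokens/avg_len:", stats([p[1] for p in pairs]))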
Q2 Understanding the Model: Encoder [8 pts]

An RNN encoder is implemented in class Encoder in model.py. The Encoder class defines an RNN computation graph using Tensors, which define nodes in the graph and hold the values of learnable parameters in the network, and Functions, which define the network structure by specifying how output Tensors are computed as functions of input Tensors (i.e. forward propagation). Here the encoder uses a Gated Recurrent Unit (GRU), a variant of the simple RNN designed to address the vanishing gradient problem. For every input word, the encoder produces an output vector and a hidden state, and uses the hidden state as input to encode the next input word, as illustrated below:

[figure: GRU encoder processing the input sentence word by word]

Given default parameter settings:

- What are the dimensions of the word embedding matrix for the encoder?
- How many trainable parameters does the GRU have?
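To sanity-check your answers, you can instantiate the same layers and count parameters directly. The sketch below is not the provided Encoder: it assumes an embedding of shape (input vocabulary size, hidden size) followed by nn.GRU(hidden_size, hidden_size), which is the usual layout for this kind of model; the two sizes shown are placeholders, so substitute the actual defaults from main.py and the vocabulary size produced by prepareData.

    import torch.nn as nn

    vocab_size = 1000   # placeholder: use input_lang.n_words from the processed training data
    hidden_size = 256   # placeholder: use the default hidden size from main.py

    embedding = nn.Embedding(vocab_size, hidden_size)  # embedding matrix is vocab_size x hidden_size
    gru = nn.GRU(hidden_size, hidden_size)

    # A single-layer PyTorch GRU stores weight_ih (3H x D_in), weight_hh (3H x H),
    # bias_ih (3H), and bias_hh (3H), so the count equals 3*(H*D_in + H*H + 2*H).
    n_gru_params = sum(p.numel() for p in gru.parameters())
    print("embedding matrix shape:", tuple(embedding.weight.shape))
    print("GRU trainable parameters:", n_gru_params)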
Q3 Understanding the Model: Decoders [11 pts]

Two decoders are provided. SimpleDecoder implements a left-to-right RNN decoder based on a GRU, as illustrated below:

[figure: SimpleDecoder, a GRU decoder generating the output left to right]

The class AttnDecoderRNN implements a more sophisticated decoder, which uses an attention network to compute an attention-weighted representation of the input sentence for each decoding time-step, as illustrated below:

[figure: AttnDecoderRNN, attending over the encoder outputs at each decoding step]

Based on the provided code, and consulting the PyTorch documentation as needed, provide the complete mathematical formula used to compute the weighted sentence representation vector (in attn_applied) as a function of the encoder and decoder hidden state vectors stored in encoder_outputs, previous_hidden, and embedded (i.e., no need to include the dropout layer in your formula).
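The formula you report must come from the provided model.py. As a reference point only, the attention decoder in the standard PyTorch seq2seq tutorial (which AttnDecoderRNN appears to resemble) computes attention weights from the current embedding and the previous decoder hidden state, then takes a weighted sum of the encoder outputs. The sketch below shows that tutorial-style computation under assumed shapes; it is not necessarily what the handout's code does, so verify against model.py.

    import torch
    import torch.nn.functional as F

    # Assumed shapes (tutorial-style, not taken from the handout's code):
    #   embedded:        (1, hidden_size)          current target-word embedding
    #   previous_hidden: (1, hidden_size)          decoder hidden state from the previous step
    #   encoder_outputs: (max_length, hidden_size) one encoder output vector per source position
    #   attn:            nn.Linear(2 * hidden_size, max_length)
    def attention_applied(attn, embedded, previous_hidden, encoder_outputs):
        # alpha = softmax(W_a [embedded ; previous_hidden] + b_a)
        attn_weights = F.softmax(attn(torch.cat((embedded, previous_hidden), dim=1)), dim=1)
        # attn_applied = sum_i alpha_i * encoder_outputs_i  (weighted sum of encoder states)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))
        return attn_applied  # shape (1, 1, hidden_size)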