Assgn

Assignment 1: Using the WEKA Workbench

A. Become familiar with using the WEKA workbench to invoke several different machine learning schemes. Use the latest stable version. Use both the graphical interface (Explorer) and the command-line interface (CLI). See the Weka home page for Weka documentation.

B. Use the following learning schemes, with their default settings, to analyze the weather data (in weather.arff): ZeroR (majority class), OneR, Naive Bayes Simple, and J4.8. For test options, first choose "Use training set", then choose "Percentage Split" with the default 66% split. Report the percent error rate of each model.

C. Which of these classifiers are you most likely to trust when deciding whether to play? Why?

D. What can you say about …
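To sanity-check the error rates from part B, the ZeroR baseline can be worked out by hand. The sketch below is a minimal Python illustration under stated assumptions, not Weka's implementation: `zero_r_error_rate` and `percentage_split` are hypothetical helper names, the yes/no labels are illustrative rather than read from weather.arff, and the seeded shuffle does not reproduce Weka's own randomization, so split results may differ from what the Explorer reports.

```python
import random
from collections import Counter


def zero_r_error_rate(train_labels, test_labels):
    """Percent error of ZeroR: always predict the training majority class."""
    if not test_labels:
        raise ValueError("test set is empty")
    majority, _ = Counter(train_labels).most_common(1)[0]
    errors = sum(1 for label in test_labels if label != majority)
    return 100.0 * errors / len(test_labels)


def percentage_split(instances, train_percent=66.0, seed=1):
    """Shuffle a copy of the data and put train_percent of it in the training part.

    Assumption: Weka's own shuffling and rounding are not reproduced here,
    so its results may differ from this sketch.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100.0)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Illustrative yes/no "play" labels; NOT loaded from weather.arff.
    labels = ["yes"] * 9 + ["no"] * 5
    # "Use training set": train and test on the same instances.
    print(f"Use training set: {zero_r_error_rate(labels, labels):.1f}% error")  # 35.7% error
    train, test = percentage_split(labels)
    print(f"66% split:        {zero_r_error_rate(train, test):.1f}% error")
```

Because ZeroR ignores every attribute, its error is just the fraction of test instances outside the training majority class, which makes it a useful baseline against which to compare OneR, Naive Bayes Simple, and J4.8.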

… 1, 2, ..., 38) and an Affymetrix "call" (P if the gene is present, A if absent, M if marginal). Think of the training data as a very tall, narrow table with 7130 rows and 78 columns. Note that it is "sideways" from a machine learning point of view: the attributes (genes) are in rows, and the observations (samples) are in columns. This is the standard format for microarray data, but to use it with machine learning tools like WEKA, we will need to transpose ("flip") the matrix to make files with genes in columns and samples in rows. We will do that in step 3B.6 of this assignment. Here is a small extract:

Gene Description                         Gene Accession Number  1    call  2    call  ...
GB DEF = GABAa receptor alpha-3 subunit  A28102_at              151  A     263  P     ...
...                                      AB000114_at            72   A     21   A     ...
...                                      AB000115_at            281  A     250  P     ...
...                                      AB000220_at            36   A     43   A     ...

3B: Clean the data

Perform the following cleaning steps on both the train and test sets. Use Unix tools, scripts, or other tools for each task. Document all the steps and create intermediate files for each step. After each step, report the number of fields and records in the train and test files. (Hint: use the Unix command wc to find the number of records, and awk or gawk to find the number of fields.)

Microarray Data Cleaning Steps

1. Remove the initial records whose Gene Description contains "control" (those are Affymetrix controls, not human genes). Call the resulting files ALL_AML_grow.train.noaffy.tmp and …
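The counting hint and cleaning step 1 above, along with the transpose planned for step 3B.6, can be sketched in Python. This is a minimal illustration under assumptions, not the required method: the files are taken to be tab-separated with the Gene Description in the first field, "control" is matched case-sensitively, only the leading run of control records is dropped, and every function name and the command-line usage are hypothetical.

```python
import sys


def parse_tsv(text):
    r"""Split tab-separated text into rows of fields, like awk -F'\t'."""
    return [line.split("\t") for line in text.splitlines()]


def to_tsv(rows):
    """Join rows of fields back into newline-terminated, tab-separated text."""
    return "".join("\t".join(row) + "\n" for row in rows)


def drop_leading_controls(rows):
    """Keep the header row, drop the initial run of records whose first field
    (Gene Description) contains "control", and keep all later records."""
    header, body = rows[0], rows[1:]
    start = 0
    while start < len(body) and "control" in body[start][0]:
        start += 1
    return [header] + body[start:]


def count_records_and_fields(rows):
    r"""Return the record count (what `wc -l` shows for a newline-terminated
    file) and the sorted distinct field counts (roughly what
    `awk -F'\t' '{print NF}' FILE | sort -u` shows)."""
    return len(rows), sorted({len(row) for row in rows})


def transpose(rows):
    """Flip the matrix so genes become columns and samples become rows."""
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError(f"ragged table, field counts: {sorted(widths)}")
    return [list(column) for column in zip(*rows)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python clean_step1.py INPUT_FILE ALL_AML_grow.train.noaffy.tmp
    # latin-1 maps every byte to a character, so unusual bytes in descriptions
    # cannot make the read fail, and they are written back unchanged.
    with open(sys.argv[1], encoding="latin-1") as f:
        rows = drop_leading_controls(parse_tsv(f.read()))
    with open(sys.argv[2], "w", encoding="latin-1") as f:
        f.write(to_tsv(rows))
    print("records, field counts:", count_records_and_fields(rows))
```

Reporting the distinct field counts rather than a single number makes ragged rows visible after each step, which is also why `transpose` refuses a non-rectangular table instead of silently truncating it.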
