Assessment Task 3

School: University of Technology Sydney
Course: 32130
Subject: Information Systems
Date: Jan 9, 2024
Pages: 7

06/10/2023, 14:35 Assessment Task 3: Data analytics in action - report https://canvas.uts.edu.au/courses/28413/assignments/152566 1/7

Assessment Task 3: Data analytics in action - report

Due: Nov 3 by 23:59
Points: 60
Submitting: a file upload
Available: Oct 6 at 0:00 - Nov 10 at 23:59

Task details

This assignment is a practical data analytics project that follows on from the data exploration you did in Assignment 2. You will be acting as a data scientist at a consulting company and you need to make a prediction on a dataset. The dataset can be found below.

You need to build classifiers using the techniques covered in the lectures to predict the class attribute. At the very minimum, you need to produce a classifier for each method we have covered. However, if you explore the problem very thoroughly (as you should do in industry), preprocessing the data, looking at different methods, choosing their best parameter settings, and identifying the best classifier in a principled and explainable way, then you should be able to get a better mark.

If you choose to use KNIME and you show 'expert' use (i.e. exploring multiple classifiers, with different settings, choosing the best in a principled way, and being able to explain why you built the model the way you did), this will attract a better mark.

You need to write a short report describing how you solved the problem and the results you found. See below for the requirements for the report.

You also need to attend a short oral defence of your classifier of around 5 minutes where you show the classifier (e.g. using the KNIME workflow or Python/R code) and answer some questions about it. Details about the oral defences will be given by email and in class.

Objectives:

This assessment task addresses the following subject learning objectives (SLOs): 3 & 4

This assessment task contributes to the development of the following Course Intended Learning Outcomes (CILOs): C.1. & D.1

Using Kaggle
The Kaggle Competition will be available at a later time.

Datasets

Below you will find 3 datasets: a training dataset for training and optimising your model (it contains the target values), an "unknown" dataset for the final model assessment (it does not have the target values - you need to predict them) and a submission sample which shows you what the file submitted to Kaggle should look like. In particular, you will need to set the column names in your submission file correctly - that is, "SK_ID_CURR" and "TARGET". These datasets can also be found on the Kaggle competition page under the "Data" tab.

  • Assignment3-TrainingDataset.csv (https://canvas.uts.edu.au/courses/28413/files/5761997?wrap=1) (https://canvas.uts.edu.au/courses/28413/files/5761997/download?download_frd=1)
  • Assignment3-UnknownDataset.csv (https://canvas.uts.edu.au/courses/28413/files/5761998?wrap=1) (https://canvas.uts.edu.au/courses/28413/files/5761998/download?download_frd=1)
  • Assignment3-sample_submission.csv (https://canvas.uts.edu.au/courses/28413/files/5762000?wrap=1) (https://canvas.uts.edu.au/courses/28413/files/5762000/download?download_frd=1)

The attribute description for the dataset is similar to that from Assignment 2:

  • Assignment3-Dataset-Attribute-Description (https://canvas.uts.edu.au/courses/28413/files/5762003?wrap=1) (https://canvas.uts.edu.au/courses/28413/files/5762003/download?download_frd=1)

Assessment

Assessment is real-time. This means that as soon as you submit the file, Kaggle will assess the performance of your classifier and provide you with the result. You can submit multiple times, but Kaggle limits the number of submissions you can make per day.

Do not treat the performance measure reported by Kaggle as your test error in the final competition, and do not optimise to it.
This is because Kaggle has two measures: a public measure, which it reports to you, and a private measure, which it keeps hidden. Instead, develop several models and estimate the test error yourself before submitting to Kaggle. Remember that your estimate of test error is just that: an estimate. The actual private measure will probably be a little bit different.

Classification task

Build a classifier that classifies the “TARGET” attribute. The classification goal is to predict whether a client has loan repayment difficulties or not. The target attribute “TARGET” is binary (0, 1):

  • 1 --> client with payment difficulties: he/she had a late payment of more than X days on at least one of the first Y installments of the loan in our sample
  • 0 --> all other cases

You can do various data pre-processing and transformations (e.g. grouping values of attributes, converting them to binary, etc.), providing explanations for why you have chosen to do that. You may need to split the provided training set further into training, validation and/or test sets to accurately set the parameters and evaluate the quality of the classifier.

You can use KNIME to build classifiers, or feel free to use Python scikit-learn or other packages. If you do this, though, please explain more about your classifier - and be sure that you are producing valid results! You don't need to limit yourself to the classifiers we used in class, but if you do use other classifiers you need to describe them in your report and make sure you are producing valid results. At the very minimum, you need to produce a classifier for each method we have covered.
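The advice to split the provided training set and estimate the test error yourself can be sketched as follows. This is a minimal illustration only, not the method prescribed by the subject: the function name `stratified_split`, the 80/20 ratio, the seed, and the toy rows are all assumptions made for the example. It holds out a validation set while keeping the share of each “TARGET” class roughly the same in both parts, which matters when one class is rare.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="TARGET", valid_fraction=0.2, seed=0):
    """Split rows into (train, valid), keeping each class's share roughly equal.

    Hypothetical helper for illustration; rows is a list of dicts.
    """
    rng = random.Random(seed)

    # Group the rows by class label so each class is split separately.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    train, valid = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        n_valid = round(len(group) * valid_fraction)
        valid.extend(group[:n_valid])
        train.extend(group[n_valid:])

    # Shuffle again so the classes are interleaved in each part.
    rng.shuffle(train)
    rng.shuffle(valid)
    return train, valid


# Toy stand-in for the training data (assumed): 90 rows of class 0, 10 of class 1.
toy_rows = [{"SK_ID_CURR": i, "TARGET": 1 if i < 10 else 0} for i in range(100)]
train_rows, valid_rows = stratified_split(toy_rows, valid_fraction=0.2, seed=42)

print(len(train_rows), len(valid_rows))
print(sum(r["TARGET"] for r in valid_rows))
```

Model settings would then be tuned on `train_rows` and the error estimated on `valid_rows`, which the model never saw during fitting. If you work in scikit-learn, `train_test_split` with its `stratify` argument serves the same purpose.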
Submission details

Length: The task requires the submission of a report (approx. 2000-3000 words or 10-12 pages using 11 or 12-point Times or Arial fonts) as well as an oral defence of around 5 minutes. On average you will require between 24 and 36 hours to complete this assignment.

A hint: Usually it's not a case of having a 'better' classifier that will produce good results. Rather, it's a case of identifying or generating good features that can be used to solve the problem.

Assignment report and oral defence

Report

Your report should include the following information:

  • A description of the data mining problem;
  • The data preprocessing and transformations you did (if any);
  • How you went about solving the problem;
  • Classification techniques used and summary of the results and parameter settings;
  • The best classifier that you selected - the type, its performance, how it solved the problem (if it makes sense for that type of classifier), and reasons for selecting it;
  • Reflection: One page reflecting on your learning in Assignment 3. What did you learn about data mining and yourself as a result of doing the assignment? How would you approach the problem differently if you were to do it again? The more incisive and thoughtful your reflection is, the better your mark.

The report contributes up to 30 out of the total 50 marks. See the marking criteria below.

Oral defence

The oral defence contributes up to 20 out of the total 50 marks. At the oral defence, students need to explain how they solved the problem and answer questions about their solutions, showing the workflow in KNIME or working code in Python. You will receive a mark of either 0, 10, 15 or 20 depending on the strength of your response. See the marking criteria via the oral defence portal.

Note: Students who fail the oral defence will be permitted to undertake it one more time.
If they pass, they will receive a maximum of 10 marks out of 20.

Submitting via Kaggle

The predictions on the unknown set should be submitted as a .csv file to the Kaggle competition (the link will be provided soon in our workshop). Submission to Kaggle is not mandatory, but you do need to make predictions on the unknown dataset.
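A minimal sketch of producing a submission file with the two required column names, "SK_ID_CURR" and "TARGET", is shown here. The helper name `write_submission` and the IDs and predictions are made up for illustration. The brief does not say whether "TARGET" should hold class labels or probabilities, so treat the provided sample submission file as the authority on the exact format.

```python
import csv
import io


def write_submission(ids, predictions, out_file):
    """Write one row per client under the column names the brief requires.

    Hypothetical helper for illustration; out_file is any writable text file.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["SK_ID_CURR", "TARGET"])
    for client_id, pred in zip(ids, predictions):
        writer.writerow([client_id, pred])


# Made-up IDs and predictions (assumed values, not taken from the dataset).
example_ids = [100001, 100005, 100013]
example_preds = [0, 1, 0]

# An in-memory buffer stands in for a real file in this sketch.
buffer = io.StringIO()
write_submission(example_ids, example_preds, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write an actual file, pass a handle opened with `open("submission.csv", "w", newline="")` instead of the buffer; the file name here is an arbitrary choice.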