Lesson 2 Quiz
School: Foundation University, Islamabad Campus
Course: D101
Subject: Computer Science
Date: May 6, 2024
Type: docx
Pages: 2
Uploaded by BarristerKnowledgeKingfisher46
1. The Data Exploration node in Model Studio enables you to do which of the following?
a. Impute variables based on summary statistics.
b. View the most important inputs or suspicious variables.
c. See variables with a high percentage of nonmissing values.
2. To define variable metadata and assign rules to modify variables (for example, assigning a type of transformation), you can use either the Data tab or the Manage Variables node.
a. True
b. False
3. Which of the following statements is true about the Text Mining node?
a. It processes audio and video data.
b. It transforms a term-by-document frequency matrix using singular value decomposition (SVD) to create binary coefficients.
c. It creates topics based on groups of terms that occur together in several documents. Each term-document pair is assigned a score for every topic.
d. It does not allow terms and documents to belong to multiple topics.
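Options b and c both start from a term-by-document frequency matrix. The Python sketch below shows how such a matrix can be built from raw text, assuming naive lowercase whitespace tokenization; the `term_document_matrix` helper and the sample documents are illustrative assumptions, not the Text Mining node's implementation.

```python
from collections import Counter

def term_document_matrix(docs):
    """Build a term-by-document frequency matrix.

    Rows are terms (sorted alphabetically), columns are documents, and each
    cell holds how many times the term occurs in that document.
    """
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    # Real text-mining tools also handle punctuation, stemming, and stop words.
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

docs = ["credit card balance", "card fraud alert", "credit score credit limit"]
terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(f"{term:8s} {row}")
```

Text-mining tools then typically weight this matrix and reduce it, for example with SVD; that step is not shown here.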
4. After a pipeline is run, which of the following can you do using the Manage Variables node?
a. Specify a different target variable.
b. Modify the target variable attributes.
c. Set up imputation and transformation rules.
d. Perform imputation and transformations.
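Questions 2 and 4 distinguish between recording a rule for a variable and actually carrying it out. The following is a minimal Python sketch of that separation, assuming rows stored as dicts with `None` marking a missing value; the rule format and the `apply_rules` function are hypothetical illustrations, not part of Model Studio.

```python
import math
from statistics import mean, median

# Hypothetical metadata: each rule only *records* what should happen to a variable.
rules = {
    "income": {"impute": "median"},
    "age": {"impute": "mean"},
    "balance": {"transform": "log"},
}

def apply_rules(rows, rules):
    """Return new rows with the recorded rules executed; the input rows are not modified."""
    # Compute imputation statistics from the observed (nonmissing) values.
    stats = {}
    for var, rule in rules.items():
        observed = [r[var] for r in rows if r[var] is not None]
        if rule.get("impute") == "median":
            stats[var] = median(observed)
        elif rule.get("impute") == "mean":
            stats[var] = mean(observed)
    out = []
    for r in rows:
        new = dict(r)
        for var, rule in rules.items():
            if new[var] is None and var in stats:
                new[var] = stats[var]  # fill the missing value
            if rule.get("transform") == "log" and new[var] is not None:
                new[var] = math.log(new[var] + 1)  # log(x + 1) keeps zeros defined
        out.append(new)
    return out

rows = [
    {"income": 40000, "age": 30, "balance": 0},
    {"income": None, "age": 50, "balance": 100},
    {"income": 60000, "age": None, "balance": 1000},
]
print(apply_rules(rows, rules))
```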
5. How do the transformations available in the Transformations node minimize bias in model predictions?
a. by reducing the effect of extreme or unusual input values
b. by replacing missing values and avoiding complete case analysis
c. by converting unstructured data to structured data
d. by reducing the total number of variables to reduce dimensionality
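To see what reducing the effect of extreme input values looks like in practice, the Python sketch below applies a natural-log transformation to a made-up income variable containing one outlier and compares how far the maximum sits from the median before and after. The data and the `spread_ratio` helper are illustrative assumptions; this is not the Transformations node's implementation.

```python
import math
from statistics import median

# Made-up illustrative data with one extreme value.
incomes = [28_000, 31_000, 35_000, 40_000, 45_000, 2_500_000]

def spread_ratio(values):
    """How far the largest value sits above the median, as a multiple of the median."""
    m = median(values)
    return (max(values) - m) / m

logged = [math.log(v) for v in incomes]
print(f"raw:    {spread_ratio(incomes):.2f}")
print(f"logged: {spread_ratio(logged):.2f}")
```

On the raw scale the outlier sits dozens of medians away; after the log it is less than one median away, so it pulls much less on a fitted model.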
6. The Variable Selection node uses only supervised methods to select inputs.
a. True
b. False
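Question 6 turns on whether inputs can be selected without consulting a target. As one example of a criterion that never looks at the target, the Python sketch below drops inputs whose variance falls below a threshold. The low-variance filter, the threshold value, and the `low_variance_filter` name are assumptions chosen for illustration; they are not necessarily what the Variable Selection node uses.

```python
from statistics import pvariance

def low_variance_filter(columns, threshold):
    """Keep inputs whose population variance exceeds `threshold`.

    No target variable is consulted, so this is an unsupervised criterion.
    """
    return [name for name, values in columns.items() if pvariance(values) > threshold]

columns = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [5.0, 5.0, 5.0, 5.0],  # constant: carries no information
    "x3": [0.0, 0.1, 0.0, 0.1],  # nearly constant
}
print(low_variance_filter(columns, threshold=0.01))
```

Because variance depends on measurement scale, inputs should generally be put on a comparable scale before a filter like this is applied.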
7. Which of the following transformations creates bins for a numeric variable?
a. inverse
b. exponential
c. standardize
d. quantile
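Binning a numeric variable by quantiles places roughly the same number of observations in each bin. A minimal Python sketch, assuming Python 3.8 or later for `statistics.quantiles`; the sample values and the `quantile_bins` helper are illustrative:

```python
from bisect import bisect_right
from statistics import quantiles

def quantile_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using sample quantile cut points."""
    cuts = quantiles(values, n=n_bins)  # returns n_bins - 1 cut points
    return [bisect_right(cuts, v) for v in values]

values = [3, 7, 1, 9, 4, 6, 2, 8]
print(quantile_bins(values, 4))
```

With many tied values the bins can end up unequal in size, because identical values must land in the same bin.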
8. Which of the following statements is true about the validation data that the Variable Selection node creates from the training data?
a. The Variable Selection node always creates these validation data.
b. These validation data are used for variable selection during data preparation.
c. These validation data are used for model assessment during the modeling process, instead of the original validation partition.
9. … inputs during data preparation?
a. A model that is based on a large number of inputs is very likely to be underfit to the training data.
b. The more inputs you use to build the model, the more cases are required to discover the relationship between the inputs and the target.
c. Modeling algorithms do not reduce the number of inputs.
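The relationship described in option b can be made concrete with simple arithmetic. If each input is divided into 10 levels, d inputs split the input space into 10^d cells, so a fixed number of cases spreads ever more thinly. The Python sketch below assumes 10,000 cases and 10 levels per input purely for illustration; `cases_per_cell` is a hypothetical helper.

```python
def cases_per_cell(n_cases, n_inputs, levels_per_input=10):
    """Average number of cases per cell when each input is split into a fixed number of levels."""
    cells = levels_per_input ** n_inputs
    return n_cases / cells

for d in (1, 2, 3, 5):
    print(d, cases_per_cell(10_000, d))
```

With five inputs the average drops to 0.1 cases per cell, so most cells contain no data at all.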
10. Which of the following is a best practice for handling high-cardinality input variables?
a. binning
b. Winsorizing
c. standardization
d. text mining
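One common way to bin a high-cardinality categorical input is to collapse infrequent levels into a single catch-all level. The Python sketch below does this; the minimum count, the `OTHER` label, the `group_rare_levels` name, and the sample ZIP codes are assumptions for illustration.

```python
from collections import Counter

def group_rare_levels(values, min_count, other="OTHER"):
    """Replace categorical levels seen fewer than `min_count` times with a single level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

zips = ["10001", "10001", "10001", "94105", "94105", "60601", "73301", "02139"]
print(group_rare_levels(zips, min_count=2))
```

A production version would compute the counts on the training data and reuse that mapping on new data; this sketch computes them on whatever list it is given.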