School: University of Missouri, Columbia
Course: 8740
Subject: Computer Science
Date: Jan 9, 2024
Uploaded by: sharukh95
Assignment 11
1.)
The code explores topics in the Associated Press dataset using Latent Dirichlet Allocation (LDA) from the topicmodels library.
It fits an LDA model with 2 topics to the Associated Press dataset and sets a seed so that the results are reproducible.
The tidy function converts the LDA results into a tidy data frame.
The code then extracts the top 10 terms for each topic based on their beta values.
Finally, it creates a bar plot with ggplot2 that displays the top terms for each topic.
Overall, the script fits a two-topic LDA model to the Associated Press dataset and then visualizes the top terms for each topic. The tidy function from the tidytext library converts the LDA results into a tidy data frame with one row per topic-term pair, which makes it easier to work with the data and extract relevant information.
The code groups the tidy LDA results by topic, then extracts the top 10 terms for each topic based on their beta values (the per-topic-per-term probabilities, which indicate how strongly each term is associated with the topic). The result is sorted in descending order of beta values.
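The fitting, tidying, and top-term steps described above could look roughly like the sketch below. The assignment's actual code is not shown here, so the object names (ap_lda, ap_topics, ap_top_terms) and the seed value 1234 are assumptions. The calls themselves (LDA, tidy, slice_max) are standard topicmodels, tidytext, and dplyr functions.

```r
# A sketch of the pipeline described above, not the assignment's own code.
# Object names and the seed value are assumptions chosen for illustration.
library(topicmodels)  # LDA() and the AssociatedPress document-term matrix
library(tidytext)     # tidy() method for LDA models
library(dplyr)

data("AssociatedPress", package = "topicmodels")

# Fit a two-topic LDA model; fixing the seed makes the fit reproducible.
ap_lda <- LDA(AssociatedPress, k = 2, control = list(seed = 1234))

# One row per topic-term pair, where beta is the probability of the term
# being generated from that topic.
ap_topics <- tidy(ap_lda, matrix = "beta")

# Keep the 10 highest-beta terms in each topic, sorted by descending beta.
ap_top_terms <- ap_topics %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>%
  ungroup() %>%
  arrange(topic, desc(beta))

print(ap_top_terms)
```

Grouping before slice_max() is what makes the "top 10" apply per topic rather than to the table as a whole.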
The code uses ggplot2 to create a bar plot. Each bar represents a term, and the bars are colored by topic. The plot is faceted, meaning there is a separate facet for each topic. scale_y_reordered(), which is designed to be paired with reorder_within() from tidytext, ensures that the terms appear in order of their beta values within each facet, with clean axis labels.
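The faceted plot described above can be sketched as follows. To keep the example small and self-contained, it uses a hypothetical six-row data frame (top_terms) with made-up beta values instead of the fitted model. The use of reorder_within() is an assumption, made because scale_y_reordered() is intended to be used with it.

```r
library(dplyr)
library(ggplot2)
library(tidytext)  # reorder_within() and scale_y_reordered()

# Hypothetical top terms and beta values, made up for illustration only.
top_terms <- tibble(
  topic = rep(1:2, each = 3),
  term  = c("percent", "million", "new", "president", "government", "new"),
  beta  = c(0.010, 0.009, 0.006, 0.008, 0.007, 0.005)
)

p <- top_terms %>%
  # Order terms by beta separately within each topic.
  mutate(term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(x = beta, y = term, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  # One panel per topic, each with its own axis ranges.
  facet_wrap(~ topic, scales = "free") +
  # Strip the topic suffix that reorder_within() appended to each label.
  scale_y_reordered()

print(p)
```

The term "new" appears in both topics here. reorder_within() treats the two occurrences as distinct factor levels, so each facet can order it independently.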
2.)