tutorial_01.html

School: University of California, Los Angeles
Course: 301
Subject: Statistics
Date: Feb 20, 2024
Type: html
Uploaded by DeaconEnergyCat30
Tutorial 1: Introduction to Statistical Modelling and A/B Testing

Learning Objectives

After completing this week's worksheet and tutorial work, you will be able to:

1. Describe the goals of hypothesis testing, in particular difference in means tests related to A/B testing.
2. Give an example of a problem that requires A/B testing.
3. List methods used to test difference in means between two populations.
4. Interpret the results of hypothesis tests.
5. Explain the relation between type I and type II errors, power and sample size in 2-sample hypothesis testing.
6. Write a computer script to perform difference in means hypothesis testing and compute errors, power and p-values.

In [67]:
# Run this cell before continuing.
library(tidyverse)
library(infer)
library(broom)
library(cowplot)
library(binom)
source("tests_tutorial_01.R")

1. Analysis of an A/B Testing Paper

In Worksheet 1, we reviewed key concepts of hypothesis tests to test the difference between two population means and discussed their relation to A/B testing. In this tutorial, you will review concepts related to the difference between two proportions, also seen before in STAT 201.

In this exercise, we will work with the paper "Improving Library User Experience with A/B Testing: Principles and Process" by Young (2014). This paper presents a case study where A/B testing is applied with different webpage designs. The primary aim is to compare user interactions to determine which one statistically improves the navigation experience by increasing the homepage click-through rate.

The experiment was conducted using the web analytics software Google Analytics and Crazy Egg. The data from the paper can be found here. The setup was done on the Interact category in the Montana State University's library webpage (more information can be found in the section Step 1 in the paper).
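Before looking at the treatments, the link between power and sample size named in the learning objectives can be sketched with base R's power.prop.test(). The two click-through rates below (0.02 and 0.03) and the 1500 views per group are made-up values chosen for illustration; they are not taken from the paper or from the tutorial's data.

```r
# Hypothetical click-through rates (assumptions, not from the paper).
p_control <- 0.02
p_variant <- 0.03

# Views needed per group for 80% power in a two-sided test at the 5% level.
needed <- power.prop.test(p1 = p_control, p2 = p_variant,
                          sig.level = 0.05, power = 0.80)
ceiling(needed$n)

# Power reached if each group only receives 1500 views (hypothetical).
achieved <- power.prop.test(n = 1500, p1 = p_control, p2 = p_variant,
                            sig.level = 0.05)
achieved$power
```

With the significance level held fixed, a smaller sample gives lower power, which means a higher type II error rate for the same true difference.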
The experimental treatments (as explained and shown in Step 4 in the paper) are the following: Interact (the control treatment), Connect, Learn, Help, and Services. The response variable is what we call the click-through rate, i.e., the ratio of users that click on a specific link to the total number of users who view the page (a proportion that goes from 0 to 1). We have already processed the data for you. Firstly, we load the Crazy Egg data from the web.

In [68]:
click_through <- read_csv("data/click_through.csv") %>%
  select(webpage, adjusted_clicks, target_clicks)
head(click_through)

Rows: 5 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): webpage
dbl (4): clicks, home_page_clicks, adjusted_clicks, target_clicks

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

A tibble: 5 × 3
webpage    adjusted_clicks  target_clicks
<chr>      <dbl>            <dbl>
Interact   2423             42
Connect    1504             53
Learn      1569             21
Help       1595             38
Services   1299             45

Question 1.0
{points: 1}

The adjusted_clicks in the data frame click_through are the total clicks we will use to compute the click-through rate by treatment, where target_clicks are what we could define as "successes". Compute the corresponding click-through rate by row by dividing target_clicks by adjusted_clicks. Add it as a new column in the data frame called click_rate. Then, reorder the experimental treatments (i.e., factor levels) in descending order by click-through rate.

Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.

In [69]:
click_through <- click_through %>%
  mutate(click_rate = target_clicks / adjusted_clicks) %>%
  mutate(webpage = fct_reorder(webpage, desc(click_rate)))
click_through
levels(click_through$webpage)

A tibble: 5 × 4
webpage    adjusted_clicks  target_clicks  click_rate
<fct>      <dbl>            <dbl>          <dbl>
Interact   2423             42             0.01733388
Connect    1504             53             0.03523936
Learn      1569             21             0.01338432
Help       1595             38             0.02382445
Services   1299             45             0.03464203

1. 'Connect' 2. 'Services' 3. 'Help' 4. 'Interact' 5. 'Learn'

In [70]:
test_1.0()

Test passed 🥳
Test passed 🥳
Test passed 😸
Test passed 🥳
Test passed 🥳
[1] "Success!"

Question 1.1
{points: 1}

The sampled click-through rates in the data frame click_through are estimates of population proportions. Hence, it is possible to obtain confidence intervals by relying on the Central Limit Theorem. Obtain the 95% confidence interval for each population click rate and store the lower and upper bounds in two new columns of click_through: lower_ci and upper_ci.

Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.

In [75]:
click_through <- click_through %>%
  mutate(
    lower_ci = click_rate - qnorm(0.975) * sqrt(click_rate * (1 - click_rate) / adjusted_clicks),
    upper_ci = click_rate + qnorm(0.975) * sqrt(click_rate * (1 - click_rate) / adjusted_clicks)
  )
click_through

A tibble: 5 × 6
webpage    adjusted_clicks  target_clicks  click_rate  lower_ci     upper_ci
<fct>      <dbl>            <dbl>          <dbl>       <dbl>        <dbl>
Interact   2423             42             0.01733388  0.012137247  0.02253052
Connect    1504             53             0.03523936  0.025920819  0.04455790
Learn      1569             21             0.01338432  0.007698296  0.01907035
Help       1595             38             0.02382445  0.016340290  0.03130861
Services   1299             45             0.03464203  0.024697385  0.04458668

In [76]:
test_1.1()

Test passed 🥳
Test passed 🥳
Test passed 🥳
Test passed 🥳
Test passed 😀
[1] "Success!"

Question 1.2
{points: 1}

Let's create an effective visualization for the point estimate click_rate and the confidence intervals you obtained above. The ggplot() object's name should be CIs_click_through_rates.

Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.

In [77]:
# Plotting click-through rates as points with 95% confidence intervals.
CIs_click_through_rates <- click_through %>%
  ggplot(aes(x = webpage, y = click_rate)) +
  geom_point() +
  geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci), width = 0.1) +
  theme(
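The interval computation from Question 1.1 can be cross-checked in base R without the tidyverse. The sketch below re-enters the counts from the click_through table shown in this tutorial, recomputes the normal-approximation (Wald) intervals, and then, as an illustration only, compares Connect against the control Interact with stats::prop.test(). The choice of that pair and of the uncorrected test (correct = FALSE) are assumptions made for this sketch, not steps prescribed by the tutorial or the paper.

```r
# Counts copied from the click_through table in this tutorial.
webpage <- c("Interact", "Connect", "Learn", "Help", "Services")
adjusted_clicks <- c(2423, 1504, 1569, 1595, 1299)
target_clicks <- c(42, 53, 21, 38, 45)

# Point estimates and their CLT-based standard errors.
click_rate <- target_clicks / adjusted_clicks
std_error <- sqrt(click_rate * (1 - click_rate) / adjusted_clicks)
z <- qnorm(0.975)  # critical value for a 95% interval

wald_ci <- data.frame(
  webpage,
  click_rate,
  lower_ci = click_rate - z * std_error,
  upper_ci = click_rate + z * std_error
)
print(wald_ci)

# Illustrative two-sample test of equal proportions:
# Connect (index 2) versus the control Interact (index 1).
test_connect <- prop.test(
  x = target_clicks[c(2, 1)],
  n = adjusted_clicks[c(2, 1)],
  correct = FALSE  # assumption: no continuity correction
)
test_connect$p.value
```

The lower_ci and upper_ci columns printed here should agree with the tibble from Question 1.1, since both use the same formula. Note that the Wald interval can be inaccurate for proportions this close to 0.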

