ASSIGNMENT – 02
Aditya Lingangari, ID no: 11625546

How to create a new storage bucket, three folders, and load data into a folder in GCP

To create a new bucket in GCP: Go to the navigation menu and scroll down to Storage. Click it and select ‘Create bucket’. A dialog appears where you give the bucket a name. Under ‘Choose where to store your data’, select ‘Multi-regional’. Leave ‘Choose a default storage class for your data’ at the default ‘Standard’, set ‘Choose how to control access to objects’ to ‘Fine-grained’, and click Continue. The last step is to click ‘Create’, which creates the bucket successfully. Below is a screenshot of the same. Bucket name: adta_assi_02_adi
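For reference, the same bucket can be created programmatically with the google-cloud-storage Python client. This is a minimal sketch rather than the method used in the assignment: the project ID is a placeholder, and the "US" location is assumed to correspond to the console's ‘Multi-regional’ choice.

```python
from google.cloud import storage

# Hypothetical project ID; replace with your own GCP project.
client = storage.Client(project="my-gcp-project")

bucket = storage.Bucket(client, name="adta_assi_02_adi")
bucket.storage_class = "STANDARD"  # 'Standard' default storage class

# "US" is a multi-region location. Access control stays fine-grained
# (per-object ACLs) because uniform bucket-level access is not enabled.
new_bucket = client.create_bucket(bucket, location="US")
print(f"Created bucket {new_bucket.name} in {new_bucket.location}")
```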
To create three folders: Next, create three folders named data, logs, and outputs. On the bucket details page, click ‘Create folder’, enter a name, and click Create; repeat this for each of the three names. You now have three folders into which you can load data. Below is a screenshot of the same:
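Cloud Storage has no real directories; the console's ‘Create folder’ button simply writes a zero-byte placeholder object whose name ends in "/". A minimal sketch of the equivalent with the Python client, continuing from the bucket above (the project ID is a placeholder):

```python
from google.cloud import storage

client = storage.Client(project="my-gcp-project")  # hypothetical project ID
bucket = client.bucket("adta_assi_02_adi")

# Each "folder" is just a zero-byte object whose name ends with a slash.
for folder in ("data/", "logs/", "outputs/"):
    bucket.blob(folder).upload_from_string("")
    print(f"Created folder placeholder: gs://{bucket.name}/{folder}")
```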
To upload data to a GCP storage bucket: After creating the folders, click one of them and choose ‘Upload files’. Browse to the two files mentioned in the assignment and upload them. Below is a screenshot of the same:
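The upload can also be scripted. A minimal sketch; the file names below are placeholders, since the two assignment files are not named in this document:

```python
from google.cloud import storage

client = storage.Client(project="my-gcp-project")  # hypothetical project ID
bucket = client.bucket("adta_assi_02_adi")

# Placeholder names; substitute the two files from the assignment.
for local_path in ("file_one.csv", "file_two.csv"):
    blob = bucket.blob(f"data/{local_path}")   # upload into the data/ folder
    blob.upload_from_filename(local_path)
    print(f"Uploaded {local_path} to gs://{bucket.name}/{blob.name}")
```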
To create a Hadoop cluster, first enable the Compute Engine API: Go to the navigation menu, open ‘APIs & Services’, go to the dashboard, and click ‘Enable APIs and Services’. In the search bar, search for ‘Compute Engine’ and enable the ‘Compute Engine API’. The Compute Engine API is now enabled. Below is a screenshot of the same:
To enable the GCP Dataproc API: Again go to the navigation menu, open ‘APIs & Services’, go to the dashboard, and click ‘Enable APIs and Services’. In the search bar, search for ‘Dataproc’ and enable the ‘Cloud Dataproc API’. The Cloud Dataproc API is now enabled. Below is a screenshot of the same:
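Both APIs can also be enabled outside the console through the Service Usage API. The sketch below uses the google-api-python-client discovery client; the project ID is a placeholder, and the request shape is an assumption based on the public Service Usage REST surface rather than anything shown in this assignment:

```python
from googleapiclient import discovery

project = "my-gcp-project"  # hypothetical project ID

# Build a client for the Service Usage API, which exposes services.enable.
serviceusage = discovery.build("serviceusage", "v1")

for api in ("compute.googleapis.com", "dataproc.googleapis.com"):
    request = serviceusage.services().enable(
        name=f"projects/{project}/services/{api}", body={}
    )
    operation = request.execute()  # enabling runs as a long-running operation
    print(f"Enable requested for {api}: {operation.get('name', 'already enabled')}")
```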
To create a Cloud Dataproc cluster: Click the navigation menu and scroll down to ‘Dataproc’. Click ‘Clusters’ and then select ‘Create Cluster’. The next step is setting up the cluster. Set up cluster: Give the cluster a name and enter the location as region ‘us-central1’ and zone ‘us-central1-a’. Set the cluster type to ‘Standard (1 master, N workers)’. (A scripted equivalent of the full cluster configuration is sketched after the create step below.)
Now scroll down and, under ‘Versioning’, click ‘Change’.
Under ‘STANDARD DATAPROC IMAGE’, choose ‘1.5 (Debian 10, Hadoop 2.9, Spark 2.4)’ and click ‘Select’.
Now it's time to configure the master and worker nodes. Click “Configure nodes”.
For the master node, make the following selections:
Select “General-purpose” for the machine family.
Select “E2” for the series.
Select “e2-standard-8 (8 vCPUs, 32 GB memory)” for the machine type.
Select “128 GB” for the primary disk size (min 10 GB).
Select “Standard Persistent Disk” for the primary disk type.
Then scroll down to “Worker nodes”.
Now, under “Worker nodes”, complete the following steps:
Select “General-purpose” for the machine family.
Select “E2” for the series.
Select “e2-standard-4 (4 vCPUs, 16 GB memory)” for the machine type.
Keep “2” for the number of worker nodes.
Select “128 GB” for the primary disk size (min 10 GB).
Select “Standard Persistent Disk” for the primary disk type.
The next step is “Customize cluster”. Under “Customize cluster”, set the primary network and subnetwork to “default” and type “hadoop” under the network tags.
Scroll down to “Cloud Storage staging bucket”. Click “Browse” and find the storage bucket you created previously; I named mine “adta_assi_02_adi”. Click “Select”, and on the next page you can see that “adta_assi_02_adi” is now the Cloud Storage staging bucket.
Now click “Create” (you can see your storage bucket listed there) and wait for the cluster to be provisioned.
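The same cluster configuration can be submitted with the google-cloud-dataproc Python client. This is a minimal sketch that mirrors the console choices above, not the method used in the assignment: the project ID and cluster name are placeholders, and "1.5-debian10" is assumed to be the API name of the console's ‘1.5 (Debian 10, ...)’ image option.

```python
from google.cloud import dataproc_v1

project_id = "my-gcp-project"           # hypothetical project ID
region = "us-central1"
cluster_name = "adta-assi-02-cluster"   # hypothetical cluster name

# Regional endpoint for the Dataproc cluster controller.
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": cluster_name,
    "config": {
        "config_bucket": "adta_assi_02_adi",            # Cloud Storage staging bucket
        "gce_cluster_config": {
            "zone_uri": f"{region}-a",                  # us-central1-a
            "tags": ["hadoop"],                         # network tag
        },
        "software_config": {"image_version": "1.5-debian10"},  # assumed API name
        "master_config": {                              # 1 master node
            "num_instances": 1,
            "machine_type_uri": "e2-standard-8",
            "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 128},
        },
        "worker_config": {                              # 2 worker nodes
            "num_instances": 2,
            "machine_type_uri": "e2-standard-4",
            "disk_config": {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 128},
        },
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
result = operation.result()  # blocks until the cluster is running
print(f"Cluster created: {result.cluster_name}")
```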
The cluster has now been created successfully and is in the running state. To view the Dataproc cluster's master and worker nodes, click the navigation menu (the three horizontal bars in the top-left corner), scroll down to “Compute Engine”, and click it.
I have stopped all the nodes for now.
Now you can see that all three nodes of the cluster are in the running state. You are good to go.
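For reference, the cluster's three VMs can also be listed (and stopped or started) outside the console with the google-cloud-compute Python client. A minimal sketch, assuming the zone used above and a hypothetical project ID:

```python
from google.cloud import compute_v1

project_id = "my-gcp-project"   # hypothetical project ID
zone = "us-central1-a"          # zone chosen when creating the cluster

instances = compute_v1.InstancesClient()

# List every VM in the zone and report its status (RUNNING, TERMINATED, ...).
for instance in instances.list(project=project_id, zone=zone):
    print(f"{instance.name}: {instance.status}")
    # To stop a node, uncomment the next line:
    # instances.stop(project=project_id, zone=zone, instance=instance.name)
```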