class7exercise.pdf

School

New York University

Course

0232

Subject

Computer Science

Date

Dec 6, 2023

1 Measuring artificial intelligence discussion in earnings calls

On Brightspace, the zip file tech_earnings_call_transcripts.zip contains six JSON files with earnings call information for recent earnings calls of the six tech companies we have studied. Download it and unzip it into a folder on your computer. (Here is an example of one of these earnings calls in video format: https://www.microsoft.com/en-us/Investor/events/FY-2023/earnings-fy-2023-q4.aspx .)

The following code uses the package “json” to load one of the JSON files into an object called “json_data” (you may need an os.chdir() prior to this code to move into the directory containing the .json files).

    import json

    # Open one transcript and parse it; the file is closed automatically.
    with open('transcript_id__2795186.json', 'r') as file:
        json_data = json.load(file)

1. Run the above code to load one of the earnings call transcripts. What is the type of the json_data object? Once you know its type, play around with it to figure out how it is structured (“dumping it to the console” prints it out fairly readably, for example). In particular, figure out where the text that people say is stored in the object (hint: if you are stuck, try type(json_data['components']) and type(json_data['components'][0])).

2. For each of these six earnings calls, I want two measures: (1) how often did they talk about Artificial Intelligence in the 'Presenter Speech', and (2) how often did they talk about Artificial Intelligence in the other dialogue (the questions and answers)? Your code should create three lists (or arrays): a list of company names, a list of measures for my first question, and a list of measures for my second question. The lists should line up (so the first elements of each list correspond, the second elements correspond, etc.). To accomplish this:

   a. After loading an earnings call, use a for loop to create two variables of text: presenterSpeechesBlob and questionsAndAnswersBlob.
      The for loop should concatenate the spoken text to the first variable (blob of text) using + if it is a presenter speech; otherwise, it should concatenate to the second using +. Then you can analyze these blobs of text as strings.

   b. To loop over the JSON files, consider using (after import os):

          json_files = [pos_json for pos_json in os.listdir('./') if pos_json.endswith('.json')]

   c. Then write code that uses these two variables containing strings of text to construct the desired numeric measures. To do this:

      - The Python string method “.count()” counts the number of non-overlapping times a substring occurs in the string. (Python strings have no “.contains()” method.) For example, try "hi hi hey".count("hi")
      - Some substrings that indicate “talk about AI” include “AI”, “Artificial Intelligence”, “OpenAI”, “ChatGPT”, “Large Language”. Feel free to use these or come up with your own. Also feel free to apply any weights you think appropriate (maybe you think a mention of OpenAI should count more than just AI?).
      - You’ll need to normalize your measure by the length of the blob of text; otherwise, longer text will show more speech about AI simply because it is longer. One trick is to use the count method to count the number of spaces in the text. The number of spaces is approximately the number of words, and so is a good denominator to normalize your measure by.

3. For which companies is your Question & Answer index bigger than your Presenter Speech index?
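The blob-building and normalization steps in question 2 can be sketched roughly as follows. This is only one possible approach, not a reference solution. The dictionary keys "componenttypename" and "text" are hypothetical placeholders, since the handout does not state the field names; inspect json_data['components'][0] to find the real ones. The equal weights are also just an illustrative choice.

```python
# Substrings that signal AI talk, each with a weight.
# The weights (all 1.0) are an illustrative assumption; adjust as you see fit.
AI_TERMS = {
    "AI": 1.0,
    "Artificial Intelligence": 1.0,
    "OpenAI": 1.0,
    "ChatGPT": 1.0,
    "Large Language": 1.0,
}


def split_blobs(components, type_key="componenttypename", text_key="text"):
    """Concatenate spoken text into a presenter blob and a Q&A blob.

    type_key and text_key are HYPOTHETICAL field names; check the real
    structure of json_data['components'] before relying on them.
    """
    presenter_blob = ""
    qa_blob = ""
    for component in components:
        if component[type_key] == "Presenter Speech":
            presenter_blob = presenter_blob + component[text_key] + " "
        else:
            qa_blob = qa_blob + component[text_key] + " "
    return presenter_blob, qa_blob


def ai_index(text, terms=AI_TERMS):
    """Weighted count of AI substrings divided by the number of spaces."""
    n_spaces = text.count(" ")  # roughly the number of words
    if n_spaces == 0:
        return 0.0  # avoid dividing by zero on empty or one-word text
    # str.count is case-sensitive and counts non-overlapping matches.
    # Note that "AI" also matches inside "OpenAI", so OpenAI is counted twice.
    weighted_hits = sum(w * text.count(term) for term, w in terms.items())
    return weighted_hits / n_spaces


# Tiny made-up example standing in for json_data['components'].
demo_components = [
    {"componenttypename": "Presenter Speech", "text": "We invest in AI and AI tools"},
    {"componenttypename": "Question", "text": "What about ChatGPT costs"},
    {"componenttypename": "Answer", "text": "We like OpenAI a lot"},
]
presenter_blob, qa_blob = split_blobs(demo_components)
print(ai_index(presenter_blob), ai_index(qa_blob))
```

To cover all six calls, the same two functions could be applied inside a loop over json_files, appending the company name and the two indices to three parallel lists.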
