Problem D. Find URLs
HTML is a language for representing documents designed to be displayed by a web browser. In many browsers, you can see the HTML source code by right-clicking somewhere on the page and clicking “View Page Source”.

If you try this on a web page with links to other websites, you’ll notice that the URL of the link is usually formatted in the following way:

    href="https://some.website.com/subfolder/more_stuff.txt"

Write a function find_url(html) that takes in a string of HTML text containing exactly one external link URL formatted as above, and returns just the URL string (in the above example, that would be https://some.website.com/subfolder/more_stuff.txt).

You can assume that the only place in the string where the substring href=" occurs is right before the URL, and that the next quotation mark after that point denotes the end of the URL.

Hints:

  • The .find method and string slicing will likely make this easier.

  • Remember that in order to use a double quote mark (") in a string, you either need to escape it with a backslash ("\""), or just use single quotes to begin and end the string ('"').
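The second hint can be checked directly in Python. This small snippet (the variable names are illustrative, not part of the assignment) shows that both spellings produce the same six-character search key href=" that the function needs to look for.

```python
# Two equivalent ways to write the search key href" including its double quote.
key_single = 'href="'    # single-quoted string: no escape needed
key_escaped = "href=\""  # double-quoted string: inner quote escaped with a backslash

print(key_single == key_escaped)  # prints True
print(len(key_single))            # prints 6
```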

Examples:

>>> find_url('title="Association for Computing Machinery">ACM DL</a>: <span class="uid"><a rel="nofollow" class="external text" href="https://dl.acm.org/profile/81100248871">81100248871</a></span></ span></li>')
'https://dl.acm.org/profile/81100248871'

>>> find_url('</a><span class="mw-editsection-bracket">]</span></span></h2><ul><li><a rel="nofollow" class="external text" href="http://www.intactforests.org/">Intact Forest Landscapes</a></li>')
'http://www.intactforests.org/'
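A minimal sketch following the hints, assuming the input always satisfies the problem's guarantees (href=" occurs exactly once, and a closing double quote follows the URL). Because .find returns -1 when a substring is missing, input that breaks those assumptions would yield a meaningless slice rather than an error; no error handling is attempted here.

```python
def find_url(html):
    """Return the URL that follows the single href=" in html.

    Assumes, as the problem states, that href=" appears exactly once
    and that the next double quote after it ends the URL.
    """
    marker = 'href="'
    # Index just past the opening quote of the href value.
    start = html.find(marker) + len(marker)
    # First double quote at or after start marks the end of the URL.
    end = html.find('"', start)
    return html[start:end]


html = ('</a><span class="mw-editsection-bracket">]</span></span></h2><ul><li>'
        '<a rel="nofollow" class="external text" href="http://www.intactforests.org/">'
        'Intact Forest Landscapes</a></li>')
print(find_url(html))  # prints http://www.intactforests.org/
```

Searching for the full href=" marker (rather than just the first double quote in the string) matters because other attributes such as title=" or class=" usually appear earlier in the same text.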
