How to convert HTML code to Text with Python (solved)
pip install beautifulsoup4
pip install html5lib
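from bs4 import BeautifulSoup

html_doc = """Some HTML code that you want to convert"""
# "html5lib" selects the parser installed above; without a parser argument,
# Beautiful Soup picks one itself and emits a warning.
soup = BeautifulSoup(html_doc, "html5lib")
print(soup.get_text())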
Of course, you need to import the module if you want to call its functions. That is the role of the line from bs4 import BeautifulSoup.
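get_text() also accepts separator and strip arguments, and you usually do not want the contents of <script> and <style> tags in your text. Here is a minimal sketch of a slightly more careful conversion. The html_to_text() helper name and the sample HTML are my own, for illustration only, and the sketch uses Python's built-in "html.parser" instead of html5lib so that it works with beautifulsoup4 alone.

```python
from bs4 import BeautifulSoup


def html_to_text(html: str) -> str:
    """Convert an HTML string to plain text with Beautiful Soup."""
    soup = BeautifulSoup(html, "html.parser")
    # <script> and <style> contents are not visible text, so remove them first.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # separator is placed between text fragments; strip=True trims whitespace
    # around each fragment and skips fragments that are only whitespace.
    return soup.get_text(separator="\n", strip=True)


html_doc = """<html><head><style>p {color: red}</style></head>
<body><h1>Title</h1><p>Hello <b>world</b>!</p><script>var x = 1;</script></body></html>"""

print(html_to_text(html_doc))
```

Note that the separator is inserted between every text fragment, including those split by inline tags such as <b>, so "Hello <b>world</b>!" comes out on three lines. Pass separator=" " instead if you prefer everything on one line.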
[Image: HTML source code (photo credit: Wikipedia)]
Alternatives to BeautifulSoup to implement HTML2text

Stripogram might be an alternative to Beautiful Soup, but I must say, I am fully satisfied with Beautiful Soup.
from stripogram import html2text, html2safehtml

original_html = """Some HTML code that you want to convert"""

# Only allow <b>, <a>, <i>, <br>, and <p> tags
clean_html = html2safehtml(original_html, valid_tags=("b", "a", "i", "br", "p"))

# Don't process <img> tags, just strip them out. Use an indent of 4 spaces
# and a page that's 80 characters wide.
text = html2text(original_html, ignore_tags=("img",), indent_width=4, page_width=80)
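If you would rather not depend on any third-party package, a rough html2text-style converter can be sketched with the standard library's html.parser.HTMLParser. This is not how Stripogram works internally: the TextExtractor class, the strip_html() helper and the hand-picked set of tags that trigger a line break are my own assumptions for illustration. The sketch keeps text nodes, skips <script> and <style> contents and decodes entities such as &amp;. <img> tags carry no text, so they simply disappear.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}
    # Tags whose end starts a new line (a hand-picked, incomplete list).
    BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = "".join(parser.parts)
    # Collapse runs of whitespace on each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


sample = "<p>Hello <b>world</b> &amp; friends</p><script>var x = 1;</script><p>Second<br>line</p>"
print(strip_html(sample))
```

Unlike Stripogram's html2text, this sketch does no indentation or line wrapping to a page width; it only extracts and tidies the text.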
If you want to improve your coding skills, I advise you to look at "Cracking the Coding Interview: 150 Programming Questions and Solutions". It was written by Gayle Laakmann McDowell, a former Google software engineer who also worked at Apple. I find it really great!