How to count the number of words in an HTML string and find the read time in Python 3
In this blog we are going to learn how to count the number of words in a string that contains HTML tags, and how to estimate the read time of that string, in Python.
When you write a blog or article in an HTML text editor, the editor produces a string with embedded HTML tags, which is saved in the database as it is.
We need to show the reader the read time of a blog/article or the number of words in it. We can count the words in a string with HTML tags by first stripping the tags, as follows in Python:
We need HTMLParser from the html.parser module for stripping HTML tags, the math library for mathematical operations, and re for regex operations.
from html.parser import HTMLParser
import math
import re
Then we need to create a class that subclasses HTMLParser:
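class MLStripper(HTMLParser):
    """
    Class for stripping HTML tags
    """
    def __init__(self):
        # initialise the base parser; convert_charrefs=True decodes
        # character references such as &amp; before handle_data sees them
        super().__init__(convert_charrefs=True)
        self.fed = []

    # this function is called with each run of text between tags and stores it in self.fed
    def handle_data(self, d):
        self.fed.append(d)

    # return all collected text as a single string
    def get_data(self):
        return ''.join(self.fed)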
Now write a function that takes an HTML string as input and returns a clean string of words without the HTML tags:
def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text the parser is still buffering
    return s.get_data()
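As a quick check of what the stripper returns, here is a minimal sketch. The sample strings are made up for illustration, and the class and function are repeated in compact form only so that the snippet runs on its own. It also shows one thing to keep in mind: text from adjacent tags is joined without a separator, so words stay apart only where the HTML itself contains whitespace.

```python
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collects the text found between HTML tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any buffered text
    return s.get_data()


# hypothetical sample input, not taken from a real article
sample = "<h1>Tom &amp; Jerry</h1> <p>A <b>short</b> story.</p>"
text = strip_tags(sample)
print(text)  # Tom & Jerry A short story.

# no whitespace between the two paragraphs, so the words run together
print(strip_tags("<p>one</p><p>two</p>"))  # onetwo
```

If the editor saves block elements back to back like the second sample, the word count will come out too low, so it is worth checking how the stored HTML is formatted.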
Next, write functions for the word count and the read time:
def count_words(html_string):
    # html_string = """
    # <h1>This is a title</h1>
    # """
    word_string = strip_tags(html_string)
    count = len(word_string.split())  # without any argument, split() splits on runs of whitespace
    return count


def get_read_time(html_string):
    count = count_words(html_string)
    read_time_min = math.ceil(count / 200.0)  # assuming a reading speed of 200 wpm
    return int(read_time_min)
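The rounding in get_read_time can be checked on its own with plain word counts. This is only a sketch: `read_time_minutes` and its `wpm` parameter are hypothetical names used for illustration, and the formula is the same ceiling division at 200 words per minute.

```python
import math


def read_time_minutes(word_count, wpm=200):
    # hypothetical helper: same formula as get_read_time,
    # but it takes a word count directly instead of an HTML string
    return math.ceil(word_count / wpm)


for n in (0, 1, 200, 201, 450):
    print(n, read_time_minutes(n))
# 0 0
# 1 1
# 200 1
# 201 2
# 450 3
```

Because of the ceiling, any non-empty text shows at least 1 minute, while an empty string gives 0.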
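# Alternative version of count_words: instead of splitting on whitespace,
# it counts runs of word characters (letters, digits and underscore).
# If both versions are kept in the same file, this later definition replaces the earlier one.
def count_words(html_string):
    # html_string = """
    # <h1>This is a title</h1>
    # """
    word_string = strip_tags(html_string)
    words = re.findall(r'\w+', word_string)
    count = len(words)  #joincfe.com/projects/
    return count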
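The two versions of count_words can give different totals for the same text, because split() counts whitespace-separated tokens while the regex counts runs of word characters. A small sketch on a made-up sentence (assumed to be already stripped of HTML tags) shows the difference:

```python
import re

# hypothetical sample text, already stripped of HTML tags
text = "Don't stop - it's a well-known fact."

by_whitespace = text.split()
by_regex = re.findall(r'\w+', text)

print(len(by_whitespace), by_whitespace)
# 7 ["Don't", 'stop', '-', "it's", 'a', 'well-known', 'fact.']
print(len(by_regex), by_regex)
# 9 ['Don', 't', 'stop', 'it', 's', 'a', 'well', 'known', 'fact']
```

The whitespace version counts a lone dash as a word, while the regex version splits contractions and hyphenated words in two. For a read-time estimate either is probably close enough, so pick whichever matches how you want words to be counted.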
Hope this will help.
Jun 30, 2018, 12:07:47 PM | jitender yadav