How to count the number of words in an HTML string and find the read time in Python 3


Programming
by jitender yadav, Jun 30, 2018, 12:07:47 PM | 2 minutes

In this blog we are going to learn how to count the number of words in a string that contains HTML tags, and how to estimate the read time of that string, in Python.

When you write a blog or article in an HTML text editor, the editor produces a string with embedded HTML tags, and that string is saved in the database as it is.

We need to show the reader the read time of a blog/article or the number of words in it. To count the words in such a string, we first strip the HTML tags, as follows in Python:

We need the HTMLParser class from the standard html.parser module for stripping HTML tags, the math library for mathematical operations, and re for regex-related operations.

from html.parser import HTMLParser
import math
import re

Then we need to create a class which extends HTMLParser:

class MLStripper(HTMLParser):
    """
    Class for stripping HTML tags
    """
    def __init__(self):
        # let the base class set up its own state before adding ours
        super().__init__(convert_charrefs=True)
        self.fed = []

    # called by the parser for each run of text between tags;
    # the text is collected in self.fed
    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)
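To see why overriding only handle_data is enough to strip tags, here is a small sketch that records every callback the parser makes. The TracingParser class and the sample string are made up for illustration; handle_starttag, handle_endtag and handle_data are the standard HTMLParser hooks.

```python
from html.parser import HTMLParser


class TracingParser(HTMLParser):  # hypothetical helper, for illustration only
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        # only these events carry the visible text
        self.events.append(('data', data))


parser = TracingParser()
parser.feed('<p>Hello <b>world</b></p>')
parser.close()

for event in parser.events:
    print(event)
```

Tags arrive through the start/end callbacks and text arrives through handle_data, so a class that stores only the data events ends up with the text and nothing else.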


Now write a function which takes an HTML string as input and returns a clean string of words without HTML tags:

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text the parser is still buffering
    return s.get_data()
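A quick way to check the stripper is to run it on a small string. The sketch below repeats the class so that it runs on its own; the sample HTML is a made-up input, not taken from a real editor.

```python
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collects the text found between HTML tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()
    return s.get_data()


# hypothetical sample input
sample_html = '<p>Tom &amp; Jerry are <b>good</b> friends.</p>'
plain_text = strip_tags(sample_html)
print(plain_text)  # Tom & Jerry are good friends.
```

Because convert_charrefs is True, character references such as &amp;amp; are turned into the real character before they reach handle_data.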

Write functions for word count and read time:

def count_words(html_string):
    # html_string = """
    # <h1>This is a title</h1>
    # """
    word_string = strip_tags(html_string)
    count = len(word_string.split())  # without any argument, split() splits on any run of whitespace
    return count
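One thing to keep in mind: get_data() joins the text chunks with an empty string, so when two block tags sit next to each other with no whitespace between them, the last word of one and the first word of the next merge into a single word. The sketch below shows the effect and one possible workaround, joining the chunks with a space. The SpacedStripper class and count_words_joined function are hypothetical names, and the space join is an assumption on my part; it can over-count if a tag appears in the middle of a word.

```python
from html.parser import HTMLParser


class SpacedStripper(HTMLParser):  # hypothetical variant of MLStripper
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self, sep=''):
        # sep='' matches the original behaviour; sep=' ' keeps chunks apart
        return sep.join(self.fed)


def count_words_joined(html_string, sep):
    s = SpacedStripper()
    s.feed(html_string)
    s.close()
    return len(s.get_data(sep).split())


# hypothetical sample: no whitespace between </h1> and <p>
sample_html = '<h1>This is a title</h1><p>And a paragraph</p>'
glued = count_words_joined(sample_html, '')
spaced = count_words_joined(sample_html, ' ')
print(glued, spaced)  # 6 7
```

With the empty join, "title" and "And" become "titleAnd", so the count is one short. For a read-time estimate this small error rarely matters, but it adds up if you display exact word counts.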


def get_read_time(html_string):
    count = count_words(html_string)
    read_time_min = math.ceil(count / 200.0)  # assuming a reading speed of 200 wpm
    return int(read_time_min)
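The read-time arithmetic is easy to check by hand. The sketch below applies the same formula to a few word counts; read_time_from_count is a hypothetical helper that takes the count directly so the example runs without the HTML code.

```python
import math

WORDS_PER_MINUTE = 200  # the reading speed assumed in this post


def read_time_from_count(count, wpm=WORDS_PER_MINUTE):  # hypothetical helper
    # round up, so a 201-word post is shown as 2 minutes
    return math.ceil(count / wpm)


results = {count: read_time_from_count(count) for count in (0, 1, 200, 201, 950)}
print(results)  # {0: 0, 1: 1, 200: 1, 201: 2, 950: 5}
```

Note that an empty post gives 0 minutes. If you would rather always show at least 1 minute, wrapping the result in max(1, ...) is one option. In Python 3, math.ceil already returns an int, so the extra int() call is harmless.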

We can also count words using regex:

def count_words(html_string):
    # html_string = """
    # <h1>This is a title</h1>
    # """
    word_string = strip_tags(html_string)
    words = re.findall(r'\w+', word_string)  # each run of letters, digits or underscores is one word
    count = len(words)
    return count
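The two counting methods do not always agree. split() breaks only on whitespace, while \w+ also breaks on apostrophes, hyphens and other punctuation. The sketch below compares them on a made-up sentence.

```python
import re

# hypothetical sample text with apostrophes, hyphens and dots
text = "Don't over-think it: e-mail me at 9 a.m."

by_whitespace = text.split()
by_regex = re.findall(r'\w+', text)

print(len(by_whitespace), by_whitespace)
print(len(by_regex), by_regex)
```

Here split() finds 8 words and the regex finds 12, because "Don't" becomes "Don" and "t", "over-think" becomes "over" and "think", and so on. For a read-time estimate either is fine; pick the one whose idea of a word matches what you want to show.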


Hope this will help.


