How to turn media articles into metrics data with RSS feeds
News and media serve as a powerful source of data. News captures current events as they unfold in different places across the world. Media articles can have large implications for industries like investment management, finance, cybersecurity and politics. For almost any business, the news is a vital way to gauge the exposure that a key interest is getting from the public.
However, using news and media as a searchable or monitorable source of data is difficult. Most media exists as web pages with deeply intricate HTML patterns that often change over time. Data in this form is hard to extract, and scraping the web can often get you or your business into compliance and legal issues.
RSS, which stands for Really Simple Syndication, is an XML format used to provide a summarized “feed” of recent updates for a website. News and media websites use RSS to provide a list of top stories in a concise, easily parsable format. Such websites often publish this feed through a .rss link. It’s possible to use RSS to convert news data into event data and metrics data that can then be fed into a wide variety of data-intensive applications.
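To make the idea concrete, here is a minimal sketch in Python of turning feed items into event records and then into a simple metric. It assumes a standard RSS 2.0 layout (`item` elements with `title`, `link` and `pubDate`). The sample feed, the helper names `parse_items` and `mentions_per_day`, and the "keyword mentions per day" metric are all made up for illustration, not taken from any particular publisher or library.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from email.utils import parsedate_to_datetime

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://example.com/</link>
    <description>Top stories</description>
    <item>
      <title>Bank reports data breach</title>
      <link>https://example.com/a</link>
      <pubDate>Mon, 06 Jan 2025 09:30:00 GMT</pubDate>
    </item>
    <item>
      <title>Markets rally on rate news</title>
      <link>https://example.com/b</link>
      <pubDate>Mon, 06 Jan 2025 14:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second breach hits retailer</title>
      <link>https://example.com/c</link>
      <pubDate>Tue, 07 Jan 2025 08:15:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Turn each <item> in the feed into a plain event dictionary."""
    root = ET.fromstring(feed_xml)
    events = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        events.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # pubDate is optional in RSS 2.0, so tolerate its absence.
            "published": parsedate_to_datetime(raw_date) if raw_date else None,
        })
    return events


def mentions_per_day(events, keyword):
    """Count events per calendar day whose title contains the keyword."""
    counts = Counter()
    for event in events:
        if event["published"] is None:
            continue
        if keyword.lower() in event["title"].lower():
            counts[event["published"].date().isoformat()] += 1
    return dict(counts)


events = parse_items(SAMPLE_FEED)
print(mentions_per_day(events, "breach"))
```

In a real pipeline the feed text would be downloaded from the site's feed URL on a schedule, and the per-day counts would be written to whatever metrics store you use. Matching on the title alone is a deliberate simplification.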
The Structure
To start, let’s go over the basics of RSS feed structure. For an example, here’s the RSS…