OPML to Markdown


When you export your podcast subscriptions from Overcast, you get an OPML file something like this:

<?xml version="1.0"?>
<!-- example OPML file -->
<opml version="1.0">
<head>
<title>Overcast Podcast Subscriptions</title>
</head>
<body>
<outline type="rss" text="Richard Herring's Leicester Square Theatre Podcast" title="Richard Herring's Leicester Square Theatre Podcast" xmlUrl="http://feeds.feedburner.com/RichardHerringLSTPodcast" htmlUrl="https://www.comedy.co.uk/podcasts/richard_herring_lst_podcast/"/>
<outline type="rss" text="5by5 at the Movies" title="5by5 at the Movies" xmlUrl="http://feeds.5by5.tv/movies" htmlUrl="http://5by5.tv/movies"/>
<outline type="rss" text="TV Talk Machine" title="TV Talk Machine" xmlUrl="http://feeds.theincomparable.com/tvtm" htmlUrl="https://www.theincomparable.com/tvtm/"/>
<outline type="rss" text="A STORM OF SPOILERS - A Pop Culture Podcast" title="A STORM OF SPOILERS - A Pop Culture Podcast" xmlUrl="http://feeds.feedburner.com/AStormOfSpoilers" htmlUrl="http://stormofspoilers.com/"/>
<outline type="rss" text="Reconcilable Differences" title="Reconcilable Differences" xmlUrl="https://www.relay.fm/rd/feed" htmlUrl="https://www.relay.fm/rd"/>
<outline type="rss" text="Query" title="Query" xmlUrl="https://www.relay.fm/query/feed" htmlUrl="https://www.relay.fm/query"/>
<outline type="rss" text="Omnibus" title="Omnibus" xmlUrl="https://feeds.megaphone.fm/omnibus" htmlUrl="https://www.omnibusproject.com/"/>
<outline type="rss" text="Techmeme Ride Home" title="Techmeme Ride Home" xmlUrl="http://feeds.feedburner.com/TechmemeRideHome" htmlUrl="https://www.techmeme.com/"/>
<outline type="rss" text="Stuff You Should Know" title="Stuff You Should Know" xmlUrl="https://feeds.megaphone.fm/stuffyoushouldknow" htmlUrl="https://www.howstuffworks.com/"/>
<outline type="rss" text="Following The Leftovers (Ad-Free)" title="Following The Leftovers (Ad-Free)" xmlUrl="http://username:password@baldmove.com/feed/ad-free-the-leftovers/" htmlUrl="http://baldmove.com/category/the-leftovers/"/>
</body>
</opml>

Note that the file includes any usernames and passwords embedded in feed URLs, as in the last entry above.
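If you want to share a file like this, the credentials can be removed generically rather than by matching one known string. A minimal sketch, assuming Python 3 and only the standard library; `strip_credentials` is a hypothetical helper name used for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Return url without any user:password@ part in front of the host."""
    parts = urlsplit(url)
    # Everything before the last "@" in the network location is user info;
    # rpartition leaves the netloc untouched when there is no "@".
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=host))
```

Called on the Bald Move feed URL from the example, this should return the same URL starting at `http://baldmove.com/`, while URLs without credentials pass through unchanged.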

I wanted to convert the OPML to a Markdown list. Something like this:

* [text](htmlUrl) [RSS](xmlUrl)
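To make the mapping from OPML attributes to that Markdown line concrete, here is a minimal standard-library sketch. It is only an illustration under assumptions: a flat export like the one above, hypothetical names `opml_to_markdown` and `SAMPLE`, and none of the clean-up described later in this post.

```python
import xml.etree.ElementTree as ET

# Two entries copied from the example export above (assumed flat, no folders).
SAMPLE = """<?xml version="1.0"?>
<opml version="1.0">
<body>
<outline type="rss" text="Query" title="Query" xmlUrl="https://www.relay.fm/query/feed" htmlUrl="https://www.relay.fm/query"/>
<outline type="rss" text="Omnibus" title="Omnibus" xmlUrl="https://feeds.megaphone.fm/omnibus" htmlUrl="https://www.omnibusproject.com/"/>
</body>
</opml>"""


def opml_to_markdown(opml_text):
    """Turn every <outline> that has an xmlUrl into one Markdown list item."""
    root = ET.fromstring(opml_text)
    lines = []
    for node in root.iter("outline"):
        feed = node.get("xmlUrl")
        if feed is None:
            continue  # skip outlines that only group other outlines
        title = node.get("text") or node.get("title") or feed
        home = node.get("htmlUrl") or feed
        lines.append(f"* [{title}]({home}) [RSS]({feed})")
    return "\n".join(sorted(lines))
```

`print(opml_to_markdown(SAMPLE))` should list Omnibus first and Query second, each with a home page link and an `[RSS]` feed link.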

Well, actually a little fancier: I wanted a FontAwesome RSS SVG icon as the feed link:

* [text](htmlUrl) <a style="color:#fa9b39" href="xmlUrl" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>

I customized a Python script by Dom Davis to:

  • Output an unordered list instead of headers.
  • Sort the list alphabetically.
  • In addition to the title (text), extract:
    • the feed URL (xmlUrl)
    • the podcast home page (htmlUrl)
  • Strip username:password from feed URLs.
  • Strip (Ad-Free) from titles, as it made each line too long.
# Based on https://gist.github.com/domdavis/9988867
# Changed to handle podcast export OPML from Overcast. e.g.
# <outline type="rss" text="Road Work" title="Road Work" xmlUrl="http://feeds.5by5.tv/roadwork" htmlUrl="http://5by5.tv/roadwork"/>
# I wanted to grab the xmlUrl and htmlUrl and output a <ul> with links to page and feed.
# also strips usernames and passwords if set in the xmlUrl - add yours to the passwd var
# $ pip install opml
# $ python opml2md.py some_outline.opml
# -> some_outline.md
import codecs
import opml
import sys

INPUT = sys.argv[1]
OUTPUT = '.'.join(INPUT.split('.')[:-1] + ['md'])

with codecs.open(INPUT, 'r') as f:
    outline = opml.from_string(f.read())

blocks = []
passwd = "username:password@"
adFreeStr = " (Ad-Free)"


# * [The Talk Show With John Gruber](htmlUrl) [RSS](xmlUrl)
def substring_after(s, delim):
    return s.partition(delim)[2]


def substring_before(s, delim):
    return s.partition(delim)[0]


def strip_end(text, suffix):
    if not text.endswith(suffix):
        return text
    return text[:len(text)-len(suffix)]


def _extractBlocks(indent, node):
    xmlURL = ""
    textStr = ""
    for child in node:
        if indent == 0:
            # strip password if present
            if passwd in child.xmlUrl:
                prefix = substring_before(child.xmlUrl, passwd)
                suffix = substring_after(child.xmlUrl, passwd)
                xmlURL = prefix + suffix
            else:
                xmlURL = child.xmlUrl
            # strip (Ad-Free) if present - makes lines too long
            if adFreeStr in child.text:
                textStr = strip_end(child.text, adFreeStr)
            else:
                textStr = child.text
            # alternative output without the FontAwesome SVG stuff
            # text = "* [" + child.text + "](" + child.htmlUrl + ") [RSS](" + xmlURL + ")\n";
            text = "* [" + textStr + "](" + child.htmlUrl + ") " + '<a style="color:#fa9b39" href="' + xmlURL + '" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>'
        else:
            depth = 4 * (indent - 1)
            text = (" " * depth) + "* " + child.text
        blocks.append(text)
        if len(child) > 0:
            depth = indent + 1
            _extractBlocks(depth, child)


_extractBlocks(0, outline)
output_content = '\n'.join(sorted(blocks))

with codecs.open(OUTPUT, 'w', 'utf-8') as f:
    f.write(output_content)

print('->', OUTPUT)

This generates the Markdown for you to paste into your post:

* [5by5 at the Movies](http://5by5.tv/movies) <a style="color:#fa9b39" href="http://feeds.5by5.tv/movies" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [A Cast of Kings - A Game of Thrones Podcast](http://www.slashfilm.com/) <a style="color:#fa9b39" href="http://feeds.feedburner.com/castofkings" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [A STORM OF SPOILERS - A Pop Culture Podcast](http://stormofspoilers.com/) <a style="color:#fa9b39" href="http://feeds.feedburner.com/AStormOfSpoilers" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Accidental Tech Podcast](http://atp.fm/) <a style="color:#fa9b39" href="http://atp.fm/episodes?format=rss" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Amber Nectar HCAFC](http://www.ambernectar.org/) <a style="color:#fa9b39" href="http://feeds.soundcloud.com/users/soundcloud:users:54747069/sounds.rss" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Back to Work](http://5by5.tv/b2w) <a style="color:#fa9b39" href="http://feeds.5by5.tv/b2w" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Bald Move TV](http://baldmove.com/category/tv-podcast) <a style="color:#fa9b39" href="http://baldmove.com/feed/ad-free-tv-podcast/" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Bald Movies](http://baldmove.com/category/bald-movies/) <a style="color:#fa9b39" href="http://baldmove.com/feed/ad-free-bald-movies/" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
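One detail about the ordering: Python's `sorted()` compares strings by code point, so every uppercase letter sorts before any lowercase one. That makes no difference for the titles shown here, but a title starting with a lowercase letter would land at the end of the list. If a case-insensitive order is preferred, a small sketch (the titles and example.com URLs below are made up for illustration):

```python
# Hypothetical list items, in the shape the script produces.
blocks = [
    "* [iPhreaks](https://example.com/iphreaks)",
    "* [Zebra Cast](https://example.com/zebra)",
    "* [Accidental Tech Podcast](http://atp.fm/)",
]

# Plain code-point order: "Z" sorts before "i".
default_order = sorted(blocks)

# Case-insensitive order: compare the case-folded form of each line.
folded_order = sorted(blocks, key=str.casefold)
```

With the default order the iPhreaks line comes last; with `key=str.casefold` it sits between the other two.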

To remove the real username:password from opml2md.py before posting the gist, a little sed:

sed -i '' "s/passwd = .*/passwd = \"username:password@\"/" opml2md.py
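
The `-i ''` form is the BSD/macOS spelling of in-place editing; GNU sed takes `-i` without the separate empty argument. Where sed is not convenient, the same substitution can be sketched in Python. `redact_passwd` and `PLACEHOLDER` are hypothetical names, and the pattern simply mirrors the sed expression above:

```python
import re

# The placeholder line that should appear in the published script.
PLACEHOLDER = 'passwd = "username:password@"'


def redact_passwd(source):
    """Replace everything from 'passwd = ' to the end of that line."""
    # "." does not match newlines, so only the rest of the line is replaced.
    # A function is used as the replacement so it is inserted literally.
    return re.sub(r"passwd = .*", lambda _match: PLACEHOLDER, source)
```

To apply it to the file, read opml2md.py into a string, pass it through `redact_passwd`, and write the result back.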