OPML to Markdown
When you export your podcast subscriptions from Overcast, you get an OPML file something like this:
<?xml version="1.0"?>
<!-- example OPML file -->
<opml version="1.0">
  <head>
    <title>Overcast Podcast Subscriptions</title>
  </head>
  <body>
    <outline type="rss" text="Richard Herring's Leicester Square Theatre Podcast" title="Richard Herring's Leicester Square Theatre Podcast" xmlUrl="http://feeds.feedburner.com/RichardHerringLSTPodcast" htmlUrl="https://www.comedy.co.uk/podcasts/richard_herring_lst_podcast/"/>
    <outline type="rss" text="5by5 at the Movies" title="5by5 at the Movies" xmlUrl="http://feeds.5by5.tv/movies" htmlUrl="http://5by5.tv/movies"/>
    <outline type="rss" text="TV Talk Machine" title="TV Talk Machine" xmlUrl="http://feeds.theincomparable.com/tvtm" htmlUrl="https://www.theincomparable.com/tvtm/"/>
    <outline type="rss" text="A STORM OF SPOILERS - A Pop Culture Podcast" title="A STORM OF SPOILERS - A Pop Culture Podcast" xmlUrl="http://feeds.feedburner.com/AStormOfSpoilers" htmlUrl="http://stormofspoilers.com/"/>
    <outline type="rss" text="Reconcilable Differences" title="Reconcilable Differences" xmlUrl="https://www.relay.fm/rd/feed" htmlUrl="https://www.relay.fm/rd"/>
    <outline type="rss" text="Query" title="Query" xmlUrl="https://www.relay.fm/query/feed" htmlUrl="https://www.relay.fm/query"/>
    <outline type="rss" text="Omnibus" title="Omnibus" xmlUrl="https://feeds.megaphone.fm/omnibus" htmlUrl="https://www.omnibusproject.com/"/>
    <outline type="rss" text="Techmeme Ride Home" title="Techmeme Ride Home" xmlUrl="http://feeds.feedburner.com/TechmemeRideHome" htmlUrl="https://www.techmeme.com/"/>
    <outline type="rss" text="Stuff You Should Know" title="Stuff You Should Know" xmlUrl="https://feeds.megaphone.fm/stuffyoushouldknow" htmlUrl="https://www.howstuffworks.com/"/>
    <outline type="rss" text="Following The Leftovers (Ad-Free)" title="Following The Leftovers (Ad-Free)" xmlUrl="http://username:password@baldmove.com/feed/ad-free-the-leftovers/" htmlUrl="http://baldmove.com/category/the-leftovers/"/>
  </body>
</opml>
Note that the file can contain usernames and passwords in the feed URLs, as in the last entry above.
I wanted to convert the OPML to a Markdown list. Something like this:
* [text](htmlUrl) [RSS](xmlUrl)
Well, a little fancier: I wanted a FontAwesome RSS SVG icon as the feed link:
* [text](htmlUrl) <a style="color:#fa9b39" href="xmlUrl" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
I customized a Python script by Dom Davis to:
- Output an unordered list instead of headers.
- Sort the list alphabetically.
- In addition to the title (text), extract the feed URL (xmlUrl) and the podcast home page (htmlUrl).
- Strip username:password from feed URLs.
- Strip (Ad-Free) from titles, as it made each line too long.
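The two stripping steps can also be sketched without a hard-coded credential string. This is not the approach the script below takes (it matches a literal passwd value); it is a minimal alternative that assumes the standard library's urllib.parse is acceptable. The helper names strip_credentials and strip_suffix are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Return url without any "user:password@" part in front of the host."""
    parts = urlsplit(url)
    # Everything after the last "@" in the network location is host[:port].
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=host))


def strip_suffix(title, suffix=" (Ad-Free)"):
    """Return title without a trailing suffix, if the suffix is present."""
    if suffix and title.endswith(suffix):
        return title[: -len(suffix)]
    return title


print(strip_credentials("http://username:password@baldmove.com/feed/ad-free-the-leftovers/"))
print(strip_suffix("Following The Leftovers (Ad-Free)"))
```

Because the credentials are found by URL structure rather than by an exact string, nothing secret needs to live in the script itself.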
# Based on https://gist.github.com/domdavis/9988867
# Changed to handle podcast export OPML from Overcast. e.g.
# <outline type="rss" text="Road Work" title="Road Work" xmlUrl="http://feeds.5by5.tv/roadwork" htmlUrl="http://5by5.tv/roadwork"/>
# I wanted to grab the xmlUrl and htmlUrl and output a <ul> with links to page and feed.
# also strips usernames and passwords if set in the xmlUrl - add yours to the passwd var
# $ pip install opml
# $ python opml2md.py some_outline.opml
# -> some_outline.md
import codecs
import opml
import sys

INPUT = sys.argv[1]
OUTPUT = '.'.join(INPUT.split('.')[:-1] + ['md'])

with codecs.open(INPUT, 'r') as f:
    outline = opml.from_string(f.read())

blocks = []
passwd = "username:password@"
adFreeStr = " (Ad-Free)"

# * [The Talk Show With John Gruber](htmlUrl) [RSS](xmlUrl)

def substring_after(s, delim):
    return s.partition(delim)[2]

def substring_before(s, delim):
    return s.partition(delim)[0]

def strip_end(text, suffix):
    if not text.endswith(suffix):
        return text
    return text[:len(text)-len(suffix)]

def _extractBlocks(indent, node):
    xmlURL = ""
    textStr = ""
    for child in node:
        if indent == 0:
            # strip password if present
            if passwd in child.xmlUrl:
                prefix = substring_before(child.xmlUrl, passwd)
                suffix = substring_after(child.xmlUrl, passwd)
                xmlURL = prefix + suffix
            else:
                xmlURL = child.xmlUrl
            # strip (Ad-Free) if present - makes lines too long
            if adFreeStr in child.text:
                textStr = strip_end(child.text, adFreeStr)
            else:
                textStr = child.text
            # alternative output without the FontAwesome SVG stuff
            # text = "* [" + child.text + "](" + child.htmlUrl + ") [RSS](" + xmlURL + ")\n";
            text = "* [" + textStr + "](" + child.htmlUrl + ") " + '<a style="color:#fa9b39" href="' + xmlURL + '" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>'
        else:
            depth = 4 * (indent - 1)
            text = (" " * depth) + "* " + child.text
        blocks.append(text)
        if len(child) > 0:
            depth = indent + 1
            _extractBlocks(depth, child)

_extractBlocks(0, outline)
output_content = '\n'.join(sorted(blocks))

with codecs.open(OUTPUT, 'w', 'utf-8') as f:
    f.write(output_content)

print('->', OUTPUT)
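If installing the opml package is not an option, the same idea can be sketched with only the standard library. This is an assumption-laden variant rather than the script above: it uses xml.etree.ElementTree, emits the plain [RSS] link form instead of the FontAwesome icon, leaves (Ad-Free) in titles, and sorts case-insensitively. The function name opml_to_markdown and the SAMPLE string are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit


def opml_to_markdown(opml_text):
    """Return a sorted Markdown list: * [title](home page) [RSS](feed)."""
    root = ET.fromstring(opml_text)
    lines = []
    for outline in root.iter("outline"):
        feed = outline.get("xmlUrl")
        if not feed:
            continue  # skip outlines that carry no feed URL
        parts = urlsplit(feed)
        # Drop any "user:password@" prefix from the host part of the feed URL.
        host = parts.netloc.rpartition("@")[2]
        feed = urlunsplit(parts._replace(netloc=host))
        title = outline.get("text", "")
        home = outline.get("htmlUrl", "")
        lines.append(f"* [{title}]({home}) [RSS]({feed})")
    # casefold makes the ordering case-insensitive.
    return "\n".join(sorted(lines, key=str.casefold))


SAMPLE = """<?xml version="1.0"?>
<opml version="1.0">
<body>
<outline type="rss" text="Query" xmlUrl="https://www.relay.fm/query/feed" htmlUrl="https://www.relay.fm/query"/>
<outline type="rss" text="Following The Leftovers (Ad-Free)" xmlUrl="http://username:password@baldmove.com/feed/ad-free-the-leftovers/" htmlUrl="http://baldmove.com/category/the-leftovers/"/>
</body>
</opml>"""

print(opml_to_markdown(SAMPLE))
```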
This generates the Markdown for you to paste into your post:
* [5by5 at the Movies](http://5by5.tv/movies) <a style="color:#fa9b39" href="http://feeds.5by5.tv/movies" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [A Cast of Kings - A Game of Thrones Podcast](http://www.slashfilm.com/) <a style="color:#fa9b39" href="http://feeds.feedburner.com/castofkings" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [A STORM OF SPOILERS - A Pop Culture Podcast](http://stormofspoilers.com/) <a style="color:#fa9b39" href="http://feeds.feedburner.com/AStormOfSpoilers" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Accidental Tech Podcast](http://atp.fm/) <a style="color:#fa9b39" href="http://atp.fm/episodes?format=rss" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Amber Nectar HCAFC](http://www.ambernectar.org/) <a style="color:#fa9b39" href="http://feeds.soundcloud.com/users/soundcloud:users:54747069/sounds.rss" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Back to Work](http://5by5.tv/b2w) <a style="color:#fa9b39" href="http://feeds.5by5.tv/b2w" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Bald Move TV](http://baldmove.com/category/tv-podcast) <a style="color:#fa9b39" href="http://baldmove.com/feed/ad-free-tv-podcast/" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
* [Bald Movies](http://baldmove.com/category/bald-movies/) <a style="color:#fa9b39" href="http://baldmove.com/feed/ad-free-bald-movies/" itemprop="sameAs"> <i class="fas fa-fw fa-rss-square" aria-hidden="true"></i></a>
To remove the real username:password from opml2md.py before posting the gist, I used a little sed (the -i '' form is the BSD/macOS syntax):
sed -i '' "s/passwd = .*/passwd = \"username:password@\"/" opml2md.py
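For anyone without a BSD-style sed to hand, the same scrub can be sketched in Python. This is only an equivalent of the substitution above, not something from the original gist; the function name scrub_passwd and the alice:s3cret@ value are hypothetical.

```python
import re


def scrub_passwd(source, placeholder="username:password@"):
    """Replace everything from "passwd = " to the end of that line."""
    # A function replacement avoids backslash handling in the placeholder.
    return re.sub(r"passwd = .*", lambda m: f'passwd = "{placeholder}"', source)


# Hypothetical script text containing a real-looking credential.
script_text = 'blocks = []\npasswd = "alice:s3cret@"\nadFreeStr = " (Ad-Free)"\n'
print(scrub_passwd(script_text))
```

To apply it to the file, read opml2md.py, pass its text through scrub_passwd, and write the result back.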