I’ve come to enjoy writing posts, transcriptions, reading notes, and even formal papers using markdown and its variants. I really want to do so with this blog as well, but it’s cumbersome. For wordpress.org sites, there is a plugin to use markdown. But of course, plugins aren’t available on the wp.com side. I started this blog as one of my first-ever entries into the world of the web, and went with wp.com because it was comfortable and easy. I certainly could migrate it to a self-hosted site, but if I’m doing that, then I’m going to move it off of wordpress altogether. I could also write the posts in markdown, transform them to html, and then cut-and-paste the html into a new post in the web browser. That sounds like a pain in the ass, and duplicates files where duplication isn’t necessary.
In the meantime, I still want to write my posts in markdown, store them as text files in my svn repo, and publish/push them directly to the site from the command line. This post is a first attempt at doing that. Here’s the strategy, which borrows a little bit from how static site generators like jekyll or growl work. It’s pretty simple in conception, but includes a couple of tricks in execution. The basic flow is this:
- Write the post in markdown.
- Transform the post to html.
- Push the newly html-encoded post to this site using xml-rpc and the MetaWeblog API.
There is a slick python markdown module to handle the transformation part. The standard library also includes a library for xml-rpc, which is all you need to call the MetaWeblog API. So the hard work is already done. In order to use the API, though, the application has to send a few different bits of info to the blog: blog_id, username, password, title, content, tags, categories, status. The title, content, tags, and categories need to be passed as a dictionary. I want an easy way to include that info on a post, so borrowing from the practice of static site generators, I decided to add a YAML header to each post, and parse that with pyYAML. So, each post includes a header like this:
---
title: title of the post
categories: categories
tags: list, of, tags
status: publish (or draft)
---
The three dashes are important, as they separate the YAML information from the post content, and the script uses a regex to accomplish that separation. Then, it parses the YAML and produces a dictionary of key:value pairs. So, there are two dependencies not in the standard library that you must make sure are installed on your PYTHONPATH for this to work: pyYAML and python-markdown.
Step one, import the dependencies:
#! /usr/bin/python
import xmlrpclib
import yaml
import re
import sys
import markdown
Step two, compile a regex (borrowed from here) to split the YAML from the markdown content, open your file and extract the info:
RE_YAML = re.compile(r'(^---\s*$(?P<yaml>.*?)^---\s*$)?(?P<content>.*)', re.M | re.S)
postFile = open(sys.argv[1], 'r').read()
fileInfo = RE_YAML.match(postFile)
getYAML = yaml.load(fileInfo.groupdict().get('yaml'))
postMD = fileInfo.groupdict().get('content')
You’ll notice that this bit opens a filename passed as an argument from the command line, which is where I run the script from. You could also ask for raw_input, or open the post file however you want. The final three lines of that snippet split the file via the regex and parse the header with yaml.
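To see what the regex actually captures, here is a minimal sketch that runs the same pattern against an inlined sample instead of a file. The sample text is made up for illustration, and the sketch is written for Python 3, whereas the snippets in this post are Python 2.

```python
import re

# Same pattern as in the post: an optional '---' delimited header, then the body.
RE_YAML = re.compile(
    r'(^---\s*$(?P<yaml>.*?)^---\s*$)?(?P<content>.*)',
    re.M | re.S,
)

# Hypothetical sample post, inlined so the sketch runs without a file.
sample = """---
title: A test post
status: draft
---
Hello *world*
"""

match = RE_YAML.match(sample)
header_text = match.group('yaml')     # raw header text, or None if no header
body_text = match.group('content')    # everything after the closing '---'

print(repr(header_text))
print(repr(body_text))

# A file with no header still matches; the 'yaml' group is simply None.
no_header = RE_YAML.match("Just some text\n")
print(no_header.group('yaml'), repr(no_header.group('content')))
```

Both groups keep their surrounding newlines, so it may be worth stripping them before use. A post without a header leaves the yaml group as None, which the script would need to handle before parsing.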
Step three, transform the post and prepare the metadata for xml-rpc transport to the blog:
newPost = markdown.markdown(postMD, extensions=['footnotes', 'codehilite'])
blogurl = 'https://example.wordpress.com/xmlrpc.php'
username = '######'
password = '######'
blogid = ''
server = xmlrpclib.ServerProxy(blogurl, allow_none=True)
if getYAML['status'] == 'publish':
    status = '1'
else:
    status = '0'
data = {}
data['title'] = getYAML['title']
data['description'] = newPost
data['categories'] = [getYAML['categories']]
data['mt_keywords'] = getYAML['tags']
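One thing the snippet above glosses over is that the header values can arrive either as a plain string or as a list. The following is a rough sketch, not the script's actual code, of one way to normalize them before building the dictionary. The helper names as_list and build_post_struct are hypothetical. It also assumes the server accepts a list for mt_keywords and a boolean for the publish flag; if yours does not, join the tags with commas and keep the '1'/'0' strings.

```python
def as_list(value):
    """Return value as a list of stripped strings.

    Accepts a list (as a YAML sequence would produce), a comma-separated
    string, or None.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [item.strip() for item in value.split(',') if item.strip()]
    return [str(item).strip() for item in value]


def build_post_struct(meta, html):
    """Build the struct for metaWeblog.newPost, plus a publish flag."""
    data = {
        'title': meta['title'],
        'description': html,
        'categories': as_list(meta.get('categories')),
        'mt_keywords': as_list(meta.get('tags')),
    }
    # Assumption: anything other than 'publish' is treated as a draft.
    publish = meta.get('status') == 'publish'
    return data, publish


# Hypothetical header values, shaped like the YAML example in the post.
meta = {'title': 'A test post', 'categories': 'History',
        'tags': 'python, markdown', 'status': 'publish'}
data, publish = build_post_struct(meta, '<p>Hello</p>')
print(data, publish)
```

With this shape, a single category and a list of categories both end up as a flat list rather than a list nested inside another list.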
Step four, send the new post to the blog and print a confirmation with the new post’s postid:
post_id = server.metaWeblog.newPost(blogid, username, password, data, status)
print "Created id - %s" % post_id
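If you want to inspect what would be sent before pointing the script at a live blog, the standard library can build the request body without opening a connection. This is a small sketch for Python 3, where xmlrpclib is named xmlrpc.client. The credentials and post content are placeholders.

```python
import xmlrpc.client  # Python 3 name for the xmlrpclib module used in the post

# Placeholder values, not real credentials or content.
blogid = ''
username = 'someuser'
password = 'secret'
data = {
    'title': 'A test post',
    'description': '<p>Hello <em>world</em></p>',
    'categories': ['History'],
    'mt_keywords': ['python', 'markdown'],
}
publish = False  # False leaves the post as a draft

# dumps() builds the XML body that ServerProxy would POST to xmlrpc.php,
# without making any network request.
request_xml = xmlrpc.client.dumps(
    (blogid, username, password, data, publish),
    methodname='metaWeblog.newPost',
)
print(request_xml)
```

Reading the printed XML is a quick way to check that the categories and tags arrive as arrays and that the html in the description has been escaped.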
Step five, deploy the whole thing from the cli:
$ post name_of_post.txt
So, for that to work, you simply need to add an alias named post to your .bash_profile or .profile file, pointing to the script.
And that’s it. I’m also keeping a pickle of all my posts and their id numbers, in case I want to edit a post again later. The other thing I’m working on with this is a step prior to publishing that will use a regex to find images in the post, and then locate them from an images folder on my hard drive and upload them to wp.com using metaWeblog.newMediaObject. I’ll post it once I’ve got it running.
One caveat: right now, this script always produces a new draft or published post; it doesn’t do any checking to see if the post already exists. That’s another thing I’ll try fixing in the near future. At least for now, as of this post right here, I can write in TextMate, store in my svn repo, and push a finished post to this blog from the command line. That makes me a little more happy.
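One possible way to handle the existing-post check, sketched here as an assumption rather than as what the script does, is to look the title up in the stored dictionary of post ids and titles and pick the XML-RPC method accordingly. The MetaWeblog API defines metaWeblog.editPost, which takes the post id in place of the blog id. The helper names and the match-by-title rule are hypothetical.

```python
def find_post_id(post_list, title):
    """Return the id stored for title in post_list, or None if absent.

    post_list maps post id -> title, like the pickled dictionary
    mentioned in the post.
    """
    for post_id, known_title in post_list.items():
        if known_title == title:
            return post_id
    return None


def choose_call(post_list, blogid, username, password, data, publish):
    """Return (method_name, params) for the XML-RPC call to make.

    Picks metaWeblog.editPost when the title is already known,
    otherwise metaWeblog.newPost.
    """
    post_id = find_post_id(post_list, data['title'])
    if post_id is None:
        return 'metaWeblog.newPost', (blogid, username, password, data, publish)
    return 'metaWeblog.editPost', (post_id, username, password, data, publish)


# Hypothetical stored posts and a post whose title is already known.
known_posts = {'95': 'A test post'}
method, params = choose_call(known_posts, '', 'someuser', 'secret',
                             {'title': 'A test post'}, False)
print(method, params[0])
```

Matching on the title breaks as soon as a post is renamed, so storing the post id in the file's header after the first push might be a sturdier choice.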
[…] Posted on May 31, 2011 by ctb I’ve added the ability to extract and upload images to my post-with-markdown script. Unfortunately, markdown doesn’t have short codes for setting the size of a displayed […]
This is very cool. I would like to do something similar. Do you have the code posted on GitHub or somewhere where I could access it?
Thanks!
I haven’t put the script up on Github or bitbucket, but if you combine all the pieces above into a single script it’ll work. I’ve added a few more things to the script, including image uploads, but they don’t exactly work as well as I’d like. At any rate, I’ll put what I have now up on github later today and post the link here.
That would be awesome. Thanks!
[…] I also put the script up there to post to wordpress.com using with markdown that I wrote about here. That one needs some work, though. Particularly handling ascii/unicode issues and with uploading […]
I forgot to come back here and mention that I did put the scripts up on Github.
Love your script. Been using it for a while. I do have one request though: if the script could post to multiple categories, it would be awesome!
Thanks
Hi Ed–
I’m glad it’s been useful. Have you been using the version on github? The script should already do this. When you’ve tried to enter multiple categories, how did you do it? (I almost never use more than one category for posts, but do with tags.)
Try formatting the yaml front matter for categories like this:
That returns a python list of multiple categories for the ['categories'] dictionary key, and should insert multiple categories into the xml.
If that doesn’t work, I’ll take another look at it. I’ve been planning for a while to rewrite the script and do away with yaml altogether, opting instead for the python markdown metadata extension. It essentially does the same thing as the yaml, returning a dictionary of key/values to use with xmlrpc, with one less dependency. Actually, what I really wish is that wordpress.com had a json api.
Hi again, Ed.
I did a little work on it this morning and I have a new version of the script on a dev branch on github. In my tests it seems to work fine now. You have to put each category or tag on its own indented line. You no longer need the --- to mark off the metadata section. Just use metadata along the lines of multimarkdown. So, the metadata would now look like this:
Hope that helps.
[…] got part way down the rabbit hole before I found this nice looking Python project. It looks very complete. The GitHub project is worth a look […]
[…] batch renaming photos, tweeting from the command line (here and here), bursting and OCRing pdfs, posting to wordpress.com using markdown, using easygui for pythonic historians, and on making a static-site digital history archive. I did […]
Reblogged this on Adil Akhter's Blog.
Hi guys, I am getting
dky@mb:~/bin$ ./markdown2wp.py
Traceback (most recent call last):
File "./markdown2wp.py", line 22, in <module>
import imageUpload
ImportError: No module named imageUpload
Any idea what module imageUpload is? I can’t seem to locate it, and I did install BeautifulSoup as well.
Hi Don–
imageUpload is a second script for dealing with images in a post. Did you get the source from GitHub? https://github.com/parezcoydigo/markdown2wp.com
Put that file in the same directory as the markdown script. If you don’t care about images, then just comment out those lines.
Man thanks ctb, I have added the imageUpload.py script and the error has gone away. Thanks for the fast response.
ctb, looks like I ran into one more issue with the script. Have you seen this:
dky@mb:~/bin (master *)$ ./markdown2wp.py ~/Desktop/test.txt/test.txt
Created new post. ID = 95
Traceback (most recent call last):
File "./markdown2wp.py", line 83, in <module>
postList = pickle.load(open('/tmp/post.txt'))
EOFError
I created /tmp/post.txt and it’s writable, so I’m not sure why it’s spitting out the error. I think the pickle feature is nice for keeping a running list of posts. It isn’t a deal breaker, though, if I can’t get it working.
That’s because the pickle file is empty: you’re trying to load an empty file that has no pickled data. Try this: comment out the pickle.load line the first time, make postList an empty dictionary, and run the script. Alternatively, you can seed the pickle file from the command line interpreter. Then, after you’ve seeded the file, you can uncomment the line and it should work. Does that make sense?
Originally my intention was to use the pickle file to download old posts and edit them if I wanted. I never did that.
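As an alternative to commenting lines in and out, the load step could treat a missing or empty file as an empty dictionary. This is a sketch for Python 3, not the code in the repository. The function names are hypothetical, and the temporary file stands in for /tmp/post.txt.

```python
import os
import pickle
import tempfile


def load_post_list(path):
    """Return the pickled dict at path, or {} if the file is missing or empty."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return {}
    with open(path, 'rb') as handle:
        return pickle.load(handle)


def save_post_list(path, post_list):
    """Write post_list to path, replacing any previous contents."""
    with open(path, 'wb') as handle:
        pickle.dump(post_list, handle)


# Demo: start from an empty file, like the one that raised EOFError.
demo_path = os.path.join(tempfile.mkdtemp(), 'post.txt')
open(demo_path, 'wb').close()

post_list = load_post_list(demo_path)   # returns {} instead of raising
post_list['95'] = 'A test post'
save_post_list(demo_path, post_list)
print(load_post_list(demo_path))
```

Opening the file in binary mode for both reading and writing keeps the pickle data intact across platforms and Python versions.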
ctb, thanks for the response looks like I got it working with the following:
postList = {"100": "A cat was on my roof today"}
pickle.dump(postList, open("/tmp/post.txt", "wb"))
postList = pickle.load(open('/tmp/post.txt'))
postList[post_id] = data['title']
pickle.dump(postList, open('/tmp/post.txt', 'w'))
print(postList)
I am a complete python newbie and it took me a while to piece together what you meant. The next issue I encountered with the script was that the images aren’t being uploaded properly. They appear to be corrupted? Have you ever seen this? The script doesn’t error out, but I have a png file in the same folder as the markdown file and every time it gets uploaded all I see is a broken image link when previewing the post.
I also tried adding a jpeg file and it too was corrupted. At first I was thinking it couldn’t process png, but now that both images are corrupted, it leads me to think it’s something else.
I love the tool and it has been fun struggling over getting it working.
Hi Don–
OK, I looked a little more into this. There are three potential pitfalls with the imageUploader. One is encoding. In the version of the script you are using, I had the encoding being done by python’s base64 module. In playing around with it this morning, I think that’s one problem. I updated the script on github by taking that line out. You can redownload it, or you can edit your copy by uncommenting line #41 and switching the encoder to xmlrpc’s Binary encoder. Then, just delete the base64 line. In fact, you don’t even need to import base64 anymore if you do that. That solves one problem.
The second problem is this– there’s no way using xmlrpc to attach uploaded images to a post directly. There is a potential workaround. After uploading an image, I could retrieve it and update its parameters. But, that seems like unnecessary overhead. The post will still show the image if the url for the image is correct. Which leads to the final problem.
If you upload an image more than once, even with {overwrite: True} in the image parameters, WP gives it a unique url that will be different from the original. This means, for example, that if you upload a draft and then later reupload a final version, uploading the images each time, your urls will be jacked up. Or, if you’ve previously uploaded an image with the same name, your url will be wrong. Do take care to put the correct url in your post in the first place, e.g. https://parezcoydigo.files.wordpress.com/2012/12/wpid888-dtermcopy.jpg, or yourblogname.files.wordpress.com/YYYY/MM/imagename.jpg. The fileUpload function does return a set of urls and names for the images, so what I can maybe do is go back to the post and match the urls from the uploaded files to their appearance in the script. That should ensure they display.
Let me also add that, to upload the files, your script should be using:
server.wp.uploadFile(blogid, username, password, imageData)
This uses the wordpress API instead of the legacy API.
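For reference, here is a rough sketch of building the image struct with the Binary wrapper instead of the base64 module. It is written for Python 3, where xmlrpclib is named xmlrpc.client, and it is not the code from the repository. The helper name build_image_struct and the demo file are hypothetical. The field names name, type, bits, and overwrite are the ones the WordPress upload call expects, as far as I know.

```python
import mimetypes
import os
import tempfile
import xmlrpc.client  # Python 3 name for xmlrpclib


def build_image_struct(path, overwrite=True):
    """Return the struct for wp.uploadFile, with the bytes wrapped in Binary."""
    with open(path, 'rb') as handle:
        raw = handle.read()
    mime_type = mimetypes.guess_type(path)[0] or 'application/octet-stream'
    return {
        'name': os.path.basename(path),
        'type': mime_type,
        # Binary() tells the XML-RPC layer to send the bytes as a base64
        # value, so no separate base64 step is needed.
        'bits': xmlrpc.client.Binary(raw),
        'overwrite': overwrite,
    }


# Demo with a throwaway file standing in for a real image.
demo_path = os.path.join(tempfile.mkdtemp(), 'photo.png')
with open(demo_path, 'wb') as handle:
    handle.write(b'not really a png')
image_data = build_image_struct(demo_path)
print(image_data['name'], image_data['type'])

# The upload itself needs a real blog and credentials, so it stays commented:
# server = xmlrpc.client.ServerProxy('https://example.wordpress.com/xmlrpc.php')
# server.wp.uploadFile(blogid, username, password, image_data)
```

The file is read in binary mode on purpose, since reading an image as text is another easy way to end up with a corrupted upload.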
ctb,
Looks like removing the base64 encoding did the trick. Images upload with no problem now. I think having the file upload function return the url of the image and then updating the post data would be awesome. Currently I am using local files, so I don’t exactly know the final url for the images until they are uploaded to wp.
You can predict the url easily. They will always be in the format I mentioned a couple of comments above. So, your markdown should look like:

or, whatever the filetype is.
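The url pattern from the comments above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, the year and month are assumed to be those of the upload date, and WP may still rename a file that has been uploaded before, as described earlier in the thread.

```python
import datetime
import os


def predicted_image_url(blog_name, filename, when=None):
    """Guess the wp.com url for an uploaded image.

    Follows the yourblogname.files.wordpress.com/YYYY/MM/imagename.jpg
    pattern quoted in the comments. The server may still rename the file.
    """
    when = when or datetime.date.today()
    return 'https://{0}.files.wordpress.com/{1:%Y}/{1:%m}/{2}'.format(
        blog_name, when, os.path.basename(filename))


def markdown_image(alt_text, url):
    """Return a markdown image reference for url."""
    return '![{0}]({1})'.format(alt_text, url)


# Placeholder blog name, file name, and date.
url = predicted_image_url('yourblogname', 'images/imagename.jpg',
                          datetime.date(2012, 12, 1))
print(markdown_image('alt text', url))
```

A guess like this is only as good as the naming rule behind it, so matching against the urls returned by the upload call remains the safer route.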
Yep, understood, I’ll look at modifying the script to my needs. Awesome work by the way this will save me a lot of time and improve my workflow.
Reblogged this on clasense4 blog and commented:
This is Interesting, and I will use it.