post to wordpress.com with markdown

I’ve come to enjoy writing posts, transcriptions, reading notes, and even formal papers using markdown and its variants. I really want to do so with this blog as well, but it’s cumbersome. For wordpress.org sites, there is a plugin to use markdown. But of course, plugins aren’t available on the wp.com side. I started this blog as one of my first-ever entries into the world of the web, and went with wp.com because it was comfortable and easy. I certainly could migrate it to a self-hosted site, but if I’m doing that, then I’m going to move it off of wordpress altogether. I could also write the posts in markdown, transform them to html, and then cut-and-paste the html into a new post in the web browser. That sounds like a pain in the ass, and duplicates files where duplication isn’t necessary.

In the meantime, I still want to write my posts in markdown, store them as text files in my svn repo, and publish/push them directly to the site from the command line. This post is a first attempt at doing that. Here’s the strategy, which borrows a little bit from how static site generators like jekyll or growl work. It’s pretty simple in conception, but includes a couple of tricks in execution. The basic flow is this:

  1. Write the post in markdown.
  2. Transform the post to html.
  3. Push the newly html-encoded post to this site using xml-rpc and the MetaWeblog API.

There is a slick python markdown module to handle the transformation part. Python’s standard library also includes an xml-rpc client, xmlrpclib, which can call the MetaWeblog API methods that the blog exposes. So the hard work is already done. In order to use the API, though, the application has to send a few different bits of info to the blog: blog_id, username, password, title, content, tags, categories, status. The title, content, tags, and categories need to be passed as a dictionary. I want an easy way to include that info on a post, so borrowing from the practice of static site generators, I decided to add a YAML header to each post, and parse that with PyYAML. So, each post includes a header like this:

---
title: title of the post
categories: categories
tags: list, of, tags
status: publish (or draft)
---

The three dashes are important, as they separate the YAML information from the post content, and the script uses a regex to accomplish that separation. Then, it parses the YAML and produces a dictionary of key:value pairs. So, there are two dependencies outside the standard library that must be importable (installed on your PYTHONPATH) for this to work: PyYAML and python-markdown.

Step one, import the dependencies:

#! /usr/bin/python

import xmlrpclib
import yaml
import re
import sys
import markdown

Step two, compile a regex (borrowed from here) to split the YAML from the markdown content, open your file and extract the info:

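# Optional YAML header between two '---' lines, followed by the markdown body.
RE_YAML = re.compile(r'(^---\s*$(?P<yaml>.*?)^---\s*$)?(?P<content>.*)',
    re.M | re.S)
# Read the post named on the command line, closing the file afterwards.
with open(sys.argv[1], 'r') as f:
    postFile = f.read()
# Split header from body; safe_load only builds plain python types.
fileInfo = RE_YAML.match(postFile)
getYAML = yaml.safe_load(fileInfo.groupdict().get('yaml'))
postMD = fileInfo.groupdict().get('content')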

You’ll notice that this bit opens a filename passed as an argument from the command line, which is where I run the script from. You could also ask for raw_input or however you want to open the post file. The final three lines of that snippet split the file via regex and yaml.
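
To see what the regex actually captures, here is a small sketch that runs it against an in-memory sample instead of a file. It is written for Python 3 and uses only the standard library; the sample text and the `split_front_matter` helper are made up for illustration and are not part of the script. Note that a post without a header leaves the `yaml` group as `None`, which the snippet above would hand straight to the YAML parser, so a real script would probably want to check for that.

```python
import re

RE_YAML = re.compile(r'(^---\s*$(?P<yaml>.*?)^---\s*$)?(?P<content>.*)',
                     re.M | re.S)


def split_front_matter(text):
    """Return (yaml_text, body); yaml_text is None when there is no header."""
    groups = RE_YAML.match(text).groupdict()
    header = groups['yaml']
    if header is not None:
        header = header.strip()
    return header, groups['content'].strip()


# Hypothetical sample post, standing in for the file read from sys.argv[1].
sample = "---\ntitle: Hello\nstatus: draft\n---\n\nBody *text*.\n"
header, body = split_front_matter(sample)
print(header)
print(body)
```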

Step three, transform the post and prepare the metadata for xml-rpc transport to the blog:

# 'footnotes' and 'codehilite' are extensions bundled with python-markdown.
newPost = markdown.markdown(postMD, extensions=['footnotes', 'codehilite'])

blogurl = 'https://example.wordpress.com/xmlrpc.php'
username = '######'
password = '######'
blogid = ''
server = xmlrpclib.ServerProxy(blogurl, allow_none=True)

# Publish flag for metaWeblog.newPost: '1' publishes, '0' saves a draft.
if getYAML['status'] == 'publish':
    status = '1'
else:
    status = '0'

# The content struct sent along with the post.
data = {}
data['title'] = getYAML['title']
data['description'] = newPost
data['categories'] = [getYAML['categories']]
data['mt_keywords'] = getYAML['tags']
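
One thing to watch in that snippet: wrapping `getYAML['categories']` in a list assumes the header holds a single category name, so a YAML list there would end up nested. A minimal sketch of one way to accept either form follows. It is Python 3, standard library only, and `as_list` and `build_post` are hypothetical helper names, not functions from the script. Splitting on commas is also an assumption, and it would break a category whose name contains a comma.

```python
def as_list(value):
    """Accept a YAML scalar ('a, b') or a YAML list and return a list of names."""
    if value is None:
        return []
    if isinstance(value, str):
        return [item.strip() for item in value.split(',') if item.strip()]
    return [str(item).strip() for item in value]


def build_post(meta, html):
    """Build the content struct and the publish flag from parsed metadata."""
    data = {
        'title': meta['title'],
        'description': html,
        'categories': as_list(meta.get('categories')),
        # Tags are sent as a single comma-separated string.
        'mt_keywords': ', '.join(as_list(meta.get('tags'))),
    }
    publish = meta.get('status', 'draft') == 'publish'
    return data, publish


meta = {'title': 'T', 'categories': ['programming', 'writing'],
        'tags': 'python, markdown', 'status': 'publish'}
data, publish = build_post(meta, '<p>x</p>')
print(data['categories'], data['mt_keywords'], publish)
```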

Step four, send the new post to the blog and print a confirmation with the new post’s postid:

post_id = server.metaWeblog.newPost(blogid, username, password, data, status)
print "Created id - %s" %post_id
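
For anyone on Python 3, where `xmlrpclib` was renamed `xmlrpc.client`, it can help to look at the request before sending anything. The sketch below marshals a `metaWeblog.newPost` call into XML without touching the network, then parses it back. The blog id, username, password, and post data are placeholders invented for the example.

```python
import xmlrpc.client

# Placeholder post data; in the script this comes from the YAML header.
data = {'title': 'Test post', 'description': '<p>Hello</p>',
        'categories': ['programming'], 'mt_keywords': 'python, markdown'}

# Marshal the call without sending it. '0', 'user' and 'secret' are placeholders.
payload = xmlrpc.client.dumps(('0', 'user', 'secret', data, True),
                              methodname='metaWeblog.newPost')
print(payload)

# Parse it back to confirm what the server would receive.
params, method = xmlrpc.client.loads(payload)
print(method, params[3]['title'], params[4])
```

The html in `description` is escaped inside the XML payload, so there is no need to escape it yourself before building the dictionary.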

Step five, deploy the whole thing from the cli:

$ post name_of_post.txt

So, for that to work, you simply need to add an alias named post to your .bash_profile or .profile file, pointing to the script.

And that’s it. I’m also keeping a pickle of all my posts and their id numbers, in case I want to edit a post again later. The other thing I’m working on with this is a step prior to publishing that will use a regex to find images in the post, and then locate them in an images folder on my hard drive and upload them to wp.com using metaWeblog.newMediaObject. I’ll post it once I’ve got it running.
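
As a rough idea of what the image-finding step might look like, here is a sketch in Python 3 that pulls image targets out of markdown text and keeps only the ones that are not already web urls. The regex, the `find_local_images` name, and the sample text are all assumptions for illustration; it only handles the inline `![alt](target "title")` form, not reference-style images.

```python
import re

# Inline image syntax: ![alt text](target "optional title")
RE_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def find_local_images(markdown_text):
    """Return image targets that are not already http(s) urls."""
    targets = [target for _alt, target in RE_IMAGE.findall(markdown_text)]
    return [t for t in targets
            if not t.lower().startswith(('http://', 'https://'))]


post = ('Intro ![a cat](images/cat.png) and '
        '![logo](http://example.com/logo.png "Logo").')
print(find_local_images(post))
```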

One caveat: right now, this script always produces a new draft or published post; it doesn’t do any checking to see if the post already exists. That’s another thing I’ll try fixing in the near future. At least for now, as of this post right here, I can write in TextMate, store in my svn repo, and push a finished post to this blog from the command line. That makes me a little more happy.
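
One possible way to close that gap is to let the pickle of post ids double as a lookup table. The sketch below is Python 3 and standard library only; the file location, the `{post_id: title}` layout, and the helper names are assumptions rather than what the script does. Loading falls back to an empty dictionary when the file is missing or empty, since unpickling an empty file raises `EOFError`.

```python
import os
import pickle
import tempfile


def load_posts(path):
    """Return the {post_id: title} dict, or {} if the file is missing or empty."""
    try:
        with open(path, 'rb') as handle:
            return pickle.load(handle)
    except (FileNotFoundError, EOFError):
        return {}


def save_posts(path, posts):
    """Write the {post_id: title} dict back to disk."""
    with open(path, 'wb') as handle:
        pickle.dump(posts, handle)


def find_post_id(posts, title):
    """Return the id recorded for a title, or None if it was never posted."""
    for post_id, known_title in posts.items():
        if known_title == title:
            return post_id
    return None


# Hypothetical location; a temp directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), 'posts.pickle')
posts = load_posts(path)  # first run: no file yet, so this is {}
posts['95'] = 'post to wordpress.com with markdown'
save_posts(path, posts)
print(find_post_id(load_posts(path), 'post to wordpress.com with markdown'))
```

When `find_post_id` returns an id, the script could call `metaWeblog.editPost` with that id instead of `metaWeblog.newPost`. Matching on title alone is a simplification, since renaming a post would make it look new.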

About

Associate Professor of Early Latin America Department of History University of Tennessee-Knoxville

Posted in programming
25 comments on “post to wordpress.com with markdown”
  1. […] my other home ← post to wordpress.com with markdown […]

  2. […] Posted on May 31, 2011 by ctb I’ve added the ability to extract and upload images to my post-with-markdown script. Unfortunately, markdown doesn’t have short codes for setting the size of a displayed […]

  3. adamswynne says:

    This is very cool. I would like to do something similar. Do you have the code posted on GitHub or somewhere where I could access it?

    Thanks!

  4. ctb says:

    I haven’t put the script up on Github or bitbucket, but if you combine all the pieces above into a single script it’ll work. I’ve added a few more things to the script, including image uploads, but they don’t exactly work as well as I’d like. At any rate, I’ll put what I have now up on github later today and post the link here.

  5. adamswynne says:

    That would be awesome. Thanks!

  6. […] I also put the script up there to post to wordpress.com using with markdown that I wrote about here. That one needs some work, though. Particularly handling ascii/unicode issues and with uploading […]

  7. ctb says:

    I forgot to come back here and mention that I did put the scripts up on Github.

  8. Ed P says:

    Love your script. Been using it for a while. I do have one request though: if the script could post to multiple categories, it would be awesome!

    Thanks

  9. ctb says:

    Hi Ed–

    I’m glad it’s been useful. Have you been using the version on github? The script should already do this. When you’ve tried to enter multiple categories, how did you do it? (I almost never use more than one category for posts, but do with tags.)

    Try formatting the yaml front matter for categories like this:

    categories:
         - category1
         - category2
    

    That returns a python list of multiple categories for the ['categories'] dictionary key, and should insert multiple categories into the xml.

    If that doesn’t work, I’ll take another look at it. I’ve been planning for a while to rewrite the script and do away with yaml altogether, opting instead for the python markdown metadata extensions. It essentially does the same thing as the yaml, returning a dictionary of key/values to use with xmlrpc with one less dependency. Actually, what I really wish is that wordpress.com had a json api.

  10. ctb says:

    Hi again, Ed.

    I did a little work on it this morning and I have a new version of the script on a dev branch on github. With my tests it seems to work fine now. You have to put each category or tag on its own indented line. You no longer need the --- to mark off the metadata section. Just use metadata along the lines of multimarkdown. So, the metadata would now look like this:

    title: My title
    tags:  python
           markdown
    categories:  programming
                 writing
    status: publish
    

    Hope that helps.

  11. […] got part way down the rabbit hole before I found this nice looking Python project. It looks very complete. The GitHub project is worth a look […]

  12. […] batch renaming photos, tweeting from the command line (here and here), bursting and OCRing pdfs, posting to wordpress.com using markdown, using easygui for pythonic historians, and on making a static-site digital history archive. I did […]

  13. Md.AdilAkhter says:

    Reblogged this on Adil Akhter's Blog.

  14. Don says:

    Hi guys, I am getting

    dky@mb:~/bin$ ./markdown2wp.py
    Traceback (most recent call last):
      File "./markdown2wp.py", line 22, in <module>
        import imageUpload
    ImportError: No module named imageUpload

    Any ideas what module is imageUpload? I can’t seem to locate it and I did install BeautifulSoup as well.

  15. ctb says:

    Hi Don–

    imageUpload is a second script for dealing with images in a post. Did you get the source from GitHub? https://github.com/parezcoydigo/markdown2wp.com

    Put that file in the same directory as the markdown script. If you don’t care about images, then just comment out those lines.

  16. Man thanks ctb, I have added the imageUpload.py script and the error has gone away. Thanks for the fast response.

  17. ctb, looks like I ran into one more issue with the script. Have you seen this:

    dky@mb:~/bin (master *)$ ./markdown2wp.py ~/Desktop/test.txt/test.txt
    Created new post. ID = 95
    Traceback (most recent call last):
      File "./markdown2wp.py", line 83, in <module>
        postList = pickle.load(open('/tmp/post.txt'))
    EOFError

    I created /tmp/post.txt and it’s writable not sure why it’s spitting out the error. I think the pickle feature is nice to have a running list of posts. Isn’t a deal breaker though if I can’t get it working.

  18. ctb says:

    That’s because the pickle file is empty: you’re trying to load a file that has no pickled data. Try this: comment out the pickle.load line the first time, make postList an empty dictionary, and run the script. Alternatively, you can seed the pickle file from the command line interpreter. Then, after you’ve seeded the file, you can uncomment the line and it should work. Does that make sense?

    Originally my intention was to use the pickle file to download old posts and edit them if I wanted. I never did that.

  19. ctb, thanks for the response looks like I got it working with the following:

    postList = {"100": "A cat was on my roof today"}
    pickle.dump(postList, open("/tmp/post.txt", "wb"))

    postList = pickle.load(open('/tmp/post.txt'))

    postList[post_id] = data['title']
    pickle.dump(postList, open('/tmp/post.post.txt', 'w'))
    print(postList)

    I am a complete python newbie and it took me a while to piece together what you meant. The next issue I encountered with the script was the images aren’t being uploaded properly. They appear to be corrupted? Have you ever seen this? The script doesn’t error out but I have a png file in the same folder as the markdown file and every time it gets uploaded all I see is a broken image link when previewing the post.

    I also tried adding a jpeg file and it too was corrupted. At first I was thinking it couldn’t process png, but both images being corrupted leads me to think it’s something else.

    I love the tool and it has been fun struggling over getting it working.

  20. ctb says:

    Hi Don–

    OK, I looked a little more into this. There are three potential pitfalls with the imageUploader. One is encoding. In the version of the script you are using, I had the encoding being done by python’s base64 module. In playing around with it this morning, I think that’s one problem. I updated the script on github by taking that line out. You can redownload it, or you can edit your copy by uncommenting line #41 and switching the encoder to xmlrpc’s Binary encoder. Then, just delete the base64 line. In fact, you don’t even need to import base64 anymore if you do that. That solves one problem.

    The second problem is this– there’s no way using xmlrpc to attach uploaded images to a post directly. There is a potential workaround. After uploading an image, I could retrieve it and update its parameters. But, that seems like unnecessary overhead. The post will still show the image if the url for the image is correct. Which leads to the final problem.

    If you upload an image more than once, even with {overwrite: True} in the image parameters, WP gives it a unique url that will be different from the original. This means, for example, if you upload a draft and then later reupload a final version and upload the images each time, your urls will be jacked up. Or, if you’ve previously uploaded an image with the same name, your url will be wrong. Take care to put the correct url in your post in the first place, e.g. https://parezcoydigo.files.wordpress.com/2012/12/wpid888-dtermcopy.jpg, or yourblogname.files.wordpress.com/YYYY/MM/imagename.jpg. The fileUpload function does return a set of urls and names for the images, so what I can maybe do is go back to the post and match the urls from the uploaded files to their appearance in the script. That should ensure they display.

  21. ctb says:

    Let me also add, that to upload the files, your script should be using:

    server.wp.uploadFile(blogid, username, password, imageData)

    This uses the wordpress API instead of the legacy API.

  22. ctb,

    Looks like removing the base64 encoding did the trick. Images upload with no problem now. I think having the file upload function return the url of the image and then updating the post data would be awesome. Currently I am using local files so I don’t exactly know the final url for the images until they are uploaded to wp.

  23. ctb says:

    You can predict the url easily. They will always be in the format I mentioned a couple of comments above. So, your markdown should look like:

    ![Alt text](http://yourblogname.files.wordpress.com/YYYY/MM/imagename.png)

    or, whatever the filetype is.

  24. Yep, understood, I’ll look at modifying the script to my needs. Awesome work by the way this will save me a lot of time and improve my workflow.

  25. clasense4 says:

    Reblogged this on clasense4 blog and commented:
    This is Interesting, and I will use it.


Chad Black

About:
I, your humble contributor, am Chad Black. You can also find me on the web here.