Hello there!
We love web development here at Shift8. We also love finding ways to automate and integrate different interfaces together. For example, we have written Python solutions to integrate TREB (Toronto Real Estate Board) listings into a WordPress site.
Finding a Python Library to work with WordPress
There are many readily available libraries that one can use with Python to connect to WordPress’ XML-RPC interface in order to make queries and add/delete/modify content.
A personal favourite is the WordPress XML-RPC library (python-wordpress-xmlrpc), which can be installed with pip. Simply install the library and you’re ready to start writing scripts to manipulate content on WordPress!
Load the libraries in your Python code
This is obvious. I’ll provide a snippet that imports all the necessary modules to do pretty much anything in WordPress.
    #!/usr/bin/python
    from wordpress_xmlrpc import Client, WordPressPost
    from wordpress_xmlrpc.methods.taxonomies import *
    from wordpress_xmlrpc.methods.posts import *
    from wordpress_xmlrpc.methods.users import *
    from wordpress_xmlrpc.methods import *
There’s not much else to say here, though it would be ideal to only load what you need to minimize the footprint as much as possible.
Authenticating with WordPress
This too is obvious. With the XML-RPC library, you need to define the credentials needed to make the connection. If you are planning on writing a useful Python script that will ultimately end up somewhere public such as GitHub, it would be ideal to use the Python module called ConfigParser (configparser in Python 3). This will allow you to store authentication details in a safe place, such as a hidden file in your home folder.
Either way, this is how you would define those connection details:
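    # The client needs the address of the XML-RPC endpoint (xmlrpc.php), not just the site root
    wp_url = 'http://www.yoursite.com/xmlrpc.php'
    wp_username = 'username'
    wp_password = 'password'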
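If you would rather keep the credentials out of the script, the ConfigParser approach mentioned above could look something like the sketch below. It uses the Python 3 module name (configparser). The file location ~/.wp_credentials.ini, the [wordpress] section and its key names are made up for illustration and are not part of any library.

```python
import configparser
import os


def load_wp_credentials(path=None, section="wordpress"):
    """Return (url, username, password) read from an INI file.

    The default path and the section/key names are hypothetical;
    use whatever layout suits you.
    """
    if path is None:
        path = os.path.expanduser("~/.wp_credentials.ini")
    # interpolation=None so a '%' in the password is read literally
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError("credentials file not found: %s" % path)
    settings = parser[section]
    return settings["url"], settings["username"], settings["password"]
```

The file would hold a [wordpress] section with url, username and password keys, and the script would then start with wp_url, wp_username, wp_password = load_wp_credentials().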
To actually authenticate a session in order to perform tasks, you will need to pass the above variables to Client:
    # Get blog URL
    wp_site = Client(wp_url, wp_username, wp_password)
    siteurl = wp_site.call(options.GetOptions(['home_url']))[0].value
The Client call is what sets up the connection. The next line asks WordPress for the home_url option from its settings and stores the value in siteurl. It is a simple request, but it requires an authenticated WordPress session to complete, so it doubles as a quick check that the credentials work.
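One thing that is easy to trip over: the library's Client expects the address of the XML-RPC endpoint, which on a default WordPress install is xmlrpc.php in the WordPress root, rather than the bare site address. A small helper along these lines can normalise whatever URL you have on hand. The function name xmlrpc_endpoint is our own invention, and the sketch assumes a default install where xmlrpc.php has not been moved or blocked.

```python
from urllib.parse import urlsplit, urlunsplit


def xmlrpc_endpoint(site_url):
    """Return site_url with /xmlrpc.php appended unless it is already there."""
    parts = urlsplit(site_url)
    path = parts.path.rstrip("/")
    if not path.endswith("xmlrpc.php"):
        path = path + "/xmlrpc.php"
    # Drop any query string or fragment; the endpoint does not need them
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

The result can be passed straight to Client as the first argument, for example Client(xmlrpc_endpoint('http://www.yoursite.com'), wp_username, wp_password).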
Searching all post titles in WordPress with Python
There are countless guides out there that show you how to integrate Python and WordPress. What we found lacking was how to make more complicated queries and searches within WordPress. One of our requirements was the ability to make posts to WordPress, but first check whether any post with the same title already exists. This is obviously not perfect. For example, what if two different posts have the same title? Either way, this is a good place to start with respect to building a definition in Python to conduct these types of queries.
    # Searches WordPress posts by title and returns the ID of the first match
    def find_id(title):
        offset = 0
        increment = 20
        while True:
            # Ask for 'increment' posts per call so the offset steps line up with the page size
            post_filter = {'number': increment, 'offset': offset}
            p = wp_site.call(GetPosts(post_filter))
            if len(p) == 0:
                break  # no more posts returned
            for post in p:
                if post.title == title:
                    return post.id
            offset = offset + increment
        return False
So the above definition pages through all posts, 20 at a time (think pagination). Each post’s title is compared to the title that was originally passed to the definition, “title”, and the ID of the first match is returned. If nothing matches, it returns False.
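Since the same offset loop comes up every time you need to walk through all posts, it can be pulled out into a small generator. This is only a sketch: iter_pages and find_first are names we made up, and fetch_page stands for any callable that takes a filter dictionary with 'number' and 'offset' keys and returns a list.

```python
def iter_pages(fetch_page, page_size=20):
    """Yield items one by one until the source returns an empty page."""
    offset = 0
    while True:
        page = fetch_page({"number": page_size, "offset": offset})
        if not page:
            return  # no more items
        for item in page:
            yield item
        offset += page_size


def find_first(items, predicate):
    """Return the first item for which predicate(item) is true, else None."""
    return next((item for item in items if predicate(item)), None)
```

With the library, and assuming wp_site and GetPosts are set up as earlier in this post, fetch_page would be something like lambda f: wp_site.call(GetPosts(f)), and the title search becomes find_first(iter_pages(fetch_page), lambda post: post.title == title).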
Pretty simple, huh? It took us a little bit to perfect this so it actually worked with some semblance of reliability.
Search WordPress tags with Python
What if you wanted to do the same thing with WordPress tags instead? This is where we found very little documentation specifically on the subject. We took the above definition and modified it to accommodate searching tags.
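    # Checks whether a tag with the given name already exists
    def find_tag(tag):
        p = wp_site.call(taxonomies.GetTerms('post_tag'))
        if len(p) == 0:
            return False
        for thetags in p:
            print('looking for tag : %s in thetags : %s' % (tag, str(thetags)))
            # Compare for equality; a substring test would also match partial names
            if str(thetags) == tag:
                return True
        return False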
You’ll see that we took the first definition and modified it. One of the first things you’ll notice is that we’re no longer looping through all posts in increments. You’ll also notice the call is a bit different: it uses taxonomies.GetTerms with the post_tag taxonomy instead of GetPosts. I would hope that it’s pretty self-explanatory. The only other thing to note here is that the thetags variable had to be converted to a string. Each item returned by GetTerms is a term object rather than a plain string, so comparing it directly to the tag never matches. Converting it with str() works; reading the term’s name attribute does the same thing more explicitly.
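If you need to check several tags at once, it can be cheaper to fetch the terms once and compare names in bulk. The sketch below assumes, as in the WordPress XML-RPC library, that each term object exposes its label as a name attribute. The helper names existing_tag_names and missing_tags are hypothetical.

```python
def existing_tag_names(terms):
    """Collect the names of term objects (anything with a .name attribute)."""
    return {term.name for term in terms}


def missing_tags(terms, wanted):
    """Return the wanted tag names that do not exist yet, in their original order."""
    present = existing_tag_names(terms)
    return [name for name in wanted if name not in present]
```

Here terms would be the list returned by the GetTerms call above, and wanted is the list of tag names your script is about to use.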
Search all WordPress post content with Python
Why not take it one step further? Why not write a definition that allows you to search all post content for a particular word or string? The practical uses depend on what your Python script is for, so I won’t try to list them. One of the best things that Python can do is scrape and pull information from all sorts of sources. Integrating this information into WordPress via Python is sometimes much easier than writing a custom WordPress plugin: if you are dealing with data sets, data sources and APIs from other web sources, Python is (in most cases) better suited to this kind of data integration than WordPress, or PHP for that matter. This was what we found when we wanted to integrate TREB into WordPress. We were pulling complicated and obfuscated CSV data and had to clean it up, grab the data that was necessary and then import it into WordPress.
    # Searches WordPress post content for a string and unpublishes the first match
    def find_content(content):
        offset = 0
        increment = 20
        while True:
            # Ask for 'increment' posts per call so the offset steps line up with the page size
            post_filter = {'number': increment, 'offset': offset}
            p = wp_site.call(GetPosts(post_filter))
            if len(p) == 0:
                break  # no more posts returned
            for post in p:
                if post.content.find(content) != -1:
                    # 'unpublish' is not a WordPress post status; 'draft' takes the post off the site
                    post.post_status = 'draft'  # you could do anything here
                    wp_site.call(posts.EditPost(post.id, post))
                    return post.id
            offset = offset + increment
        return False
You’ll see in the above example that we search all post content (this time back to 20 posts per increment). If a match is found, we unpublish the post by changing its status; it is not deleted. You can do anything here, obviously, but this is what we wanted to happen.
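Because editing posts is hard to undo in bulk, it can be worth separating the search from the edit so you can look at the matches first. A minimal sketch, assuming each post object has a content attribute as the library’s posts do; posts_containing is a name we made up.

```python
def posts_containing(post_list, needle, case_sensitive=True):
    """Return the posts whose content contains needle, without editing anything."""
    if not case_sensitive:
        needle = needle.lower()
    matches = []
    for post in post_list:
        content = post.content or ""  # treat missing content as empty
        if not case_sensitive:
            content = content.lower()
        if needle in content:
            matches.append(post)
    return matches
```

You could print the IDs and titles of the returned posts, check that they are the ones you expect, and only then loop over them and change their status.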
I hope this helps!