
Next Delicacy on the Menu: Python Pickles

Python pickle could be a delicacy that vegans, vegetarians and meat lovers alike might relish. No, I am definitely not talking about serpentine meat stored in jars of brine for eternity. This pickle comes from the Python programming language.


Python pickle is an aptly named module that helps us literally “pickle” objects and data from Python code and save them in flat files. The saved data can later be reloaded into its original form using the same module. In programming lingo, we call this serialization or marshaling of data; in simple terms, it is a way of storing data in a format that lets us resurrect the object if and when required.

Use cases of Pickling
Why do we need pickling in the first place? A few sample uses are listed below:

  • Sending data across a trusted network. You can convert data to pickle format and send it through a trusted network, so that the recipient can reconstruct the same objects.
  • If a process takes a long time to do a task and intermediate results are available along the way, you can pickle those results into a flat file. If the process goes down for some reason, you can bring it back up and let it reconstruct the object from the pickle file. The process can literally start from where it left off.
  • Suppose you have a cluster of processes that work in tandem. One process takes an input, produces an intermediate result and hands it off to another process, and so on. You could employ pickle to serialize the output objects to flat files, and the next process in line can pick them up and start processing as if the object had been handed to it directly.
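The second use case, resuming a long-running task, can be sketched roughly as follows. This is only an illustration written for Python 3: the function name run_job, the stop_after parameter (which simulates a crash) and the checkpoint file name are all made up for this example.

```python
import os
import pickle
import tempfile


def run_job(items, checkpoint_path, stop_after=None):
    """Sum `items`, pickling progress to `checkpoint_path` after each step.

    `stop_after` is a hypothetical knob that simulates a crash after
    that many steps within a single call.
    """
    # Resume from an earlier checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as fp:
            state = pickle.load(fp)
    else:
        state = {"next_index": 0, "total": 0}

    steps = 0
    while state["next_index"] < len(items):
        if stop_after is not None and steps >= stop_after:
            return None  # simulated crash: the checkpoint stays on disk
        state["total"] += items[state["next_index"]]
        state["next_index"] += 1
        # Save the intermediate result so a restart can pick it up.
        with open(checkpoint_path, "wb") as fp:
            pickle.dump(state, fp)
        steps += 1
    return state["total"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
    run_job([1, 2, 3, 4], path, stop_after=2)  # "crashes" halfway through
    print(run_job([1, 2, 3, 4], path))         # resumes and prints 10
```

The second call never recomputes the first two items; it reads the pickled state and carries on from index 2.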

How do we use Pickle?

The pickle module contains a few classes (such as Pickler and Unpickler) and several helper functions, but in everyday use you need only two of them: dump() and load(). dump() serializes an object to an open file-like object (a file opened in binary mode, or a socket wrapped with makefile(), for example), and load() reads the serialized data back and returns the object that was dumped. Their siblings dumps() and loads() do the same job with in-memory byte strings instead of files.
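Before touching the file system, it may help to see the round trip entirely in memory. The sketch below assumes Python 3 and uses dumps() and loads(), the in-memory counterparts of dump() and load(), plus an io.BytesIO buffer standing in for a real file; the variable names are arbitrary.

```python
import io
import pickle

record = {"name": "Python", "type": "Language"}

# dumps() returns the serialized bytes instead of writing to a file.
blob = pickle.dumps(record)

# loads() rebuilds an equal, but separate, object from those bytes.
copy = pickle.loads(blob)

# dump()/load() accept any binary file-like object; BytesIO stands in for a file.
buffer = io.BytesIO()
pickle.dump(record, buffer)
buffer.seek(0)  # rewind before reading back
from_buffer = pickle.load(buffer)

print(copy == record, copy is record, from_buffer == record)
```

The restored dictionary compares equal to the original but is a new object, which is exactly what "resurrecting" the data means here.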

Sample Code: Dumping Data

#
# data_dump.py
# Dumps a dictionary to a flat file
#

import pickle
import sys

def main():
    my_data = {"name": "Python",
               "type": "Language", "version": "2.6"}

    # Open the file in binary writable mode (wb). The with block
    # closes the file for us, even if dump() raises an exception.
    with open("picklejar.dat", "wb") as fp:
        # Data is written to the file
        pickle.dump(my_data, fp)

if __name__ == '__main__':
    sys.exit(main())

Let us deconstruct the program. The pickle module does all the heavy lifting for us. We have a little dictionary object that we need to store. The first thing to do is open a file, named as you wish. Remember to open it in writable binary mode, because the dumped data is not purely printable ASCII.
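To see why binary mode matters, you can peek at the serialized bytes. This small check assumes Python 3 and its default pickle protocol; the variable name blob is arbitrary.

```python
import pickle

blob = pickle.dumps({"name": "Python"})

print(type(blob))  # a bytes object, not text
print(blob[:2])    # the stream opens with the byte 0x80, which is outside ASCII

try:
    blob.decode("ascii")
except UnicodeDecodeError:
    print("not valid ASCII, so the file must be opened in binary mode")
```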

Save the code above to data_dump.py and run it.

python data_dump.py

If you check the file system, you should see a file containing the pickled dictionary.

ls -l picklejar.dat

Sample Code: Loading Data
Now we need to reload the data that was stored previously in picklejar.dat.

#
# data_load.py
#

import pickle
import sys

def main():
    # Open the file in read-only binary mode (rb), matching the
    # binary mode used when the data was dumped.
    with open('picklejar.dat', 'rb') as fp:
        out = pickle.load(fp)
    print(out)

if __name__ == '__main__':
    sys.exit(main())

If you execute this program, you will see that it prints the same dictionary we saved in the previous example. Cool, isn’t it?

If you have a skeptical mind (that is a good thing), you will be asking questions. It is wonderful to store a dictionary, but I have many data structures; how would I store them? Can we store native data types? What can we store, and what can’t we?

One way to store a large number of objects is to create a simple wrapper object (a list, a dictionary, etc.) that holds all the objects you want to pickle, and then dump that wrapper object itself. You can store not only objects but also native values such as integers and strings. Code is a different matter: pickle does not save the body of a class or function, only a reference to its name, so the same definition must be importable in the program that loads the pickle. If you think about it, that is reasonable: methods are your logic, which comes to life as soon as the object is instantiated. The missing piece of the puzzle is the data itself, which pickle easily saves and restores.
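Here is a minimal sketch of the wrapper idea, assuming Python 3. The Point class and the contents of the bundle dictionary are hypothetical and exist only for this illustration.

```python
import pickle


class Point:
    """A hypothetical class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm_squared(self):
        return self.x ** 2 + self.y ** 2


# Wrap several unrelated objects in one dictionary and pickle that.
bundle = {
    "count": 3,
    "label": "sample points",
    "points": [Point(1, 2), Point(3, 4)],
}
restored = pickle.loads(pickle.dumps(bundle))

# The data comes back, and methods work because Point is defined here.
print(restored["count"], restored["points"][1].norm_squared())

# A lambda has no importable name, so pickle refuses to serialize it.
try:
    pickle.dumps(lambda x: x + 1)
except (pickle.PicklingError, AttributeError) as exc:
    print("cannot pickle a lambda:", type(exc).__name__)
```

Only the attribute values of each Point travel through the pickle; the class itself has to be available wherever the data is loaded.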

Caveats
Like everything else in life, this too comes with a caveat or two.

  • Pickle is not secure. Unpickling data from an untrusted source can execute arbitrary code, so never use plain pickle to exchange data between machines across a non-trusted network (say, the internet). There are packages such as Trusted Pickle, aka TPickle, to help with this task; it uses public key encryption to sign the pickle data before sending it. I haven’t used it myself, but here is the link for those who would like to venture out: http://trustedpickle.sourceforge.net/
  • If you have a large number of objects, pickling can be very slow. One reason is that this module is a “pure Python” module. In Python 2 there is a C implementation of pickle called cPickle, which can be up to 1000 times faster than pickle itself. (In Python 3, pickle uses its C implementation automatically when it is available.) If performance is your concern, go for it!
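As a rough illustration of the signing idea from the first caveat, the sketch below uses the standard library's hmac module with a shared secret key. Note that this is not TPickle and not public key signing; it is a simpler, swapped-in technique, and the function names and the tag-plus-payload layout are assumptions made for this example. It only detects tampering by someone who lacks the key; anyone who holds the key can still craft a malicious pickle.

```python
import hashlib
import hmac
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_and_pickle(obj, key):
    """Return an HMAC-SHA256 tag followed by the pickled bytes of `obj`."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_unpickle(blob, key):
    """Unpickle `blob` only if its tag matches; raise ValueError otherwise."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)


if __name__ == "__main__":
    key = b"shared-secret-for-illustration-only"
    blob = sign_and_pickle({"name": "Python"}, key)
    print(verify_and_unpickle(blob, key))
```

The important design point is that the tag is checked before pickle.loads() is ever called, so altered bytes are rejected without being unpickled.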

Next time you come across a scenario where you want to save and recover the “state” of objects, give it a shot. It is surely worth it.

Photo by Three Points Kitchen
