Archive for April, 2008

Snakes in an Office

I’ve mentioned before that my job (whose contract is almost up) doesn’t require programming. But that doesn’t mean that programming is not useful. When I first got my laptop, I asked the tech guy about the process of installing new software. It went something like this:

  1. Write a formal request for the software you would like installed and submit it to the tech support department.
  2. Wait for review and approval.
  3. Wait for them to install it.

A pretty standard (and painful) process in large organizations, unfortunately. I asked him, “So if I want to install Python I just submit one of these requests?”

Insead of answering my question, he asked me one in return. “What would you want to install Python for?!” I think he actually spat and curled his lips at the thought.

“I don’t know… writing basic scripts.” I really didn’t know what at the time, but I knew it would be useful. And I wanted to use it.

He proceeded to look at me as if I had suggested bathing in tomato sauce as a remedy for headaches.

So I was under the impression that I couldn’t install new software and didn’t want to bother going through all that formal cock holding; I went without my Firefox, my Python. I’ve since discovered that I can install things. (Sorta.)

Anyway, I was using Python at work today, and I just love it. Previously, I had been fumbling around with VBA in Excel and Word. I had a button set up in Word that would run a macro/VBA to export the form’s contents to a CSV file in a particular folder. I could open a Word document anywhere, press the export button, and the CSV would be created/overwritten in that folder. Then I had a button set up in Excel to run a macro that would import all CSV files in that folder as rows in the spreadsheet. (This was actually really useful, and if you have to deal with a lot of similar tasks in Office, use VBA and macros.) There was an inconsistency in the number of rows I had in my spreadsheet and the number of docs I had… so I wanted to compare the exported CSVs with the available DOCs.

Enter, the Python interpreter (great for little tasks and as a substitute shell on windows).

The word documents were in a bunch of different folders (for various reasons) but once I had a variable, originals, containing a list of all the documents, and a variable, exported, containing a list of all the CSV, files, I was rolling.


>>> len(originals), len(exported)
(198,200) # Hmmmm, looks like a job for sets!
>>> s1,s2 = (set(originals), set(exported))
>>> len(s1.difference(s2))
398
>>> # What the...
...
>>> originals
['doc1.doc', 'doc2.doc', ...., 'doc200.doc']
>>> exported
['doc1.txt', 'doc2.txt', ..., 'doc200.txt']
>>> # Oh yeah, different extensions. Duh...
... # Bus leaves SOON!
... # Need to quickly strip the extensions and rebuild the sets....
...
>>> s1,s2 = ( set([s[:-4] for s in s1]), set([s[:-4] for s in s2]) )
>>> s1
['doc1', 'doc2', ... , 'doc200']
>>> # Cool
...
>>> s1.difference(s2)
(['doc38', 'doc72'])

AHA!! Caught my bus. Python is great. :D