Removing duplicate iCal entries

Geek alert: This is hugely off-topic, so feel free to skip to the next article, which will likely have something to do with screenwriting and/or filmmaking.

My assistant Chad and I use Apple’s iCal to keep track of appointments. It’s nowhere near as sophisticated as Exchange or a real professional calendar system, but for the most part, it works. He maintains the “John’s Work” calendar, and I maintain the “John’s Personal” calendar. We both use the built-in publish-and-subscribe feature, so we see the same things on each computer.

After doing this for several years, however, some problems have arisen, mostly stemming from syncing with various Palm devices. Calendar events get duplicated, often six or seven times. Multiply that by several years, and the files get huge and slow: my Work.ics file ballooned to 1.7 megabytes.

After searching the internet for a program that would fix this, I finally had to write my own. In the interest of paying-it-forward to the next guy with the same problem, here’s what I wrote. Continue on only if you’re truly geeky, or desperate.

UPDATE: Changes in Mac OS X 10.4 (Tiger), and specifically iCal 2.0, mean that the script as written won’t work anymore. Sorry. But the underlying concept still holds. With an hour and a little ambition, it should be possible to eliminate duplicates in the same way. Just be sure to always work on a backup of the calendar file.

If you open up an iCal file in a text editor, you’ll see that it’s actually very simple. Each entry starts with “BEGIN:VEVENT” and ends with “END:VEVENT”. In between, you see the start and end times (written as 20041210T130000, which means 2004/12/10 at 1:00 p.m.), a summary of the event, and a UID, which is supposed to keep each entry unique. Unfortunately, events sometimes get imported without their UIDs, which means you can end up with seventeen appointments for picking up the dry cleaning last March.
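For the curious, here’s a minimal sketch of that structure in Python (it is not part of the script below). It splits calendar text into VEVENT blocks and turns a DTSTART stamp such as 20041210T130000 into a datetime. The sample event and the function names are made up for illustration, and it assumes the simple YYYYMMDDTHHMMSS form; all-day dates or UTC stamps ending in Z would need extra handling.

```python
from datetime import datetime

# A made-up calendar in the simple layout described above.
SAMPLE = """BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART:20041210T130000
DTEND:20041210T140000
SUMMARY:Pick up dry cleaning
END:VEVENT
END:VCALENDAR
"""


def split_events(text):
    """Return each BEGIN:VEVENT ... END:VEVENT block as a list of lines."""
    events, current = [], None
    for line in text.splitlines():
        if line.startswith('BEGIN:VEVENT'):
            current = [line]
        elif current is not None:
            current.append(line)
            if line.startswith('END:VEVENT'):
                events.append(current)
                current = None
    return events


def start_time(event):
    """Parse an event's DTSTART value into a datetime.

    Assumes the basic 20041210T130000 form; an all-day date or a UTC
    stamp ending in Z would raise ValueError here.
    """
    for line in event:
        if line.startswith('DTSTART'):
            value = line.split(':', 1)[1]   # drop "DTSTART" and any ";TZID=..." part
            return datetime.strptime(value, '%Y%m%dT%H%M%S')
    return None


events = split_events(SAMPLE)
print(len(events))             # 1
print(start_time(events[0]))   # 2004-12-10 13:00:00
```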

The best (i.e. simplest) solution I could come up with was to index the events by start time and eliminate duplicates. This is a little unforgiving: if you happened to have “Coffee with Jen” and “Conference call w/ Paul Brown” listed at exactly the same date and time, one would get overwritten. But that was a long shot I could live with.
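To see that trade-off, here’s a toy sketch with made-up events. The first key mirrors the script’s DTSTART-only approach. The second, keying on start time plus SUMMARY, is an alternative the script does not use, shown only to illustrate how the coffee/conference-call collision could be avoided (at the cost of keeping duplicates whose titles differ even slightly). The `dedupe` helper is hypothetical.

```python
# Made-up events: two copies of one appointment, plus a different
# appointment that happens to share the same start time.
events = [
    {'DTSTART': '20050301T090000', 'SUMMARY': 'Coffee with Jen'},
    {'DTSTART': '20050301T090000', 'SUMMARY': 'Coffee with Jen'},
    {'DTSTART': '20050301T090000', 'SUMMARY': 'Conference call w/ Paul Brown'},
]


def dedupe(events, key):
    """Keep the last event seen for each key, like the dictionary approach."""
    kept = {}
    for event in events:
        kept[key(event)] = event   # a later event with the same key overwrites an earlier one
    return list(kept.values())


by_start = dedupe(events, key=lambda e: e['DTSTART'])
by_start_and_summary = dedupe(events, key=lambda e: (e['DTSTART'], e['SUMMARY']))

print(len(by_start))              # 1: the conference call overwrote the coffee
print(len(by_start_and_summary))  # 2: only the true duplicate is gone
```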

Here’s the Python script I wrote. I like Python because it fits my brain, but you could easily translate it to Perl, Ruby or almost anything. (I’d be tempted to do it in AppleScript, but I find it slow at handling big text files.)

    #!/usr/bin/python

    # This script removes duplicate entries from an iCal file
    # by building a dictionary keyed from the DTSTART value.
    # Note: if two different entries have the identical start
    # time and date, one will be overwritten.
    # Obviously, work on a backup copy of your iCal file.

    # replace next line with path to calendar
    the_file = open('/Users/john/Library/Calendars/Work.ics')
    the_text = the_file.readlines()
    the_file.close()

    the_dict = {}   # DTSTART line -> full text of one event
    temp = ""       # lines of the event currently being read
    key = ""        # DTSTART line of the event currently being read

    for line in the_text:
        if 'BEGIN:VEVENT' in line:
            temp = line                  # start a fresh event
        elif 'END:VEVENT' in line:
            temp = temp + line.rstrip()
            the_dict[key] = temp         # note: overwrites if multiple
        elif 'DTSTART' in line:
            temp = temp + line
            key = line                   # use 'DTSTART' as key value
        else:
            temp = temp + line

    # vcalendar headers; replace if yours are different
    print('BEGIN:VCALENDAR')
    print('VERSION:2.0')
    print('X-WR-CALNAME:Revised Work')
    print('PRODID:-//Apple Computer, Inc//iCal 1.5//EN')
    print('X-WR-RELCALID:DC928866-9705-11D9-9E58-000393D00CCE-CALP')
    print('X-WR-TIMEZONE:US/Pacific')
    print('CALSCALE:GREGORIAN')

    # use dictionary values; discard keys
    for event in the_dict.values():
        print(event)

    # standard vcalendar footer
    print('END:VCALENDAR')

You’ll notice that the script reads from a file but doesn’t write back out. That’s because I’m using BBEdit, and it’s easier just to generate the text file there.
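If you’d rather skip the BBEdit step, here’s a rough sketch of doing the whole job in Python 3 and writing the result to disk. It’s a variant, not the script above, and untested against iCal 2.0 files. It makes a `.bak` copy first, reuses whatever precedes the first BEGIN:VEVENT as the header instead of hard-coding it, keys events on their DTSTART line as the script does, and writes CRLF line endings, which is what the iCalendar format specifies. The function names and the `.bak` suffix are hypothetical choices, and `newline=''` requires Python 3. Like the original, it keeps only VEVENTs, so anything else that appears after the first event (to-dos, for instance) is dropped.

```python
import shutil


def dedupe_lines(lines):
    """Return calendar lines with duplicate VEVENTs removed, keyed on DTSTART."""
    header, events, current, key = [], {}, None, None
    for line in lines:
        if line.startswith('BEGIN:VEVENT'):
            current, key = [line], None
        elif current is not None:
            current.append(line)
            if key is None and line.startswith('DTSTART'):
                key = line                      # same key the script above uses
            elif line.startswith('END:VEVENT'):
                # An event with no DTSTART is keyed on its full text instead,
                # so it only collapses with an exact copy of itself.
                events[key or '\n'.join(current)] = current
                current = None
        elif not events and line != 'END:VCALENDAR':
            header.append(line)                 # BEGIN:VCALENDAR, VERSION, X-WR-*, etc.
    body = [line for block in events.values() for line in block]
    return header + body + ['END:VCALENDAR']


def dedupe_ics(in_path, out_path):
    """Back up in_path, then write a de-duplicated copy to out_path."""
    shutil.copy2(in_path, in_path + '.bak')     # always keep a backup
    with open(in_path) as f:
        lines = [line.rstrip('\r\n') for line in f]
    with open(out_path, 'w', newline='') as f:  # write exactly the CRLFs below
        f.write('\r\n'.join(dedupe_lines(lines)) + '\r\n')
```

For example, `dedupe_ics('Work.ics', 'Work-clean.ics')` (hypothetical file names) would leave `Work.ics.bak` beside the original and put the cleaned calendar in `Work-clean.ics`, ready to import. Splitting out `dedupe_lines` keeps the duplicate logic separate from the file handling, so it’s easy to try on a few pasted lines first.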

March 19, 2005 @ 11:34 am | Comments (16)
Filed under: Geek Alert

16 Responses to “Removing duplicate iCal entries”

  1. Richard

    Wait… you have an assistant? Damn. I need to sell a script or two. Or three.

  2. Craig Mazin

    John:

    I’m in awe. Not common parlance “awe”, but Old Testament “awe”. You frighten me. :)

    I might actually use this script. My assistant and I have been iCal wrangling with each other for 2 years now, and the big issue is the duplication problem you’re describing.

    Hey, have you ever tried iSynCal? If you have and it doesn’t work, let me know before I head down a blind alley.

    Best,

    C.

  3. Rob

    It’s this kind of thing that turned me off computers until 1995. When I had to write my own code for games before I could play them (circa 1984) I vowed to never again own a computer. Long live the Selectric II.

  4. dustym

    You pythonista.

  5. John

    Craig:

    No, I haven’t tried iSyncCal. I have no doubt that someone has written a real program that does what I need; I just couldn’t find it when looking.

  6. Gary Lee

    Very cool. I just used this script to weed out duplicate entries caused by a broken sync with multiple devices (palm, ipod, .mac). Every single event was duplicated in one of my calendars. Your script worked a charm, but I did have to make a minor correction.

    I had to change the “in” statements from - if in line: to - if line.find() != -1:

    Python wants a character as left operand to “in”.

  7. Gary Lee

    My apologies… a bit got lost in my last message. It should have said:

    I had to change the “in” statements from -

        if <string> in line:
    
    to -
        if line.find(<string>) != -1:
    

    for example:

        if 'BEGIN:VEVENT' in line:
    
    should be changed to:
        if line.find('BEGIN:VEVENT') != -1:
    

  8. John

    Warning: High Geek Factor!

    Gary:

    Glad you got it to work. I guess I’m not clear why the original version was tripping up for you.

    if 'BEGIN:VEVENT' in line:

    is pretty Pythonic; the .find() test really shouldn’t be necessary. Was it a curly-apostrophe situation?

    Also, are you using a pretty recent version (I’m on 2.3)?

    Just curious. Any Python experts out there, feel free to chime in.

  9. John

    Gary:

    Clarifying/Obfuscating a little more on the “in” issue:

    For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Note, x and y need not be the same type; consequently, u'ab' in 'abc' will return True. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True. Changed in version 2.3: Previously, x was required to be a string of length 1.

    (this from http://docs.python.org/ref/comparisons.html#l2h-432)

    So, it seems like the “in” line really does work — but only in 2.3 and later. It does for me. And it looks prettier, to boot.

  10. Emilio

    I ran the original code posted above in python 2.4 and Tiger. Runs fine, but the output has an extra empty line between all data lines, except between end: and begin: of entries. I save the output to a file. Upon trying to import it, iCal indicates the file is unreadable. Any guesses? thanks! EAL

  11. John

    Emilio:

    Tiger (specifically, iCal 2.0) has a different file format/layout. The current script won’t work. Sorry.

    Given priorities around here, I probably won’t be able to tweak the script myself, but it shouldn’t be too hard for someone with a little ambition and an hour or two.

    Some of the new information in the .ics file includes daylight savings time, “sequence,” “location,” “exdate,” and others.

    If you decide to tackle the Tiger situation, please drop a note to let us know.

  12. Emilio

    I will look into it. I am not a py pgrmmr, but if I can figure the new format I might take a swing at it. Grateful for your prompt reply. I will let you know when I give up.

  13. Emilio

    Didn’t give up. The code works fine, except that the output transforms the \n into \r thus resulting in extra blank lines in output. I was able to remove all duplicates efficiently by using TextWrangler. I am not 100% sure about the details of the following steps, but let me report in case others want to tinker:

    Export calendar with dups into file to work on. In TextWrangler (free from BBsoft) open the file that contains py code above (modified as indicated). Run py code from inside TextWrangler (fast!). In output window search/replace all \r\r with \r. Check if your output file has any \r that resulted from \n’s (this may be the result of some calendar events from my Newton -it was 1994!) and replace with . Save as … output into filename.ics using macintosh line breaks and (this is the part I don’t quite remember-sorry) unicode-8. Import from iCal. Thanks for the code. I had no idea what python was until I read this thread!

  14. jason

    I have had a similar problem and have used Undupe from Stevens Creek Software – http://www.stevenscreek.com/palm/undupe.html It finds and can remove duplicates (not just in Datebook) then when you sync, the dupes are taken care of. $10 well spent for me.

  15. Michael

    Could you re-write this script to remove events with a specific keyword in their title? I don’t know enough Python to make the appropriate changes to your script and I haven’t been able to find a ready-made program/script for this task.

  16. LinuxBaer

    thank you for that! it’s also useful with linux and kde. had same probs with palm sync.

    now everything is fine again. thank you!!!

 

About

This site is run by screenwriter John August. Mostly, he answers reader-submitted questions about the craft, but occasionally he goes on tangents that run far afield of writing and filmmaking. You'll also find info on past, present and future projects.
