Removing duplicate iCal entries
This is hugely off-topic, so feel free to skip to the next article, which will likely have something to do with screenwriting and/or filmmaking.
My assistant Chad and I use Apple’s iCal to keep track of appointments. It’s nowhere near as sophisticated as Exchange or a real professional calendar system, but for the most part, it works. He maintains “John’s Work” calendar, and I maintain “John’s Personal” calendar. We both use the built-in publish-and-subscribe feature, so we see the same things on each computer.
After doing this for several years, however, some problems have arisen — mostly stemming from syncing with various Palm devices. Calendar events get duplicated, often six or seven times. Multiply that by several years, and the files get huge, and slow: my Work.ics file ballooned to 1.7 megabytes.
After searching the internet for a program that would fix this, I finally had to write my own. In the interest of paying it forward to the next guy with the same problem, here’s what I wrote. Continue on only if you’re truly geeky, or desperate.
UPDATE: Changes in OS 10.4 (Tiger), and specifically iCal 2.0, mean that the script as written won’t work anymore. Sorry. But the underlying concept still holds. With an hour and a little ambition, it should be possible to eliminate duplicates in the same way. Just be sure to always work on a backup of the calendar file.
If you open up an iCal file in a text editor, you’ll see that it’s actually very simple. Each entry starts with “BEGIN:VEVENT” and ends with “END:VEVENT”. In between, you see the start and end times (written as 20041210T130000, which means 2004/12/10 at 1:00 p.m.), a summary of the event, and a UID, which is supposed to keep each entry unique. Unfortunately, events sometimes get imported without their UIDs, which means you can end up with seventeen appointments for picking up the dry cleaning last March.
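To make that layout concrete, here is a minimal sketch that splits calendar text into its VEVENT blocks and looks up a field in each one. It is written for present-day Python 3, not the 2005-era Python of the script further down, and the sample data and the helper names split_events and field are made up for illustration.

```python
# Hypothetical sample data, shaped like the entries described above.
SAMPLE = """BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
DTSTART:20041210T130000
DTEND:20041210T140000
SUMMARY:Pick up dry cleaning
END:VEVENT
BEGIN:VEVENT
DTSTART:20041211T090000
DTEND:20041211T100000
SUMMARY:Coffee with Jen
UID:ABC-123
END:VEVENT
END:VCALENDAR
"""


def split_events(text):
    """Return one list of lines per BEGIN:VEVENT ... END:VEVENT block."""
    events = []
    current = None  # None means we are outside any event
    for line in text.splitlines():
        if line.startswith('BEGIN:VEVENT'):
            current = [line]
        elif line.startswith('END:VEVENT') and current is not None:
            current.append(line)
            events.append(current)
            current = None
        elif current is not None:
            current.append(line)
    return events


def field(event, name):
    """Return the value of the first line named `name`, or None if absent."""
    for line in event:
        key, _, value = line.partition(':')
        # Drop any ';PARAM=...' suffix on the property name before comparing.
        if key.split(';')[0] == name:
            return value
    return None


events = split_events(SAMPLE)
for event in events:
    print(field(event, 'DTSTART'), field(event, 'SUMMARY'), field(event, 'UID'))
```

The first sample event deliberately has no UID line, which is the situation that lets duplicates pile up.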
The best (i.e. simplest) solution I could come up with was to simply sort the file based on start times, and eliminate duplicates. This is a little unforgiving; if you happened to have “Coffee with Jen” and “Conference call w/ Paul Brown” listed at exactly the same date and time, one would get overwritten. But that was a longshot I could live with.
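If that longshot worries you, one possible variation (an assumption, not what the script below does) is to key each entry on both its start time and its summary, so two different appointments at the same moment both survive. A rough sketch in present-day Python 3, with made-up sample events and hypothetical helper names event_key and dedupe:

```python
# Hypothetical sample: one appointment duplicated, plus a different
# appointment at exactly the same date and time.
EVENTS = [
    'BEGIN:VEVENT\nDTSTART:20050301T100000\nSUMMARY:Coffee with Jen\nEND:VEVENT',
    'BEGIN:VEVENT\nDTSTART:20050301T100000\nSUMMARY:Conference call w/ Paul Brown\nEND:VEVENT',
    'BEGIN:VEVENT\nDTSTART:20050301T100000\nSUMMARY:Coffee with Jen\nEND:VEVENT',
]


def event_key(event_text):
    """Build a (start, summary) key from one VEVENT block of text."""
    start = ''
    summary = ''
    for line in event_text.splitlines():
        name, _, value = line.partition(':')
        name = name.split(';')[0]  # ignore parameters such as ;TZID=...
        if name == 'DTSTART':
            start = value
        elif name == 'SUMMARY':
            summary = value
    return (start, summary)


def dedupe(events):
    """Keep one copy per (start, summary) key, ordered by start time."""
    unique = {}
    for event in events:
        unique[event_key(event)] = event  # later duplicates overwrite earlier ones
    return [unique[key] for key in sorted(unique)]


cleaned = dedupe(EVENTS)
print(len(EVENTS), 'entries in,', len(cleaned), 'entries out')
```

The trade-off is that two copies of the same appointment whose summaries differ slightly would now both be kept.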
Here’s the Python script I wrote. I like Python because it fits my brain, but you could easily translate it to Perl, Ruby or almost anything. (I’d be tempted to do it in AppleScript, but I find it slow at handling big text files.)
#!/usr/bin/python

# This script removes duplicate entries from an iCal file,
# by building a dictionary keyed from the DTSTART value.
# Note: if two different entries have the identical start
# time and date, one will be overwritten.
# Obviously, work on a backup copy of your iCal file.

import sys

# replace next line with path to calendar
the_file = open('/Users/john/Library/Calendars/Work.ics')
the_text = the_file.readlines()

the_dict = {}
temp = ""
id = ""

for line in the_text:
    if 'BEGIN:VEVENT' in line:
        temp = ""
        temp = line
    elif 'END:VEVENT' in line:
        temp = temp + line.rstrip()
        the_dict[id] = temp   ### note: overwrites if multiple
    elif 'DTSTART' in line:
        temp = temp + line
        id = line   ### use 'DTSTART' as key id value
    else:
        temp = temp + line

# vcalendar headers; replace if yours are different
print 'BEGIN:VCALENDAR'
print 'VERSION:2.0'
print 'X-WR-CALNAME:Revised Work'
print 'PRODID:-//Apple Computer, Inc//iCal 1.5//EN'
print 'X-WR-RELCALID:DC928866-9705-11D9-9E58-000393D00CCE-CALP'
print 'X-WR-TIMEZONE:US/Pacific'
print 'CALSCALE:GREGORIAN'

# use dictionary values; discard keys
for line in the_dict.values():
    print line

# standard vcalendar footer
print 'END:VCALENDAR'
You’ll notice that the script reads from a file, but doesn’t write back out. That’s because I’m using BBEdit, and it’s easier just to generate the text file in that.
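If you aren’t running the script from inside an editor, a small helper could write the result to a new file instead. This is only a sketch in present-day Python 3: the function name write_calendar, the demo file name, and the choice of plain "\n" line endings are all assumptions. The iCalendar specification calls for CRLF line breaks, so adjust that if your calendar program turns out to be strict. The helper refuses to overwrite an existing file, in keeping with the advice to work on a backup.

```python
import os
import tempfile


def write_calendar(path, header_lines, events):
    """Write headers, event blocks, and the footer to a brand-new file."""
    if os.path.exists(path):
        # Never clobber an existing calendar; pick a fresh name instead.
        raise IOError('refusing to overwrite ' + path)
    with open(path, 'w', newline='\n') as out:
        for line in header_lines:
            out.write(line + '\n')
        for event in events:
            out.write(event.rstrip('\n') + '\n')
        out.write('END:VCALENDAR\n')


# Demo with made-up data, written into a throwaway temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'Revised Work.ics')
write_calendar(
    demo_path,
    ['BEGIN:VCALENDAR', 'VERSION:2.0'],
    ['BEGIN:VEVENT\nDTSTART:20041210T130000\nSUMMARY:Pick up dry cleaning\nEND:VEVENT'],
)

# Read the file back without newline translation to see exactly what was written.
with open(demo_path, newline='') as check:
    written = check.read()
print(written)
```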


March 20th, 2005 at 9:23 pm
Wait… you have an assistant? Damn. I need to sell a script or two. Or three.
March 21st, 2005 at 1:52 am
John:
I’m in awe. Not common parlance “awe”, but Old Testament “awe”. You frighten me. :)
I might actually use this script. My assistant and I have been iCal wrangling with each other for 2 years now, and the big issue is the duplication problem you’re describing.
Hey, have you ever tried iSynCal? If you have and it doesn’t work, let me know before I head down a blind alley.
Best,
C.
March 21st, 2005 at 10:05 pm
It’s this kind of thing that turned me off computers until 1995. When I had to write my own code for games before I could play them (circa 1984) I vowed to never again own a computer. Long live the Selectric II.
March 22nd, 2005 at 1:07 pm
You pythonista.
March 31st, 2005 at 1:06 am
Craig:
No, I haven’t tried iSyncCal. I have no doubt that someone has written a real program that does what I need; I just couldn’t find it when looking.
April 20th, 2005 at 4:52 am
Very cool. I just used this script to weed out duplicate entries caused by a broken sync with multiple devices (palm, ipod, .mac). Every single event was duplicated in one of my calendars. Your script worked a charm, but I did have to make a minor correction.
I had to change the “in” statements from - if in line:
to -
if line.find( ) != -1:
Python wants a character as left operand to “in”.
April 20th, 2005 at 5:24 am
My apologies… a bit got lost in my last message. It should have said:
I had to change the “in” statements from -
if < string > in line:
to -
if line.find(< string >) != -1:
for example:
if 'BEGIN:VEVENT' in line:
should be changed to:
if line.find('BEGIN:VEVENT') != -1:
April 20th, 2005 at 12:49 pm
Warning: High Geek Factor!
Gary:
Glad you got it to work. I guess I’m not clear why the original version was tripping up for you.
if 'BEGIN:VEVENT' in line:
is pretty Pythonic; the .find() test really shouldn’t be necessary. Was it a curly-apostrophe situation?
Also, are you using a pretty recent version (I’m on 2.3)?
Just curious. Any Python experts out there, feel free to chime in.
April 20th, 2005 at 1:04 pm
Gary:
Clarifying/Obfuscating a little more on the “in” issue:
(this from http://docs.python.org/ref/comparisons.html#l2h-432)
So, it seems like the “in” line really does work — but only in 2.3 and later. It does for me. And it looks prettier, to boot.
May 3rd, 2005 at 9:35 am
I ran the original code posted above in python 2.4 and Tiger. Runs fine, but the output has an extra empty line between all data lines, except between end: and begin: of entries. I save the output to a file. Upon trying to import it, iCal indicates the file is unreadable. Any guesses? thanks! EAL
May 3rd, 2005 at 10:27 am
Emilio:
Tiger (specifically, iCal 2.0) has a different file format/layout. The current script won’t work. Sorry.
Given priorities around here, I probably won’t be able to tweak the script myself, but it shouldn’t be too hard for someone with a little ambition and an hour or two.
Some of the new information in the .ics file includes daylight savings time, “sequence,” “location,” “exdate,” and others.
If you decide to tackle the Tiger situation, please drop a note to let us know.
May 3rd, 2005 at 11:05 am
I will look into it. I am not py pgrmmr, but if I can figure the new format I might take a swing at it. Grateful for your prompt reply. I will let you know when I give up.
May 3rd, 2005 at 8:58 pm
Didn’t give up. The code works fine, except that the output transforms the \n into \r thus resulting in extra blank lines in output. I was able to remove all duplicates efficiently by using TextWrangler. I am not 100% sure about the details of the following steps, but let me report in case others want to tinker:
Export calendar with dups into file to work on. In TextWrangler (free from BBsoft) open the file that contains py code above (modified as indicated). Run py code from inside TextWrangler (fast!). In output window search/replace all \r\r with \r. Check if your output file has any \r that resulted from \n’s (this may be the result of some calendar events from my Newton -it was 1994!) and replace with . Save as … output into filename.ics using macintosh line breaks and (this is the part I don’t quite remember-sorry) unicode-8. Import from iCal.
Thanks for the code. I had no idea what python was until I read this thread!
May 6th, 2005 at 9:34 am
I have had a similar problem and have used Undupe from Stevens Creek Software – http://www.stevenscreek.com/palm/undupe.html It finds and can remove duplicates (not just in Datebook) then when you sync, the dupes are taken care of. $10 well spent for me.
December 7th, 2006 at 9:28 am
Could you re-write this script to remove events with a specific keyword in their title? I don’t know enough Python to make the appropriate changes to your script and I haven’t been able to find a ready-made program/script for this task.
February 28th, 2007 at 10:36 pm
thank you for that! it’s also useful with linux and kde. had same probs with palm sync.
now everything is fine again. thank you!!!