Dude, Where’s My .RAR?

February 25th, 2008 by Adrian

I started working a bunch in UnrealScript again lately, and I’m still coming up to speed on the class changes and event flow in UT3 vs. UT2004.  There’s really no great IDE for UT (sorry, WOTGreal), so I spend a lot of time tracing method names and properties through the exported source scripts from the retail game.  For 10+ full-text searches per minute, the grep search in Windows XP just doesn’t cut it.  Unfortunately, neither do Microsoft’s or Google’s desktop search apps.

So I decided to install a desktop search utility.  I’m a little anti-Google these days, so I plumped for Windows Live desktop search.  I know the guys who spearheaded this as a skunkworks project @ MS, and I was psyched to see it get deployed ASAP back in 2004.  For one thing, I was designing desktop search for Windows Vista, and the shell team was really resistant to the idea that life in the OS could be a lot better with a ubiquitous, non-invasive search capability that went deeper than simple property snarfing.  The shell PM team was pretty green, and made up for their lack of experience with plenty of obstinacy.  Twisting their arms to install a working prototype on their XP machines was one of the best ways to get them to see the light.

Fast-forward back to present day, 2008.  I downloaded & installed the Windows Live tool, rebooted, and waited for it to index the 100k or so items in my selected directories.  For a couple of days, everything was fine.  Adding the .UC extension to the list of files indexed with the plain text filter was trivial.  But I noticed that every time I rebooted my machine, the indexer would fire up and start scanning all 100k items for changes.  Didn’t matter if it’d been 30 minutes or 30 hours since the last reboot.  Very annoying, lots of disk chatter going on that didn’t need to be, so I thought I’d just update some preference on indexing/scanning frequency.

No such luck.  There’s a nice treeview control to let me add/remove individual directories and UNC paths from the index scope, but no scheduling options at all.  There is a “snooze” feature that postpones indexing for 15 minutes up to a day, but I really don’t want to make a new routine out of starting up my computer: boot, wait for indexer to load, right-click on taskbar, snooze, then start working.

So I reluctantly downloaded Google Desktop, cuz I figure even if I don’t like Google’s business strategy, they’re going to do a kick-ass search applet.  Turns out: not so much.  I’ve already uninstalled the app, so I can’t do a detailed feature blow-by-blow, but I’ll say this much: editing path locations was much more painful in Google Desktop than Windows Live, thanks to the fact that all options/settings are implemented as HTML forms.  No handy treeview control here, just a single-line text field for typing in the root path(s) of your choice.  Thanks, but no thanks.

I don’t know if I could have easily added a new plain-text file type (.UC); Google Desktop doesn’t get the benefit of leveraging MSSearch’s IFilters, so the interface had a lot of options & info about installing filter pack addons but less about mapping file extensions to existing filters.  I’m assuming it could be done, whether or not the UI was optimized for the task.

It’s all moot because once again, no scheduling options for indexing.  So I’m no better off than with Windows Live Desktop Search.  With the UI being a little more clumsy for what I want to do, there was no reason to keep this app installed (I don’t need the sidebar, thanks).

So here’s the conclusion to my little rant: why is scheduling the indexing a big deal?   I think the answer probably lies in an all-or-nothing approach to indexed retrieval vs. grep, i.e. spend a lot of time building and maintaining this index so we never have to grep for files again.  That means taking every opportunity to update the index – otherwise, the user might search in the next few minutes, and the results will be inaccurate.

Funny story about Windows XP Search:  Windows XP shipped with the MSSearch indexer.  (MSSearch is Microsoft’s in-house text indexing engine that powers Sharepoint, the Microsoft.com site [last I knew], help for Office apps, etc.)  To avoid a negative first experience with disk performance, it’s not turned on by default.  But users are prompted when doing text searches to turn on indexing “to make future searches faster.”  Okay, here’s the funny part: it won’t.

Sure, WinXP will build the index with MSSearch.  You’ll hear the drive grinding away.  A lot.  But unless the index is 100% up-to-date when you go to search, XP will ignore the index and just grep anyway, to make sure you’re not getting inaccurate results.  Being 100%-up-to-date means that not a single file has been updated since the index was last built.  That almost never happens on an end-user’s desktop.  So if you’re a user who said, “sure, I’d like fast searching, that sounds good!” and enabled indexing in XP… umm, sorry. 

Anyway, the solution to this mess seems pretty straightforward, right?  Just check the OS change log for files updated since the index build, filter those files out of any indexed results returned (to prevent false positives), populate the result list with what you’ve got so far, then continue to search for the query in the remaining handful of changed files (preferably updating the index as you go).  That’s the approach we took for Vista [disclaimer: I left MSFT before Vista actually shipped, so there’s always the chance that something got nuked at the 11th hour].
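
For the curious, here’s roughly what that index-then-grep flow looks like as a toy Python sketch.  To be clear, this is not the Vista code or anything MSSearch actually does: the inverted index is a plain dict, the “OS change log” is faked by comparing each file’s modified time and size against what was recorded at index time, and every name in it (Index, build_index, changed_since_build, grindex_search) is made up for illustration.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Index:
    """Toy inverted index (hypothetical, for illustration only)."""
    postings: dict = field(default_factory=dict)   # word -> set of file paths
    signature: dict = field(default_factory=dict)  # path -> (mtime_ns, size) when indexed


def file_signature(path):
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def read_words(path):
    # Whitespace-delimited, case-insensitive tokens; a real filter would do far more.
    with open(path, encoding="utf-8", errors="ignore") as f:
        return set(f.read().lower().split())


def index_file(index, path):
    # Drop any stale postings for this path, then add its current words.
    for paths in index.postings.values():
        paths.discard(path)
    for word in read_words(path):
        index.postings.setdefault(word, set()).add(path)
    index.signature[path] = file_signature(path)


def list_files(root):
    return [os.path.join(d, n) for d, _, names in os.walk(root) for n in names]


def build_index(root):
    index = Index()
    for path in list_files(root):
        index_file(index, path)
    return index


def changed_since_build(index, root):
    # Assumption: stand-in for the OS change log. Anything new or modified
    # since it was indexed is "changed"; anything gone is "deleted".
    current = set(list_files(root))
    changed = {p for p in current if index.signature.get(p) != file_signature(p)}
    deleted = set(index.signature) - current
    return changed, deleted


def grindex_search(index, root, word):
    word = word.lower()
    changed, deleted = changed_since_build(index, root)
    # 1. Take the indexed hits, minus anything stale, to prevent false positives.
    results = index.postings.get(word, set()) - changed - deleted
    # 2. Search only the changed files, updating the index as we go.
    for path in changed:
        index_file(index, path)
        if path in index.postings.get(word, set()):
            results.add(path)
    return sorted(results)
```

Something like `grindex_search(build_index(r"C:\UT3\Src"), r"C:\UT3\Src", "PostBeginPlay")` (the path is just an example) would answer from the index for untouched files and only re-read the handful that changed.  The point is that the slow part scales with the number of changed files, not the size of the whole tree, so how often you rebuild the index becomes a tuning knob rather than a correctness problem.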

The nice thing about using a “grindex” style search is that it puts the time tradeoffs completely in the user’s control.  If you want to spend lots of cycles up front on “background” indexing to make your complete results super-snappy, go for it.  If not, schedule a lazy index build (every 48 hrs?), and a few searches will be a little slower in the meantime.

I could live with that.  But it’d be nice not to have to install a new OS just to get it.

Posted in Technobabble

