Updated Script (minor): Modified Remove Dupes from Google Reader

Download script: mod_remove_dupes_from_google.user.js.

Summary

Wired feeds contain entries that differ only in the amount of trailing whitespace (e.g. spaces, line feeds, and carriage returns). The script did not identify these entries as duplicates. The problem dates back to the original script.

Resolution

Fixed.

Remarks

• The original script function was modified to strip trailing whitespace, so that otherwise identical entries are now removed as duplicates. This was a one-line addition to the existing code.

• Script updated from version m1.0.0 to m1.0.1.
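
As a rough illustration of the idea (not the script's actual code), the sketch below strips trailing whitespace before comparing entries. The helper names `stripTrailingWhitespace` and `removeDuplicates` are hypothetical, and entries are modeled as plain strings rather than Google Reader page elements.

```javascript
// Hypothetical helper: drop trailing spaces, tabs, line feeds, and carriage returns.
function stripTrailingWhitespace(text) {
  return text.replace(/\s+$/, "");
}

// Hypothetical dedupe pass: keep the first entry seen for each normalized text.
function removeDuplicates(entries) {
  const seen = new Set();
  const kept = [];
  for (const entry of entries) {
    const key = stripTrailingWhitespace(entry);
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(entry);
    }
  }
  return kept;
}

const sampleEntries = [
  "Bad News: Scientists Make Cheap Gas From Coal",
  "Bad News: Scientists Make Cheap Gas From Coal \r\n",
  "Some other headline",
];
console.log(removeDuplicates(sampleEntries).length); // 2
```

Without the normalization step, the first two sample entries would compare as different strings and both would be kept.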

10 Responses to “Updated Script (minor): Modified Remove Dupes from Google Reader”

  1. Help Says:

    I see the button in Google Reader, but when I click on it, nothing happens and duplicate entries are still present. I’m using FF 3.0.8. Thanks for any help. I would say that 10-15% of the entries in my reader are duplicates, so getting this to work would be awesome.

  2. ifixscripts Says:

    Whew, scared me, I thought maybe Google changed things again, but just checked and it works OK for me.

    One thing to be aware of is that feeds can differ in their title snippet/summary entries (sometimes only very slightly), although the actual entries read the same. For example, the Wired feeds might have two entries titled “Bad News: Scientists Make Cheap Gas From Coal” with a summary that follows reading either “Electric cars have been getting a lot of buzz lately, but a more” or “A new, more efficient process for turning coal into liquid fuel uses less”.

    Those two Wired entries would not be considered duplicates. I’m still thinking of a graceful way to filter such duplicates without a lot of false positives. Some feeds regularly reuse the title alone for regular entries, so a script cannot filter on just that with any confidence.
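
    To make the trade-off concrete, here is a small sketch, assuming each entry is reduced to a title and a summary snippet. The function and field names (`duplicateIndexes`, `title`, `snippet`) are made up for illustration and are not taken from the script.

```javascript
// Hypothetical duplicate finder: returns the indexes of entries whose key
// has already been seen earlier in the list.
function duplicateIndexes(entries, keyOf) {
  const seen = new Set();
  const dupes = [];
  entries.forEach((entry, index) => {
    const key = keyOf(entry);
    if (seen.has(key)) {
      dupes.push(index);
    } else {
      seen.add(key);
    }
  });
  return dupes;
}

// Two candidate keys: title plus snippet (strict), or title alone (loose).
const titleAndSnippet = (e) => e.title.trim() + "\u0000" + e.snippet.trim();
const titleOnly = (e) => e.title.trim();

const wired = [
  {
    title: "Bad News: Scientists Make Cheap Gas From Coal",
    snippet: "Electric cars have been getting a lot of buzz lately, but a more",
  },
  {
    title: "Bad News: Scientists Make Cheap Gas From Coal",
    snippet: "A new, more efficient process for turning coal into liquid fuel uses less",
  },
];

console.log(duplicateIndexes(wired, titleAndSnippet).length); // 0: not flagged
console.log(duplicateIndexes(wired, titleOnly).length);       // 1: flagged
```

    The strict key misses the Wired pair, while the loose key catches it but would also flag feeds that legitimately reuse a title.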

    Anyway, a couple of other things to check: make sure you are in list mode and that all the entries are loaded on the page for checking (i.e. you can’t page down and load more). Also try clicking the button a couple of times, in case the first click is missed or otherwise not processed.

    If those don’t make a difference, can you post the feeds with the duplicates you use? If I can see the problem, I can usually fix it. If you do not wish to post your feeds publicly in this comment thread, you can e-mail them to support@devoresoftware.com. If you do this, please make sure you put “I Fix Scripts” in the title or content to ensure it isn’t filtered to junk. I literally get several thousand spam e-mails a day and my filters are pretty strict on the common addresses.

  3. Help Says:

    I see now – I need to be in list mode. Sorry about that. It works great.

    On a side note – is there a way to tell Reader to load all of the items? I have one feed that generates several hundred news items and the only way I can get it to load them all is to scroll, which is time consuming with several hundred items.

    Thanks again for this – I played with it on a feed with about 340 items and it knocked out about 40 of them. Awesome.

    • ifixscripts Says:

      I wanted an autoload-all-new feature myself when testing the script, but haven’t yet figured out how it might be accomplished. Unless there is a hidden feature we don’t know of, there are a few possible script solutions, though I’m not sure if they would work.

      Do you have a link to a feed which generates a large number of messages per day, possibly including duplicate entries? I can test a couple ideas for a solution pretty quickly with an appropriate test feed. Otherwise, I have to leave messages unread on my current feeds for several days, hand mark them as unread, or load up a bunch of dummy feeds that I don’t really want to have. It’s easier to test against a real high-traffic feed.

  4. Help Says:

    You could try this one:

    http://feeds.wsjonline.com/wsj/xml/rss/3_7011.xml

    Or subscribing to all of these will for sure get a ton:

    http://feeds.bizjournals.com/bizj_southeast
    http://feeds.bizjournals.com/bizj_atlanta
    http://www.bizjournals.com/rss/feed/vertical_topic/21
    http://www.bizjournals.com/rss/feed/vertical_subtopic/19
    http://www.bizjournals.com/rss/feed/vertical_subtopic/50
    http://www.bizjournals.com/rss/feed/vertical_subtopic/19
    http://www.bizjournals.com/rss/feed/vertical_topic/7

    Thanks for your help.

    That should generate a lot of duplicates.

    • ifixscripts Says:

      Hmm, you’ll have to give me several days, or more, to get to this one. I can see how auto-load might be done. Opening the last new entry loaded makes Google Reader load more feed entries that follow.

      If not all new entries are loaded, a script can automatically open the last loaded entry, mark it unread, and close it, scan to the new end of entries, open the (updated) last entry, mark unread, close, and so on until full new entry load is detected. And the script would need to make sure it didn’t outrun the Google Reader event driver. Nasty ugly stuff, but a workable concept, though it would require quite a bit of new code and logic.
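
      The loop described above might look roughly like the following sketch. It runs against a fake in-memory reader, since none of the method names here (`loadedCount`, `openLast`, `markOpenedUnread`, `closeOpened`) are real Google Reader APIs; they are placeholders for whatever page manipulation a real script would need. The sketch is also synchronous, so it leaves out the waiting a real script would need to avoid outrunning Reader's event handling.

```javascript
// Hypothetical stand-in for the Reader page. Opening the last loaded entry
// triggers loading of one more page of entries, as described above.
function makeFakeReader(totalUnread, pageSize) {
  let loaded = Math.min(pageSize, totalUnread);
  return {
    loadedCount: () => loaded,
    openLast() {
      loaded = Math.min(loaded + pageSize, totalUnread);
    },
    markOpenedUnread() {}, // no-op in the fake; a real script would restore unread state
    closeOpened() {},      // no-op in the fake
  };
}

// Keep opening the last loaded entry until a round loads nothing new,
// or until the safety cap on rounds is reached.
function loadAllEntries(reader, maxRounds) {
  for (let round = 0; round < maxRounds; round += 1) {
    const before = reader.loadedCount();
    reader.openLast();
    reader.markOpenedUnread();
    reader.closeOpened();
    if (reader.loadedCount() === before) {
      break; // nothing new arrived, so everything is loaded
    }
  }
  return reader.loadedCount();
}

console.log(loadAllEntries(makeFakeReader(340, 20), 100)); // 340
```

      The round cap is there so a misbehaving page cannot keep the loop running forever.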

      Looks like it’s also possible to hook up code to make the remove duplicates button work with feed entries displayed in expanded mode, as well as the current list mode. At least that is a relatively straightforward enhancement, since it uses the same basic idea as the existing list mode dupe check and could share some code.

      • Help Says:

        Very cool – sounds very promising. I’ll look forward to it when you get a chance to play with it.

  5. Help Says:

    I’ve played with the hack you linked to, but I can’t get it to work. It should work really well for what I need, but it doesn’t seem to load any additional items in Reader for me.

    • ifixscripts Says:

      I tested this as working OK after your post, but then this evening the Prefetch More script stopped working for me as well.

      However, about a half hour ago, the author posted a script update to userscripts which made things work here again. Guess this is a case of playing keep-up with the Google developers, and why directly manipulating internal variables can be tricky on active projects.

      Anyway, try uninstalling the old version and installing the latest version, changing the list variable to a high number as before. See if that makes things work for you now.

  6. ifixscripts Says:

    I finally had sufficient free time to look this issue over and, just before diving in, I noticed the Google Reader Prefetch More script. As supplied, the script automatically loads 60 messages in list view, rather than the default 20, but this can be easily changed to a higher number. If you install it, you might want to change the line var list = 60; to var list = 1000; or a suitably high number to get high initial message load values.

    The script appears to directly twiddle Reader internals and consequently goes out of date more often, but the author is actively supporting the script judging by the feedback.

    Neat hack. I haven’t had time to look it over in detail and see how it works yet, but the basic idea is one I’d like to look at more closely before implementing more intensive and messy measures to load all unread messages. I’ll try to incorporate the basic idea, if viable, in the dupe-removing script within the coming week, and do a new version release.

    P.S. That one WSJ news link does generate a heck of a lot of entries for testing.
