In an age of LLMs, is it time to reconsider human-edited web directories?

Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.

These directories generally had a list of categories on their front page: News/Sport/Entertainment/Arts/Technology/Fashion/etc.

Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.

Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.

Lycos, Excite, and of course Yahoo all offered web directories of this sort.

(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)

By the late '90s, the standard narrative goes, the web got too big to index websites manually.

Google promised the world its algorithms would weed out the spam automatically.

And for a time, it worked.

But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.

And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.

My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?

Do we really want to search every single website on the web?

Or just those that aren’t filled with LLM-generated SEO spam?

Or just those that don’t feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your “free trial” subscription?

At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?

And is it time to begin considering what a modern version of those early web directories might look like?
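
To make the "only crawl curated sites" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `DIRECTORY` contents are made up, and `allowed_hosts` and `filter_links` are hypothetical helper names, not part of any existing search engine. The point is only that a crawler can resolve the links it finds on a page and drop any whose host is not on the human-maintained list.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical curated directory: category -> vetted site URLs (made-up examples).
DIRECTORY = {
    "News": ["https://example.org/"],
    "Technology": ["https://example.com/blog/"],
}


def allowed_hosts(directory):
    """Collect the hostname of every site listed in the directory."""
    return {
        urlparse(url).hostname
        for urls in directory.values()
        for url in urls
    }


def filter_links(page_url, hrefs, hosts):
    """Resolve links found on a page; keep only http(s) links on curated hosts."""
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # handle relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname in hosts:
            kept.append(absolute)
    return kept
```

A crawler built this way would call `filter_links` on every page it fetches and only queue what comes back, so it never wanders off the curated list no matter what a listed page links to.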

@degoogle #tech #google #web #internet #LLM #LLMs #enshittification #technology #search #SearchEngines #SEO #SEM

      • ᴇᴍᴘᴇʀᴏʀ 帝A
        4 months ago

        Thanks for that, a real blast from the past. I have a vague memory that I was an editor on the ODP or dmoz back in the day.

        Sorry, I was hesitant to post links at first before I vetted them.

        Yes, perhaps not coincidentally, I thought it best to ask for a human-curated link.

        • Michelle Hughes@a2mi.social
          4 months ago

          @Emperor

          Y’know, come to think of it, Wikipedia might be a better project to point to here. All the content on there is hand-curated. When I’m interested in a subject, I usually go to Wikipedia first instead of a search engine. Sometimes I am directed out to other websites from there.

          I set up a quick keyword search so I can type “wp blah blah blah” into my URL bar and it searches Wikipedia.

          https://support.mozilla.org/en-US/kb/how-search-from-address-bar?redirectslug=Smart+keywords&redirectlocale=en-US

          • ᴇᴍᴘᴇʀᴏʀ 帝A
            4 months ago

            The only issue with Wikipedia (coming from a long, long-time user and Administrator) is that a freely open and editable wiki needs a critical mass of users to become self-policing.

            One of the projects I’ve been kicking around for a while (and which has worked its way to the top of my list) is a wiki that integrates with Lemmy (and, potentially, other Fediverse services). You could definitely use it as a form of curated link directory - having an external links section was definitely one of the uses it could be put to (as well as holding an instance’s documentation and a community’s FAQs, for example).