search

From IndieWeb

search in the IndieWeb usually refers to searching your personal site for your own content (and/or caches of content you’ve responded to), sometimes searching IndieWeb chat archives or the IndieWeb wiki, or other IndieWeb search services.

Why

Why should your site be searchable?

  • You want your independently created and owned content to be found, and preferably to appear above content on silos in search results.

Why should your site have a search feature?

  • The ability to easily search IndieWeb sites is a very commonly requested feature by readers / users of said sites.
  • They don't want to have to think to go to Google (and take extra steps) to search your site.
  • Make it easy for your friends who read your site to find stuff there by having a simple search box in the top right of every page (a common UI convention) that lets the user type something in and search your site. You can of course use a 3rd party search engine to do this, even returning results directly from it, e.g. by using a Google search box on your site.
  • Website visitors can find exactly what they are looking for without having to search through pages of feeds or archives.

Why Not

There are sometimes reasons you don't want a particular page or section of your site to be indexed for searching. E.g.

  • Private / private by obscurity URLs
  • Dynamic aggregations, e.g. tag aggregation pages, archives (by date etc.), because you'd rather that just the post permalinks themselves get indexed to reduce noise in search results.

How

How to implement search on your site.

searchability - level 1

Make sure your site is at least searchable (IndieMark search level 1). This means:

  • allow robots to index. Permissive or no robots.txt. Either don't have /robots.txt (easiest), or if you have one, it MUST allow search engines to index public posts on your site.
  • post content in HTML. Your post content MUST be in the visible HTML of the page retrieved from your post permalink. No depending on JavaScript to render your post content: if you can't curl it, it's not on the web.
  • site-specific searchability. Be able to use "site:yoursite.example.com search-term" in Google and other search engines (that support site-specific searches) directly to find and display your posts in search results.
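
The robots.txt requirement above can be checked offline. The following is a minimal sketch using Python's standard urllib.robotparser; the function name allows_indexing, the two sample robots.txt strings, and the post URL are assumptions made up for illustration, not part of IndieMark.

```python
from urllib.robotparser import RobotFileParser


def allows_indexing(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents, for illustration only.
PERMISSIVE = "User-agent: *\nDisallow:\n"   # empty Disallow permits everything
BLOCKING = "User-agent: *\nDisallow: /\n"   # blocks the whole site

# Hypothetical permalink on a placeholder domain.
POST_URL = "https://yoursite.example.com/2012/07/06/some-post"

if __name__ == "__main__":
    print(allows_indexing(PERMISSIVE, POST_URL))
    print(allows_indexing(BLOCKING, POST_URL))
```

An empty string stands in for a site with no /robots.txt at all, which this parser also treats as permissive.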

search box - level 2

Add a simple search box to your site using a static form that submits to a search engine to provide time ordered (most recent first) results! (IndieMark search level 2)

E.g.

<form class="search" action="http://www.google.com/search" method="get">
<input type="hidden" name="as_sitesearch" value="example.com"/>
<input type="hidden" name="tbs" value="sbd:1,cdr:1,cd_min:1/1/1970"/>
<input type="search" name="q"/>
<button type="submit">Search</button>
</form>

Change example.com to your personal site's domain. This HTML has been tested live since 2012-07-06.

Search form styling is left as an exercise for the creator.
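
To see what the form above sends, here is a rough Python sketch that builds an equivalent query URL from the same field names and values. The function name site_search_url is an assumption for illustration, and a browser may percent-encode some characters differently from urlencode while still sending the same parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Same action URL as the form above.
SEARCH_ACTION = "http://www.google.com/search"


def site_search_url(site: str, query: str) -> str:
    """Build the GET URL that the static search form would submit."""
    params = {
        "as_sitesearch": site,                  # hidden field: scope to one site
        "tbs": "sbd:1,cdr:1,cd_min:1/1/1970",   # hidden field: copied from the form
        "q": query,                             # the text the visitor typed
    }
    return SEARCH_ACTION + "?" + urlencode(params)


if __name__ == "__main__":
    url = site_search_url("example.com", "indieweb search")
    print(url)
    # Decode the query string again to confirm the parameters round-trip.
    print(parse_qs(urlsplit(url).query))
```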

site search with 3rd party backend - level 3

Search where your site uses a 3rd party search service (e.g. Google), but still provides the results on your own domain. (IndieMark search level 3)

How to TBD.

Third Party Search Services

site search with site backend - level 4

Search where your site handles all the indexing and search queries. (IndieMark search level 4)

How to TBD.
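
Until a full how-to is written, the following is one possible sketch of the idea, not a recommended implementation: an in-memory inverted index over posts, queried with AND semantics and returned most recent first. The Post fields, function names, and sample posts are all assumptions for illustration; a real site would more likely lean on its database's full text search.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    """Hypothetical minimal post record; frozen so it can live in a set."""
    url: str
    published: date
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def build_index(posts: list[Post]) -> dict[str, set[Post]]:
    """Map each word to the set of posts containing it."""
    index: dict[str, set[Post]] = {}
    for post in posts:
        for word in tokenize(post.text):
            index.setdefault(word, set()).add(post)
    return index


def search(index: dict[str, set[Post]], query: str) -> list[Post]:
    """Return posts containing every query word, most recent first."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches, key=lambda post: post.published, reverse=True)


if __name__ == "__main__":
    # Made-up sample posts on a placeholder domain.
    posts = [
        Post("https://example.com/1", date(2014, 1, 5), "Adding search to my site"),
        Post("https://example.com/2", date(2015, 3, 9), "Site search now uses a local index"),
        Post("https://example.com/3", date(2015, 6, 1), "Notes on webmention"),
    ]
    for post in search(build_index(posts), "site search"):
        print(post.published, post.url)
```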

Software

self-hacked engines

client side search - level 4 (alternative)

As an alternative to a backend search service, client side search uses JavaScript to perform search within the browser using a prebuilt index. This is particularly useful for static sites since it does not require significant server resources.
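
As a rough illustration of the "prebuilt index" part, a static site generator could emit a JSON file at build time that the browser later downloads and queries. The sketch below covers only that build step, in Python; the function name, the word-to-URLs index format, and the sample pages are assumptions, and the in-browser JavaScript lookup is not shown.

```python
import json
import re


def build_search_index(pages: dict[str, str]) -> str:
    """Return a JSON string mapping each word to the sorted URLs containing it.

    `pages` maps a page URL (or path) to its plain text content.
    """
    index: dict[str, list[str]] = {}
    # Iterating over sorted URLs keeps each word's URL list sorted too.
    for url, text in sorted(pages.items()):
        for word in set(re.findall(r"\w+", text.lower())):
            index.setdefault(word, []).append(url)
    return json.dumps(index, sort_keys=True)


if __name__ == "__main__":
    # Made-up sample pages, for illustration only.
    pages = {
        "/2015/search-notes": "Notes on client side search",
        "/2016/static-site": "Moving to a static site with search",
    }
    # A build script would write this string to a file served next to the pages.
    print(build_search_index(pages))
```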

Software

How To Avoid

For all the reasons above in Why Not, here's how to keep specific pages from being indexed:

Put this in the head of any page you don't want indexed:

<meta name="robots" content="noindex,follow" />

This instructs the crawler not to add the page to its index, but to follow the links it contains. If you don't want it to consider links either, change follow to nofollow.
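
To confirm that a page actually carries the tag, the directives can be read back out of its HTML. This is a minimal sketch using Python's standard html.parser; the class and function names are assumptions for illustration, and the sample page is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also reach this method.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
    print("noindex" in robots_directives(page))
```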

IndieWeb Examples

IndieWeb sites that have search interfaces.

Tantek

Tantek Çelik has had a search interface on his site tantek.com since 2012-07-06 which uses a simple static form that submits to Google search (IndieMark search level 2).

Aaron Parecki

From 2012-07 to 2016-01, Aaron Parecki had a search interface on his site aaronparecki.com which used a simple static form that submitted to Google search, scoped to the website and with query parameters telling Google to return posts in reverse date order (IndieMark search level 2).

Since 2016-08 there is a search interface which searches a local index of posts, returning the list of matching posts rendered in normal list format in reverse date order (IndieMark search level 4).

Since 2020-06 the local search has been expanded to include searching the contents of reposts and likes, as well as the authors of those posts. e.g. https://aaronparecki.com/search?q=tantek

Ben Werdmuller

Ben Werdmüller has had a search interface on his site werd.io since 2013-06-20 which uses his own site's backend, MongoDB in particular (IndieMark search level 4).

Barnaby Walters

Barnaby Walters added a simple static search form (based on Tantek’s code) to waterpigs.co.uk on 2014-02-24 which submits to a site-scoped Google search (IndieMark search level 2).

Also experimenting with a local search engine, built on Elasticsearch, which indexes the archive of all the pages I’ve linked to as well as mentions of my own pages.

UI as of 2014-03-01, showing authorship information, page name, excerpt, URL: 2014-03-01-indie-search-halsway.png

Dan Lyke

Dan Lyke has had locally hosted search since March of 2001, and currently has a simple search which uses his PostgreSQL back-end text indexes, does some ordering of search results based on phrases, and supports "+" and "-" to require and exclude terms (IndieMark search level 4).

Since I'm scanning various other sites for inbound links, I'd like to, at some point, index those other sites as well for additional search options.

Kyle Mahan

Kara Mahan previously had local search backed by Postgres full text matching (@@ operator), as of 2015-01-16. Posts were presented as a standard h-feed, but I'd wanted to style them more like "search results" (and have more results per page) in the future.

In 2016-01, I converted my site to Known, which uses MySQL full text search by default.

Christian Weiske