Important Update (May 25, 2014): Google has started parsing and indexing Javascript. The approach of this article is to use
<noscript>
tags but Google will likely ignore those now. We upgraded our site to sniff Google and other popular
search engines and serve our simple content that way. However, in the future it might not be a concern as Google
plan on having Javascript sites just work!
It’s a myth that if you use a client side MVC framework that your application’s content cannot be indexed by search engines. In fact, Discourse forums were indexable by Google the day we launched.
Search engine visibility does, however, require a little more work to implement. This is a real trade off you’ll have to consider before you decide to go with an MVC framework instead of an application that does its rendering on the server side.
Before you get scared off: I’d like to point out that our search engine code was done by Sam Saffron in a day! This extra work might take you less time than you thought.
Getting Started: Pretty URLs
Out of the box, most client side MVC frameworks will default to hash-based URLs that take advantage of
the fact that characters in a URL after an #
are not passed through to the server. Once the Javascript
application boots up it looks at hash data and figures out what it has to do.
Modern browsers have a better alternative to hash-based URLs: The HTML5 History API. The History API
allows your Javascript code to modify the URL without reloading the entire page. Instead of URLs
like http://yoursite.com/#/users/eviltrout
you can support http://yoursite.com/users/eviltrout
.
There are two downsides to using the History API. The first is Internet Explorer only started supporting it IE10. If you have to support IE9, you’ll want to stick with hashes. (Note: Discourse actually works on IE9, but the URL does not update as the user goes around. We’ve accepted this trade off.)
The second downside is that you have to modify your server to serve up your Javascript application regardless of what URL is requested. You need to do this because if you change the browser URL and the user refreshes their browser the server will look for a document at that path that doesn’t exist.
Serving Content
The second downside I mentioned actually has an nice upside to it. Even if you are serving up the same Javascript code regardless of URL, there is still an opportunity for the server to do some custom work.
The trick is to serve up two things in one document: your Javascript application
and the basic markup for search engines in a <noscript>
tag. If you’re unfamiliar with a the <noscript>
tag, it’s designed for rendering versions of a resource to a clients like search engines that don’t support
Javascript.
This is really easy to do in Ruby on Rails (and probably other frameworks that I’m less familiar with!).
Your application.html.erb
can look like this:
<html>
<body>
<section id='main'></section>
<noscript>
<%= yield %>
</noscript>
</body>
... load your Javascript code here into #main
</html>
With this approach, if any server side route renders a simple HTML document, it will end up in the <noscript>
tag for indexing. I wouldn’t spend much time on what the HTML looks like. It’s meant to
be read by a robot! Just use very basic HTML. To preview what a search engine will see, you can turn
off Javascript support in your browser and hit refresh.
We’ve found it advantageous to use the same URLs for our JSON API as for our routes in the Javascript application. If a URL is requested via XHR or otherwise specifies the JSON content type, it will receive JSON back.
In Rails, you can reuse the same logic for finding your objects, and then
choose the JSON or HTML rendering path in the end. Here’s a simplified version of our
user#show
route:
def show
@user = fetch_user_from_params
respond_to do |format|
format.html do
# doing nothing here renders show.html.erb with the basic user HTML in <noscript>
end
format.json do
render_json_dump(UserSerializer.new(@user))
end
end
end
Note that you don’t have to implement HTML views for all your routes, just the ones that
you want to index. The others will just render nothing into <noscript>
.
One More Thing
If you get an HTML request for a URL that also responds with JSON, there is a good chance your application is going to make a call to the same API endpoint after it loads to retrieve the data in JSON so it can be rendered.
You can avoid this unnecessary round trip by rendering the JSON result into a variable
in a <script>
tag. Then, when your Javascript application looks for your JSON, have it check to see
if it exists in the document already. If it’s there, use it instead of making the extra
request.
This approach is much faster for initial loads! If you’re interested in how it’s implemented in Discourse, check out:
- preload_store.js - a simple interface to load data that’s been set in
PreloadStore
. - application.html.erb how we set data in
PreloadStore
.