SEO for AngularJS on S3
You would not have imagined that merely migrating an AngularJS SPA onto AWS S3 or any other static file host could be so troublesome. Suddenly, you are faced with issues that cause you tremendous headaches.
Updated as of 25 November 2015.
Here are the 2 main issues:
- Deep-linked URLs no longer work if you enable HTML5 mode in your SPA. In my previous article, I provided a quick hack for enabling pretty HTML5 mode URLs in AngularJS on AWS S3 so that deep-linked URLs function properly.
- The quick hack mentioned above is apparently not SEO friendly at all, so it cannot be a long-term solution. Google can detect the 404 error that AWS S3 returns when it first fails to find the file. It then stops crawling the URL, because a 404 error signals that the URL is no longer in use. This has serious repercussions for your site's SEO.
As such, let me explain how we can enable proper SEO (search engine optimization) for AngularJS on AWS S3 as a long-term strategy, even though it requires a lot more effort.
As a disclaimer, the solution below may not work for certain cases due to different or incompatible coding practices or architectures.
In a bid for a more automated way to make my AngularJS SPA on AWS S3 SEO friendly, I created grunt-prerender, a Grunt tool that generates pre-rendered HTML snapshots for static file serving.
The idea behind the solution is actually very simple. It closely resembles the way Jekyll automatically generates site pages from blog posts.
Let's say you have a URL http://www.mysite.com/faq/ that you want to be SEO-friendly. Upon a GET request, a file server will by default look for an index file, e.g. index.html, in the faq directory. However, the index.html file will not exist in the faq directory, because the file that should rightly be served is the index.html in the root directory.
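To make the lookup concrete, here is a minimal JavaScript sketch of how a typical static file host resolves a request path to a stored file when an index document such as index.html is configured. The helper name resolveStaticKey is hypothetical, and this is not AWS S3's actual implementation. It only illustrates why a request for /faq/ becomes a lookup for faq/index.html, which does not exist when only the SPA shell has been uploaded.

```javascript
// Minimal sketch (assumption): how a typical static host maps a request path
// to a stored file key when an index document is configured.
// resolveStaticKey is a hypothetical helper, not an AWS S3 API.
function resolveStaticKey(requestPath, indexDocument = "index.html") {
  // Drop leading slashes: stored keys are relative to the bucket/site root.
  const key = requestPath.replace(/^\/+/, "");
  // The bare root or a trailing slash means "serve this directory's index file".
  if (key === "" || key.endsWith("/")) {
    return key + indexDocument;
  }
  return key;
}

// With only the SPA shell uploaded, the deep link has nothing to serve.
const storedFiles = new Set(["index.html"]);
for (const path of ["/", "/faq/"]) {
  const key = resolveStaticKey(path);
  console.log(path, "->", key, storedFiles.has(key) ? "found" : "404");
}
// prints:
// / -> index.html found
// /faq/ -> faq/index.html 404
```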
Hence, a natural solution would be to place a pre-generated HTML snapshot of the URL http://www.mysite.com/faq/ as index.html in the faq directory. This pre-generated snapshot at http://www.mysite.com/faq/ differs from the original index.html in the root directory in one key way. The snapshot already has the content of http://www.mysite.com/faq/ populated inside it, while the original is a plain index.html SPA template.
This solves the SEO problem for content marketing. If the search engine can only find the original index.html SPA template, whatever content is in it is more or less all of the content that will be crawled for your website. That amounts to almost no SEO.
Using the grunt-prerender tool, you can generate static HTML snapshots for all the URLs that you want to be search-optimized, and then upload them to AWS S3 or any other static file host. For a URL such as http://www.mysite.com/faq/, the tool takes an HTML snapshot of the hashed version of the URL, http://www.mysite.com/#/faq/, and then saves it to disk at the right path.
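The mapping from a pretty URL to the hashed URL that gets rendered and the file path where the snapshot gets written can be sketched as follows. planSnapshot is a hypothetical helper for illustration only. It is not grunt-prerender's actual code or configuration. The sketch assumes the SPA exposes every route under "#/" and ignores query strings.

```javascript
// Minimal sketch (assumption): for one pretty URL, work out the "#/" URL the
// SPA can render from the root index.html, and the relative path to save the
// resulting HTML snapshot to. planSnapshot is hypothetical, not grunt-prerender.
function planSnapshot(prettyUrl) {
  const url = new URL(prettyUrl);
  const route = url.pathname; // e.g. "/faq/"
  // Hashed equivalent that works without any server-side routing.
  const hashedUrl = `${url.origin}/#${route}`;
  // Save as <route>/index.html so a static host serves it for the pretty URL.
  const dir = route.replace(/^\/+/, "");
  const outputPath =
    (dir === "" || dir.endsWith("/") ? dir : dir + "/") + "index.html";
  return { hashedUrl, outputPath };
}

const plan = planSnapshot("http://www.mysite.com/faq/");
console.log(plan.hashedUrl); // http://www.mysite.com/#/faq/
console.log(plan.outputPath); // faq/index.html
```

Note that the root URL maps to index.html itself, so a real workflow would need to decide whether the root snapshot should replace the SPA shell.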
You could simply install grunt-prerender in your project with:
npm install grunt-prerender --save-dev
Here are snippets of my own production Grunt workflow to automate this prerendering. I simply use the grunt-prerender tool to generate all my HTML snapshots, then upload them back to my production S3 bucket using the grunt-aws-s3 tool. If you are already thinking about SEO for your website, you have most likely generated a sitemap.xml for your site. Alternatively, you can give the tool a list of URLs instead.
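If you take the sitemap.xml route, the list of URLs to prerender can be extracted with very little code. This is a minimal sketch that assumes a simple, well-formed sitemap in the sitemaps.org format. urlsFromSitemap is a hypothetical helper and is not part of grunt-prerender. A regular expression is adequate here, but a proper XML parser would be safer for sitemap index files or unusual markup.

```javascript
// Minimal sketch (assumption): extract <loc> URLs from a simple sitemap.xml
// string. urlsFromSitemap is hypothetical, not a grunt-prerender function.
function urlsFromSitemap(xml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match;
  while ((match = locPattern.exec(xml)) !== null) {
    urls.push(decodeXmlEntities(match[1]));
  }
  return urls;
}

// Sitemaps escape characters such as "&" as XML entities; undo the common
// ones, decoding &amp; last so "&amp;lt;" is not double-decoded.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.mysite.com/</loc></url>
  <url><loc>http://www.mysite.com/faq/</loc></url>
</urlset>`;

console.log(urlsFromSitemap(sample));
```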
As a caveat, there are a few pointers you might want to take note of before using this solution:
- It is good practice to use trailing slashes for URLs (http://www.mysite.com/faq/ rather than http://www.mysite.com/faq, which is not so good).
- Be careful with (jQuery) plugins that do not integrate nicely with AngularJS or do not update through two-way binding, as they may not work properly in the prerendered HTML snapshot.
- All URLs in your application should preferably be absolute URLs, but if you do have relative URLs, try adding a
- You should probably clean up your generated HTML snapshots with tools such as grunt-replace or grunt-dom-munger so that you end up with clean HTML templates. Things you may want to remove include dynamically generated script tags and CSS styles. Also remove AngularJS directives inside dynamically generated template content, because the templates will be loaded again in the browser and those directives would otherwise run more than once.
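For the trailing-slash pointer above, URLs coming from a sitemap or a hand-written list can be normalized before prerendering. This is a minimal sketch. withTrailingSlash is a hypothetical helper. It assumes that a final path segment containing a "." is a file such as app.js and should be left alone, so a route whose last segment contains a dot would be missed by this heuristic.

```javascript
// Minimal sketch (assumption): normalize route URLs to the trailing-slash form,
// leaving file-like paths (last segment containing ".") untouched.
// withTrailingSlash is a hypothetical helper, not part of any tool above.
function withTrailingSlash(rawUrl) {
  const url = new URL(rawUrl);
  const lastSegment = url.pathname.split("/").pop();
  // "" means the path already ends in "/"; a "." suggests a static file.
  if (lastSegment !== "" && !lastSegment.includes(".")) {
    url.pathname += "/";
  }
  return url.toString(); // query strings and fragments are preserved
}

console.log(withTrailingSlash("http://www.mysite.com/faq")); // http://www.mysite.com/faq/
```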
In any case, you may also create your own staging Grunt workflow that incorporates grunt-prerender.
If you already have static pages that you want to be crawlable, you may even include this tool in your pre-production workflow.
Last but not least, this tool is currently only available as a Grunt tool and is not yet optimized for other uses. It would be better used as part of a cron task that automates the prerendering.