Google's bots process JavaScript pages differently than plain HTML pages. They handle them in three phases: crawling, rendering, and indexing. These phases are easy to picture with the graphic from Google Developers below:
The crawling phase is about the discoverability of your content. It's a complicated process with several subprocesses: seed sets, crawl queuing and scheduling, URL importance, and others.
To begin with, Google's bots queue pages for crawling and rendering. A parsing module fetches the pages, follows the links on them, and renders them until they can be indexed. The module doesn't just render pages; it also analyzes the source code and extracts the URLs from <a href="…"> snippets.
The bots check the robots.txt file to see whether crawling is allowed. If a URL is disallowed, the bots skip it. Therefore, it's critical to review your robots.txt file to avoid errors.
The process of displaying the content, templates, and other features of a site to the user is called rendering. There are two main approaches: server-side rendering and client-side rendering.
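As a hedged illustration (the paths are placeholders, not rules from this article), a robots.txt file can keep crawlers out of some sections while leaving the JS and CSS needed for rendering open:

```txt
# Hypothetical robots.txt: Googlebot skips anything under /private/,
# but the JS and CSS needed for rendering stay crawlable.
User-agent: Googlebot
Disallow: /private/
Allow: /assets/js/
Allow: /assets/css/
```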
Server-side rendering (SSR)
As the name suggests, in this type of rendering the pages are populated on the server. Each time the site is accessed, the page is rendered on the server and sent to the browser.
In other words, when a user or bot accesses the site, they receive the content as ready-made HTML markup. This usually helps SEO because Google doesn't have to render the JS separately to access the content. SSR is the traditional rendering method, but it can prove costly in terms of server resources and bandwidth.
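A minimal sketch of the idea, assuming a Node.js and Express stack (neither is prescribed by this article, and the product data is a made-up placeholder): the server assembles the full HTML, so a crawler gets finished markup on the first request.

```javascript
const express = require('express');
const app = express();

// Hypothetical in-memory data source, standing in for a database or API.
const products = {
  '1': { name: 'Red Running Shoes', description: 'Lightweight shoes for daily training.' }
};

app.get('/products/:id', (req, res) => {
  const product = products[req.params.id];
  if (!product) return res.status(404).send('Not found');
  // The full HTML is built on the server, so the browser (or bot) receives ready-made markup.
  res.send(`<!DOCTYPE html>
<html>
  <head><title>${product.name}</title></head>
  <body><h1>${product.name}</h1><p>${product.description}</p></body>
</html>`);
});

app.listen(3000);
```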
Coming back to what happens after a page has been crawled: the bots identify the pages that need to be rendered and add them to the render queue, unless a robots meta tag in the raw HTML tells Googlebot not to index the page.
A page may sit in the render queue for only a few seconds, but it can take considerably longer depending on the resources available.
Once the Web Rendering Service (WRS) fetches the data from external APIs and databases, Google's Caffeine indexer can index the content. This phase involves analyzing the URL, understanding the content of the pages and their relevance, and storing the discovered pages in the index.
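For reference, that tag looks like this in the raw HTML:

```html
<!-- If Googlebot finds this in the initial HTML, it skips rendering and indexing the page. -->
<meta name="robots" content="noindex">
```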
Be persistent with your on-page SEO efforts
All the on-page SEO rules that go into optimizing your pages to help them rank on search engines still apply. Optimize your title tags, meta descriptions, image alt attributes, and meta robots tags. Unique, descriptive titles and meta descriptions help users and search engines identify the content. Pay attention to search intent and the strategic placement of semantically related keywords.
It's also good to have an SEO-friendly URL structure. In some cases, websites implement a pushState change in the URL, which confuses Google when it tries to find the canonical one. Make sure you check your URLs for such issues.
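A simple illustration of these on-page elements (all values are placeholders):

```html
<head>
  <!-- Unique, descriptive title and meta description (placeholder values) -->
  <title>Handmade Leather Wallets | Example Store</title>
  <meta name="description" content="Browse handmade leather wallets with free shipping.">
  <!-- Meta robots tag: allow indexing and link following -->
  <meta name="robots" content="index, follow">
</head>

<!-- Descriptive alt attribute on an image -->
<img src="/img/wallet-brown.jpg" alt="Brown handmade leather wallet">
```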
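If your app does change the URL with pushState, one way to keep the signal unambiguous is to pair it with a canonical tag, roughly like this sketch (the URLs and paths are made up for illustration):

```html
<!-- Placeholder canonical tag telling Google the preferred URL for this page -->
<link rel="canonical" href="https://www.example.com/shoes/red-sneakers">

<script>
  // If the app updates the address bar with pushState, point it at a real,
  // crawlable path that matches the canonical URL (illustrative values only).
  history.pushState({}, '', '/shoes/red-sneakers');
</script>
```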
If your content can be seen in the DOM, chances are it is being parsed by Google. Checking the DOM will help you determine whether your pages are being accessed by search engine bots.
Bots skip rendering and JS execution if the meta robots tag in the initial HTML contains noindex. Keep in mind that Googlebot doesn't fire events on a page: it doesn't click or scroll. So if content is added with JS, it should appear as soon as the page has loaded. Content that only enters the HTML after a button click, on scroll, and so on won't be indexed.
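One quick, informal way to check is to copy the rendered DOM out of Chrome DevTools and search it for your content:

```javascript
// Run in the browser console: copies the rendered DOM to the clipboard.
// copy() is a DevTools console utility, not standard JavaScript.
copy(document.documentElement.outerHTML);
```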
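As a hedged illustration (the endpoint and element IDs are made up), the difference looks like this:

```javascript
// Content fetched when the page loads: Googlebot can see it after rendering.
window.addEventListener('load', async () => {
  const res = await fetch('/api/reviews'); // hypothetical endpoint
  document.querySelector('#reviews').innerHTML = await res.text();
});

// Content fetched only after a click: Googlebot doesn't click, so this is never indexed.
document.querySelector('#load-more').addEventListener('click', async () => {
  const res = await fetch('/api/reviews?page=2'); // hypothetical endpoint
  document.querySelector('#reviews').insertAdjacentHTML('beforeend', await res.text());
});
```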
Avoid blocking search engines from accessing JS content
To work around Google not being able to find JS content, some webmasters resort to cloaking, serving one version of the content to users and a different one to crawlers. However, this is a violation of Google's Webmaster Guidelines and you could be penalized for it. Instead, identify the underlying issues and make the JS content accessible to search engines.
At times, resources your site depends on may be unintentionally blocked, barring Google from seeing the JS content. For instance, if your site has several subdomains that serve different purposes, each one has its own robots.txt, because subdomains are treated as separate websites. In that case, make sure none of those robots.txt directives block search engines from the resources needed for rendering.
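For example, if static assets live on a separate subdomain (the names below are hypothetical), its robots.txt must not block the files Googlebot needs for rendering:

```txt
# robots.txt served from https://assets.example.com/ (hypothetical subdomain)
# Blocking /js/ or /css/ here would stop Googlebot from rendering pages that use these files.
User-agent: *
Allow: /js/
Allow: /css/
Disallow: /internal-reports/
```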
Use relevant HTTP status codes
Google’s crawlers use HTTP status codes to identify issues when crawling a page. Therefore, you should use a meaningful status code to inform the bots if a page shouldn’t be crawled or indexed. For instance, you could use a 301 HTTP status to tell the bots that a page has moved to a new URL, allowing Google to update its index accordingly.
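A minimal sketch of how that might look on an Express server (the framework and the routes are assumptions, not something this article prescribes):

```javascript
const express = require('express');
const app = express();

// Hypothetical route: the page moved permanently, so answer with a 301.
app.get('/old-pricing', (req, res) => {
  res.redirect(301, '/pricing'); // Google updates its index to the new URL
});

// A removed page should return 404 so it can be dropped from the index.
app.get('/discontinued-product', (req, res) => {
  res.status(404).send('Not found');
});

app.listen(3000);
```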
Refer to this list of HTTP status codes and know when to use them:
Fix duplicate content
Fix lazy-loaded content and images
What's more, image search is also a source of additional organic traffic. So if your images are lazy-loaded in a way that only triggers on scroll, search engines may never pick them up. Lazy loading is great for users, but it needs to be implemented with care so bots don't miss potentially critical content.
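One crawler-friendly approach is the browser's native lazy-loading attribute, which keeps the image URL in the markup instead of behind a scroll event (the file path and dimensions are placeholders):

```html
<!-- Native lazy loading: the browser defers the download, but the <img> tag and its
     src stay in the HTML, so crawlers can still discover the image. -->
<img src="/img/product-photo.jpg" alt="Blue ceramic mug" loading="lazy" width="600" height="400">
```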
Use JS SEO tools
- URL Inspection tool. This tool is found in Google Search Console. It shows you whether Google's crawlers were able to crawl and index your pages.
- Search engine crawlers. These tools allow you to effectively test and monitor how search engines crawl your pages.
- PageSpeed Insights. Google's PageSpeed Insights shares details about your site's performance and offers recommendations on how it can be improved.
- The site: command. This helps you see whether Google has properly indexed your content. All you need to do is enter this command on Google – site:[website URL] "text snippet or query"
2. Use of hash in the URLs
Remember what John Mueller said about bad URLs at an SEO event?
“For us, if we see kind of the hash there, then that means the rest there is probably irrelevant. For the most part, we will drop that when we try to index the content…”
Yet several JS-based sites generate URLs with a hash. This can be disastrous for your SEO. Make sure your URL is Google-friendly. It should definitely not look like this:
www.example.com/#/about-us
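If your router currently produces hash URLs like the one above, a rough sketch of the alternative using the History API looks like this (the data-route convention and renderRoute function are hypothetical placeholders):

```javascript
// Instead of https://www.example.com/#/about-us, keep a clean, indexable path.
document.querySelectorAll('a[data-route]').forEach((link) => {
  link.addEventListener('click', (event) => {
    event.preventDefault();
    history.pushState({}, '', link.getAttribute('href')); // e.g. /about-us
    renderRoute(location.pathname); // hypothetical client-side render function
  });
});

// Keep the back/forward buttons working.
window.addEventListener('popstate', () => renderRoute(location.pathname));
```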
3. Not checking the internal link structure
Google requires proper <a href> links to find URLs on your site. And if links are only added to the DOM after a button click, the bots will fail to see them. Many webmasters miss these points, and their SEO suffers as a result.
Take care to provide traditional 'href' links so they are reachable for the bots. Check your links with a website audit tool such as SEOprofiler to improve your site's internal link structure.
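A quick side-by-side of markup the bots can and cannot follow (the URLs are placeholders):

```html
<!-- Crawlable: a real anchor with an href that Googlebot can extract -->
<a href="/blog/javascript-seo">JavaScript SEO guide</a>

<!-- Not crawlable: no href, the URL only exists inside a click handler -->
<span onclick="window.location = '/blog/javascript-seo'">JavaScript SEO guide</span>
```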
Check out this presentation by Tom Greenway during the Google I/O conference for guidance on a proper link structure: