Majestic SEO
- What is Majestic SEO?
- Majestic SEO is a commercial offshoot of Majestic-12 Ltd, a UK-based company founded in 2004.
- Where is the data coming from?
- Our data comes from the World Wide Web itself. The Majestic-12: Distributed Search Engine does not meta-search or otherwise query other search engines: we are the search engine! Over a long period of time we have developed software capable of crawling and indexing large amounts of web data. This index is a big stepping stone towards relevant full-text search. The purpose of the index is to allow relevancy research and to help fund continued development of a competitive, community-driven, general-purpose, web-scale search engine.
- Do you include data from Google / Yahoo / Bing?
- We do incorporate third-party data to enhance our services, such as GeoIP information from MaxMind and traffic information from Alexa. However, common techniques for obtaining search engine data, such as scraping, are often classed as a breach of the search engines' terms of service, so this data is unlikely to find its way into Majestic's database.
- What is the best way of getting in touch?
- Our contact form is the best way of getting a fast, effective response.
- Are you planning to offer SEO services to end users?
- Majestic SEO does not offer SEO services. Majestic SEO offers link intelligence and domain metrics to inform and empower Internet marketing professionals and search engine optimisers.
The 'Fresh Index' and 'Historic Index'
- What are the 'Fresh' and 'Historic' Indexes? Why are they different?
- Our 'Historic Index' and 'Fresh Index' are huge databases which contain information about how sites on the Internet link together - effectively large repositories of Internet mapping data. Our tools then access these databases in order to generate reports which help people understand the flow of organic traffic to their own websites, and those of their competitors.
- Our Historic Index is one of the most comprehensive sources of backlink data on the Internet. As such, despite our huge processing power, the index takes many days to build, which means it is always a few days behind what is happening now. To address this, we have created a smaller, separate index which we call the 'Fresh Index'. As this index is smaller, we can update it more often, and thus it can be kept more up-to-date.
- The two datasets, the 'Fresh Index' and the 'Historic Index', are separate. Merging them seamlessly from a user perspective would take substantial effort, and would have delayed the release of the Fresh Index by many months. Data from the Fresh Index is fed into the Historic Index when the Historic Index is updated, so older backlinks in the Fresh Index may also be present in the Historic Index.
- How big are the 'Historic' and 'Fresh' indexes?
- The Historic Index has 3,658,511,547,386 URLs from 360,824,476,574 crawled pages.
- The Fresh Index has 109,848,066,152 URLs from 18,860,432,199 crawled pages.
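To put these figures in proportion, the short Python sketch below works out how many URLs each index holds per crawled page, and how large the Fresh Index crawl is relative to the Historic one. It assumes that "URLs" means the URLs discovered on the crawled pages; the variable and function names are illustrative only.

```python
# Figures quoted in the two answers above.
historic_urls = 3_658_511_547_386
historic_pages = 360_824_476_574
fresh_urls = 109_848_066_152
fresh_pages = 18_860_432_199


def urls_per_page(urls: int, pages: int) -> float:
    """Average number of known URLs per crawled page."""
    return urls / pages


historic_ratio = urls_per_page(historic_urls, historic_pages)
fresh_ratio = urls_per_page(fresh_urls, fresh_pages)

# Size of the Fresh Index crawl as a fraction of the Historic Index crawl.
fresh_share = fresh_pages / historic_pages

print(f"Historic Index: {historic_ratio:.2f} URLs per crawled page")
print(f"Fresh Index:    {fresh_ratio:.2f} URLs per crawled page")
print(f"Fresh crawl is {fresh_share:.1%} of the Historic crawl")
```

On these numbers the Historic Index knows of about 10 URLs for every page it has crawled, the Fresh Index just under 6, and the Fresh Index crawl is roughly 5% of the size of the Historic one.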
- Why is my site found in the Fresh Index but not in the Historic Index (or vice versa)?
- Investigating the structure of the Internet involves running automated processes to download and study every webpage on the Internet. All web crawlers have limits on how much of the Internet they can crawl in any given period, so crawlers prioritise the most important sites and visit them more often. This is not a perfect process, as there are many factors which affect the viability and structure of a web crawl. It is possible that the time between crawls of certain sites will be greater than the period that the Fresh Index covers - thus the site may exist, and be present in the Historic Index, but not be present in the Fresh Index. As the Fresh Index contains more recent data, it may also contain sites that were discovered by our crawlers after we started building the most recent Historic Index. If a site has very few backlinks, it can take some time for it to be detected by our web crawlers, so sometimes even older sites may be present in our Fresh Index but not our Historic Index.
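The reasoning above can be reduced to two date comparisons. The Python sketch below is a simplified model, not a description of how the indexes are actually built: the function name, its parameters and all of the dates are hypothetical, and it assumes a site appears in the Fresh Index if it was crawled inside the Fresh Index window, and in the Historic Index if it was discovered before the latest Historic Index build began.

```python
from datetime import date


def index_membership(first_discovered: date, last_crawled: date,
                     fresh_window_start: date,
                     historic_build_start: date) -> dict:
    """Simplified model of which index a site would appear in."""
    return {
        # Crawled recently enough to fall inside the Fresh Index window.
        "fresh": last_crawled >= fresh_window_start,
        # Known to the crawler before the latest Historic build started.
        "historic": first_discovered < historic_build_start,
    }


# Hypothetical dates, chosen only to illustrate the three cases.
FRESH_WINDOW_START = date(2012, 1, 1)
HISTORIC_BUILD_START = date(2011, 12, 15)

# Known for a long time, but not recrawled within the Fresh window.
rarely_crawled = index_membership(date(2010, 3, 1), date(2011, 11, 20),
                                  FRESH_WINDOW_START, HISTORIC_BUILD_START)

# First discovered after the latest Historic build began.
newly_found = index_membership(date(2012, 1, 5), date(2012, 1, 10),
                               FRESH_WINDOW_START, HISTORIC_BUILD_START)

# Well-linked site that is crawled often.
well_linked = index_membership(date(2009, 6, 1), date(2012, 1, 12),
                               FRESH_WINDOW_START, HISTORIC_BUILD_START)

print(rarely_crawled, newly_found, well_linked)
```

In this model the rarely crawled site appears only in the Historic Index, the newly found site only in the Fresh Index, and the well-linked site in both.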
- What is your index update strategy?
- We regularly update both of our indexes. However, due to the massive scale of the Historic Index, it takes several days of constant processing to generate it. At the time of writing, we are able to update our Historic Index monthly, and aim to update the Fresh Index daily. Historic Index updates are publicised on the blog.
- What is the index built date?
- This is the date when the latest version of our index was built - it then takes around 12 hours to release the index. In this time we have to transport large amounts of data to our front end, so there may be a short period after this date when the update is not reflected on the website.
- What is the most recently discovered backlink date?
- It is the date of the most recently crawled backlink in our index. This is an indicator of the freshness of the data on our site.
- What technology is used to build the indexes?
- We developed our own proprietary software on the C#/.NET platform. It uses highly parallel methods that spread processing across multiple cores and machines.
- I can't find my site in your index, what's wrong?
- Nothing, if your site is very new or not well linked. Our index is based on the links that we have followed. It should be fairly rare for a site to be missing, as our index is pretty comprehensive. If your site is not in the Fresh Index, it is worth checking for it in the Historic Index, and vice versa!
Reports and Tools
- How can I see more detailed information for my site?
- By registering with us you can add your own websites to your profile to see detailed analysis of backlinks and anchor text. This information is prioritised to show the most important backlinks first, rather than showing a random sample (this helps give prominence to the more important backlinks). The catch is that, in exchange for free basic information about anchor text and backlinks for your site, you will have to allow our bot (MJ12bot) to crawl your site for at least 12 months after your last use of the reports. In the vast majority of cases (95%+) you already do that, so in this case there is no catch really: only those who have specifically disallowed our bot in their robots.txt, or who allow only a handful of other bots to crawl and disallow all the rest, will need to decide whether they want to allow us to crawl them in exchange for the free information that they can get. We hope your decision will be positive!
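To check whether a robots.txt file already admits MJ12bot, the rules can be tested locally. The sketch below uses Python's standard urllib.robotparser module. The two robots.txt bodies, the helper name bot_allowed and the example.com URL are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser


def bot_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical file that admits one named bot and blocks all the rest.
BLOCKS_MJ12BOT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

# The same file with MJ12bot added to the group of admitted bots.
ALLOWS_MJ12BOT = """\
User-agent: Googlebot
User-agent: MJ12bot
Disallow:

User-agent: *
Disallow: /
"""

PAGE = "http://example.com/page"  # placeholder URL

print(bot_allowed(BLOCKS_MJ12BOT, "MJ12bot", PAGE))
print(bot_allowed(ALLOWS_MJ12BOT, "MJ12bot", PAGE))
```

The first file is the "handful of other bots" case: MJ12bot falls through to the catch-all group and is refused. Naming MJ12bot in an allowing group, as in the second file, is enough to admit it.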
- What new features and tools are you developing?
- Lots of very cool features. For example, determining authority sites, types of backlinks, detecting link exchanges and more: you are basically witnessing the kind of secret relevancy tool that any large search engine keeps under wraps, away from prying eyes. Here you will have the opportunity to see things that some people would not want you to know: elsewhere you are left guessing, using clues that might have been designed to lead you away from the truth. We are making a big effort to get to the truth about how relevancy algorithms work in the best search engines out there, so that we can build one ourselves.
- What is the effect of backlinks and anchor text on relevancy?
- When the number of matches is high, backlinks and anchor text begin to play a very important role. The reason for this is simple: search engines can only show the top 10 entries, but when the number of matches is in the range of tens or even hundreds of millions, you will get lots of matches with nearly the same full-text matching scores, so you have to use something else to select the "best" of them. Just how important is it? That's exactly why we created the anchor index: to be able to find good answers to this question!
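The tie-breaking idea can be illustrated with a toy ranking function. This is only a sketch of the general principle, not the algorithm used by Majestic or by any search engine: the Match type, the rank function, the rounding precision and the sample data are all invented for the example.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    url: str
    text_score: float  # hypothetical full-text matching score
    backlinks: int     # hypothetical number of backlinks


def rank(matches: List[Match], top_n: int = 10,
         precision: int = 2) -> List[Match]:
    """Order by text score; break near-ties by backlink count.

    Scores are rounded to `precision` decimals so that nearly equal
    scores count as a tie, which backlinks then resolve.
    """
    return sorted(
        matches,
        key=lambda m: (-round(m.text_score, precision), -m.backlinks),
    )[:top_n]


matches = [
    Match("a", 0.901, 12),
    Match("b", 0.899, 5400),
    Match("c", 0.700, 99999),
]

for match in rank(matches):
    print(match.url, match.text_score, match.backlinks)
```

Here "a" and "b" have nearly identical text scores, so the much better linked "b" is placed first. "c" has the most backlinks of all, but its clearly weaker text score keeps it last.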
- Why can't I search using keywords?
- There is no full-text index, HTML cache or anything similar in the anchor index. The anchor index is a collection of factual information about linking on the web. There are two big reasons for this: first, we need to understand relevancy algorithms much better, and secondly, we will need a lot more hardware to handle a full-text index of ~50 billion pages, so we have to do it in steps. The first stage is this anchor index, which we hope will help us move towards our ultimate goal.
- Where are the link: and linkdomain: commands?
- They are deliberately not implemented, because we don't want to fight with large-scale automated queries trying to get these backlinks. If you want to see backlinks and anchor text then you need to register and then verify your domain: this is when you will get a set of reports showing backlinks, anchor text, links from bad neighbourhoods and other goodies.
- What happens to my (Standard/Advanced) Reports if I cancel my subscription?
- The hosting and provision of access to generated reports is a service included in the subscription plan. Once the final allowance period on a subscription plan expires, any reports generated during the subscription will be locked, and after a period these reports will be removed and purged from our servers. It is therefore critical that you download all reports before the expiration of any subscription you may have.
Note
Whilst we hope to provide the best possible user experience to our userbase, this FAQ may contain errors and omissions. This page is intended to provide an overview of some of the techniques which may be used in the delivery of data from Majestic SEO - but please note that the provision of any service may be subject to more frequent review than this text. Use of our services is subject to the acceptance of our Terms and Conditions.