It’s official: three days ago Google announced that Penguin has been integrated into its core algorithm.
What a lot of people may be asking at this point is: What does this mean? What are the implications of Penguin going real-time? That’s what I’m going to dive into in this article.
Brief History of Penguin
Google’s search algorithms weigh a large number of signals in order to rank the web content most likely to match user intent. These are broadly divided into “on-page” signals (those that come from the website being ranked itself) and “off-page” signals (those that come from the rest of the web, “outside” that website). Backlinks are the best-known off-page signal.
SEOs have known this for a long time, and for many years some SEOs have specifically tried to game or trick Google’s algorithm into believing a website is high quality by building particular kinds of links to the website. This is backlink spam.
Because Google has a world-class spam fighting team, Google became aware of this behaviour and created an algorithm specifically to counteract these efforts, called “Penguin”.
Initially, when Google created Penguin (and similar spam-combating algorithms), it did not integrate them into the core algorithm. This means that Penguin ran “refreshes” at intervals: it would gather data and feed it back into the core algorithm, but only once in a while. Between refreshes, Penguin was not actively running. To give an idea, it has now been a full year and nine months since the last Penguin refresh, which is long by Penguin’s standards (previous gaps were more like 6-12 months).
Penguin going real-time
Now that Penguin has matured somewhat, Google has integrated it into the core algorithm, which is what it does with these kinds of “external” modules that are bolted onto the algorithm once they have matured. This means that the signals Penguin considers flow into the other signals much more seamlessly.
Now that Penguin is real-time, the signals it considers are refreshed “continuously” along with the rest of the signals Google processes. To be nuanced about this, we have reason to believe that some signals are processed in a staggered way (I got confirmation of this from a Googler in the past), but in general terms it means Penguin’s data is processed like the rest of the algorithm’s.
Now let’s turn from those basics to a bit more interesting information: what are the implications of this?
For each claim I will give a “speculation score” out of 10. 10/10 means this is highly speculative. 1/10 means this is fairly certain.
Major Implication 1: Penguin is now considered “mature” by Google (5/10)
Google would not integrate Penguin into the core algorithm if it were still undergoing frequent wholesale changes. This means that Google probably considers Penguin to be quite “mature”. (I have confirmed this in conversation with a Googler friend.) This alone has several implications:
Implication 1.1: If there is spam that Penguin doesn’t currently identify, it might work indefinitely (6/10)
If Penguin is considered mature, it is unlikely to have highly sophisticated changes made to it in the future. I think that spam is becoming increasingly indistinguishable from ordinary internet content. (This is the natural cat and mouse game, a kind of “natural selection”, like bacteria that become antibiotic resistant.) This could mean we are reaching a point where the most sophisticated spam can only be effectively targeted by manual actions from Google’s side.
Implication 1.2: Google may not focus as many resources on Penguin going forward (7/10)
If Penguin has indeed matured, it follows that Google potentially won’t make as many changes to it, or focus as many resources on it, in future.
Major Implication 2: Spam that Penguin can identify is punished “immediately” (1/10)
Because Penguin updates its signals in real time, it will catch spam that it is able to catch and discount it immediately. This also has some implications:
Implication 2.1: Some (link) spam won’t work in the first place; “churn and burn” should be dead (1/10)
Historically, people could build spam that “would have” been caught by Penguin, yet it survived and had a real impact on ranking, because the long gaps between data refreshes meant Penguin didn’t have the agility to catch it in time. This allowed a kind of “churn and burn” approach to link spam: build spammy networks that positively affect clients’ rankings, wait for a Penguin refresh to burn the network, then start again with exactly the same kind of vulnerable spam. Now this kind of spam won’t have time to get off the ground. “Churn and burn” is no longer a viable spamming strategy (for spam that Penguin is able to catch, that is).
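To make the timing argument concrete, here is a toy sketch in Python. It is not a model of how Google actually works: the function name, the day numbers and the idea of treating “real-time” as one refresh per day are all assumptions made for illustration. It only shows how the window in which catchable spam keeps working shrinks when refreshes become continuous.

```python
from typing import List, Optional


def days_spam_is_effective(built_on_day: int, refresh_days: List[int]) -> Optional[int]:
    """Days a catchable spam link keeps helping before the next refresh.

    Hypothetical helper: assumes the first refresh on or after the day the
    link was built catches it. Returns None if no such refresh exists.
    """
    for refresh_day in sorted(refresh_days):
        if refresh_day >= built_on_day:
            return refresh_day - built_on_day
    return None


# Assumed periodic schedule: two refreshes roughly 21 months (640 days) apart.
periodic_refreshes = [0, 640]

# Assumed stand-in for "real-time": a refresh every single day.
real_time_refreshes = list(range(0, 641))

# A spam network built on day 30 in each world.
window_periodic = days_spam_is_effective(30, periodic_refreshes)
window_real_time = days_spam_is_effective(30, real_time_refreshes)

print(window_periodic, window_real_time)
```

Under these assumptions the periodic world gives the network 610 days of benefit, while the daily-refresh world gives it none, which is the whole case against “churn and burn”.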
Implication 2.2: Spammers can get much quicker feedback about whether their spam is viable (2/10)
One of the strategies suggested to counteract bacterial resistance to antibiotics is to release newer (and stronger) generations of antibiotics in a much more conservative (slower) manner. Bacteria take time to build resistance to antibiotics, and the more antibiotics, and the stronger the antibiotics, we expose bacteria to, the faster they evolve.
Similarly, now that spammers can try new spam ideas and get something much closer to real-time feedback from Google’s algorithms as to whether the spam is working, they may be able to evolve the sophistication of backlink spam at an unprecedented rate.
This could mean bad news for Google as spammers make their spam increasingly indistinguishable from the natural linking structures of the internet. Also, plugging into implication 1.1, it means spammers might be able to find indefinitely viable spam more quickly than they could before. This could represent a slow “asymptote” of spam, as it tends towards normal content in the eyes of the algorithms.
Major Implication 3: The Disavow tool might actually do something now (3/10)
A while back Google started offering a tool within Search Console known as the “Disavow” tool. With this tool you can disavow certain backlinks pointing to your site, letting Google know that you want nothing to do with those links and that they should be discounted by the ranking algorithm.
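For readers who have never used it: the tool takes an uploaded plain-text file with one entry per line, either a full URL or a whole domain prefixed with `domain:`, and lines starting with `#` are treated as comments. The Python sketch below builds such a file. The helper name `build_disavow_file` and the example domains are made up for illustration; only the line format comes from the tool itself.

```python
def build_disavow_file(domains, urls, note=""):
    """Return the text of a disavow file.

    Hypothetical helper: writes an optional comment line, then one
    'domain:' entry per domain, then one line per individual URL.
    """
    lines = []
    if note:
        lines.append(f"# {note}")
    # Whole domains first, de-duplicated and sorted for a stable file.
    lines.extend(f"domain:{domain}" for domain in sorted(set(domains)))
    # Then individual URLs.
    lines.extend(sorted(set(urls)))
    return "\n".join(lines) + "\n"


# Made-up example entries.
disavow_text = build_disavow_file(
    domains=["spammy-directory.example"],
    urls=["http://blog.example/paid-links.html"],
    note="Links we could not get removed",
)
print(disavow_text)
```

The resulting text would be saved as a `.txt` file and uploaded through the Disavow tool for the affected site.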
Implication 3.1: Disavow-based Penguin recoveries should spike over the next few weeks (2/10)
Some people have already reported Penguin-based recoveries. As Penguin goes real-time, so (presumably) do the signals processed by the Disavow tool.
The exact mechanics of the Disavow tool are still unclear, however. Are links targeted by Penguin simply ignored or discounted by the algorithm? Then the tool does effectively nothing. But if links targeted by Penguin apply a spam or penalty score to a site (or, now, to individual pages) that actually lessens its pre-existing ability to rank, then a disavow does something by removing that penalty, and now that Penguin is real-time the number of disavow recoveries might increase.
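The difference between those two possibilities is easier to see with numbers. The sketch below is purely a toy: the `Link` class, both scoring functions and the link values are invented for illustration, it assumes Penguin catches every spam link, and nothing here reflects Google’s real scoring. It compares a world where caught spam links are simply ignored with one where each caught spam link subtracts a penalty.

```python
from dataclasses import dataclass


@dataclass
class Link:
    value: float          # made-up ranking value of the link
    is_spam: bool         # assumed: Penguin identifies every spam link
    disavowed: bool = False


def score_if_spam_ignored(links):
    """Model A (assumption): caught spam links count for nothing."""
    return sum(l.value for l in links if not l.is_spam and not l.disavowed)


def score_if_spam_penalized(links, penalty=1.0):
    """Model B (assumption): each caught spam link subtracts a penalty
    unless it has been disavowed."""
    score = 0.0
    for l in links:
        if l.disavowed:
            continue  # disavowed links neither help nor hurt
        score += -penalty if l.is_spam else l.value
    return score


def profile(disavow_spam):
    """Two good links and two spam links, optionally disavowing the spam."""
    return [
        Link(3.0, is_spam=False),
        Link(2.0, is_spam=False),
        Link(1.0, is_spam=True, disavowed=disavow_spam),
        Link(1.0, is_spam=True, disavowed=disavow_spam),
    ]


before, after = profile(False), profile(True)
ignored_gain = score_if_spam_ignored(after) - score_if_spam_ignored(before)
penalized_gain = score_if_spam_penalized(after) - score_if_spam_penalized(before)
print(ignored_gain, penalized_gain)
```

In the “ignored” model disavowing changes nothing, while in the “penalized” model it recovers the lost score. Only in the second world would we expect to see real-time disavow recoveries.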
Implication 3.2: We can actually use the Disavow tool for both Penguin recovery and other interesting stuff now (2/10)
Naturally this might mean “link detox” actually does something now as the Disavow tool is used actively in real-time, whereas before you’d have to sit around and wait for a Penguin refresh. But there are other implications as well: the Disavow tool can be used for SEO experiments which may become more viable now if Google uses these signals in real-time.
Major Implication 4: Google may go quiet on future webspam updates (5/10)
In the past, Penguin refreshes were one of the major things Google would announce. With no refreshes left to announce, it probably won’t have much to say about Penguin. Plugging into implication 1, we really don’t know how many resources Google will put into Penguin going forward, and we have no way to find out if, as seems likely, it goes quiet.
Adding in what we know about the use of deep learning algorithms, it’s likely the Google ranking algorithm will increasingly become a black box to those both outside and perhaps even inside Google.
Major Implication 5: Penguin’s granularity has increased (1/10)
In the announcement post Google said “Penguin is now more granular. Penguin now devalues spam by adjusting ranking based on spam signals, rather than affecting ranking of the whole site.”
Implication 5.1: Individual pages will be affected, making Penguin sharper against spam while not punishing entire websites (1/10)
In the past, as far as we know, Penguin would basically lower the score for all of a website’s URLs, applying a broad punishment to the site’s ability to rank generally. This is a “reverse-engineered” conclusion: I wasn’t certain of the old site-wide behaviour before, but Google’s wording in the announcement implies it.
Implication 5.2: This probably doesn’t change anything for businesses (3/10)
So now Penguin will probably be more explicitly targeted at major transactional landing pages on websites while “sparing” the rest. This is because webspam with heavy anchor text is usually focused on transactional landing pages. So for businesses this would probably not feel any different, as the transactional landing pages drive money-bringing traffic to sites.
After a good ~21 months we finally see a “refresh” of Penguin, the last and permanent refresh. We’ve yet to see all of the implications in the land of SEO but one can hope most of them will be positive.