Grey Hat keyword and content strategies — SERP flooding, stolen content — Joe Sinkwitz // Digital Heretix

Episode Overview

We’re joined again by Joe Sinkwitz, principal of Digital Heretix and CEO and co-founder of Intellifluence. Listen in as Ben and Joe dig into grey hat strategies for keyword stuffing and content hacks as well as slipping into the dark side with content stripping and republishing.

Topics covered include:

  • Spinning using Markov chain swaps and the extent to which the practice fluctuates between grey and black hat SEO.
  • How to ensure that your content has utility and won’t run afoul of Google
  • The role of Online Reputation Management

Episode Transcript

Ben:                 Welcome to Gray Hat Week on the Voices of Search podcast. I’m your host Benjamin Shapiro and this week we’re going to discuss the balance of ranking optimization and risking your domain’s reputation. Joining us for Gray Hat SEO Week is Joe Sinkwitz, who is the Principal at Digital Heretix, which is a brand reputation management agency. Joe was also the co-owner of the Advanced Search Summit and a co-founder and CEO of Intellifluence, which is a SaaS tool that helps brands discover the right influencers for their products, pitch them and get honest reviews. He’s had a wide variety of experiences related to SEO, content optimization and helping brands get out of trouble.

Ben:                 Today Joe and I are going to talk about grey hat strategies for keyword stuffing and content hacks, but before we hear from Joe, I want to remind you that this podcast is brought to you by the marketing team at Searchmetrics. We are an SEO and content marketing platform that helps enterprise scale businesses monitor their online presence and make data-driven decisions. To support you, our loyal podcast listeners, we’re offering a complimentary digital diagnostic where a member of our digital strategies group will provide you with a consultation that reviews how your website, content and SEO strategies can all be optimized. To schedule your free digital diagnostic, go to Searchmetrics.com/diagnostic. Okay, on with the show. Here’s my conversation with Joe Sinkwitz, Principal at Digital Heretix. Joe, happy Wednesday. Welcome back to the Voices of Search podcast.

Joe:                  Great to talk to you again.

Ben:                 Great to connect. We’ve been going through some of the different tactics and strategies that are gray hat, something that you maybe can test and find a performance boost, but also that potentially could get you into some trouble. Today we’re going to talk about content hacks, keyword stuffing, flooding SERPs, even stolen content. Talk to me about some of the ways that you are seeing people use various content strategies to push the boundaries for what Google’s terms of service say is acceptable and where are people actually getting themselves in trouble?

Joe:                  Sure. My favorite thing right now is something called Keyword Juicer and it’s actually a private product by CopyPress. The reason I like this is they figured out a way to automate what my manual process is. When I’m building like a content strategy, I tend to look at a couple of things. I look at Ahrefs data for not just the domain in question, but all the potential competitors that I see. Then I’ll go to SpyFu and I’ll look at the paid search data associated with those same type of phrases to get a sense of what the actual competition looks like, how does the content break out, what are they actually trying to buy traffic for?

Joe:                  Then from there you’re able to do an analysis to determine where your gaps are in your strategy. You could say like, “Oh, we need to create content around this section.” They figured out how to automate that and I know they’re pulling in a ton of data to do that. That’s on the white hat side of things, because all you’re really doing is you’re trying to figure out where are your strengths, weaknesses, opportunities and threats. You’re doing a SWOT analysis.
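The gap analysis Joe describes can be pictured as a set comparison. The sketch below is only an illustration, not how Keyword Juicer works: the function name `keyword_gaps`, the input shapes, and the example domains are all hypothetical, and it assumes the phrase lists have already been exported from whatever ranking and paid search tools are in use.

```python
def keyword_gaps(own_keywords, competitor_keywords):
    """Return (phrase, competitor_count) pairs for phrases we do not cover.

    own_keywords: iterable of phrases our domain already targets (assumed input).
    competitor_keywords: dict mapping a competitor name to its list of phrases.
    """
    own = {phrase.lower() for phrase in own_keywords}
    counts = {}
    for phrases in competitor_keywords.values():
        # De-duplicate per competitor so each one counts a phrase at most once.
        for phrase in {p.lower() for p in phrases}:
            if phrase not in own:
                counts[phrase] = counts.get(phrase, 0) + 1
    # Phrases targeted by the most competitors come first, ties alphabetical.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data for illustration only.
competitors = {
    "a.example": ["payday loans", "APR calculator"],
    "b.example": ["apr calculator", "loan payback"],
}
print(keyword_gaps(["payday loans"], competitors))
```

Phrases that several competitors target and the domain does not are the likeliest candidates for new content sections.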

Joe:                  On the gray side of things, I see people that will say, “Okay, my goal is to get a more authoritative domain. Done. I’m going to start with that. It doesn’t matter what the industry is. Then I’m going to find people that are ranking significantly worse than I am, but they’re posting content. I’m just going to take that content, push it through a spinner and have it go live on my site. Because I have a more authoritative domain, chances are it’s going to be associated with me before it’s associated with them.” This could be all scripted so that a page might go live on my site within a couple of seconds, at the longest, of when it goes live on the site that we’re taking it from.

Joe:                  Now I would actually put this in black more than gray, but I still see it happen. It’s something where Google still has too much an affinity for overall authoritative domain versus the actual creator of the content. Otherwise we’d see nobody domains ranking for really deep pieces of content. That’s just not the case. Usually we end up seeing that shift towards the macro parasite of a very authoritative domain that has sections in a whole bunch of different niches that happens to rank for something that’s kind of lower quality. Well, why is it ranking? Why is it not pulling in that deep piece of content? That’s up to Google, but those very authoritative domains can effectively steal, in an automated fashion, spin and then post content. That’s what I see a lot of.

Ben:                 The interesting thing to me here is the notion of the spinning, where you can pull in a piece of content and change it around to be something that’s your own. Walk me through the dynamics of how the spinners work.

Joe:                  Sure. The really old ones were like RSSGM. Those guys, they would just swap out a couple of words based on a source, and then as the spinners themselves got more sophisticated, tried to make it more like natural English reading and so it would swap out phrases. Some of it is pretty cool how like the mathematics of it is pretty neat. Back in the day we tried to create something, we called it the Veronikov. It was based on like a Markov chain and I think our content manager was named Veronica back then, so we called it the Veronikov Method. Now, I mean stuff has gotten so much further than simple Markov chain swaps. Now they’re looking far beyond keyword density. They’re looking far beyond other expected phrases to exist on the page. If you’re doing payday loans, you probably need to talk about APR. You probably need to talk about payback methods, so like those type of phrases.

Joe:                  It’s like that stuff still all exists, but in some cases, you can’t quite tell that it’s been spun. It’s gotten that decent. Then beyond that, for those that are operating on extremely authoritative domains, talking like an About.com sort of setup, they could go through this process and then they could farm it out into like a feeder system for low end editors that are paid in a similar fashion, so like a cyber setup, where it just goes into a queue and they do quick edits and post locks. That would basically break, I think, a lot of niches.
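For readers unfamiliar with the Markov chain idea Joe mentions, here is a minimal, generic sketch of a word-level chain. It is not the Veronikov Method or any real spinner, whose details are not given in the episode. The names `build_chain` and `generate` and the sample sentence are made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, start, length, seed=None):
    """Walk the chain from `start`, choosing a random observed successor each step.

    `start` must contain as many words as the `order` used to build the chain.
    """
    rng = random.Random(seed)
    state = tuple(start)
    output = list(state)
    while len(output) < length:
        successors = chain.get(state)
        if not successors:  # dead end: nothing ever followed this state
            break
        output.append(rng.choice(successors))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Hypothetical sample text, echoing the payday loan example from the conversation.
sample = "payday loans carry a high APR and payday loans carry short payback terms"
chain = build_chain(sample)
print(generate(chain, start=["payday"], length=8, seed=0))
```

Every adjacent word pair in the output was seen somewhere in the source, which is why chain-generated text can look locally plausible while drifting away from the original sentence.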

Ben:                 It’s funny, I’ve done a similar SEO strategy before. I mentioned in a previous podcast, my guitar lesson website, DrumSchool.com, and one of our SEO strategies was we scraped the biggest dictionary of guitar terms and we just started not necessarily copying the content, but I would hand the description of a specific guitar term to an editor and say, “Rewrite this in your own words.” Essentially that’s what technology is doing is just taking a piece of content, spinning it around, just modifying a few phrases and then republishing it in real time.

Joe:                  The biggest place to see this happening a lot is monitor any news queries. Look for it in breaking news and you’ll see in some cases where the news that ranks is kind of thin looking. It doesn’t have a ton of detail, but it ranks. Then you start looking at where are they actually linking to if they cite sources and it’ll often be the big deep piece of content that they took it from.

Ben:                 Interesting. Okay. The first thing that we’re talking about here with keyword stuffing is you’re just honestly stealing content and making it look like it is not duplicate content. Talk to me about some of the … You mentioned before that there are keyword strategies where some brands are just going into too much depth and that’s getting them into trouble, creating content that is superficial. Walk me through some of the strategies in terms of creating your keyword targets and what is too much and what is Google starting to penalize?

Joe:                  Sure. That’s a good question. I’d say that they go on too much breadth more than depth.

Ben:                 Right.

Joe:                  What happens here is if you look at early Panda, one of the ways that we started fixing Panda was to look at every piece of content. We’d ask ourselves, “Does this page answer a question that’s answered elsewhere on the domain?” It starts very simply like that. If so, then we need to ask ourselves, which one is the better answer? If we can’t make that distinction, then we need to combine the pieces of content and set up a re-direct. If this page is not that great and the other page is better, we need to say, “Okay, we’re going to either redirect it if it makes sense to do so, or we’re going to 410 it. We’re going to basically remove it from search.”
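The page-by-page questions Joe walks through amount to a small decision procedure. The sketch below encodes it under stated assumptions: the numeric quality scores, the `Action` names, and the `triage` function are hypothetical stand-ins for an editor's judgment, and equal scores represent the case where no better answer can be picked.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep the page as is"
    MERGE_AND_REDIRECT = "combine both pages, then redirect"
    REDIRECT = "redirect to the better page"
    GONE = "return 410 so the page drops out of search"


def triage(answered_elsewhere, this_score=0, other_score=0, redirect_makes_sense=True):
    """Decide what to do with one page during a content audit.

    answered_elsewhere: does another page on the domain answer the same question?
    this_score / other_score: hypothetical quality ratings for the two pages.
    redirect_makes_sense: whether the other page is a sensible redirect target.
    """
    if not answered_elsewhere:
        return Action.KEEP
    if this_score == other_score:
        # Cannot tell which answer is better: consolidate the content.
        return Action.MERGE_AND_REDIRECT
    if this_score > other_score:
        # This page is the better answer; the other page is the one to retire.
        return Action.KEEP
    return Action.REDIRECT if redirect_makes_sense else Action.GONE


print(triage(True, this_score=2, other_score=5, redirect_makes_sense=False).value)
```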

Joe:                  It really comes down to that level of simplicity. When sites went too crazy, we were creating the most ridiculous content, how to go skiing on a payday loan, how to take your best vacation on a payday loan, like just really ridiculous stuff that no one ever really cared about, but for that period of time leading up to Panda, you could get away with it. You could have a domain that has a ridiculous amount of authority and it’s spreading that authority to 50,000 different pages that are all unique technically, but they happen to be about a specific phrase, a long tail phrase. That’s how people got in trouble. They kept driving the strategy, and I still see it too much where some brands are blogging for the sake of blogging. They’re just putting out a lot of content that doesn’t actually have any use.

Joe:                  Current day Panda, if you’re putting up stuff where a user’s not going to go to it and they’re bouncing out quickly, if the user signals start to look pretty bad where the content is not useful and not answering a specific query, chances are you’re going to start coming up to that line and potentially crossing that line. The fix of that is be judicious about the type of content you’re putting out. If you’re a new site, then yeah, you can get away with having a bunch of stuff doing that because in new sites, they tend to cannibalize and kill off all the bad stuff. There’s certain practice to that. A lot of brands that get in trouble, they don’t recognize the benefit of culling pages. They just keep spitting them out there.

Ben:                 There’s another topic related to keyword stuffing, which is how much you are actually feeding into Google’s index. Where are you seeing people try to not just produce the content but essentially flood the index with pages? Even if it is genuinely unique content, your search results pages or other pages that are not necessarily meant to be crawled, how do you figure out not only what to create but then what to share with Google?

Joe:                  Sure. This gets into the realm of ORM for me.

Ben:                 Sorry, the realm of what?

Joe:                  Reputation management.

Ben:                 Okay.

Joe:                  Because what happens is if you have a negative article that exists on a domain and you could take a deep look at this site, you’re looking at the robots, robots.txt, and you say, “Huh, this is interesting. It looks like it is blocking this particular sub-directory from Google, but it will still deliver a response if I create a random query in here.” Then what you could do is you could start injecting pages that don’t even exist into Google by linking to all this stuff. Well, there’s reasons why you might do that, in order to temporarily get Google to either flip which page that they’re going to show for a specific query, or you might get it to basically not trust the pages that are in there. If it’s a weak enough domain, you might be trying to trigger an adult filter.

Joe:                  There’s reasons behind it, but a lot of finding those weaknesses can come from just looking at whether or not a site is correctly canonicalizing pages. Are they opening themselves up to duplicate content by slapping in a query string and putting in who knows what? Are they having problems with that directory structure? I mentioned the robots.txt. Do they have problems, just a really weird CMS setup where you’re just able to just garble stuff in? It really depends. I have yet to see a really clean CMS that protects everything out of the box. It doesn’t matter what plugin you use. There’s going to be a problem somewhere because it’s all tradeoffs in terms of how they got there.
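On the defensive side, the query string problem Joe raises is usually handled by deciding on one canonical URL per page. The sketch below shows one way a site owner might compute it with Python's standard `urllib.parse`. The allowlist `ALLOWED_PARAMS` and the function `canonical_url` are assumptions for illustration; which parameters genuinely change page content depends entirely on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: the only query parameters that change page content.
ALLOWED_PARAMS = {"page"}


def canonical_url(url, allowed=ALLOWED_PARAMS):
    """Return the URL with unknown query parameters and the fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed
    ]
    kept.sort()  # stable parameter order so equivalent URLs compare equal
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(kept),
        "",  # drop the fragment
    ))


print(canonical_url("https://Example.com/news/story?junk=who-knows-what#top"))
```

The resulting value is what a page would reference in its canonical link tag, so that arbitrary query strings linked from elsewhere all point back to the same page.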

Ben:                 Let me make sure I understand. You’re talking about when somebody submits a page that potentially opens them up to risk, right? If you have a negative page, then that can be highlighted, which essentially then can get flagged by Google. That’s really competitive SEO more than anything else.

Joe:                  Yeah, and it happens quite a bit. If you think about any news publication, if you think about any like gossip rag, if you start looking at their traffic stats, start looking at their link stats, you’ll see where people are trying to take out specific pages or trying to take out specific category pages where they’re just trying to harm the domain, but Google does take measures and they try to ignore certain tactics, but over time it’s like the overall algorithm is so much more complex than any of us fully appreciate. Whenever something’s added, something’s kind of either taken out or it’s demoted in terms of relevance. When that occurs, it opens up these little holes. If they go crazy about, “We need to really highlight more content from weak domains, Joe’s right,” well then that opens up a new attack strategy in terms of how you might go about flooding the net with junk web 2.0 pages.

Joe:                  What I say today may not even be valid next week because something might have changed or there might be a better way to inject content, like open comments. Never have open comments. That’s probably the worst thing you can do on a blog, but like so many people still have it open and so many people still have a ton of content that’s jammed in there. If it’s like an old CMS, you could do like frame breaking essentially to create stuffed pages that exist due to having injected a comment on something. The web, unfortunately, is a mess when it comes to that stuff.

Ben:                 It’s really interesting to me that not only do you need to evaluate your keyword and your content strategy from a perspective of how it ranks, but also what it exposes you in terms of risk for other people being able to take your content and manipulate your brand. What are some of the ways that you evaluate your content strategy to not only understand how it’s helping you benefit and helping you achieve your goals, but also how it’s opening you up to risk?

Joe:                  That’s a nuanced question there. I mean, I would still approach the content strategy the same way because every time I’m creating a content strategy, there is a corresponding promotional strategy associated with it. The reason that that promotional strategy exists is twofold. One, yes, we want to be able to rank, and then in the world that we live in, it takes the content plus the links to start to rank. The other thing that getting those links helps to do is to help you set apart slightly from someone that’s creating that similar content but doesn’t have the links associated with it. One of the risk mitigating factors is creating a strong domain. Another mitigating factor is to continually have strong usage data associated with your site.

Joe:                  One way that I like to do that is actually with an email list. Intellifluence, one of the coolest things I get to do every day, or actually every week, is we have weekly emails that go out to all our influencers. 60,000 people, they get this email. It has a really high open rate. They take action. That action usually involves them clicking through, going in through Chrome browser, logging into the site and goofing around on the site to do their work. Sometimes, it still amuses me, they don’t click the link to log in, but instead they’ll go to Google and they’ll look for the domain or they’ll look for the login page there and then log in through there. By utilizing this list of people that we have over email, we’re driving all this positive usage signal. Positive usage signals, I think, are going to be viewed as more important the further along we go, simply because of all the tentacles that Google has with Chrome, Android, et cetera, and their own search engine. With that, that’s one mitigating thing.

Joe:                  The other thing I think that you have to do is you just simply have to be on top of security updates. If you have WordPress, you need to have that automatic security updates on. If you’re on WordPress, you also need to make sure that you’re blocking robots on q and query. You need to make sure that you don’t have search results showing up. Each CMS has it a little bit different in how you have to go about it. It becomes like a laundry list of, “I just need to do these basic things to cross off to make sure I don’t get slapped pretty hard by a competitor.”
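One item on that checklist, confirming that internal search results are closed to crawlers, can be tested with Python's standard `urllib.robotparser`. This is a rough sketch under assumptions: the robots.txt rules, the `/search` path, and the helper `crawlable` are hypothetical, and the real path or parameter to block varies by CMS, as Joe notes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site whose internal search lives under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def crawlable(robots_txt, urls, agent="*"):
    """Return the subset of `urls` that the given robots.txt still allows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


urls_to_check = [
    "https://example.com/search?q=payday+loans",
    "https://example.com/blog/real-article",
]
print(crawlable(ROBOTS_TXT, urls_to_check))
```

Any search results URL that shows up in the returned list is one that crawlers are still permitted to fetch, which is the exposure the checklist is meant to close.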

Joe:                  It’s kind of sad. It’s a lot like walking down the street in New York City with a purse. You’re not just going to like dangle it on your elbow. You’re going to strap it across your body. You see those old ladies like that. No one’s taking that purse because you’re basically telling the world like, “I’m not a target.” That’s what you’re trying to do with your site within Google if you know you’re in a competitive industry. You’re just taking one factor away at a time for you to be attacked. Nothing is ever attack-proof. You just want to make it more difficult, more expensive for them to take you out.

Ben:                 Any last thoughts on some of the ways that keyword strategies and content hacks are changing? What’s gray hat today that might be black hat tomorrow?

Joe:                  I think it’s still going to be in the realm of how much content is being produced and for which queries. I think we’re still not at that proper line in terms of having only one page to answer one question. It’s still way too easy to get carried away talking about phrases that really don’t sound like they’re very similar but they’re answering the same core root question. That’s going to trip more filters in the future, just because we’re all producing a ton of content and we’re not slowing down, it seems like. We just keep producing more and more. Until they pull it back, it’ll happen.

Ben:                 If anything, that might make the SEO’s job and life a little easier knowing that you don’t have to produce the same content 57 times with 57 different keywords. I don’t know why I said 57, first number that came into my mind, but if you can answer a question once and you can do it well, hopefully Google will prioritize that and that’ll allow us to refocus on producing the right content as opposed to the right volume of content.

Joe:                  Right. If they start focusing more on those user signals, I think you’ll see that happen because then they could just prioritize that single piece of content based on, “Hey, people found this useful.”

Ben:                 Okay. I think that’s a great place for us to land the plane today. That wraps up this episode of the Voices of Search podcast. Thanks for listening to my conversation with Joe Sinkwitz, the founder of Digital Heretix. We’d love to continue this conversation with you so if you’re interested in contacting Joe, you can find a link to his LinkedIn profile in our show notes. You can contact him on Twitter. His handle is CygnusSEO, C-Y-G-N-U-S-S-E-O, or you can visit his company’s website, which is DigitalHeretix.com, D-I-G-I-T-A-L-H-E-R-E-T-I-X.com. If you have general marketing questions or if you’d like to be a guest on this podcast, you can find my contact information in our show notes, or you can send me a tweet @BenJShap, B-E-N-J-S-H-A-P.

Ben:                 If you’re interested in learning more about how to use search data to boost your organic traffic, online visibility, or to gain competitive insights, head over to Searchmetrics.com/diagnostic for your complimentary advisory session with our digital strategies team. If you liked this podcast and you want a regular stream of SEO and content marketing insights in your podcast feed, hit the subscribe button in your podcast app and we’ll be back in your feed tomorrow morning to discuss gray hat strategies related to misleading users including cloaking and JavaScript. Okay, that’s it for today, but until next time, remember, the answers are always in the data.