johnny@ihackstuff.com http://johnny.ihackstuff.com

The Google Hacker’s Guide

Understanding and Defending Against

the Google Hacker

by Johnny Long

johnny@ihackstuff.com

http://johnny.ihackstuff.com

  • GOOGLE SEARCH TECHNIQUES
    • GOOGLE WEB INTERFACE
    • BASIC SEARCH TECHNIQUES
  • GOOGLE ADVANCED OPERATORS
    • ABOUT GOOGLE’S URL SYNTAX
  • GOOGLE HACKING TECHNIQUES
    • DOMAIN SEARCHES USING THE ‘SITE’ OPERATOR
    • FINDING ‘GOOGLETURDS’ USING THE ‘SITE’ OPERATOR
    • SITE MAPPING: MORE ABOUT THE ‘SITE’ OPERATOR
    • FINDING DIRECTORY LISTINGS
    • VERSIONING: OBTAINING THE WEB SERVER SOFTWARE / VERSION
      • via directory listings
      • via default pages
      • via manuals, help pages and sample programs
    • USING GOOGLE AS A CGI SCANNER
    • USING GOOGLE TO FIND INTERESTING FILES AND DIRECTORIES
  • ABOUT GOOGLE AUTOMATED SCANNING
  • OTHER GOOGLE STUFF
    • GOOGLE APPLIANCES
    • GOOGLEDORKS
    • GOOSCAN
    • GOOPOT
  • A WORD ABOUT HOW GOOGLE FINDS PAGES (OPERA)
  • PROTECTING YOURSELF FROM GOOGLE HACKERS
  • THANKS AND SHOUTS

text-based and mobile browsers.

“Web, Images, Groups, Directory and News” tabs

These tabs allow you to search web pages, photographs, message group postings, Google directory listings, and news stories respectively. First-time Google users should consider that these tabs are not always a replacement for the “Submit Search” button.

Search term input field

Located directly below the alternate search tabs, this text field allows the user to enter a Google search term. Search term rules will be described later.

“Submit Search”

This button submits the search term supplied by the user. In many browsers, simply pressing the “Enter/Return” key after typing a search term will activate this button.

“I’m Feeling Lucky”

Instead of presenting a list of search results, this button will forward the user to the highest-ranked page for the entered search term. Often, this page is the most relevant page for the entered search term.

“Advanced Search”

This link takes the user to the “Advanced Search” page as shown in Figure 2. Much of the advanced search functionality is accessible from this page. Some advanced features are not listed on this page.

“Preferences”

This link allows the user to select several options (which are stored in cookies on the user’s machine for later retrieval) including languages, filters, number of results per page, and window options.

“Language tools”

This link allows the user to set many different language options and translate text to and from various languages.

Figure 2: Advanced Search page

Once a user submits a search by clicking the “Submit Search” button or by pressing enter in the search term input box, a results page may be displayed as shown in Figure 3.

Figure 3: A basic Google search results page.

The search results page allows the user to explore the search results in various ways.

Top line The top line (found under the alternate search tabs) lists the

Figure 5: Another Google error page

There is a great deal more to Google’s web-based search functionality which is not covered in this paper.

BASIC SEARCH TECHNIQUES

Simple word searches

Basic Google searches, as I have already presented, consist of one or more words entered without any quotations or the use of special keywords. Examples:

peanut butter
butter peanut
olive oil popeye

‘+’ searches

When supplying a list of search terms, Google automatically tries to find every word in the list of terms, making the Boolean operator “AND” redundant. Some search engines may use the plus sign as a way of signifying a Boolean “AND”. Google uses the plus sign in a different fashion. When Google receives a basic search request that contains a very common word like “the”, “how” or “where”, the word will often be removed from the query as shown in Figure 6.

Figure 6: Google removing overly common words

In order to force Google to include a common word, precede the search term with a plus (+) sign. Do not use a space between the plus sign and the search term. For example, the following searches produce slightly different results:

where quick brown fox
+where quick brown fox

The ‘+’ operator can also be applied to Google advanced operators, discussed below.

‘-’ searches

Excluding a term from a search query is as simple as placing a minus sign (-) before the term. Do not use a space between the minus sign and the search term. For example, the following searches produce slightly different results:

quick brown fox
quick -brown fox

The ‘-’ operator can also be applied to Google advanced operators, discussed below.
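The spacing rules for ‘+’ and ‘-’ are easy to get wrong when queries are assembled by hand. The following is a minimal sketch of how a script might put such a query together. The function name build_query and its parameters are my own illustration, not part of any Google tool.

```python
def build_query(terms, required=(), excluded=()):
    """Assemble a basic search string.

    Hypothetical helper: 'required' words get a leading '+',
    'excluded' words get a leading '-', with no space after the sign.
    """
    parts = ["+" + word for word in required]   # force common words to be included
    parts += list(terms)                         # ordinary search terms
    parts += ["-" + word for word in excluded]   # terms that must not appear
    return " ".join(parts)


print(build_query(["quick", "brown", "fox"], required=["where"]))
print(build_query(["quick", "fox"], excluded=["brown"]))
```

The first call produces the forced-inclusion query shown earlier, and the second produces an exclusion query equivalent to the one above (word order aside).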

site: find web pages on a specific web site

This advanced operator instructs Google to restrict a search to a specific web site or domain. When using this operator, an additional search argument is required.

Example:

site:harvard tuition

This query will return results from harvard that include the term tuition anywhere on the page.

filetype: search only within files of a specific type.

This operator instructs Google to search only within the text of a particular type of file. This operator requires an additional search argument.

Example:

filetype:txt endometriosis

This query searches for the word ‘endometriosis’ within standard text documents. There should be no period (.) before the filetype and no space around the colon following the word “filetype”. It is important to note that Google only claims to be able to search within certain types of files. Based on my experience, Google can search within most files that present as plain text. For example, Google can easily find a word within a file of type “.txt,” “.html” or “.php” since the output of these files in a typical web browser window is textual. By contrast, while a WordPerfect document may look like text when opened with the WordPerfect application, that type of file is not recognizable to the standard web browser without special plugins and, by extension, Google cannot interpret the document properly, making a search within that document impossible. Thankfully, Google can search within specific types of special files, making a search like “filetype:doc endometriosis” a valid one.

The current list of files that Google can search is listed in the filetype FAQ located at google/help/faq_filetypes.html. As of this writing, Google can search within the following file types:

  • Adobe Portable Document Format (pdf)
  • Adobe PostScript (ps)
  • Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
  • Lotus WordPro (lwp)
  • MacWrite (mw)
  • Microsoft Excel (xls)
  • Microsoft PowerPoint (ppt)
  • Microsoft Word (doc)
  • Microsoft Works (wks, wps, wdb)
  • Microsoft Write (wri)
  • Rich Text Format (rtf)
  • Text (ans, txt)
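The formatting rules for the filetype operator (no leading period, no space around the colon) can be enforced mechanically. Here is a small sketch under those assumptions. The helper name filetype_query is hypothetical and exists only for illustration.

```python
def filetype_query(extension, term):
    """Build a 'filetype:' query from an extension and one search term.

    Hypothetical helper: strips a leading period and surrounding spaces
    so the result follows the formatting rules described above.
    """
    ext = extension.strip().lstrip(".").lower()
    if not ext.isalnum():
        # Covers the empty string and anything that is not a plain extension.
        raise ValueError("extension must be a simple name such as 'txt' or 'doc'")
    return f"filetype:{ext} {term}"


print(filetype_query(".txt", "endometriosis"))
print(filetype_query("doc", "endometriosis"))
```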

link: search within links

The hyperlink is one of the cornerstones of the Internet. A hyperlink is a selectable connection from one web page to another. Most often, these links appear as underlined text but they can appear as images, video or any other type of multimedia content. This advanced operator instructs Google to search within hyperlinks for a search term. This operator requires no other search arguments.

Example:

link:apple

This query would display web pages that link to Apple’s main page. This special operator is somewhat limited in that the link must appear exactly as entered in the search query. The above query would not find pages that link to apple/ipod, for example.

cache: display Google’s cached version of a page

This operator displays the version of a web page as it appeared when Google crawled the site. This operator requires no other search arguments.

Example:

cache:johnny.ihackstuff cache:johnny.ihackstuff

These queries would display the cached version of Johnny’s web page. Note that both of these queries return the same result. I have discovered, however, that sometimes queries formed like these may return different results, with one result being the dreaded “cache page not found” error. This operator also accepts whole URL lines as arguments.

intitle: search within the title of a document

This operator instructs Google to search for a term within the title of a document. Most web browsers display the title of a document on the top title bar of the browser window. This operator requires no other search arguments.

Example:

intitle:gandalf

This query would only display pages that contained the word ‘gandalf’ in the title. A derivative of this operator, ‘allintitle’, works in a similar fashion.

Example:

allintitle:gandalf silmarillion

Most of the arguments in this URL can be omitted, making the URL much more concise. For example, the above URL can be shortened to

google/search?q=sardine

Additional search terms can be appended to the URL with the plus sign. For example, to search for “sardine” along with “peanut” and “butter,” consider using this URL:

google/search?q=sardine+peanut+butter

Since simplified Google URLs are simple to read and portable, they are often used as a way to represent a Google search.

Google (and many other web-based programs) must represent special characters like quotation marks in a URL with a hexadecimal number preceded by a percent (%) sign in order to follow the http URL standard. For example, a search for “the quick brown fox” (paying special attention to the quotation marks) is represented as

google/search?&q=%22the+quick+brown+fox%22

In this example, a double quote is displayed as “%22” and spaces are replaced by plus (+) signs. Google does not exclude overly common words from phrase searches. Overly common words are automatically included when enclosed in double-quotes.
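The encoding described here does not have to be done by hand. The sketch below uses Python's standard urllib.parse module to turn a search string into a simplified search URL. The base address in BASE is an assumption on my part, and search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base address for a simplified search URL.
BASE = "http://www.google.com/search"


def search_url(query):
    """Return a simplified search URL for the given query string.

    urlencode percent-encodes special characters (a double quote
    becomes %22) and replaces spaces with plus signs.
    """
    return BASE + "?" + urlencode({"q": query})


print(search_url("sardine peanut butter"))
print(search_url('"the quick brown fox"'))
```

The second call shows the quoted phrase search: both double quotes come out as %22, and every space becomes a plus sign.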

Google hacking techniques

DOMAIN SEARCHES USING THE ‘SITE’ OPERATOR

The site operator can be expanded to search out entire domains. For example:

site:gov secret

This query searches every web site in the .gov domain for the word ‘secret’. Notice that the site operator works on addresses in reverse. For example, Google expects the site operator to be used like this:

site:cia site:cia site:gov

Google would not necessarily expect the site operator to be used like this:

site:cia site:www site:cia

The reason for this is simple. ‘Cia’ and ‘www’ are not valid top-level domain names. This means that as of this writing, Internet names may not end in ‘cia’ or ‘www’. However, sending unexpected queries like these is part of a competent Google hacker’s arsenal, as we explore in the “googleturds” section.
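One way to picture the “reverse” behavior of the site operator is as a comparison of name labels from right to left. The sketch below is only a model of that idea, not a description of how Google implements the operator. The function name site_matches and the example hostnames are hypothetical.

```python
def site_matches(hostname, site_argument):
    """Return True if site_argument matches the right-hand end of hostname.

    Illustrative model only: names are split on periods and compared
    label by label, starting from the top-level domain.
    """
    host_labels = hostname.lower().split(".")
    site_labels = site_argument.lower().split(".")
    # Compare the trailing labels of the hostname with the argument.
    return host_labels[-len(site_labels):] == site_labels


host = "www.example.gov"                  # hypothetical hostname
print(site_matches(host, "gov"))          # whole top-level domain
print(site_matches(host, "example.gov"))  # domain within that top-level domain
print(site_matches(host, "example"))      # not the end of the name
print(site_matches(host, "www"))          # not the end of the name
```

Under this model only the first two arguments match, because a name is anchored at its top-level domain rather than at its leftmost label.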

How this technique can be used

  1. Journalists, snoops and busybodies in general can use this technique to find interesting ‘dirt’ about a group of websites owned by organizations such as a government or non-profit organization. Remember that top-level domain names are often very descriptive and can include interesting groups such as the U.S. Government (.gov or .us).
  2. Hackers searching for targets. If a hacker harbors a grudge against a specific country or organization, he can use this type of search to find sensitive targets.

FINDING ‘GOOGLETURDS’ USING THE ‘SITE’ OPERATOR

Googleturds, as I have named them, are little dirty pieces of Google ‘waste’. These search results seem to have stemmed from typos Google found while crawling a web page. Example:

site:csc
site:microsoft

Neither of these queries is valid according to the loose rules of the ‘site’ operator, since they do not end in valid top-level domain names. However, these queries produce interesting results as shown in Figure 7.

Figure 7: Googleturd example

These little bits of information are most likely the results of typographical errors in links placed on web pages.

FINDING DIRECTORY LISTINGS

Directory listings provide a list of files and directories in a browser window instead of the typical text-and-graphics mix generally associated with web pages. Figure 8 shows a typical directory listing.

Figure 8: A typical directory listing

Directory listings are often placed on web servers purposely to allow visitors to browse and download files from a directory tree. Many times, however, directory listings are not intentional. A misconfigured web server may produce a directory listing if an index, or main web page file, is missing. In some cases, directory listings are set up as a temporary storage location for files. Either way, there’s a good chance that an attacker may find something interesting inside a directory listing.

Locating directory listings with Google is fairly straightforward. Figure 8 shows that most directory listings begin with the phrase “Index of”, which also shows in the title. An obvious query to find this type of page might be “intitle:index.of”, which may find pages with the term ‘index of’ in the title of the document. Remember that the period (.) serves as a single-character wildcard in Google. Unfortunately, this query will return a large number of false positives, such as pages with the following titles:

Index of Native American Resources on the Internet
LibDex - Worldwide index of library catalogues
Iowa State Entomology Index of Internet Resources

Judging from the titles of these documents, it is obvious that not only are these web pages intentional, they are also not the directory listings we are looking for. (jedi wave “This is not the directory listing you’re looking for.”) Several alternate queries provide more accurate results:

intitle:index.of "parent directory"
intitle:index.of name size

These queries indeed provide directory listings by not only focusing on “index” in the title, but on key words often found inside directory listings such as “parent directory”, “name” and “size.”
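The same reasoning can be applied to pages that have already been collected. The sketch below combines the title phrase with the body key words to separate likely directory listings from false positives. The function name and the sample strings are hypothetical, and the check is a rough heuristic rather than a reliable detector.

```python
def looks_like_directory_listing(title, body):
    """Rough heuristic: 'index of' in the title plus typical listing key words.

    Hypothetical helper that mirrors the refined queries above; it can
    still be fooled by ordinary pages that happen to use these words.
    """
    title_l = title.lower()
    body_l = body.lower()
    if "index of" not in title_l:
        return False
    has_parent_link = "parent directory" in body_l
    has_columns = "name" in body_l and "size" in body_l
    return has_parent_link or has_columns


# A made-up listing page and one of the false-positive titles from above.
print(looks_like_directory_listing(
    "Index of /pub", "Name Last modified Size Description Parent Directory"))
print(looks_like_directory_listing(
    "Index of Native American Resources on the Internet",
    "A list of links about history and culture"))
```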

How this technique can be used

Bear in mind that many directory listings are intentional. However, directory listings provide the Google hacker a very handy way to quickly navigate through a site. For the purposes of finding sensitive or interesting information, browsing through lists of file and directory names can be much more productive than surfing through the guided content of web pages. Directory listings provide a means of exploiting other techniques such as versioning and file searching, explained below.

VERSIONING: OBTAINING THE WEB SERVER SOFTWARE / VERSION

via directory listings

The exact version of the web server software running on a server is one piece of information an attacker requires before launching a successful attack against that web server. If an attacker connects directly to that web server, the HTTP (web) headers from that server can provide this information. It is possible, however, to retrieve similar information from Google without ever connecting to the target server under investigation. One method involves using the information provided in a directory listing.

Figure 9: Directory listing "server" example

Figure 9 shows the bottom line of a typical directory listing. Notice that the directory listing includes the name of the server software as well as the version. An adept web administrator can fake this information, but this information is often legitimate, allowing an attacker to determine what attacks may work against the server. This example was gathered using the following query:

via default pages

It is also possible to determine the version of a web server based on default pages. When a web server is installed, it generally will ship with a set of default web pages, like the Apache 1.2 page shown in Figure 10.

Figure 10: Apache test page

These pages can make it easy for a site administrator to get a web server running. By providing a simple page to test, the administrator can simply connect to his own web server with a browser to validate that the web server was installed correctly. Some operating systems even come with web server software already installed. In this case, an Internet user may not even realize that a web server is running on his machine. This type of casual behavior on the part of an Internet user will lead an attacker to rightly assume that the web server is not well maintained and is, by extension, insecure. By further extension, the attacker can also assume that the entire operating system of the server may be vulnerable by virtue of poor maintenance.

How this technique can be used

A simple query of “intitle:Test.Page.for it!” will return a list of sites running Apache 1.2 with a default home page. Other queries will return similar Apache results:

Apache server version    Query
Apache 1.3 – 1.3         Intitle:Test.Page.for It! this.web!
Apache 1.3 – 1.3         Intitle:Test.Page.for seeing.this
Apache 2                 Intitle:Simple.page.for Apache.Hook
Apache SSL/TLS           Intitle:test "Hey, it worked !" "SSL/TLS-aware"

Microsoft’s Internet Information Services (IIS) also ships with default web pages as shown in Figure 11.

Figure 11: IIS 5 default web page

Queries that will locate default IIS web pages include:

IIS Server Version    Query
Many                  intitle:welcome intitle:internet IIS
Unknown               intitle:"Under construction" "does not currently have"
IIS 4                 intitle:welcome.to.IIS.
IIS 4                 allintitle:Welcome to Windows NT 4 Option Pack
IIS 4                 allintitle:Welcome to Internet Information Server
IIS 5                 allintitle:Welcome to Windows 2000 Internet Services
IIS 6                 allintitle:Welcome to Windows XP Server Internet Services

In the case of Microsoft-based web servers, it is not only possible to determine web server version, but operating system and service pack version as well. This information is invaluable to an attacker bent on hacking not only the web server, but hacking beyond the web server and into the operating system itself. In most cases, an attacker with control of the operating system can wreak more havoc on a machine than a hacker who only controls the web server.

Netscape Servers also ship with default pages as shown in Figure 12.
