
Web Scraping: Google Search Results Through VBA

In this blog post, we explain how to scrape data from the Google search engine with Microsoft Excel VBA (Visual Basic for Applications) and share the full source code. This approach is very helpful when you have a long list of items and want the links from the Google search result pages. The same method can also be used to scrape LinkedIn data, company websites and much more.

Web scraping can be done in many ways; you could, for example, build something similar in Python with Beautiful Soup. In this post, however, we explain how to scrape data from the Google search engine using the VBA programming language, and at the end we provide the complete source code.

If you have any doubts or find the source code hard to follow, write a comment below and we will get back to you as soon as we can.

We will explain this scraping method with two common browsers, Internet Explorer and Mozilla Firefox. Each browser has its own advantages and disadvantages for this technique.

If you use the Internet Explorer method, you need a 64-bit Windows operating system and the newest version of Internet Explorer. If you use the Mozilla Firefox method, on the other hand, a 64-bit operating system is not required; it can be used on any system.

Secondly, with Mozilla Firefox you are limited to roughly 1,000 searches per day. You can keep scraping beyond that only if you change your IP address, for instance with a VPN (Virtual Private Network) or some other method. With Internet Explorer, on the other hand, you can scrape as much as you wish. The limit comes down to one major obstacle: Google’s “I am not a robot” page.

Scraping through VBA using Mozilla Firefox

First of all, open the VBA editor by pressing Alt+F11. Then create a new module: select Insert on the toolbar and click Module.

Secondly, set the references we will need: click Tools on the toolbar, then References, and tick Microsoft XML (the latest version listed), Microsoft Internet Controls and Microsoft HTML Object Library.
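Note that the code further down creates its objects with CreateObject (late binding), so it runs even without these references; ticking them mainly gives you IntelliSense and lets you declare the objects early-bound if you prefer. Here is a minimal sketch of the early-bound equivalents, assuming the Microsoft XML, v6.0 and Microsoft HTML Object Library references are set (the exact library version on your machine may differ):

    Dim request As MSXML2.ServerXMLHTTP60    ' from Microsoft XML, v6.0
    Dim doc As MSHTML.HTMLDocument           ' from Microsoft HTML Object Library
    Set request = New MSXML2.ServerXMLHTTP60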

Now we start writing the program. In this post I provide the source code and also explain how it works. First, write Sub followed by a name for your routine; here the routine is called scrap_google.

Sub scrap_google()

End Sub

First we declare the variables we are going to use in the program, together with their data types.

Sub scrap_google()

    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object
    Dim html As Object
    Dim objResultDiv As Object
    Dim objH As Object
    Dim i As Long

End Sub

Here url is declared as a String to hold the Google search URL, and lastRow is declared as Long to hold the last row of the data in the Excel sheet. XMLHTTP is declared as an Object to send the HTTP request (it only identifies itself to Google with a Firefox User-Agent string; it does not actually drive the browser). html is declared as an Object to hold the downloaded page, and objResultDiv and objH are Objects used to select particular HTML tags of the Google page and pull out the data. Finally, i is the Long loop counter over the rows of input data.

At the very beginning we have to find the last row of our data in the Excel sheet. For this we use the lastRow variable, declared as Long, and assign it the row number of the last filled cell in column A. For example, if column A holds company names in A2:A11, lastRow becomes 11. The line below finds that row automatically, and we then run a For loop from row 2 up to lastRow so that every entry is scraped.

lastRow = Range("A" & Rows.Count).End(xlUp).Row

The loop looks like this:

For i = 2 To lastRow

This means the loop starts at row 2 and runs to the last row of our data.

Let’s take an example: you have a list of 10 companies and you want their websites. For only 10 companies this scraping method is hardly worth it, since you could look them up manually; but if you have 10,000 entries or even more, it becomes very useful and makes the work much easier and quicker.

Say you have a list of companies whose websites you want to scrape. Put all the company names in column A, starting from row 2, because the loop starts from row 2.
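If you just want to test the macro, a tiny helper like the sketch below can drop a few placeholder names into column A (the names are made up for illustration and are not from the original post):

Sub fill_sample_companies()
    ' Write a few placeholder keywords into column A, starting at row 2
    Dim sample As Variant, r As Long
    sample = Array("Example Company One", "Example Company Two", "Example Company Three")
    For r = 0 To UBound(sample)
        Cells(r + 2, 1).Value = sample(r)
    Next r
End Sub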

Now we build the url string from the Google search page address, concatenated with the value we want to search for, taken from the cell where the data sits in the sheet. Since the company names are in column A starting at row 2, that cell is Cells(i, 1).

So the url value is built like this:

url = "https://www.google.com/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
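For example, if cell A2 holds the keyword Tesla and RandBetween happens to return 4821 (both values are only for illustration), then for i = 2 the url string becomes:

https://www.google.com/search?q=Tesla&rnd=4821

The &rnd= part simply appends a random number so that each request URL differs slightly. If your keywords contain spaces, it is worth URL-encoding them, for example with the EncodeUriComponent helper shown in the Internet Explorer section further down.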

The next step is to request that URL. Despite the Firefox name, the program does not actually open the browser; it sends an HTTP GET request with ServerXMLHTTP and identifies itself with a Firefox User-Agent header. The code for this step is as follows.

Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send

And the following is the code to scrape the title of the first result on the Google search page.

Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH = objResultDiv.getelementsbytagname("h3")(0)
Cells(i, 2).Value = objH.innerText

Here Set html = CreateObject("htmlfile") creates an HTML document object, and the next line loads the downloaded page (XMLHTTP.ResponseText) into it. The rest of the code picks out the title using the Google search page’s HTML structure: the element with id “rso” holds the results and the “h3” tags hold the titles. The (0) index means the first result; put (1) for the second result, (2) for the third and so on. It works exactly like an array, whose indices run from 0 to n-1 where n is the number of values (a short sketch of pulling an extra result this way appears just before the complete listing below). Finally, the scraped text is written to the cell in column B. Similarly, to scrape the link, the “cite” tag is used. The code follows.

Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH = objResultDiv.getelementsbytagname("cite")(0)
Cells(i, 3).Value = objH.innerText
DoEvents
Next

And at last, DoEvents and Next close the loop so that it runs down to the last row of the data in the Excel sheet. That is the end of the program.
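As a side note on the (0)/(1)/(2) indexing explained above, here is a minimal sketch of how the second result’s title could also be captured inside the same loop, with a guard in case Google returns fewer results; the guard and the choice of column D are our additions, not part of the original code.

        ' Optional: title of the second result -> column D, guarded against missing results
        If Not objResultDiv Is Nothing Then
            If objResultDiv.getelementsbytagname("h3").Length > 1 Then
                Cells(i, 4).Value = objResultDiv.getelementsbytagname("h3")(1).innerText
            End If
        End If

The complete program looks like this: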

Sub scrap_google()

    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object
    Dim html As Object
    Dim objResultDiv As Object
    Dim objH As Object
    Dim i As Long

    ' Find the last filled row in column A
    lastRow = Range("A" & Rows.Count).End(xlUp).Row

    For i = 2 To lastRow

        ' Build the search URL from the keyword in column A
        url = "https://www.google.com/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

        ' Send the request, identifying as Firefox via the User-Agent header
        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        ' Title of the first result -> column B
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.ResponseText
        Set objResultDiv = html.getelementbyid("rso")
        Set objH = objResultDiv.getelementsbytagname("h3")(0)
        Cells(i, 2).Value = objH.innerText

        ' Link (cite) of the first result -> column C
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.ResponseText
        Set objResultDiv = html.getelementbyid("rso")
        Set objH = objResultDiv.getelementsbytagname("cite")(0)
        Cells(i, 3).Value = objH.innerText

        DoEvents
    Next

End Sub

Now click the Run button or simply press the F5 key to run the program.

The results will look like this: the first result's title appears in column B and its link in column C, next to each keyword in column A.

And if you would rather do this scraping with Internet Explorer, the code is below.

Option Explicit
Const TargetItemsQty = 1 ' results for each keyword

Sub GWebSearchIECtl()

Dim objSheet As Worksheet
Dim objIE As Object
Dim x As Long
Dim y As Long
Dim strSearch As String
Dim lngFound As Long
Dim st As String
Dim colGItems As Object
Dim varGItem As Variant
Dim strHLink As String
Dim strDescr As String
Dim strNextURL As String

Set objSheet = Sheets("Sheet1")
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True ' for debug or captcha request cases
y = 1 ' start searching for the keyword in the first row
With objSheet
    .Select
    .Range(.Columns("B:B"), .Columns("B:B").End(xlToRight)).Delete ' clear previous results
    .Range("A1").Select
    Do Until .Cells(y, 1) = ""
        x = 2 ' start writing results from column B
        .Cells(y, 1).Select
        strSearch = .Cells(y, 1) ' current keyword
        With objIE
            lngFound = 0
            .navigate "https://www.google.com/search?q=" & EncodeUriComponent(strSearch) ' go to first search results page
            Do
                Do While .Busy Or Not .READYSTATE = 4: DoEvents: Loop ' wait IE
                Do Until .document.READYSTATE = "complete": DoEvents: Loop ' wait document
                Do While TypeName(.document.getelementbyid("res")) = "Null": DoEvents: Loop ' wait [#res] element
                Set colGItems = .document.getelementbyid("res").getElementsByClassName("g") ' collection of search result [.g] items
                For Each varGItem In colGItems ' process each item in collection
                    If varGItem.getelementsbytagname("a").Length > 0 And varGItem.getElementsByClassName("st").Length > 0 Then ' must have hyperlink and description
                        strHLink = varGItem.getelementsbytagname("a")(0).href ' get first hyperlink [a] found in current item
                        strDescr = GetInnerText(varGItem.getElementsByClassName("st")(0).innerHTML) ' get first description [span.st] found in current item
                        lngFound = lngFound + 1
                        With objSheet ' put result into cell
                            .Hyperlinks.Add .Cells(y, x), strHLink, , , strDescr
                            .Cells(y, x).WrapText = True
                            x = x + 1 ' next column
                        End With
                        If lngFound = TargetItemsQty Then Exit Do ' continue with next keyword - necessary quantity of the results for current keyword found
                    End If
                    DoEvents
                Next
                If TypeName(.document.getelementbyid("pnnext")) = "Null" Then Exit Do ' continue with next keyword - no [a#pnnext.pn] next page button exists
                strNextURL = .document.getelementbyid("pnnext").href ' get next page url
                .navigate strNextURL ' go to next search results page
            Loop
        End With
        y = y + 1 ' next row
    Loop
End With
objIE.Quit
End Sub

Function EncodeUriComponent(strText As String) As String
     Static objHtmlfile As Object
If objHtmlfile Is Nothing Then
    Set objHtmlfile = CreateObject("htmlfile")
    objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
End If
EncodeUriComponent = objHtmlfile.parentWindow.encode(strText)
End Function

Function GetInnerText(strText As String) As String
     Static objHtmlfile As Object

If objHtmlfile Is Nothing Then
    Set objHtmlfile = CreateObject("htmlfile")
    objHtmlfile.Open
    objHtmlfile.Write "<body></body>"
End If
objHtmlfile.body.innerHTML = strText
GetInnerText = objHtmlfile.body.innerText
End Function
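As a quick sanity check, the two helper functions can be called from the VBA Immediate window (Ctrl+G); the output shown below is our own illustration, not from the original post:

?EncodeUriComponent("coca cola company")
coca%20cola%20company
?GetInnerText("<b>Hello</b> world")
Hello world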

Thanks for reading. If you have any queries, suggestions or doubts, please write them in the comment section; we would really appreciate it.

Shared Excel Sheets

Internet Explorer scraping: Click Here

Mozilla Firefox scraping: Click Here

