Extracting data from the searching results of citation database

Today (Sept. 25, 2002), my major advisor asked me to extract citation data from one text file he created and sent me through email. After I looked at the data, I realized that the data came from the citation databases, such as AGRICOLA, Biological Abstract, CAB, and so on. The requirements that you asked me to do are listed in the following:

  1. Eliminate the (many) duplicates from the list;
  2. Send me a Word file containing citations & abstracts for just the articles that have abstracts (as soon as 1 is done);
  3. Make a list with all citations & abstracts in chronological order;
  4. Make a list of all citations only in normal, bibliographic order.

At the very biginning, I had no idea that I fullfilled the tasks. I might take a long time to do it starting from the original text file. In order to accelerate the processing procedure, I tried to write a small VBA program in Excel to extract the data from the text file. During I went home to cook the dinner, I took a little time to write a VBA subroutine to achieve the function. But the filename must be write to the program before running the
macro. After dinner, I spent one hour to write a little bit complex program with userform to get information from users to costomize some parameters. At this time point, the program is more general than before.

Later on, when I debugged the program, I found that it could not fullfill the specific function for extracting information to different columns. I went to K-State libray website to open the database and searched the database again and saved the results with tags to file. Then I use this file to extract the citaiton data with a midification of program, I got a very satisfied results.

After extraction of data from text file, I used Excel function to sort the data and eliminat the duplicate citations. Now I had the raw data to generate the different types of data to meet the four requirements.

I opened WORD and created mail-merging main document with the Excel file as the data source. I quickly formated the merging rules and generated two final documents for my major professor. The whole procedure took me about 3 hours.

The source codes are listed in the following.

'Source code for userform
'========================
Private Sub btnCancel_Click()
    frmImport.Hide
End Sub

Private Sub btnImport_Click()
Dim filename
Dim fs, a

filename = tbFilename.Text
Set fs = CreateObject("Scripting.FileSystemObject")
If fs.FileExists(filename) Then
    Set a = fs.OpenTextFile(filename, 1)
    Row = 1
    Do While Not (a.AtEndofStream)
        Line = a.readline
        pos = InStr(Line, "Record ")
        If pos = 1 Then
            If cbSpaceLine.Value Then
                Line = a.readline
            End If
            Line = a.readline
            i = 1
            If Not cbTag.Value Then
                Do Until Line = ""
                    Cells(Row, i) = Line
                    Line = a.readline
                    i = i + 1
                Loop
            Else
                auline = ""
                tiline = ""
                abline = ""
                soline = ""
                pyline = ""
                Do Until Line = ""
                    fieldname = Left$(Line, 2)
                    Line = Right$(Line, Len(Line) - 5)
                    Select Case fieldname
                    Case "AU"
                        auline = Line
                    Case "TI"
                        tiline = Line
                    Case "AB"
                        abline = Line
                    Case "SO"
                        soline = Line
                    Case "PY"
                        pyline = Line
                    End Select
                    Line = a.readline
                Loop
                If Not (cbIgnore.Value) Then
                    Cells(Row, 1) = auline
                    Cells(Row, 2) = pyline
                    Cells(Row, 3) = tiline
                    Cells(Row, 4) = soline
                    Cells(Row, 5) = abline
                ElseIf (cbIgnore.Value And abline <> "") Then
                    Cells(Row, 1) = auline
                    Cells(Row, 2) = pyline
                    Cells(Row, 3) = tiline
                    Cells(Row, 4) = soline
                    Cells(Row, 5) = abline
                End If
            End If
            If Not (cbIgnore.Value) Then
                Row = Row + 1
            ElseIf abline <> "" Then
                Row = Row + 1
            End If
        End If
    Loop
    a.Close
    Set a = Nothing
End If
Set fs = Nothing

If cbElimDups.Value Then Module1.deletedups

frmImport.Hide
Unload frmImport
End Sub

Sub deletedups()
' The rocords with same title are thought as duplicate records.
' This macro keep one of them and delete others.
Selection.Sort Key1:=Range("A1"), Order1:=xlAscending, Header:=xlGuess, _
     OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom
Row = 1
Do Until Cells(Row, 3) = ""
    curtitle = LCase(Cells(Row, 3))
    i = Row + 1
    done = False
    Do Until done
        newtitle = LCase(Cells(i, 3))
        If newtitle = curtitle Then
            Rows(CStr(i) & ":" & CStr(i)).Select
            Selection.Delete Shift:=xlUp
        Else
            done = True
        End If
    Loop
    Row = Row + 1
Loop
End Sub

If you want to use the code, you can download the source code following the link at the bottom of thsi page. After importing the code (frmImport.frm and module1.bas) to your VBA project, click Tools->Macro->Macros to open “Macro” dialog, then select macro “ExtractCites” and click Run button, a dialog will pop up. Enter the file name which include your data, check the proper checkbox items, click “Import Citations” to run the program. All the data will be extracted to the current active worksheet.

  • Share/Bookmark

2 Responses to “Extracting data from the searching results of citation database”

  1. new software says:

    Thanks Gerat post.

  2. Tania says:

    Thanks for article. Everytime like to read you.
    Have a nice day

Leave a Response

You must be logged in to post a comment.