|
||||||
|
Introduction Welcome! Searchable Doc Index with Word Welcome to Microsoft Access Developer Level 39. In this course we will focus on building a searchable document index in Microsoft Access using Word automation and VBA. We will discuss how to import different document types, extract their text, and load keywords into your database for easy searching, including handling exception words. We will cover building the necessary tables, linking documents to customers, setting up queries and subforms, using FollowHyperlink, working with recordsets, optimizing performance, tracking word counts, and adding progress indicators. This course builds on material covered in previous levels and is designed for those who have completed the earlier series. NavigationKeywordsAccess Developer, searchable document index, keyword indexing, word automation VBA, import Word documents, import PDF text, file IO, recordsets, NoMatch property, exception table, junction table, composite key, search box, word count, query optimization
IntroWelcome to Microsoft Access Developer Level 39. In this course we will focus on building a searchable document index in Microsoft Access using Word automation and VBA. We will discuss how to import different document types, extract their text, and load keywords into your database for easy searching, including handling exception words. We will cover building the necessary tables, linking documents to customers, setting up queries and subforms, using FollowHyperlink, working with recordsets, optimizing performance, tracking word counts, and adding progress indicators. This course builds on material covered in previous levels and is designed for those who have completed the earlier series.TranscriptWelcome to Microsoft Access Developer Level 39 brought to you by AccessLearningZone.com. I am your instructor Richard Rost.In today's class, we are going to focus on building a searchable document index. That is where we can take different documents and load the text of those documents into our database so we can search on the keywords. If you want to see which customers have documents that have the word elephant in them, then it is a very easy search to do. We are going to learn how to use Microsoft Word Automation. In VBA, we will be able to click a button. We will have Microsoft Word open up in the background and it will read in the file. Whether it is a PDF, a text file, a Word document, or any document type that Word supports, you can do this with. We will take that document, save it as text, then we will pull the text into our Access database and index all the keywords. We will build an exception table, of course, so we do not have to bother indexing words like the, in, or at. We will work with recordsets some more. I know you guys love recordsets. I spent a lot of time on recordsets in earlier lessons. We will learn how to find records while a recordset is open and we will learn about the NoMatch property. This lesson follows Access Developer 38 and everything that comes before it. I strongly recommend you have completed my Beginner, Expert, Advanced, and Developer series before continuing with this lesson. My lessons are meant to be taken one after the other. Each one builds on the one that came before it. Look at this page if you do not believe me. Read that. I will be using Microsoft Access 365. It is currently July of 2022, so it is roughly equivalent to Microsoft Access 2021. However, the lessons covered in this course should be applicable going back to, I am going to say, Access 2007. So if you are using something earlier than that, forget it. 2007 and on, all of that should work just fine with what we are covering today. If you have got questions about the stuff covered in today's class, scroll down to the bottom of the page that you are watching this on and post a comment right there on the lesson page. If you have got other questions that do not necessarily pertain to the topics covered today, but you have got other Access questions or Excel or Word, post them in the forums on my website. Let us take a few minutes and cover exactly what we are going to do in today's class. In lesson one, we are going to begin building a document index, a searchable document index. We are going to create the table to hold this information. We are going to use FollowHyperlink to open files. We have done that before. That is nothing new. We are going to then copy and paste the document text into the index table and create a search box so that we can search for documents that have any particular keywords in them. Then we will assign each document to customers so you can go to the customer's record and open up any documents related to that customer. In lesson two, we are continuing on with our document index. We are going to make a button to browse and pick a file. We will automatically load that document into Microsoft Word in the background using some VBA and some Word automation. Then we will be able to convert any PDF file, Word document file, or text file into plain text that we can pull into our Access database. That is all covered in lesson two. In lesson three, we are continuing with the document index. We are going to learn some reasons why you would want to build your own index. There are a bunch of reasons why. We will cover them in this lesson. There are a lot of limitations with long text fields. One of them is you cannot edit a long text field of more than 64k. It is 4,000 characters roughly. We are going to learn how to use file input/output from Access Developer 30 to import text files directly. We will open up Word, we will save the Word doc or the PDF as a text file, then we will use file I/O to read that in instead of copying and pasting it. All that is covered in lesson three. In lesson four, we are continuing on with the document index. We are going to actually build the index tables that hold the data for all the keywords. The index table has the list of keywords in it just once. Then we will build a junction table to track which keywords are in which documents. We will use a composite key to prevent duplicates. We will write an AddToIndex subroutine and filter out unnecessary characters. Lots, lots more. In lesson five, we are going to set up our document index Q and our document index subform - this little guy right here. You can click on a document and see whatever words are in that document down below. We will set up our exception table, which is a list of words you do not care to index: that, them, there, they, about, its, etc. Then we will make sure that the search now works taking the index table into consideration, which means we have to change the record source when we go between not searching or showing all the documents. It is a little tricky. A lot of good query stuff in this lesson and we are going to cover that all in lesson five. Here we go. Finally, in lesson six, we are going to finish up this document index. We are going to display the percent completed right there. So you are going to see 73 percent completed while this index is building, which is great for you if you get longer documents and they take a while, like this big one that we did down here. It could take a few minutes to compile. That is what we are going to spend some time with at the end of the lesson. We are going to optimize this. We are going to try and replace some of this SQL we have been writing with recordsets. We are going to use FindFirst and NoMatch, which is this new recordset stuff, folks. New recordset stuff. I know you guys love recordsets. We are going to open dbOpenDynaset, which I forgot about when I was writing a code the first time. Then I got an error message and had to fix it. I know you guys love my error messages too. We are going to add a word count. That is another reason for adding the recordset loop here. So you could see, OK, the word computer was in here in this document nine times, the word people is in it seven times. We will do a lot more recordset coding. So this will conclude our document index, but this lesson has got a lot of cool stuff in it and some new material. Here we go. QuizQ1. What is the main focus of Access Developer Level 39?A. Building a searchable document index in Microsoft Access B. Creating sales reports in Access C. Developing a customer relationship management form D. Designing navigation menus Q2. What method will be used to pull text from different types of documents into Access? A. ODBC Data Import B. Microsoft Word Automation using VBA C. Manual entry only D. Access Macro Actions Q3. Which type of documents can be converted and indexed using the techniques in this class? A. Only Word documents B. Only PDF files C. Any file type supported by Microsoft Word D. Only text files Q4. What is the primary purpose of the exception table mentioned in the class? A. To store customer information B. To list customers who have overdue documents C. To store words that should not be indexed D. To hold sales tax rates Q5. Why is it necessary to create a junction table in the context of the document index? A. To prevent users from editing documents B. To track which keywords are found in which documents C. To delete documents from the database D. To save back-up copies of each document Q6. What issue with long text fields in Access is specifically mentioned in the lesson? A. Text fields are not searchable at all B. You cannot edit long text fields over 64,000 characters C. Long text fields automatically format text as bold D. Text fields accept only numbers Q7. What is a benefit of using file input/output (file I/O) as described in Lesson 3? A. It allows for direct use of binary images B. It speeds up form design in Access C. It enables importing text files directly into Access tables D. It encrypts all database content Q8. What is the 'NoMatch' property used for in relation to recordsets? A. To determine if a recordset has been updated B. To check if a search within an open recordset did not find a matching record C. To format text for printing D. To lock database tables Q9. What does the percent completed feature show when building the index? A. The number of customers linked to documents B. The amount of time left before Access closes C. The percentage of the document indexing task completed D. The size of the Access database Q10. What additional information is added when tracking word occurrences in documents? A. The file path of each document B. The date the document was created C. The number of times each word appears in a document D. The owner of the document Q11. Why should students have completed the prior courses (Beginner, Expert, Advanced, and Developer series) before this lesson? A. Because later lessons include easier material B. Because each lesson builds on previous concepts and techniques C. Because exam questions are only covered in earlier lessons D. Because Microsoft Access 365 is required Q12. Which Access versions are these techniques expected to work with? A. Only Access 365 B. Access versions before 2000 C. Access 2007 and newer D. Only Access for Mac Answers: 1-A; 2-B; 3-C; 4-C; 5-B; 6-B; 7-C; 8-B; 9-C; 10-C; 11-B; 12-C DISCLAIMER: Quiz questions are AI generated. If you find any that are wrong, don't make sense, or aren't related to the video topic at hand, then please post a comment and let me know. Thanks. SummaryToday's video from Access Learning Zone is Microsoft Access Developer Level 39. I am your instructor, Richard Rost.In this lesson, the focus is on creating a searchable document index. The goal is to load various types of documents into your Access database so you can search for keywords within those documents. Imagine you want to locate all customer documents that mention the word "elephant" - with this system, performing that search becomes quick and simple. We will cover Microsoft Word Automation in VBA, allowing you to press a button that opens Word in the background. Word will read the contents of files, whether they are PDFs, text files, Word documents, or any other type supported by Word. Once Word converts the document into plain text, we will import that text into our Access database. From there, we will index all the keywords so searching is more efficient. To make the index meaningful, we will build an exception table to exclude common words like "the," "in," and "at" from our indexing process. This class also involves working with recordsets again. I have emphasized recordsets in previous lessons, and now we will add to that with techniques for searching through records while a recordset is open and using the NoMatch property. This lesson builds directly on Access Developer 38 and previous videos. I strongly recommend you have completed all of my Beginner, Expert, Advanced, and Developer series lessons before continuing here, as each class builds on the material that came before it. My videos are designed to be followed in sequential order. For reference, I am using Microsoft Access 365, as of July 2022. The material should apply to Access 2021 as well, and even as far back as Access 2007. If you have an older version, you may encounter compatibility issues. If you have any questions specifically about this lesson, you can post them in the comments area beneath the video. For other questions about Access, Excel, or Word, visit the forums on my website. Now let me outline the structure of this course and what you can expect to learn. In lesson one, we start developing the searchable document index. We will design the table structure to store the text from documents. You will learn to use the FollowHyperlink method to open files, then copy and paste document text into the index table. I will also demonstrate how to create a search box to find documents containing particular keywords. Each document can be linked to customers, so you can view all documents relevant to a specific customer right from their record. In lesson two, we continue with the document index and add a button for browsing and selecting files. We will automate the process of opening Microsoft Word in the background using VBA, converting various document formats into plain text. That plain text will be brought into Access for indexing. This approach works for PDFs, Word documents, and text files. In lesson three, we dive deeper into building your own document index and discuss why this approach is beneficial. I will explain some limitations with long text fields, such as editing restrictions when exceeding about 4,000 characters. We will look at using file input and output, a topic I first covered in Access Developer 30, to import text files directly. The process involves having Word or PDF files saved as text, then using file I/O to bring that content into Access instead of relying only on copy and paste. Lesson four is where we create the keyword index tables in the database. The main index table will hold each keyword only once, and a junction table will track which keywords appear in which documents. This involves using a composite key to avoid duplicates. We will write an AddToIndex subroutine, filter out special characters, and cover related topics. Lesson five focuses on setting up the document index query and the document index subform. This allows you to click on a document and view all associated keywords underneath. We will build an exception table for words that should not be indexed, such as "that," "them," "they," "about," or "its." The search function will be updated to take the index table into account. This does require some changes to the record source, depending on whether you are searching or viewing all documents. There is plenty of practical query work in this section. Lesson six wraps up the document index project. We will display the percentage completed while building the index, which is especially helpful for larger documents that can take a while to process. This part will also address code optimization, swapping out some SQL statements for more efficient recordset code. We will use FindFirst and NoMatch properties of recordsets. If you've watched my earlier coding videos, you know I sometimes forget to open dbOpenDynaset and run into errors, but we'll address those here as well. Additionally, we will implement a word count feature. This means you'll not only know which keywords appear in each document, but also how many times each word is used. This is another example of where looping through recordsets is necessary. By the end of this lesson, you will have a fully functional, searchable, and efficient document index system in your Access database, complete with keyword tracking, customer association, and progress monitoring. There is plenty of new material in this lesson, and I look forward to sharing it with you. You can find a complete video tutorial with step-by-step instructions on everything discussed here on my website at the link below. Live long and prosper, my friends. Topic ListBuilding a searchable document index in AccessUsing Microsoft Word Automation with VBA Loading document text into an Access database Creating a table to store document text Using FollowHyperlink to open files Copying and pasting document text into table Creating a search box for document keywords Assigning documents to customers Browsing and selecting files with a button Automatically loading documents into Word via VBA Converting PDF, Word, and text files to plain text Importing extracted plain text into Access Understanding limitations of long text fields Using file input/output to import text files Building index tables for keywords Creating a junction table linking keywords and documents Using composite keys to prevent duplicate records Writing an AddToIndex subroutine Filtering out unnecessary characters from keywords Setting up an exception table for common words Displaying indexed keywords in a subform Adjusting search queries using the index table Showing index build percent completion Optimizing queries with recordsets Using FindFirst and NoMatch with recordsets Adding word count functionality to the index ArticleIn this tutorial, we are going to build a searchable document index in Microsoft Access. This means you will be able to bring different types of documents into your database, extract their text, and search for keywords within that text. For example, if you want to find all customer documents that mention the word elephant, this system will let you do that quickly and easily.To achieve this, we will use Microsoft Word Automation in VBA. With Word Automation, you can programmatically open Word in the background, load a document, and extract its text. This approach is powerful because Microsoft Word can open and convert a wide range of document types, including Word files, PDF files, and plain text files. By using Word Automation, we can open a chosen document, save it as plain text, then read that text into our Access database for indexing. First, you need to create a table in your Access database that will hold information about your documents. Typically, this table will have fields like DocumentID, CustomerID, DocumentName, DocumentText, and any other relevant information such as upload date. You should also have a Customers table if you are linking documents to specific customers. To open documents directly from your database, you can use the FollowHyperlink command in VBA. This command allows you to open files, folders, and URLs from within Access. You might have a button on your form that, when clicked, runs code like the following: Application.FollowHyperlink Me.txtFilePath Assuming txtFilePath is a textbox containing the full file path, this will open the selected document using its default application. Next, we want to automate the process of extracting text from documents. To do this, you must first add a reference to the Microsoft Word Object Library in your Access VBA editor. Go to Tools, References, and check Microsoft Word xx.x Object Library, matching your version number. With that set up, you can create a VBA subroutine to open a document in Word, save it as plain text, then read that text file back into your database. Here is an example outline of how that code might look: Sub ImportDocumentText(strFilePath As String) Dim wdApp As Word.Application Dim wdDoc As Word.Document Dim strTempTextFile As String Dim strTextContents As String Set wdApp = New Word.Application wdApp.Visible = False Set wdDoc = wdApp.Documents.Open(strFilePath) strTempTextFile = Environ("Temp") & "\temptext.txt" wdDoc.SaveAs2 FileName:=strTempTextFile, FileFormat:=wdFormatText wdDoc.Close False wdApp.Quit Set wdDoc = Nothing Set wdApp = Nothing Dim intFile As Integer intFile = FreeFile Open strTempTextFile For Input As #intFile strTextContents = Input$(LOF(intFile), intFile) Close #intFile Kill strTempTextFile ' Now save strTextContents to your database ' Example: CurrentDb.Execute "UPDATE DocumentT SET DocumentText = '" & Replace(strTextContents, "'", "''") & "' WHERE DocumentID = " & Me.DocumentID End Sub You can call this subroutine from a button, passing in the file path of the document you want to index. Now that you have the document text in your database, you will want to create a way to search for keywords. This can start with a simple form with a textbox for entering a keyword and a search button. The search button can run a query that looks for the keyword inside the DocumentText field. For example, your query might look like: SELECT * FROM DocumentT WHERE DocumentText LIKE "*" & [Forms]![frmSearch]![txtKeyword] & "*" This basic approach works, but for better performance and more advanced features, you will want to build an index of keywords. To do that, create a new table for the unique keywords, for example, KeywordT with KeywordID and Keyword. Then create a junction table, DocumentKeywordT, with DocumentID and KeywordID, to track which keywords appear in each document. Use a composite primary key (DocumentID, KeywordID) to avoid duplicates. To build the index, write a VBA subroutine that goes through the DocumentText, splits the text into words, checks each word against a list of exceptions like "the", "and", "or", and indexes only the useful keywords. Create another table, ExceptionT, and fill it with the words you do not want to index. Here is a sample outline of what your indexing subroutine might look like: Sub AddToIndex(docID As Long, docText As String) Dim arrWords() As String arrWords = Split(docText, " ") Dim db As DAO.Database Dim rsKeyword As DAO.Recordset Dim rsDocKeyword As DAO.Recordset Dim rsException As DAO.Recordset Set db = CurrentDb Set rsKeyword = db.OpenRecordset("KeywordT", dbOpenDynaset) Set rsDocKeyword = db.OpenRecordset("DocumentKeywordT", dbOpenDynaset) Set rsException = db.OpenRecordset("ExceptionT", dbOpenSnapshot) Dim word As String Dim keywordID As Long For Each word In arrWords word = Trim(LCase(word)) word = Replace(word, ".", "") word = Replace(word, ",", "") ' Add more replacements for other punctuation as needed rsException.FindFirst "ExceptionWord = '" & word & "'" If rsException.NoMatch And word <> "" Then rsKeyword.FindFirst "Keyword = '" & word & "'" If rsKeyword.NoMatch Then rsKeyword.AddNew rsKeyword!Keyword = word rsKeyword.Update rsKeyword.FindFirst "Keyword = '" & word & "'" End If keywordID = rsKeyword!KeywordID rsDocKeyword.FindFirst "DocumentID = " & docID & " AND KeywordID = " & keywordID If rsDocKeyword.NoMatch Then rsDocKeyword.AddNew rsDocKeyword!DocumentID = docID rsDocKeyword!KeywordID = keywordID rsDocKeyword!Count = 1 rsDocKeyword.Update Else rsDocKeyword.Edit rsDocKeyword!Count = rsDocKeyword!Count + 1 rsDocKeyword.Update End If End If Next word rsKeyword.Close rsDocKeyword.Close rsException.Close Set rsKeyword = Nothing Set rsDocKeyword = Nothing Set rsException = Nothing Set db = Nothing End Sub This code assumes you have added a "Count" field to your DocumentKeywordT table, so you can keep track of how many times each word appears in the document. As you build your document index, you may want to display progress to your users, especially with large documents. You can update a progress bar or percentage display as you go through the words so that users know how far along the process is. Once your index tables and code are in place, you can build search forms that let users enter one or more keywords and see all documents that contain those words. You can even display all the keywords found for a selected document in a subform. This gives users a powerful way to explore and filter documents based on their content. Throughout this process, you will make heavy use of recordsets in VBA. Recordsets allow you to read, find, edit, and add records dynamically as you process the document text and build your index. The FindFirst method helps you locate a specific record, and the NoMatch property indicates if a search found any results. Using dbOpenDynaset is important so that your recordset is updateable. As you refine your system, you may want to optimize your queries and code further, perhaps replacing certain SQL-based operations with recordset-based loops for greater control and efficiency. If you run into errors, remember to watch for issues like accidentally opening a recordset as read-only when you need to edit it. Adding error handling can also help make your code more robust. Finally, by building this searchable index, you create a flexible system where you can link documents to customers, easily search for documents containing any keywords, and even display word counts for analytics. This approach will work in recent versions of Access, including Access 2007 and newer, and uses powerful VBA automation with Word to make the process seamless. With these steps and examples, you should have a solid foundation for creating an efficient and useful document indexing system in your Access applications. |
||
|
| |||
| Keywords: Access Developer, searchable document index, keyword indexing, word automation VBA, import Word documents, import PDF text, file IO, recordsets, NoMatch property, exception table, junction table, composite key, search box, word count, query optimization PermaLink How To Build a Searchable Document Index With Word Automation and Recordsets in Microsoft Access |