Searchable Document Index, MS Word Automation
This class is all about building a searchable document index in Microsoft Access. We are going to learn how to use Microsoft Word Automation to take a file (any file type that Word can read, including DOCX and PDF) and automatically convert that document to text so we can read it into our Access database. We will then parse the text of that document, index each word, and store a count of each word for each document. We'll also spend more time on Recordsets, learning how to search within an open Recordset using the FindFirst method and the NoMatch property.
In Lesson 1, we will begin building our searchable document index. We'll create the needed tables and forms. We'll review how we can use FollowHyperlink to open any file. We'll build a search box to show only documents that contain our search term. We'll assign documents to customers, and then open a list of just that customer's documents from the customer form.
In Lesson 2, we will add a button so that we can browse and pick a file. We will then use Microsoft Word Automation so that we can open the document in Word in the background using VBA, copy that document's text, and paste it into our database. This will work with any file type that Word supports: DOC, DOCX, PDF, TXT, and more.
In Lesson 3, we will talk about the reasons why it's beneficial to create your own index instead of just relying on Access to index the field (hint: one of the major reasons is to save space). We'll talk about the limitations of long text fields, and I'll show you how to overcome the 64k limitation on text boxes. Then we'll switch from using copy/paste to get the text data from Word and actually make Word convert the document into a text file. We'll use VBA File I/O, which we learned in Developer 30, to read the text directly into our database.
In Lesson 4, we will build our own keyword index. We'll parse the text of each document and store each keyword in a separate index table. This saves a ton of space. We'll review using a composite key to prevent duplicates.
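To give a rough idea of what a keyword index with a composite key looks like, here is a minimal sketch. It is written in Python with an in-memory SQLite table rather than the VBA and Access tables used in the class, and the table name KeywordIndex, the field names, and the tokenize and build_index helpers are all made up for illustration.

```python
import re
import sqlite3

def tokenize(text):
    """Split text into lowercase words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(conn, doc_id, text):
    """Store one row per distinct keyword found in this document."""
    distinct_words = sorted(set(tokenize(text)))
    conn.executemany(
        "INSERT INTO KeywordIndex (DocumentID, Keyword) VALUES (?, ?)",
        [(doc_id, word) for word in distinct_words],
    )

conn = sqlite3.connect(":memory:")
# The composite primary key (DocumentID, Keyword) is what blocks duplicates:
# the same keyword may appear for many documents, but only once per document.
conn.execute(
    """CREATE TABLE KeywordIndex (
           DocumentID INTEGER NOT NULL,
           Keyword    TEXT    NOT NULL,
           PRIMARY KEY (DocumentID, Keyword)
       )"""
)

build_index(conn, 1, "The quick brown fox jumps over the lazy dog. The end.")

# Trying to add the same keyword to the same document again is rejected.
try:
    conn.execute("INSERT INTO KeywordIndex (DocumentID, Keyword) VALUES (1, 'the')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Because the word "the" is stored once for document 1 no matter how many times it appears, the index table stays much smaller than the full text it describes.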
In Lesson 5, we will make a subform to show the keywords for each document. We'll build an exception table so that we don't waste time and space indexing connector words (the, an, them, he, she, etc.). We'll improve upon our search algorithm, changing the RecordSource property for our document form depending on whether the user is doing a search or just browsing, editing, or adding.
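The exception table idea can be sketched in a few lines. This is Python, not the VBA from the class, and the set below is only a stand-in for the exception table, filled with the sample connector words listed above; the keywords function name is hypothetical.

```python
import re

# Stand-in for the exception table: words we do not want to index.
EXCEPTION_WORDS = {"the", "an", "them", "he", "she"}

def keywords(text, exceptions=EXCEPTION_WORDS):
    """Return the words of text in order, skipping any exception words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in exceptions]

sample = "She gave them the report and an invoice"
kept = keywords(sample)
# kept == ["gave", "report", "and", "invoice"]
```

Note that "and" survives here only because it is not in the sample list; in practice you would keep adding words to the exception table as you spot them.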
In Lesson 6, we will display the percent completed while building an index, as large documents can take a few minutes to parse. We'll add an abort checkbox in case the user gets tired of waiting. We'll add a word count for each keyword so you can see which documents may be more relevant for certain keywords than others. We'll learn how to search for records while inside an active Recordset using the FindFirst method and the NoMatch property. Finally, we'll spend some time optimizing our AddToIndex loop to speed it up as much as possible.
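Here is a rough sketch of the logic behind that loop: look for the keyword, add it if there is no match, bump its count if there is, report the percent completed, and stop early if the user aborts. It is Python rather than the VBA used in the class; a dictionary stands in for the open Recordset, and the add_to_index name, its parameters, and the callbacks are assumptions made up for this example.

```python
def add_to_index(words, index, should_abort=lambda: False, report=None):
    """Count each word into index; return False if the user aborted."""
    total = len(words)
    for i, word in enumerate(words, start=1):
        if should_abort():          # plays the role of the abort checkbox
            return False
        if word in index:           # match found: increase the word count
            index[word] += 1
        else:                       # no match: add the keyword with a count of 1
            index[word] = 1
        if report is not None:
            report(100 * i // total)  # whole-number percent completed
    return True

index = {}
progress = []
finished = add_to_index(
    ["access", "word", "access", "index"], index, report=progress.append
)
# index    == {"access": 2, "word": 1, "index": 1}
# progress == [25, 50, 75, 100]
```

In a real form you would probably update the percent display only every so often rather than on every word, since refreshing the screen is one of the things that can slow a loop like this down.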
Enroll now so that you can watch these lessons, learn with us, post questions, and more!
Please feel free to post your questions or comments below. If you are not sure as to whether or not this product will meet your needs, I'd rather help you before you buy it. Remember, all sales are final. Thank you.
microsoft access, access 2016, access 2019, access 2021, access 365, ms access, #msaccess, #microsoftaccess, #help, #howto, #tutorial, #instruction, #learn, #lesson, #training, #database, Searchable Document Index, Microsoft Word Automation, Convert PDF to TXT, DOCX to Text, Index Individual Keywords, Limitations of Long Text Fields, Overcome 64k Limitation, WordDoc.SaveAs2, wdFormatText, Exception Table, Display Percent Completed, Abort, Word Count, Finding Records in a Recordset, rs.FindFirst, rs.NoMatch, dbOpenDynaset
You may want to read these articles from the 599CD News: