Over the past couple of days, I've been spending some of my free time building a real search index for the site. Until now, a search only returned results if your "exact phrase" matched something I had manually added to my hand-built index of popular tips and titles, or if the words appeared somewhere on a page or in a comment in exactly the same order you typed them.
The new search index changes that. It works more like a Google-style engine, breaking every page and every comment down into individual words. Now, if you type in three words, you'll see results as long as all three words appear somewhere in the same page or comment, even if they aren't right next to each other. It's still strict in one sense: if you type five words, all five need to be present for a match. But at least you no longer need the exact phrase, which is already a big step up.
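For anyone curious how "all the words must appear" works, here's a minimal Python sketch of an inverted index with AND matching. It isn't the site's actual code: `build_index` and `search_all` are hypothetical names, and it splits on whitespace only to keep things short. The index maps each word to the set of page or comment IDs that contain it, and a query is answered by intersecting those sets, which is why word order and distance don't matter.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace split only; a real tokenizer would also handle punctuation.
        for word in text.lower().split():
            index[word].add(doc_id)
    # Return a plain dict so lookups of unknown words don't create new keys.
    return dict(index)


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query word (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's postings and intersect with the rest;
    # a word missing from the index yields an empty result.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

With `docs = {"p1": "null ain't null video", "p2": "null values", "c1": "a video"}`, searching for `"null video"` returns `{"p1"}`, since only p1 contains both words.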
Next on my list is to loosen that rule so you won't need all of the words. Maybe four out of five will be enough, which should make the results more flexible. That's coming soon. For now, the indexer is still chugging away in the background, working through nearly 20 years of pages and comments. After just a couple of days it has already indexed over 100,000 records, and it's working newest-to-oldest, so the most recent material will show up first.
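One way the "four out of five" idea could work is a quorum match: count how many distinct query words each document contains, and keep the ones that miss no more than a set number of words. The sketch below assumes an index shaped like a plain dict from word to a set of document IDs; `search_quorum` and `max_missing` are hypothetical names, and ranking by match count is just one reasonable choice, not something decided in the post.

```python
from collections import Counter


def search_quorum(index: dict[str, set[str]], query: str,
                  max_missing: int = 1) -> list[tuple[str, int]]:
    """Return (doc_id, matched_word_count) for documents that contain all
    but at most `max_missing` of the distinct query words, best first."""
    words = set(query.lower().split())
    if not words:
        return []
    # Require at least one matching word no matter how loose the rule is.
    needed = max(1, len(words) - max_missing)
    hits = Counter()
    for word in words:
        for doc_id in index.get(word, ()):
            hits[doc_id] += 1
    matches = [(doc_id, n) for doc_id, n in hits.items() if n >= needed]
    # More matched words first; ties broken by ID for a stable order.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches
```

With `max_missing=0` this behaves like the current all-words rule; with `max_missing=1`, a five-word query needs four of the words to match.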
So give it a try. Play around with it. Let me know how it works for you and what you'd like to see improved. I'm having fun learning about search engine design, and I'd love your feedback.
And yes, eventually this will probably become a database lesson of some kind. LOL
Eventually you may have to rely on AI to help you with that. The greater the amount of text, the greater the need for advanced search algorithms, and your site is huge. There may be misspelled words, alternate spellings, word conjugations, contractions, dialectal phrases, etc., that can't be handled with traditional search methods.
Yeah, I'm taking it step by step. I'm starting off with basic keyword indexing. Then I'm going to add support for synonyms and things like that. I want to get it as good as I can with traditional algorithms alone, because I don't want to rely on AI API calls every time someone searches for something.
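Synonym support can be layered on with plain lookups, no AI calls needed. Here's a minimal sketch, assuming a hand-maintained list of synonym groups (the groups and function names are made up for illustration): each query word is expanded to its group, the postings for everything in the group are combined, and the per-word results are then intersected just like a normal all-words search.

```python
# Hypothetical synonym groups; a real list would be curated by hand.
SYNONYM_GROUPS = [
    {"video", "movie", "clip"},
    {"delete", "remove", "erase"},
]


def expand(word):
    """Return the synonym group containing `word`, or just the word itself."""
    for group in SYNONYM_GROUPS:
        if word in group:
            return group
    return {word}


def search_with_synonyms(index, query):
    """AND-match each query word, where any synonym of the word counts as a hit.
    `index` is assumed to map word -> set of document IDs."""
    words = query.lower().split()
    if not words:
        return set()
    result = None
    for word in words:
        # Union the postings of every synonym of this word.
        postings = set()
        for alt in expand(word):
            postings |= index.get(alt, set())
        result = postings if result is None else result & postings
    return result
```

Because "clip" expands to the whole video/movie/clip group, a search for "null clip" would match a page that only says "null video".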
There's already an AI search bot that's been programmed with the outlines for my courses and the tech help videos. It's not perfect, but it's got a lot in it. I might eventually merge the two, but we'll see. It's still a work in progress, and I work on it when I get some free time. And there's always the option of a Google search: if you scroll all the way to the bottom of my search page, there's a link that will run a Google search on my site, which I sometimes have to rely on myself.
Well, after the indexer ran for about 24 hours, it had indexed about half the pages on the site and about 1% of the comments (those will take a while), but I had to restart it. I did some quality testing and realized I hadn't accounted for apostrophes in the tokenizer, so it was splitting words like "ain't" and "can't" into two pieces, such as "can" and "t". I'm not indexing words under 4 characters long because they typically don't produce good results anyway, so both pieces were being thrown away. That meant clearing the indexed flag from all the pages and letting it run again.
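Here's a rough sketch of a tokenizer that keeps internal apostrophes and drops short words. It's a Python illustration, not the site's real tokenizer: the regex is an assumption, `MIN_LENGTH = 4` comes from the 4-character rule mentioned above, and it also normalizes the curly apostrophe (’) to a straight one so both spellings index the same way. A naive pattern like `[a-z0-9]+` would turn "can't" into "can" and "t", and the length filter would then discard both.

```python
import re

# Runs of letters/digits, optionally joined by internal apostrophes
# (straight ' or curly \u2019). Assumption: ASCII letters only, so accented
# characters would need a broader pattern.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:['\u2019][a-z0-9]+)*")
MIN_LENGTH = 4  # words shorter than this are not indexed


def tokenize(text: str) -> list[str]:
    """Lowercase `text` and return indexable words, keeping contractions whole."""
    tokens = []
    for match in TOKEN_RE.finditer(text.lower()):
        # Normalize curly apostrophes so "don’t" and "don't" index the same.
        token = match.group().replace("\u2019", "'")
        if len(token) >= MIN_LENGTH:
            tokens.append(token)
    return tokens
```

For example, `tokenize("Null Ain't Null")` gives `["null", "ain't", "null"]`. The apostrophe counts toward the length here, so a word like "it's" does get indexed; whether it should is a judgment call.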
I kept all the data that's in the indexing table, so it's all still there, but I cleared the indexed y/n field on all of the page and comment entries, so it'll look like none of them have been indexed. It's going to slowly go back through all of them and re-index them. So pay no attention to those percentages for now.
But like I said, it's a work in progress. I tried searching for my "Null Ain't Null" video using just the word "ain't" and couldn't find it, and that's when I realized there was a problem.
And in a few minutes, after this comment is indexed, it should show up too. Yay.