|
||||||
|
|
Normalizing Data By Richard Rost What Normalization Means. How To Fix Bad Tables. In this video we're going to learn about database normalization for Microsoft Access. This is, in a nutshell, making sure that your tables hold the correct types of data to form proper relationships. We will go over some database theory, and then I'll show you how to take a table that holds both customer and order data and normalize it by splitting it into two tables with a proper relationship between them. Leslie from Harrisonburg, Virginia (a Platinum Member) asks: I'm trying to move my data from Excel to Access. I've watched your video on importing, it was very helpful. Thank you. But what do I do if I've got data mixed together that shouldn't be? For example, I've got one sheet in Excel that mixes customer and order data together. Is there a way to separate it? MembersMembers will look at three more examples of normalizing data. We'll separate customer data by like address fields, we'll take a table that has a parent and multiple children listed together and split that into two tables, and we'll do the same thing for customers with multiple addresses and phone numbers.
Silver Members and up get access to view Extended Cut videos, when available. Gold Members can download the files from class plus get access to the Code Vault. If you're not a member, Join Today!
Links
Suggested Courses
See Also
Keywordsmicrosoft access, ms access, ms access tutorial, #msaccess, #microsoftaccess, #help, #howto, #tutorial, #learn, #lesson, #training, #database, normalization, normal forms, standard, normalized, relationships, key fields, how to normalize data in access, step by step normalization example, separate customers from orders
IntroIn this video, we'll talk about the basics of database normalization in Microsoft Access, including what normalization is, why it's important, and how it helps reduce data redundancy and improve data integrity. I'll walk through a common scenario where customer and order information are mixed together in one Excel sheet, then show you how to separate that data into proper related tables using append and update queries in Access. We'll also discuss key concepts like primary keys, referential integrity, first and second normal forms, and tips on setting up unique identifiers for your data.TranscriptWelcome to another TechHelp video brought to you by AccessLearningZone.com. I am your instructor, Richard Rost.In this video, we're going to talk about proper normalization and what that means for your Microsoft Access databases. Normalization is one of those fancy database terms that you don't know what it is until you learn what it is, and then once you know what it is, you'll really appreciate it. So keep watching. Today's question comes from Leslie in Harrisonburg, Virginia, one of my platinum members. Leslie says, I'm trying to move my data from Excel to Access. I've watched your video on importing. It was very helpful. Thank you. You're welcome. But what do I do if I've got data mixed together that shouldn't be? For example, I've got one sheet in Excel that mixes customer and order data together. Is there a way to separate it? Leslie sent me a sample of her data. It looks something like this. Yes, I recreated it for class. But she's got her order information all in one sheet. It's got the order ID. It's got the customer. But as you can see, the customer information repeats. Every time a customer puts another order in the system, they just put in the customer's name and phone number again. Plus, they've got address and some other stuff that they copy and paste each time. I see this a lot. Then they have the information about the order: the order date, the amount, what was purchased, and so on. But basically, this information should be broken up into two distinct tables: a customer table and then an order table. Now, this is a process called normalization. That's basically taking information that should be in multiple tables and putting it in separate tables, breaking up big tables. There are a lot of different rules in normalization. It gets really nerdy, folks. If you want to get into the database theory, but as always, I'm going to teach you what you need to know for today, the basics of normalization. I've been doing this stuff for over 30 years, and even I don't use all the normalization rules. Let's just talk about the basics today. Database normalization is organizing your database so as to reduce data redundancy and improve data integrity. What does that mean? Redundancy: A table has redundant data if it unnecessarily duplicates information that should be in separate, related tables. This is exactly Leslie's problem. As you can see here: customer, phone number, order date, amount, and then products purchased. As you can see, Jim Kirk has two records in here. There is Jim Kirk, there he placed an order on 1/1, and there's a second order down here for Jim Kirk on 1/20, different stuff. But this is redundant information. We don't need to have Jim Kirk and his phone number listed twice. That should be in a separate table. That's redundant information. Then the second thing is integrity. Tables in a normalized database need to have both entity integrity and referential integrity. What does that mean? Well, if you've done any work with Access, if you've watched any of my videos, you get this. I'm just throwing at you some database terms that database nerds use. A table with entity integrity has a field with a unique identifier for each record. In Access, this is called an auto number. It's usually an auto number. And I have a whole separate set of videos on auto numbers. I'll put links down below for all the extra videos I'm going to refer you to. But entity integrity means each record in here has an entity, an ID of some kind that uniquely identifies each record. You would never have two customer ID 3s, for example. The second part of the integrity equation is a table with referential integrity has no one-to-many relationships where there's a child, the many side, that's missing its parent. For example, here's the parent table, the customers, customers one through four. Over here is a child, a related table, you've got order IDs and customer IDs. Here's a missing customer ID. There is no customer six in this table. So that's an orphan record. This database does not have referential integrity. Again, if you've watched my other videos, I show you how to set up referential integrity in your table relationships to prevent this from happening. If you're normalized, you've got data integrity and you've got no redundancy in your tables. As I said, there's a whole bunch of other nerdy stuff too. Don't worry about the rest of it. This is for practical day-to-day use of Access. We're not getting into super high computer science database theory today. You might also hear these terms thrown around: normal forms. This is not Access forms. This is not the forms that we build in class to put our customer data in. Normal form is like the form or the format of the database. There's first normal form, second normal form, third, fourth, fifth, sixth, and some weird ones. I usually only worry about first and second. If you know these two things, you're good with 99 percent of database problems. In my 30 years of building databases with Access, some of that stuff's really even too nerdy for me, but this will get you by. The first normal form says that a table doesn't have a repeating list of items in any of its fields. For example, here's the order table that I showed you before. Over here we have products purchased: three photon torpedoes, six phaser rifles, and tribble bait. This should not be in one field. This should be broken down into multiple records in a related table. This is what it should look like. You have your orders over here and your order details over here. Order one for Jim Kirk: here's order one and you can see the three items that are placed for order one, and each one of those has its own order detail ID. I cover this in my invoicing video. Here's your photon torpedo three of them, phaser rifle six, and so on. Instead of listing them in the same field, I see this a lot. People put a list of items or subjects, for example for books, or genres for music, just as a comma-delimited items list in a long notes field or something. Don't do that. This should be in a separate table. First normal form can also refer to a table where all of the fields are broken down into the smallest component parts. Now, I don't always agree with this. For example, you can see here, customer list: Jim Kirk, Will Riker, Mr. Worf, and so on. Well, first normal form technically dictates you should break this up into first name and last name as separate fields. I don't normally get this picky with my data, with names. Usually I'll do first name and last name. You can even do prefix, suffix, middle name, that kind of stuff. Break it down as much as you want. But I'm okay if you want to leave your customer name like that. It's not the worst thing in the world. I've seen some people with addresses. For example, you can have just an address field, or you can break it down into address, city, state, zip, address one, address two, the street number, the street type (is it an avenue). You can break it down into all that kind of stuff if you want to, or just do address. This really is dependent upon your needs for your database. As I tell people in my Access Beginner 1 class: if you're building your database from scratch, if you're starting off new, you want to break it down into as many parts as possible up front because it's a lot easier to take first name and last name and put them together later than it is to take a customer name and split it apart. I do have a video where I show you how to do that. We split it apart based on the spaces. I'll put a link to that video down below. It's not as easy as just taking these two together and concatenating them. Now, moving on to second normal form. A table is in second normal form if any non-key fields that are not directly related to the primary key are moved to a different table. Basically what this means is, like we said before, these fields like the customer name and phone number that aren't directly related to the primary key (the order) should be moved to a different table. So we end up with a customer table and then we have the replacement over here: a key field, a foreign key field, that refers to that customer by an ID. That's the second normal form. There are always exceptions, as I said, sometimes you don't want to do this. Sometimes you want to take, for example, the address that an order was shipped to, and store it in the address table. In that case, it does refer specifically to an order. So that's the case where you want to copy the order information from the customer table into the order table. That's a process called denormalization, and that's a whole different ball of wax. I talk about that in my expert classes. Enough theory for today, let's get into some practical examples here. Let's see how we can take that Excel spreadsheet that's got customer information mixed in with the orders and we'll separate this into a customer table and an order table. But first, some prerequisites. You should know how relationships work between tables. If not, go watch my relationships video before continuing with this one. It's a free video, it's on my YouTube channel, it's on my website. Go watch it now, then come back. Also, you should know how to import data from Excel into Access and you should know how to use an append query (that's taking data from one table and adding it to another table). These are all free. Go watch them and then come back. Pause, open up another window or whatever. You'll find links to these down below in the description under the video and a little link section. Here's my sample Excel spreadsheet. I've got an order ID, I got a customer and as you can see the customer has repeated, there's Jim Kirk twice, the customer, the phone number, then we got the order information: the order date, the amount, whatever other order fields you might want to have over here, has been paid, that kind of stuff. Now, as far as the items on the order, yes, those should all be split off into separate records in a related table as well. You can use the same technique that I'm showing you here to split orders and order details. It doesn't matter what two things you're trying to split. You can split parents and children, you can split employees and employers, you can split companies and workers, whatever. It doesn't matter. Same stuff, just put together differently. One thing that you're going to need is some field in here to uniquely identify each customer. We have an order ID, so the order information is all set, that's fine. If not, you're going to have to add one, you need some way to uniquely identify those orders too. As far as the customer is concerned, I see this a lot where people have the same customer, but James T. Kirk is the second one, but someone else typed it up here as Jim Kirk. That's one of the major reasons why you want to have properly normalized data in the first place, it prevents that from happening. But you're going to run into tables where you've got the data and it's like that, so some field, whether it's phone number, whether it's social security number, an employee ID, whatever, you need to figure out some way to uniquely identify these records. I'm going to base it on phone number and you'll have to just sort this list. You'll have to sort this list and take a peek at it and make sure that these are all the same. Change these if you want to, it won't matter once we import it, you'll get one of them. Make sure that this field is the same. Some field has to be the same for each of your customers and this might even be slightly different. You might have to go through and scrub your data a little bit. Access can't fix data that's been badly entered. You're going to have to go through and manually tweak it a little bit. Make sure you've got at least one field representing your customers that's unique. Or it could be a combination of multiple fields. It could be address, city, state, and zip put together as long as the addresses are all the same. That's something you're going to kind of have to do by hand. There's no way to easily automate that. But once you've got this looking good, let's go over to Access and pull it in. So we're under External Data, New Data Source, File, Excel. This is covered in that importing video that I made you watch. We're on my desktop, I believe, here. And I got a bunch of sample files, your sample one. Hit OK. There it is. First row contains my column headings. Yes. Next. Here's the thing: This is order ID. I want to assign auto numbers to my order records as they come in. But this order ID is not going to do. I'm going to rename this right now as the existing order number. I want to keep it, but I'm going to call it the order number. You'll see why in a minute. These are all good. That's text, text, that's a date, that's a currency. Beautiful. Next. I'm going to let Access add a primary key. Why? Because it's going to add that auto number, and I want an auto number. This won't be an auto number. This will just be a number. You can't turn this into an auto number once you import it, trust me. Next. Let's import this into a table called orderT, my order table. Do I need to save these steps? No. There's my orders. Came in pretty good. Let's take a peek at it. Design View. I'm going to rename this to orderID. Now, we need to hang onto this order number for right now, especially if this order number is going to represent something that you have in a related table. You don't want to lose this just yet. Keep in mind, this is data in a perfect world. This might look like 101, 124. This might be just some random numbers that you guys assigned to orders before, whatever. But now I got my set of auto numbers in here, which is what I want. Now let's create a customer table because we're going to peel this stuff off and put it in a customer table using an append query. But we're going to make the table first. So come over here and let's close this guy. Let's go to Create, Table Design. I want a customerID, that's going to be an auto number. I got customer name, which is text. I don't have first name and last name in this table here. We just got one customer, that's fine. Then we have a phone number, and that's short text as well. What's going to happen is when I take this order data and I bring it over into the customer data, I don't want anything to repeat. I've already decided I want to use phone number as my unique identifier. So what I'm going to do is in this table, I'm going to take phone number and I'm going to index this. I'm going to set this to index: Yes, no duplicates. So when the append query runs, it won't copy over any record that's already in there based on its phone number. That's how I can ensure these to be unique. Let's save this. This will be my customer table. Primary key, yes, it'll make that auto number the primary key. Now close this. We don't need this open still. Let's make our append query now. Create, Query Design. We know how to make append queries. We watched the append query video. I'm going to bring in the order table. Close that. Let's set this as an append query. We're going to be appending into the customer table. It OK. What fields do we need? The customer ID is going to be assigned, that's an auto number. We're going to append customer into the customer name field. We're going to append phone into the phone field. Let's save this as, we'll just call it appendQ because we only need one in this database. Now I'll run it. I have warnings turned off and I'm still going to get this warning. It says it can't add zero fields due to null type conversion failures. That's fine. That means we didn't have any of those. But it didn't add one, I don't even know why it bothers to tell you the zero. But it didn't add one record due to key violations. Key violations means I tried to add a record that had an index key value (that phone number) more than once, and there was one key violation. That's to be expected. Do you want to continue running it? Yes, go ahead. Everything appears to be done. Let's open up the customer table. There we go, the unique list of customers. We should have four customers in there. It's skipped customer three because that was where the key violation was, but that doesn't matter, we don't care what auto numbers are. Remember, auto numbers are not for you folks, they're for Access. Doesn't matter what these are. Now, we can get rid of the append query. There's my customers. There's my orders. Now, I need some way to relate these two things together. In the order table, I have to put a customer ID in here. How do I get that customer ID over there? Well, I got to link these two things together somehow. What's the field that links them together right now? It's the phone number field. That was the field that I'm using to uniquely identify each customer. We're going to add a customer ID field into the order table: customerID, a number field of type long integer. I'm going to slide it up top. I like to keep the fields up top. This is where we're going to put the customer ID from this table. Save it. Close it. Open it up. Right now it's all blank. That's fine. We're going to use another query, an update query, to update this. I probably should have put update queries on the prerequisite list. There it is: update queries. Watch that now too. Pause and come back later. Now I can join these two tables together and update this field based on that field. Let's see how that works. Close these. Create, Query Design. Let's bring in the order table. Let's bring in the customer table. Now it's going to try and join them by customer ID. Not yet. Not yet. It sees that the field names are the same. One's an auto number, one's a long integer. It's trying to make that relationship. We don't want to join them that way just yet. We're going to eventually, but not until we set up the customer IDs. How are we going to join these together? We're joining them by phone number, that's the field right now that matches in both tables. If I bring in orderID and customerID from over there and customerID from over there, and let's bring in the order date and the customer name just to see, I'm back and forth on both tables here. If you run that, there you go. There's the orderT customerID and there's the customerT customerID. They're joined together by the phone number, which is the same in both tables. Let's bring them both over here and you can see there's orderT phone number, customerT phone number. They're joined together, they should be the same. Now all I have to do is take that number and put it in that field. That's where the update query comes into play. Go back here, get rid of all this stuff. Right now these records are matched up one for one. Let's change this to an update query. I want to update this guy, so bring that down here. That's the field I want to update. What do I want to update it to? Whatever the value is in this guy. Then I got to type in, though, and it's going to be customerT. You need the name of the table because it's ambiguous. There are multiple fields called customerID, so you got to say customerT.customerID. Tab over. So we're going to update this customerID to whatever that value is. Now run the query. I have warnings turned off, so it doesn't say you're about to update five records. You might see that. It's a global setting you can turn off. I cover that in my blank database template video and my Access, I think Beginner 1, I'm not sure. Anyways, so now let's take a look at what's in the order table. Don't run this again. It won't matter if you run it again, but don't run the append query more than once, because if you do, you'll get multiple sets of data. The update query doesn't matter. Open this guy up now and look at that. It copied all those customer IDs. Jim Kirk is customer one. Jim Kirk, customer one. The customer table's got this stuff. Now I can get rid of the redundant data from my order table. I can come into the order table now and I can delete the customer name and phone number. Are you sure you want to delete those fields? Yes, I don't need them anymore. You can delete the order number too if you don't want it, unless you've got to relate this to something else like the order details. You do the same exact thing. You take the order details, split them off. You'll link them together by order number. You delete what you don't need. You copy over the ID just like I just showed you with customers. So for this example now, I no longer need this order number. I don't have any child records. Delete the index. That's fine. Save this, close it. I'll save this, this is my updateQ. If you're doing this just one time, like if you're converting your business or wherever you got all your Excel stuff, you're converting it over to Access, if you're just doing it once, no need to save these queries. If this is something you're planning on having to do a lot, then save these queries. Now here's my customer table, look at that, nice and clean. And there's my order table. Customer one, customer one, there's their proper orders. How do you bring this stuff together now? Another query. Create, Query Design. Bring in the customer data and then the order data. Now it will properly link with customerID. Bring in all these fields and all these fields and run it and there you go. There's Jim Kirk and his two orders. That is a properly normalized table, with the exception of maybe splitting these up into first and last name, but again, that's optional. You don't have to do that. I recommend it, but you'll be okay if you don't. Save this as my, let's call it customerXorderQ. If you want to learn a lot more about this stuff, my Access Expert Levels 1 and 2 classes cover all these different relational database concepts. I start off in Expert 1. We cover the basic relationship types, foreign keys, primary keys, all that stuff: one-to-many, one-to-one relationships. Then, in Expert Level 2, I spend a lot more time talking about database table normalization. I give you a lot more information than I just gave you in this video. We talk about global relationships, setting up referential integrity, preventing orphaned records, all this stuff, all the cool stuff that the cool kids know. Access Expert Levels 1 and 2. If you really want to learn about relationships in Access, I got this thing called the Relationship Seminar. It's eight hours long and tackles all this different stuff, all these different join types, self-joins, reverse relationships, you name it. You'll find links to all this stuff down below in the description below the video window. If you want to learn more about normalizing data in your tables, the extended cut for members is 50 minutes long, almost an hour. I go over three different examples. These are things I see from people all the time. The first example: instead of just one field, like a phone number, that you can make unique to each customer, what about if you have to take like an address with multiple fields? You might have two Jim Kirks, for example, but this one's on this address, and this one's on a different address. So we can use all of these address fields to normalize that table. The second example is one I see constantly. It's when you've got the parent information and the child information in the same record. You got first name, last name of the parent, and then child one, child one's date of birth, child two, child two's date of birth, child three, and so on. People do this all the time with products and services. You got an order, product one, product two, product three, product four, product five. No, don't do that. If you got to take this and import that into Access, I'll show you that in this example too. The third example is a lot more complicated. This is for my ABCD customers, my Access Business Contact Database. That's where you've got a customer, but you've also got a billing address, a shipping address, home phone, cell phone, work phone, and you want to break all that down into multiple tables. You've got a separate table for addresses, a separate table for phone numbers. So you don't have phone one, phone two, phone three, four, and five. You put each phone number in a separate related record. Same thing with addresses. You might have three, four, five different addresses for them: home, work, billing, shipping, etc. Doesn't matter. That's why it goes into a related, secondary table. All this is covered in the extended cut for the members, 50 minutes long. Silver members and up get access to all of my extended cut videos. I think we're close to 240 of them now. There's a lot of videos to watch. Gold members can download these sample databases. There's a few different databases for this one. I'm tired now. This took me like four hours of my day to record all this. I hope you guys learned something, and we'll see you soon. How do you become a member? Click on the Join button below the video. After you click the Join button, you'll see a list of all the different membership levels that are available, each with its own special perks. Silver members and up will get access to all of my extended cut TechHelp videos, one free beginner class each month, and more. Gold members get access to download all of the sample databases that I build in my TechHelp videos, plus my code vault, where I keep tons of different functions that I use. You'll also get a higher priority if you decide to submit any TechHelp questions to me, and you'll get one free expert class each month after you finish the beginner series. Platinum members get all the previous perks, plus even higher priority for TechHelp questions, access to all of my full beginner courses for every subject, and one free developer class each month after you finish the expert classes. These are the full length courses found on my website, not just for Access too. I also teach Word, Excel, Visual Basic, and lots more. You can now become a Diamond Sponsor and have your name or company name listed on a sponsors page that will be shown in each video as long as you're a sponsor. You'll get a shout out in the video and a link to your website or product in the text below the video and on my website. Don't worry, these free TechHelp videos are going to keep coming. As long as you keep watching them, I'll keep making more and they'll always be free. QuizQ1. What is the main purpose of database normalization?A. To make databases easier to read B. To reduce data redundancy and improve data integrity C. To create more tables for the sake of organization D. To speed up database performance Q2. What does redundancy in a database refer to? A. Using the same auto number in multiple tables B. Unnecessarily duplicating information that should be in separate related tables C. Having too many tables in a database D. Storing unused data in the database Q3. What is entity integrity in an Access table? A. Each record must have a value entered for every field B. Table fields must only allow numbers C. Each record has a unique identifier, usually an auto number D. Each table must have a primary key that is always a text field Q4. What is referential integrity? A. Ensuring no duplicate field names exist B. Making sure each field is properly indexed C. Ensuring the many side of a one-to-many relationship is never missing its parent D. Having no primary keys in the database Q5. What is the 1st normal form in database normalization? A. Ensuring every table only has number fields B. Keeping all fields together in one table for simplicity C. Making sure there are no repeating lists of items in any fields D. Always splitting first and last names into separate fields Q6. When splitting data from Excel to Access, what is a recommended first step before importing? A. Add extra columns with redundant data B. Ensure at least one field uniquely identifies each record you wish to separate C. Delete all duplicate records D. Convert all data to text format Q7. What is the purpose of an append query as discussed in the video? A. To delete duplicate records B. To transfer data from one table to another, possibly removing duplicates based on unique fields C. To generate reports from your data D. To update values in a single table Q8. Why is it important to have a unique identifier (such as phone number) when importing customer data? A. To make the table look cleaner B. To ensure primary keys can be manually assigned C. To avoid adding duplicate customer records D. To create more fields in the table Q9. In second normal form, where should non-key fields not directly related to the primary key be moved? A. Kept together with all other data in one table B. Deleted from the database C. Moved to a different, related table D. Converted to lookups Q10. What is the process called when you intentionally copy data from one table to another for convenience, even if it causes redundancy? A. Denormalization B. Referential integrity C. Entity integrity D. Normalization Q11. After splitting customers and orders into separate tables, how should you link an order to its customer? A. By matching their phone numbers only B. By entering the customer name manually for each order C. By including the customer's ID as a foreign key in the order record D. By duplicating all customer data in every order record Q12. What is usually done with redundant customer name and phone fields in the order table after customer IDs have been assigned and linked? A. Leave them in the table for future reference B. Delete them to remove redundancy from the order table C. Update their values to NULL D. Change them into lookup fields Q13. If someone has a spreadsheet with parent and child information in the same record, what is the recommended approach? A. Leave all child info together in the parent table B. Split the parent and each child record into separate related tables C. Add more columns to accommodate extra children D. Use only the first child's information Q14. What is the advantage of splitting fields like addresses and phone numbers into related tables? A. It keeps the main table small and efficient, and allows multiple addresses or phone numbers per customer B. It creates unnecessary complexity C. It makes it harder to run queries D. It increases data redundancy Q15. What Access feature ensures that related tables are linked in a way that prevents orphaned records? A. Indexed fields B. Referential integrity C. Data Macros D. Lookup wizards Answers: 1-B; 2-B; 3-C; 4-C; 5-C; 6-B; 7-B; 8-C; 9-C; 10-A; 11-C; 12-B; 13-B; 14-A; 15-B DISCLAIMER: Quiz questions are AI generated. If you find any that are wrong, don't make sense, or aren't related to the video topic at hand, then please post a comment and let me know. Thanks. SummaryToday's video from Access Learning Zone focuses on an important foundational concept for Microsoft Access databases: proper normalization. Normalization is a term you often hear in database discussions, but it can seem abstract until you truly understand what it involves. Once you grasp it, you'll see its value whenever you work with Access.This topic comes from a question by Leslie, who is transitioning data from Excel to Access. The issue Leslie faces is quite common: in her Excel sheet, customer and order data are mixed together. Customer information like name, phone number, and address is repeated every time a customer places an order. This not only leads to redundancy but also makes data management cumbersome and error-prone. So, what is normalization? At its core, normalization means organizing your data across multiple tables to eliminate redundant information and enforce data integrity. Rather than having a single table with repeated data, you should break your information into appropriately related tables. In Leslie's case, this means having a customers table and a separate orders table. One of the main goals of normalization is to remove redundancy. In Leslie's Excel sheet, the same customer's name and contact information appear multiple times if they have more than one order. This repetition is unnecessary, especially since this information seldom changes between orders. By moving the customer information into its own table, you only store each customer's details once. Data integrity is another vital aspect. In a well-normalized database, each record is uniquely identified, typically using an auto number in Access. This is called entity integrity. For instance, each customer gets a unique customer ID, and each order a unique order ID. Referential integrity ensures that every order references a valid customer and that there are no orders pointing to non-existent customers. Access provides features to help manage these relationships. You might also hear terms like "normal forms," which describe various levels of normalization. The basics you need are first normal form and second normal form. First normal form says that every field in your table should contain only a single value, not a list of things. For example, instead of having a single field listing multiple products purchased, you should have an order details table where each line represents a single product for an order. Second normal form takes this a step further, ensuring that all non-key information in a table depends only on the primary key, not on part of a compound key or something else. Access users sometimes wrestle with how granular to get with fields like names and addresses. Best practice is to break them into components (first name, last name, address, city, etc.), but how far you go depends on your specific needs. If you're starting fresh, it's a good idea to split these details at the beginning, as it's much easier to combine fields later than it is to split them apart. If you already have data in one combined field, you could use additional tools or queries to separate the information, although it's more work and prone to errors. To illustrate, let me walk through how you might import a flat, denormalized Excel table into Access and normalize it. The general process looks like this: 1. Import your Excel data into a temporary table in Access. This table will still have a mixture of order and customer data. 2. Create a new customers table, with an auto number as the primary key and fields for customer information. 3. Use an append query to add unique customer records from your import table into the customer table. To prevent duplicates, pick a field like phone number or a combination of fields to act as a unique identifier or index. Access will skip any records that would violate this index. 4. Once your customers are in their own table with unique customer IDs, add a customerID field to your orders table. 5. Use an update query to fill the customerID field in your orders, matching based on the unique field you chose (like phone number). 6. When the customerIDs are properly linked, you can remove redundant customer fields from the orders table, leaving only the reference to the customer table. Once you separate your customers and orders, you can use queries to bring the data back together when needed. For example, you can join the tables on customerID to view order histories for each customer. These concepts also extend to splitting other kinds of related data. The same methods apply whether you're looking at parents and children, companies and employees, or other situations with related information. Sometimes, you may need to decide how best to identify unique records. Perhaps phone number is not reliable, or there might be minor inconsistencies you have to clean up manually before importing. The uniqueness of your customer (or any other entity) is key to a normalized design. After setting up and relating your tables, you can safely remove temporary or redundant data from the tables. All future updates and data entry can be managed through related tables and forms in Access, keeping your database organized and efficient. For those interested in furthering their understanding, my Access Expert Level 1 and 2 classes go much deeper into these concepts. The sessions cover relational database fundamentals, primary and foreign keys, and best practices for setting up all kinds of relationships, including referential integrity and more advanced normalization. If you're eager for even more, I offer an in-depth Relationship Seminar that explores dozens of relationship types, such as self-joins and more. The extended cut of today's lesson is nearly an hour long, covering additional real-world normalization examples. These include scenarios where unique identifiers aren't obvious, tables where parent and child items are mixed in one record, and cases where customers have multiple addresses or phone numbers requiring even more tables and relationships. All these are practical issues that many users encounter when moving from Excel to Access or when improving existing databases. Members at the Silver level and above have access to this extended content, as well as a growing library of other advanced tutorials. Gold members can also download all the sample databases I create, access a comprehensive code vault, and get priority help. Platinum members receive everything plus access to all beginner courses and one free developer class per month. No matter your membership status, I want to assure you that my standard TechHelp videos will always remain free. As long as you keep watching, I'll keep making them. If you would like the full step-by-step video tutorial that walks you through each of these processes, you'll find it on my website at the link below. Live long and prosper, my friends. Topic ListWhat database normalization is and why it is importantReducing data redundancy in Access tables Improving data integrity through normalization Identifying redundant data in Excel spreadsheets Understanding entity integrity in Access tables Understanding referential integrity and orphan records Difference between normal forms and Access forms First normal form: eliminating repeating groups Breaking down fields into atomic parts When to split names and addresses into multiple fields Second normal form: moving unrelated fields to separate tables Using denormalization in specific situations Preparing Excel data for import into Access Identifying unique customer records by key fields Cleaning and standardizing data before import Importing data from Excel into Access Renaming fields during data import Letting Access create auto number primary keys Creating a customer table with unique index Setting unique index on customer phone numbers Using append queries to populate customer tables Handling key violations during data import Adding foreign key fields to establish relationships Using update queries to link tables by key fields Removing redundant fields from order data Validating results of normalization process Creating queries to join normalized tables |
||||||||||||||||
|
| |||
| Keywords: TechHelp Access normalization, normal forms, standard, normalized, relationships, key fields, how to normalize data in access, step by step normalization example, separate customers from orders, fix non relational PermaLink Normalizing Data in Microsoft Access |