|
||||||
|
|
AutoNumbers Good or Bad? By Richard Rost Are AutoNumbers Good or Bad as Primary Keys in Access? I hear a lot of criticisms and arguments against using AutoNumbers as the primary key field in Microsoft Access databases. In this video, I will examine exactly what AutoNumbers are, when they should be used, when you might not want to use them, what those arguments against them are, and of course, my counters to those arguments. Craig asks, "I'm taking a college course and the instructor gave us a homework assignment. He wants us to create a table to store vehicles where the Primary Key is the VIN. I thought you said to always use AutoNumbers. Who's right here?" MembersThere is no Extended Cut for this video Topics Covered
ConclusionAutoNumbers are easier to use than natural keys or text. Access maintains them automatically and they provide a more stable database than Natural Keys. Generally, AutoNumbers save disk space and increase database performance. They decrease total cost of ownership. Other options are available to provide the user with visible identifiers for records, such as order numbers, so as not to give up business intelligence, however an Autonumber should be the primary key for MOST tables you build, with few exceptions. Side Videos
References and Research
IntroIn this video, we talk about whether using AutoNumbers as primary keys in your Microsoft Access tables is a good idea or not. I cover the differences between natural and synthetic keys, explain how AutoNumbers work, discuss the pros and cons of using them for primary keys, and address common arguments both for and against their use. We'll also look at scenarios when you might not want to use AutoNumbers and how they impact performance, relationships, and database maintenance. If you've wondered whether to use AutoNumbers or another field as your key, this video has the answers.TranscriptWelcome to another TechHelp video brought to you by AccessLearningZone.com. I am your instructor, Richard Rost.In today's video, we're going to take a look at auto numbers. Are they good or bad? Should you use auto numbers as the primary key in your tables? Today's question comes from Craig. Craig asks, I'm taking a college course and the instructor gave us a homework assignment. He wants us to create a table to store vehicles where the primary key is the VIN, the vehicle identification number. I thought you said to always use auto numbers. Who's right here? Well Craig, I recommend you do what your instructor wants. I don't want to be responsible for anyone failing their class. However, in my professional opinion, and I've been doing this for 26 years now, I believe it's almost always best to use an auto number as the primary key field for just about every table. Yes, I would definitely use an auto number for this particular table. It's better for relationships and for lookups. You can and should definitely store the VIN in a table as a secondary key. You can index it, no duplicates, to search on it and to make sure that you don't enter the same car twice, but I would not use this as the primary key. It's a very long value. It's text, which is not optimized for lookups, and it will take up a lot of space in your database, especially in the foreign key fields and child records. If you link this car to something else, like a trip table, that you log which car took the trip and you've got 5,000 trips logged, now you have to copy that VIN 5,000 times. We'll talk about that in a minute. So, with my professional opinion and with my experience, I can definitely say that you're almost always going to want to use an auto number in a table like this. Let's take a look at why. First, before we understand auto numbers, let's understand what key fields are. A key field is a field in a record that holds unique data, which designates the record from all other records in the table. Things like invoice number, account number, customer ID, serial number, product code. You can use phone number or email address as a key field. I used to build databases back in the 90s and people used to always use phone number to look them up. They would use it as a key field and that's okay. You can do that, but in my opinion, it's not the best. Now, a primary key is a specific kind of key field. Most tables should have a primary key, which is the one field that is used for relationships and to identify the record. A record may contain multiple key fields, but only one can be the primary key field. So you might have phone number and email and those values might be unique. You may say in your database, I only want one phone number or one email. It's not a proper customer record, but you should have a customer ID as the primary key. That's the field you're going to use to make the relationships between your tables in your database. Now, a foreign key field is used to relate to the other table. When relating two tables together, the foreign key is the child table that points back to the primary key in the parent table. So here in my silly little picture, you can see on the left are customers and on the right are vehicles. Make the customers have ID fields that are primary keys, 101, 102, and 103. You store those values as the foreign key in the vehicle table. So each of the vehicles has a primary key, which is, again, an auto number one to six, but that foreign key is the number that you store from the customers. If you were to use VIN as your instructor wants for each of those vehicles, you'd have lots of very big values stored in that table. Big equals slow, so keep that in mind. Now, there are two kinds of key fields. There are natural keys and synthetic keys. Natural keys are found in the real world. You get it from somewhere else usually. An invoice number, an account number that someone else gives you, a serial number. You could be typing in products and the manufacturer puts a serial number on it. That's found in the real world. A product code, a phone number, an email address, a social security number, a VIN number, a flight number. Those are all natural keys. You can use them in key fields if you want to, but it's something that has meaning to the user. Whereas a synthetic key is system generated. An employee ID, a customer ID, product ID, all right. There are lots of names for these if you do reading on different sites or if you have different books. They're also called surrogate keys. That's the most popular as a surrogate key. Blind key, factless key. It's about 15 different synonyms that I found. Again, you can use both in the same table if you want to. For example, if you've got a table called employee T, which has one employee per row. You could use social security number as a key field. That would be a natural key. But I would recommend adding your own auto number called employee ID, which would be a synthetic or a surrogate key. Why? We'll find out. Now, what exactly is an auto number? An auto number stands for automatic number. Access assigns these numbers and maintains them. It's a synthetic key. It's the perfect choice for a primary key. We'll learn why in a few minutes. They normally start at one and count up. They can be random. There are some different options available. They are normally, but not always, sequential. So they're usually in sequence, but they might not be. The point is that it shouldn't matter to you. Access maintains it. Access is the only one that cares about it. Why use auto numbers? Well, it makes creating key fields simple. Access creates them and maintains them for you. You really have almost nothing to do to keep a primary key in your table. It's easy to create relationships between tables because Access maintains those keys. You never have to worry about auto numbers changing. A natural key might change. Someone's phone number might change. Their email address might change. A VIN number shouldn't change unless you take it to the right body shop, just kidding. But natural keys may change in the real world. Then your customer says, I have to change my phone number in your system. Well, if that phone number is a key field, you've got it linked to all these different tables. Now you need coding in your database to automatically update all that stuff. Yes, the expert Access users are saying, cascades, updates, take care of that for you. No, not if you have linked tables. That's a whole other ball game. Auto numbers are best for keeping stability in your data and your backups. If Joe Smith changes his phone number and that's your primary key, and now you have to restore from a three day old backup, he didn't change it then, so now you've got problems. Auto numbers will take care of that. They never change. This customer will always have the same auto number. Here's some math for you. A long int, which is what an auto number is, the long integer, it's smaller than text keys. A long int is four bytes, so it goes from minus to plus about 2.1 billion. A text key of the same size would take 10 bytes, so that's more than twice the size. Smaller equals less overhead equals faster. You always want your databases to be as small as possible. They will run quicker, they'll process quicker. If you're paying for time on a server where you pay for either throughput or storage space, that's important. You get quicker processing times, faster lookups, faster transmission times over the network. It takes up less disk space. Keep in mind the space saved in your foreign keys too. If you're using VIN number as your primary key for a vehicle, I mentioned this before, it's a big number, 17 characters long for vehicles made after 1981. So that's huge for a primary key. If you're storing trip information and the car is used for the trip and you've got 10,000 trips, that's a lot of data that you're wasting. An auto number would save you some space there, which saves processing time. Another reason to use auto numbers, they upscale nicely, pretty much automatically, to SQL Server and other database server platforms. If you move beyond Access for your backend tables, they create identity or sequence type fields in SQL Server 2012 and later. A common key strategy across all of your tables can reduce programming time, costs, and make your systems more compatible. If all of your tables are based on auto numbers, everyone in your team knows that they need to handle auto number lookups, relationships, etc., a certain way. You don't want your different tables having different types of primary key fields. If you've got multiple people working on your database, designing it and developing it, they have to know, What is the primary key for the vehicle table? What is it for the customer table? What is it for the auto table? Your team knows every table has its own ID field and it's an auto number. This might not seem like a lot, but it is. If you confuse your programmers, and this is mainly a factor for a larger database that has multiple people working on designing them, but it's a factor. It's something to take into consideration. Auto numbers protect you against changes in business decisions. I don't know how many times I've had clients say, management wants to do this now, and you've got to change the database. If your employee number is numeric, you merge with another company that has a text-based employee number, like an initial on there or something. Now you've got to merge both databases and deal with every related table in your database a different way. By having auto numbers, even if company A has a set of employees and company B has a set of employees, as long as they've all got auto numbers, you can very easily change and merge things. You might have some overlapping conflicts. If Joe from company A was customer 15 and now Sue from company B is customer 15 or employee 15, someone's going to have to change their number. But chances are you might have fewer overlaps and your scheme will stay the same. That's what's important. All your forms and reports that are designed for a number will now still work. Natural keys do not always stand the test of time and can sometimes become meaningless or have weird meanings. Look at stock tickers. Google used to be GOG in the stock market, but now the company was officially renamed on paper to Alphabet Inc. So their stock ticker doesn't mean what the company is. Natural keys can easily be confused with each other if they look alike, another reason not to make them human readable. Again, GOG is good for the stock ticker and that's fine, but don't use it as your auto number. Don't use it as your primary key. So, when not to use auto numbers. Don't use an auto number if you need a value with a real world meaning. If it has to make sense to someone, if it has to be human readable, then don't use an auto number. But chances are you're not going to be using that value to make your relationships. If you need sequential numbering, don't use an auto number. If you have to make sure that your invoices go from 1 to 10, don't use an auto number for that. You can still have that kind of a number in addition to your auto number, but you don't want to rely on an auto number to always be sequential. They can leave gaps if they're deleted. If you delete record 5, now you've got 1, 2, 3, 4, 6, 7, 8, and so on. Don't use an auto number for any value the customer will see. We'll talk about the German tank problem in a few minutes. Auto numbers should be meaningless to the end user. Let me reiterate that. Auto numbers are meaningless to the user. That customer ID, that invoice ID, the product ID. They should not care about what those are. Neither should you. Those are for internal purposes only. If you want a special customer ID to give to the customers, they can see it on their invoices. Give them a separate ID. Give them a customer code. I've got videos to talk about how to make customer codes and sequential invoice numbers. I'll give you links to those below. Auto numbers should not be exposed to customers in situations where you have possible security concerns. Customer information, for example. We'll talk about this in a second. The folks over at the Access web will shout, Do not use the auto number if the field is meant to have meaning for the users. It's the seventh commandment. Access cares about auto numbers and uses them for lookups and relationships. That's it. You and your end users should not have to see them. If you need a number the user or the customer will see, then create your own secondary field, either synthetic or natural. It's up to you. Now, sequential numbers like an auto number can potentially give away intelligence, like how many customers you have or how many orders you've taken. It's just like in a restaurant. If you've got one guest check and Joe comes in on Monday and gets check number 100, and then comes back on Friday and gets check number 104, he knows you've only had three customers in the middle there. That might not be something you want him to know. If you have a new client and they get customer ID 12, they'll know that you don't have that many clients. You might not want to have that information visible. This brings up the German tank problem. In World War II, the Allies would capture or destroy German tanks. They could deduce the total number of tanks that the Germans were producing based on the serial numbers they found on the components like the gearboxes. So if their intelligence and their spies said the Germans had 20,000 tanks over here and 15,000 over there, but the serial numbers that you're capturing only indicate a few hundred, there you go. They played all kinds of games during World War II. They would set up fields with wooden boxes to look like tanks, so that when the planes flew over, they would see these fields of tanks lined up and they were just decoys. But the statisticians looking at the serial numbers said, nope, this is what they're doing. So auto numbers should never be shown to the customer if you're concerned about this kind of business intelligence leaking out. I know I teach my beginner classes and I tell the students to put the auto numbers on things like invoices and such, but that's just for the beginner students so that I don't get them all confused with having to hide numbers and stuff. I want them to grasp the concept of why we need an auto number. Then, in my later classes, I talk about this. I've got a couple side videos I'm going to reference. These are free videos, of course. One is how to create your own custom sequential automatically incrementing number with the DMax function. If you need order numbers, in this particular example, the customer's accountant said, I need all of your invoices to not have any gaps. They have to start at one and count up. You don't want the customer to see that; you might need that internal numbering for your accounting purposes. You can obfuscate or hide the value if you want to. You could do tricks like multiply it by three and print that as the actual invoice number so the customer doesn't know. In fact, in my members only video, I show how to do it so that each customer gets their own sequential invoice numbers. So XYZ Corp will see their invoices as 101, 102, 103, 104, just for that. You'll know how many invoices that you've made for just them. My second side video, again, a free video on YouTube and my website, is how to create your own secondary key field not based on your auto number. For example, in this class, I do a C followed by a date timestamp, like year, month, day, followed by a random number. So the chances of getting two of the same customer number that are exactly the same are infinitesimal. You could base it on customer information like the first initial, the last three letters of the last name, or whatever you want to do. It's totally up to you. This would be the value that you'd show the customer. You don't want to show the customer your customer ID. This video talks about how to do it. I'll put links to these videos in the description below. Now, there are some instances when you don't have to use auto numbers. You can, but there's no need for it. For example, tables that are never going to relate somewhere else. If you've got a simple little table just for lookup values and you're never going to make relationships with other tables, sure, fine. You can make the table without putting any auto numbers in it. Temporary tables. I make temp tables all the time. Like, I just want a list of email addresses to send this particular email batch to. I'll create a temp table with a make table query. I'll just dump the emails in it, the first names, and then just send it. No need to put an auto number in that table because the data is not going to stick around very long. The third option is if you're incredibly pressed for storage space and a natural key is available, like social security number, although I don't find this a particularly valid reason, some people do make the argument. If you've got an extremely tight amount of space that you're working with and you've already got a natural key in that table and you're not going to be relating it to a lot of other tables, okay, fine. You can get away with not having to have an auto number, but most of the time you're going to want to add one anyway. Now, there are a bunch of arguments against auto numbers, and I'm going to address all of them. I actually spent about two days researching this online and in different books, and I've tried to find all the different arguments that people make against using auto numbers and I wanted to address them all. Honestly, I've been teaching Access for over 20 years. I've been using it as a developer myself since 1994, and I've always just assumed that auto numbers were the proper solution because that's what Microsoft teaches in their books and all their software. That's what the various authors that I've read said. I've read, I don't know how many books on databases, and the prevailing wisdom has always been using auto number. Well, now, 26 years later since I started first building Access databases, and as an expert, I wanted to do the research myself to see what all these arguments are. What do people say are the reasons why you shouldn't use auto numbers? The first one is natural keys allow you to see information about the data without multiple fields. Auto numbers are not human readable. For example, the invoice number 222 and Y1023. You can tell this customer is number 222 from New York. That's the invoice number, and it's based on the customer number, the state, and then I'm assuming some random number or maybe a sequential number. The argument there is that you can see information, but my counter is that you shouldn't have to. Store that invoice number as a secondary field, but don't use it for relationships, and again, space, size, all those things matter. That's a pretty big number, so I would not use it as a primary key field. What if you've got to store order details, the line items in a second table, and again, you're making the relationship with that big number. The next argument: composite natural keys are better than auto numbers. A composite key is a key field made up of two or more other fields. For example, last name plus first name plus date of birth plus state. It's human readable, it gives the sales rep information, but it can also give away information like the customer's date of birth. They might not want that known if you put it on an invoice, and it is rare but quite possible for two customers to have the same name, the same date of birth, and the same state. It's possible. You want to think of the rare instances where things might happen. In this particular example, I would not use that. It gives away information, and you don't want something like this to be human readable. Plus, that's going to be a pretty big field, and again, it's going to make your database bigger. Let's say you're using a composite natural key for your employee email addresses. Your email address is first name plus last initial plus the year they were born plus @xyz.com. Now, it's possible to hire two people with similar names born in the same year, and if the customer figures this out, now he knows how old your sales rep is. Now he knows his sales rep was born in 92 or whatever. As you'll find, I'm not a big fan of natural keys. Next argument: phone numbers or email addresses are easy to use and look up the customer. I don't know how many times I walk into a store like your local pharmacy and they're like, what's your phone number so I can look up your stuff. Sure, have those as secondary keys. Store that data, but you don't want to use those values in your database to do the relationships between the tables. You can even index that field with no duplicates if you prefer, so you can't have two customers at the same phone number. However, these values might change in the future and they're not the best choice for primary keys. They will take up a lot of extra space, and again, if the customer changes their phone number, now you have to go updating multiple tables. Argument: adding an auto number when I already have another field is wasting space. Yes, but auto numbers are tiny and with today's computing power, the low cost of storage, high speed networks, the tiny amount of overhead for a four byte long integer is negligible. The benefits far outweigh the costs. It's easy to add and there's zero maintenance on an auto number. Let's say you're using the Social Security number as a primary key. That's a number in that well-known format. You need nine characters or nine bytes to store it. You want to store the SSN as a text value because it's got a leading zero and foreign workers may have letters in their SSNs. I didn't know that until recently actually. So you add an auto number as the primary key. Now you add four bytes to each employee record, increasing the total size to 13 bytes per employee. So, nine bytes to store the SSN and then four more bytes to store the auto number. Now, let's take related tables into consideration. We've got 10 employees who clock in and out once a day. Over a year, that's 7,300 records. If you're using just the SSN as the foreign key in the time clock table, that's 65,000 bytes of data versus only 29,000 bytes with the auto number, and this is just one field in one table. In other words, you will save space in the long run by storing more information in the employee table, and auto number in addition to the social security number. So, each employee takes up more space, but if each employee is linked to a time clock table with 7,000 records in it and now you're not storing the social security number in the time clock table, you're storing the auto number, so you will save space in the long run. It's those foreign keys that take up the space, and that's why they're better for relationships. It's nine bytes to store every social security number, four bytes to store every auto number. That's a five byte overhead per table in your time clock table. So the math works out. The next argument: If I use referential integrity with cascade updates, I don't have to worry if my natural key changes, Access updates the child tables automatically. So if the customer does change his phone number and my tables are linked with referential integrity with cascade updates set to on, Access will handle changing all the child keys. That is true, however, you cannot enforce referential integrity on linked tables. If your database grows to multi-user or you upscale to SQL Server, you're in trouble. You have to do that update yourself. As soon as you split your database and your tables are in multiple files, you lose that ability to use referential integrity with either cascade updates or cascade deletes. So if you have any kind of a multi-user setup, forget it. So this argument is invalid. Another argument: auto numbers are not good for database replication and multiple users. The argument is if you want to take a database and have a remote user who's not connected to the internet, go and add records, and then come back. Now you've got auto numbers that you have to try and merge together. That's true. That's not good with incremental auto numbers. However, in this case, you can use a random auto number or if you need even more protection, you can use a UUID or a GUID. I'll talk more about what these are in a second. However, in today's world, most people have a constant internet connection and these types of connections are rare where you have to split your database and give it to someone to use offline. There are some exceptions. In fact, I've got a video on how to do this. I just made it. I'll give you a link in a minute. If you've got someone who wants to be able to take their database on the plane or on a remote site, maybe in some other country where they don't have great internet service, you can make a replicated database and then merge the records back together again using random auto numbers. More importantly, you would have the same problem even with natural keys. It's difficult either way to bring back two edited databases and manage the conflicts. The auto number isn't the problem here. It's bringing back that information and syncing it up. You can use, again, random auto numbers or GUIDs to alleviate this problem. I've got a free video, how to use random auto numbers to create multiple copies of your database for users to take offline. You can then re-sync the data. This is great for airplane travel. In fact, this specific database came about because one of my customers was like, I travel a lot, I'm in the air a lot. The internet in the air is okay for web browsing or email, but it's not good enough for a constant connection to a database. It's very peckety. It's very high in latency. He goes to a lot of developing countries where he might be out in the middle of the wilderness with no internet service. He needs to be able to look up his information in his customer table and he has to be able to add records to the contact table and add orders. I showed him very easily how to create a secondary database that, with just a couple of table links and some queries, you can sync this information back and forth using random auto numbers. The auto numbers in the system like customers and stuff that are already in there won't change. If you add new stuff, it'll get assigned random auto numbers. Chances are, unless you've got a really big database with lots of volume, those numbers won't collide. You won't have any conflicts when you re-merge those two tables back together again. Let's take a quick look. If you use random auto numbers, you get a four byte long integer. We already talked about this. That's four billion possible combinations of numbers. If you added one record every second for a whole year, you would still only have a one in 136 chance of running into duplicate values. Assigning new values as random auto numbers pretty much guarantees you're not going to have any conflicts. You can have two people using the same database, adding orders, adding contacts. They get the same primary key, which is their customer. That's the same for both of them. They can add new stuff as a foreign key and not have to worry about conflicts. If that's not good enough for you, you can also use something called a UID, a universally unique identifier, or a globally unique identifier. Microsoft's term is a GUID. It's a big number. It looks like that. It's 128 bits. The chances of having conflicts are infinitesimal. According to GUIDgenerator.com, if every human on Earth generated 600 million GUIDs, there would only be a 50 percent probability of a duplicate. I didn't do that math. They did. That's why Microsoft uses those. The argument: if you need to merge two tables together, auto numbers are impossible to work with. It's tough, but it's not impossible. You need to change the auto numbers in table 2 so they don't conflict with table 1 if you don't want to use random auto numbers. Of course, the same problem as with replication. In the previous example, you can use update queries to adjust those foreign keys. The bottom line is you have the same problem with natural keys too. This isn't just an auto number problem. When I use auto numbers, you can't use auto numbers because if you bring together multiple sets of data, that's a problem. But it's not an auto number problem. It's a problem with keys in general. Here's a big one. This argument says, if you accidentally delete a record or your table becomes corrupted, you can't get auto numbers back and have to renumber all the child records. If somehow someone accidentally deletes customer 5, well, 5 is gone. Well, that's simply not true. Now, I teach people in my beginner classes that once you delete an auto number, it's gone forever because I don't want to confuse them. There is a way you can get those back. While it's generally taught that you cannot reuse auto numbers, it is possible to use an append query to reinsert deleted records. This is side video number 4, a free video, how to restore a deleted record with an append query. It is possible to get auto numbers back. You don't have to renumber all the child records. If someone accidentally deletes customer 5, or your database gets corrupted and you realize you're missing some records, the first and most important thing is to always make sure you have a good backup of your data. That being said, if your records get deleted or corrupted, you can look in your backups and see who was customer 5. Now you've got his information. I show you with an auto number, you can get that auto number back by using an append query to copy it from another table. Restore your backup into a separate table, delete everything you don't need, and then you can append those records back into your primary table and they'll be assigned the original auto numbers. So all your links are good, all the child records are still safe, and you don't have to go renumbering stuff with append queries and update queries. This was the biggest argument that I found. If I delete some records, then, first of all, don't delete records. Never delete anything, especially customers. Just mark them inactive. Orders, mark them canceled. But if stuff gets accidentally deleted, yeah, you can restore it. There's a way, so this argument is invalid. Here's an argument that's actually true. Auto numbers can add another index to the database, which can slow down edits, inserts, and deletes. This is true. I cannot argue against it. If your table is primarily used for data entry and you do very little searching and sorting on it, you may not want to add an auto number if another natural key is available. In other words, an auto number adds another index, which is a separate little hidden table that Access uses to speed up searches and sorts. When I say, Go give me customer 136, that index is already sorted numerically, so I can quickly find 136 and look up who it is. If you're doing lots of searches and sorts, you want this index table. The downside is that every time you edit or create a new record, it has to update the index table. If your database is primarily used for data entry, like a room full of people doing customer service constantly adding records, and they're never searching and pulling up old records, then you might want to avoid adding an index. It won't slow down the database by updating index tables. That is one argument that's actually valid, but it's a very rare case where you actually have to have this. Another example: a table on a web server that tracks hits by user. The user is identified by some field, let's say ID or email or whatever, and all you do is store that ID plus a timestamp. You download this information monthly and you generate a report from it. You don't want to slow down the web server with every addition for your report that's only run once a month. Every hit on the website, it tracks the user. But this isn't an auto number problem. Again, it's just a problem with having a primary key field. You'd have this problem with a natural key too, so this is not an argument against auto numbers. Any primary key would be unnecessary in this case since you're not planning on relating the records to another table. A lot of the times, when people are railing against auto numbers, they're actually railing against primary keys. It has nothing to do with what kind of primary key field you have. Another argument: you can't always assume that the most recent record has the highest auto number. That is true because records can be deleted and those numbers technically can be reused, so you should not make this assumption. Always record a date/time value if this is needed. Again, it's not an auto number problem. It's a primary key problem. Don't assume when you're adding records that the highest auto number is necessarily the highest or most recent value, especially if you're using random auto numbers. Auto numbers may leave gaps in the sequence and they're not good for things like invoice numbers. That's true. The auto number should not be used like this. They're not supposed to be used for values that the customer is going to see or that your accountant cares about. Generate your own invoice number if you have to have sequential numbers. They'll leave gaps. That's how auto numbers are. That's not a reason to not use them. Use a different field and create your own invoice numbers. Another argument: auto numbers have to start at one. What if I want to use a larger number to start with? Once again, I will reiterate that shouldn't matter. Remember this: auto numbers are meaningless to the end user. They're not for you. They're not for the user to see. You shouldn't care what values they start at as long as they're unique. I get asked all the time, How do I start my auto numbers at 1001 so my customer doesn't know that they're customer number 15? Again, give them a different number to see. Don't show them your auto numbers. If you really want to artificially inflate your auto numbers, you can use an append query to do that, or just append into the table a value of 1001 and you'll get that, or add a thousand blank records and delete them. There are tricks you can play, but again, you shouldn't have to. No one should care about your auto numbers except Access. This is a valid problem with auto numbers. If you delete records at the end of the table and then do a compact and repair, the auto number index is reset and links to child records may be off. That is true. Be careful of this. For example, if you have invoices in your system, or orders in your system, and you delete the last five, and then you compact and repair, those numbers will be reassigned again. If you delete the last five records, let's say it's 95 to 100, compact and repair, you go back in, the next auto number being assigned is going to be 95. If you've got another, say, order detail item in there that has an order number of 95, it gets attached to this new invoice. That could be a problem. Again, don't delete orders. Mark them inactive. Delete those child records first. That's just laziness if you don't. If you compact your database, those last auto numbers get reassigned again. Be aware of that. That's just something that a good Access developer has to know. You may have this problem, too, even if you use natural keys or some other form of primary key. If you delete some records and somehow those numbers get reassigned, it could link to the wrong child records. People say, Access isn't perfect. I've seen it make mistakes with auto numbers before. That's true. Auto numbers aren't perfect. No database system is perfect. Mistakes will happen even with natural keys. Once again, that's not a problem with auto numbers. That's a problem with databases in general. Backup your data. If you have problems with your auto numbers, you can go back to your backups and look and see. I've got videos on backing up data, too. So in conclusion, auto numbers are easier to use than natural keys or text. Access maintains them automatically, and they provide a more stable database than natural keys. Generally, auto numbers save disk space and increase database performance. They decrease total cost of ownership. Other options are available to provide the user with visible identifiers for records, such as order numbers, so as not to give up business intelligence. However, an auto number should be the primary key for most tables that you build with few exceptions. Here's my list of references, some of the research that I did. I'll put links to all of this stuff below the video window. Thank you for watching. I hope you learned something. I had a lot of fun researching this information because I wanted to make sure that I was right. I wanted to make sure that auto numbers were definitely the best way to go. I hope I've made a convincing argument that you should use auto numbers in your tables too. Make sure you subscribe to my channel. Please rate this video, give it a thumbs up if you liked it. Post your comments below. Subscribe and ring the bell. Click that little all button and you'll get email notifications every time I release a new video. You can head over to my website too. I've got a pretty active Access forum there if you're interested in reading conversations on Microsoft Access. If you want to see your questions answered in videos like this, visit my TechHelp page. If you haven't already, feel free to take my free Access Level 1 course. It's on YouTube or on my website. It's a full three hour long tutorial that covers all the basics of how to build databases in Access, including using auto numbers. If you like Level 1, Level 2 is just one dollar. Thanks. I hope you learned something and we'll see you next time. QuizQ1. What is the primary benefit of using an auto number as the primary key in your tables?A. It simplifies relationships and lookups while minimizing storage overhead. B. It makes the key field human readable for users. C. It ensures all records have real world meaning. D. It allows every field in the table to be unique. Q2. What is the main drawback to using a natural key such as a VIN as the primary key? A. Natural keys are not unique. B. Natural keys are optimized for lookup operations. C. They are often large, text-based, and can consume more space in related tables. D. They are always numbers, which saves space. Q3. Which of the following statements about primary keys is correct? A. Every table must have a primary key. B. Only one field in a table can be the primary key, but other fields can still be unique. C. Multiple fields must be the primary key. D. A primary key should always be visible to the user. Q4. What is the difference between a natural key and a synthetic (or surrogate) key? A. Natural keys are system-generated, synthetic keys are human-entered. B. Natural keys are found in the real world and have meaning to users; synthetic keys are system generated and have no meaning to users. C. There is no difference. D. Synthetic keys should contain customer information. Q5. Why do auto numbers provide greater stability in databases compared to natural keys? A. Auto numbers can be changed easily if needed. B. Natural key values may change in the real world, while auto numbers do not. C. Natural keys are more likely to be short and efficient. D. Auto numbers are human readable. Q6. What is a significant advantage of auto numbers in reducing database size? A. They can store text values efficiently. B. Foreign key fields require less space when using a four-byte auto number versus a long text value. C. They require a separate lookup table. D. They allow duplicate records. Q7. When should you avoid using an auto number as the primary key? A. When you need a value with a real world meaning or for sequential numbers shown to the user. B. When you are using a text field in your tables. C. When your database is large. D. When you want to show users the auto number. Q8. If you want a sequential, user-visible number (like an invoice number) without gaps, what should you do? A. Use the auto number field shown directly to the user. B. Create a custom sequential field independent of auto number. C. Delete records to remove gaps from the auto number. D. Use random numbers. Q9. What security issue can arise from exposing auto numbers to users? A. Users cannot edit their information. B. Business intelligence such as number of clients or sales can be deduced from sequential numbering. C. Users will forget their auto number. D. Auto numbers can be copied and reused easily. Q10. Why is relying mainly on natural keys for relationships between tables considered risky? A. Natural keys can change, causing relationship issues and data inconsistencies. B. Natural keys are always too short. C. Synthetic keys cannot be indexed. D. Natural keys are always numeric. Q11. Which argument is valid against using auto numbers for some tables? A. They add an extra index, which could slightly slow down inserts in tables used primarily for data entry. B. They make lookups much slower. C. They require every field to be stored as text. D. They require you to delete records frequently. Q12. What is the German tank problem an example of in relation to keys? A. It shows how using synthetic keys enhances security. B. Exposing sequential keys can reveal sensitive business information, like total output or transactions. C. Synthetic keys make databases slower. D. Natural keys are always better for users. Q13. Which of the following is a reason auto numbers are suitable for upscaling to SQL Server or other database platforms? A. They are converted into identity or sequence fields, which are supported by those systems. B. They are always text fields. C. They do not require unique values. D. They can be merged with any field automatically. Q14. What should you do instead of deleting important records (like customers) in a database? A. Mark them as inactive so data integrity and keys are preserved. B. Delete records immediately to save space. C. Remove all keys. D. Hide the records without marking them. Q15. How do auto numbers compare to natural keys when merging data from offline or replicated databases? A. Merging is harder with auto numbers because of possible conflicts, but random auto numbers or GUIDs can largely solve this. B. Natural keys never have any conflicts. C. Auto numbers can never be merged. D. Auto numbers require converting to text first. Q16. If a natural key, such as phone number, changes, what problem might you face if it is used as a primary key? A. You must update all related records and foreign keys across the database. B. Nothing problematic happens. C. The database will automatically fix everything for you. D. Phone numbers are not allowed as keys. Q17. Why is adding an auto number even when you have a natural key not considered a significant waste of space? A. The storage overhead of a four-byte long integer is negligible compared to the benefits for relationships and space saved in related tables. B. Auto numbers take up gigabytes of space. C. Natural keys are always better anyway. D. Auto numbers are slower to access. Q18. In which situation would NOT adding an auto number as a primary key be acceptable? A. In small lookup or temporary tables not used in relationships. B. In tables used as parents in relationships. C. In high-transaction tables requiring referential integrity. D. In tables requiring unique, stable identifiers. Q19. What is a GUID, as discussed in the video? A. A Globally Unique Identifier, a 128-bit value with an extremely low chance of duplication, good for merged or replicated databases. B. A graphical representation of a key field. C. A sequential auto number assigned by Access. D. A text code generated by the user. Q20. According to the video, what is the single most important takeaway about auto numbers shown to users? A. Auto numbers should be meaningless to the user and never shown where business intelligence or security could be compromised. B. Users should see auto numbers for accountability. C. Auto numbers must always start at 1 for convenience. D. Users can edit their auto number if needed. Answers: 1-A; 2-C; 3-B; 4-B; 5-B; 6-B; 7-A; 8-B; 9-B; 10-A; 11-A; 12-B; 13-A; 14-A; 15-A; 16-A; 17-A; 18-A; 19-A; 20-A DISCLAIMER: Quiz questions are AI generated. If you find any that are wrong, don't make sense, or aren't related to the video topic at hand, then please post a comment and let me know. Thanks. SummaryToday's video from Access Learning Zone covers the topic of auto numbers in Microsoft Access. The main focus is the ongoing debate about whether auto numbers are a good choice as primary keys in your tables, or if they should be avoided in favor of natural keys such as a vehicle identification number (VIN). I'll share my professional recommendations based on over two decades of experience building and teaching Access databases.The question we're examining comes from a student whose college instructor requires the VIN to be the primary key in a vehicle table. I always suggest that you follow your instructor's directions to avoid academic issues. However, in practical, real-world applications, I almost always advocate using an auto number as the primary key for most tables, including situations like vehicle tables. Storing the VIN as a secondary, indexed field allows you to enforce uniqueness and search efficiently, but it is not well suited to being the main identifier for a record due to its size and data type. Let me explain the reasoning behind this. Understanding key fields is the first step. A key field uniquely identifies each record in your table. These may include invoice numbers, customer IDs, account numbers, and so on. Natural keys are values with real-world significance, like serial numbers or phone numbers. Synthetic keys, or surrogate keys, are generated by the system, such as auto numbers or employee IDs that the database assigns automatically. Having a primary key is essential in most tables, and the primary key is the main field used for creating relationships between tables. The foreign key on related tables points back to this primary key, forming the backbone of your relational structure. Using something like a VIN as a primary key forces Access to copy large, text-based values every time you relate records, consuming excess space and slowing performance. In contrast, an auto number, being a simple four-byte integer, is far more efficient. Synthetic keys are often preferable even if the table contains a natural key. For example, an employee table might include social security numbers, but the primary key should still be an auto number. Auto numbers are managed and maintained by Access; they start at one and typically increment, but the actual value and sequence should not matter. Access uses them to enforce integrity, connect related records, and speed lookups. Natural keys can and do change in the real world, which complicates relationships and maintenance. For instance, if a phone number or other piece of unique data that you're using as a key changes, it can create significant headaches as you try to update every related record. Cascading updates are not always available or reliable in all situations, especially when your database is split or upscaled to SQL Server. Using auto numbers as primary keys keeps your database structure simple and stable. They're minimal in size, boost performance, and keep the workload light on related tables. As your database grows or migrates to a more robust platform like SQL Server, it's much easier to work with auto numbers since identity columns translate well on the backend. Consistency among tables makes programming and database development far easier for teams, and it minimizes the chances of confusion or configuration errors. Auto numbers also future-proof your system against changing business decisions. Mergers, new requirements, or changes in data structure are far easier to handle when every table uses a generic, meaningless ID. There are situations when auto numbers should not be used. If the key needs to be human-readable or carry specific real-world meaning, use another field for that purpose, but still rely on an auto number in the background for internal operations. Never use auto numbers for values that must be sequential or fill regulatory requirements, such as invoice numbers issued to customers. Auto numbers can, and will, develop gaps if records are deleted; their primary purpose is not to generate visible, ordered sequences for users. Instead, create a separate field and manage the sequencing and visibility needs independently. Auto numbers should never be exposed to customers or end-users if you're concerned about revealing business details such as how many clients or orders you have. This relates to the classic "German Tank Problem" during World War II, where sequential serial numbers revealed production quantities. If you need custom visible numbering, such as sequential invoice numbers that never skip or a customer-facing ID, you can use functions like DMax to create and manage those numbers yourself. There are strategies for generating other kinds of keys, too – for example, combining parts of a person's name, date, or even generating a unique code with a random element. These can serve as customer or order codes that are meaningful to your users but separate from your internal IDs. In some limited cases, you might skip the auto number. If the table is used solely for lookups without relationships, or for temporary data sets that will not be referenced elsewhere, or if you already have a genuinely unique, unchanging natural key and are critically low on storage space, it may be fine to rely exclusively on the natural key. However, in the vast majority of use cases, adding an auto number is still the superior choice. It's important to consider and address common arguments against auto numbers: - Some argue that natural keys convey information at a glance. While that is true, you can store meaningful identifier fields alongside your auto number, and use those for reports or searches as needed. - Composite keys, made from several fields, create lengthy and often redundant keys that can complicate relationships and expose private or unnecessary details. - Using fields like phone numbers or email addresses as keys is risky because these values can and do change over time. - The concern about wasting space by storing both an auto number and a natural key is largely a non-issue. The efficiency gained in related tables quickly outweighs the cost of four extra bytes per record. - Referential integrity with cascade updates is only reliable when working within a single file and not with linked or remote tables, which is often not the case when scaling or splitting your database. - Synchronizing records between databases with auto numbers, such as when working offline, can be resolved using random auto numbers or GUIDs, which provide more than enough randomness to avoid conflicts. Problems with merging and syncing are not unique to auto numbers and can arise with natural keys as well. - Recovering deleted records and their auto numbers is possible through backup restoration and append queries, so accidental data loss doesn't permanently break your relationships. - There are rare situations where having an additional auto number index could marginally impact performance on tables used only for data entry with no searching or sorting. In such special cases, it may be best to avoid an index altogether. - Auto numbers are not guaranteed to be sequential, nor should they be assumed to mark the most recent records. - If you want new records to start at a certain auto number value, tricks like appending blank records can be used, but this is unnecessary since the user never needs to see those values. - Compacting and repairing your database after deleting records at the end of an auto number sequence can result in reuse of those numbers. Marking records inactive, rather than deleting them, is the best practice here. - No database system is entirely without flaws; be rigorous about backups. In summary, auto numbers are generally the easiest, most reliable, and most efficient primary key choice for Microsoft Access tables. They provide the greatest stability and compatibility, reduce the size of your data, and simplify relationships and table design. For most tables you create, use an auto number as your primary key. When necessary, add additional fields to serve the needs of your users, such as visible codes or strictly sequential numbers. If you want to see detailed, step-by-step instructions for everything discussed here, including side videos on custom sequential numbering and restoring deleted records, you can find the complete video tutorial on my website at the link below. Live long and prosper, my friends. Topic ListDefinition and purpose of key fields in databasesPrimary key vs. other key fields Foreign key fields and table relationships Natural keys vs. synthetic (surrogate) keys Explanation of auto numbers in Access Why auto numbers are recommended for primary keys Problems with using natural keys as primary keys Advantages of using auto numbers in relationships Data size and performance benefits of auto numbers How auto numbers aid in upscaling to SQL Server Standardizing key strategies across tables How auto numbers support database merging and business changes Business intelligence concerns with exposing auto numbers When NOT to use auto numbers Creating custom sequential numbers for user-facing fields Creating secondary identifiers for users Instances when auto numbers are not necessary Common arguments against auto numbers and rebuttals Data storage math: auto numbers vs. text/natural keys Limitations of cascade updates with split and linked tables Replication and multi-user considerations for auto numbers Random auto numbers and GUIDs for uniqueness Methods for resolving auto number conflicts after merging tables How to restore deleted auto numbers with append queries Downsides of indexes and their performance impact Auto number sequencing issues and when not to rely on it Handling gaps and starting values in auto numbers Compact and repair impacts on auto numbers Key takeaways for using auto numbers as primary keys |
||||||||||||||||||||||||||||
|
| |||
| Keywords: TechHelp Access AutoNumbers Good or Bad Surrogate Key Primary Key Synthetic Key german tank PermaLink AutoNumbers Good or Bad? in Microsoft Access |