Classification is the problem in AI that computers are best at solving. It’s used by billions of people in everyday life, powering spam filtering, fraud detection, and even unlocking your iPhone with Face ID.
When you hear the term “classification,” though, the first thing that may come to mind is some complex system for organizing information, such as the Dewey Decimal System for cataloging non-fiction books.
Taxonomies like this are indeed relevant to AI. What the algorithms do, though, is much narrower than these sorts of complex classification systems you may be familiar with.
Let’s consider what classification means for a human, since human intelligence is what artificial intelligence is hoping to imitate.
What is classification?
Classification is a routine mental process. This may be news to the majority of humans, but it’s something we do every day without thinking about it.
Here are a few examples of questions where the answer requires classification:
- Is this a silver coin?
- (pointing) What is this?
- What is she doing?
Even telling your partner about your day requires classification—finding concepts to describe the activities you engaged in, the food you ate, or the people you interacted with.
Classification usually happens subconsciously, and it’s something you’re doing all the time. That said, it can also be an intentional process.
Imagine sitting on a jury where a man is on trial for murder. Is he a murderer? Did the actions described fall under the concept murder? In important cases like this, getting the classification wrong is very costly, so it’s critical to scrutinize the evidence before making a classification decision.
How do humans classify things?
As the example of being a juror in a murder trial indicates, focusing conscious attention on a classification decision is something anyone can do. We usually don’t need to, because in everyday cases the consequences of a mistake are low and easy to correct. But either way, it’s a mental process.
Examining the cases where we put deliberate thought into classification can help us understand its components.
Determining whether someone committed a murder requires fully understanding the legal definition of murder, about which a judge carefully instructs jurors. It requires information about what events took place, and an evaluation of whether each piece of evidence is trustworthy. Finally, it requires combining this evidence and comparing it to a standard of proof (e.g. beyond a reasonable doubt) to form a final judgment.
These details are specific to the conscious, deliberate process, but the same components appear in the subconscious case. You need to understand the concept—typically something you learned informally as a child.
The data used as evidence comes directly from sense perception or from memories derived from earlier perceptions, which in most cases can be treated as obviously true.* People also apply a standard of proof, though an implicit one: if you aren’t sure, your subconscious kicks the problem back over to your conscious mind for more careful consideration. This is rarely necessary, but it happens.
What are computers capable of?
The principles for computers are surprisingly similar. Classification requires as input an understanding of the concept and accurate information to be used as evidence. Note that accurate information implies humans are evaluating the evidence and only including it when correct. Humans do a lot of work, even though algorithms get all the credit!
The algorithm itself performs the step of weighing the evidence, producing an estimate of how likely a given classification is to be correct. A human picks the standard of proof used to turn that estimate into a final judgment.
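That division of labor can be sketched in a few lines of code. Everything here is invented for illustration—the features, weights, and threshold are not from any real spam filter—but it shows the shape of the process: the algorithm weighs evidence into a probability, and a human-chosen threshold (the standard of proof) turns that probability into a verdict.

```python
import math

def spam_probability(features):
    """Toy linear model: weigh each piece of evidence, then squash to [0, 1]."""
    # Hypothetical weights a training process might have produced.
    weights = {"contains_link": 1.5, "unknown_sender": 1.0, "all_caps_subject": 2.0}
    score = sum(weights[f] for f, present in features.items() if present) - 2.0  # -2.0 is a bias term
    return 1 / (1 + math.exp(-score))  # logistic squashing into a probability

# The human decides the standard of proof: how sure must we be before acting?
THRESHOLD = 0.8

email = {"contains_link": True, "unknown_sender": True, "all_caps_subject": False}
p = spam_probability(email)
verdict = "spam" if p >= THRESHOLD else "not spam"
```

Here the email looks somewhat suspicious (p ≈ 0.62), but it falls short of the 0.8 standard of proof, so it is not flagged. Lowering the threshold trades missed spam for more false accusations—exactly the judgment call left to humans.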
Even though classification is a mental process we can all perform subconsciously with ease, machines can only perform one tiny portion of the process.
Note also that I’ve framed the problem to be much easier for computers: computers typically require us to give them the specific concept we want to classify something into.** Humans know tens of thousands of concepts and figure out the right one to use in each circumstance.
What do these algorithms actually do?
Humans provide an understanding of the concept, but computers don’t have intelligence so a written explanation won’t do. There’s additional complexity here in translating a human understanding of the concept into a machine representation of the concept.
This problem is addressed by humans using their knowledge of the concept to provide many examples of cases when that concept is correctly and incorrectly applied. The algorithm then has to learn that same concept from the data provided.
This is a critical point: the machine can’t learn anything beyond what a human painstakingly teaches it.
The algorithm learns by recognizing patterns in the data provided by humans, with that data containing all the right answers. Once observed, the computer can attempt to apply these same patterns to data where it doesn’t have the answer key.
Learning…that’s a big deal. What does human learning look like?
To understand better what a computer is capable of, it’s worth comparing this to a clear case of high-quality human learning.
Consider what it takes to be a great math teacher. Bad math teachers just tell students a process to follow and drill it into students’ heads through repetition. Repetition is important to learning any math concept, but a great math teacher does something fundamentally different.
When presenting a new concept, a great math teacher slowly builds toward an understanding of a new idea. They provide a carefully chosen sequence of examples of problems that the students know how to solve. Then, they introduce a new problem beyond the range of what the students have been taught, but one that takes only a small step to understand.
The student then makes this mental leap, and essentially discovers this new idea or method.
Of course, this is only possible because the instructor creates the perfect environment to enable students to figure out how to take this next step on their own. Without well-chosen examples and just the right environment, a new technique would seem like a magic trick to memorize rather than an incremental advance that adds to a growing body of knowledge.
How does machine learning compare?
Computers don’t have an integrated body of knowledge to grow incrementally. Instead, each new concept learned is a brand new phenomenon learned from scratch.
Whereas humans think of the underlying reasons for things, machines are currently stuck at the level of pattern-recognition. A human can see one new problem, in the right context, and come up with a brand new idea. Computers are given millions of examples of one well-understood idea and asked to learn how to recognize it.
Classification is an important aspect of learning. But there’s much more to learning than classification. And there’s much more to classification than what computers are capable of.
While this discussion is primarily focused on classification, if you’re interested in a deeper comparison of the difference between human and computer learning you can find it here: How humans learn (and computers don’t)
*This is a simplification. In addition to getting data from sense perception or memory of sense perceptions, you can also get it from other people who communicate their own observations or conclusions. In any case, when the information is true, it all reduces back to sense perception. Relying on third parties does also raise the possibility of mistakenly believing a fabrication. All of this applies in the murder trial case and in more mundane, subconscious cases like telling a partner about your day.
**There are cases where computers are more advanced than this, such as identifying objects in images: systems have been trained to recognize thousands of different types of objects about as well as humans can. This typically requires an additional layer of complexity, evaluating each concept the system knows and picking the most relevant one, something humans don’t seem to need. Further, this wouldn’t work for recognizing justice or even action concepts like crawling. The only reason the computer can learn so many concepts is that all of the tasks look the same to it: each involves picking an area out of an image and assigning it a label. This is an advance in generalization. Still, humans are capable of classifying with any concept without needing all concepts to be so similar.
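That extra layer is mechanically simple: score every concept the system knows for a region of the image, then pick the highest score. The scores below are invented stand-ins for what a trained image model might output; the point is only that "knowing thousands of concepts" reduces to one uniform operation.

```python
# Hypothetical scores a trained image model might assign to one region of a photo.
concept_scores = {"cat": 0.05, "dog": 0.82, "bicycle": 0.02, "person": 0.11}

# The "additional layer": evaluate every known concept, keep the most relevant one.
best_label = max(concept_scores, key=concept_scores.get)
```

Every concept the system knows is handled by this same score-and-argmax step, which is why adding concept number 1,001 is easy for the machine, while a structurally different concept like justice remains out of reach.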