data types#
To work with real-world objects and ideas in Python, we have to represent them in a computational format, as data of a particular type. There are many data types in Python, but only a few of them will be important for us in this workshop.
The most common data types include:
integer
string
boolean
list
Other types you might hear about: floats, tuples, dicts, etc. You can read more about these on W3Schools.
the type() function#
Let’s start with the type() function to see the data type for different kinds of input. A function is a way of doing something to data in Python. We will talk more about functions in upcoming lessons.
As you may have already guessed, the type() function returns the type of the data that you put within the parentheses.
type(1)
int
The first type is int, which refers to integer, a whole number. In Python, an integer is distinguished from a number with a decimal, which has its own data type (called float, for “floating point number”).
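To make the difference between whole numbers and decimals concrete, here is a small sketch. The variable names are made up for this illustration. Note that when we wrap type() in print(), Python displays the type as <class 'int'> rather than the shorter int that a notebook cell shows.

```python
# Hypothetical variable names, chosen only for this illustration.
whole_number = 3
decimal_number = 3.0

# The same quantity, written with and without a decimal point,
# ends up as two different data types.
print(type(whole_number))    # <class 'int'>
print(type(decimal_number))  # <class 'float'>
```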
type('hello')
str
The next type is str, which refers to string, or a sequence of characters (including letters, numbers, punctuation, and symbols) enclosed by quotes. To a human, a string may look just like regular words, but to a computer, which doesn’t understand human language, it is simply a sequence of characters between two quotes.
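Because it is the quotes that make something a string, even digits become a string once they are quoted. The sketch below uses made-up variable names to show the same characters with and without quotes.

```python
# Hypothetical variable names, chosen only for this illustration.
number_as_text = '42'   # digits inside quotes
actual_number = 42      # digits without quotes

print(type(number_as_text))  # <class 'str'>
print(type(actual_number))   # <class 'int'>
```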
type(True)
bool
The bool data type stands for boolean, a True or False value. Why do booleans get their own data type? Because truth and falsity are the foundation of complex computational processes, like algorithms, that can “make decisions,” so to speak.
For now, booleans may not seem very useful, but we will see their power when we start to write conditional statements in a few lessons. Specifically, we will use them to execute certain actions depending on whether a value is True or False.
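As a small preview, here is a sketch of where booleans come from and how they can steer a decision. The variable names are hypothetical, and the if/else syntax is something we will cover properly in a later lesson, so treat it as a glimpse rather than something to memorize.

```python
# A comparison produces a boolean value.
is_bigger = 5 > 3
print(is_bigger)        # True
print(type(is_bigger))  # <class 'bool'>

# Careful: with quotes around it, True is just a string.
print(type('True'))     # <class 'str'>

# A preview of a conditional statement: which line runs
# depends on whether is_bigger is True or False.
if is_bigger:
    message = '5 is bigger than 3'
else:
    message = '5 is not bigger than 3'
print(message)
```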
type(['donut', 'smoothie', 'coffee', 'banana'])
list
Finally, the list data type describes a collection of items, such as a list of strings, like in the example above. Lists can also contain integers, other lists, and items of different data types.
Lists will be the most important data type for our work with text, as text-based data is often organized as strings within lists.
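The sketch below shows a list of strings next to a list that mixes several data types. The variable names are made up for this example, and the square brackets with a number (which pull a single item out of a list) are used here only to peek inside; we have not covered them yet.

```python
# Hypothetical variable names, chosen only for this illustration.
breakfast = ['donut', 'smoothie', 'coffee', 'banana']
mixed = ['donut', 2, True, ['coffee', 'tea']]

# The list itself has the type list...
print(type(breakfast))     # <class 'list'>

# ...while each item inside keeps its own type.
print(type(breakfast[0]))  # <class 'str'>
print(type(mixed[1]))      # <class 'int'>
print(type(mixed[2]))      # <class 'bool'>
print(type(mixed[3]))      # <class 'list'>
```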
data type as constraint#
As you may have figured out, all information that you want to work with in Python must fit into one of these data types. This means that objects and people in the real world, in their complexity and expressiveness, must fit into one of the data types.
Computer software (even a large language model!) does not process and understand human language the way humans do. For example, everything that humans consider a “word,” and all of the meanings associated with that word, becomes a simple string to a computer: a sequence of characters wrapped in quotation marks. To a computer, a string like “ocean” has no semantic meaning, though a large enough system (like a large language model) can process enough strings to get a sense of where “ocean” should fall in a sentence, and what words should surround it, perhaps “deep” or “blue”. But, as I explain in workshop 5 on Generating Text, this kind of understanding is a statistical inference, and doesn’t mean that the computer associates depth or color with the ocean, not to mention the various sensations that such words can evoke in humans.
Transforming real-world information into computer-readable data inevitably reduces or simplifies the original. Data types thus create a particular kind of constraint on data. That being said, these kinds of constraints can also be productive ones. Each data type comes with a number of methods that can be used on that particular type. You can do things with integers that you cannot do with strings, for example. You can divide one integer by another, but you cannot divide one string by another.
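Here is a minimal sketch of how a data type decides what you can do with a value. The try/except lines are something we have not covered; they are used here only so that the error gets printed instead of stopping the program. The variable names are made up for this example.

```python
# Integers support arithmetic.
total = 2 + 3
print(total)  # 5

# Strings come with their own methods, such as upper().
shout = 'ocean'.upper()
print(shout)  # OCEAN

# Mixing types in an operation that does not support them
# raises a TypeError.
try:
    result = 2 + 'ocean'
except TypeError as error:
    result = None
    print('TypeError:', error)
```

Errors like this one are the "headaches" of working with data types, but they are also Python's way of telling you which type it expected.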
While you’re still a beginner, the differences between data types will likely create headaches as you learn through errors (many, many errors) which methods can and cannot accept which types of data. Data types will feel like a nuisance, preventing you from doing what you want to your data. But as you advance, you’ll learn to leverage the different types to manipulate data in amazing ways.