In both Dask and pandas, you can use the .str accessor to apply string methods to every element of a column of string data. This is particularly useful when working with text data: pandas applies the method across the whole column in one call, and Dask applies it to each partition of the column, which lets the work run in parallel.
Here's an example of how to use .str to capitalize the first letter of each word in a column of strings:
    import dask.dataframe as dd

    # Create a dask dataframe from a CSV file
    df = dd.read_csv('data.csv')

    # Use .str.title() to capitalize the first letter of each word in the 'text' column
    df['text'] = df['text'].str.title()

    # Compute the result and print it
    print(df['text'].compute())
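This prints the 'text' column with the first letter of each word capitalized. Dask evaluates lazily, so nothing is read or transformed until compute() is called, at which point the result comes back as a pandas Series.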
In addition to title(), there are many other string methods that you can apply using .str, such as lower(), upper(), strip(), replace(), and split(), among others.
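As a rough illustration of those methods, here is a minimal sketch using pandas with made-up sample data (the DataFrame contents and variable names are assumptions, not part of the original example). The same .str calls should work on a Dask column, followed by compute().

```python
import pandas as pd

# Hypothetical sample data; the 'text' column name mirrors the example above.
df = pd.DataFrame({"text": ["  Hello World ", "dask AND pandas", "a,b,c"]})

# strip() removes leading and trailing whitespace from each string.
stripped = df["text"].str.strip()

# lower() and upper() change the case of every string.
lowered = stripped.str.lower()
uppered = stripped.str.upper()

# replace() substitutes a substring; regex=False treats the pattern literally.
replaced = lowered.str.replace(" and ", " & ", regex=False)

# split() breaks each string on a delimiter and returns a list per row.
parts = stripped.str.split(",")

print(replaced.tolist())
print(parts.tolist())
```

Because each call returns a new Series, the methods can be chained, for example df["text"].str.strip().str.lower().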
The .str accessor is a convenient way to work with text data: it lets you transform a whole column of strings with a single method call instead of an explicit loop, and with Dask those transformations run partition by partition in parallel.