Computing the fraction of long trips is a common task in data analysis, and it can be done efficiently in Python using generators. Here's an example of how you can use generators to compute the fraction of long trips in a dataset:
```python
import csv


def read_data(filename):
    # Generator function: yield each row of a CSV file as a list of strings.
    # Assumes the file has no header row; skip it first if yours has one.
    with open(filename, 'r', newline='') as f:
        for row in csv.reader(f):
            if row:  # skip blank lines
                yield row


def long_trip_fraction(data, threshold):
    # Generator function: after each row, yield the running fraction of
    # trips whose duration (first column) exceeds the threshold.
    num_long_trips = 0
    num_total_trips = 0
    for row in data:
        duration = float(row[0])
        if duration > threshold:
            num_long_trips += 1
        num_total_trips += 1
        yield num_long_trips / num_total_trips


# Example usage:
data = read_data('trips.csv')
long_trip_fractions = long_trip_fraction(data, 60.0)
for fraction in long_trip_fractions:
    print(fraction)
```
In this example, we define two generator functions: read_data and long_trip_fraction. The read_data function reads a CSV file one line at a time and yields each row as a list of strings. The long_trip_fraction function takes the rows produced by read_data and a threshold duration. After each row, it yields the running fraction of long trips seen so far. A long trip is one whose duration, taken from the first column, exceeds the threshold. The last value it yields is therefore the fraction of long trips in the whole dataset.
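To see exactly what long_trip_fraction yields at each step, the sketch below feeds it a small in-memory list instead of a file. Each row has the same shape as the lists read_data yields. The four durations and the 60.0 threshold are made-up values chosen for illustration.

```python
def long_trip_fraction(data, threshold):
    # Same logic as above: yield the running fraction of trips
    # whose duration (first column) exceeds the threshold.
    num_long_trips = 0
    num_total_trips = 0
    for row in data:
        duration = float(row[0])
        if duration > threshold:
            num_long_trips += 1
        num_total_trips += 1
        yield num_long_trips / num_total_trips


# Hypothetical rows, shaped like the lists read_data yields.
rows = [['30'], ['90'], ['45'], ['120']]

fractions = list(long_trip_fraction(rows, 60.0))
print(fractions)  # [0.0, 0.5, 0.3333333333333333, 0.5]
```

The last value, 0.5, is the fraction for the whole dataset: 2 of the 4 trips are longer than 60.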
To use these generator functions, we first create a data generator by calling read_data with the name of the data file. We then create a generator for the long trip fractions by calling long_trip_fraction with the data generator and a threshold of 60.0. This corresponds to 60 minutes if the durations in the file are recorded in minutes.
Finally, we iterate over the long trip fraction generator and print each fraction as it is generated.
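If you only need the overall fraction, you don't have to print every intermediate value. One option is to drain the generator and keep only its last value, using collections.deque with maxlen=1. The helper name overall_fraction is hypothetical. It returns None for an empty dataset, because no fraction is defined there.

```python
from collections import deque


def long_trip_fraction(data, threshold):
    # Running fraction of trips longer than threshold (same logic as above).
    num_long_trips = 0
    num_total_trips = 0
    for row in data:
        if float(row[0]) > threshold:
            num_long_trips += 1
        num_total_trips += 1
        yield num_long_trips / num_total_trips


def overall_fraction(data, threshold):
    # Hypothetical helper: consume the generator, keeping only the final value.
    # A deque with maxlen=1 discards earlier values, so memory stays constant.
    last = deque(long_trip_fraction(data, threshold), maxlen=1)
    return last[0] if last else None  # None when there are no rows


rows = [['30'], ['90'], ['45'], ['120']]
print(overall_fraction(rows, 60.0))  # 0.5
print(overall_fraction([], 60.0))    # None
```

With the file-based pipeline, the equivalent call would be overall_fraction(read_data('trips.csv'), 60.0).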
This approach reads, processes, and discards each row before reading the next one, so the data is handled in a streaming fashion. Memory use stays roughly constant regardless of file size, which means the fraction of long trips can be computed even when the dataset is too large to fit into memory.
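You can observe the streaming behavior directly. The sketch below pulls only the first three fractions with itertools.islice and counts how many rows the source actually produced. The synthetic_rows generator is a hypothetical stand-in for read_data: it could supply ten million rows, but it only produces a row when the pipeline asks for one.

```python
from itertools import islice


def long_trip_fraction(data, threshold):
    # Running fraction of trips longer than threshold (same logic as above).
    num_long_trips = 0
    num_total_trips = 0
    for row in data:
        if float(row[0]) > threshold:
            num_long_trips += 1
        num_total_trips += 1
        yield num_long_trips / num_total_trips


def synthetic_rows(n, counter):
    # Hypothetical stand-in for read_data: lazily yields up to n rows,
    # cycling through a few durations, and counts how many it produced.
    durations = [30, 90, 45, 120]
    for i in range(n):
        counter['produced'] += 1
        yield [str(durations[i % len(durations)])]


counter = {'produced': 0}
rows = synthetic_rows(10_000_000, counter)
first_three = list(islice(long_trip_fraction(rows, 60.0), 3))
print(first_three)          # [0.0, 0.5, 0.3333333333333333]
print(counter['produced'])  # 3
```

Only three rows are ever generated. Each stage requests the next item from the stage before it only when it needs one.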