Iterate over the rows of a pandas dataframe, and get 100 rows each time
You can process a pandas DataFrame in chunks of 100 rows by slicing it with the iloc indexer, which selects rows by integer position. Here's a step-by-step guide on how to achieve this:
1. Import pandas and create a DataFrame: If you haven't already, import pandas and create or load your DataFrame.
2. Determine the number of rows: Get the total number of rows in the DataFrame.
3. Iterate in chunks: Use a loop to iterate over the DataFrame in chunks of 100 rows.
Here's some sample code to demonstrate this:
import pandas as pd

# Sample DataFrame creation for demonstration
data = {
    'A': range(1, 1001),  # 1000 rows of data
    'B': range(1001, 2001)
}
df = pd.DataFrame(data)

# Determine the number of rows in the DataFrame
num_rows = len(df)

# Define the chunk size
chunk_size = 100

# Iterate over the DataFrame in chunks of 100 rows
for start in range(0, num_rows, chunk_size):
    # Clamp end so the printed range stays correct for a partial last chunk
    end = min(start + chunk_size, num_rows)
    chunk = df.iloc[start:end]  # rows at positions start .. end - 1

    # Process the chunk
    print(f"Processing rows {start} to {end - 1}")
    print(chunk)
    # Add your processing logic here
Import pandas: The import pandas as pd statement imports the pandas library.
Create a DataFrame: The data dictionary is used to create a sample DataFrame with 1000 rows.
Determine the number of rows: num_rows = len(df) gets the total number of rows in the DataFrame.
Define the chunk size: chunk_size = 100 sets the size of each chunk to 100 rows.
Iterate over the DataFrame: The for loop iterates over the DataFrame in steps of 100 rows. The start variable is the position of the first row in the chunk, and end is the position just past its last row. The iloc indexer selects the rows from start up to, but not including, end.
Process the chunk: Inside the loop, you can add your processing logic for each chunk. In this example, it simply prints the chunk and the range of rows being processed.
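One detail worth checking is what happens when the row count is not a multiple of the chunk size. The sketch below assumes a hypothetical 250-row DataFrame (not the 1000-row sample above) and records the length of each chunk. Because an iloc slice excludes its end position and does not raise when the end runs past the last row, the final chunk simply comes out shorter.

```python
import pandas as pd

# Hypothetical 250-row DataFrame, chosen so the last chunk is partial
df = pd.DataFrame({'A': range(250)})
chunk_size = 100

chunk_lengths = []
for start in range(0, len(df), chunk_size):
    # The slice end is exclusive and may exceed the number of rows
    chunk = df.iloc[start:start + chunk_size]
    chunk_lengths.append(len(chunk))

print(chunk_lengths)  # [100, 100, 50]
```

If your processing logic requires exactly 100 rows per chunk, you would need to handle that shorter final chunk explicitly.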
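This approach lets you work through a large DataFrame in manageable, fixed-size chunks. Keep in mind that the DataFrame is already fully loaded in memory here, so chunking organizes the processing rather than reducing memory use.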
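If you need this pattern in more than one place, the loop can be wrapped in a small generator. The following is a minimal sketch, not a pandas API: the name iter_chunks and its chunk_size parameter are hypothetical, and the per-chunk sum of column A is only a stand-in for your own processing logic.

```python
import pandas as pd

def iter_chunks(df, chunk_size=100):
    """Yield consecutive slices of df with at most chunk_size rows each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]

# Same sample data as in the answer above
df = pd.DataFrame({'A': range(1, 1001), 'B': range(1001, 2001)})

# Placeholder processing: sum column A within each chunk
chunk_sums = [int(chunk['A'].sum()) for chunk in iter_chunks(df)]
print(len(chunk_sums))  # 10 chunks of 100 rows
```

A generator keeps the slicing arithmetic in one place, so the calling code only has to deal with one chunk at a time.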