The following code shows how to create a dataset in which some rows have missing values, using Python.
import csv
import random

import numpy as np
from faker import Faker

# Set the random seeds for reproducibility
random.seed(42)
np.random.seed(42)
Faker.seed(42)  # also seed Faker so the generated names are repeatable

# Number of rows in the dataset
num_rows = 1000

# Number of rows with missing values
num_missing_rows = 150

# Initialize the Faker generator
fake = Faker()

# Generate the rows of the dataset
data = []
for i in range(num_rows):
    customer_name = fake.name()
    credit_score = random.randint(300, 900)
    loan_eligibility = 1 if credit_score > 600 else 0
    data.append([customer_name, credit_score, loan_eligibility])

# Introduce missing values in credit_score and loan_eligibility
missing_indices = random.sample(range(num_rows), num_missing_rows)
for index in missing_indices:
    # Columns 1 and 2 hold credit_score and loan_eligibility
    data[index][1] = data[index][2] = ''

# Save the data to a CSV file
with open('credit_score_dataset.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(['customer_name', 'credit_score', 'loan_eligibility'])
    csvwriter.writerows(data)

print("Dataset created and saved successfully.")
When you run this program, the dataset named credit_score_dataset.csv is created. The following figure shows the dataset.
You can see that the dataset contains missing values at random places.
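If you would rather confirm the missing values without opening the file, a short check along the following lines may help. This is only a sketch: the helper name count_missing_rows and the three-row sample are made up for illustration, and the helper assumes the same column names as the program above.

```python
import csv
import io


def count_missing_rows(lines, columns=("credit_score", "loan_eligibility")):
    """Return (total_rows, rows_with_a_blank_value) for CSV text.

    `lines` is any iterable of CSV lines, such as an open file object.
    The column names are assumed to match the header of the dataset.
    """
    total = 0
    missing = 0
    for row in csv.DictReader(lines):
        total += 1
        # A row counts as missing if any of the checked columns is blank
        if any(row[col] == "" for col in columns):
            missing += 1
    return total, missing


# Hypothetical three-row sample in the same layout as the dataset
sample = io.StringIO(
    "customer_name,credit_score,loan_eligibility\n"
    "Alice Example,710,1\n"
    "Bob Example,,\n"
    "Carol Example,450,0\n"
)
total, missing = count_missing_rows(sample)
print(f"{missing} of {total} rows have missing values")
```

To run the same check on the generated file, pass open('credit_score_dataset.csv', newline='') instead of the sample.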
The dataset-generation program works as follows:
- First, we import the necessary libraries.
- We use the faker library to generate the customer names.
- Next, we specify the total number of rows and the number of rows with missing values.
- Then, we create an empty list called data.
- In a for loop that iterates 1000 times, we create the values for each attribute. For instance, credit_score gets a random integer between 300 and 900, both inclusive. Similarly, loan_eligibility gets the value 1 if the credit_score is greater than 600. Otherwise, it gets the value 0.
- To insert missing values at random places, we use the random.sample() function. This function takes the range of row indices, 0 to 999, and the number of rows with missing values, and returns that many distinct indices. A for loop then assigns a blank value to the last two attributes of those rows.
- Finally, we open a CSV file in write mode. We first write the header row, and then we write the list named data, which contains 1000 rows.
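The step that blanks out random rows can be easier to follow on a smaller scale. The sketch below repeats it with 10 rows and 3 missing rows. The placeholder names and scores are invented for illustration, so it does not need the faker library.

```python
import random

random.seed(42)

num_rows = 10          # small illustrative size, not the 1000 used above
num_missing_rows = 3

# Placeholder rows: [customer_name, credit_score, loan_eligibility]
data = []
for i in range(num_rows):
    credit_score = random.randint(300, 900)
    loan_eligibility = 1 if credit_score > 600 else 0
    data.append([f"customer_{i}", credit_score, loan_eligibility])

# random.sample() returns distinct indices drawn from 0 .. num_rows - 1
missing_indices = random.sample(range(num_rows), num_missing_rows)

# Blank out the last two attributes of each selected row
for index in missing_indices:
    data[index][1] = data[index][2] = ''

# Collect the positions of the rows that now have blank values
blank_rows = [i for i, row in enumerate(data) if row[1] == '']

print("Rows selected:", sorted(missing_indices))
print("Rows with blanks:", blank_rows)
```

Because random.sample() never repeats an index, exactly num_missing_rows rows end up blank, and the customer name in each of those rows is left untouched.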