How to Create a Dataset in Python Using Some Random Values?

This article explains how to create a dataset in Python. We start from a small set of values, pick from them at random to build each record, and use the Faker library to fill in customer names and dates.

Python Program to Generate the Dataset

The following Python program shows how to generate this dataset.

import pandas as pd
import random
from faker import Faker

fake = Faker()

# Generate fabricated dataset
num_records = 1000
products = ["T-shirt", "Jeans", "Shoes", "Hat", "Jacket"]
categories = ["Clothing", "Footwear", "Accessories"]
data = []

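for _ in range(num_records):
    # Product and category are picked independently, so pairings are arbitrary
    product_name = random.choice(products)
    category = random.choice(categories)
    price = round(random.uniform(10, 200), 2)  # price between 10 and 200, two decimals
    quantity = random.randint(1, 50)  # whole number from 1 to 50, inclusive
    customer_name = fake.name()  # random full name from Faker
    date = fake.date_this_year()  # random date in the current year

    data.append([product_name, category, price, quantity, customer_name, date])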

columns = ["Product", "Category", "Price", "Quantity", "Customer Name", "Date"]
df = pd.DataFrame(data, columns=columns)

# Save dataset to a CSV file
df.to_csv("fabricated_ecommerce_dataset.csv", index=False)
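The program above picks the category independently of the product, so a row can pair "Shoes" with "Accessories". If consistent pairs matter, one option is to look the category up from the product. The following is a minimal sketch and not part of the original program: the PRODUCT_CATEGORY mapping and the make_record helper are hypothetical names introduced here, and the seed value 42 is arbitrary.

```python
import random

# Hypothetical mapping (an assumption for illustration): each product
# belongs to exactly one of the article's three categories.
PRODUCT_CATEGORY = {
    "T-shirt": "Clothing",
    "Jeans": "Clothing",
    "Jacket": "Clothing",
    "Shoes": "Footwear",
    "Hat": "Accessories",
}


def make_record(rng):
    """Return [product, category, price, quantity] with a matching category."""
    product = rng.choice(list(PRODUCT_CATEGORY))
    category = PRODUCT_CATEGORY[product]
    price = round(rng.uniform(10, 200), 2)
    quantity = rng.randint(1, 50)
    return [product, category, price, quantity]


# A seeded generator makes the "random" rows repeatable between runs.
rng = random.Random(42)
records = [make_record(rng) for _ in range(5)]
for row in records:
    print(row)
```

Using a dedicated random.Random instance keeps the seeding local to this helper, so other code that relies on the global random module is not affected.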

Output

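When we run the above program, it creates a CSV file named ‘fabricated_ecommerce_dataset.csv’ in the current working directory. The file contains a header row with the columns Product, Category, Price, Quantity, Customer Name, and Date, followed by one row for each of the 1000 generated records.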

What is Faker?

Faker is a popular open-source library for generating fake data. It is commonly used in software development, especially during the testing and development phases, to simulate realistic data without using actual sensitive or confidential information. Faker can generate a wide range of fake data, such as names, addresses, phone numbers, email addresses, and dates.

Developers use Faker to populate databases, create mockups, test user interfaces, and simulate various scenarios, all without the need for real data. This can help identify potential issues, test edge cases, and ensure that software behaves as expected under different circumstances.

Additionally, Faker is available in various programming languages, not just in Python, and there are different implementations and versions of the library for different platforms and frameworks. In general, it simplifies the process of generating diverse and plausible data, which makes it a valuable tool for developers across different domains.

Creating Dataset

The other Python library we use here is pandas. We first create a dataframe by passing the list of records and the column names to the pandas DataFrame() constructor. After that, we call the dataframe's to_csv() method, which creates the corresponding CSV file and writes it to disk.
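The two pandas calls can be seen in isolation in the following sketch. The three rows and the file name sample_dataset.csv are made up for illustration, and the file is written to a temporary directory so that nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Made-up rows for illustration; the real program builds 1000 of these.
rows = [
    ["T-shirt", "Clothing", 19.99, 3],
    ["Shoes", "Footwear", 89.50, 1],
    ["Hat", "Accessories", 12.00, 2],
]
columns = ["Product", "Category", "Price", "Quantity"]

# Build the DataFrame from a list of rows plus column labels.
df = pd.DataFrame(rows, columns=columns)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_dataset.csv")
    # index=False leaves the 0, 1, 2, ... row index out of the file.
    df.to_csv(path, index=False)
    # Read the file back to check what was written.
    loaded = pd.read_csv(path)

print(loaded)
```

Without index=False, to_csv() would write the row index as an extra unnamed first column, which is usually not wanted in a dataset file.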
