How to Create a Dataset in Python Using Some Random Values?

This article explains how to create a dataset in Python. We start from a small set of predefined values and build the dataset by picking from those values at random.

Python Program to Generate the Dataset

The following Python program shows how to generate this dataset.

import pandas as pd
import random
from faker import Faker

fake = Faker()

# Generate fabricated dataset
num_records = 1000
products = ["T-shirt", "Jeans", "Shoes", "Hat", "Jacket"]
categories = ["Clothing", "Footwear", "Accessories"]
data = []

for _ in range(num_records):
    product_name = random.choice(products)
    category = random.choice(categories)
    price = round(random.uniform(10, 200), 2)
    quantity = random.randint(1, 50)
    customer_name = fake.name()  # random full name generated by Faker
    date = fake.date_this_year()

    data.append([product_name, category, price, quantity, customer_name, date])

columns = ["Product", "Category", "Price", "Quantity", "Customer Name", "Date"]
df = pd.DataFrame(data, columns=columns)

# Save dataset to a CSV file
df.to_csv("fabricated_ecommerce_dataset.csv", index=False)


When we run the above program, it creates a CSV file named 'fabricated_ecommerce_dataset.csv' and saves it in the current working directory. The CSV file looks like this.

[Image: How to Create a Dataset in Python]
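
Note that the program above picks the product and the category independently, so a row may pair "Shoes" with "Clothing". If consistent pairs matter, one option is to look the category up from the product. The following is only a sketch: the product_categories mapping and the random_product_row() helper are hypothetical and not part of the original program.

```python
import random

# Hypothetical product-to-category mapping (an assumption for illustration).
product_categories = {
    "T-shirt": "Clothing",
    "Jeans": "Clothing",
    "Shoes": "Footwear",
    "Hat": "Accessories",
    "Jacket": "Clothing",
}


def random_product_row():
    """Return a (product, category) pair that always agrees with the mapping."""
    product = random.choice(list(product_categories))
    return product, product_categories[product]


# Print a few sample pairs.
for _ in range(3):
    print(random_product_row())
```

In the main loop, the two separate random.choice() calls could then be replaced by a single call to such a helper.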

What is Faker?

Faker is a popular open-source library for generating fake data. It is commonly used in software development, especially during the testing and development phases, to simulate realistic data without using actual sensitive or confidential information. Faker can generate a wide range of fake data, such as names, addresses, phone numbers, email addresses, and dates.

Developers use Faker to populate databases, create mockups, test user interfaces, and simulate various scenarios without needing real data. This helps identify potential issues, test edge cases, and ensure that software behaves as expected under different circumstances.

Additionally, Faker is not limited to Python: it is available in various programming languages, and there are different implementations and versions of the library for different platforms and frameworks. In general, it simplifies the process of generating diverse and plausible data, which makes it a valuable tool for developers across different domains.

Creating Dataset

The other important Python library used here is pandas. We create a DataFrame with the pandas DataFrame() constructor and then call its to_csv() method, which creates the corresponding CSV file and writes it to disk.
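
The same two steps can be tried on a tiny scale. The sketch below writes two hand-written rows (illustrative values only) to a temporary file, then reads the file back to check the result. The file name sample_dataset.csv and the temporary directory are assumptions made so that the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Two illustrative rows with the same columns as the main program.
rows = [
    ["Hat", "Accessories", 19.99, 3, "Jane Doe", "2024-01-15"],
    ["Shoes", "Footwear", 74.50, 1, "John Roe", "2024-02-03"],
]
columns = ["Product", "Category", "Price", "Quantity", "Customer Name", "Date"]
df = pd.DataFrame(rows, columns=columns)

# index=False keeps the row index out of the CSV file.
path = os.path.join(tempfile.mkdtemp(), "sample_dataset.csv")
df.to_csv(path, index=False)

# Read the file back, parsing the Date column as datetimes.
loaded = pd.read_csv(path, parse_dates=["Date"])
print(loaded.head())
print(loaded.dtypes)
```

Without index=False, to_csv() would also write the DataFrame's row index as an extra, unnamed first column.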
