This article explains how to create a dataset in Python. We start from a small set of values and generate the dataset by sampling from those values at random.
Python Program to Generate the Dataset
The following Python program shows how to generate this dataset.
import pandas as pd
import random
from faker import Faker
fake = Faker()
# Generate fabricated dataset
num_records = 1000
products = ["T-shirt", "Jeans", "Shoes", "Hat", "Jacket"]
categories = ["Clothing", "Footwear", "Accessories"]
data = []
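for _ in range(num_records):
    # Pick a product and a category independently from the small value sets
    product_name = random.choice(products)
    category = random.choice(categories)
    # Random price between 10 and 200 (two decimals) and quantity between 1 and 50
    price = round(random.uniform(10, 200), 2)
    quantity = random.randint(1, 50)
    # Faker supplies a plausible customer name and a date in the current year
    customer_name = fake.name()
    date = fake.date_this_year()
    data.append([product_name, category, price, quantity, customer_name, date])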
columns = ["Product", "Category", "Price", "Quantity", "Customer Name", "Date"]
df = pd.DataFrame(data, columns=columns)
# Save dataset to a CSV file
df.to_csv("fabricated_ecommerce_dataset.csv", index=False)
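Note that the program picks the product and the category independently, so a row can pair, for example, Shoes with Accessories. If consistent pairs matter, one option is to derive the category from the product. The following is only a sketch under that assumption: the PRODUCT_CATEGORIES mapping, the make_rows helper, and its seed parameter are hypothetical names that are not part of the original program, and the customer name and date columns are left out to keep it short.

```python
import random

# Hypothetical mapping (an assumption, not from the original program):
# every product belongs to exactly one category.
PRODUCT_CATEGORIES = {
    "T-shirt": "Clothing",
    "Jeans": "Clothing",
    "Jacket": "Clothing",
    "Shoes": "Footwear",
    "Hat": "Accessories",
}


def make_rows(num_records, seed=None):
    """Return num_records rows of [product, category, price, quantity]."""
    # A private generator: passing the same seed reproduces the same rows
    rng = random.Random(seed)
    rows = []
    for _ in range(num_records):
        product = rng.choice(list(PRODUCT_CATEGORIES))
        category = PRODUCT_CATEGORIES[product]  # looked up, not drawn at random
        price = round(rng.uniform(10, 200), 2)
        quantity = rng.randint(1, 50)
        rows.append([product, category, price, quantity])
    return rows


rows = make_rows(5, seed=42)
for row in rows:
    print(row)
```

Calling make_rows twice with the same seed returns identical rows, which can be handy when the dataset feeds a test that should not change between runs.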
Output
When we run the generator program, it creates a CSV file named ‘fabricated_ecommerce_dataset.csv’ in the current working directory. Each row of the file is one record with the columns Product, Category, Price, Quantity, Customer Name, and Date.
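To check the result, the file can be loaded back with pandas. This is a minimal sketch that assumes the generator program has already been run in the same directory; preview_dataset is a hypothetical helper name, and because the values are random, the rows printed will differ from run to run.

```python
from pathlib import Path

import pandas as pd


def preview_dataset(path, n=5):
    """Load the CSV and return its first n rows, with Date parsed as a datetime."""
    df = pd.read_csv(path, parse_dates=["Date"])
    return df.head(n)


# File name taken from the generator program; skipped if the file is not there
csv_path = Path("fabricated_ecommerce_dataset.csv")
if csv_path.exists():
    print(preview_dataset(csv_path))
```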

What is Faker?
Faker is a popular open-source library for generating fake data. It is commonly used in software development, especially during the testing and development phases, to simulate realistic data without using actual sensitive or confidential information. Faker can generate a wide range of fake data, such as names, addresses, phone numbers, email addresses, and dates.
Developers use Faker to populate databases, create mockups, test user interfaces, and simulate various scenarios, all without the need for real data. This helps to identify potential issues, test edge cases, and check that software behaves as expected under different circumstances.
Additionally, Faker is not limited to Python. Libraries with the same name and purpose exist for several other programming languages, with different implementations and versions for different platforms and frameworks. In general, Faker simplifies the process of generating diverse and plausible data, which makes it a valuable tool for developers across different domains.
Creating Dataset
The other Python library we use here is pandas. We build a dataframe from the list of records with the pandas DataFrame() constructor, passing the column names through the columns argument. After that, we call the dataframe's to_csv() method, which creates the corresponding CSV file and writes it to disk. Passing index=False leaves the row index out of the file.
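The two pandas steps can be tried on a tiny hand-written dataset. In this sketch the two rows are made-up sample values, and to_csv() is called without a file path so that it returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

# Two hand-written sample rows in the same shape as the generated records
rows = [
    ["T-shirt", "Clothing", 19.99, 3],
    ["Shoes", "Footwear", 74.50, 1],
]
columns = ["Product", "Category", "Price", "Quantity"]

# Each inner list becomes one row; columns supplies the header names
df = pd.DataFrame(rows, columns=columns)

# With no path argument, to_csv returns the CSV content as a string;
# index=False omits the 0, 1, ... row labels
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name as the first argument, as the main program does, writes the same text to disk.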
