Database in Python using psycopg2

7 min readDec 25, 2021

One of the most essential and basic requirement in Machine Learning is having a database. It is here where you will save and retrieve your data for experimentation. It will be really cumbersome to use csv or excel to save huge data 😅. There are many types of database amongst which we will be concentrating on Relational Database. We will see how to execute basic functions in database using Python (this blog assumes that the reader has a basic understanding of PostgreSQL). Let’s start 🤸‍♂️

PostgreSQL

Now there are many Relational Database Management System ( RDBMS) software like SQL, MySQL, Oracle etc. Basically in a RDBMS system, data is saved in rows and column format. The data is stored in such a way that the tables created are related to each other. By using some of the above software you can create, read, update and delete (CRUD operation) database data. In this post we will be using PostgreSQL as our database software. First the installation of PostgreSQL is necessary which can be found here — https://www.postgresql.org/download/
If you want to get a more detailed understanding of PostgreSQL, there is a very good playlist available in YouTube to which the link is — https://www.youtube.com/watch?v=XQ_6G0iCyMQ&list=PLwvrYc43l1MxAEOI_KwGe8l42uJxMoKeS
This is not the only one 😅. There are a ton of other videos available in YouTube. Choose anyone which suits you!

Psycopg2

To use PostgreSQL in Python we will use psycopg2 as it is is the most popular PostgreSQL database adapter for the Python programming language. To install this in your anaconda environment, use the command — conda install -c anaconda psycopg2.
Another important package that we will be using is the sqlalchemy. This is already available when installing Anaconda.
We have all the ingredients for cooking our code. Let’s start!

CRUD (Create, Read, Update, Delete)

These are the basic operations involved in a database. The acronym itself is self explanatory 😅. It starts with creating a database. Let’s see how we can do that.
In the previous post I had scraped data from a scraping website and stored it in a DataFrame. You can check the blog here: https://evrythngunder3d.wordpress.com/2021/11/06/web-scraping-using-selenium-in-python/

or on medium — https://shrinandkadekodi.medium.com/web-scraping-using-selenium-in-python-64a3502944ae
I stored the DataFrame in a pickle file and read it in here. Now we have the data available with us to insert it into the database.
Create:
The below code shows how to create a database using psycopg2. The first step is to connect to database. It could be possible that database may not be available. For such a case I have created a default database connection.

# function to create connection
def createConnection(dbname):
    # this executes when no database has been created
    if dbname == 'default':
        con = psycopg2.connect(user="postgres",
                               password="your_password",
                               host="localhost",
                               port="5432")
        # this is required for auto committing changes
        con.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    # this executes when database has been created
    else:
        con = psycopg2.connect(user="postgres",
                               database=dbname,
                               password="your_password",
                               host="localhost",
                               port="5432")
    return con

Now if we have connected to a database, we also need to release the connection once work has been done. The below function releases the connection and prints a message.

# function to destroy connection
def destroyConnection(con):
    try:
        cursor = con.cursor()
        cursor.close()
        con.close()
        print("PostgreSQL connection is closed")
    except :
        pass

Once the database is created, which in our case is the ‘gamedata’, we will insert the DataFrame directly into table using sqlalchemy. To insert the data, we need a table in the database. The below code shows how we can create a table. It also shows how to insert entire DataFrame using inbuilt function of pandas.

# function to create database and insert data
def createData(processedData):
    # creates connection to databse with some default settings
    con = createConnection('default')
    cursor = con.cursor()
    # database name and code to create table in the database
    name_Database = "gamedata"
    tableCreation = "CREATE TABLE gamewarehouse (id serial PRIMARY KEY,teams varchar(200),year integer,\
    wins integer,losses integer)"
    
    try:
        # this checks if database is available or not
        checkDatabase = "SELECT datname FROM pg_catalog.pg_database WHERE datname='gamedata';"
        cursor.execute(checkDatabase)
        result = cursor.fetchone()
        # this will create database if not available
        if not result:
            # first create database
            sqlCreateDatabase = "create database "+name_Database+";"
            cursor.execute(sqlCreateDatabase)
            destroyConnection(con)
            
            # second create table
            con = createConnection(name_Database)            
            cursor = con.cursor()
            cursor.execute(tableCreation)
            con.commit() 
            destroyConnection(con)
            
            # this is required to insert pandas dataframe directly into database
            engine = create_engine('postgresql+psycopg2://postgres:your_password@localhost:5432/gamedata')
            processedData.to_sql('gamewarehouse', engine, if_exists='append',index=False)            
            engine.dispose()
            print('Database created successfully!');           
            
    except (Exception, Error) as error:
        print("Error while connecting to PostgreSQL", error)
    finally:
        if (con):
            destroyConnection(con)

As the above code shows, we have created id as a primary key which will be a serial number starting from 1 and will be unique. Also the creation of engine using sqlalchemy is necessary so as to insert the entire DataFrame . Pandas inbuilt function to_sql requires it. I was not able to use the simple connection object of psycopg2 to insert the DataFrame. Let me know in comments how this could be done 😅.
Running the above functions, we can see the results in PostgreSQL as below:

Read:
Read is very much simple as we just need to fetch data from the database. You can add various conditions to it like: get rows which have wins > 10 and so on. But in this tutorial we will simply select all rows. The below function shows this:

def readData():
    # connect to database
    name_Database = "gamedata" 
    con = createConnection(name_Database) 
    cursor = con.cursor()    
         
    # check if database is available
    checkDatabase = "SELECT datname FROM pg_catalog.pg_database WHERE datname='gamedata';"
    cursor.execute(checkDatabase)
    result = cursor.fetchone()
    if result:
        # select all data from table and give in dataframe
        selectQuery = "SELECT * from gamewarehouse"
        allRecords = pandas.read_sql_query(selectQuery,con)
        if (con):
            destroyConnection(con)
    return allRecords

As the above code shows, I have simply used SELECT * statement. This will provide me all the data available in the gamewarehouse table. Another thing you can notice is that the returned records are again converted back to pandas DataFrame using read_sql_query function.

Update:
The next basic operation we will see is Update. Here we will be updating some information in the columns. This too is a simple process. The below function shows this:

def updateData(setColumnName,setColumnData,whereColumnName,whereColumnData):
    name_Database = "gamedata" 
    con = createConnection(name_Database) 
    cursor = con.cursor()
    sqlQuery =  (f"UPDATE gamewarehouse SET {setColumnName} = {setColumnData} "
                 f"WHERE {whereColumnName} = {whereColumnData};")
    cursor.execute(sqlQuery)
    con.commit()
    if (con):
        destroyConnection(con)

In this code you can see that to update data in a table, you have to use some keywords like SET and WHERE (in addition to UPDATE). The SET keyword is where the column whose data has to be changed and data is updated. The WHERE clause gives the indication as to where the data lies.
eg: UPDATE gamewarehouse SET wins = 100 WHERE id = 50;
This will update the wins columns of id 50 to 100.

Delete:
The last operation that we will see is the Delete operation. Use this cautiously as deleting data is irrevocable process! The process is very straightforward. You can delete tables, rows or even databases. The below function shows how to delete a row:

def deleteData(whereColumnName,whereColumnData):
    name_Database = "gamedata" 
    con = createConnection(name_Database) 
    cursor = con.cursor()
    sqlQuery =  (f"DELETE FROM gamewarehouse WHERE {whereColumnName} = {whereColumnData};")
    cursor.execute(sqlQuery)
    con.commit()
    if (con):
        destroyConnection(con)

Again using WHERE we will specify the row to delete.
eg: DELETE FROM gamewarehouse WHERE id = 1;

The whole code for reference:

# function to create connection
def createConnection(dbname):
    # this executes when no database has been created
    if dbname == 'default':
        con = psycopg2.connect(user="postgres",
                               password="your_password",
                               host="localhost",
                               port="5432")
        # this is required for auto committing changes
        con.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    # this executes when database has been created
    else:
        con = psycopg2.connect(user="postgres",
                               database=dbname,
                               password="your_password",
                               host="localhost",
                               port="5432")
    return con
 
# function to destroy connection
def destroyConnection(con):
    try:
        cursor = con.cursor()
        cursor.close()
        con.close()
        print("PostgreSQL connection is closed")
    except :
        pass 
  
  
# function to create database and insert data
def createData(processedData):
    # creates connection to databse with some default settings
    con = createConnection('default')
    cursor = con.cursor()
    # database name and code to create table in the database
    name_Database = "gamedata"
    tableCreation = "CREATE TABLE gamewarehouse (id serial PRIMARY KEY,teams varchar(200),year integer,\
    wins integer,losses integer)"
    
    try:
        # this checks if database is available or not
        checkDatabase = "SELECT datname FROM pg_catalog.pg_database WHERE datname='gamedata';"
        cursor.execute(checkDatabase)
        result = cursor.fetchone()
        # this will create database if not available
        if not result:
            # first create database
            sqlCreateDatabase = "create database "+name_Database+";"
            cursor.execute(sqlCreateDatabase)
            destroyConnection(con)
            
            # second create table
            con = createConnection(name_Database)            
            cursor = con.cursor()
            cursor.execute(tableCreation)
            con.commit() 
            destroyConnection(con)
            
            # this is required to insert pandas dataframe directly into database
            engine = create_engine('postgresql+psycopg2://postgres:your_password@localhost:5432/gamedata')
            processedData.to_sql('gamewarehouse', engine, if_exists='append',index=False)            
            engine.dispose()
            print('Database created successfully!');           
            
    except (Exception, Error) as error:
        print("Error while connecting to PostgreSQL", error)
    finally:
        if (con):
            destroyConnection(con)
   
def readData():
    # connect to database
    name_Database = "gamedata" 
    con = createConnection(name_Database) 
    cursor = con.cursor()    
         
    # check if database is available
    checkDatabase = "SELECT datname FROM pg_catalog.pg_database WHERE datname='gamedata';"
    cursor.execute(checkDatabase)
    result = cursor.fetchone()
    if result:
        # select all data from table and give in dataframe
        selectQuery = "SELECT * from gamewarehouse"
        allRecords = pandas.read_sql_query(selectQuery,con)
        if (con):
            destroyConnection(con)
    return allRecordsdef updateData(setColumnName,setColumnData,whereColumnName,whereColumnData):
    name_Database = "gamedata" 
    con = createConnection(name_Database) 
    cursor = con.cursor()
    sqlQuery =  (f"UPDATE gamewarehouse SET {setColumnName} = {setColumnData} "
                 f"WHERE {whereColumnName} = {whereColumnData};")
    cursor.execute(sqlQuery)
    con.commit()
    if (con):
        destroyConnection(con) 
  
  
def deleteData(whereColumnName,whereColumnData):
    name_Database = "gamedata" 
    con = createConnection(name_Database) 
    cursor = con.cursor()
    sqlQuery =  (f"DELETE FROM gamewarehouse WHERE {whereColumnName} = {whereColumnData};")
    cursor.execute(sqlQuery)
    con.commit()
    if (con):
        destroyConnection(con)  
  
  
# some preprocessing  
dataToLoad = getData()
dataToLoad = dataToLoad.rename(columns={'Team Name':'Teams'})
dataToLoad.columns = dataToLoad.columns.str.lower()
# Create
createData(dataToLoad)
# Read
getDataFrame = readData()
# Update
updateData('wins',100,'id',50)
# Delete
deleteData('id',1)

To summarize, we saw the application of psycopg2 to connect and insert DataFrame into databases and also performed some simple CRUD operations.
Let me know if this was helpful!

References:
- A lot of googling amongst which the major sources were stackoverflow.com, medium.com, documentation from postgresql.org and youtube.com

Originally published at http://evrythngunder3d.wordpress.com on December 25, 2021.

Database in Python using psycopg2

Written by Shrinand Kadekodi