How to Read Files from Google Drive Folder using Python


In this article, we will go step by step through connecting Python to Google Drive in order to list all files, the files in a particular folder, and the folders themselves, in a simple and systematic way.

Step 1: Create a Google service account

To work with Google APIs, you first need to create a service account with your Google account. Follow these steps:
1- Go to the Google APIs Console.
2- Create a new project.
3- Click Enable API. Search for and enable the Google Drive API.
4- Create credentials for a Web Server to access Application Data.
5- Name the service account and grant it a Project Role of Editor.
6- Download the JSON file.
7- (Optional) Copy that JSON file to your code directory or any other directory and rename it, for example to gcredentials.json, the file name that the code in Step 3 loads.


Step 2: Share the files and folders with the service account

In this step, share the files and folders that you would like to read with the service account's email address, which has the form XXXX@XXXX.iam.gserviceaccount.com. This gives the service account permission to read those files and folders.
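
The address to share with is stored in the JSON key file downloaded in Step 1, under the client_email field. Here is a small sketch for reading it. The helper name get_service_account_email is hypothetical and is not part of any Google library.

```python
import json


def get_service_account_email(key_path):
    """Return the service account's email address from a JSON key file.

    Hypothetical helper: it only reads the "client_email" field that
    service account key files contain.
    """
    with open(key_path, encoding="utf-8") as key_file:
        return json.load(key_file)["client_email"]


# Example usage (the path is a placeholder, adjust it to your setup):
# print(get_service_account_email("PATH_TO_CRED_FOLDER/gcredentials.json"))
```

Share each file or folder with the printed address in the same way you would share it with a person.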


Step 3: Read Function

# --------- IMPORTS -----------
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build

scopes = ['https://www.googleapis.com/auth/spreadsheets',
          "https://www.googleapis.com/auth/drive.file",
          "https://www.googleapis.com/auth/drive"]
path = "PATH_TO_CRED_FOLDER"  # folder that contains the JSON key file
creds = ServiceAccountCredentials.from_json_keyfile_name(
    f"{path}/gcredentials.json", scopes)

# --------- FUNCTIONS -----------
def _get_folder_name(folder_id):
    #------------- Build Service -------------
    service = build('drive', 'v3', credentials=creds)
    try:
        return service.files().get(fileId=folder_id).execute()['name']
    except Exception:  # e.g. the ID is wrong or not shared with the service account
        return None
    
def get_files_from_google_folder(folder_id):
    #------------- Build Service -------------
    service = build('drive', 'v3', credentials=creds)
    if folder_id:
        query = f"'{folder_id}' in parents"
        folder_name = _get_folder_name(folder_id)
    else:
        query = ""
        folder_name = None
        
    # The fields string must be a single string literal on one line
    results = service.files().list(
        q=query,
        pageSize=1000,
        fields="nextPageToken, files(id, name, mimeType, parents, fileExtension)",
    ).execute()
    items = results.get('files', [])
    
    # ----------- Create DataFrame -------------
    df_files = pd.DataFrame(data=items)
    df_files['parent_folder_name'] = folder_name

    return df_files


# ----------- MAIN -------------
FOLDER_ID = "" # PASTE HERE THE FOLDER ID
df_files = get_files_from_google_folder(FOLDER_ID)

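Now if you print df_files, you will see one row per file, with columns such as id, name, mimeType, parents and fileExtension (as requested in the fields parameter), plus the parent_folder_name column added by the function.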
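
The query string passed to files().list() can also restrict the results by type, which is useful for listing only folders. Here is a minimal sketch, assuming a hypothetical helper build_drive_query that is not part of the original code or of the Drive client library. Drive marks folders with the MIME type application/vnd.google-apps.folder. The "trashed = false" condition is an extra assumption added here to skip items in the trash.

```python
FOLDER_MIME_TYPE = "application/vnd.google-apps.folder"


def build_drive_query(folder_id=None, folders_only=False):
    """Hypothetical helper: build the `q` string for service.files().list()."""
    # Assumption: items in the trash should not be listed.
    conditions = ["trashed = false"]
    if folder_id:
        # Only direct children of the given folder.
        conditions.append(f"'{folder_id}' in parents")
    if folders_only:
        # Only sub-folders, no regular files.
        conditions.append(f"mimeType = '{FOLDER_MIME_TYPE}'")
    return " and ".join(conditions)


# Example usage ("abc123" is a placeholder folder ID):
# build_drive_query("abc123", folders_only=True)
```

The returned string can be used as the q argument in place of the query variable built inside get_files_from_google_folder.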

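Please note that the Google Drive API returns at most 1000 files per request. To list more, request the following pages using the nextPageToken value returned with each response.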
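
To get past that limit, the request can be repeated with the nextPageToken from each response until no token is returned. Here is a minimal sketch, assuming a hypothetical helper list_all_files that takes a service object built with build('drive', 'v3', credentials=creds) as in Step 3.

```python
def list_all_files(service, query="", page_size=1000):
    """Hypothetical helper: collect all files matching `query`, page by page."""
    files = []
    page_token = None  # None requests the first page
    while True:
        response = service.files().list(
            q=query,
            pageSize=page_size,
            fields="nextPageToken, files(id, name, mimeType, parents, fileExtension)",
            pageToken=page_token,
        ).execute()
        files.extend(response.get("files", []))
        # The API omits nextPageToken on the last page.
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return files


# Example usage (FOLDER_ID as defined in Step 3):
# items = list_all_files(service, f"'{FOLDER_ID}' in parents")
# df_files = pd.DataFrame(data=items)
```

The result is a plain list of dictionaries, so it can be passed to pd.DataFrame in the same way as items in the function above.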

Finally, I hope this article was helpful. If you like my content and want to support me, you can buy me a coffee: https://www.buymeacoffee.com/sambadie

