How to Read Files from a Google Drive Folder using Python
In this article, we will go step by step through connecting Python to Google Drive, so that you can read all files, the files in a specific folder, and folders in a simple, systematic way.
Step 1: Create Google Service account
To work with Google APIs from your own code, the first thing to do is create a service account under your Google account. To do that, follow these steps:
1- Go to the Google APIs Console
2- Create a new project.
3- Click Enable API. Search for and enable the Google Drive API.
4- Create credentials for a Web Server to access Application Data.
5- Name the service account and grant it a Project Role of Editor.
6- Download the JSON file.
7- (Optional) Copy that JSON file to your code directory or any other directory and rename it. The code in Step 3 expects the file to be named gcredentials.json.
Step 2: Share the files, and folders with the service account
In this step, share the files and folders that you would like to read with the service account's email address, which has the form XXXX@XXXX.iam.gserviceaccount.com. This gives the service account permission to read those files and folders.
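One way to find the address to share with is to read it from the downloaded key file. The sketch below assumes the file is a standard service account key, which stores the address in its client_email field. The helper name get_service_account_email and the example path are only illustrative.

```python
import json


def get_service_account_email(key_path):
    """Return the service account address stored in a JSON key file."""
    with open(key_path, encoding="utf-8") as f:
        key_data = json.load(f)
    # Assumption: the file is a service account key with a client_email field.
    return key_data["client_email"]


# Example with a hypothetical path:
# print(get_service_account_email("PATH_TO_CRED_FOLDER/gcredentials.json"))
```

The printed address is the one to enter in the Drive sharing dialog.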
Step 3: Read Function
# --------- IMPORTS -----------
import pandas as pd
# Note: oauth2client is deprecated in favor of google-auth, but still works here.
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# Scopes requested for the service account
scopes = ['https://www.googleapis.com/auth/spreadsheets',
          'https://www.googleapis.com/auth/drive.file',
          'https://www.googleapis.com/auth/drive']

# Folder that contains the JSON key file downloaded in Step 1
path = "PATH_TO_CRED_FOLDER"
creds = ServiceAccountCredentials.from_json_keyfile_name(
    f"{path}/gcredentials.json", scopes)


# --------- FUNCTIONS -----------
def _get_folder_name(folder_id):
    """Return the folder's name, or None if the API request fails."""
    # ------------- Build Service -------------
    service = build('drive', 'v3', credentials=creds)
    try:
        return service.files().get(fileId=folder_id).execute()['name']
    except HttpError:
        # For example: wrong ID, or folder not shared with the service account
        return None


def get_files_from_google_folder(folder_id):
    """Return a DataFrame with one row per file or folder found."""
    # ------------- Build Service -------------
    service = build('drive', 'v3', credentials=creds)
    if folder_id:
        # Only items whose direct parent is the given folder
        query = f"'{folder_id}' in parents"
        folder_name = _get_folder_name(folder_id)
    else:
        # No folder given: list everything the service account can see
        query = ""
        folder_name = None
    results = service.files().list(
        q=query,
        pageSize=1000,
        fields="nextPageToken, "
               "files(id, name, mimeType, parents, fileExtension)").execute()
    items = results.get('files', [])
    # ----------- Create DataFrame -------------
    df_files = pd.DataFrame(data=items)
    df_files['parent_folder_name'] = folder_name
    return df_files


# ----------- MAIN -------------
# PASTE HERE THE FOLDER ID (the last part of the folder's URL in Google Drive)
FOLDER_ID = ""
df_files = get_files_from_google_folder(FOLDER_ID)
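Now if you print df_files, you will see one row per file or folder, with the columns requested above (id, name, mimeType, parents and, where available, fileExtension) plus parent_folder_name.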
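Because the listing mixes files and subfolders, it can help to separate them by MIME type. The following is a minimal sketch, not part of the original code: it relies on Drive marking folders with the MIME type application/vnd.google-apps.folder, and the helper name split_folders_and_files and the sample entries are made up for illustration.

```python
FOLDER_MIME_TYPE = "application/vnd.google-apps.folder"


def split_folders_and_files(items):
    """Split Drive metadata dicts into (folders, files) by MIME type."""
    folders = [i for i in items if i.get("mimeType") == FOLDER_MIME_TYPE]
    files = [i for i in items if i.get("mimeType") != FOLDER_MIME_TYPE]
    return folders, files


# Made-up sample entries in the shape returned by files().list()
sample_items = [
    {"id": "id-1", "name": "reports", "mimeType": FOLDER_MIME_TYPE},
    {"id": "id-2", "name": "sales.csv", "mimeType": "text/csv",
     "fileExtension": "csv"},
]

folders, files = split_folders_and_files(sample_items)
print([f["name"] for f in folders])  # ['reports']
print([f["name"] for f in files])    # ['sales.csv']
```

The same filter can be applied to the DataFrame directly, for example df_files[df_files['mimeType'] == FOLDER_MIME_TYPE] to keep only the folders.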
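Please note that the Google Drive API returns at most 1,000 files per request. For larger folders, pass the returned nextPageToken as pageToken in further requests to fetch the remaining pages.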
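To get past that limit, the request can be repeated until no nextPageToken comes back. Here is a minimal sketch of such a loop, assuming service is a Drive v3 client created with build('drive', 'v3', credentials=creds) as in Step 3. The function name list_all_files is hypothetical.

```python
def list_all_files(service, query=None):
    """Collect file metadata across all result pages of files().list()."""
    items = []
    page_token = None  # None requests the first page
    while True:
        response = service.files().list(
            q=query,
            pageSize=1000,
            fields="nextPageToken, "
                   "files(id, name, mimeType, parents, fileExtension)",
            pageToken=page_token,
        ).execute()
        items.extend(response.get("files", []))
        # A missing token means the last page has been reached
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return items


# Usage with the objects from Step 3 (not executed here):
# service = build('drive', 'v3', credentials=creds)
# items = list_all_files(service, f"'{FOLDER_ID}' in parents")
# df_files = pd.DataFrame(data=items)
```

The loop checks the token rather than the number of returned files, since a page may contain fewer entries than pageSize even when more pages follow.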