Google Sheets(試算表)相信大家都不陌生,能夠允許多個使用者在上面同時協作,不論是新增或編輯資料等,對於資料分析來說,就是一個非常好的儲存與共享資料工具。
當然,Google Sheets(試算表)也提供了API(Application Programming Interface),讓開發人員可以進行整合應用,所以,本文就來和大家分享,在使用Pandas套件進行資料分析時,如何寫入與讀取Google Sheets(試算表),實作的步驟如下:
- 啟用Google Sheets API
- 建立Google Sheets憑證
- 建立Google Sheets試算表
- Pandas寫入Google Sheets試算表
- Pandas讀取Google Sheets試算表
一、啟用Google Sheets API
前往Google API Developer Console,首頁如下圖:
二、建立Google Sheets憑證
而要讓應用程式能夠存取Google Sheets API,就需要建立憑證(Credential),藉此來進行存取驗證。而建立的方式如下圖:
由於本文沒有要將所建立的帳戶再授權給其它使用者,所以直接點擊「完成」即可。接著就可以點擊下圖紅框的地方,來新增這個帳戶的用戶端憑證(Client Credential),如下圖:
三、建立Google Sheets試算表
接下來,新增一張空白的Google Sheets試算表,命名為「Pandas讀寫範例展示」,如下圖:
選擇後,如果沒有要傳送郵件通知,則勾消「通知邀請對像」,點擊「共用」,如下範例:
四、Pandas寫入Google Sheets試算表
而想要將Pandas DataFrame中處理好的資料寫入Google Sheets試算表,除了需要利用以下的指令安裝Pandas套件外,還有gspread(Google Sheets Python API)及google-auth(Google APIs驗證)套件:
$ pip install pandas $ pip install gspread $ pip install google-auth
並且,在專案中進行引用,如下範例:
from google.oauth2.service_account import Credentials import gspread import pandas as pd
由於Google擁有許多的雲端服務,所以需要定義存取的Scope(範圍),也就是Google Sheets(試算表),如下範例:
from google.oauth2.service_account import Credentials import gspread import pandas as pd scope = ['https://www.googleapis.com/auth/spreadsheets']
接著,將剛剛所下載的JSON憑證檔(檔名可自行重新命名)與Scope(範圍)傳入google-auth套件的Credentails模組(Module),來建立憑證物件,如下範例:
from google.oauth2.service_account import Credentials import gspread import pandas as pd scope = ['https://www.googleapis.com/auth/spreadsheets'] creds = Credentials.from_service_account_file("gs_credentials.json", scopes=scope)
這時候就能夠將憑證物件傳入gspread模組(Module)的authorize()方法(Method)進行驗證,如下範例:
from google.oauth2.service_account import Credentials import gspread import pandas as pd scope = ['https://www.googleapis.com/auth/spreadsheets'] creds = Credentials.from_service_account_file("gs_credentials.json", scopes=scope) gs = gspread.authorize(creds)
驗證沒問題,就可以呼叫gspread模組(Module)的open_by_url()方法(Method),傳入Google Sheets試算表的網址,來執行開啟的動作,如下範例:
from google.oauth2.service_account import Credentials import gspread import pandas as pd scope = ['https://www.googleapis.com/auth/spreadsheets'] creds = Credentials.from_service_account_file("gs_credentials.json", scopes=scope) gs = gspread.authorize(creds) sheet = gs.open_by_url('https://docs.google.com/spreadsheets/d/1ZZ4byyE0POpZV7a9za6Bypt5UY5og_v9684qidJ7uwY/edit#gid=0')
由於要寫入Google Sheets試算表的第一個工作表,所以需要透過gspread模組(Module)的get_worksheet()方法(Method)來開啟,如下範例:
from google.oauth2.service_account import Credentials import gspread import pandas as pd scope = ['https://www.googleapis.com/auth/spreadsheets'] creds = Credentials.from_service_account_file("gs_credentials.json", scopes=scope) gs = gspread.authorize(creds) sheet = gs.open_by_url('https://docs.google.com/spreadsheets/d/1ZZ4byyE0POpZV7a9za6Bypt5UY5og_v9684qidJ7uwY/edit#gid=0') worksheet = sheet.get_worksheet(0)
而為了要模擬Pandas DataFrame寫入資料到Google Sheets試算表,所以Pandas DataFrame中的資料內容本文以Kaggle網站的2021年富比士億萬富翁資料集(Billionaire.csv)為例,透過read_csv()方法(Method)進行讀取,如下範例:
from google.oauth2.service_account import Credentials import gspread import pandas as pd scope = ['https://www.googleapis.com/auth/spreadsheets'] creds = Credentials.from_service_account_file("gs_credentials.json", scopes=scope) gs = gspread.authorize(creds) sheet = gs.open_by_url('https://docs.google.com/spreadsheets/d/1ZZ4byyE0POpZV7a9za6Bypt5UY5og_v9684qidJ7uwY/edit#gid=0') worksheet = sheet.get_worksheet(0) df = pd.read_csv('Billionaire.csv')
Pandas DataFrame中有了資料後,就可以呼叫gspread模組(Module)的update()方法(Method),分別將欄位名稱與資料內容寫入Google Sheets試算表,如下範例:
from google.oauth2.service_account import Credentials import gspread import pandas as pd scope = ['https://www.googleapis.com/auth/spreadsheets'] creds = Credentials.from_service_account_file("gs_credentials.json", scopes=scope) gs = gspread.authorize(creds) sheet = gs.open_by_url('https://docs.google.com/spreadsheets/d/1ZZ4byyE0POpZV7a9za6Bypt5UY5og_v9684qidJ7uwY/edit#gid=0') worksheet = sheet.get_worksheet(0) df = pd.read_csv('Billionaire.csv') worksheet.update([df.columns.values.tolist()] + df.values.tolist())
執行結果
五、Pandas讀取Google Sheets試算表
相反的,如果今天想要利用Pandas套件來讀取其它使用者所提供的Google Sheets試算表資料,則可以呼叫gspread模組(Module)的get_all_records()方法(Method)取得所有的資料,如下範例:
from google.oauth2.service_account import Credentials import gspread import pandas as pd scope = ['https://www.googleapis.com/auth/spreadsheets'] creds = Credentials.from_service_account_file("gs_credentials.json", scopes=scope) gs = gspread.authorize(creds) sheet = gs.open_by_url('https://docs.google.com/spreadsheets/d/1ZZ4byyE0POpZV7a9za6Bypt5UY5og_v9684qidJ7uwY/edit#gid=0') worksheet = sheet.get_worksheet(0) df = pd.read_csv('Billionaire.csv') worksheet.update([df.columns.values.tolist()] + df.values.tolist()) new_df = pd.DataFrame(worksheet.get_all_records()) print(new_df)
執行結果
六、小結
Google Sheets試算表可以說是實務上很常使用的資料分享與協作的工具之一,而學會了Pandas套件讀寫Google Sheets試算表資料後,除了能夠有效將自己處理完成的資料進行儲存外,也可以讀取外部分享的Google Sheets試算表資料,來實作後續的資料分析工作,希望本文能夠幫助在資料分析的過程中,會需要整合Google Sheets試算表資料的朋友們。
大家在進行資料分析時,都是整合什麼樣的工具或平台來分享資料呢?歡迎在底下留言和我分享交流唷~
如果喜歡我的文章,別忘了在下面訂閱本網站,以及幫我按五下Like(使用Google或Facebook帳號免費註冊),支持我創作教學文章,回饋由LikeCoin基金會出資,完全不會花到錢,感謝大家。
- [Pandas教學]客製化Pandas DataFrame樣式提升資料可讀性的實用方法
- [Pandas教學]3個Pandas套件合併多個CSV檔案資料的實用技巧
- Pandas教學]看完這篇就懂Pandas套件如何即時讀取API的回應資料
- [Pandas教學]快速掌握Pandas套件讀寫SQLite資料庫的重要方法
- [Pandas教學]輕鬆入門3個常見的Pandas套件排序資料方式
- [Pandas教學]有效利用Pandas套件篩選資料的應用技巧
- [Pandas教學]善用Pandas套件的Groupby與Aggregate方法提升資料解讀效率
- [Pandas教學]使用Pandas套件實作資料清理的必備觀念(上)
- [Pandas教學]使用Pandas套件實作資料清理的必備觀念(下)
- Visual Studio Code漂亮印出Pandas DataFrame資料的實用方法
- [Pandas教學]資料視覺化必懂的Pandas套件繪製Matplotlib分析圖表實戰
- [Pandas教學]5個實用的Pandas讀取Excel檔案資料技巧
- 解析Python網頁爬蟲如何有效整合Pandas套件提升資料處理效率
- [Pandas教學]掌握Pandas DataFrame讀取網頁表格的實作技巧
- [Pandas教學]資料分析必懂的Pandas DataFrame處理雙維度資料方法
- [Pandas教學]資料分析必懂的Pandas Series處理單維度資料方法
跟住你的步驟做了,然後出現了以下的錯誤
回覆刪除Traceback (most recent call last):
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 471, in prepare_body
body = complexjson.dumps(json, allow_nan=False)
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 234, in dumps
return cls(
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 257, in iterencode
return _iterencode(o, 0)
ValueError: Out of range float values are not JSON compliant
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "e:/PY/trynofuckup.py", line 15, in
worksheet.update([df.columns.values.tolist()] + df.values.tolist())
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\gspread\utils.py", line 593, in wrapper
return f(*args, **kwargs)
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\gspread\models.py", line 1276, in update
response = self.spreadsheet.values_update(
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\gspread\models.py", line 265, in values_update
r = self.client.request('put', url, params=params, json=body)
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\gspread\client.py", line 64, in request
response = getattr(self.session, method)(
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 602, in put
return self.request('PUT', url, data=data, **kwargs)
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\google\auth\transport\requests.py", line 475, in request
response = super(AuthorizedSession, self).request(
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 528, in request
prep = self.prepare_request(req)
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 456, in prepare_request
p.prepare(
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 319, in prepare
self.prepare_body(data, files, json)
File "C:\Users\great\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 473, in prepare_body
raise InvalidJSONError(ve, request=self)
requests.exceptions.InvalidJSONError: Out of range float values are not JSON compliant