Useful Methods in pandas That Make Your Code Faster and Cleaner

share on:

Getting familiar with some of the pandas methods will make your life easier when it comes to handling data. We will discuss a few commonly used pandas methods below.

dropna()

dropna()

>>> prices = yfinance.download([ "AAPL", "BTC-USD", "TSLA" ], start="2020-01-02", end="2020-12-31")["Adj Close"]
[*********************100%***********************]  3 of 3 completed
>>> 
>>> prices
                  AAPL       BTC-USD        TSLA
Date                                            
2020-01-02   73.683571   6985.470215   28.684000
2020-01-03   72.967201   7344.884277   29.534000
2020-01-04         NaN   7410.656738         NaN
2020-01-05         NaN   7411.317383         NaN
2020-01-06   73.548637   7769.219238   30.102667
...                ...           ...         ...
2020-12-26         NaN  26437.037109         NaN
2020-12-27         NaN  26272.294922         NaN
2020-12-28  135.292664  27084.808594  221.229996
2020-12-29  133.491257  27362.437500  221.996674
2020-12-30  132.353012  28840.953125  231.593338

[364 rows x 3 columns]

get rid of NaN

>>> prices = prices.dropna()
>>> prices
                  AAPL       BTC-USD        TSLA
Date                                            
2020-01-02   73.683571   6985.470215   28.684000
2020-01-03   72.967201   7344.884277   29.534000
2020-01-06   73.548637   7769.219238   30.102667
2020-01-07   73.202728   8163.692383   31.270666
2020-01-08   74.380287   8079.862793   32.809334
...                ...           ...         ...
2020-12-23  129.621231  23241.345703  215.326660
2020-12-24  130.620895  23735.949219  220.589996
2020-12-28  135.292664  27084.808594  221.229996
2020-12-29  133.491257  27362.437500  221.996674
2020-12-30  132.353012  28840.953125  231.593338

[252 rows x 3 columns]

shift(1)

>>> prices
                  AAPL       BTC-USD        TSLA
Date                                            
2020-01-02   73.683571   6985.470215   28.684000
2020-01-03   72.967201   7344.884277   29.534000
2020-01-06   73.548637   7769.219238   30.102667
2020-01-07   73.202728   8163.692383   31.270666
2020-01-08   74.380287   8079.862793   32.809334
...                ...           ...         ...
2020-12-23  129.621231  23241.345703  215.326660
2020-12-24  130.620895  23735.949219  220.589996
2020-12-28  135.292664  27084.808594  221.229996
2020-12-29  133.491257  27362.437500  221.996674
2020-12-30  132.353012  28840.953125  231.593338

[252 rows x 3 columns]
>>> prices.shift(1)
                  AAPL       BTC-USD        TSLA
Date                                            
2020-01-02         NaN           NaN         NaN
2020-01-03   73.683571   6985.470215   28.684000
2020-01-06   72.967201   7344.884277   29.534000
2020-01-07   73.548637   7769.219238   30.102667
2020-01-08   73.202728   8163.692383   31.270666
...                ...           ...         ...
2020-12-23  130.531815  23783.029297  213.446671
2020-12-24  129.621231  23241.345703  215.326660
2020-12-28  130.620895  23735.949219  220.589996
2020-12-29  135.292664  27084.808594  221.229996
2020-12-30  133.491257  27362.437500  221.996674

[252 rows x 3 columns]

Calculate returns

>>> returns = prices / prices.shift(1) - 1
>>> returns
                AAPL   BTC-USD      TSLA
Date                                    
2020-01-02       NaN       NaN       NaN
2020-01-03 -0.009722  0.051452  0.029633
2020-01-06  0.007968  0.057773  0.019255
2020-01-07 -0.004703  0.050774  0.038801
2020-01-08  0.016086 -0.010269  0.049205
...              ...       ...       ...
2020-12-23 -0.006976 -0.022776  0.008808
2020-12-24  0.007712  0.021281  0.024443
2020-12-28  0.035766  0.141088  0.002901
2020-12-29 -0.013315  0.010250  0.003466
2020-12-30 -0.008527  0.054034  0.043229

[252 rows x 3 columns]

Get rid of NaN in the result

>>> returns = returns.dropna()
>>> returns
                AAPL   BTC-USD      TSLA
Date                                    
2020-01-03 -0.009722  0.051452  0.029633
2020-01-06  0.007968  0.057773  0.019255
2020-01-07 -0.004703  0.050774  0.038801
2020-01-08  0.016086 -0.010269  0.049205
2020-01-09  0.021241 -0.024851 -0.021945
...              ...       ...       ...
2020-12-23 -0.006976 -0.022776  0.008808
2020-12-24  0.007712  0.021281  0.024443
2020-12-28  0.035766  0.141088  0.002901
2020-12-29 -0.013315  0.010250  0.003466
2020-12-30 -0.008527  0.054034  0.043229

[251 rows x 3 columns]

fillna()

Replace NaN with whatever reasonable value you want. Using the returns example above:

>>> returns = prices / prices.shift(1) - 1
>>> returns.head()
                AAPL   BTC-USD      TSLA
Date                                    
2020-01-02       NaN       NaN       NaN
2020-01-03 -0.009722  0.051452  0.029633
2020-01-06  0.007968  0.057773  0.019255
2020-01-07 -0.004703  0.050774  0.038801
2020-01-08  0.016086 -0.010269  0.049205
>>>
>>> returns = returns.fillna(0)
>>> returns.head()
                AAPL   BTC-USD      TSLA
Date                                    
2020-01-02  0.000000  0.000000  0.000000
2020-01-03 -0.009722  0.051452  0.029633
2020-01-06  0.007968  0.057773  0.019255
2020-01-07 -0.004703  0.050774  0.038801
2020-01-08  0.016086 -0.010269  0.049205

iloc(): position based selection

Select rows

>>> prices.iloc[0:8]
                 AAPL      BTC-USD       TSLA
Date                                         
2020-01-02  73.683571  6985.470215  28.684000
2020-01-03  72.967201  7344.884277  29.534000
2020-01-06  73.548637  7769.219238  30.102667
2020-01-07  73.202728  8163.692383  31.270666
2020-01-08  74.380287  8079.862793  32.809334
2020-01-09  75.960182  7879.071289  32.089333
2020-01-10  76.131920  8166.554199  31.876667
2020-01-13  77.758423  8144.194336  34.990665

select columns

>>> prices.iloc[:,[0,2]]
                  AAPL        TSLA
Date                              
2020-01-02   73.683571   28.684000
2020-01-03   72.967201   29.534000
2020-01-06   73.548637   30.102667
2020-01-07   73.202728   31.270666
2020-01-08   74.380287   32.809334
...                ...         ...
2020-12-23  129.621231  215.326660
2020-12-24  130.620895  220.589996
2020-12-28  135.292664  221.229996
2020-12-29  133.491257  221.996674
2020-12-30  132.353012  231.593338

[252 rows x 2 columns]

Mixed rows and columns selection

>>> prices.iloc[0:8,[0,2]]
                 AAPL       TSLA
Date                            
2020-01-02  73.683571  28.684000
2020-01-03  72.967201  29.534000
2020-01-06  73.548637  30.102667
2020-01-07  73.202728  31.270666
2020-01-08  74.380287  32.809334
2020-01-09  75.960182  32.089333
2020-01-10  76.131920  31.876667
2020-01-13  77.758423  34.990665

loc(): Label based selection

Select rows with row indexes

>>> prices.loc[["2020-01-02","2020-01-08", "2020-12-30"]]
                  AAPL       BTC-USD        TSLA
Date                                            
2020-01-02   73.683571   6985.470215   28.684000
2020-01-08   74.380287   8079.862793   32.809334
2020-12-30  132.353012  28840.953125  231.593338

Select columns with column names

>>> prices.loc[:, ["AAPL", "TSLA"]]
                  AAPL        TSLA
Date                              
2020-01-02   73.683571   28.684000
2020-01-03   72.967201   29.534000
2020-01-06   73.548637   30.102667
2020-01-07   73.202728   31.270666
2020-01-08   74.380287   32.809334
...                ...         ...
2020-12-23  129.621231  215.326660
2020-12-24  130.620895  220.589996
2020-12-28  135.292664  221.229996
2020-12-29  133.491257  221.996674
2020-12-30  132.353012  231.593338

[252 rows x 2 columns]

Mixed row and column selection

>>> prices.loc[["2020-01-02","2020-01-08", "2020-12-30"], ["AAPL", "TSLA"]]
                  AAPL        TSLA
Date                              
2020-01-02   73.683571   28.684000
2020-01-08   74.380287   32.809334
2020-12-30  132.353012  231.593338

Logical selection for rows

>>> prices.loc[prices.index == "2020-12-30"]
                  AAPL       BTC-USD        TSLA
Date                                            
2020-12-30  132.353012  28840.953125  231.593338

rows with columns

>>> prices.loc[prices.index == "2020-12-30", ["AAPL","TSLA"] ]
                  AAPL        TSLA
Date                              
2020-12-30  132.353012  231.593338

Select a particular month

>>> prices.loc[prices.index.month == 12]
                  AAPL       BTC-USD        TSLA
Date                                            
2020-12-01  121.465462  18802.998047  194.919998
2020-12-02  121.821777  19201.091797  189.606674
2020-12-03  121.683220  19445.398438  197.793335
2020-12-04  121.000267  18699.765625  199.679993
2020-12-07  122.484932  19191.630859  213.919998
2020-12-08  123.108482  18321.144531  216.626663
2020-12-09  120.535065  18553.916016  201.493332
2020-12-10  121.980141  18264.992188  209.023331
2020-12-11  121.158630  18058.904297  203.330002
2020-12-14  120.535065  19246.644531  213.276672
2020-12-15  126.572701  19417.076172  211.083328
2020-12-16  126.503426  21310.597656  207.589996
2020-12-17  127.384331  22805.162109  218.633331
2020-12-18  125.365181  23137.960938  231.666672
2020-12-21  126.919128  22803.082031  216.619995
2020-12-22  130.531815  23783.029297  213.446671
2020-12-23  129.621231  23241.345703  215.326660
2020-12-24  130.620895  23735.949219  220.589996
2020-12-28  135.292664  27084.808594  221.229996
2020-12-29  133.491257  27362.437500  221.996674
2020-12-30  132.353012  28840.953125  231.593338
>>> prices.loc[prices.index == pd.to_datetime("2020-12-30")]
                  AAPL       BTC-USD        TSLA
Date                                            
2020-12-30  132.353012  28840.953125  231.593338

groupby():

Leave a Response