Python Workshop - SENCE
Table of Contents
Infos
Python
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
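A two-line function already shows the significant whitespace: indentation alone marks where a block begins and ends.

```python
def parity(n):
    # The indented lines form the function body:
    if n % 2 == 0:
        return 'even'
    return 'odd'

print(parity(7))  # odd
```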
How to install Python + Libraries
Download and install Anaconda (Python 3.12).
Check if Python has been installed
- Start-Menu
- "Anaconda Prompt" + Enter ↵
- python + Enter ↵
- print('Ja') + Enter ↵
- exit() + Enter ↵
Launch Jupyter Notebook
- Start-Menu
- "Jupyter Notebook" + Enter ↵
- Click on New (top-right)
- Choose Python 3
Anaconda Navigator
Anaconda Navigator is a graphical user interface that is automatically installed with Anaconda. Navigator will open if the installation was successful.
Warning
This interface can also be really slow, and might crash. You don't actually need Anaconda Navigator to launch the listed programs, e.g. jupyter notebook or spyder.
- Windows: Click Start, search or select Anaconda Navigator from the menu.
- macOS: Click Launchpad, select Anaconda Navigator. Or, use Cmd+Space to open Spotlight Search and type “Navigator” to open the program.
- Linux: See next section.
Conda
If you prefer using a command line interface (CLI), you can use conda to verify the installation using Anaconda Prompt on Windows or terminal on Linux and macOS.
To open Anaconda Prompt:
- Windows: Click Start, search or select Anaconda Prompt from the menu.
- macOS: Use Cmd+Space to open Spotlight Search and type “terminal” to open a terminal.
- Linux–CentOS: Open Applications - System Tools - terminal.
- Linux–Ubuntu: Open the Dash by clicking the upper left Ubuntu icon, then type “terminal”.
Links
Python introduction
Numbers (int & float)
7
0.001
1 + 1
2 * 3
7 / 2
4 / 2
7 // 2
7 % 2
2**3
11**137
Text (str)
"Hello Python"
len("Hello")
"hello".replace('e', 'a').capitalize()
"1,2;3,4;5,6".replace(',', '.')
Text & numbers
2 * 3
'2' * 3
'2*3'
int('2') * 3
float('3.5')
'2' + '3'
n = 9
f'file_{n}.txt'
ext = 'dat'
f'file_{n}.{ext}'
'a;b;c'.split(';')
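These pieces combine naturally, e.g. to split a semicolon-separated measurement line (a made-up example line) and convert its fields back to numbers:

```python
line = '2011/01/01;12:30;450.5;21.3'
date, time, ghi, ta = line.split(';')  # four strings
irradiance = float(ghi)                # 450.5
temperature = float(ta)                # 21.3
```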
Lists (list)
['a', 'b', 'c']
len(['a', 'b', 'c'])
['a', 'b', 'c'][0]
['a', 'b', 'c'][:2]
['a', 'b', 'c'][2:]
['a', 'b', 'c'][::-1]
'abc'[1]
[1, 2, 3] + ['a', 'b', 'c']
';'.join(['a', 'b', 'c'])
range(5)
Boolean variables (True & False)
3 == 2
x = 5
y = 4
x > y
x >= x
x != y
12 > 7
'12' > '7'
'art' in 'Stuttgart'
6 in [5, 7, 2, 8, 4, 10, 1, 3, 9]
More lists
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for value in values:
print(value)
[2**i for i in values]
[i for i in values if i > 5]
[i for i in values if i % 2 == 0]
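Both forms can be combined: a comprehension can filter and transform in a single expression.

```python
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_squares = [i**2 for i in values if i % 2 == 0]
print(even_squares)  # [4, 16, 36, 64, 100]
```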
Files
with open('SchPark01/SchPark_GT_2011_01_01.csv') as csv_file:
for line in csv_file:
print(line)
with open('new_file.txt', 'w') as new_file:
new_file.write("Hello\n")
Directories
from pathlib import Path
for csv_path in Path('SchPark01').glob('*.csv'):
print(csv_path)
for csv_path in Path('SchPark03').glob('*/*/*.csv'):
print(csv_path)
Function
def f(a, b):
return a + b
f(2, 3)
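Parameters can also have default values, and arguments can be passed by name:

```python
def power(base, exponent=2):
    # exponent defaults to 2 when not specified
    return base ** exponent

print(power(3))              # 9
print(power(2, exponent=5))  # 32
```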
Sorting
sorted([3,1,2])
numbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
sorted(numbers)
sorted(numbers, key=int)
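The key argument matters because strings are compared character by character, not numerically:

```python
numbers = ['10', '2', '1']
lexicographic = sorted(numbers)      # '1' < '10' < '2'
numeric = sorted(numbers, key=int)   # 1 < 2 < 10
```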
INSEL
Some examples are included in inselpy_examples
# pip install insel
import insel
insel.block('pi')
insel.block('sum', 2, 3)
insel.block('do', parameters=[1, 10, 1])
insel.template('a_times_b', a=7, b=3)
Exercises
Goal
A sensor has delivered many files (365 per year). It would be nice to automatically generate a single yearly file from the 365 daily files.
Here are the files: python_workshop_examples.zip
The script files already exist, but they contain only a description and no content.
01_workshop_example.py
# In SchPark01:
# One file per day
# No header
# Date; Time; Horizontal irradiance; Ambient temperature
# YYYY/MM/DD;HH:MM;W/m2;Celsius
# 1 directory per year
#
# -> 1 file for the whole year
02_workshop_example.py
# In SchPark02:
# One file per day
# With header
# Date; Time; Horizontal irradiance; Ambient temperature
# YYYY/MM/DD;HH:MM;W/m2;Celsius
# 1 directory per year
#
# -> 1 file for the whole year
03_workshop_example.py
# In SchPark03:
# One file per day
# With header
# Date; Time; Horizontal irradiance; Ambient temperature
# YYYY/MM/DD;HH:MM;W/m2;Celsius
# 1 directory per month
#
# -> 1 file for the whole year
04_workshop_example.py
# In SchPark04:
# One file per day
# With header
# Date, Time, Horizontal irradiance, Ambient temperature
# YYYY/MM/DD,HH:MM,W/m2,Celsius
# 1 directory per month
#
# -> 1 file for the whole year
05_workshop_example.py
# In SchPark05:
# One file per day
# With header
# Date; Time; Horizontal irradiance; Ambient temperature
# YYYY/MM/DD;HH:MM;W/m2;Celsius
# 1 directory per month
#
# -> 1 file for the whole year (Excel .CSV)
06_workshop_example.py
# In SchPark06:
# One file per day
# With header
# Time; Horizontal irradiance; Ambient temperature
# HH:MM;W/m2;Celsius
# 1 directory per month
#
# -> 1 file for the whole year
07_workshop_example.py
# In SchPark07:
# One file per day (Tag_Monat_Jahr.csv, i.e. day_month_year)
# With header
# Time; Horizontal irradiance; Ambient temperature
# HH:MM;W/m2;Celsius
# 1 directory per year
#
# -> 1 file for the whole year
08_workshop_example.py
# In SchPark08:
# One file per day (Tag_Monat_Jahr.csv, i.e. day_month_year)
# With header
# Time; Horizontal irradiance; Ambient temperature
# HH:MM;W/m2;Celsius
# 1 directory for 2010 & 2011
#
# -> 1 file per year
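For the simplest case (SchPark01: headerless daily files, one directory per year), a possible sketch; the directory and output file names below are assumptions, not part of the exercise files:

```python
from pathlib import Path

def merge_daily_files(input_dir, output_path):
    # Concatenate all daily CSV files, sorted by name, into one yearly file.
    with open(output_path, 'w') as year_file:
        for day_path in sorted(Path(input_dir).glob('*.csv')):
            year_file.write(day_path.read_text())

# merge_daily_files('SchPark01', 'SchPark01_year.csv')  # hypothetical names
```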
Important libraries
Numpy
Array
import numpy as np
x = np.arange(10)
x + 1
(x + 1)**2
np.sin(x)
x > 3
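The boolean array from the comparison can be used directly as a mask (boolean indexing):

```python
import numpy as np

x = np.arange(10)
selected = x[x > 3]   # keeps only elements where the mask is True
print(selected)       # [4 5 6 7 8 9]
```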
2-D Arrays
table = np.arange(50).reshape(10,5)
table**2
Matrix
a = np.array([[4, 3], [2, 1]])
b = np.array([[1, 2], [3, 4]])
a @ b  # matrix product (np.mat is deprecated; prefer arrays with the @ operator)
Matplotlib
Plot
import matplotlib.pyplot as plt
import numpy as np
t = np.linspace(0, 2*np.pi, 500)
plt.plot(t, np.sin(t))
plt.show()
Sankey
import matplotlib.pyplot as plt
from matplotlib.sankey import Sankey
s = Sankey()
s.add(flows=[0.7, 0.3, -0.5, -0.5],
labels=['a', 'b', 'c', 'd'],
orientations=[1, 1, -1, 0])
s.finish()
plt.show()
Pandas
import pandas as pd
df = pd.read_csv('SchPark01.csv',
                 sep=';',
                 header=None,
                 names=['date', 'time', 'Gh', 'Ta'],
                 skipinitialspace=True)
# Nested parse_dates lists are deprecated in recent pandas versions,
# so combine the two columns explicitly:
df['datetime'] = pd.to_datetime(df.pop('date') + ' ' + df.pop('time'),
                                format='%Y/%m/%d %H:%M')
df = df.set_index('datetime')
df.Gh['2011-07-07 12:30']
df.Ta.mean()
df.plot()
Sympy
import sympy
from sympy.solvers import solve
x = sympy.Symbol('x')
solve(sympy.Eq(x**2, x + 1), x)
sympy.expand(x * (x + 1) * (x + 3))
Optimize
from scipy.optimize import minimize
def f(x):
return (x[0] + 2)**2 + (x[1] - 3)**2
minimize(f, [0, 0])
Uncertainties
# pip install uncertainties
from uncertainties import ufloat, umath
x = ufloat(39.5, 0.5)
x**2
umath.log(x)
y = x + 2
y      # 41.5+/-0.5
y - x  # exactly 2.0+/-0: the uncertainties of x and y are fully correlated
Many others
- pvlib (Photovoltaics)
- NetworkX (Graph theory)
- scikit-learn, TensorFlow and keras (machine learning)
- TensorFlow playground
- Stable Diffusion (Image generation from text)
- Kivy GUI apps for desktop & smartphones
- django or flask (Web services)
- BeautifulSoup (HTML/XML parser)
- PyGame (Video Games)
- missingno (Visualization of missing data)
- Python for ArcGIS, TRNSYS, DAYSIM, Blender, ...
Python for HfT
Solving linear systems of equations
import numpy as np
a = np.array([[1,2], [3,2]])
b = np.array([19, 29])
# 1*x0 + 2*x1 = 19
# 3*x0 + 2*x1 = 29
x = np.linalg.solve(a, b)
np.dot(a, x)
np.allclose(np.dot(a,x),b)
Computing with complex numbers
1 + 2j
complex(1, 2)
z = 1 + 2j
abs(z)
z.real
z.imag
z**3
import cmath
cmath.sin(z)
cmath.exp(z)
cmath.rect(1, cmath.pi/3)
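cmath.polar is the inverse of cmath.rect: it converts a complex number to its polar form (r, φ).

```python
import cmath

z = 1 + 2j
r, phi = cmath.polar(z)    # modulus and phase
back = cmath.rect(r, phi)  # reconstructs z (up to rounding)
```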
Multidimensional matrices
import numpy as np
m = np.arange(12)
m = m.reshape(2,3,2)
m[1]
m[1][2]
m[1][2][0]
m[:,2,:]
def f(i,j,k):
return (i + 3*j + 5*k)
np.fromfunction(f, (2,2,2))
Plotting data in 2D
import matplotlib.pyplot as plt
t = np.linspace(0, 2*np.pi, 500)
plt.plot(t, np.sin(t))
plt.show()
Plotting data in 3D
from mpl_toolkits.mplot3d import Axes3D # Needed for 3d plots
ax = plt.axes(projection='3d')
z = np.linspace(0, 1, 100)
x = z * np.sin(20 * z)
y = z * np.cos(20 * z)
ax.scatter(x, y, z, c = x+y)
plt.show()
Animations (movies)
In Spyder: Tools → Preferences → IPython Console → Graphics → Graphics Backend → Backend: “Automatic”
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
fig, ax = plt.subplots()
x = np.arange(0, 2*np.pi, 0.01)
line, = ax.plot(x, np.sin(x))
def animate(i):
line.set_ydata(np.sin(x + 2*i*np.pi/100.0) *np.cos(2*i*np.pi/200)) # update the data
return line,
def init():
line.set_ydata(np.ma.array(x, mask=True))
return line,
ani = animation.FuncAnimation(fig, animate, np.arange(1, 200), init_func=init,
interval=25, blit=True)
plt.show()
Fourier Transformation
import numpy as np
import matplotlib.pyplot as plt
N = 1000
T = 0.01
x = np.linspace(0.0, N*T, N)
y = np.where(abs(x)<=0.5, 1, 0) # Rectangular function
yf = np.fft.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
fig, ax = plt.subplots()
ax.plot(xf, 2.0/N * np.abs(yf[:N//2]))  # plot the magnitude of the complex spectrum
plt.show()
Differentiation, integration, roots
from sympy import *
init_printing() # for pretty printing
x,a = symbols('x a')
f = sin(sqrt((exp(x)+a)/2))
diff(f,x)
integrate(1/(1+x**2),x)
solve(f,x)
f.subs(x,log(-a))
Multi-plots
import numpy as np
import matplotlib.pyplot as plt
N = 5
x = np.linspace(0, 2 * np.pi, 400)
fig, subplots = plt.subplots(N, N, sharex='col', sharey='row')
for (i, j), subplot in np.ndenumerate(subplots):
subplot.plot(x, i * np.cos(x**2) + j * np.sin(x))
fig.suptitle("i * cos(x**2) + j * sin(x)")
plt.show()
Numpy
import numpy as np
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x + 1
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
(x + 1)**2
array([ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100])
np.sin(x)
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ,
-0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849])
np.sin(x).dtype
dtype('float64')
x > 3
array([False, False, False, False, True, True, True, True, True,
True])
x[:4]
array([0, 1, 2, 3])
x[7:]
array([7, 8, 9])
x[x > 3]
array([4, 5, 6, 7, 8, 9])
x[(x > 3) & (x < 7)]
array([4, 5, 6])
x[(x > 7) | (x < 3)]
array([0, 1, 2, 8, 9])
np.arange(50)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
np.arange(50).reshape(10,5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49]])
table = _
table
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49]])
table**2
array([[ 0, 1, 4, 9, 16],
[ 25, 36, 49, 64, 81],
[ 100, 121, 144, 169, 196],
[ 225, 256, 289, 324, 361],
[ 400, 441, 484, 529, 576],
[ 625, 676, 729, 784, 841],
[ 900, 961, 1024, 1089, 1156],
[1225, 1296, 1369, 1444, 1521],
[1600, 1681, 1764, 1849, 1936],
[2025, 2116, 2209, 2304, 2401]])
(table**2)[2:4]
array([[100, 121, 144, 169, 196],
[225, 256, 289, 324, 361]])
(table**2)[:,2:4]
array([[ 4, 9],
[ 49, 64],
[ 144, 169],
[ 289, 324],
[ 484, 529],
[ 729, 784],
[1024, 1089],
[1369, 1444],
[1764, 1849],
[2209, 2304]])
(table**2)[5:7, 2:4]
array([[ 729, 784],
[1024, 1089]])
table ** 2 == 1089
array([[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, True, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False]])
[index for (index, value) in np.ndenumerate(table**2) if value == 1089]
[(6, 3)]
l = list(range(10))
l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[i * 2 for i in l if i < 5]
[0, 2, 4, 6, 8]
table**100  # silently overflows: NumPy integers have a fixed size (int64)
array([[ 0, 1, 0,
-2984622845537545263, 0],
[-3842938066129721103, 0, 3728452490685454945,
0, 6627890308811632801],
[ 0, -5485011738861510223, 0,
-8267457965844590319, 0],
[ 6813754833676406721, 0, -1895149777787118527,
0, 8207815121025376913],
[ 0, 1178643571979107377, 0,
5958518545374539809, 0],
[-7817535966050405663, 0, 4157753088978724465,
0, 3054346751387081297],
[ 0, 6365139678740040577, 0,
-8877898876394183551, 0],
[ 8170176069297290577, 0, -1901415581956121743,
0, 2024094702548431329],
[ 0, 6476859561917718817, 0,
5633018509028505393, 0],
[ 3548161959473065873, 0, -2387541571489615039,
0, 1459632558914132161]])
a = np.mat('4 3; 2 1')
b = np.mat('1 2; 3 4')
a**2
matrix([[22, 15],
[10, 7]])
a
matrix([[4, 3],
[2, 1]])
a*b
matrix([[13, 20],
[ 5, 8]])
(np.arange(4)+1).reshape(2,2)
array([[1, 2],
[3, 4]])
np.mat(_)
matrix([[1, 2],
[3, 4]])
import pandas as pd
Parse CSV, trial & error
pd.read_csv('output/SchPark01.csv')
| 2011/01/01;00:00;0.0;-0.6 | |
|---|---|
| 0 | 2011/01/01;00:15;0.0;-0.4 |
| 1 | 2011/01/01;00:30;0.0;-0.5 |
| 2 | 2011/01/01;00:45;0.0;-0.5 |
| 3 | 2011/01/01;01:00;0.0;-0.7 |
| 4 | 2011/01/01;01:15;0.0;-0.6 |
| ... | ... |
| 35034 | 2011/12/31;22:45;0.0;7.9 |
| 35035 | 2011/12/31;23:00;0.0;7.9 |
| 35036 | 2011/12/31;23:15;0.0;8.4 |
| 35037 | 2011/12/31;23:30;0.0;8.5 |
| 35038 | 2011/12/31;23:45;0.0;8.1 |
35039 rows × 1 columns
pd.read_csv('output/SchPark01.csv', sep=';')
| 2011/01/01 | 00:00 | 0.0 | -0.6 | |
|---|---|---|---|---|
| 0 | 2011/01/01 | 00:15 | 0.0 | -0.4 |
| 1 | 2011/01/01 | 00:30 | 0.0 | -0.5 |
| 2 | 2011/01/01 | 00:45 | 0.0 | -0.5 |
| 3 | 2011/01/01 | 01:00 | 0.0 | -0.7 |
| 4 | 2011/01/01 | 01:15 | 0.0 | -0.6 |
| ... | ... | ... | ... | ... |
| 35034 | 2011/12/31 | 22:45 | 0.0 | 7.9 |
| 35035 | 2011/12/31 | 23:00 | 0.0 | 7.9 |
| 35036 | 2011/12/31 | 23:15 | 0.0 | 8.4 |
| 35037 | 2011/12/31 | 23:30 | 0.0 | 8.5 |
| 35038 | 2011/12/31 | 23:45 | 0.0 | 8.1 |
35039 rows × 4 columns
pd.read_csv('output/SchPark01.csv', sep=';',
names = ['date', 'time', 'ghi', 'ta'])
| date | time | ghi | ta | |
|---|---|---|---|---|
| 0 | 2011/01/01 | 00:00 | 0.0 | -0.6 |
| 1 | 2011/01/01 | 00:15 | 0.0 | -0.4 |
| 2 | 2011/01/01 | 00:30 | 0.0 | -0.5 |
| 3 | 2011/01/01 | 00:45 | 0.0 | -0.5 |
| 4 | 2011/01/01 | 01:00 | 0.0 | -0.7 |
| ... | ... | ... | ... | ... |
| 35035 | 2011/12/31 | 22:45 | 0.0 | 7.9 |
| 35036 | 2011/12/31 | 23:00 | 0.0 | 7.9 |
| 35037 | 2011/12/31 | 23:15 | 0.0 | 8.4 |
| 35038 | 2011/12/31 | 23:30 | 0.0 | 8.5 |
| 35039 | 2011/12/31 | 23:45 | 0.0 | 8.1 |
35040 rows × 4 columns
Parse CSV
df = pd.read_csv('output/SchPark01.csv',
sep = ';',
na_values = ' ',
names = ['date', 'time', 'ghi', 'ta'],
)
# https://stackoverflow.com/a/77983644/6419007
df['datetime'] = pd.to_datetime(df.pop('date')+' '+ df.pop('time'),
format="%Y/%m/%d %H:%M")
df = df.set_index('datetime')
df
| ghi | ta | |
|---|---|---|
| datetime | ||
| 2011-01-01 00:00:00 | 0.0 | -0.6 |
| 2011-01-01 00:15:00 | 0.0 | -0.4 |
| 2011-01-01 00:30:00 | 0.0 | -0.5 |
| 2011-01-01 00:45:00 | 0.0 | -0.5 |
| 2011-01-01 01:00:00 | 0.0 | -0.7 |
| ... | ... | ... |
| 2011-12-31 22:45:00 | 0.0 | 7.9 |
| 2011-12-31 23:00:00 | 0.0 | 7.9 |
| 2011-12-31 23:15:00 | 0.0 | 8.4 |
| 2011-12-31 23:30:00 | 0.0 | 8.5 |
| 2011-12-31 23:45:00 | 0.0 | 8.1 |
35040 rows × 2 columns
Plots
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (15, 8)
df.plot();
df.resample('ME').mean().plot();
import seaborn as sns
sns.heatmap(
pd.pivot_table(df, values='ghi', index=df.index.time, columns=df.index.dayofyear),
annot=False);
sns.heatmap(
pd.pivot_table(df, values='ta', index=df.index.time, columns=df.index.dayofyear),
annot=False);
# https://stackoverflow.com/a/16345735/6419007
# NOTE: it used to be x.time, now it's apparently x.time()
df2 = df.groupby(lambda x: x.time()).ffill()
df2
| ghi | ta | |
|---|---|---|
| datetime | ||
| 2011-01-01 00:00:00 | 0.0 | -0.6 |
| 2011-01-01 00:15:00 | 0.0 | -0.4 |
| 2011-01-01 00:30:00 | 0.0 | -0.5 |
| 2011-01-01 00:45:00 | 0.0 | -0.5 |
| 2011-01-01 01:00:00 | 0.0 | -0.7 |
| ... | ... | ... |
| 2011-12-31 22:45:00 | 0.0 | 7.9 |
| 2011-12-31 23:00:00 | 0.0 | 7.9 |
| 2011-12-31 23:15:00 | 0.0 | 8.4 |
| 2011-12-31 23:30:00 | 0.0 | 8.5 |
| 2011-12-31 23:45:00 | 0.0 | 8.1 |
35040 rows × 2 columns
sns.heatmap(
pd.pivot_table(df2, values='ghi', index=df2.index.time, columns=df2.index.dayofyear),
annot=False);
sns.heatmap(
pd.pivot_table(df2, values='ta', index=df2.index.time, columns=df2.index.dayofyear),
annot=False);
Title, labels, units
import matplotlib.dates as mdates
month_locator = mdates.MonthLocator(bymonthday=15)
ax = sns.heatmap(
pd.pivot_table(df2, values='ta', index=df2.index.map(lambda x: x.strftime("%H:%M")),
columns=df2.index.dayofyear),
annot=False,
cbar_kws={'label': '', 'format': '%.0f °C'}
)
plt.title("Temperature in SchPark")
plt.xlabel("")
plt.ylabel("")
ax.xaxis.set_major_locator(month_locator)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%B"))
plt.show()
df.ta.mean()
12.796018797659162
df2.ta.mean()
12.749155251141554
df.sort_values('ghi', ascending=False).ghi.plot(use_index=False, title='Sorted irradiance');
df.sort_values('ta', ascending=False).ta.plot(use_index=False, title = 'Sorted temperature');
df.plot(x='ghi', y='ta', xlabel='GHI [W/m²]', ylabel='Temperature', kind='scatter');
Warmest day
max_temp = df2.ta.max()
max_temp
38.3
warmest_date = df2[df2.ta == df2.ta.max()].index.date[0]
warmest_date
datetime.date(2011, 8, 23)
warmest_day = df2[df2.index.date == warmest_date]
warmest_day
| ghi | ta | |
|---|---|---|
| datetime | ||
| 2011-08-23 00:00:00 | 0.0 | 27.9 |
| 2011-08-23 00:15:00 | 0.0 | 27.7 |
| 2011-08-23 00:30:00 | 0.0 | 27.1 |
| 2011-08-23 00:45:00 | 0.0 | 26.7 |
| 2011-08-23 01:00:00 | 0.0 | 27.0 |
| ... | ... | ... |
| 2011-08-23 22:45:00 | 0.0 | 28.2 |
| 2011-08-23 23:00:00 | 0.0 | 28.1 |
| 2011-08-23 23:15:00 | 0.0 | 28.0 |
| 2011-08-23 23:30:00 | 0.0 | 28.3 |
| 2011-08-23 23:45:00 | 0.0 | 28.4 |
96 rows × 2 columns
warmest_day.ghi.plot(title='Irradiance during warmest day in SchPark, 2011');
warmest_day.ta.plot(title='Temperature during warmest day in SchPark, 2011');
Monthly temperature ridge lines
df2.ta[df2.index.month==7].plot.hist(bins=50, title='Temperature distribution in July [°C]');
# getting necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def ridge_lines(weather, column_name, title, xaxis):
df = weather.copy()
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# we define a dictionary with months that we'll use later
month_dict = {1: 'january',
2: 'february',
3: 'march',
4: 'april',
5: 'may',
6: 'june',
7: 'july',
8: 'august',
9: 'september',
10: 'october',
11: 'november',
12: 'december'}
df['month'] = df.index.month.map(month_dict)
month_mean_serie = df.groupby('month')[column_name].mean()
df['mean_month'] = df['month'].map(month_mean_serie)
# we generate a color palette with Seaborn.color_palette()
pal = sns.color_palette(palette='coolwarm', n_colors=12)
# in the sns.FacetGrid class, the 'hue' argument is the one that will be represented by colors with 'palette'
g = sns.FacetGrid(df, row='month', hue='mean_month', aspect=15, height=0.75, palette=pal)
# then we add the densities kdeplots for each month
g.map(sns.kdeplot, column_name,
bw_adjust=1, clip_on=False,
fill=True, alpha=1, linewidth=1.5)
# here we add a white line that represents the contour of each kdeplot
g.map(sns.kdeplot, column_name,
bw_adjust=1, clip_on=False,
color="w", lw=2)
# here we add a horizontal line for each plot
g.map(plt.axhline, y=0,
lw=2, clip_on=False)
# we loop over the FacetGrid figure axes (g.axes.flat) and add the month as text with the right color
# notice how ax.lines[-1].get_color() enables you to access the last line's color in each matplotlib.Axes
for i, ax in enumerate(g.axes.flat):
ax.text(-15, 0.02, month_dict[i+1],
fontweight='bold', fontsize=15,
color=ax.lines[-1].get_color())
# we use matplotlib.Figure.subplots_adjust() function to get the subplots to overlap
g.fig.subplots_adjust(hspace=-0.3)
# eventually we remove axes titles, yticks and spines
g.set_titles("")
g.set_ylabels("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.setp(ax.get_xticklabels(), fontsize=15, fontweight='bold')
plt.xlabel(xaxis, fontweight='bold', fontsize=15)
g.fig.suptitle(title,
ha='right',
fontsize=20,
fontweight=20)
plt.show()
ridge_lines(df, 'ta', 'Temperature distribution in Scharnhauser Park (2011)', 'Temperature in degree Celsius')
pvlib tutorial
Install with conda install -c pvlib pvlib
This page contains introductory examples of pvlib python usage. It is based on the Object Oriented code from https://pvlib-python.readthedocs.io/en/stable/introtutorial.html
The goal is to simulate a small PV system in different locations, and try to predict how much energy it could produce.
The code should be as concise as possible, while still delivering plausible results and taking weather into account.
Uploaded to https://gist.github.com/EricDuminil/f646d406967fe965190d2d3fa58df618 and linked from https://stackoverflow.com/questions/57682450/importing-non-tmy3-format-weather-data-for-use-in-pvlib-simulation/57826625#57826625
Module import
from pvlib.pvsystem import PVSystem, retrieve_sam
from pvlib.location import Location
from pvlib.modelchain import ModelChain
from pvlib.temperature import TEMPERATURE_MODEL_PARAMETERS
from pvlib.iotools import pvgis
# Not required, but recommended.
# It avoids downloading same data over and over again from PVGIS.
# https://pypi.org/project/requests-cache/
# pip install requests-cache
import requests_cache
requests_cache.install_cache('pvgis_requests_cache', backend='sqlite')
Locations, Module & Inverter
# latitude, longitude, name , altitude, timezone
coordinates = [( 32.2 , -111.0, 'Tucson, Arizona' , 700, 'Etc/GMT+7'),
( 35.1 , -106.6, 'Albuquerque, New Mexico' , 1500, 'Etc/GMT+7'),
( 37.8 , -122.4, 'San Francisco, California', 10, 'Etc/GMT+8'),
( 52.5 , 13.4, 'Berlin, Germany' , 34, 'Etc/GMT-1'),
(-20.9 , 55.5, 'St-Denis, La Réunion' , 100, 'Etc/GMT-4')]
# Get the module and inverter specifications from SAM (https://github.com/NREL/SAM)
module = retrieve_sam('SandiaMod')['Canadian_Solar_CS5P_220M___2009_']
inverter = retrieve_sam('cecinverter')['ABB__MICRO_0_25_I_OUTD_US_208__208V_']
temp_parameters = TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_glass']
Simulation
for latitude, longitude, name, altitude, timezone in coordinates:
location = Location(latitude, longitude, name=name,
altitude=altitude, tz=timezone)
# Download weather data from PVGIS server
weather, _, info, _ = pvgis.get_pvgis_tmy(location.latitude,
location.longitude)
# Rename columns from PVGIS TMY in order to define the required data.
weather = weather.rename(columns={'G(h)': 'ghi',
'Gb(n)': 'dni',
'Gd(h)': 'dhi',
'T2m': 'temp_air'
})
# Same logic as orientation_strategy='south_at_latitude_tilt', but might be
# a bit clearer for locations in the southern hemisphere.
system = PVSystem(module_parameters=module,
inverter_parameters=inverter,
temperature_model_parameters=temp_parameters,
surface_tilt=abs(latitude),
surface_azimuth=180 if latitude > 0 else 0)
mc = ModelChain(system, location)
mc.run_model(weather)
mount = system.arrays[0].mount
# Reporting
nominal_power = module.Impo * module.Vmpo
annual_energy = mc.results.ac.sum()
specific_yield = annual_energy / nominal_power
global_poa = mc.results.total_irrad.poa_global.sum() / 1000
average_ambient_temperature = weather.temp_air.mean()
performance_ratio = specific_yield / global_poa
weather_source = '%s (%d - %d)' % (info['meteo_data']['radiation_db'],
info['meteo_data']['year_min'],
info['meteo_data']['year_max'])
latitude_NS = '%.1f°%s' % (abs(latitude), 'N' if latitude > 0 else 'S')
longitude_EW = '%.1f°%s' % (abs(longitude), 'E' if longitude > 0 else 'W')
print('## %s (%s %s, %s)' % (name, latitude_NS, longitude_EW, timezone))
print('Nominal power : %.2f kWp' % (nominal_power / 1000))
print('Surface azimuth : %.0f °' % mount.surface_azimuth)
print('Surface tilt : %.0f °' % mount.surface_tilt)
print('Weather data source : %s' % weather_source)
print('Global POA irradiance : %.0f kWh / (m² · y)' % global_poa)
print('Average temperature : %.1f °C' % average_ambient_temperature)
print('Total yield : %.0f kWh / y' % (annual_energy / 1000))
print('Specific yield : %.0f kWh / (kWp · y)' % specific_yield)
print('Performance ratio : %.1f %%' % (performance_ratio * 100))
print()
/home/ricou/anaconda3/lib/python3.9/site-packages/pvlib/iotools/pvgis.py:477: pvlibDeprecationWarning: PVGIS variable names will be renamed to pvlib conventions by default starting in pvlib 0.10.0. Specify map_variables=True to enable that behavior now, or specify map_variables=False to hide this warning. warnings.warn(
## Tucson, Arizona (32.2°N 111.0°W, Etc/GMT+7)
Nominal power : 0.22 kWp
Surface azimuth : 180 °
Surface tilt : 32 °
Weather data source : PVGIS-NSRDB (2005 - 2015)
Global POA irradiance : 2405 kWh / (m² · y)
Average temperature : 21.4 °C
Total yield : 425 kWh / y
Specific yield : 1936 kWh / (kWp · y)
Performance ratio : 80.5 %

## Albuquerque, New Mexico (35.1°N 106.6°W, Etc/GMT+7)
Nominal power : 0.22 kWp
Surface azimuth : 180 °
Surface tilt : 35 °
Weather data source : PVGIS-NSRDB (2005 - 2015)
Global POA irradiance : 2390 kWh / (m² · y)
Average temperature : 14.5 °C
Total yield : 439 kWh / y
Specific yield : 2001 kWh / (kWp · y)
Performance ratio : 83.7 %

## San Francisco, California (37.8°N 122.4°W, Etc/GMT+8)
Nominal power : 0.22 kWp
Surface azimuth : 180 °
Surface tilt : 38 °
Weather data source : PVGIS-NSRDB (2005 - 2015)
Global POA irradiance : 2009 kWh / (m² · y)
Average temperature : 12.8 °C
Total yield : 383 kWh / y
Specific yield : 1746 kWh / (kWp · y)
Performance ratio : 86.9 %

## Berlin, Germany (52.5°N 13.4°E, Etc/GMT-1)
Nominal power : 0.22 kWp
Surface azimuth : 180 °
Surface tilt : 52 °
Weather data source : PVGIS-SARAH (2005 - 2016)
Global POA irradiance : 1254 kWh / (m² · y)
Average temperature : 10.4 °C
Total yield : 239 kWh / y
Specific yield : 1088 kWh / (kWp · y)
Performance ratio : 86.7 %

## St-Denis, La Réunion (20.9°S 55.5°E, Etc/GMT-4)
Nominal power : 0.22 kWp
Surface azimuth : 0 °
Surface tilt : 21 °
Weather data source : PVGIS-SARAH (2005 - 2016)
Global POA irradiance : 2141 kWh / (m² · y)
Average temperature : 18.1 °C
Total yield : 396 kWh / y
Specific yield : 1802 kWh / (kWp · y)
Performance ratio : 84.2 %
Detailed pvlib report
LATITUDE = 48.77
LONGITUDE = 9.18
LOCATION = 'Stuttgart'
TIMEZONE = 'Etc/GMT-1'
ALTITUDE = 400
ALBEDO = 0.2 # Standard is 0.25. Why?
# -20.9 , 55.5, 'St-Denis, La Réunion' , 100, 'Etc/GMT-4')
AZIMUTH = 180
TILT = 25
#TODO: Get timezone automatically
#TODO: Add requirements.txt
#TODO: Define functions each time, with only the strictly required parameters
Enable caching
pip install requests-cache
# Not required. Avoids downloading same data over and over again:
import requests_cache
requests_cache.install_cache('pvgis_requests_cache', backend='sqlite')
Get weather
pip install pvlib
from pvlib.iotools import pvgis
weather, _, info, _ = pvgis.get_pvgis_tmy(LATITUDE, LONGITUDE, map_variables=True)
weather_source = '%s (%d - %d)' % (info['meteo_data']['radiation_db'],
info['meteo_data']['year_min'],
info['meteo_data']['year_max'])
latitude_NS = '%.1f°%s' % (abs(LATITUDE), 'N' if LATITUDE > 0 else 'S')
longitude_EW = '%.1f°%s' % (abs(LONGITUDE), 'E' if LONGITUDE > 0 else 'W')
# Rename columns from PVGIS TMY in order to define the required data.
weather = weather.rename(columns={'G(h)': 'ghi',
'Gb(n)': 'dni',
'Gd(h)': 'dhi',
'T2m': 'temp_air',
'WS10m': 'wind_speed' # Does it make sense to use wind speed from 10m height?
})
weather
| temp_air | relative_humidity | ghi | dni | dhi | IR(h) | wind_speed | wind_direction | pressure | |
|---|---|---|---|---|---|---|---|---|---|
| time(UTC) | |||||||||
| 2016-01-01 00:00:00+00:00 | 2.70 | 96.70 | 0.0 | 0.0 | 0.0 | 292.75 | 1.06 | 219.0 | 99358.0 |
| 2016-01-01 01:00:00+00:00 | 3.26 | 97.01 | 0.0 | 0.0 | 0.0 | 299.49 | 1.05 | 228.0 | 99374.0 |
| 2016-01-01 02:00:00+00:00 | 3.83 | 97.32 | 0.0 | 0.0 | 0.0 | 306.23 | 1.03 | 238.0 | 99390.0 |
| 2016-01-01 03:00:00+00:00 | 4.39 | 97.62 | 0.0 | 0.0 | 0.0 | 312.97 | 1.01 | 222.0 | 99383.0 |
| 2016-01-01 04:00:00+00:00 | 4.96 | 97.93 | 0.0 | 0.0 | 0.0 | 319.71 | 0.99 | 207.0 | 99377.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2007-12-31 19:00:00+00:00 | -0.12 | 95.16 | 0.0 | 0.0 | 0.0 | 259.05 | 1.16 | 248.0 | 99586.0 |
| 2007-12-31 20:00:00+00:00 | 0.45 | 95.47 | 0.0 | 0.0 | 0.0 | 265.79 | 1.14 | 246.0 | 99574.0 |
| 2007-12-31 21:00:00+00:00 | 1.01 | 95.78 | 0.0 | 0.0 | 0.0 | 272.53 | 1.12 | 248.0 | 99561.0 |
| 2007-12-31 22:00:00+00:00 | 1.57 | 96.09 | 0.0 | 0.0 | 0.0 | 279.27 | 1.10 | 249.0 | 99548.0 |
| 2007-12-31 23:00:00+00:00 | 2.14 | 96.39 | 0.0 | 0.0 | 0.0 | 286.01 | 1.08 | 251.0 | 99535.0 |
8760 rows × 9 columns
# Force all dates to be from the same year
COERCE_YEAR = 2019
weather.index = weather.index.map(lambda dt: dt.replace(year=COERCE_YEAR))
Check and display weather
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
plt.rcParams['figure.figsize'] = [15, 10]
Ambient temperature
weather.temp_air.plot(title='Ambient temperature in %s\n%s' % (LOCATION, weather_source), color='#603a47')
plt.gca().yaxis.set_major_formatter(mticker.FormatStrFormatter('%d °C'))
print("Average temperature in %s : %.1f °C" % (LOCATION, weather.temp_air.mean()))
daily_temperatures = weather.temp_air.resample('D').mean()
print("Coldest day in %s : %.1f °C" % (LOCATION, daily_temperatures.min()))
print("Warmest day in %s : %.1f °C" % (LOCATION, daily_temperatures.max()))
Average temperature in Stuttgart : 11.7 °C
Coldest day in Stuttgart : -6.7 °C
Warmest day in Stuttgart : 26.8 °C
plt.figure(figsize=(15, 8))
plt.imshow(weather.temp_air.values.reshape(-1,24).T,
aspect='auto',
origin='lower', cmap='inferno')
plt.title('Ambient temperature in %s\n%s' % (LOCATION, weather_source))
plt.xlabel('Day of the year')
plt.ylabel('Hour')
plt.gca().yaxis.set_major_formatter(mticker.FormatStrFormatter('%d h'))
plt.colorbar();
Define system
from pvlib.pvsystem import PVSystem, retrieve_sam
from pvlib.location import Location
from pvlib.modelchain import ModelChain
from pvlib.temperature import TEMPERATURE_MODEL_PARAMETERS
# Get the module and inverter specifications from SAM
module = retrieve_sam('SandiaMod')['Canadian_Solar_CS5P_220M___2009_']
inverter = retrieve_sam('cecinverter')['ABB__MICRO_0_25_I_OUTD_US_208__208V_']
temp_parameters = TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_glass']
location = Location(LATITUDE, LONGITUDE, name=LOCATION,
altitude=ALTITUDE, tz=TIMEZONE)
system = PVSystem(module_parameters=module,
inverter_parameters=inverter,
temperature_model_parameters=temp_parameters,
surface_tilt=TILT,
surface_azimuth=AZIMUTH,
albedo = ALBEDO
)
mc = ModelChain(system, location, transposition_model='haydavies')
results = mc.run_model(weather)
Global horizontal irradiance
irradiances = weather.ghi.resample('ME').mean().to_frame()
irradiances['poa'] = mc.results.total_irrad.poa_global.resample('ME').mean()
irradiances.index = irradiances.index.month_name()
plt.figure(figsize=(15, 8))
plt.imshow(weather.ghi.values.reshape(-1,24).T,
aspect='auto',
origin='lower')
plt.title('Global Horizontal Irradiance in %s\n%s'% (LOCATION, weather_source))
plt.xlabel('Day of the year')
plt.ylabel('Hour')
plt.gca().yaxis.set_major_formatter(mticker.FormatStrFormatter('%d h'))
plt.colorbar();
Solar position
# Adapted from https://pvlib-python.readthedocs.io/en/stable/auto_examples/plot_sunpath_diagrams.html#polar-plot
from pvlib import solarposition
import pandas as pd
import numpy as np
solpos = mc.results.solar_position
# remove nighttime
solpos = solpos.loc[solpos['apparent_elevation'] > 0, :]
ax = plt.subplot(1, 1, 1, projection='polar')
# draw the analemma loops
points = ax.scatter(np.radians(solpos.azimuth), solpos.apparent_zenith,
s=2, label=None, c=solpos.index.dayofyear)
ax.figure.colorbar(points)
# draw hour labels
for hour in np.unique(solpos.index.hour):
    # choose label position by the smallest radius for each hour
    subset = solpos.loc[solpos.index.hour == hour, :]
    r = subset.apparent_zenith
    pos = solpos.loc[r.idxmin(), :]
    ax.text(np.radians(pos['azimuth']), pos['apparent_zenith'], str(hour))
# draw individual days
for day_of_year in [80, 172, 355]:  # should correspond to March 21st, June 21st and December 21st
    solpos = mc.results.solar_position[mc.results.solar_position.index.dayofyear == day_of_year]
    solpos = solpos.loc[solpos['apparent_elevation'] > -20, :]
    label = solpos.index[0].strftime('%Y-%m-%d')
    ax.plot(np.radians(solpos.azimuth), solpos.apparent_zenith, label=label)
ax.figure.legend(loc='upper left')
# change coordinates to be like a compass
ax.set_theta_zero_location('N')
ax.set_theta_direction(-1)
ax.set_rmax(90)
plt.title("Sun position in %s" % LOCATION)
plt.show()
Optimum tilt for given azimuth and location¶
tilts = range(91)
insolation_for_tilts = []
mount = system.arrays[0].mount
for tilt in tilts:
    # NOTE: Running a whole PV simulation just for the POA irradiance is overkill;
    # a dedicated POA calculation would need fewer parameters.
    mount.surface_tilt = tilt
    mc.run_model(weather)
    print('.', end='')
    insolation_for_tilts.append(mc.results.total_irrad.poa_global.sum() / 1000)
# Reset mc back to defined tilt
mount.surface_tilt = TILT
mc.run_model(weather);
...........................................................................................
highest_insolation = max(insolation_for_tilts)
best_tilt = tilts[np.argmax(insolation_for_tilts)]
plt.plot(tilts, insolation_for_tilts, color='black')
plt.ylim(bottom=0, top=highest_insolation + 100)
plt.xlim(left=0, right=90)
plt.xlabel('Tilt')
plt.ylabel('Global insolation')
plt.title("Yearly insolation on a tilted plane in %s\nAzimuth : %.0f°\n%s" % (LOCATION, AZIMUTH, weather_source))
plt.gca().xaxis.set_major_formatter(mticker.FormatStrFormatter('%d °'))
plt.gca().yaxis.set_major_formatter(mticker.FormatStrFormatter('%d kWh/(m².y)'))
plt.annotate('Highest insolation, at %.0f°\n%.0f kWh/(m².y)' % (best_tilt, highest_insolation),
xy=(best_tilt, highest_insolation),
xytext=(best_tilt, highest_insolation-200),
arrowprops=dict(facecolor='orange', shrink=0.05)
)
plt.show()
Plane of array irradiance¶
ax = irradiances.plot.bar(title='Monthly average irradiances in %s\nAzimuth: %.0f° Tilt: %.0f°\n%s' % (LOCATION, AZIMUTH, TILT, weather_source),
color=['black', '#f47b20'],
alpha=0.6);
ax.yaxis.set_major_formatter(mticker.FormatStrFormatter('%d W/m²'))
print("Average GHI irradiance in %s : %.1f W/m²" % (LOCATION, weather.ghi.mean()))
daily_temperatures = weather.temp_air.resample('D').mean()
print("Average POA irradiance in %s : %.1f W/m²" % (LOCATION, mc.results.total_irrad.poa_global.mean()))
print("Total GHI insolation in %s : %.0f kWh/(m² . y)" % (LOCATION, weather.ghi.sum() / 1000))
print("Total POA insolation in %s : %.0f kWh/(m² . y)" % (LOCATION, mc.results.total_irrad.poa_global.sum() / 1000))
Average GHI irradiance in Stuttgart : 133.0 W/m²
Average POA irradiance in Stuttgart : 153.9 W/m²
Total GHI insolation in Stuttgart : 1165 kWh/(m² . y)
Total POA insolation in Stuttgart : 1348 kWh/(m² . y)
# That's weird. POA seems too high!
# Perez is even worse than HayDavies
# Standard albedo is 0.25
print(system.arrays[0].albedo)
0.2
plt.figure(figsize=(15, 8))
plt.imshow(mc.results.total_irrad.poa_global.values.reshape(-1,24).T,
aspect='auto',
origin='lower')
plt.title('POA global irradiance in %s\nAzimuth: %.0f° Tilt: %.0f°\n%s' %
(LOCATION, AZIMUTH, TILT, weather_source))
plt.xlabel('Day of the year')
plt.ylabel('Hour')
plt.gca().yaxis.set_major_formatter(mticker.FormatStrFormatter('%d h'))
plt.colorbar();
Albedo + Diffuse + Direct¶
#TODO: DRY with december
ax = weather[weather.index.isocalendar().week == 26].ghi.plot(style='--', color='#555555', label='GHI', legend=True)
mc.results.total_irrad[mc.results.total_irrad.index.isocalendar().week == 26].plot.area(
ax=ax,
title='POA irradiances around June solstice in %s\nAzimuth: %.0f° Tilt: %.0f°\n%s' %
(LOCATION, AZIMUTH, TILT, weather_source),
y=['poa_ground_diffuse', 'poa_sky_diffuse', 'poa_direct'],
color=["#22cb22", "#89cbdf", "#f47b20"],
lw=0
)
ax.yaxis.set_major_formatter(mticker.FormatStrFormatter('%d W/m²'))
ax = weather[weather.index.isocalendar().week == 50].ghi.plot(style='--', color='#555555', label='GHI', legend=True)
mc.results.total_irrad[mc.results.total_irrad.index.isocalendar().week == 50].plot.area(
ax=ax,
title='POA irradiances around December solstice in %s\nAzimuth: %.0f° Tilt: %.0f°\n%s' %
(LOCATION, AZIMUTH, TILT, weather_source),
y=['poa_ground_diffuse', 'poa_sky_diffuse', 'poa_direct'],
color=["#22cb22", "#89cbdf", "#f47b20"],
lw=0
)
ax.yaxis.set_major_formatter(mticker.FormatStrFormatter('%d W/m²'))
Misc¶
i = mc.results.total_irrad
all(np.isclose(i.poa_diffuse, i.poa_sky_diffuse + i.poa_ground_diffuse))
True
all(np.isclose(i.poa_global, i.poa_direct + i.poa_sky_diffuse + i.poa_ground_diffuse))
True
mc.dc_model()
ModelChain: 
  name: None
  clearsky_model: ineichen
  transposition_model: haydavies
  solar_position_method: nrel_numpy
  airmass_model: kastenyoung1989
  dc_model: sapm
  ac_model: sandia_inverter
  aoi_model: sapm_aoi_loss
  spectral_model: sapm_spectral_loss
  temperature_model: sapm_temp
  losses_model: no_extra_losses
#TODO: Add I(V), P(V)
#TODO: Add eta inverter curve
#TODO: Check what's missing from insel report
Fourier¶
import numpy as np
import matplotlib.pyplot as plt
N = 1000
T = 0.01
x = np.linspace(0.0, N*T, N)
y = np.where(abs(x)<=0.5, 1, 0) # Rectangular function
plt.plot(x, y);
yf = np.fft.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
plt.plot(xf, 2.0/N * yf[:N//2].real);
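Plotting only the real part works here, but it depends on where the pulse sits in the window: any shift moves energy into the imaginary part. Taking the magnitude with `np.fft.rfft` is usually more robust; a minimal sketch with the same N and T (the variable names mirror the cell above):

```python
import numpy as np

N, T = 1000, 0.01
x = np.linspace(0.0, N * T, N)
y = np.where(x <= 0.5, 1, 0)          # same rectangular pulse
yf = np.fft.rfft(y)                   # FFT of a real-valued signal
xf = np.fft.rfftfreq(N, T)            # matching frequency axis in Hz
amplitude = 2.0 / N * np.abs(yf)      # magnitude spectrum, independent of phase
```

`rfft` returns only the non-negative frequencies, so no manual slicing with `[:N//2]` is needed.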
Write XLSX¶
# You could read a real CSV file instead
csv_lines = ['a;b;c;d\n', '1;2;3;4\n', '5;6;7;8\n']
import xlsxwriter
workbook = xlsxwriter.Workbook('filename.xlsx')
worksheet = workbook.add_worksheet()
i = 0 # line number
for line in csv_lines:
    date, time, gh, ta = line.replace('\n', '').split(';')
    worksheet.write(i, 0, date)
    worksheet.write(i, 1, time)
    worksheet.write(i, 2, gh)
    worksheet.write(i, 3, ta)
    i += 1
workbook.close()
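Instead of splitting each line by hand, the standard library's `csv` module handles delimiters and quoting for you. A small sketch on the same in-memory lines (a real file handle from `open()` would work the same way; `io.StringIO` just stands in for the file here):

```python
import csv
import io

csv_text = "a;b;c;d\n1;2;3;4\n5;6;7;8\n"
rows = list(csv.reader(io.StringIO(csv_text), delimiter=';'))
print(rows[0])  # ['a', 'b', 'c', 'd']
```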
Read XLSX¶
import pandas as pd
pd.read_excel('filename.xlsx')
| a | b | c | d | |
|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 |
| 1 | 5 | 6 | 7 | 8 |
Remove file¶
BE VERY CAREFUL!
import os
os.remove('filename.xlsx')
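`os.remove` raises `FileNotFoundError` if the file is already gone. A more defensive sketch with `pathlib` (the filename is just an example, matching the cell above):

```python
from pathlib import Path

path = Path('filename.xlsx')    # hypothetical file name
if path.exists():
    path.unlink()               # delete only if the file is present
# On Python 3.8+, path.unlink(missing_ok=True) does the same in one call
```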
SENCE 2021 Examples¶
Many examples are copied from https://www.python-graph-gallery.com/
Sankey (with plotly)¶
basic example¶
import plotly.graph_objects as go
from IPython.display import Image
Image(filename='images/graph2.jpg')
# Graph with nodes, flows, and weights:
source = [0, 2, 2, 1, 3, 3]
target = [2, 1, 3, 4, 4, 3]
value = [50, 30, 20, 30, 10, 10]
link = dict(source = source, target = target, value = value)
data = go.Sankey(link = link, node = dict(label= ["A", "B", "C", "D", "E"]))
fig = go.Figure(data)
fig.write_html("simple-sankey.html")
# Alternative : fig.show()
%%html
<iframe src="/simple-sankey.html" width="800" height="600"
title="Sankey with plotly" style="border:none"></iframe>
More complex example¶
import plotly.graph_objects as go
import urllib.request, json
url = 'https://raw.githubusercontent.com/plotly/plotly.js/master/test/image/mocks/sankey_energy.json'
response = urllib.request.urlopen(url)
data = json.loads(response.read())
# override gray link colors with 'source' colors
opacity = 0.4
# change 'magenta' to its 'rgba' value to add opacity
data['data'][0]['node']['color'] = ['rgba(255,0,255, 0.8)' if color == "magenta" else color for color in data['data'][0]['node']['color']]
data['data'][0]['link']['color'] = [data['data'][0]['node']['color'][src].replace("0.8", str(opacity))
for src in data['data'][0]['link']['source']]
fig = go.Figure(data=[go.Sankey(
valueformat = ".0f",
valuesuffix = "TWh",
# Define nodes
node = dict(
pad = 15,
thickness = 15,
line = dict(color = "black", width = 0.5),
label = data['data'][0]['node']['label'],
color = data['data'][0]['node']['color']
),
# Add links
link = dict(
source = data['data'][0]['link']['source'],
target = data['data'][0]['link']['target'],
value = data['data'][0]['link']['value'],
label = data['data'][0]['link']['label'],
color = data['data'][0]['link']['color']
))])
fig.write_html("sankey-plotly-python.html")
# Alternative : fig.show()
%%html
<iframe src="/sankey-plotly-python.html" width="800" height="600"
title="Sankey with plotly" style="border:none"></iframe>
Contour plots¶
# libraries & dataset
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')
# set seaborn style
sns.set_style("white")
# Basic 2D density plot
sns.kdeplot(x=df.sepal_width, y=df.sepal_length)
plt.show()
# Custom the color, add shade and bandwidth
sns.kdeplot(x=df.sepal_width, y=df.sepal_length, cmap="Reds", shade=True, bw_adjust=.5)
plt.show()
# Add thresh parameter
sns.kdeplot(x=df.sepal_width, y=df.sepal_length, cmap="Blues", shade=True, thresh=0)
plt.show()
Interactive maps¶
with plotly¶
### https://www.python-graph-gallery.com/choropleth-map-plotly-python
# Import the pandas library
import pandas as pd
# Import the data from the web
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv",
dtype={"fips": str})
# Load the county boundary coordinates
from urllib.request import urlopen
import json
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)
# Build the choropleth
import plotly.express as px
fig = px.choropleth(df,
geojson=counties,
locations='fips',
color='unemp',
color_continuous_scale="Viridis",
range_color=(0, 12),
scope="usa",
labels={'unemp':'unemployment rate'}
)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
# Improve the legend
fig.update_layout(coloraxis_colorbar=dict(
thicknessmode="pixels", thickness=10,
lenmode="pixels", len=150,
yanchor="top", y=0.8,
ticks="outside", ticksuffix=" %",
dtick=5
))
fig.write_html("choropleth-map-plotly-python.html")
# Alternative : fig.show()
%%html
<iframe src="/choropleth-map-plotly-python.html" width="800" height="600"
title="Map with plotly" style="border:none"></iframe>
with folium¶
# import the folium library
# pip install folium
import folium
# initialize the map and store it in a m object
m = folium.Map(location=[40, -95], zoom_start=4)
import pandas as pd
url = (
"https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
)
state_geo = f"{url}/us-states.json"
state_unemployment = f"{url}/US_Unemployment_Oct2012.csv"
state_data = pd.read_csv(state_unemployment)
folium.Choropleth(
geo_data=state_geo,
name="choropleth",
data=state_data,
columns=["State", "Unemployment"],
key_on="feature.id",
fill_color="YlGn",
fill_opacity=0.7,
line_opacity=.1,
legend_name="Unemployment Rate (%)",
).add_to(m)
folium.LayerControl().add_to(m)
m.save('choropleth-map-with-folium.html')
%%html
<iframe src="/choropleth-map-with-folium.html" width="800" height="600"
title="Map with folium" style="border:none"></iframe>
Clustermap¶
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
# Data set
url = 'https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/mtcars.csv'
df = pd.read_csv(url)
df = df.set_index('model')
sns.clustermap(df, metric="euclidean", standard_scale=1, method="ward", cmap="viridis")
plt.show()
Wordcloud¶
Installing it on Windows can be tricky. :-/
# Libraries
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Create a list of word
text=("Python Python Python Matplotlib MMB MMB SENCE")
# Create the wordcloud object
wordcloud = WordCloud(width=480, height=480, margin=0).generate(text)
# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.savefig('foo.png')
plt.show()
Ridge line¶
# getting necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# getting the data
temp = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2016-weather-data-seattle.csv') # we retrieve the data from plotly's GitHub repository
temp['month'] = pd.to_datetime(temp['Date']).dt.month # we store the month in a separate column
# we define a dictionary with months that we'll use later
month_dict = {1: 'january',
2: 'february',
3: 'march',
4: 'april',
5: 'may',
6: 'june',
7: 'july',
8: 'august',
9: 'september',
10: 'october',
11: 'november',
12: 'december'}
# we create a 'month' column
temp['month'] = temp['month'].map(month_dict)
# we generate a pd.Series with the mean temperature for each month (used later for colors in the FacetGrid plot), and we create a new column in the temp dataframe
month_mean_serie = temp.groupby('month')['Mean_TemperatureC'].mean()
temp['mean_month'] = temp['month'].map(month_mean_serie)
# we generate a color palette with Seaborn.color_palette()
pal = sns.color_palette(palette='coolwarm', n_colors=12)
# in the sns.FacetGrid class, the 'hue' argument is the one that will be represented by colors with 'palette'
g = sns.FacetGrid(temp, row='month', hue='mean_month', aspect=15, height=0.75, palette=pal)
# then we add the densities kdeplots for each month
g.map(sns.kdeplot, 'Mean_TemperatureC',
bw_adjust=1, clip_on=False,
fill=True, alpha=1, linewidth=1.5)
# here we add a white line that represents the contour of each kdeplot
g.map(sns.kdeplot, 'Mean_TemperatureC',
bw_adjust=1, clip_on=False,
color="w", lw=2)
# here we add a horizontal line for each plot
g.map(plt.axhline, y=0,
lw=2, clip_on=False)
# we loop over the FacetGrid figure axes (g.axes.flat) and add the month as text with the right color
# notice how ax.lines[-1].get_color() enables you to access the last line's color in each matplotlib.Axes
for i, ax in enumerate(g.axes.flat):
    ax.text(-15, 0.02, month_dict[i+1],
            fontweight='bold', fontsize=15,
            color=ax.lines[-1].get_color())
# we use matplotlib.Figure.subplots_adjust() function to get the subplots to overlap
g.fig.subplots_adjust(hspace=-0.3)
# eventually we remove axes titles, yticks and spines
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.setp(ax.get_xticklabels(), fontsize=15, fontweight='bold')
plt.xlabel('Temperature in degree Celsius', fontweight='bold', fontsize=15)
g.fig.suptitle('Daily average temperature in Seattle per month',
ha='right',
fontsize=20,
fontweight=20)
plt.show()
Larger plots¶
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams["figure.figsize"] = (20, 10)
t = np.linspace(0, 2*np.pi, 500)
plt.plot(t, np.sin(t))
plt.show()
Some links¶
- Sun path diagrams : http://andrewmarsh.com/software/sunpath2d-web/
- Tutorial for graphical user-interface (GUI) in Python : https://realpython.com/python-gui-tkinter/
- Questionnaire in Python : https://pypi.org/project/questionary/
- Questionnaire with Django : https://github.com/Pierre-Sassoulas/django-survey
SENCE 2022 Examples¶
Open file with default program¶
import os
os.startfile('output/SchPark01.csv') # At least on Windows
Dataclasses¶
from dataclasses import dataclass, astuple
@dataclass
class Point:
    x: float = 0
    y: float = 0
    z: float = 0

    def distance_square(self, other):
        return (other.x - self.x)**2 +\
               (other.y - self.y)**2 +\
               (other.z - self.z)**2

    def distance(self, other):
        return self.distance_square(other)**0.5
some_point = Point(1, 2)
another_point = Point(4, 6)
some_point
Point(x=1, y=2, z=0)
another_point
Point(x=4, y=6, z=0)
some_point.x
1
some_point == another_point
False
some_point == Point(1, 2)
True
some_point.distance(another_point)
5.0
some_point.x = 3
some_point.distance(another_point)
4.123105625617661
astuple(some_point)
(3, 2, 0)
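Two more helpers from the same module are worth knowing: `asdict` converts a dataclass to a dictionary, and `replace` builds a modified copy without touching the original. A short sketch on the `Point` class above (redefined here so the snippet is self-contained):

```python
from dataclasses import dataclass, asdict, replace

@dataclass
class Point:
    x: float = 0
    y: float = 0
    z: float = 0

p = Point(1, 2)
print(asdict(p))         # {'x': 1, 'y': 2, 'z': 0}
moved = replace(p, z=5)  # copy with one field changed
print(moved)             # Point(x=1, y=2, z=5)
print(p)                 # the original is unchanged
```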
Tests¶
import unittest
from pathlib import Path
SCRIPT_DIR = Path('.')
OUTPUT_DIR = Path('output')
class TestCSVSolutions(unittest.TestCase):
    def test_output_folder(self):
        self.assertTrue(OUTPUT_DIR.exists(), 'Please create "%s" folder' % OUTPUT_DIR)

    def test_scripts_is_written(self):
        py_script = '01_workshop_example.py'
        self.assertTrue((SCRIPT_DIR / py_script).exists(), '"%s" should exist.' % py_script)
        with open(py_script) as f:
            content = f.readlines()
        self.assertFalse(all(line.startswith('#') for line in content),
                         '"%s" should have more than just comments. Please write some code.' % py_script)

    def test_csv_output_file(self):
        csv_path = OUTPUT_DIR / 'SchPark01.csv'
        self.assertTrue(csv_path.exists(), 'Please generate "%s" file' % csv_path)
        with open(csv_path) as out:
            content = out.readlines()
        self.assertEqual(8760 * 4, len(content),
                         "CSV should have 15-minute values for a complete year")
        self.assertTrue(
            "2011/01/01;00:00;0.0;-0.6" in content[0], "First line of %s should be for 1st of January" % csv_path)
        self.assertTrue(
            "2011/12/31;23:45;0.0;8.1" in content[-1], "Last line of %s should be for 31st of December" % csv_path)
        t_sum, g_sum = 0, 0
        i = 0
        for line in content:
            cells = line.replace(' ', '').split(';')
            if all(cells):
                t_sum += float(cells[3])
                g_sum += float(cells[2])
                i += 1
        t_average = t_sum / i
        g_average = g_sum / i
        self.assertAlmostEqual(0.97, i / 8760 / 4, msg="Most lines should have values", places=2)
        self.assertAlmostEqual(12.8, t_average, places=2)
        self.assertAlmostEqual(137, g_average, places=0)
unittest.main(argv=[''], verbosity=2, exit=False);
test_csv_output_file (__main__.TestCSVSolutions) ... ok test_output_folder (__main__.TestCSVSolutions) ... ok test_scripts_is_written (__main__.TestCSVSolutions) ... ok ---------------------------------------------------------------------- Ran 3 tests in 0.147s OK
Some links¶
- PyCharm. Excellent Python IDE : https://www.jetbrains.com/pycharm/download/
SENCE 2023 Examples - 3. Semester¶
Common libraries and parameters¶
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Don't show too many rows in Pandas Dataframes
pd.options.display.max_rows = 7
# Larger plots
plt.rcParams['figure.figsize'] = [16, 8]
# "pip install folium" might be needed first : https://pypi.org/project/folium/
import folium
# Make a data frame with dots to show on the map.
# All the values are the same, in order to check if the projection distorts the circles
data = pd.DataFrame({
'lon':[-58, 2, 145, 30.32, -4.03, -73.57, 36.82, -38.5],
'lat':[-34, 49, -38, 59.93, 5.33, 45.52, -1.29, -12.97],
'name':['Buenos Aires', 'Paris', 'Melbourne', 'St Petersbourg', 'Abidjan', 'Montreal', 'Nairobi', 'Salvador'],
'value': [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]
})
data
| lon | lat | name | value | |
|---|---|---|---|---|
| 0 | -58.00 | -34.00 | Buenos Aires | 50.0 |
| 1 | 2.00 | 49.00 | Paris | 50.0 |
| 2 | 145.00 | -38.00 | Melbourne | 50.0 |
| ... | ... | ... | ... | ... |
| 5 | -73.57 | 45.52 | Montreal | 50.0 |
| 6 | 36.82 | -1.29 | Nairobi | 50.0 |
| 7 | -38.50 | -12.97 | Salvador | 50.0 |
8 rows × 4 columns
Circles are distorted by Mercator projection¶
see https://en.wikipedia.org/wiki/Tissot%27s_indicatrix for more information
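The radius correction applied in the next map (multiplying by cos(latitude)) compensates for the Mercator scale factor, which grows as 1/cos(latitude). A minimal sketch (the helper function `mercator_scale` is just for illustration, not part of folium):

```python
import math

def mercator_scale(lat_deg: float) -> float:
    """Scale factor of the Mercator projection at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

print(round(mercator_scale(0), 2))    # 1.0: no distortion at the equator
print(round(mercator_scale(60), 2))   # 2.0: circles appear twice as large
```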
# Make an empty map
m = folium.Map(location=[20,0], tiles="OpenStreetMap", zoom_start=2)
# add marker one by one on the map
for city in data.itertuples():
    folium.Circle(
        location=[city.lat, city.lon],
        popup=city.name,
        radius=city.value * 20000.0,
        color='crimson',
        fill=True,
        fill_color='crimson'
    ).add_to(m)
m.get_root().html.add_child(folium.Element("<h3 align='center'>Map with distorted circles</h3>"))
# Show the map
m
Avoiding deformation¶
import math
m = folium.Map(location=[20,0], tiles="OpenStreetMap", zoom_start=2)
# add marker one by one on the map, and account for Mercator deformation
for city in data.itertuples():
    local_deformation = math.cos(city.lat * math.pi / 180)
    folium.Circle(
        location=[city.lat, city.lon],
        popup='%s (%.1f)' % (city.name, city.value),
        radius=city.value * 20000.0 * local_deformation,
        color='crimson',
        fill=True,
        fill_color='crimson'
    ).add_to(m)
m.get_root().html.add_child(folium.Element("<h3 align='center'>Map with circles of correct size</h3>"))
m.save('output/bubble_map.html')
m
Basic example¶
# initialize columns
data = {
'A': [0, 1, 2, 3, 4, 5, 6],
'B': [1, 2, 3, 4, 5, 6, 7],
'C': [2, 3, 4, 5, 6, 7, 8],
'D': [3, 4, 5, 6, 7, 8, 9],
'E': [4, 5, 6, 7, 8, 9, 10],
'F': [5, 6, 7, 8, 9, 10, 11]
}
df = pd.DataFrame(data)
df
| A | B | C | D | E | F | |
|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 | 5 |
| 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| 2 | 2 | 3 | 4 | 5 | 6 | 7 |
| 3 | 3 | 4 | 5 | 6 | 7 | 8 |
| 4 | 4 | 5 | 6 | 7 | 8 | 9 |
| 5 | 5 | 6 | 7 | 8 | 9 | 10 |
| 6 | 6 | 7 | 8 | 9 | 10 | 11 |
colors = 'viridis' # See https://matplotlib.org/stable/gallery/color/colormap_reference.html
sns.heatmap(df, cmap=colors)
plt.title("Heatmap from pandas dataframe, with '%s' colormap." % colors)
plt.show()
Heatmap from timeseries¶
# Parse a whole year of weather data
weather_df = pd.read_csv('output/SchPark01.csv',
sep = ';',
na_values = ' ',
names = ['date', 'time', 'ghi', 'ta'],
parse_dates = [[0, 1]],
index_col = 'date_time'
)
weather_df
| ghi | ta | |
|---|---|---|
| date_time | ||
| 2011-01-01 00:00:00 | 0.0 | -0.6 |
| 2011-01-01 00:15:00 | 0.0 | -0.4 |
| 2011-01-01 00:30:00 | 0.0 | -0.5 |
| ... | ... | ... |
| 2011-12-31 23:15:00 | 0.0 | 8.4 |
| 2011-12-31 23:30:00 | 0.0 | 8.5 |
| 2011-12-31 23:45:00 | 0.0 | 8.1 |
35040 rows × 2 columns
# Temperatures(day_of_year, time)
temperatures = pd.pivot_table(weather_df, values='ta', index=weather_df.index.time, columns=weather_df.index.dayofyear)
temperatures
| date_time | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 356 | 357 | 358 | 359 | 360 | 361 | 362 | 363 | 364 | 365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 00:00:00 | -0.6 | 1.0 | 0.7 | -3.3 | -6.7 | -2.0 | 8.4 | 9.6 | 10.9 | 3.9 | ... | 2.9 | 6.9 | 7.2 | 4.9 | 6.0 | 7.9 | 5.3 | 3.5 | 4.7 | 4.6 |
| 00:15:00 | -0.4 | 1.0 | 0.5 | -3.7 | -7.9 | -2.5 | 8.4 | 9.4 | 10.7 | 3.8 | ... | 2.8 | 7.0 | 7.1 | 4.9 | 6.0 | 8.2 | 5.2 | 3.6 | 4.3 | 4.6 |
| 00:30:00 | -0.5 | 1.0 | 0.5 | -3.0 | -7.4 | -2.1 | 8.4 | 9.2 | 10.8 | 3.6 | ... | 2.8 | 7.0 | 7.1 | 4.8 | 6.0 | 8.2 | 4.9 | 4.4 | 5.2 | 4.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23:15:00 | 1.0 | 0.6 | -3.5 | -6.9 | -2.2 | 8.4 | 9.0 | 10.5 | 4.2 | 3.7 | ... | 6.7 | 7.3 | 5.0 | 6.0 | 8.3 | 5.5 | 5.0 | 4.4 | 4.1 | 8.4 |
| 23:30:00 | 1.0 | 0.5 | -3.5 | -7.3 | -2.2 | 8.4 | 10.2 | 10.7 | 4.2 | 3.5 | ... | 6.7 | 7.3 | 5.2 | 6.1 | 8.5 | 5.4 | 4.4 | 4.5 | 4.2 | 8.5 |
| 23:45:00 | 1.0 | 0.7 | -3.3 | -6.9 | -2.5 | 8.6 | 10.4 | 10.9 | 4.1 | 3.5 | ... | 6.9 | 7.2 | 4.9 | 6.1 | 8.1 | 5.3 | 3.9 | 5.1 | 4.6 | 8.1 |
96 rows × 357 columns
sns.heatmap(temperatures, annot=False)
plt.title('Temperatures in Scharnhauser Park, 2011')
plt.show()
# What are the available datasets?
', '.join(sns.get_dataset_names())
'anagrams, anscombe, attention, brain_networks, car_crashes, diamonds, dots, dowjones, exercise, flights, fmri, geyser, glue, healthexp, iris, mpg, penguins, planets, seaice, taxis, tips, titanic'
penguins_df = sns.load_dataset('penguins')
penguins_df
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | Male |
| 342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | Female |
| 343 | Gentoo | Biscoe | 49.9 | 16.1 | 213.0 | 5400.0 | Male |
344 rows × 7 columns
# Basic correlogram
sns.pairplot(penguins_df, hue='species')
plt.show()
FlapPyBird¶
Slightly modified version of FlapPyBird, with high score file and plot if desired:
high_score_filename = 'output/my_high_score.csv'
# Find the best score, without any library
previous_record = 0
with open(high_score_filename) as high_score_file:
    for line in high_score_file:
        when, old_score = line.split(';')
        old_score = int(old_score)
        if old_score > previous_record:
            previous_record = old_score
print("Current best score is : %d" % previous_record )
Current best score is : 18
# Parse high score file with Pandas
high_score_df = pd.read_csv(high_score_filename,
sep=';',
names=['datetime', 'score'],
parse_dates=True,
index_col='datetime')
high_score_df
| score | |
|---|---|
| datetime | |
| 2023-01-26 21:11:23 | 1 |
| 2023-01-26 21:11:32 | 3 |
| 2023-01-26 21:11:41 | 4 |
| ... | ... |
| 2023-01-27 15:56:00 | 1 |
| 2023-01-27 15:56:13 | 8 |
| 2023-01-27 15:56:29 | 9 |
35 rows × 1 columns
high_score_df.plot(ylim=(0, None),
title='My FlapPyBird scores',
use_index=False,
xlabel='Attempt #')
plt.show()
SENCE 2023 Examples - 1. Semester¶
Secret messages¶
from itertools import cycle
import base64
Key definition¶
KEY = """THIS IS A SUPER SECRET CODE. ONLY SHARE IT WITH A TRUSTED PERSON.
SHARE IT ONCE, BEFORE SENDING ANY MESSAGE. DO NOT SEND IT WITH THE ENCRYPTED MESSAGE.
It should be random and long.
This isn't random or very long. An alternative would be
secrets.token_bytes(4096)
, written to a file.
This method is very secure for the first message, but weak if
multiple messages are encoded with the same key.
"""
Secret message¶
MESSAGE = """My super secret message. Just a test.😀🤯"""
Functions¶
def encode_message(message: str, key: str = KEY) -> bytes:
    """Encode message with key as one-time pad"""
    pairs = zip(message.encode(), cycle(key.encode()))
    encrypted = [a ^ b for a, b in pairs]
    return base64.b85encode(bytes(encrypted))

def decode_message(encoded_message: bytes, key: str = KEY) -> str:
    """Decode message with key as one-time pad"""
    encoded_bytes = base64.b85decode(encoded_message)
    decrypted = bytes(a ^ b for a, b in
                      zip(encoded_bytes, cycle(key.encode())))
    return decrypted.decode()
Encode¶
encoded_message = encode_message(MESSAGE)
encoded_message
# This message could be shared safely over an untrusted channel.
b'88K-fRXH|NVN*6XA|NIJJ|Hk5Br`>AZw@eBRBtbAEkz(aZ=%|`$)vyY<^'
Decode¶
decode_message(encoded_message.decode())
'My super secret message. Just a test.😀🤯'
decode_message(b'88K-fRXH|NVN*6XA|NIJJ|Hk5Br`>AZw@eBRBtbAEkz(aZ=%|`$)vyY<^')
'My super secret message. Just a test.😀🤯'
Contour Plots¶
Seaborn has been updated (current version in Anaconda : 0.12.2), and sns.kdeplot has a slightly different syntax than before
# libraries & dataset
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')
# set seaborn style
sns.set_style("white")
# Basic 2D density plot
sns.kdeplot(data=df, x='sepal_width', y='sepal_length')
plt.show()
# Custom the color, add shade and bandwidth
sns.kdeplot(data=df, x='sepal_width', y='sepal_length', cmap="Reds", fill=True, bw_adjust=.5)
plt.show()
# Add thresh parameter
sns.kdeplot(data=df, x='sepal_width', y='sepal_length', cmap="Blues", fill=True, thresh=0)
plt.show()
Map with connections between cities¶
# libraries
#! pip install basemap
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
# Set the plot size for this notebook:
plt.rcParams["figure.figsize"]=15,12
# A basic map
m=Basemap(llcrnrlon=-100, llcrnrlat=20, urcrnrlon=30, urcrnrlat=70, projection='merc')
m.drawmapboundary(fill_color='#A6CAE0', linewidth=0)
m.fillcontinents(color='grey', alpha=0.7, lake_color='grey')
m.drawcoastlines(linewidth=0.1, color="white");
# Background map
m=Basemap(llcrnrlon=-100, llcrnrlat=20, urcrnrlon=30, urcrnrlat=70, projection='merc')
m.drawmapboundary(fill_color='#A6CAE0', linewidth=0)
m.fillcontinents(color='grey', alpha=0.7, lake_color='grey')
m.drawcoastlines(linewidth=0.1, color="white")
# Add a connection between new york and London
startlat = 40.78; startlon = -73.98
arrlat = 51.53; arrlon = 0.08
m.drawgreatcircle(startlon, startlat, arrlon, arrlat, linewidth=2, color='orange');
# Dataframe: list of a few cities with their coordinates:
import pandas as pd
cities = {
'city': ["Paris", "Melbourne", "Saint.Petersburg", "Abidjan", "Montreal", "Nairobi", "Salvador"],
'lon': [2, 145, 30.32, -4.03, -73.57, 36.82, -38.5],
'lat': [49, -38, 59.93, 5.33, 45.52, -1.29, -12.97]
}
df = pd.DataFrame(cities, columns = ['city', 'lon', 'lat'])
df
| city | lon | lat | |
|---|---|---|---|
| 0 | Paris | 2.00 | 49.00 |
| 1 | Melbourne | 145.00 | -38.00 |
| 2 | Saint.Petersburg | 30.32 | 59.93 |
| 3 | Abidjan | -4.03 | 5.33 |
| 4 | Montreal | -73.57 | 45.52 |
| 5 | Nairobi | 36.82 | -1.29 |
| 6 | Salvador | -38.50 | -12.97 |
# Background map
m=Basemap(llcrnrlon=-179, llcrnrlat=-60, urcrnrlon=179, urcrnrlat=70, projection='cyl')
m.drawmapboundary(fill_color='white', linewidth=0)
m.fillcontinents(color='#f2f2f2', alpha=0.7)
m.drawcoastlines(linewidth=0.1, color="white")
# Loop on every pair of cities to add the connection
for startIndex, startRow in df.iterrows():
    for endIndex in range(startIndex + 1, len(df.index)):
        endRow = df.iloc[endIndex]
        # print(f"{startRow.city} -> {endRow.city}")
        m.drawgreatcircle(startRow.lon, startRow.lat, endRow.lon, endRow.lat, linewidth=1, color='#69b3a2');
# Add city names
for i, row in df.iterrows():
    plt.annotate(row.city, xy=m(row.lon+3, row.lat), verticalalignment='center')
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
my_variable = np.random.normal(loc=10, scale=5, size=500)
# Create the swarm plot
sns.swarmplot(y=my_variable)
# Customization
plt.title('Swarm Plot of My Variable (y-axis)') # Set the title
plt.ylabel('My variable') # Set the label for the y-axis
plt.show() # Display the chart
# Import useful libraries
import matplotlib.pyplot as plt
#! pip install networkx
import networkx as nx
#! pip install netgraph
from netgraph import Graph
# Create a modular graph (dummy data)
partition_sizes = [10, 20, 30, 40]
g = nx.random_partition_graph(partition_sizes, 0.5, 0.1)
%%capture --no-display
# ^ Hide annoying warning for this cell
# Build graph
Graph(g);
<netgraph._main.Graph at 0x7f6fa0f05910>
node_to_community = dict()
node = 0
for community_id, size in enumerate(partition_sizes):
    for _ in range(size):
        node_to_community[node] = community_id
        node += 1
# Color nodes according to their community.
community_to_color = {
0 : 'tab:blue',
1 : 'tab:orange',
2 : 'tab:green',
3 : 'tab:red',
}
node_color = {node: community_to_color[community_id] \
for node, community_id in node_to_community.items()}
fig, ax = plt.subplots()
Graph(g,
node_color=node_color, # indicates the community each belongs to
node_edge_width=0, # no black border around nodes
edge_width=0.1, # use thin edges, as they carry no information in this visualisation
edge_alpha=0.5, # low edge alpha values accentuates bundles as they appear darker than single edges
node_layout='community', node_layout_kwargs=dict(node_to_community=node_to_community),
ax=ax,
)
plt.show()
Chess¶
#! pip install chess
import chess
board = chess.Board()
board
board.legal_moves
<LegalMoveGenerator at 0x7f6f99b6dbb0 (Nh3, Nf3, Nc3, Na3, h3, g3, f3, e3, d3, c3, b3, a3, h4, g4, f4, e4, d4, c4, b4, a4)>
chess.Move.from_uci("a8a1") in board.legal_moves
False
board.push_san("e4")
board.push_san("e5")
board.push_san("Qh5")
board.push_san("Nc6")
board.push_san("Bc4")
board.push_san("Nf6")
board.push_san("Qxf7")
board
board.is_checkmate()
True
Stock prices¶
#! pip install mplfinance
#! pip install yfinance
import mplfinance as mpf
import yfinance as yf #(for the dataset)
from datetime import datetime, timedelta
today = datetime.today()
one_month_ago = today - timedelta(days=30)
# Define the stock symbol and date range
stock_symbol = "AAPL" # Example: Apple Inc.
# Load historical data
stock_data = yf.download(stock_symbol, start=one_month_ago, end=today)
# plot
mpf.plot(stock_data, type='candle')
[*********************100%***********************] 1 of 1 completed
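mplfinance can also overlay moving averages on the candles via its `mav` parameter, e.g. `mpf.plot(stock_data, type='candle', mav=(5, 10))`. The underlying computation is just a rolling mean, which pandas provides directly (a sketch on synthetic closing prices; the values are illustrative):

```python
import pandas as pd

# Synthetic closing prices for 10 trading days (illustrative values).
close = pd.Series([100, 102, 101, 105, 107, 106, 108, 110, 109, 111],
                  index=pd.date_range("2024-01-01", periods=10, freq="B"))

# 5-day simple moving average, as mav=(5,) would overlay on the chart.
sma5 = close.rolling(window=5).mean()
print(sma5.iloc[4])  # mean of the first five closes: 103.0
```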
Music with Python¶
Venn diagrams¶
#! pip install venn
from venn import venn
musicians = {
"Members of The Beatles": {"Paul McCartney", "John Lennon", "George Harrison", "Ringo Starr"},
"Guitarists": {"John Lennon", "George Harrison", "Jimi Hendrix", "Eric Clapton", "Carlos Santana"},
"Played at Woodstock": {"Jimi Hendrix", "Carlos Santana", "Keith Moon"}
}
venn(musicians);
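`venn` draws the overlaps; the overlaps themselves can be computed directly with Python set operations on the same dictionary:

```python
musicians = {
    "Members of The Beatles": {"Paul McCartney", "John Lennon", "George Harrison", "Ringo Starr"},
    "Guitarists": {"John Lennon", "George Harrison", "Jimi Hendrix", "Eric Clapton", "Carlos Santana"},
    "Played at Woodstock": {"Jimi Hendrix", "Carlos Santana", "Keith Moon"},
}

# Intersection: guitarists who played at Woodstock.
both = musicians["Guitarists"] & musicians["Played at Woodstock"]
print(both)  # {'Jimi Hendrix', 'Carlos Santana'} (in some order)

# Beatles who are also listed as guitarists.
beatle_guitarists = musicians["Members of The Beatles"] & musicians["Guitarists"]
```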
SENCE 2024 Examples¶
Sankey with different colors¶
Find color combinations at https://designwizard.com/blog/colour-combination/#gray-ff-and-lime-punch-dedff
import matplotlib.pyplot as plt
from matplotlib.sankey import Sankey
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[], title="Two Systems")
flows = [0.25, 0.15, 0.60, -0.10, -0.05, -0.25, -0.15, -0.10, -0.35]
sankey = Sankey(ax=ax, unit=None)
sankey.add(flows=flows, label='one',
orientations=[-1, 1, 0, 1, 1, 1, -1, -1, 0],
facecolor='#606060FF')
sankey.add(flows=[-0.25, 0.15, 0.1], label='two',
orientations=[-1, -1, -1], prior=0, connect=(0, 0),
facecolor='#D6ED17FF')
diagrams = sankey.finish()
diagrams[-1].patch.set_hatch('/')
plt.legend();
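Sankey diagrams assume flows are conserved: the positive (input) and negative (output) flows of each system should sum to zero, otherwise matplotlib reports the imbalance. A quick check for the first system above:

```python
import math

flows = [0.25, 0.15, 0.60, -0.10, -0.05, -0.25, -0.15, -0.10, -0.35]

inputs = sum(f for f in flows if f > 0)
outputs = -sum(f for f in flows if f < 0)
print(round(inputs, 10), round(outputs, 10))  # both 1.0 (up to float rounding)

# Conservation check (floating-point tolerant).
assert math.isclose(sum(flows), 0.0, abs_tol=1e-9)
```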
Read online CSV¶
import pandas as pd
pd.read_csv("https://python.ericduminil.com/files/wetterstation.temp.csv")
| #group | false | false.1 | true | true.1 | false.2 | false.3 | true.2 | true.3 | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | #datatype | string | long | dateTime:RFC3339 | dateTime:RFC3339 | dateTime:RFC3339 | double | string | string |
| 1 | #default | mean | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | result | table | _start | _stop | _time | _value | _field | _measurement |
| 3 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-04-01T07:38:29.058Z | 9.200975609756101 | value | wetterstation.temperatur |
| 4 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-04-01T22:42:28.424Z | 8.58029850746268 | value | wetterstation.temperatur |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 359 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-10T19:18:43.354Z | 3.659710144927535 | value | wetterstation.temperatur |
| 360 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-11T10:22:42.72Z | 1.9895384615384597 | value | wetterstation.temperatur |
| 361 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-12T01:26:42.086Z | 5.282580645161291 | value | wetterstation.temperatur |
| 362 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-12T16:30:41.452Z | 4.560792079207922 | value | wetterstation.temperatur |
| 363 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-12T21:56:11.652Z | 2.4579310344827574 | value | wetterstation.temperatur |
364 rows × 9 columns
pd.read_csv("https://python.ericduminil.com/files/wetterstation.temp.csv",
skiprows=3)
| Unnamed: 0 | result | table | _start | _stop | _time | _value | _field | _measurement | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-04-01T07:38:29.058Z | 9.200976 | value | wetterstation.temperatur |
| 1 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-04-01T22:42:28.424Z | 8.580299 | value | wetterstation.temperatur |
| 2 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-04-02T13:46:27.79Z | 8.436757 | value | wetterstation.temperatur |
| 3 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-04-03T04:50:27.156Z | 6.948889 | value | wetterstation.temperatur |
| 4 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-04-03T19:54:26.522Z | 9.091223 | value | wetterstation.temperatur |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 356 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-10T19:18:43.354Z | 3.659710 | value | wetterstation.temperatur |
| 357 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-11T10:22:42.72Z | 1.989538 | value | wetterstation.temperatur |
| 358 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-12T01:26:42.086Z | 5.282581 | value | wetterstation.temperatur |
| 359 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-12T16:30:41.452Z | 4.560792 | value | wetterstation.temperatur |
| 360 | NaN | NaN | 0 | 2024-03-31T22:00:00Z | 2024-11-12T21:56:11.652Z | 2024-11-12T21:56:11.652Z | 2.457931 | value | wetterstation.temperatur |
361 rows × 9 columns
pd.read_csv("https://python.ericduminil.com/files/wetterstation.temp.csv",
skiprows=3,
parse_dates=[3, 4, 5])
| Unnamed: 0 | result | table | _start | _stop | _time | _value | _field | _measurement | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-04-01 07:38:29.058000+00:00 | 9.200976 | value | wetterstation.temperatur |
| 1 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-04-01 22:42:28.424000+00:00 | 8.580299 | value | wetterstation.temperatur |
| 2 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-04-02 13:46:27.790000+00:00 | 8.436757 | value | wetterstation.temperatur |
| 3 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-04-03 04:50:27.156000+00:00 | 6.948889 | value | wetterstation.temperatur |
| 4 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-04-03 19:54:26.522000+00:00 | 9.091223 | value | wetterstation.temperatur |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 356 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-11-10 19:18:43.354000+00:00 | 3.659710 | value | wetterstation.temperatur |
| 357 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-11-11 10:22:42.720000+00:00 | 1.989538 | value | wetterstation.temperatur |
| 358 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-11-12 01:26:42.086000+00:00 | 5.282581 | value | wetterstation.temperatur |
| 359 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-11-12 16:30:41.452000+00:00 | 4.560792 | value | wetterstation.temperatur |
| 360 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2.457931 | value | wetterstation.temperatur |
361 rows × 9 columns
pd.read_csv("https://python.ericduminil.com/files/wetterstation.temp.csv",
skiprows=3,
parse_dates=[3, 4, 5],
index_col='_time'
)
| Unnamed: 0 | result | table | _start | _stop | _value | _field | _measurement | |
|---|---|---|---|---|---|---|---|---|
| _time | ||||||||
| 2024-04-01 07:38:29.058000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 9.200976 | value | wetterstation.temperatur |
| 2024-04-01 22:42:28.424000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 8.580299 | value | wetterstation.temperatur |
| 2024-04-02 13:46:27.790000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 8.436757 | value | wetterstation.temperatur |
| 2024-04-03 04:50:27.156000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 6.948889 | value | wetterstation.temperatur |
| 2024-04-03 19:54:26.522000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 9.091223 | value | wetterstation.temperatur |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-11-10 19:18:43.354000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 3.659710 | value | wetterstation.temperatur |
| 2024-11-11 10:22:42.720000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 1.989538 | value | wetterstation.temperatur |
| 2024-11-12 01:26:42.086000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 5.282581 | value | wetterstation.temperatur |
| 2024-11-12 16:30:41.452000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 4.560792 | value | wetterstation.temperatur |
| 2024-11-12 21:56:11.652000+00:00 | NaN | NaN | 0 | 2024-03-31 22:00:00+00:00 | 2024-11-12 21:56:11.652000+00:00 | 2.457931 | value | wetterstation.temperatur |
361 rows × 8 columns
pd.read_csv("https://python.ericduminil.com/files/wetterstation.temp.csv",
skiprows=3,
usecols=['_time', '_value'],
parse_dates=[0],
index_col='_time',
)
| _value | |
|---|---|
| _time | |
| 2024-04-01 07:38:29.058000+00:00 | 9.200976 |
| 2024-04-01 22:42:28.424000+00:00 | 8.580299 |
| 2024-04-02 13:46:27.790000+00:00 | 8.436757 |
| 2024-04-03 04:50:27.156000+00:00 | 6.948889 |
| 2024-04-03 19:54:26.522000+00:00 | 9.091223 |
| ... | ... |
| 2024-11-10 19:18:43.354000+00:00 | 3.659710 |
| 2024-11-11 10:22:42.720000+00:00 | 1.989538 |
| 2024-11-12 01:26:42.086000+00:00 | 5.282581 |
| 2024-11-12 16:30:41.452000+00:00 | 4.560792 |
| 2024-11-12 21:56:11.652000+00:00 | 2.457931 |
361 rows × 1 columns
df = pd.read_csv("https://python.ericduminil.com/files/wetterstation.temp.csv",
skiprows=3,
usecols=['_time', '_value'],
parse_dates=[0],
index_col='_time',
)
df = df.rename(columns={'_value': 'temperature'})
df
| temperature | |
|---|---|
| _time | |
| 2024-04-01 07:38:29.058000+00:00 | 9.200976 |
| 2024-04-01 22:42:28.424000+00:00 | 8.580299 |
| 2024-04-02 13:46:27.790000+00:00 | 8.436757 |
| 2024-04-03 04:50:27.156000+00:00 | 6.948889 |
| 2024-04-03 19:54:26.522000+00:00 | 9.091223 |
| ... | ... |
| 2024-11-10 19:18:43.354000+00:00 | 3.659710 |
| 2024-11-11 10:22:42.720000+00:00 | 1.989538 |
| 2024-11-12 01:26:42.086000+00:00 | 5.282581 |
| 2024-11-12 16:30:41.452000+00:00 | 4.560792 |
| 2024-11-12 21:56:11.652000+00:00 | 2.457931 |
361 rows × 1 columns
df.plot();
df.resample('1W').mean().plot();
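`resample('1W')` groups the irregular timestamps into calendar weeks and averages each group. The same mechanics on a small, regular series (synthetic data, for illustration):

```python
import pandas as pd

# Two weeks of daily values: 2024-01-01 (a Monday) through 2024-01-14.
idx = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.DataFrame({"temperature": range(14)}, index=idx)

# Weekly mean: by default the bins end on Sundays.
weekly = daily.resample("1W").mean()
print(list(weekly["temperature"]))  # [3.0, 10.0]
```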
Include image in Notebook¶


Create output/ folder if needed¶
from pathlib import Path
Path('output').mkdir(exist_ok=True)
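`mkdir(exist_ok=True)` is idempotent, so the cell can be re-run safely. For nested folders, add `parents=True` (a sketch in a temporary directory):

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())

# Create a nested folder tree in one call; re-running is harmless.
out = base / "output" / "plots"
out.mkdir(parents=True, exist_ok=True)
out.mkdir(parents=True, exist_ok=True)  # no error on the second call

print(out.is_dir())  # True
```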
2-D Density Plot¶
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde as kde
# Create data: 200 points
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)
x, y = data.T
# Create a figure with 6 plot areas
fig, axes = plt.subplots(ncols=6, nrows=1, figsize=(21, 5))
# Everything starts with a Scatterplot
axes[0].set_title('Scatterplot')
axes[0].plot(x, y, 'ko')
# Thus we can cut the plotting window in several hexbins
nbins = 20
axes[1].set_title('Hexbin')
axes[1].hexbin(x, y, gridsize=nbins, cmap=plt.cm.BuGn_r)
# 2D Histogram
axes[2].set_title('2D Histogram')
axes[2].hist2d(x, y, bins=nbins, cmap=plt.cm.BuGn_r)
# Evaluate a gaussian kde on a regular grid of nbins x nbins over data extents
k = kde(data.T)
xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
# plot a density
axes[3].set_title('Calculate Gaussian KDE')
axes[3].pcolormesh(xi, yi, zi.reshape(xi.shape), cmap=plt.cm.BuGn_r)
# add shading
axes[4].set_title('2D Density with shading')
axes[4].pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap=plt.cm.BuGn_r)
# contour
axes[5].set_title('Contour')
axes[5].pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap=plt.cm.BuGn_r)
axes[5].contour(xi, yi, zi.reshape(xi.shape) );
Circular Barplot¶
Simple¶
# import numpy to get the value of Pi
import numpy as np
# Add a bar in the polar coordinates
plt.subplot(111, polar=True);
plt.bar(x=0, height=10, width=np.pi/2, bottom=5);
import pandas as pd
# Build a dataset
df = pd.DataFrame(
{
'Name': ['item ' + str(i) for i in range(1, 51)],
'Value': np.random.randint(low=10, high=100, size=50)
})
# Show 3 first rows
df.head(3)
| Name | Value | |
|---|---|---|
| 0 | item 1 | 64 |
| 1 | item 2 | 70 |
| 2 | item 3 | 12 |
# set figure size
plt.figure(figsize=(20,10))
# plot polar axis
ax = plt.subplot(111, polar=True)
# remove grid
plt.axis('off')
# Set the coordinates limits
upperLimit = 100
lowerLimit = 30
# Compute the max in the dataset
# (named max_value so we don't shadow the built-in max)
max_value = df['Value'].max()
# Compute heights: a linear conversion of each item value into the new coordinates.
# In our example, 0 in the dataset is converted to lowerLimit (30)
# and the maximum is converted to max_value itself (close to upperLimit, 100)
slope = (max_value - lowerLimit) / max_value
heights = slope * df.Value + lowerLimit
# Compute the width of each bar. In total we have 2*Pi = 360°
width = 2*np.pi / len(df.index)
# Compute the angle each bar is centered on:
indexes = list(range(1, len(df.index)+1))
angles = [element * width for element in indexes]
angles
# Draw bars
bars = ax.bar(
x=angles,
height=heights,
width=width,
bottom=lowerLimit,
linewidth=2,
edgecolor="white")
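A quick sanity check of the geometry above: the 50 bar widths exactly tile the full circle, and the height mapping sends a value of 0 to `lowerLimit` and the dataset maximum to itself (here shown with an example maximum of 100):

```python
import math

n = 50
lowerLimit = 30
width = 2 * math.pi / n

# The n bars exactly tile the 2*pi circle.
assert math.isclose(width * n, 2 * math.pi)

# The same linear height mapping as above, with an example maximum of 100.
max_value = 100
slope = (max_value - lowerLimit) / max_value
height = lambda v: slope * v + lowerLimit

print(height(0), height(max_value))  # 30.0 100.0
```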
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.lines import Line2D
from matplotlib import font_manager
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)
import tempfile
from pathlib import Path
import urllib
# Create a temporary directory for the font files
path = Path(tempfile.mkdtemp())
# URL and downloaded path of the fonts
url_label_font = "https://github.com/Lisa-Ho/small-data-projects/raw/main/assets/fonts/Ubuntu-R.ttf"
url_title_font = "https://github.com/Lisa-Ho/small-data-projects/raw/main/assets/fonts/Mandalore-K77lD.otf"
path_label_font = path / "Ubuntu-R.ttf"
path_title_font = path / "Mandalore-K77lD.otf"
# Download the fonts to our temporary directory
urllib.request.urlretrieve(url_label_font, path_label_font)
urllib.request.urlretrieve(url_title_font, path_title_font)
# Create a Matplotlib Font object from our `.ttf` files
label_font = font_manager.FontEntry(fname=str(path_label_font), name="Ubuntu-R")
title_font = font_manager.FontEntry(fname=str(path_title_font), name="Mandalore-K77lD")
# Register objects with Matplotlib's ttf list
font_manager.fontManager.ttflist.append(label_font)
font_manager.fontManager.ttflist.append(title_font)
# load cleaned data set
df = pd.read_csv('https://raw.githubusercontent.com/Lisa-Ho/small-data-projects/main/2023/2308-star-wars-scripts/episode1_each_line_of_anakin_clean.csv')
# print first rows to check it's all looking ok
df.head()
| id | to | text | number | episode | |
|---|---|---|---|---|---|
| 0 | 271.0 | WATTO | Mel tassa cho-passa | 3 | 1 |
| 1 | 274.0 | PADME | Are you an angel? | 4 | 1 |
| 2 | 276.0 | PADME | An angel. I've heard the deep space pilots tal... | 46 | 1 |
| 3 | 278.0 | PADME | I listen to all the traders and star pilots wh... | 27 | 1 |
| 4 | 280.0 | PADME | All mylife. | 2 | 1 |
# calculate correct angular position in circular bar plot
x_max = 2*np.pi
df['angular_pos'] = np.linspace(0, x_max, len(df), endpoint=False)
# store colors to use in dictionary
chart_colors = {'bg': '#0C081F', 'QUI-GON': '#F271A7', 'PADME': '#40B8E1', 'OBI-WAN':'#75EAB6',
'R2D2': '#F4E55E', 'other': '#444A68'}
# map colors for bars to the data
df['colors'] = df['to'].map(chart_colors)
# fill with neutral color for secondary characters
df['colors'] = df['colors'].fillna(chart_colors['other'])
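`Series.map` looks each value up in the dictionary and yields `NaN` for missing keys, which `fillna` then replaces with the neutral color. On a tiny example (the "JAR JAR" entry is illustrative, standing in for any secondary character):

```python
import pandas as pd

chart_colors = {"PADME": "#40B8E1", "OBI-WAN": "#75EAB6", "other": "#444A68"}

to = pd.Series(["PADME", "JAR JAR", "OBI-WAN"])
colors = to.map(chart_colors).fillna(chart_colors["other"])

print(list(colors))  # ['#40B8E1', '#444A68', '#75EAB6']
```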
# layout -----------------------------------------
# setup figure with polar projection
fig, ax = plt.subplots(figsize=(10, 10),
subplot_kw={'projection': 'polar'})
# set background colors
ax.set_facecolor(chart_colors['bg'])
fig.set_facecolor(chart_colors['bg'])
# plot data -----------------------------------------
ax.bar(df['angular_pos'], df['number'], alpha=1, color=df['colors'],
linewidth=0, width=0.052, zorder=3)
# format axis -----------------------------------------
# start on the top and plot bars clockwise
ax.set_theta_zero_location('N')
ax.set_theta_direction(-1)
# scale y-axis to account for area size of bars
max_value = 50
r_offset = -10
r2 = max_value - r_offset
alpha = r2 - r_offset
v_offset = r_offset**2 / alpha
forward = lambda value: ((value + v_offset) * alpha)**0.5 + r_offset
reverse = lambda radius: (radius - r_offset) ** 2 / alpha - v_offset
ax.set_rlim(0, max_value)
ax.set_rorigin(r_offset)
ax.set_yscale('function', functions=(
lambda value: np.where(value >= 0, forward(value), value),
lambda radius: np.where(radius > 0, reverse(radius), radius)))
# format labels and grid
ax.set_rlabel_position(0)
ax.set_yticks([10,20,30,40])
ax.set_yticklabels([10,20,30,40],fontsize=9, color='white',alpha=0.35)
# format gridlines
ax.set_thetagrids(angles=[])
ax.grid(visible=True, axis='y', zorder=2, color='white',
linewidth=0.75, alpha=0.2)
# remove spines
ax.spines[:].set_visible(False)
# custom legend -----------------------------------------
# add axis to hold legend
lgd = fig.add_axes([0.75,0.71, 0.15, 0.25])
# define legend elements
kw = dict(marker='o', color=chart_colors['bg'], markersize=8, alpha=1,
markeredgecolor='None', linewidth=0)
legend_elements =[Line2D([0],[0],
markerfacecolor=chart_colors['PADME'],
label='Padme',
**kw),
Line2D([0], [0],
markerfacecolor=chart_colors['QUI-GON'],
label='Qui-Gon',
**kw),
Line2D([0], [0],
markerfacecolor=chart_colors['R2D2'],
label='R2D2',
**kw),
Line2D([0], [0],
markerfacecolor=chart_colors['OBI-WAN'],
label='Obi-Wan',
**kw),
Line2D([0], [0],
markerfacecolor=chart_colors['other'],
label='Other',
**kw)]
# visualise legend and remove axis around it
L = lgd.legend(frameon=False, handles=legend_elements, loc='center',
ncol=1, handletextpad=0.2, labelspacing=1)
plt.setp(L.texts, va='baseline', color='white', size=12,
fontfamily=label_font.name)
lgd.axis('off')
# circular annotation -----------------------------------------
# draw an inner circle on a new axis
circ = fig.add_axes([0.453, 0.435, 0.12, 0.12],polar=True)
line_angular_pos = df['angular_pos'][1:-5]
line_r = [5] * len(line_angular_pos)
#plot line and markers for start + end
circ.plot(line_angular_pos, line_r, zorder=5, color='white',
linewidth=0.75, alpha=0.4)
circ.plot(line_angular_pos.to_list()[0], line_r[0], zorder=5, color='white',
linewidth=0,marker='o', markersize=3,alpha=0.4)
circ.plot(line_angular_pos.to_list()[-1], line_r[-1], zorder=5, color='white',
linewidth=0,marker='>', markersize=3,alpha=0.4)
# format axis
circ.set_theta_zero_location('N')
circ.set_theta_direction(-1)
circ.axis('off')
# text annotations -----------------------------------------
ax.annotate('1 line', xy=(0.1, 48), xycoords='data', xytext=(40, 20),
textcoords='offset points',
fontsize=10, fontfamily=label_font.name,
ha='left', va='baseline',
annotation_clip=False,
color='#ababab',
arrowprops=dict(arrowstyle='->',edgecolor='#ababab',
connectionstyle='arc3,rad=.5', alpha=0.75))
ax.annotate('Words\nper line', xy=(-0.05, 22), xycoords='data', xytext=(0, 0),
textcoords='offset points',
fontsize=10, fontfamily=label_font.name,
ha='right', va='baseline',
annotation_clip=False,
color='#ababab')
ax.annotate('', xy=(-0.02, 38), xycoords='data', xytext=(0, -105),
textcoords='offset points',
fontsize=10, fontfamily=label_font.name,
ha='right', va='baseline',
annotation_clip=False,
color='#ababab',
arrowprops=dict(arrowstyle='<->',edgecolor='#ababab', linewidth=0.75,
connectionstyle='arc3,rad=0', alpha=0.75 ))
lgd.annotate('Talking to', xy=(0.35, 0.78), xycoords='data', xytext=(-18, 14),
textcoords='offset points',
fontsize=10, fontfamily=label_font.name,
ha='right', va='center',
annotation_clip=False,
color='#ababab',
arrowprops=dict(arrowstyle='->',edgecolor='#ababab',
connectionstyle='arc3,rad=-.5', alpha=0.75))
# Title + Credits -----------------------------------------
plt.figtext(0.5,1.03, 'Star Wars Episode I',
fontfamily=title_font.name,
fontsize=55, color='white', ha='center')
plt.figtext(0.5,0.98, 'Each line of Anakin',
fontfamily=label_font.name,
fontsize=24, color='white', ha='center')
plt.figtext(0.5,0.1, 'Data: jcwieme/data-scripts-star-wars | Design: Lisa Hornung',
fontfamily=label_font.name,
fontsize=8, color='white', ha='center', alpha=0.75)
plt.savefig('output/anakin.png')
plt.show()
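The custom radial scale above relies on `forward` and `reverse` being inverses of each other, since `set_yscale('function', ...)` needs both directions to place ticks. That can be verified numerically:

```python
# Same constants as in the plot above.
max_value = 50
r_offset = -10
r2 = max_value - r_offset
alpha = r2 - r_offset
v_offset = r_offset**2 / alpha

forward = lambda value: ((value + v_offset) * alpha) ** 0.5 + r_offset
reverse = lambda radius: (radius - r_offset) ** 2 / alpha - v_offset

# Round-tripping any value in range recovers it (up to float rounding).
for v in [0, 10, 25, 50]:
    assert abs(reverse(forward(v)) - v) < 1e-9

assert abs(forward(0.0)) < 1e-9                    # the origin stays at the origin
assert abs(forward(max_value) - max_value) < 1e-9  # and max_value at the rim
```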
Simple¶
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches # for the legend
from pywaffle import Waffle
import pandas as pd
data = {
2018: [3032, 2892, 804],
2019: [4537, 3379, 1096],
2020: [8932, 3879, 896],
2021: [22147, 6678, 2156],
2022: [32384, 13354, 5245]
}
df = pd.DataFrame(data,
index=['car', 'truck', 'motorcycle'])
number_of_bars = len(df.columns) # one bar per year
# Init the whole figure and axes
fig, axs = plt.subplots(nrows=1,
ncols=number_of_bars,
figsize=(8,6),)
# Iterate over each bar and create it
for i,ax in enumerate(axs):
col_name = df.columns[i]
values = df[col_name] # values from the i-th column
Waffle.make_waffle(
ax=ax, # pass axis to make_waffle
rows=20,
columns=5,
values=values,
)
plt.show()
number_of_bars = len(df.columns) # one bar per year
colors = ["darkred", "red", "darkorange"]
# Init the whole figure and axes
fig, axs = plt.subplots(nrows=1,
ncols=number_of_bars,
figsize=(8,6),)
# Iterate over each bar and create it
for i,ax in enumerate(axs):
col_name = df.columns[i]
values = df[col_name]/1000 # values from the i-th column
Waffle.make_waffle(
ax=ax, # pass axis to make_waffle
rows=20,
columns=5,
values=values,
title={"label": col_name, "loc": "left"},
colors=colors,
vertical=True,
icons=['car-side', 'truck', 'motorcycle'],
font_size=12, # size of each point
icon_legend=True,
legend={'loc': 'upper left', 'bbox_to_anchor': (1, 1)},
)
# Add a title
fig.suptitle('Vehicle Production by Year and Vehicle Type',
fontsize=14, fontweight='bold')
# Add a legend
legend_labels = df.index
legend_elements = [mpatches.Patch(color=colors[i],
label=legend_labels[i]) for i in range(len(colors))]
fig.legend(handles=legend_elements,
loc="upper right",
title="Vehicle Types",
bbox_to_anchor=(1.04, 0.9))
plt.subplots_adjust(right=0.85)
plt.show()
More complex¶
https://python-graph-gallery.com/web-waffle-chart-as-share/
NOTE: Example should be updated because pyfonts has been changed
# Libraries
import matplotlib.pyplot as plt
import pandas as pd
from pywaffle import Waffle
from highlight_text import fig_text, ax_text
from pyfonts import load_font
path = 'https://raw.githubusercontent.com/holtzy/R-graph-gallery/master/DATA/share-cereals.csv'
df = pd.read_csv(path)
def remove_html_tag(s):
return s.split('</b>')[0][3:]
df['lab'] = df['lab'].apply(remove_html_tag)
df = df[df['type'] == 'feed']
df.reset_index(inplace=True)
df
| index | lab | type | percent | |
|---|---|---|---|---|
| 0 | 0 | Africa | feed | 21 |
| 1 | 2 | Americas | feed | 53 |
| 2 | 4 | Asia | feed | 32 |
| 3 | 6 | Europe | feed | 66 |
| 4 | 8 | Oceania | feed | 59 |
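`remove_html_tag` above assumes the label always starts with `<b>` and ends with `</b>`. A regex-based sketch handles arbitrary tags (not a full HTML parser, but enough for simple labels like these):

```python
import re

def strip_tags(s):
    """Remove every <...> tag from a string (regex sketch)."""
    return re.sub(r"<[^>]+>", "", s)

print(strip_tags("<b>Africa</b>"))          # Africa
print(strip_tags("<b>Europe</b> (feed)"))   # Europe (feed)
```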
#NOTE: URL has been updated
font_title = load_font("https://github.com/googlefonts/staatliches/raw/refs/heads/main/fonts/Staatliches-Regular.ttf")
font_credit = load_font("https://github.com/impallari/Raleway/raw/master/fonts/v4020/Raleway-v4020-Light.otf")
bold_font_credit = load_font("https://github.com/impallari/Raleway/raw/master/fonts/v4020/Raleway-v4020-Bold.otf")
background_color = "#222725"
pink = "#f72585"
dark_pink = "#7a0325"
number_of_bars = len(df) # one bar per continent
# Init the whole figure and axes
fig, axs = plt.subplots(
nrows=number_of_bars,
ncols=1,
figsize=(8, 8),
dpi=300
)
fig.set_facecolor(background_color)
# Iterate over each bar and create it
for (i, row), ax in zip(df.iterrows(), axs):
share = row['percent']
values = [share, 100-share]
Waffle.make_waffle(
ax=ax,
rows=4,
columns=25,
values=values,
colors=[pink, dark_pink],
)
text = f"{row['lab']}"
ax.text(
x=-0.4, y=0.5, s=text,
font=bold_font_credit, color='white', rotation=90,
ha='center', va='center', fontsize=13
)
text = f"{share}%"
ax.text(
x=-0.2, y=0.5, s=text,
font=font_credit, color='white', rotation=90,
ha='center', va='center', fontsize=13
)
fig_text(
x=0.05, y=0.95, s="SHARE OF CEREALS USED AS <ANIMAL FEEDS>",
highlight_textprops=[{'color': pink}], color='white',
fontsize=22, font=font_title
)
fig_text(
x=0.05, y=0.05, s="<Data> OWID (year 2021) | <Plot> Benjamin Nowak",
font=font_credit, color="white", fontsize=10,
highlight_textprops=[{'font': bold_font_credit}]*2
)
plt.savefig('output/web-waffle-chart-as-share.png', dpi=300)
plt.show()
Multiple line charts¶
https://python-graph-gallery.com/web-line-chart-small-multiple/
# Libraries
import matplotlib.pyplot as plt
import pandas as pd
import datetime
# Open the dataset from Github
url = "https://raw.githubusercontent.com/holtzy/the-python-graph-gallery/master/static/data/dataConsumerConfidence.csv"
df = pd.read_csv(url)
# Reshape the DataFrame using pivot longer
df = df.melt(id_vars=['Time'], var_name='country', value_name='value')
# Convert to time format
df['Time'] = pd.to_datetime(df['Time'], format='%b-%Y')
# Remove rows with missing values (only one row)
df = df.dropna()
# Create a colormap with a color for each country
num_countries = len(df['country'].unique())
colors = plt.get_cmap('tab10', num_countries)
# Init a 3x3 charts
fig, ax = plt.subplots(nrows=3, ncols=3, figsize=(8, 12))
# Add a big title on top of the entire chart
fig.suptitle('\nConsumer \nConfidence \nAround the \nWorld\n\n', # Title ('\n' inserts line breaks)
fontsize=40,
fontweight='bold',
x=0.05, # Shift the text to the left
ha='left' # Align the text to the left
)
# Add a paragraph of text on the right of the title
paragraph_text = (
"The consumer confidence indicator\n"
"provides an indication of future\n"
"developments of households'\n"
"consumption and saving. An\n"
"indicator above 100 signals a boost\n"
"in the consumers' confidence\n"
"towards the future economic\n"
"situation. Values below 100 indicate\n"
"a pessimistic attitude towards future\n"
"developments in the economy,\n"
"possibly resulting in a tendency to\n"
"save more and consume less. During\n"
"2022, the consumer confidence\n"
"indicators have declined in many\n"
"major economies around the world.\n"
)
fig.text(0.55, 0.9, # Position
paragraph_text, # Content
fontsize=12,
va='top', # Put the paragraph at the top of the chart
ha='left', # Align the text to the left
)
# Plot each group in the subplots
for i, (group, ax) in enumerate(zip(df['country'].unique(), ax.flatten())):
# Filter for the group
filtered_df = df[df['country'] == group]
x = filtered_df['Time']
y = filtered_df['value']
# Get last value (according to 'Time') for the group
sorted_df = filtered_df.sort_values(by='Time')
last_value = sorted_df.iloc[-1]['value']
last_date = sorted_df.iloc[-1]['Time']
# Set the background color for each subplot
ax.set_facecolor('seashell')
fig.set_facecolor('seashell')
# Plot the line
ax.plot(x, y, color=colors(i))
# Add the final value
ax.plot(last_date, # x-axis position
last_value, # y-axis position
marker='o', # Style of the point
markersize=5, # Size of the point
color=colors(i), # Color
)
# Add the text of the value
ax.text(last_date,
last_value*1.005, # slightly shift up
f'{round(last_value)}', # round for readability
fontsize=7,
color=colors(i), # color
fontweight='bold',
)
# Add the 100 on the left
ax.text(sorted_df.iloc[0]['Time'] - pd.Timedelta(days=300), # shift the position to the left
100,
'100',
fontsize=10,
color='black',)
# Add line
sorted_df = df.sort_values(by='Time')
start_x_position = sorted_df.iloc[0]['Time']
end_x_position = sorted_df.iloc[-1]['Time']
ax.plot([start_x_position, end_x_position], # x-axis position
[100, 100], # y-axis position (constant position)
color='black', # Color
alpha=0.8, # Opacity
linewidth=0.8, # width of the line
)
# Plot other groups with lighter colors (alpha argument)
other_groups = df['country'].unique()[df['country'].unique() != group]
for other_group in other_groups:
# Filter observations that are not in the group
other_y = df['value'][df['country'] == other_group]
other_x = df['Time'][df['country'] == other_group]
# Display the other observations with less opacity (alpha=0.2)
ax.plot(other_x, other_y, color=colors(i), alpha=0.2)
# Remove spines
ax.spines[['right', 'top', 'left', 'bottom']].set_visible(False)
# Add a bold title to each subplot
ax.set_title(f'{group}', fontsize=12, fontweight='bold')
# Remove axis labels
ax.set_yticks([])
ax.set_xticks([])
# Add a credit section at the bottom of the chart
fig.text(0.0, -0.01, # position
"Design:", # text
fontsize=10,
va='bottom',
ha='left',
fontweight='bold',)
fig.text(0.1, -0.01, # position
"Gilbert Fontana", # text
fontsize=10,
va='bottom',
ha='left')
fig.text(0.0, -0.025, # position
"Data:", # text
fontsize=10,
va='bottom',
ha='left',
fontweight='bold',)
fig.text(0.07, -0.025, # position
"OECD, 2022",
fontsize=10,
va='bottom',
ha='left')
# Adjust layout and spacing
plt.tight_layout()
# Show the plot
plt.show()
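The `melt` step near the top of this cell reshapes the wide table (one column per country) into long format, one `(Time, country, value)` row per observation. On a two-row example (country names are illustrative):

```python
import pandas as pd

wide = pd.DataFrame({
    "Time": ["Jan-2020", "Feb-2020"],
    "USA": [98.0, 99.0],
    "JPN": [101.0, 100.0],
})

long = wide.melt(id_vars=["Time"], var_name="country", value_name="value")
print(long.shape)  # (4, 3): 2 dates x 2 countries, columns Time/country/value

# The same '%b-%Y' parsing as above.
long["Time"] = pd.to_datetime(long["Time"], format="%b-%Y")
```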
Bubble Map¶
https://python-graph-gallery.com/web-bubble-map-with-arrows/
!pip install cartopy geoplot
# data manipulation
import numpy as np
import pandas as pd
import geopandas as gpd
# visualization
import matplotlib.pyplot as plt
from matplotlib import font_manager
from matplotlib.font_manager import FontProperties
from highlight_text import fig_text, ax_text
from matplotlib.patches import FancyArrowPatch
# geospatial manipulation
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import geoplot
import geoplot.crs as gcrs
# Easier way to get fonts
from pyfonts import load_font
proj = ccrs.Miller()
# Alternative (see https://scitools.org.uk/cartopy/docs/v0.15/crs/projections.html):
# proj = ccrs.Robinson()
# Mercator looks too weird close to the poles
# proj = ccrs.Mercator()
url = "https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/all_world.geojson"
world = gpd.read_file(url)
world = world[~world['name'].isin(["Antarctica", "Greenland"])]
world = world.to_crs(proj.proj4_init)
world.head()
| name | geometry | |
|---|---|---|
| 0 | Fiji | MULTIPOLYGON (((20037508.343 -1803779.309, 200... |
| 1 | Tanzania | POLYGON ((3774143.866 -105756.618, 3792946.708... |
| 2 | W. Sahara | POLYGON ((-964649.018 3158195.645, -964597.245... |
| 3 | Canada | MULTIPOLYGON (((-13674486.249 5937950.601, -13... |
| 4 | United States of America | MULTIPOLYGON (((-13674486.249 5937950.601, -13... |
#Load data
url = "https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/earthquakes.csv"
df = pd.read_csv(url)
# Filter dataset: keep earthquakes with a depth of at least 0.01 km (10 meters)
df = df[df['Depth (km)'] >= 0.01]
# Sort: big bubbles must be below small bubbles for visibility
df.sort_values(by='Depth (km)', ascending=False, inplace=True)
df.head()
|   | Date | Time (utc) | Region | Magnitude | Depth (km) | Latitude | Longitude | Mode | Map | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 7961 | 20/02/2019 | 06:50:47 | Banda Sea | 5.0 | 2026 | -6.89 | 129.15 | A | - | 2019.0 |
| 6813 | 07/07/2019 | 07:50:53 | Eastern New Guinea Reg, P.N.G. | 5.4 | 1010 | -5.96 | 147.90 | A | - | 2019.0 |
| 8293 | 17/01/2019 | 14:01:50 | Fiji Islands | 4.7 | 689 | -18.65 | 179.44 | A | - | 2019.0 |
| 11258 | 03/01/2018 | 06:42:58 | Fiji Islands Region | 5.5 | 677 | -19.93 | -178.89 | A | - | 2018.0 |
| 9530 | 06/09/2018 | 18:22:24 | Fiji Islands Region | 5.8 | 672 | -18.88 | 179.30 | A | - | 2018.0 |
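The sort above matters because Matplotlib draws scatter points in row order: sorting by depth in descending order plots the deepest (largest) bubbles first, so the smaller bubbles stay visible on top. A minimal sketch of that sort, with hypothetical depths:

```python
import pandas as pd

# Three hypothetical earthquake depths
quakes = pd.DataFrame({'Depth (km)': [5.0, 300.0, 40.0]})

# Deepest first: these rows are drawn first, i.e. underneath the rest
quakes.sort_values(by='Depth (km)', ascending=False, inplace=True)

draw_order = quakes['Depth (km)'].tolist()
print(draw_order)  # → [300.0, 40.0, 5.0]
```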
Simple
proj = ccrs.Miller()
fig, ax = plt.subplots(figsize=(12, 8), dpi=300, subplot_kw={'projection':proj})
ax.set_axis_off()
# background map
world.boundary.plot(ax=ax)
# transform the coordinates to the projection's CRS
pc = ccrs.PlateCarree()
new_coords = proj.transform_points(pc, df['Longitude'].values, df['Latitude'].values)
# bubble on top of the map
ax.scatter(
    new_coords[:, 0], new_coords[:, 1],
    s=df['Depth (km)'] / 3,  # size of the bubbles
    zorder=10,               # put the bubbles on top of the map
)
plt.show()
More Complex
def draw_arrow(tail_position, head_position, invert=False, radius=0.5, color='black', fig=None):
    if fig is None:
        fig = plt.gcf()
    kw = dict(arrowstyle="Simple, tail_width=0.5, head_width=4, head_length=8", color=color, lw=0.5)
    if invert:
        connectionstyle = f"arc3,rad=-{radius}"
    else:
        connectionstyle = f"arc3,rad={radius}"
    a = FancyArrowPatch(
        tail_position, head_position,
        connectionstyle=connectionstyle,
        transform=fig.transFigure,
        **kw
    )
    fig.patches.append(a)
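Before using the helper in the full chart, the `arc3` connection style it relies on can be checked in isolation. A minimal sketch (the coordinates are arbitrary; the Agg backend is only selected so the figure renders headlessly):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch

fig, ax = plt.subplots(figsize=(4, 3))
ax.set_axis_off()

# Same arrow style as draw_arrow(): a curved "Simple" arrow placed in
# figure coordinates, so it can point across subplots and margins
arrow = FancyArrowPatch(
    (0.2, 0.2), (0.6, 0.7),
    connectionstyle="arc3,rad=0.3",  # rad>0 bends one way, rad<0 the other
    arrowstyle="Simple, tail_width=0.5, head_width=4, head_length=8",
    transform=fig.transFigure, color="black", lw=0.5,
)
fig.patches.append(arrow)

n_patches = len(fig.patches)
plt.close(fig)
```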
# TODO: push updated example to graph-gallery
font = load_font('https://github.com/coreyhu/Urbanist/raw/refs/heads/main/fonts/ttf/Urbanist-Medium.ttf')
bold_font = load_font('https://github.com/coreyhu/Urbanist/raw/refs/heads/main/fonts/ttf/Urbanist-Black.ttf')
# colors
background_color = '#14213d'
map_color = (233/255, 196/255, 106/255, 0.2)
text_color = 'white'
bubble_color = '#fefae0'
alpha_text = 0.7
# initialize the figure
fig, ax = plt.subplots(figsize=(12, 8), dpi=300, subplot_kw={'projection': proj})
fig.set_facecolor(background_color)
ax.set_facecolor(background_color)
ax.set_axis_off()
# background map
world.boundary.plot(ax=ax, linewidth=0, facecolor=map_color)
# transform the coordinates to the projection's CRS
pc = ccrs.PlateCarree()
new_coords = proj.transform_points(pc, df['Longitude'].values, df['Latitude'].values)
# bubble on top of the map
ax.scatter(
    new_coords[:, 0], new_coords[:, 1],
    s=df['Depth (km)'] * np.log(df['Depth (km)']) / 10,
    color=bubble_color,
    linewidth=0.4,
    edgecolor='grey',
    alpha=0.6,
    zorder=10,
)
# title
fig_text(
    x=0.5, y=0.98, s='Earthquakes around the world',
    color=text_color, fontsize=30, ha='center', va='top', font=font,
    alpha=alpha_text
)
# subtitle
fig_text(
    x=0.5, y=0.92, s='Earthquakes between 2015 and 2024. Each dot is an earthquake with a size proportional to its depth.',
    color=text_color, fontsize=14, ha='center', va='top', font=font, alpha=alpha_text
)
# credit
text = """
<Data>: Pakistan Meteorological Department
<Map>: barbierjoseph.com
"""
fig_text(
    x=0.85, y=0.16, s=text, color=text_color, fontsize=7, ha='right', va='top',
    font=font, highlight_textprops=[{'font': bold_font}, {'font': bold_font}],
    alpha=alpha_text
)
# Nazca plate
highlight_textprops = [
    {"bbox": {"facecolor": "black", "pad": 2, "alpha": 1}, "alpha": alpha_text},
    {"bbox": {"facecolor": "black", "pad": 2, "alpha": 1}, "alpha": alpha_text}
]
draw_arrow((0.23, 0.27), (0.37, 0.35), fig=fig, color=text_color, invert=True, radius=0.2)
fig_text(x=0.16, y=0.265, s='<Collisions between Nazca Plate>\n<and South American plate>', fontsize=10, color=text_color, font=font, highlight_textprops=highlight_textprops, zorder=100)
# india plate
draw_arrow((0.69, 0.64), (0.64, 0.55), fig=fig, color=text_color, radius=0.4)
fig_text(x=0.7, y=0.66, s='<Collisions between Eurasian plate>\n<and Indian plate>', fontsize=10, color=text_color, font=font, highlight_textprops=highlight_textprops, zorder=100)
# philippine plate
draw_arrow((0.73, 0.22), (0.8, 0.51), fig=fig, color=text_color, radius=0.6)
fig_text(x=0.54, y=0.22, s='<Collisions between Philippine plate>\n<and Eurasian plate>', fontsize=10, color=text_color, font=font, highlight_textprops=highlight_textprops, zorder=100)
plt.savefig('output/web-bubble-map-with-arrows.png', dpi=300, bbox_inches="tight")
plt.show()
Animations
Simple
# libraries
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
# initiate figure
fig, ax = plt.subplots(figsize=(10, 8), dpi=120)
def update(frame):
    ax.clear()
    ax.scatter(
        1 + frame, 10 + frame * 10,
        s=600, alpha=0.5,
        edgecolors="black"
    )
    ax.set_xlim(0, 10)
    ax.set_ylim(0, 100)
    return fig, ax
ani = FuncAnimation(fig, update, frames=range(10))
ani.save("output/my_animation.gif", fps=5);
plt.close(fig) # Don't show plot directly.
my_animation.gif:

More Complex
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import pandas as pd
import numpy as np
data = pd.read_csv('https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/gapminderData.csv')
data['continent'] = pd.Categorical(data['continent'])
data.head()
|   | country | year | pop | continent | lifeExp | gdpPercap |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1952 | 8425333.0 | Asia | 28.801 | 779.445314 |
| 1 | Afghanistan | 1957 | 9240934.0 | Asia | 30.332 | 820.853030 |
| 2 | Afghanistan | 1962 | 10267083.0 | Asia | 31.997 | 853.100710 |
| 3 | Afghanistan | 1967 | 11537966.0 | Asia | 34.020 | 836.197138 |
| 4 | Afghanistan | 1972 | 13079460.0 | Asia | 36.088 | 739.981106 |
interp_data = pd.DataFrame()
multiple = 10
for country in data['country'].unique():
    # prepare a temporary dataframe and subset
    temp_df = pd.DataFrame()
    country_df = data[data['country'] == country]
    # interpolate the data: n evenly spaced points between min and max
    # (note: this linear ramp ignores the intermediate yearly values,
    #  so it assumes each series grows roughly monotonically)
    n = len(country_df) * multiple - (multiple - 1)
    years = np.linspace(country_df['year'].min(), country_df['year'].max(), n)
    pops = np.linspace(country_df['pop'].min(), country_df['pop'].max(), n)
    lifeExps = np.linspace(country_df['lifeExp'].min(), country_df['lifeExp'].max(), n)
    gdps = np.linspace(country_df['gdpPercap'].min(), country_df['gdpPercap'].max(), n)
    continents = [country_df['continent'].values[0]] * len(years)
    # add the data to the temporary dataframe
    temp_df['year'] = years
    temp_df['pop'] = pops
    temp_df['lifeExp'] = lifeExps
    temp_df['gdpPercap'] = gdps
    temp_df['continent'] = continents
    temp_df['country'] = country
    # append the temporary dataframe to the final dataframe
    interp_data = pd.concat([interp_data, temp_df])
interp_data['continent'] = pd.Categorical(interp_data['continent'])
interp_data.head()
|   | year | pop | lifeExp | gdpPercap | continent | country |
|---|---|---|---|---|---|---|
| 0 | 1952.0 | 8.425333e+06 | 28.801000 | 635.341351 | Asia | Afghanistan |
| 1 | 1952.5 | 8.638647e+06 | 28.937609 | 638.456534 | Asia | Afghanistan |
| 2 | 1953.0 | 8.851962e+06 | 29.074218 | 641.571716 | Asia | Afghanistan |
| 3 | 1953.5 | 9.065276e+06 | 29.210827 | 644.686899 | Asia | Afghanistan |
| 4 | 1954.0 | 9.278591e+06 | 29.347436 | 647.802081 | Asia | Afghanistan |
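The point count used in the loop above, `len(country_df) * multiple - (multiple - 1)`, inserts `multiple - 1` extra points into each gap between consecutive samples while keeping the originals. A minimal sketch with three hypothetical yearly samples:

```python
import numpy as np

multiple = 10
years = np.array([1952.0, 1957.0, 1962.0])  # 3 original samples

# 3 * 10 - 9 = 21 points: 2 gaps x 9 new points + the 3 originals
n = len(years) * multiple - (multiple - 1)
fine_years = np.linspace(years.min(), years.max(), n)
print(n, fine_years[1])  # → 21 1952.5
```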
fig, ax = plt.subplots(figsize=(10, 10), dpi=120)
def update(frame):
    # Clear the current plot to redraw
    ax.clear()
    # Filter data for the specific year
    yearly_data = interp_data.loc[interp_data.year == frame, :]
    # Scatter plot for that year
    ax.scatter(
        x=yearly_data['lifeExp'],
        y=yearly_data['gdpPercap'],
        s=yearly_data['pop'] / 100000,
        c=yearly_data['continent'].cat.codes,
        cmap="Accent",
        alpha=0.6,
        edgecolors="white",
        linewidths=2
    )
    # Updating titles and layout
    ax.set_title(f"Global Development in {round(frame)}")
    ax.set_xlabel("Life Expectancy")
    ax.set_ylabel("GDP per Capita")
    ax.set_yscale('log')
    ax.set_ylim(100, 100000)
    ax.set_xlim(20, 90)
    return ax
ani = FuncAnimation(fig, update, frames=interp_data['year'].unique())
ani.save('output/gapminder-2.gif', fps=10)
plt.close(fig)
gapminder-2.gif:

