$\newcommand{\alicebob}[3]{#1 & \ra{#2} & #3\\[-5pt]}$ $\newcommand{\bobalice}[3]{#1 & \la{#2} & #3\\[-5pt]}$ $\newcommand{\alicework}[1]{#1 & &\\[-5pt]}$ $\newcommand{\bobwork}[1]{ & & #1\\[-5pt]}$ $\newcommand{\work}[2]{#1 & & #2\\}$ $\newcommand{\allwork}[1]{ & #1 & \\}$ $\newcommand{\aliceseparator}{-------&&\\}$ $\newcommand{\bobseparator}{&&-------\\}$ $\newcommand{\foo}{\phantom{\text{bigarrowfitsallthis}}}$ $\newcommand{\ra}[1]{% \vphantom{\xrightarrow{asd}}% \smash{\xrightarrow[\foo]{#1}}% }$ $\newcommand{\la}[1]{% \vphantom{\xleftarrow{asd}}% \smash{\xleftarrow[\foo]{#1}}% }$ $\newcommand{\z}[1]{\mathbb{Z}_{#1}}$ $\newcommand{\zq}{\mathbb{Z}_\varq}$ $\newcommand{\zqs}{\mathbb{Z}_q^\ast}$ $\newcommand{\zns}[1]{\mathbb{Z}_{#1}^\ast}$ $\require{action} \newcommand{\sampleSymb}{ {\overset{\$}{\leftarrow}} }$ $\newcommand{\field}[1]{\mathbb{F}_{#1}}$ $\newcommand{\sample}[1]{#1\sampleSymb\zq}$ $\newcommand{\sampleGeneric}[2]{#1\sampleSymb#2}$ $\newcommand{\sampleInterval}[2]{#1\sampleSymb\interval{#2}}$ $\newcommand{\sampleRange}[2]{#1\sampleSymb\range{#2}}$ $\newcommand{\samplezqs}[1]{\class{hover}{#1\sampleSymb\zqs}}$ $\newcommand{\sampleN}[2]{\class{hover}{#1\sampleSymb\z{#2}}}$ $\newcommand{\sampleNs}[2]{\class{hover}{#1\sampleSymb\z{#2}^\ast}}$ $\newcommand{\equalQ}{\overset{?}{=}}$ $\newcommand{\gQ}{\overset{?}{>}}$ $\newcommand{\inQ}{\overset{?}{\in}}$ $\newcommand{\cgroup}{\mathbb{G}}$ $\newcommand{\hash}[1]{\mathsf{Hash}({#1})}$ $\newcommand{\hashbit}[2]{\mathsf{Hash}({#1})\verb+[0:#2]+}$ $\newcommand{\naturals}{\mathbb{N}}$ $\newcommand{\sqfree}{L_\mathsf{square-free}}$ $\newcommand{\ceil}[1]{\lceil #1 \rceil}$ $\newcommand{\sampleSet}[2]{\class{hover}{#1\sampleSymb#2}}$ $\newcommand{\bunch}[1]{\{ #1_i\}_{i=1}^m}$ $\newcommand{\bunchi}[1]{\{ #1\}_{i=1}^m}$ $\newcommand{\forb}{\text{ for }i=1,\ldots,m}$ $\newcommand{\interval}[1]{[0, #1[}$ $\newcommand{\range}[1]{[#1]}$ $\newcommand{\rangeone}[1]{\{1, \dots,#1 -1 \}}$ $\newcommand{\vara}{\class{var var_a}{a}}$ $\newcommand{\varb}{\class{var var_b}{b}}$ $\newcommand{\varc}{\class{var var_c}{c}}$ $\newcommand{\vard}{\class{var var_d}{d}}$ $\newcommand{\varh}{\class{var var_h}{h}}$ $\newcommand{\varg}{\class{var var_g}{g}}$ $\newcommand{\varu}{\class{var var_u}{u}}$ $\newcommand{\varx}{\class{var var_x}{x}}$ $\newcommand{\varX}{\class{var var_X}{X}}$ $\newcommand{\varz}{\class{var var_z}{z}}$ $\newcommand{\varr}{\class{var var_r}{r}}$ $\newcommand{\varq}{\class{var var_q}{q}}$ $\newcommand{\varp}{\class{var var_p}{p}}$ $\newcommand{\vare}{\class{var var_e}{e}}$ $\newcommand{\vary}{\class{var var_y}{y}}$ $\newcommand{\varw}{\class{var var_w}{w}}$ $\newcommand{\varprover}{\class{var var_Prover}{\text{Prover}}}$ $\newcommand{\varprover}{\class{var var_Prover}{\text{Prover}}}$ $\newcommand{\varverifier}{\class{var var_Verifier}{\text{Verifier}}}$ $\newcommand{\varN}{\class{var var_N}{N}}$ $\newcommand{\rhovar}{\class{var var_ρ}{\rho}}$ $\newcommand{\sigmavar}{\class{var var_σ}{\sigma}}$ $\newcommand{\thetavar}{\class{var var_θ}{\theta}}$ $\newcommand{\muvar}{\class{var var_μ}{\mu}}$ $\newcommand{\true}{\mathsf{true}}$ $\newcommand{\false}{\mathsf{false}}$
Shamir's Secret Sharing Scheme

Shamir’s Secret Sharing Scheme #

Shamir’s Secret Sharing Scheme is a way of splitting a secret value $S$ into $n$ “pieces” (or “shares”) in such a way that any combination of $k$ pieces can be used to recover $S$, but any $k-1$ or fewer pieces provide no information about $S$.

Overview of Secret Sharing #

Shamir’s Secret Sharing Scheme relies on a very useful property of polynomials: any degree-$k$ polynomial can be uniquely identified by any set of $k+1$ distinct points. Two points define a line; three points define a parabola. However, with fewer than $k+1$ distinct points, no further information can be gained about the polynomial.

This leads directly to Shamir’s approach: encode a secret $S$ into a polynomial over a finite field, and distribute points on the polynomial as shares.

Splitting $S$ #

Given a secret $S$, a number of desired shares $n$, and a threshold $k$, splitting $S$ comes down to two steps:

  • encode $S$ into a secret polynomial $f\left(x\right)$ of degree $k-1$ over a finite field $\field{p}$
  • generate distinct shares $\left(x_{i}, f(x_i)\right)$

Encoding $S$ in a Polynomial #

Shamir proposed a straightforward method for encoding secrets into polynomials: set the constant term of random polynomial to the secret $S$, and select all other coefficients uniformly at random. For a degree-$(k-1)$ polynomial, there would be $k-1$ random coefficients $r_i$, and the constant coefficient of $S$: $$f(x) = S + r_1 x + \ldots + r_{k-1} x^{k-1} \enspace.$$

Example: We want to share our secret number 42 with three players so that any two of them can recover it. We define the degree-1 polynomial over $\field{73}$ as $$f(x) = 42 + 13 \cdot x \enspace,$$ where 13 was randomly sampled over $\field{73}$. We evaluate the polynomial at different points obtaining the shares $(x_i, f(x_i))$. Then, we can share 1 point of the coefficient to the three players, each getting one of $(1, 55), (2, 68), (3, 8)$. Since the polynomial is a line, any two of them could meet and recover the secret value 42!

This choice has the advantage of relative simplicity; there is no need to use oddball sampling techniques to select the coefficients of $f\left(x\right)$.

A Note on Selecting $p$ #

In the general case, the specific prime $p$ does not matter much. Shamir’s original paper proposed using 16-bit primes (the largest of which would be $p=2^{16}-15=65521$) for performance reasons. By limiting intermediate results to 32 bits, multi-precision arithmetic routines could be avoided on any 32-bit processor. Large secrets could be broken into blocks of 16 bits or less and shared with distinct polynomials. One limitation imposed by such a small $p$ is that only about 65000 distinct shares can be generated.

In practice, $p$ should be reasonably large. Breaking $S$ into multiple parts introduces complexity and opportunities for malicious actors. Also, in some verifiable secret sharing schemes, a large $p$ is needed to prevent discrete log attacks.

Generating the Shares #

Shares $s_{1},s_{2},\ldots,s_{n}$ of $S$ are ordered pairs $\left(x_{i}, f\left(x_{i}\right)\right)$. The $x_{i}$ values can be picked in several ways, but a counter starting in 1 is the most common method.

Recovering $S$ #

The most common and direct method of recovering $S$ from the shares $s_{1},\ldots,s_{k}$ is to evaluate $f\left(0\right)=S$ using Lagrange interpolation. It allows computing $f\left(t\right)$ for any $t\in\field{p}$ using the formula below:

$$f\left(t\right) = \displaystyle\sum_{j=1}^{k}{y_{j}\left(\displaystyle\prod_{\begin{smallmatrix}1\leq m\leq k\\ m\neq j\end{smallmatrix}}{\frac{t-x_m}{x_j - x_m}}\right)}$$

The $t=0$ case slightly simplifies the product. Each product term can also be precomputed since it only depends on the $x$-coordinate of the shares that can be publicly known. $$f\left(0\right) = \displaystyle\sum_{j=1}^{k}{y_{j}\left(\displaystyle\prod_{\begin{smallmatrix}1\leq m\leq k\\ m\neq j\end{smallmatrix}}{\frac{x_m}{x_m - x_j}}\right)}$$

The Lagrange interpolating polynomial is popular because it’s fast enough for several use cases and relatively simple to implement.

The Zero Share Problem #

How Shares are Generated #

As mentioned above, there are several ways to select the $x_{i}$ values during the share-generation phase of the algorithm.

The most common approach, and the one we recommend, is to use a counter, setting $x_{i}=i$ for $i=1,2,\ldots,n$. This has the advantage of simplicity and speed. No user-generated inputs are necessary, and evaluating $f\left(x\right)$ when $x$ is small can be faster than for arbitrary-selected values in the field.

Another approach is to use unique values associated with shareholders, such as userid values from a database, or users' provided values.

Finally, some implementations select $x_{i}$ by selecting them randomly from $\field{p}$.

What Can Go Wrong? #

  • Zero share problem: If it is possible to wind up with $x_{i}=0$ for some $i$, the corresponding share is $\left(0, S\right)$, revealing $S$ directly to the $i$-th shareholder. This is not a hypothetical attack; several instances of this problem have been spotted in the wild. Implementations must guarantee that all $x_i$ are modularly nonzero.
  • Non-unique shares: The $x_i$ for each value must be modularly unique; when users reconstruct the secret from a share, they need to compute $\frac{x_m}{x_m-x_j}$. If the polynomial is evaluated modulo $q$ and if $x_m\equiv x_j \pmod{q}$, the modular inverse of $(x_m-x_j)$ does not exist and the protocol does not work. Some applications do not check if this modular inverse exists and usually do not handle this gracefully.

How Things Go Wrong #

  • Counter-based shares: One of the most common control structures in programming is a loop with a starting index of i = 0. The line for(int i = 0; i < n; i++) is near-idiomatic in C; for i in range(n): is an idiom in Python. It’s easy to fall into an old pattern and fail to update these to the correct for(int i = 1; i <= n; i++) and for i in range(1, n + 1):, respectively. An initial draft of this page included this very error in the “correct” C loop! This common error is enough to destroy the security of the system entirely.

  • Userid-based shares: Trusting unique user identifiers is problematic, too. A user can have a userid of 0 in plenty of systems. Trusting user-supplied values in security-critical contexts is never a good idea. Even if you compare the userid with 0, you might fail to do it correctly: since the polynomial evaluates over the finite field $\field{p}$, you need to check if it is 0 in $\field{p}$. In the case of the integers modulo a prime number, you must check that the userid is modularly different from zero. Even transforming user inputs can be problematic if you’re not careful. For instance, given a user input $n$, a program may try to avoid this issue by taking the $x$-coordinate of $n\cdot G$, where $G$ is a generator of an elliptic curve group. However, if $n$ is 0, the resulting point is the “point at infinity”. With some curve parameterizations, the point at infinity is represented as $\left(0,1\right)$.

  • Random shares: Selecting random values seems safe– in theory, the probability of selecting 0 from $\field{p}$ is $\frac{1}{p}$. But winding up with a zero share is significantly more likely than that. Consider the case where a program generates random numbers by reading from /dev/random. For Linux kernels before version 5.6, it is possible for /dev/random to block when the “entropy runs out”. Since Linux 5.6, /dev/random can still block if the PRNG has not been initialized. If a programmer does not check the return value of the read function, then depending on how the destination buffer is allocated, the destination buffer can be left in a zeroed state, leading to a zero share.

References #