
A simple proof of Shapley–Folkman

Posted 2024-06-09

This post goes over a simple proof of the Shapley–Folkman lemma which I haven't seen published (though I'm sure everyone has a version or variation of this sitting around!). It is (roughly) a combination of Zhao's proof (which uses a simple result about conic combinations, but has a funky construction) and Cassel's proof, which, while nice, unfortunately has very annoying notation. Specifically, this post proves the "slightly more general" statement given in Cassel's proof, which is a little nicer to handle, via a technique that is similar to Zhao's, but uses only a basic fact about linear independence.

Anyways, if you're reading this I probably don't have to convince you that this lemma is very useful, but, if you're unsure, I'd recommend just Googling "Shapley–Folkman [your field of choice]" and you'll probably get a hit or two that might be interesting.

Statement

The statement of Shapley–Folkman is a little funny if you're unfamiliar with it, but it's the following.

We have a collection of sets $S_1, \dots, S_m \subseteq \mathbf{R}^n$, which need not be convex. Let $y \in \mathbf{R}^n$ be a vector in the sum of the convex hulls of these sets; i.e., one which satisfies

$$ y = x_1 + x_2 + \dots + x_m, $$

where $x_i \in \mathbf{conv}(S_i)$. Then $y$ can be written in the following way:

$$ y = \tilde x_1 + \tilde x_2 + \dots + \tilde x_m, $$

where $\tilde x_i \in \mathbf{conv}(S_i)$ and, importantly, $\tilde x_i \in S_i$ for at least $m-n$ indices $i$. In other words, given any point $y$ which is the sum of points lying in the convex hulls $\mathbf{conv}(S_i)$, we can write $y$ as the sum of points lying in the actual sets $S_i$ for 'most' indices $i$; for no more than $n$ indices does $\tilde x_i$ lie in $\mathbf{conv}(S_i)$ but not in $S_i$.

One simple interpretation of this statement is that, when there are many sets relative to the number of dimensions (when $m \gg n$), the resulting set, which is the (Minkowski) sum of the sets, is 'very close' to a convex set.
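As a quick sanity check of the statement, consider the simplest interesting case: $n = 1$ and $S_i = \{0, 1\}$ for every $i = 1, \dots, m$, so that $\mathbf{conv}(S_i) = [0, 1]$ and the sum of the hulls is the interval $[0, m]$. Any $y \in [0, m]$ can be written by setting $\lfloor y \rfloor$ of the $\tilde x_i$ equal to $1$, one of them equal to the fractional part $y - \lfloor y \rfloor$, and the rest equal to $0$; at most one term (the fractional one) fails to lie in $\{0, 1\}$, matching the bound of $n = 1$ indices.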

Proof

The proof requires a little bit of extra setup, but not too much.

Statement variation

We will show the proof in an inductive way, with a slightly different setup than the one above. In our setup, there exist some sets $S_1, \dots, S_m \subseteq \mathbf{R}^n$ and some point $y \in \mathbf{R}^n$ such that

$$ y = x_1 + \dots + x_m, $$

and each $x_i \in \mathbf{conv}(S_i)$. Using the definition of the convex hull, this is the same as saying that, for each set $i=1, \dots, m$, there exist $z_{ij} \in S_i$ and weights $\gamma_{ij} > 0$, with $j=1, \dots, n_i$, such that

$$ x_i = \sum_{j=1}^{n_i} \gamma_{ij} z_{ij}, $$

and the weights sum to $1$:

$$ \sum_{j=1}^{n_i} \gamma_{ij} = 1, $$

for each $i=1, \dots, m$. In this case, $n_i$ denotes the number of elements of $S_i$ whose convex combination results in $x_i$, all with nonzero coefficients. (This is why we require that $\gamma_{ij} > 0$; otherwise, if $\gamma_{ij} = 0$ for some index $j$, we can remove this entry and reduce $n_i$ by 1.) We will show that the following inequality can be made to hold:

$$ \sum_{i=1}^m \left(n_i - 1\right) \le n. $$

This implies the original claim: if $n_i = 1$, then $x_i = z_{i1} \in S_i$, so the sum above is an upper bound on the number of sets which have $n_i > 1$; i.e., at most $n$ indices $i$ have $n_i > 1$, or, equivalently, at least $m-n$ indices satisfy $n_i = 1$ and therefore have $x_i \in S_i$.
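For instance, with $n = 2$ and $m = 5$, the inequality allows counts like $(n_1, \dots, n_5) = (3, 1, 1, 1, 1)$ or $(2, 2, 1, 1, 1)$; in either case at least $m - n = 3$ of the $x_i$ are themselves points of their sets $S_i$.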

Proof

Given that setup, we will show that, if

$$ \sum_{i=1}^m \left(n_i - 1\right) > n, $$

then we can always set at least one of the weights $\gamma_{ij}$ to $0$ (potentially changing some other weights along the way) such that the left-hand side decreases by at least 1. Applying this statement inductively gives us the result.

So, let's get to it!

The one trick in this proof is to note that we can choose a 'privileged' index $j$, which, in our case, we will just take to be the first index. We'll write $x_i$, for each $i$, splitting out the first term:

$$ x_i = \gamma_{i1} z_{i1} + \sum_{j=2}^{n_i} \gamma_{ij} z_{ij}. $$

We interpret the sum on the right as zero if $n_i = 1$. (Note that, by definition, $n_i$ cannot be zero! So this expression is always well-defined.)

We can then write $y$ as

$$ y = \sum_{i=1}^m \gamma_{i1} z_{i1} + \sum_{i=1}^m \sum_{j=2}^{n_i} \gamma_{ij} z_{ij}. $$

Now, if $\sum_{i=1}^m (n_i - 1) > n$, then there are at least $n+1$ terms in the second sum, and hence at least $n+1$ difference vectors $z_{ij} - z_{i1}$, while $n$ is the dimension of the space. These vectors are therefore linearly dependent: there exist weights $\alpha_{ij} \in \mathbf{R}$, not all zero, such that

$$ \sum_{i=1}^m \sum_{j=2}^{n_i} \alpha_{ij} (z_{ij} - z_{i1}) = 0, $$

where $i=1, \dots, m$ and $j=2, \dots, n_i$. (Very importantly, note that the $\alpha_{ij}$ need not be nonnegative!) Because this expression is zero, we can multiply it by any constant $\eta \in \mathbf{R}$ and add it to our expression for $y$ to get, for any choice of $\eta$:

$$ y = \sum_{i=1}^m \left(\gamma_{i1} - \eta \sum_{j=2}^{n_i} \alpha_{ij}\right) z_{i1} + \sum_{i=1}^m \sum_{j=2}^{n_i} (\gamma_{ij} + \eta \alpha_{ij}) z_{ij}. $$

There exists at least one $\eta$ such that at least one of the terms

$$ \gamma_{i1} - \eta \sum_{j=2}^{n_i} \alpha_{ij}, \quad \text{or} \quad \gamma_{ij} + \eta \alpha_{ij}, $$

where $i=1, \dots, m$ and $j=2, \dots, n_i$, is equal to zero; such an $\eta$ exists because the $\alpha_{ij}$ are not all zero. The smallest (in absolute value) such $\eta$ will ensure that all of the terms are nonnegative and at least one is zero. (Why?) For this $\eta$, define

$$ \tilde \gamma_{i1} = \gamma_{i1} - \eta \sum_{j=2}^{n_i} \alpha_{ij}, \quad \text{and} \quad \tilde \gamma_{ij} = \gamma_{ij} + \eta \alpha_{ij}, $$

for $i=1, \dots, m$ and $j=2, \dots, n_i$. From the definition of $\eta$ and the discussion above, we know that the $\tilde \gamma_{ij}$ are nonnegative, that at least one of them is zero, and that they satisfy

$$ \sum_{j=1}^{n_i} \tilde \gamma_{ij} = \sum_{j=1}^{n_i} \gamma_{ij} = 1, $$

for each $i=1, \dots, m$, which is easy to show from the definition (the $\eta$ terms cancel in the sum over $j$). Removing the entries that are now zero, we reduce at least one $n_i$ by one, proving the claim.
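As one way of making the choice of $\eta$ explicit (this is just spelling out the step above), note that a term can only become zero at one of the finitely many values

$$ \eta \in \left\{ -\frac{\gamma_{ij}}{\alpha_{ij}} : \alpha_{ij} \ne 0 \right\} \cup \left\{ \frac{\gamma_{i1}}{\sum_{j=2}^{n_i} \alpha_{ij}} : \sum_{j=2}^{n_i} \alpha_{ij} \ne 0 \right\}, $$

and this set is nonempty since the $\alpha_{ij}$ are not all zero. Taking the candidate of smallest absolute value works because each term is an affine function of $\eta$ that is positive at $\eta = 0$, so no term can become negative before the nearest zero crossing.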

Discussion

Interestingly, this procedure is essentially constructive: we only need the ability to solve a linear system to perform it, but the 'algorithm' provided will be very slow. (I say "essentially" here, since there's a hidden cost: we also need to be able to write an explicit convex combination of points in $S_i$ that yields $x_i$; it is not obvious how to do that in general if the set $S_i$ does not admit a simple polyhedral description, but it is fairly 'easy' for many structured sets in practice.)
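To make the 'essentially constructive' remark concrete, here is a minimal sketch of one reduction step in Python. It assumes numpy and scipy are available; the function name `reduce_once`, and the representation of each $x_i$ as a pair of a point matrix and a weight vector, are just illustrative choices rather than anything fixed by the proof.

```python
# A rough sketch of one reduction step, assuming numpy and scipy are
# available. The data layout (each x_i represented by a matrix Z_i of
# points and a vector g_i of weights) and the name `reduce_once` are
# illustrative choices, not part of the original proof.
import numpy as np
from scipy.linalg import null_space

def reduce_once(reps, tol=1e-12):
    """One step of the proof: if sum_i (n_i - 1) > n, drive at least one
    weight to zero while keeping y = sum_i Z_i @ g_i fixed.

    `reps` is a list of pairs (Z_i, g_i), where Z_i is an (n, n_i) array
    whose columns are points z_{i1}, ..., z_{i n_i} of S_i, and g_i is a
    length-n_i array of positive weights summing to one.
    """
    n = reps[0][0].shape[0]
    if sum(Z.shape[1] - 1 for Z, _ in reps) <= n:
        return reps  # the bound already holds; nothing to do

    # Stack the difference vectors z_{ij} - z_{i1}, j >= 2, as columns.
    # There are more than n of them, so they are linearly dependent.
    blocks = [(k, Z, g) for k, (Z, g) in enumerate(reps) if Z.shape[1] > 1]
    D = np.hstack([Z[:, 1:] - Z[:, [0]] for _, Z, _ in blocks])
    alpha = null_space(D)[:, 0]  # a nonzero vector with D @ alpha = 0

    # Split alpha into per-set pieces and collect the candidate values of
    # eta at which some updated weight hits zero.
    pieces, candidates, start = [], [], 0
    for k, Z, g in blocks:
        a = alpha[start:start + Z.shape[1] - 1]
        start += Z.shape[1] - 1
        pieces.append((k, a))
        candidates.extend(-g[j + 1] / a[j] for j in range(len(a)) if a[j] != 0)
        if a.sum() != 0:
            candidates.append(g[0] / a.sum())
    eta = min(candidates, key=abs)  # smallest in absolute value

    # Update the weights: gamma_{i1} loses eta * sum_j alpha_{ij}, the
    # others gain eta * alpha_{ij}; then drop the entries that became zero.
    new_reps = [(np.asarray(Z, float), np.array(g, float)) for Z, g in reps]
    for k, a in pieces:
        Z, g = new_reps[k]
        g[0] -= eta * a.sum()
        g[1:] += eta * a
        keep = g > tol
        new_reps[k] = (Z[:, keep], g[keep])
    return new_reps
```

Calling `reduce_once` repeatedly until $\sum_i (n_i - 1) \le n$ holds mirrors the induction; each call costs one null-space computation, which is the linear system mentioned above.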
