On the Locality of Codeword Symbols
IEEE Transactions on Information Theory, November 2012

Consider a linear [ n , k , d ] q  code  C . We say that the  i th coordinate of  C  has  locality   r  , if the value at this coordinate can be recovered from accessing some other  r  coordinates of  C . Data storage applications require codes with small redundancy, low  locality  for information coordinates, large distance, and low  locality  for parity coordinates. In this paper, we carry out an in-depth study of the relations between these parameters. We establish a tight bound for the redundancy  n - k  in terms of the message length, the distance, and the  locality  of information coordinates. We refer to codes attaining the bound as optimal. We prove some structure theorems about optimal codes, which are particularly strong for small distances. This gives a fairly complete picture of the tradeoffs between codewords length, worst case distance, and  locality  of information  symbols . We then consider the locality  of parity check  symbols  and erasure correction beyond worst case distance for optimal codes. Using our structure theorem, we obtain a tight bound for the  locality  of parity  symbols possible in such codes for a broad class of parameter settings. We prove that there is a tradeoff between having good  locality  and the ability to correct erasures beyond the minimum distance.