List operations 201

Well well, last week we introduced Scala Lists and some of their operations. Today will continue to dive in and discover some more advanced operation. But first let’s recall the fundamentals:

val numbers = List(1,2,3,4,5)
numbers.map(a => a * a)          // --> List(1,4,9,16)
numbers.filter(a => a * a > 10)  // --> List(16)
List("ciao", "hello", "hola").map(_.toList)
// --> List(List('c', 'i', 'a', 'o'), List('h', 'e', 'l', 'l', 'o'), List('h', 'o', 'l', 'a'))
List("ciao", "hello", "hola").flatMap(_.toList) // note what happens with flatMap
// --> List('c', 'i', 'a', 'o', 'h', 'e', 'l', 'l', 'o', 'h', 'o', 'l', 'a')

Want more? take a look at these:

numbers.grouped(3)      // --> Iterator(List(1,2,3), List(4,5)). note: it's an Iterator
numbers.take(3)         // --> List(1,2,3)
numbers.drop(3)         // --> List(4,5)
// takeWhile and dropWhile take (or drop) items up to the first for which the predicate is true
numbers.takeWhile(x => x*x*x < 20 )  // --> List(1,2)
numbers.dropWhile(isPrime)           // --> List(4,5), suppose to have an isPrime function

Very good, now we can explore more powerful stuff. Fold, scan and reduce. The main idea here is similar to the map function, but:

List(1, 2, 3).map(f)        /* is equivalent to */  List( f(1), f(2), f(3) )
List(1, 2, 3).reduce(f)     /* is equivalent to */  f( f(1, 2), 3)
List(1, 2, 3).fold(100)(f)  /* is equivalent to */  f( f( f(100,1), 2), 3)
List(1, 2, 3).scan(100)(f)  // is equivalent a map to the fold's intermediate results
// --> List( 100, f(100, 1), f( f(100,1), 2),  f( f( f(100,1), 2), 3))

This can sound too abstract to be understood, but let’s visualize the process.

//  List          Map(f)       Reduce(f)                Fold(100)(f)
//   ::           ::             f          __f__            f      
//  /  \         /  \           / \    =>  /     \         /   \    // fold is like reduce  
// 1   ::      f(1) ::         f   3     f(1,2)   3       f     3   // with and explicit
//    /  \         /  \       / \                        / \        // initial value, while
//   2   ::      f(2) ::     1   2                      f   2       // reduce uses list's head
//      /  \         /  \                              / \
//     3   Nil     f(3) Nil                          100  1
//

Some examples:

numbers.reduce((a,b) => a * b)   // --> 120
numbers.fold(3)((a,b) => a * b)  // --> 360
numbers.scan(3)((a,b) => a * b)  // --> List(3, 3, 6, 18, 72, 360)

Let us try to solve a real problem using just list operations. The problem is as follows.

We have a list of pairs. Each pair is composed of a person and a fixed number of file paths. These files need to be backed up in pen drives with 1Gb of capacity. We want to group together all the files that should go in each pendrive, taking care of not splitting a person’s files into different pen drives. In Java(or C#, or others) we would use one or more loops and a mutable Collection to save the intermediate results, but we are too cool for that. For simplicity suppose each person have no more than 1Gb of data to backup.

import java.nio.file.Files
import java.nio.file.Paths

val limitSize = 1024
def getFilesFor( name : String ): List[String] = {...} // suppose to have a working implementation
def getFileSize( path : String ): List[Long] = Files.size(Paths.get(path))

val users = List("Mario", "John", "Paco", "Ray")

// let's get the files for each user
val filesList: List[List[String]] = users.map(getFilesFor)

// calculate the total size per person
val sizes = 
  filesList.map{ files =>
    files
      .map(getFileSize)
      .reduce((a,b) => a + b) // this can be substituted with a call to .sum
  }

val labels = 
  sizes
    .scan (0) ((a,b) => a + b )   // --> list of the intermediate sums
    .tail                         // drop the initial value (not part of sizes)
    .map(size => size/limitSize)  

// the last line works as such. Before the .map the list would be something like
// List(2, 5, 8, 14, 20, 31, 37). if the limit size is 12 we will have:
// List(0, 0, 0, 1, 1, 2, 2 ) which become file group identifiers, i.e. three groups 0, 1, 2

val groupedUsersAndFiles = 
  users
    .zip(filesList) // --> a List of (UserName, File list)
    .zip(labels)    // --> a List of (UserName, File list), Label)
    .groupBy(_._2)  // --> group by Label --> Map ( Label -> (List of (UserName, File list))
    .mapValues(_)   // --> get values

This can seem an overly complex solution, but let’s get rid of comments and unnecessary new lines.

// skipping imports, function definitions and users list declaration

val filesList: List[List[String]] = users.map(getFilesFor)
val sizes = filesList.map{ files => files.map(getFileSize).sum }
val labels = sizes.scan(0)(_+_).tail.map(_/limitSize)
users.zip(filesList).zip(labels).groupBy(_._2).mapValues(_)

Just one note, zip is usually more understandable if seen as an operator rather than a function call, since in Scala we can use function calls as infix operators I would perfer the form

((users zip filesList) zip labels) /* over users.zip(filesList).zip(labels)

Wow it was a pretty long post, hope you didn’t fall asleep while reading. As a final word I invite you to write the following line on the scala REPL (sbt console) and see what it evaluates to.

List("Hope", "to", "see", "you", "next", "time").mkString(" ")

Ciao

Back