I'm reading an input stream from which I can parse a list of objects. On the other hand I have a method that accepts a list of such objects.

Something like:

fun InputStream.parser(): List<Object> {
    val allBytes = this.readAllBytes()

    val result = mutableListOf<Object>()
    var currentIndex = 0
    for (i in allBytes.indices) {
        if (allBytes[i] == '\n'.toByte()) {
            result.add(myParser.parse(allBytes, currentIndex, i))
            currentIndex = i + 1
        }
    }
    return result
}

fun consumer(input: List<Object>) {
    for (obj in input) {
        // Do something with it
    }
}

As an example, parser could be reading from an ndjson file, and consumer could be sending one property of each object over the internet.

This implementation is not good: it requires loading the whole file into memory, then loading the whole list of objects into memory.

How could I make this without the overhead?

I guess something like Stream.generate where each generation is a new object, but I'm not sure how to close it, since Stream.generate is meant for infinite streams.

I'm guessing it shouldn't be an Iterable, since iterables are expected to be iterable several times. That is not the case here: once consumed, the data is lost.

Note: my actual code does not rely on lines. ndjson is just an example, so I'm looking for code that doesn't use BufferedReader.readLines.

Asked Jan 30 at 8:51 by Gabriel Furstenheim; edited Feb 1 at 18:24.

Comment (k314159, Jan 30 at 17:16): I know you're not using lines, but just to avoid misleading anyone reading this: in the above example, "\n".toByte() doesn't do what was meant. String.toByte() works like "123".toByte() == 123.toByte(). What was meant was '\n'.toByte().

1 Answer

You can create a Sequence, which is basically a lazy list.

It seems like the input stream contains text, and you want to read lines in the stream, so I would suggest operating on a BufferedReader instead.

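// myParser is the parser object from the question; a parse(line, index)
// overload is assumed here.
fun BufferedReader.parser() = sequence {
    for ((currentIndex, line) in lineSequence().withIndex()) {
        // Execution suspends at yield until the consumer asks for the next element.
        yield(myParser.parse(line, currentIndex))
    }
}

// this can also be simplified to

fun BufferedReader.parser() = lineSequence()
    .mapIndexed { currentIndex, line -> myParser.parse(line, currentIndex) }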

The code inside the sequence { ... } lambda runs only when a new element is requested, and only until the next call to yield. The consumer can then take a Sequence<Object> and consume it, for example with a for loop.
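To illustrate the consumer side, here is a minimal sketch. The Record type, parsedRecords and consume are hypothetical names standing in for the question's object type, parser and consumer, which are not shown in full.

```kotlin
import java.io.BufferedReader
import java.io.StringReader

// Hypothetical stand-in for the parsed type; the real type is not shown in the question.
data class Record(val index: Int, val text: String)

// Lazily maps each line to a Record; no line is read until the sequence is iterated.
fun BufferedReader.parsedRecords(): Sequence<Record> =
    lineSequence().mapIndexed { index, line -> Record(index, line) }

// The consumer takes a Sequence instead of a List and pulls one element at a time.
fun consume(input: Sequence<Record>): List<String> {
    val sent = mutableListOf<String>()
    for (record in input) {
        sent.add(record.text) // placeholder for sending one property over the network
    }
    return sent
}

fun main() {
    val reader = BufferedReader(StringReader("a\nb\nc"))
    println(consume(reader.parsedRecords()))
}
```

Only one Record exists at a time inside the loop; the list built in consume is there just to make the result observable.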

Note that this sequence can only be consumed once, and you should make sure that you do not consume it after the buffered reader has been closed.

You can easily get a BufferedReader from an InputStream using .bufferedReader().
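One way to respect the "do not consume after close" rule is to do all the consumption inside a use block. The sketch below assumes a hypothetical helper named useParsedLines; the standard library's Reader.useLines follows the same pattern for plain lines.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream

// Opens a reader on the stream, hands the lazy line sequence to `block`,
// and closes the reader once `block` returns or throws.
// useParsedLines is a hypothetical helper name, not a standard library function.
fun <T> InputStream.useParsedLines(block: (Sequence<String>) -> T): T =
    bufferedReader().use { reader -> block(reader.lineSequence()) }

fun main() {
    val input: InputStream = ByteArrayInputStream("x\ny\n".toByteArray())
    // All consumption happens inside the lambda, while the reader is still open.
    val lengths = input.useParsedLines { lines -> lines.map { it.length }.toList() }
    println(lengths)
}
```

The sequence itself must not escape the lambda: return a computed result, not the lazy sequence.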

If my assumptions are wrong and you need an InputStream and not a Reader, the same principle applies. Write a loop that only reads as much data as you need in each iteration, and yield the parse result.
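As a rough sketch of that loop on a raw InputStream: the example below assumes records are separated by a single delimiter byte and yields each record as a ByteArray. Both choices, and the name delimitedRecords, are assumptions for illustration; a real format would replace the inner loop with its own framing logic and pass the bytes to the parser.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Lazily splits the stream into records separated by `delimiter`.
// Only the bytes of the current record are held in memory.
fun InputStream.delimitedRecords(delimiter: Byte = '\n'.code.toByte()): Sequence<ByteArray> =
    sequence {
        val input = this@delimitedRecords.buffered()
        val current = ByteArrayOutputStream()
        while (true) {
            val b = input.read()
            if (b == -1) break // end of stream
            if (b.toByte() == delimiter) {
                yield(current.toByteArray()) // hand one record to the consumer
                current.reset()
            } else {
                current.write(b)
            }
        }
        // Emit a final record that is not terminated by a delimiter.
        if (current.size() > 0) yield(current.toByteArray())
    }.constrainOnce() // a second iteration throws instead of silently yielding nothing

fun main() {
    val stream = ByteArrayInputStream("a,bb,ccc".toByteArray())
    val sizes = stream.delimitedRecords(','.code.toByte()).map { it.size }.toList()
    println(sizes)
}
```

constrainOnce makes the single-use nature explicit, which addresses the concern about iterables being expected to be iterable several times.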
