Text Source: in depth
This page describes the ITextSource interface, how it is used, and how you can implement your own text source.
A text source object is an abstraction over any object that is text, can provide text or can be converted to text.
Examples of text sources:
- Is text: System.String objects.
- Can provide text: System.IO.TextReader objects.
- Can be converted to text: System.Byte[] objects.
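The three kinds of source listed above can be illustrated with plain .NET types. This is only a sketch: the TextSourceKinds class and its method names are hypothetical and not part of Txt, and UTF-8 is an assumed encoding for the byte array case.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Txt: shows how each kind of source yields text.
public static class TextSourceKinds
{
    // Is text: a string can be used as-is.
    public static string FromString(string value) => value;

    // Can provide text: a TextReader hands out characters on demand.
    public static string FromReader(TextReader reader) => reader.ReadToEnd();

    // Can be converted to text: bytes need an encoding (UTF-8 is assumed here).
    public static string FromBytes(byte[] bytes) => Encoding.UTF8.GetString(bytes);
}
```

For example, TextSourceKinds.FromBytes(new byte[] { 0x61, 0x62, 0x63 }) yields the same text as TextSourceKinds.FromReader(new StringReader("abc")).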
The ITextSource interface is declared as follows:
public interface ITextSource : IDisposable
{
    int Peek();
    int Read();
    int Read([NotNull] char[] buffer, int offset, int count);
    Task<int> ReadAsync([NotNull] char[] buffer, int offset, int count);
    int ReadBlock([NotNull] char[] buffer, int offset, int count);
    Task<int> ReadBlockAsync([NotNull] char[] buffer, int offset, int count);
    void Unread(char c);
    void Unread([NotNull] char[] buffer, int offset, int count);
    Task UnreadAsync([NotNull] char[] buffer, int offset, int count);
}
You don't have to implement all of those methods yourself. Txt contains an abstract class, Txt.Core.TextSource, that takes care of a lot of the boilerplate code.
Txt does not yet contain a TextSource that wraps System.IO.TextReader. Let's create one now!
Create a class that extends Txt.Core.TextSource.
public class TextReaderAdapter : TextSource
{
    protected override int PeekImpl()
    {
        throw new NotImplementedException();
    }

    protected override int ReadImpl()
    {
        throw new NotImplementedException();
    }

    protected override int ReadImpl(char[] buffer, int offset, int count)
    {
        throw new NotImplementedException();
    }

    protected override void UnreadImpl(char c)
    {
        throw new NotImplementedException();
    }

    protected override void UnreadImpl(char[] buffer, int offset, int count)
    {
        throw new NotImplementedException();
    }
}
Declare a field of type System.IO.TextReader that represents the underlying text source. Then initialize it from the constructor.
private readonly TextReader text;

public TextReaderAdapter(TextReader text)
{
    this.text = text;
}
Don't forget to dispose it!
protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        text.Dispose();
    }
    base.Dispose(disposing);
}
Add an implementation for the Peek() method by overriding PeekImpl(). We can simply delegate this method to the text field: System.IO.TextReader.Peek() is a compatible method.
protected override int PeekImpl()
{
    return text.Peek();
}
Add an implementation for the Read() method by overriding ReadImpl(). Same as before, delegate this method to the text field.
protected override int ReadImpl()
{
    return text.Read();
}
Add an implementation for the Read(char[], int, int) method by overriding ReadImpl(char[], int, int).
protected override int ReadImpl(char[] buffer, int offset, int count)
{
    return text.Read(buffer, offset, count);
}
This is where it gets complicated. Grammars that allow partial matches call the Unread method a lot: when a match fails partway through, the characters that were already consumed need to be "pushed" back to the text source.
Unfortunately, the System.IO.TextReader class does not offer a way to "unread" characters, so we have to add our own mechanism. The simplest way is to add a stack of System.Char that acts as a pushback buffer.
private readonly Stack<char> pushback = new Stack<char>();

protected override void UnreadImpl(char c)
{
    pushback.Push(c);
}
Now update the PeekImpl() and ReadImpl() methods to read from the pushback buffer before the underlying text source.
protected override int PeekImpl()
{
    if (pushback.Count != 0)
    {
        return pushback.Peek();
    }
    return text.Peek();
}

protected override int ReadImpl()
{
    if (pushback.Count != 0)
    {
        return pushback.Pop();
    }
    return text.Read();
}

protected override int ReadImpl(char[] buffer, int offset, int count)
{
    // Drain the pushback buffer first, then read the remainder from the reader.
    var read = 0;
    while ((pushback.Count != 0) && (read < count))
    {
        buffer[offset + read] = pushback.Pop();
        read++;
    }
    return read + text.Read(buffer, offset + read, count - read);
}
This is hardly the most efficient way to implement two-way text scanning functions, but you can't argue with its simplicity!
Add an implementation that unreads multiple characters at once by overriding the UnreadImpl(char[], int, int) method. This method is used to unread partial matches of more than one character.
protected override void UnreadImpl(char[] buffer, int offset, int count)
{
    for (var i = offset + count - 1; i >= offset; i--)
    {
        pushback.Push(buffer[i]);
    }
}
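The loop runs in reverse so that characters are pushed back in the opposite order from how they were read. That leaves the first character of the buffer on top of the stack, so it is the next character to be read.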
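The effect of the loop direction is easy to check with a bare Stack<char>. The following sketch is independent of Txt; the PushbackOrderDemo class and its method names are hypothetical and exist only for illustration.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical demo class, not part of Txt.
public static class PushbackOrderDemo
{
    // Pushes the range back to front (as UnreadImpl does), then drains the stack.
    public static string ReversePush(char[] buffer, int offset, int count)
    {
        var pushback = new Stack<char>();
        for (var i = offset + count - 1; i >= offset; i--)
        {
            pushback.Push(buffer[i]);
        }
        return Drain(pushback);
    }

    // Pushes the range front to back, for comparison only.
    public static string ForwardPush(char[] buffer, int offset, int count)
    {
        var pushback = new Stack<char>();
        for (var i = offset; i < offset + count; i++)
        {
            pushback.Push(buffer[i]);
        }
        return Drain(pushback);
    }

    // Pops every character, in the order a reader would see them.
    private static string Drain(Stack<char> stack)
    {
        var result = new StringBuilder();
        while (stack.Count != 0)
        {
            result.Append(stack.Pop());
        }
        return result.ToString();
    }
}
```

With the buffer "abc", ReversePush hands the characters back as "abc", while ForwardPush would hand them back as "cba".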
Here is the result after putting all the pieces together:
using System.Collections.Generic;
using System.IO;
using Txt.Core;

public class TextReaderAdapter : TextSource
{
    // Characters that were unread and must be returned before reading further.
    private readonly Stack<char> pushback = new Stack<char>();

    // The underlying text source.
    private readonly TextReader text;

    public TextReaderAdapter(TextReader text)
    {
        this.text = text;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            text.Dispose();
        }
        base.Dispose(disposing);
    }

    protected override int PeekImpl()
    {
        if (pushback.Count != 0)
        {
            return pushback.Peek();
        }
        return text.Peek();
    }

    protected override int ReadImpl()
    {
        if (pushback.Count != 0)
        {
            return pushback.Pop();
        }
        return text.Read();
    }

    protected override int ReadImpl(char[] buffer, int offset, int count)
    {
        // Drain the pushback buffer first, then read the remainder from the reader.
        var read = 0;
        while ((pushback.Count != 0) && (read < count))
        {
            buffer[offset + read] = pushback.Pop();
            read++;
        }
        return read + text.Read(buffer, offset + read, count - read);
    }

    protected override void UnreadImpl(char c)
    {
        pushback.Push(c);
    }

    protected override void UnreadImpl(char[] buffer, int offset, int count)
    {
        // Push in reverse so that buffer[offset] ends up on top of the stack.
        for (var i = offset + count - 1; i >= offset; i--)
        {
            pushback.Push(buffer[i]);
        }
    }
}
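To try the pushback logic without referencing Txt, the same idea can be exercised in a standalone class. This is a sketch under assumptions: PushbackTextReader and PushbackTextReaderDemo are hypothetical names, and the class copies the adapter's logic without deriving from Txt.Core.TextSource, so it says nothing about what the base class adds.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in: same pushback logic as TextReaderAdapter,
// but without the Txt.Core.TextSource base class.
public sealed class PushbackTextReader : IDisposable
{
    private readonly Stack<char> pushback = new Stack<char>();
    private readonly TextReader text;

    public PushbackTextReader(TextReader text)
    {
        this.text = text ?? throw new ArgumentNullException(nameof(text));
    }

    public int Peek() => pushback.Count != 0 ? pushback.Peek() : text.Peek();

    public int Read() => pushback.Count != 0 ? pushback.Pop() : text.Read();

    public int Read(char[] buffer, int offset, int count)
    {
        // Drain the pushback buffer first, then read the remainder from the reader.
        var read = 0;
        while (pushback.Count != 0 && read < count)
        {
            buffer[offset + read] = pushback.Pop();
            read++;
        }
        return read + text.Read(buffer, offset + read, count - read);
    }

    public void Unread(char c) => pushback.Push(c);

    public void Unread(char[] buffer, int offset, int count)
    {
        // Push in reverse so that buffer[offset] is the next character read.
        for (var i = offset + count - 1; i >= offset; i--)
        {
            pushback.Push(buffer[i]);
        }
    }

    public void Dispose() => text.Dispose();
}

public static class PushbackTextReaderDemo
{
    // Reads a partial match, unreads it, then reads the whole input again.
    public static string Run()
    {
        using (var source = new PushbackTextReader(new StringReader("hello")))
        {
            var partial = new char[3];
            source.Read(partial, 0, 3);      // consumes "hel"
            source.Unread(partial, 0, 3);    // pushes "hel" back
            var all = new char[5];
            var n = source.Read(all, 0, 5);  // pushback first, then the reader
            return new string(all, 0, n);
        }
    }
}
```

PushbackTextReaderDemo.Run() returns "hello": the three unread characters come back in their original order, followed by the two characters that were still in the reader.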