I'm attempting to index some very large files (10 GB+). Is that possible? What do I need to tweak in order to do that?
I've thoroughly read through https://github.com/oracle/opengrok/wiki/Tuning-for-large-code-bases, but it appears to be aimed at code bases with a large number of files rather than code bases with large files. Perhaps the "large code bases" wording could be clarified to state whether it means "code bases with many files" or "code bases with large files"?
It looks like I will need:
some option to avoid OutOfMemoryErrors when a large file shows up in search results. Is there a way to display just the match and not the file contents?
some way to index files that are many GB in size. Is the only option today to break each file into chunks of some fixed size X?
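If chunking really is the only option, something like the following could automate it. This is a minimal sketch, not part of OpenGrok: it splits a text file into numbered parts at line boundaries, and the 1 GiB default chunk size is an arbitrary example, not a known OpenGrok limit.

```python
import os

def split_file(path, max_bytes=1 << 30, out_dir=None):
    """Split a text file into parts of at most max_bytes, breaking at line
    boundaries so no line is cut in half. A single line longer than max_bytes
    still gets a part of its own (which will exceed the limit).
    Returns the number of parts written."""
    out_dir = out_dir or os.path.dirname(path) or "."
    base = os.path.basename(path)
    part, written, out = 0, 0, None
    with open(path, "rb") as src:
        for line in src:
            # Rotate to a new part before this line would overflow the budget.
            if out is None or written + len(line) > max_bytes:
                if out is not None:
                    out.close()
                part += 1
                out = open(os.path.join(out_dir, f"{base}.part{part:04d}"), "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return part
```

The parts concatenate back to the original file, so search hits in a `.partNNNN` file map straightforwardly to a region of the source file.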
Right now, this is the error I get for anything larger than about 1.5 GB:
May 10, 2022 11:49:38 AM org.opengrok.indexer.index.IndexDatabase lambda$indexParallel$4
WARNING: ERROR addFile(): X:\huge_file.txt
java.lang.ArrayIndexOutOfBoundsException: Index -65536 out of bounds for length 66192
at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:207)
at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:233)
at org.apache.lucene.index.FreqProxTermsWriterPerField.writeOffsets(FreqProxTermsWriterPerField.java:96)
at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:181)
at org.apache.lucene.index.TermsHashPerField.positionStreamSlice(TermsHashPerField.java:200)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:188)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:974)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1757)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1400)
at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:867)
at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$4(IndexDatabase.java:1361)
at java.base/java.util.stream.Collectors.lambda$groupingByConcurrent$59(Collectors.java:1312)
at java.base/java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:679)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
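Until indexing such files works, one stopgap is to find the oversized files up front and exclude them from the source tree handed to the indexer. A minimal sketch (the 1 GiB threshold here is an arbitrary example, and how you then exclude the reported paths depends on your setup):

```python
import os

def oversized_files(root, threshold=1 << 30):
    """Yield paths under root whose on-disk size exceeds threshold bytes."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            p = os.path.join(dirpath, name)
            try:
                if os.path.getsize(p) > threshold:
                    yield p
            except OSError:
                pass  # skip files that vanish or are unreadable mid-scan
```

Running this over the source root before an index run at least turns the failure mode from a mid-index stack trace into a known list of skipped files.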