I’m working with the NEAR Lake Framework to monitor blockchain activity on mainnet. The data retrieval process is extremely sluggish and often gets stuck at the message queue retrieval step. Sometimes the blockchain_queue.get() method hangs for several minutes before returning any data. I’m wondering if there are specific settings I need to adjust or if there are rate limits that might be causing these performance issues.
Check your initialize_stream settings - if you’re using the default S3 client config, it’s probably timing out. I had the same issue, and setting max_pool_connections=50 (a botocore Config option that you pass to the boto3 client) fixed most hangs. Also try --start-block-height closer to the current block instead of syncing from genesis; that made a huge difference for me.
DancingCloud - wow, 282 seconds is brutal! I’ve been messing around with NEAR Lake recently and I’m curious about your setup. Are you using default stream config or did you tweak any parameters in initialize_stream()?
What block height are you starting from? When I tried syncing from way back in history, delays were much worse than starting from recent blocks. The framework seems to struggle more with older data.
Adding connection pooling settings helped me, but not sure if that’s relevant for your case. Have you monitored memory usage during those long waits? Sometimes it’s not network issues but the framework getting bogged down with buffering.
What does your settings object look like? Any custom timeouts or batch sizes? Worth sharing that config if you’re comfortable with it!
This looks like a network issue, not configuration. I had the same hangs when my AWS region was too far from NEAR’s infrastructure. Lake Framework pulls data from S3 buckets, so bad network conditions cause these long timeouts. Try wrapping your blockchain_queue.get() call with asyncio.wait_for() and set a 60-second timeout. Also check if you’re on a server with stable bandwidth. When I switched cloud providers/regions, my fetch times dropped from 4+ minutes to under 30 seconds.
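Here is a rough sketch of the asyncio.wait_for() idea, in case it helps. It assumes blockchain_queue behaves like a plain asyncio.Queue; the helper name get_with_timeout and the demo queue are made up for illustration and are not part of the Lake Framework.

```python
import asyncio
from typing import Any, Optional


async def get_with_timeout(queue: asyncio.Queue, timeout: float = 60.0) -> Optional[Any]:
    """Return the next queue item, or None if nothing arrives within `timeout` seconds."""
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        # The wait was cancelled; the caller decides whether to retry, log, or restart.
        return None


async def demo() -> list:
    # Hypothetical stand-in for the queue handed back by the streamer.
    blockchain_queue: asyncio.Queue = asyncio.Queue()
    await blockchain_queue.put({"block_height": 1})

    first = await get_with_timeout(blockchain_queue, timeout=0.5)
    # The queue is empty now, so this call gives up after the short timeout.
    second = await get_with_timeout(blockchain_queue, timeout=0.05)
    return [first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Returning None on timeout keeps the consumer loop in control: it can log how long the wait lasted and retry, instead of blocking indefinitely on a single get().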
Your queue hanging is probably S3 throttling from NEAR’s data lake. I’ve hit this same issue when processing mainnet data nonstop without backoff logic. The Lake Framework doesn’t handle S3 rate limits well by default, so you get those long waits. Add exponential backoff around your queue operations. Also throw in a small delay between block processing cycles - like await asyncio.sleep(0.1) after each iteration. This stops you from hammering the S3 endpoints with rapid requests. Block range matters too. Blocks with tons of transactions or complex smart contracts take longer to pull from S3. Watch which block heights cause the worst delays - you’ll spot patterns in your data retrieval bottlenecks.
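A minimal sketch of the backoff and pacing described above, assuming the queue is a standard asyncio.Queue. The names backoff_delays and get_with_backoff, the retry counts, and the delay values are illustrative assumptions, not Lake Framework settings.

```python
import asyncio
import random
from typing import Any, List


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential delays: base, 2*base, 4*base, ... each capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


async def get_with_backoff(queue: asyncio.Queue, timeout: float = 60.0,
                           attempts: int = 5, base: float = 1.0, cap: float = 30.0) -> Any:
    """Try queue.get() up to `attempts` times, sleeping longer after each timeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt, delay in enumerate(backoff_delays(attempts, base, cap)):
        try:
            return await asyncio.wait_for(queue.get(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller handle it
            # Random jitter so several consumers do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay))


async def demo() -> Any:
    # Hypothetical stand-in for the streamer queue; the message arrives late.
    blockchain_queue: asyncio.Queue = asyncio.Queue()

    async def late_producer() -> None:
        await asyncio.sleep(0.1)
        await blockchain_queue.put({"block_height": 42})

    producer = asyncio.create_task(late_producer())
    message = await get_with_backoff(blockchain_queue, timeout=0.05,
                                     attempts=5, base=0.02, cap=0.1)
    await producer
    await asyncio.sleep(0.1)  # small pause between processing cycles
    return message


if __name__ == "__main__":
    print(backoff_delays(4, base=1.0, cap=5.0))
    print(asyncio.run(demo()))
```

Whether retrying on the consumer side actually eases S3 throttling depends on how the framework fetches blocks internally, so treat this as something to experiment with rather than a guaranteed fix.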