Hacker News | lukes386's comments

Vector is designed to work well with systems like Kafka, not to replace them. While it does have a _very_ simple durable queue in the optional disk buffer feature, it is nowhere near the durability, fault tolerance, or performance of a full Kafka cluster, and we would not recommend thinking of them as the same type of system.
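For anyone curious what the disk buffer looks like in practice, here is a rough sketch of enabling it on a sink. The component and option names are illustrative and may differ between Vector versions, so check the docs for your release:

```toml
# Hypothetical config: buffer events to disk (rather than memory)
# in front of a Kafka sink, so a brief outage doesn't drop data.
[sources.app_logs]
  type    = "file"
  include = ["/var/log/app/*.log"]

[sinks.to_kafka]
  type              = "kafka"
  inputs            = ["app_logs"]
  bootstrap_servers = "kafka-0:9092"
  topic             = "logs"
  encoding          = "json"

  [sinks.to_kafka.buffer]
    type      = "disk"
    max_size  = 104857600   # bytes of on-disk buffer
    when_full = "block"     # apply backpressure instead of dropping
```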

That being said, we do know of a few cases where Vector's in-flight processing and routing capabilities were enough that a full Kafka-based pipeline was no longer needed. This ability to push computation out to the edge and reduce the need for centralized infrastructure is one of the aspects of Vector that we're most excited about.


Yes, streaming k8s pod logs to Kafka and/or S3 is a great example of when you could use Vector.

The "collect, transform, and route all your logs, metrics, and traces" bit is our most succinct explanation of what Vector does, but I'll admit it's still not as clear as we'd like. To expand it slightly, Vector is a tool to collect observability data (logs, metrics, etc) from wherever it's generated, optionally process that data in-flight, and then forward it to whatever upstream system you'd like to consume it. It does this by providing a variety of different components that you configure into whatever pipeline you need. In your example, you could use our new k8s source and plug it into our Kafka sink, our S3 sink, or both.


Thanks for pointing this out! That limitation is largely a holdover from when the Kafka sink was written and our support for accepting multiple data types was not as good as it is now. As things stand today, it should be a pretty simple change to enable this.

I'll go ahead and open an issue to get that addressed, but in the future please feel free to do so yourself for anything that's tripping you up! We really value this kind of feedback and try to address it as promptly as possible.


Thank you, that’s great. We really do love having Vector available to us; for such a young project it’s amazing that it’s such a stable and solid piece of software.


Very nice tool. I could find a really great use for it, if it supported sftp sinks. Is that in the plan?


I only mentioned it briefly at the end of the post, but metamorphic testing is a very interesting technique that addresses exactly this [0].

The basic idea is to start with some known-good inputs and outputs, and then generate ways to modify the input that should not change the output.
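A tiny sketch of that idea in Python, using a toy log-line parser as the system under test (the parser and its inputs are made up for illustration):

```python
import re

def parse_logfmt(line):
    # Toy parser: extract key=value pairs from a log line.
    return dict(re.findall(r"(\w+)=(\S+)", line))

# Start from a known-good input/output pair.
base = "status=200 method=GET path=/health"
expected = {"status": "200", "method": "GET", "path": "/health"}
assert parse_logfmt(base) == expected

# Metamorphic relations: modifications of the input that should
# NOT change the output.
variants = [
    "  status=200   method=GET  path=/health ",  # extra whitespace
    "method=GET path=/health status=200",        # reordered fields
]
for v in variants:
    assert parse_logfmt(v) == expected
```

In a real setup you would generate the variants with a property-testing library rather than hand-writing them.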

[0]: https://www.hillelwayne.com/post/metamorphic-testing/


Absolutely, Veneur is something we looked at quite a bit when it popped up. It's clear Stripe was feeling a lot of the same pain points we were when we started building Vector and they've come up with something really impressive.

As you mentioned, it seems they've focused more on metrics out of the gate, while we've spent more of our time on the logging side of things (for now). We're working to catch up on metrics functionality, but interoperability via SSF is an interesting idea!


We really love the idea of mtail and think it's criminally underused.

A big part of the reason we started building Vector was to integrate that kind of functionality into a larger project, so people wouldn't have to get over the hump of discovering, setting up, and rolling out a whole separate tool.

We're definitely not as mature as mtail yet, but we're working really hard to get there.


Thank you! Very glad it looks useful to you.

It's still slightly rough around the edges, but Vector can actually ingest metrics today in addition to deriving metrics from log events. We have a source component that speaks the statsd protocol which can then feed into our prometheus sink. We're planning to add more metrics-focused sources and sinks in the future (e.g. graphite, datadog, etc), so check back soon!
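Wiring those two together is only a few lines of config. This is a sketch; addresses and option names are illustrative:

```toml
# Hypothetical: ingest statsd metrics and expose them to Prometheus.
[sources.app_metrics]
  type    = "statsd"
  address = "0.0.0.0:8125"

[sinks.prom]
  type    = "prometheus"
  inputs  = ["app_metrics"]
  address = "0.0.0.0:9598"  # Prometheus scrapes this endpoint
```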


Just a question, are you familiar with work that's been done on the OpenCensus Collector and Agent [0]?

There was discussion earlier this year about creating a design doc for OpenCensus to handle logs. I'm not sure if that got finished, or it was sidelined while the OpenCensus & OpenTracing merger was worked on. Both projects will combine under the OpenTelemetry name.

I've been quite happy with the OpenCensus instrumentation SDKs.

I think the Logs & Metrics space is interesting, especially because there is so much overlap: both are just ways of representing data about an event that occurred in the software.

OpenCensus is fairly widely backed: Google, Microsoft, Etsy, Scalyr...[1]

[0] https://github.com/census-instrumentation/opencensus-service...

[1] https://opencensus.io/community/users/


I've looked into OpenCensus/OpenTracing/OpenTelemetry (and the apparently unaffiliated OpenMetrics?) a bit, but I'm not as familiar as I'd like to be. It does seem like they're focused primarily on application-level instrumentation and the ability to ship metrics and tracing data to different backends.

Vector's perspective is that your applications and infrastructure are already emitting all kinds of interesting data via logs, metrics, etc, and the primary challenge is to collect, enrich, and manage the storage of that data. We have no plans to integrate Vector into your application or introduce some kind of Vector-specific method of exporting data.

We'll definitely be watching OpenTelemetry as it moves forward and would very much like to be a compatible part of that ecosystem. To the degree that they use common open standards for their communication protocols, that could just fall out naturally.


That's a good way to think about it! Heka was a big inspiration. The design isn't exactly the same, but we're aiming to solve a lot of the same problems.


Hi! I work on Vector. For a motivating example, let's say you have an application fronted by nginx. Using Vector would allow you to ingest your nginx logs off disk, parse them, expose status code and response time distributions to prometheus, and store the parsed logs as JSON on S3.
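A rough sketch of what that nginx pipeline could look like in config. The regex, bucket name, and option names are all hypothetical stand-ins; consult the Vector docs for the exact fields your version supports:

```toml
# Hypothetical: tail nginx access logs, parse them, derive a metric,
# and archive the parsed events to S3.
[sources.nginx]
  type    = "file"
  include = ["/var/log/nginx/access.log"]

[transforms.parsed]
  type   = "regex_parser"
  inputs = ["nginx"]
  regex  = '^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\w+) (?P<path>\S+)[^"]*" (?P<status>\d+)'

[transforms.status_metric]
  type   = "log_to_metric"
  inputs = ["parsed"]

  [[transforms.status_metric.metrics]]
    type  = "counter"
    field = "status"

[sinks.prom]
  type    = "prometheus"
  inputs  = ["status_metric"]
  address = "0.0.0.0:9598"

[sinks.archive]
  type     = "aws_s3"
  inputs   = ["parsed"]
  bucket   = "parsed-nginx-logs"
  region   = "us-east-1"
  encoding = "ndjson"
```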

There are obviously plenty of ways to accomplish that same thing today, but we believe Vector is somewhat unique in allowing you to do it with one tool, without touching your application code or nginx config, and with enough performance to handle serious workloads. And Vector is far from done! There's a ton more we're working to add moving forward (thinking about observability data from an ETL and stream processing perspective should give you a rough idea).


Our company uses Splunk. I'm not on the admin/ops side, so I'm possibly missing details. The way I understand it, there is a Splunk forwarder running on our app servers, and then there's a Splunk server URL where I get consolidated logs in the browser and can search and run many other statistical functions.

So is Vector like Splunk forwarder or more than that?


Vector can act as a Splunk forwarder, but is designed to be much more flexible.

In addition to forwarding to more storage systems (S3, Elasticsearch, syslog, etc), Vector can do things like sampling logs, parsing them, and aggregating them into metrics. Depending on your needs, this makes it easier to reduce your Splunk volume and reduce costs, transition to something like an ELK stack, etc.
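As one illustration of the volume-reduction point, you could sample events before they ever reach Splunk. This is a sketch; the transform and sink option names may differ by version:

```toml
# Hypothetical: sample events, then forward to Splunk over HEC.
[sources.app]
  type    = "file"
  include = ["/var/log/app/*.log"]

[transforms.sampled]
  type   = "sampler"
  inputs = ["app"]
  rate   = 10  # keep roughly 1 in 10 events

[sinks.splunk]
  type   = "splunk_hec"
  inputs = ["sampled"]
  host   = "https://splunk.example.com:8088"
  token  = "${SPLUNK_TOKEN}"
```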

We're also working to build up the metrics side of Vector's capabilities. In a way, you can think of Vector as a stream processing system for observability data, capable of feeding into a variety of storage backends.


Thanks. This is all very interesting. I should try it on our app servers.


Thanks for your interest! And please feel free to get in touch if you have any questions or feel there are things we could do to better support your use case: https://vector.dev/community/


