Dev.to US tech 2026-06-27 03:22

System Design for Working Engineers, Not Interview Prep

Original title: System Design for Working Engineers, Not Interview Prep

Analysis results

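Category: AI
Importance: 59
Trend score: 21
Summary: This article explains an approach to system design that is useful in day-to-day work. Rather than system design as interview preparation, it emphasizes practical on-the-job skills and proposes ways to handle the concrete problems engineers face. It also touches on the basic principles of system design, effective communication, and the importance of collaborating as a team.
Keywords: need real don business design scaling requirements users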

Originally published at malaymehta.com

The Interview Trap

If you look at most system design tutorials, you get an extreme use case. Design Twitter. Design YouTube. Scale it to a billion users. Draw boxes on a whiteboard for 45 minutes.

Do you think your app will be used by a billion users on day one? The answer is almost always no. But the tutorials don't teach you what to do when you have 500 users, unclear requirements, a team of four, and a quarter to ship something that works.

Real system design is nothing like a whiteboard interview. You don't get clean requirements, you don't design from scratch, and nobody asks you to handle a billion requests per second on day one.

Real System Design Starts with Questions, Not Diagrams

The very first thing that matters in system design is something most tutorials skip entirely: unclear and chaotic requirements. In the real world, requirements don't come as a clean problem statement. They come from non-technical business teams, and you need to keep asking follow-up questions until you have the clarity you need.

Ask as many questions as possible. Understand your functional and non-functional requirements. Which features need to be synchronous and which can be async? What are the read and write load patterns? What is the maximum and average number of concurrent users right now? What does authentication look like? Do you need role-based access control?

These questions drive your choices. You don't always need an axe where a knife will do. Being minimalist, with a reasonable growth prediction and a 3, 6, 9 month plan, will take you in the right direction.

Some things the situation demands immediately will take more time than expected. Taking a predictable hit now and fixing it at the right time later, without losing that balance, is what matters. Weighing what will be expensive to change later, in dollar cost or human effort, is how real architectural decisions get made.
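The growth-prediction step above can be made concrete with back-of-envelope arithmetic. The sketch below is illustrative only: the 500 starting users come from the article, while the 20% monthly growth, the 10% concurrency fraction, and the 6 requests per user per minute are placeholder assumptions you would replace with answers from your own requirements questions. The function names are hypothetical.

```python
def growth_plan(users_now, monthly_growth, months=(3, 6, 9)):
    """Compound the current user count forward to each planning checkpoint."""
    return {m: round(users_now * (1 + monthly_growth) ** m) for m in months}


def projected_peak_rps(users, concurrent_fraction, requests_per_user_per_min):
    """Estimate peak requests per second from two usage assumptions."""
    concurrent_users = users * concurrent_fraction
    return concurrent_users * requests_per_user_per_min / 60.0


# Assumed inputs: 500 users today, 20% growth per month.
plan = growth_plan(users_now=500, monthly_growth=0.20)

for month, users in plan.items():
    # Assumed: 10% of users online at peak, each sending 6 requests per minute.
    rps = projected_peak_rps(users, concurrent_fraction=0.10,
                             requests_per_user_per_min=6)
    print(f"month {month}: ~{users} users, ~{rps:.1f} peak req/s")
```

Under these assumptions the projected peak stays in the tens of requests per second through month 9, which is the kind of number that tells you whether you need the axe or the knife.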
Pushing Back on Bad Requirements

Requirements often come from non-technical business teams, and you need to push back and explain why certain things should not be done the way they expect.

Here is a real example. A business person once asked to duplicate data into another Kafka topic because they predicted the existing topic would not handle the extra load from a new subscriber. The technical reality? Kafka is built for exactly this. A new consumer group on the same topic would work without disturbing existing consumers, because each group tracks its own offsets. If you don't push back, you end up creating tech debt with support and maintenance costs forever, just for replicating data that never needed to be replicated.

Trade-off Decisions Nobody Teaches

Monolith vs Microservices

Typically the very first thing engineers want to talk about is microservices and how they can help. But do you realistically have even 100 users on the product? Why do you need K8s, Docker, distributed tracing, cross-cutting async messaging, and a service mesh? Do you really need that scale, or are you doing it to make your resume look better?

If you don't have real users in the thousands, a modular monolith is the best choice. Deploy everything as one service on a Linux server behind a reverse proxy, with a CNAME record pointing at it. That simple. You need a database, sure. But you don't need Kafka, distributed tracing, auto-scaling, or any complex distributed computing to begin with.

When predictable growth comes, add monitoring and observability to understand which requests are hitting hardest. Decouple the modules doing heavy work into independent microservices. Then pivot. That is the right sequence.

Synchronous vs Async

If you don't need to process something immediately, decoupling via async helps. If it is fire-and-forget, use a simple queue. If you need multiple services to consume the same event with highly scalable producers and consumers, use Kafka.
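Both Kafka points above (a new subscriber on an existing topic, and several services consuming the same event) rest on one idea: records are stored once, and each consumer group keeps its own read position. The sketch below is a toy in-memory model of that idea, not the Kafka client API; the `Topic` class and the group names are hypothetical.

```python
from collections import defaultdict


class Topic:
    """Toy stand-in for a single topic partition: an append-only log."""

    def __init__(self):
        self.log = []                    # each record is stored exactly once
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, record):
        self.log.append(record)

    def poll(self, group, max_records=10):
        """Return unread records for `group`, advancing only that group's offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


topic = Topic()
for order_id in range(5):
    topic.publish({"order_id": order_id})

billing_first = topic.poll("billing", max_records=3)  # existing subscriber
analytics_all = topic.poll("analytics")               # new subscriber, same topic
billing_rest = topic.poll("billing")                  # picks up where it left off
```

Adding the `analytics` group required no second copy of the data and did not move the `billing` group's position. In real Kafka, where a brand-new group starts reading is governed by the consumer's `auto.offset.reset` setting, and the extra reads still consume broker resources, so this shows the model rather than a capacity guarantee.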
If the user is waiting for a response, keep it synchronous via a RESTful API because it needs to happen right now.

Build vs Buy

Rule of thumb: never reinvent the wheel. If something already exists at low cost and does the job, buy it. If companies like OpenAI and Anthropic are not building their own payment systems and instead use established financial integrators, you can trust that approach. If giants are not building everything from scratch, why should you? Building only makes sense when no existing solution fits your needs.

Consistency vs Availability: Real-World CAP

When you are dealing with transactions that need ACID guarantees, use SQL. Ticket booking, inventory updates, financial debits and credits. These cannot tolerate stale reads or lost writes.

If you need consistency and partition tolerance, where a read should fail rather than return stale data, a consistency-first NoSQL store works better. Social media feeds, messaging, analytics, and streaming.

If you need availability and partition tolerance and can tolerate eventual consistency, wide-column databases like Cassandra fit well. IoT data, time series, high write throughput with low read frequency.

Perfect Architecture vs Shipping This Quarter

Perfect architecture is always the goal, but if you can balance it with shipping this quarter, that brings real business value and revenue. Find the healthy mix. Build a base that requires very little change even if the actual decision evolves later.

For example, tightly couple your audit logging service synchronously because you don't have async processing yet. It ships real business value now. Later, when async communication is added, you decouple it without changing the end-user experience.

Analytics is another one. You might not have the full MySQL CDC pipeline through Debezium into ClickHouse yet. But you can start by ingesting specific tables into ClickHouse directly for analytics. Solve it elegantly later, when DevOps capacity allows the full event streaming pipeline.
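The audit logging example above can be sketched in code. This is a minimal illustration under stated assumptions, not the author's implementation: `AuditSink`, `SyncAuditSink`, `QueuedAuditSink`, and `update_profile` are hypothetical names, and a Python list stands in for the real audit store. The point is that the business logic depends only on a small interface, so swapping the synchronous sink for a queued one later does not touch the caller.

```python
import queue
import threading
from typing import Protocol


class AuditSink(Protocol):
    def record(self, event: dict) -> None: ...


class SyncAuditSink:
    """Ships first: writes the audit event inline, on the request path."""

    def __init__(self, store: list):
        self.store = store

    def record(self, event: dict) -> None:
        self.store.append(event)


class QueuedAuditSink:
    """Added later: hands the event to a background worker and returns at once."""

    def __init__(self, store: list):
        self.store = store
        self._queue: queue.Queue = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def record(self, event: dict) -> None:
        self._queue.put(event)

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            self.store.append(event)
            self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued event has been written."""
        self._queue.join()


def update_profile(user_id: int, name: str, audit: AuditSink) -> dict:
    """Business logic sees only the AuditSink interface, never the transport."""
    audit.record({"action": "update_profile", "user_id": user_id})
    return {"user_id": user_id, "name": name}


sync_store: list = []
update_profile(1, "Ada", SyncAuditSink(sync_store))

async_store: list = []
queued = QueuedAuditSink(async_store)
update_profile(2, "Lin", queued)
queued.flush()  # only so this example can inspect the result deterministically
```

In a real system the queued sink would publish to whatever messaging layer you adopt; the in-process thread here only stands in for that so the example runs on its own.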
When to Scale and When Not To

The time to scale is based on observability data and predictive customer expansion patterns. Your business understanding combined with analytical thinking will surface the signals that tell you when scaling is actually needed.

Before jumping to horizontal or vertical scaling, check the basics first. Does your database have optimal indexing? Is your application connection pool configured properly? Are there N+1 queries firing hundreds of calls where one would do? These are high-level checks. Deeper concepts like partitioning and sharding are problems you encounter with billions of records, not a few million.

Horizontal scaling is generally the better approach because it adds throughput by adding instances and lets you scale out or in without downtime. But only when you actually need it.

A Real Story: Premature Scaling Gone Wrong

I worked with a company that had fewer than 100 customers and fewer than 5,000 business transactions in total. They had already added Docker, Kubernetes, Kafka, distributed monitoring, and auto-scaling.

Now they had two problems instead of one: the real business problem and a tech problem. Very few developers on the team understood microservices as a whole. Nobody knew DevOps practices well enough to manage how scaling actually works. It was not just slowing business delivery but also burning cloud costs, because nobody knew how to optimize the infrastructure bill. A double hit.

Premature scaling without proper architectural guidance creates more problems than it solves.

Every Architecture Decision Is a Cost Decision

Will you use managed Kubernetes or self-managed K8s? New Relic or Dynatrace or open-source alternatives? Managed database or self-hosted? It all depends on who owns what. If you have DevOps engineers who can manage the nightmare of persistent storage, networking, constant upgrades, and maintenance, then self-hosted can work.
If the answer is no, managed is better, but it comes with a higher price tag.

It is equally important to monitor your cloud costs and understand the incremental bills. Is your Docker image lifecycle policy set to delete old images within a few days? Is your S3 storage kept forever or only for a retention period? Have you optimized or dropped high-cardinality metrics in your distributed tracing to save cost? How about networking costs for transporting data across regions? It all adds up.

Here is a question I ask teams all the time: will you optimize your MySQL queries and indexing, or will you throw more money at bigger database instances so the app keeps functioning at an ever-increasing dollar cost? Unless the root cause is identified and fixed, you are just burning money.

Small teams with few users that adopt microservices almost always end up with hosting and management costs that are too high. The operational overhead, debugging complexity, and cognitive load on the team need to be balanced against the actual benefits.

Database Design Is Architecture

Schema design decisions haunt you for years. The table structure you choose in month one determines how painful your queries are in year two. Foreign keys, indexes, data types, normalization vs denormalization. These are architecture decisions, not database admin tasks.

The usual symptoms: pages taking 10+ seconds to load because nobody thought about indexing. N+1 queries firing hundreds of database calls. Unused columns bloating tables. No caching layer. Complex business logic with if-else ladders that nobody can follow.

For SQL vs NoSQL, the real answer is simpler than the blog posts make it: if you need transactions and relationships, use SQL. If you need a flexible schema with high write throughput and can tolerate eventual consistency, use NoSQL. Most applications should start with SQL and add NoSQL for specific use cases when needed.

Caching strategy is another design decision that gets treated as an afterthought.
Cache the data that is read frequently but changes rarely. Product catalogs, user profiles, configuration data. Invalidate on write. Start with a simple TTL-based approach and add event-driven invalidation when your system complexity demands it.

Observability Is a Design Decision